
Bounding Error Masking in Linear
Output Space Compression Schemes
Steffen Tarnick
Max Planck Society
Fault-Tolerant Computing Group
at the University of Potsdam, Germany
Abstract
Linear output space compression (OSC), or group parity prediction, of a circuit
under check (CUC) is a well known method for concurrent error detection. Typically, errors caused by a circuit fault will be masked with a certain probability
which depends on the choice of the compression function of the OSC scheme. Based
on the principle of linear OSC we present a design method for concurrent checkers
such that the masking probability of errors caused by faults of a given set of circuit
faults is below a given bound, while keeping the space compression ratio, defined
as the ratio of the number of circuit outputs to the number of outputs of the space
compressor, as high as possible. Experiments performed on the ISCAS-85 benchmark circuits show that the compression ratios achieved with compression functions
computed with this method can be very high, even for very low bounds for the error
masking probability and large fault sets.
Index Terms: Combinatorial optimization, concurrent error detection, error masking, group parity prediction checker, linear codes, linear output space compression.
1 Introduction
Integrated circuits are designed to be totally self-checking [1] or strongly fault-secure [16]
with respect to a set of target faults in order to achieve the totally self-checking (TSC)
goal [16], i.e., the first error caused by a fault of the respective fault set will be detected.
Partially self-checking circuits [19] achieve this goal only in part, but can be employed
in less critical applications where they are an alternative to the more expensive totally
self-checking circuits.
Output space compression (OSC) [4, 5, 13, 17] is a simple method to design circuits that
are partially self-checking. The basic principle of OSC is shown in Fig. 1. A space
compressor SC maps the n-dimensional output space of the circuit under check (CUC) to
a k-dimensional space, k < n. The predictor P is designed to generate the same output
responses as the compressor, or the bitwise complementary responses. The two responses
of the compressor and the predictor are then compared by a comparator C or a two-rail
checker. If the two responses are different, an error is indicated by the comparator. CUC
and predictor together form a new circuit CUC', the outputs of which are now encoded
as a systematic code. The compressor and the comparator (or two-rail checker) together
form a new concurrent checker CC which is a conventional systematic code checker.
Figure 1 General OSC scheme: the CUC maps the m-bit input x to the n-bit response y; the space compressor SC maps y to the k-bit vector z, the predictor P produces the k-bit vector z' from x, and the comparator C raises the error indication if z and z' differ. CUC and P together form CUC', SC and C together form the concurrent checker CC.
A special kind of OSC is the linear OSC where the predictor simply consists of XOR-trees.
In a linear OSC scheme the output responses of CUC' are therefore encoded as words of
a linear, or group code [14]. The most trivial cases of linear codes are the parity code
(k = 1) and the duplication code (k = n).
Very often concurrent checking schemes are designed to detect certain types of errors
(e.g. single errors, double errors, unidirectional errors) or all or a high percentage of
single stuck-at 0/1 faults. But in general, a single fault can cause different types of errors
to occur. Also, in large circuits, it is not feasible to check for each possible fault. It
is therefore sensible to consider only a small subset of faults that are reasonably likely
to occur. In [17] a linear OSC scheme was presented that is tailored to a predefined
set of arbitrary faults. This scheme guarantees that each fault of the given set will be
detected, i.e., at least one error caused by each fault will be propagated to the outputs of
the compressor. However, this scheme does not make it possible to evaluate the quality of
the error detection. It is therefore useful to introduce a quality measure for the error detection.
A useful measure for the quality of the error detection is the error detection ability (EDA)
[4], defined as the ratio of the number of error patterns detected by the checker to the
total number of error patterns caused by every single stuck-at fault in the CUC. This
measure can easily be redefined for a set of arbitrary faults, including nonclassical faults,
and can be interpreted as a degree of the partially self-checking property with respect to
the fault set. Another measure to evaluate the error detection quality is the probability of
achieving the TSC goal (PATG) [10]. Depending on the description level of the CUC, the
PATG takes into account the probability of the hypothesis that a fault is detected before
another fault occurs. Therefore, the EDA combined with the PATG would give a very
precise measure of the real error detection quality of a checker. We consider an EDA-like
measure for the error detection quality, the minimum error detection probability with
respect to a fault set, defined as the minimum, over all faults of the given fault set, of the
ratio of the number of errors caused by a fault and detected by the checker to the total
number of errors caused by that fault.
In this paper we derive a method that allows us to compute the compression function of
an OSC scheme in order to achieve a given minimum error detection probability for a
given fault set, while keeping the compression ratio as high as possible, i.e., the number
of outputs (or XOR-trees) of the compressor as low as possible in order to minimize the
required silicon area of the checker.
First we show how to compute the masking probability of errors caused by a given fault
with respect to a fixed compression function. For the computation we use the concept of
weakly independent circuit outputs [17] and introduce the degree of dependence of circuit
outputs with respect to a fault. In the second part of the paper a procedure will be derived
to compute a compression function with a desired minimum error detection probability for
a set of faults. This problem can be mapped to a set covering problem. For the solution
of the problem we use a simple greedy strategy, combined with a threshold accepting
algorithm [3].
The derived algorithm will be discussed on the ISCAS-85 benchmark circuits [2]. Experimental results show the effectiveness of the developed procedure in terms of high output
space compression ratios.
In general, checkers are required to be strongly code-disjoint [12] or at least self-testing
with respect to a set of internal faults. However, the work of this paper is focused on
an optimal choice of the XOR-trees that define the space compressor in order to achieve
low error masking for the given set of faults. Concerning the strongly code-disjoint and
self-testing property of the checker we refer to [4, 5, 8, 15]. We also do not pay attention
to the design of the predictor.
The paper is organized as follows. In Section 2 we provide the basic definitions and give the
problem statement. In Section 3 we show how we can compute the masking probability of
errors caused by a fault under a given linear OSC function. The algorithm for determining
an optimal linear OSC function for a given set of faults will be presented in Section 4.
Experimental results will be shown in Section 5. We conclude the paper with Section 6.
2 Definitions and Problem Statement
In this section we define the basic notions which we will use throughout the paper.
Let F : X → Y, X ⊆ B^m, Y ⊆ B^n, B = {0, 1}, be a Boolean function realised by the
CUC, and Fi : X → B the function realised at output i of the circuit. Let Φ denote a set
of physical faults in the circuit. For our purposes we consider a functional fault model
[6], i.e., each fault φ ∈ Φ is represented by a faulty circuit function, denoted by F(x, φ).
The fault-free circuit function is denoted by F(x, ∅).
Definition 1 The space compression function is a Boolean function h : Y → Z that
maps the circuit output space Y ⊆ B^n to the space Z ⊆ B^k, k < n.
A linear space compression function can be expressed by an n × k matrix C = (cij),
cij ∈ B, which is a submatrix of the generator matrix of the linear code. Therefore, the
space compressor SC is fully defined by the matrix C, and we have z = C^T y, z ∈ Z,
y ∈ Y.
The dimension dim(Z ) of the space Z corresponds to the number of outputs of the space
compressor SC, and therefore to the number of outputs of the predictor P. Under the
hypothesis that the complexity of the prediction function and the area required by the
predictor P (and also by the whole scheme) mainly depends on the number of outputs of
P, the goal is to minimize the dimension of Z .
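To make the matrix view concrete, a linear space compressor is just a matrix-vector product over GF(2). A minimal sketch in Python (the matrix C below is a hypothetical example, not one computed by the method of this paper):

```python
# Linear output space compression: z = C^T y over GF(2).
# Each column of the n x k matrix C defines one XOR-tree of the compressor.

def compress(C, y):
    """Map the n-bit circuit response y to the k-bit vector z = C^T y (mod 2)."""
    n, k = len(C), len(C[0])
    return [sum(C[i][j] & y[i] for i in range(n)) % 2 for j in range(k)]

# Hypothetical CUC with n = 4 outputs compressed to k = 2 (compression ratio n/k = 2):
C = [[1, 0],
     [1, 0],
     [0, 1],
     [1, 1]]
y = [1, 0, 1, 1]      # circuit response
z = compress(C, y)    # z = [y1 ^ y2 ^ y4, y3 ^ y4]
```

The parity code corresponds to a single all-ones column (k = 1), the duplication code to C being the n × n identity matrix (k = n).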
Problem Statement Given a combinational circuit CUC and a fault set Φ, find a
linear mapping h : Y → Z with dim(Z) = min such that the masking probability of an
error caused by a fault φ ∈ Φ is below a given bound 1 − Pmin, i.e., the error is detected
with a probability P ≥ Pmin.
Definition 2 [17] Two circuit outputs Fi and Fj are called weakly independent with
respect to a fault φ if there is an input x such that

Fi(x, φ) ⊕ Fj(x, φ) ≠ Fi(x, ∅) ⊕ Fj(x, ∅).

If two circuit outputs are not weakly independent with respect to φ then they are said to
be dependent with respect to φ.
This definition can be easily extended to a set of circuit outputs [17]. If a set of outputs
is weakly independent with respect to a fault φ then there is at least one input x ∈ X
such that the fault causes an error in the parity of this set of outputs.
Definition 3 Let Fi be a circuit output and φ a fault in the circuit. The error function
of Fi with respect to φ is defined as

Δ(Fi, φ) = Fi(x, φ) ⊕ Fi(x, ∅).

As can easily be seen, a set H of circuit outputs is weakly independent with respect to a
fault φ if ⊕_{Fk ∈ H} Δ(Fk, φ) ≠ 0.
Let H be an additional circuit output defined by XORing the set H of circuit outputs (we
will use the notation H for a set of circuit outputs and for the output H defined by XORing
the circuit outputs Fk ∈ H interchangeably). The property of a set H of circuit outputs
to be weakly independent with respect to a fault φ indicates that φ can be detected at
the output H. However, the property of being weakly independent does not say anything
about the quality of the fault detection. It is therefore sensible to introduce a measure
that quantifies the weak independence of circuit outputs.
Definition 4 The satisfying set Xi of Δ(Fi, φ) is the set of all inputs x that can detect
the fault φ at output Fi:

Xi = {x | Δ(Fi, φ) = 1}.
Definition 5 The degree of dependence of two circuit outputs Fi and Fj with respect to
a fault φ is defined as

ϱφ(Fi, Fj) = |Xi ∩ Xj| / |Xi ∪ Xj| = 1 − |Xi ⊕ Xj| / |Xi ∪ Xj|,

where ⊕ denotes the symmetric difference of two sets.
ϱφ(Fi, Fj) = 0 implies that Fi and Fj are independent with respect to φ, whereas
ϱφ(Fi, Fj) = 1 implies that Fi and Fj are dependent. We define two circuit outputs
at which a fault φ cannot be detected to be dependent: Xi = Xj = ∅ ⇒ ϱφ(Fi, Fj) := 1.
We can extend this definition to a set H of circuit outputs. The degree of dependence of
a set H of circuit outputs with respect to φ is

ϱφ(H) = 1 − |⊕_{Fk ∈ H} Xk| / |∪_{Fk ∈ H} Xk|.
The degree of dependence is the probability that an error caused by φ will not be detected
at the output H under the condition that it causes an error at at least one output Fi ∈ H.
ϱφ(H) = 0 therefore implies that an error caused by φ will always be detected at H if
one Fi ∈ H is erroneous; ϱφ(H) = 1 indicates that an error caused by φ will never be
detected at the output H (also in the case where φ never causes an output Fi ∈ H to be
erroneous).
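For concreteness, the degree of dependence of a set of outputs can be computed directly from the satisfying sets of their error functions. A minimal sketch (function name and data are ours, not from the paper):

```python
def degree_of_dependence(sat_sets):
    """Degree of dependence of a set H of outputs with respect to a fault:
    1 - |symmetric difference of the X_k| / |union of the X_k|.
    Outputs at which the fault is never detected count as dependent (value 1)."""
    union = set().union(*sat_sets)
    if not union:                  # fault undetectable at every output of H
        return 1.0
    sym = set()
    for X in sat_sets:
        sym ^= X                   # the satisfying set of the XOR of the outputs
    return 1.0 - len(sym) / len(union)

# Two outputs whose satisfying sets overlap in one of three inputs:
rho = degree_of_dependence([{1, 3}, {3, 4}])   # 1 - |{1,4}|/|{1,3,4}| = 1/3
```

Note that for two outputs this agrees with the pairwise definition |Xi ∩ Xj| / |Xi ∪ Xj|: here |{3}| / |{1, 3, 4}| = 1/3.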
After having introduced the degree of dependence as a measure for weakly independent
circuit outputs we are now able to compute the masking probability of errors caused by
a fault for a given OSC function.
3 Error Masking
If we perform a linear compression of the output space of the functional circuit then we
map the n-dimensional output space Y of the circuit to a k-dimensional space Z, k ≤ n.
For k < n there will be erroneous output vectors that will be mapped to the corresponding
correct vector in Z. Since the mapping is linear there can be up to 2^(n−k) − 1 such vectors.
The effect that an erroneous circuit response y' = y ⊕ e is mapped to the same vector
z ∈ Z as the fault-free response y, h(y') = h(y) = z, is called masking of the error e. The
goal of this section is to calculate the masking probability of an error caused by a given
fault φ for a given linear mapping h.
Definition 6 The probability that a fault φ can be detected at the output Fi under the
condition that φ causes an error is denoted by

Pdetect(Fi, φ) := P(Δ(Fi, φ) = 1 | ∃j : Δ(Fj, φ) = 1) = |Xi| / |∪_{k=1}^{n} Xk|.
For the computation of the error masking probability for the mapping h we assume that
we know the probability Pdetect(Fi, φ) for each circuit output Fi and the degree of
dependence ϱφ(Fi, Fj) for each pair of outputs. From these values we have to compute the
corresponding probabilities of the outputs in Z and their respective degrees of dependence.
We first consider two outputs Fi and Fj. The output Fi ⊕ Fj has the error detection
probability

Pdetect(Fi ⊕ Fj, φ) = |Xi ⊕ Xj| / |∪_{k=1}^{n} Xk|,

where ⊕ in the numerator denotes the symmetric difference of two sets. Pdetect(Fi ⊕ Fj, φ)
is given by

Pdetect(Fi ⊕ Fj, φ) = Pdetect(Fi, φ) + Pdetect(Fj, φ) − 2 Pdetect(Fi ∧ Fj, φ),   (1)

where Pdetect(Fi ∧ Fj, φ) denotes the probability that an error caused by φ is detected
simultaneously at Fi and Fj. Since Pdetect(Fi ∧ Fj, φ) is given by

Pdetect(Fi ∧ Fj, φ) = ϱφ(Fi, Fj) (Pdetect(Fi, φ) + Pdetect(Fj, φ) − Pdetect(Fi ∧ Fj, φ))   (2)

we obtain

Pdetect(Fi ⊕ Fj, φ) = [(1 − ϱφ(Fi, Fj)) / (1 + ϱφ(Fi, Fj))] (Pdetect(Fi, φ) + Pdetect(Fj, φ)).   (3)
Similarly we can compute the probability Pdetect(Fi, Fj; φ) that an error caused by φ is
detected at at least one of the outputs Fi and Fj:

Pdetect(Fi, Fj; φ) = Pdetect(Fi, φ) + Pdetect(Fj, φ) − Pdetect(Fi ∧ Fj, φ)
                   = [1 / (1 + ϱφ(Fi, Fj))] (Pdetect(Fi, φ) + Pdetect(Fj, φ))
                   = [1 / (1 − ϱφ(Fi, Fj))] Pdetect(Fi ⊕ Fj, φ).   (4)
Given the probabilities (3) and (4) we are now able to compute the masking probability
of an error caused by a fault φ.

Definition 7 The masking probability of an error caused by a fault φ at the output Fi ⊕ Fj
is given by

Pmask(Fi ⊕ Fj, φ) = Pdetect(Fi, Fj; φ) − Pdetect(Fi ⊕ Fj, φ).   (5)
Definition 7 can be easily extended for the computation of the error masking probability
of an arbitrary XOR-tree H = F_{i1} ⊕ … ⊕ F_{ij}. As a special case we obtain Pmask(Fi, φ) = 0.
The error masking probability for the mapping h, i.e., for the entire space compressor, is

Pmask(H1, …, Hk; φ) = Pdetect(F1, …, Fn; φ) − Pdetect(H1, …, Hk; φ)
                    = 1 − Pdetect(H1, …, Hk; φ),   (6)

where the Hi, i = 1, …, k, are the XOR-trees that determine the mapping h.
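The pairwise detection and masking probabilities can be cross-checked against a direct computation on the satisfying sets. A sketch under the definitions above (the sets are hypothetical illustration data):

```python
def probabilities(Xi, Xj, universe):
    """Detection probabilities for a pair of outputs, computed set-wise:
    pi, pj   - detection at Fi, at Fj
    p_xor    - detection at Fi xor Fj (symmetric difference of the sets)
    p_mask   - masking probability: P(error at Fi or Fj) - p_xor."""
    u = len(universe)
    pi, pj = len(Xi) / u, len(Xj) / u
    p_xor = len(Xi ^ Xj) / u
    p_either = len(Xi | Xj) / u
    return pi, pj, p_xor, p_either - p_xor

Xi, Xj = {1, 3}, {3, 4}
pi, pj, p_xor, p_mask = probabilities(Xi, Xj, Xi | Xj)

# Cross-check the closed form that uses the degree of dependence rho:
rho = len(Xi & Xj) / len(Xi | Xj)
assert abs(p_xor - (1 - rho) / (1 + rho) * (pi + pj)) < 1e-9
```

Here pi = pj = 2/3, rho = 1/3, and the error detected at both outputs simultaneously (input 3) is masked at the XOR, giving p_mask = 1/3.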
We now want to derive the degree of dependence of two outputs Fi ⊕ Fj and Fi ⊕ Fk.
Since (Fi ⊕ Fj) ⊕ (Fi ⊕ Fk) has the same error detection probability as Fj ⊕ Fk we obtain
with (3)

[(1 − ϱφ(Fi ⊕ Fj, Fi ⊕ Fk)) / (1 + ϱφ(Fi ⊕ Fj, Fi ⊕ Fk))] (Pdetect(Fi ⊕ Fj, φ) + Pdetect(Fi ⊕ Fk, φ))
  = [(1 − ϱφ(Fj, Fk)) / (1 + ϱφ(Fj, Fk))] (Pdetect(Fj, φ) + Pdetect(Fk, φ))

and therefore

ϱφ(Fi ⊕ Fj, Fi ⊕ Fk) = (Q1 − Q2) / (Q1 + Q2)   (7)

with

Q1 = (1 + ϱφ(Fj, Fk)) [Pdetect(Fi ⊕ Fj, φ) + Pdetect(Fi ⊕ Fk, φ)],
Q2 = (1 − ϱφ(Fj, Fk)) [Pdetect(Fj, φ) + Pdetect(Fk, φ)].
We easily derive

ϱφ(Fi ⊕ Fj, Fi) = ϱφ(Fi ⊕ Fj, Fi ⊕ 0)
  = [Pdetect(Fi ⊕ Fj, φ) + Pdetect(Fi, φ) − Pdetect(Fj, φ)] / [Pdetect(Fi ⊕ Fj, φ) + Pdetect(Fi, φ) + Pdetect(Fj, φ)].
However, in general it is not possible to compute the probability Pdetect(Fi ⊕ Fj ⊕ Fk, φ)
only with the error detection probabilities of single outputs and the dependence degrees
of each output pair. A simple example will illustrate this point.
Example 1 Consider three circuit outputs F1, F2, and F3, and a fault φ. The error
functions of these outputs are shown in Table 1. Assume a fourth output that is erroneous
for x2. Then for the error detection probability of each output we obtain Pdetect(F1, φ) =
Pdetect(F2, φ) = Pdetect(F3, φ) = 1/2. Each output pair is a pair of weakly independent
outputs, and the degree of dependence is ϱφ(F1, F2) = ϱφ(F1, F3) = ϱφ(F2, F3) = 1/3. For
the error detection probability of F1 ⊕ F2 ⊕ F3 we obtain Pdetect(F1 ⊕ F2 ⊕ F3, φ) = 0 since
the error functions are linearly dependent.
Now assume a second fault ψ with the error functions as shown in Table 2. The error
detection probabilities of the outputs Fi are the same as above, and the same holds for the
degree of dependence of each output pair. However, for the error detection probability of
F1 ⊕ F2 ⊕ F3 we obtain Pdetect(F1 ⊕ F2 ⊕ F3, ψ) = 1. □
      Δ(F1, φ)  Δ(F2, φ)  Δ(F3, φ)
x1       1         0         1
x2       0         0         0
x3       1         1         0
x4       0         1         1

Table 1 Error functions for φ.
      Δ(F1, ψ)  Δ(F2, ψ)  Δ(F3, ψ)
x1       1         0         0
x2       1         1         1
x3       0         1         0
x4       0         0         1

Table 2 Error functions for ψ.
For the same reason, it is not possible to compute the value of ϱφ(Fi ⊕ Fj, Fk ⊕ Fl) only
with the error detection probabilities and the dependence degrees of each output pair,
since for the computation we need the value of Pdetect(Fi ⊕ Fj ⊕ Fk ⊕ Fl, φ), which is not
available.

Figure 2 Example circuit (inputs x1–x4, internal gates f1–f3, outputs F1–F4).
In [18] it was found that all outputs form an Abelian group G defined by the fault φ (linear
combinations of outputs are considered as outputs too). The subset of outputs at which
φ cannot be detected forms a subgroup S0 of G. The elements of each coset of S0 have the
same error function and therefore the same satisfying set. Thus, the outputs of the same
coset have the same error detection probability Pdetect, and the degree of dependence of
two outputs Fi and Fj only depends on the cosets they belong to. It is therefore sufficient
to know the error detection probability of only one element of each coset of S0 and the
degree of dependence of two representatives of each pair of cosets of S0.
Example 2 Consider the circuit of Fig. 2. Let φ1 be a stuck-at 1 fault at the input x3.
The subgroup S0 and the cosets S1, S2, and S3 defined by φ1 are given below.

S0 = {0,        F4,        F1 ⊕ F2,        F1 ⊕ F2 ⊕ F4}
S1 = {F1,       F2,        F1 ⊕ F4,        F2 ⊕ F4}
S2 = {F3,       F3 ⊕ F4,   F1 ⊕ F2 ⊕ F3,   F1 ⊕ F2 ⊕ F3 ⊕ F4}
S3 = {F1 ⊕ F3,  F2 ⊕ F3,   F2 ⊕ F3 ⊕ F4,   F1 ⊕ F3 ⊕ F4}
Since F4 does not depend on x3, φ1 cannot be detected at F4. This yields Δ(F4, φ1) =
0. Because F1 and F2 are simultaneously erroneous, φ1 cannot be detected at F1 ⊕ F2.
Therefore we have Δ(F1 ⊕ F2, φ1) = 0. It is easily verified that φ1 cannot be detected at
F1 ⊕ F2 ⊕ F4. Together with the constant output 0 these three outputs (or combinations
of outputs) form the subgroup S0.
Now consider the output F1. Since F1 does not belong to S0 it belongs to one of the cosets
of S0, which we denote S1. Because F2 is dependent on F1 with respect to φ1, F2 belongs to
the same coset as F1. Since φ1 cannot be detected at F4 we have Δ(F1 ⊕ F4, φ1) = Δ(F1, φ1)
and Δ(F2 ⊕ F4, φ1) = Δ(F2, φ1). Therefore, F1 ⊕ F4 and F2 ⊕ F4 belong to S1 too.
In the same way the other cosets S2 and S3 can be determined by choosing an output (or
a linear combination of outputs) that does not belong to any coset Si determined so far,
and XORing this output with all outputs of S0. □
It is, however, not always possible to say to which coset an output (or a linear combination
of outputs) belongs. If we had Δ(F4, φ1) ≠ 0 then it would not be possible to
say whether F1 ⊕ F2 and F2 ⊕ F3 ⊕ F4 are dependent or not. In those cases we have
to compute the probabilities Pdetect in the same way as for a single circuit output, for
example through fault simulation. But with each computation of Pdetect for an output (or
a combination of outputs) we compute this probability for an entire coset of equivalent
outputs.
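The coset structure can be exploited programmatically: linear combinations of outputs are grouped by their error function (equivalently, by their satisfying set), and Pdetect then needs to be evaluated only once per class. A small sketch of this grouping (the circuit data is a hypothetical three-output example, not Fig. 2):

```python
def cosets_by_error_function(sat_sets):
    """Partition all 2^n linear combinations of n outputs into classes with
    identical error function (identical satisfying set). The class of the
    empty satisfying set is the subgroup S0; each other class is a coset of
    S0, since two combinations with equal error function differ by an
    element of S0."""
    n = len(sat_sets)
    classes = {}
    for mask in range(2 ** n):          # mask encodes a subset H of outputs
        X = set()
        for k in range(n):
            if mask >> k & 1:
                X ^= sat_sets[k]        # satisfying set of the XOR of H
        classes.setdefault(frozenset(X), []).append(mask)
    return classes

# Three outputs with linearly dependent error functions (as in Example 1):
classes = cosets_by_error_function([{1, 3}, {3, 4}, {1, 4}])
s0 = classes[frozenset()]               # contains 0 and F1^F2^F3 (mask 7)
```

For this data the 8 combinations fall into 4 classes of 2 elements each, so only 4 detection probabilities would have to be computed instead of 8.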
4 Algorithm
In this section we will show how to compute a linear mapping h : Y → Z with dim(Z) =
min such that for each fault φ ∈ Φ an error caused by φ will be detected with a probability
that is at least as high as a given value Pmin, i.e., for each fault the masking probability
of an error caused by this fault will be below a given upper bound. For the computation
of h we derive an algorithm that combines a greedy heuristic with a threshold accepting
algorithm.
We derive the algorithm step by step with the help of Example 2.
Example 2 (cont.) We consider the following faults: φ1; φ2: x4 stuck-at 0; φ3: x1
stuck-at 0; φ4: F3 stuck-at 1; and the double fault φ5: x1 stuck-at 0 / f2 stuck-at 1. Our
goal is to find a mapping h such that an error caused by a fault φ ∈ Φ = {φ1, …, φ5} is
detected with probability P ≥ Pmin = 0.8.
As a first step we construct an s × t matrix M1, where s = |Φ| and t = 2^n − 1. The
number i of a row of M1 corresponds to the index of φi ∈ Φ. The number j of a column
of M1 corresponds to the decimal-encoded subset Hj of outputs. If for example n = 4
then j = 10 is the decimal equivalent of (F4, F3, F2, F1) = (1, 0, 1, 0), which indicates that
H10 = {F2, F4}.
The matrix M1 = (mij) is determined by

mij = 1 if Δ(Hj, φi) = ⊕_{Fk ∈ Hj} Δ(Fk, φi) ≠ 0, and mij = 0 otherwise,

which means that mij = 1 if Hj is a set of weakly independent outputs with respect to
φi, and mij = 0 if not.
For the faults φ1, …, φ5 the matrix M1 is shown in Table 3.
      H1  H2  H3  H4  H5  H6  H7  H8  H9  H10  H11  H12  H13  H14  H15
φ1     1   1   0   1   1   1   1   0   1   1    0    1    1    1    1
φ2     0   0   0   1   1   1   1   1   1   1    1    1    1    1    1
φ3     1   1   1   0   1   1   1   1   1   1    1    1    1    1    1
φ4     0   0   0   1   1   1   1   0   0   0    0    1    1    1    1
φ5     1   1   1   1   1   1   1   1   1   1    1    1    1    1    1

Table 3 Matrix M1 for the circuit of Fig. 2.
In [18] it was shown that a linear mapping h with dim(Z) = min is obtained by computing
a minimal column cover of M1. A simple greedy strategy is to take at each step the matrix
column that covers the most rows not yet covered, until a complete cover is found. The
obtained solution guarantees that each fault φi will be detected.
In Table 3 we can see that there are 7 single columns that are already a cover. If we
choose H5, H6, or H12 the space compressor is as small as possible (the compressor only
needs one XOR gate), whereas H15 additionally guarantees the detection of all faults that
influence the parity of an output word.
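This unweighted step is a plain greedy set cover over the columns of a 0/1 matrix. A minimal sketch, run on the matrix of Table 3 (re-entered by hand from the table):

```python
def greedy_column_cover(M):
    """Greedy column cover of a 0/1 matrix: rows are faults, columns are
    output combinations; repeatedly pick the column that covers the most
    still-uncovered rows until every fault is covered."""
    n_rows, n_cols = len(M), len(M[0])
    uncovered, cover = set(range(n_rows)), []
    while uncovered:
        best = max(range(n_cols),
                   key=lambda j: sum(M[i][j] for i in uncovered))
        cover.append(best)
        uncovered = {i for i in uncovered if not M[i][best]}
    return cover

# Rows phi1..phi5, columns H1..H15 of Table 3:
M1 = [[1,1,0,1,1,1,1,0,1,1,0,1,1,1,1],
      [0,0,0,1,1,1,1,1,1,1,1,1,1,1,1],
      [1,1,1,0,1,1,1,1,1,1,1,1,1,1,1],
      [0,0,0,1,1,1,1,0,0,0,0,1,1,1,1],
      [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]]
cover = greedy_column_cover(M1)   # a single all-ones column suffices
```

Since seven columns are all-ones, the greedy step terminates after one pick.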
For our purposes we have to modify this algorithm slightly. First we replace each element
in the matrix M1 by the respective probability Pdetect(Hj, φk). The new matrix M2 is
shown in Table 4.
For each column mj of M2 we can compute a column weight bj = Σ_{i=1}^{|Φ|} mij that
characterizes the fault detection quality of the respective linear combination of outputs. Since
we don't consider an error detection probability Pdetect > Pmin to be better than Pmin, we
replace each value mij in M2 that is greater than Pmin by Pmin and obtain the matrix M3.
      H1    H2    H3    H4    H5    H6    H7    H8    H9    H10   H11   H12   H13   H14   H15
φ1   0.66  0.66  0.00  0.66  0.66  0.66  0.66  0.00  0.66  0.66  0.00  0.66  0.66  0.66  0.66
φ2   0.00  0.00  0.00  0.86  0.86  0.86  0.86  0.29  0.29  0.29  0.29  0.86  0.86  0.86  0.86
φ3   0.25  0.75  1.00  0.00  0.25  0.75  1.00  0.75  0.75  0.25  0.25  0.75  0.75  0.25  0.25
φ4   0.00  0.00  0.00  1.00  1.00  1.00  1.00  0.00  0.00  0.00  0.00  1.00  1.00  1.00  1.00
φ5   0.14  0.43  0.57  0.43  0.57  0.43  0.57  0.43  0.43  0.86  0.86  0.57  0.57  0.57  0.57

Table 4 Matrix M2.
The reason why we do this is the following. Consider the matrix M in Table 5 and let
Pmin = 0.5. In M the column weight bi is greater than bj and leads to the conclusion that
Hi has a better minimum error detection probability than Hj. However, with Hi only fault
φ2 will be detected with the required probability Pmin = 0.5, whereas Hj achieves Pmin
for both faults. Therefore the normalization to the value Pmin avoids these misleading
conclusions. Table 6 shows that Hj has a better minimum error detection probability
than Hi with respect to Pmin = 0.5. As we can see, this statement depends on the value
of Pmin. For Pmin = 0.9, the correct conclusion would again be bi > bj.
      Hi   Hj
φ1   0.4  0.6
φ2   0.9  0.6
b    1.3  1.2

Table 5 Matrix M.

      Hi   Hj
φ1   0.4  0.5
φ2   0.5  0.5
b    0.9  1.0

Table 6 Matrix M'.
Table 7 shows the matrix M3 which corresponds to the normalization of M2 with respect
to Pmin = 0.8, together with the respective column weights b. A matrix element mij of
M3 is determined by

mij = Pdetect(Hj, φi) if Pdetect(Hj, φi) < Pmin, and mij = Pmin otherwise.

The normalized values correspond to the shaded fields in Table 7.
The next step of the algorithm consists of choosing the matrix column with the largest
value b. In matrix M3 this column is column 7 with b7 = 3.63. Therefore the first XOR-tree
involves the set H7. Errors caused by the faults φ2, φ3, and φ4 are already detected
with the required probability Pmin = 0.8. For the faults φ1 and φ5 the required minimum
error detection probability is not yet achieved by H7. Therefore we have to choose the
next column i with the best value bi.
      H1    H2    H3    H4    H5    H6    H7    H8    H9    H10   H11   H12   H13   H14   H15
φ1   0.66  0.66  0.00  0.66  0.66  0.66  0.66  0.00  0.66  0.66  0.00  0.66  0.66  0.66  0.66
φ2   0.00  0.00  0.00  0.80  0.80  0.80  0.80  0.29  0.29  0.29  0.29  0.80  0.80  0.80  0.80
φ3   0.25  0.75  0.80  0.00  0.25  0.75  0.80  0.75  0.75  0.25  0.25  0.75  0.75  0.25  0.25
φ4   0.00  0.00  0.00  0.80  0.80  0.80  0.80  0.00  0.00  0.00  0.00  0.80  0.80  0.80  0.80
φ5   0.14  0.43  0.57  0.43  0.57  0.43  0.57  0.43  0.43  0.80  0.80  0.57  0.57  0.57  0.57
b    1.05  1.84  1.37  2.69  3.08  3.44  3.63  1.47  2.13  2.00  1.34  3.58  3.58  3.08  3.08

Table 7 Matrix M3 (Pmin = 0.8).

      H1    H2    H3    H4    H5    H6    H7    H8    H9    H10   H11   H12   H13   H14   H15
φ1   0.80  0.80  0.66  0.66  0.80  0.80  0.66  0.66  0.80  0.80  0.66  0.66  0.80  0.80  0.66
φ2   0.80  0.80  0.80  0.80  0.80  0.80  0.80  0.80  0.80  0.80  0.80  0.80  0.80  0.80  0.80
φ3   0.80  0.80  0.80  0.80  0.80  0.80  0.80  0.80  0.80  0.80  0.80  0.80  0.80  0.80  0.80
φ4   0.80  0.80  0.80  0.80  0.80  0.80  0.80  0.80  0.80  0.80  0.80  0.80  0.80  0.80  0.80
φ5   0.57  0.79  0.79  0.79  0.79  0.57  0.57  0.79  0.79  0.80  0.80  0.80  0.80  0.79  0.79
b    3.77  3.99  3.85  3.85  3.99  3.77  3.63  3.85  3.99  4.00  3.86  3.86  4.00  3.99  3.85

Table 8 Matrix M4 (values mij for {H7, Hi}).
Due to the fact that H7 now belongs to the column cover, we have to recompute the
matrix values and column weights. The error detection probability for φk with Hj is now
determined by Pdetect(H7, Hj; φk). After normalization, matrix M4 (Table 8) is obtained.
The column weight bi = 4.0 = Pmin · |Φ| indicates that C = {H7, Hi} is a minimum
column cover such that each fault of Φ is detected with a probability P ≥ Pmin.
As a result, the smallest possible dimension of Z such that an error caused by a fault
φ ∈ Φ is detected with probability P ≥ Pmin = 0.8 is dim(Z) = 2. (For both solutions
it turns out that in fact for each fault φi an error detection probability Pdetect = 1.00
has been achieved.) The algorithm for the computation of a mapping h : Y → Z with
dim(Z) = min is summarized below.
Procedure 1 (MIN_COVER)
  C := ∅;
  d := 0;
  P := Pmin · |Φ|;
  repeat
    d := d + 1;
    i := BEST_COLUMN(C);
    C := C ∪ {Hi};
  until bi = P.
After termination of Procedure 1, the value d is the dimension of Z, and C defines a
mapping h : Y → Z with dim(Z) = d.
In Procedure 1, BEST_COLUMN(C) computes the matrix column i with the largest
value bi with respect to the partial column cover computed so far, where

bi = Σ_{k=1}^{|Φ|} N_Pmin(Pdetect(C, Hi; φk))   (8)

and the function N_Pmin computes the normalized value with respect to Pmin.
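Procedure 1 together with the normalized weight (8) can be sketched end to end when each fault is described by the satisfying sets of the circuit outputs. In this sketch BEST_COLUMN scans all candidate trees exhaustively rather than using threshold accepting, and the data at the bottom is a hypothetical one-fault, three-output circuit, not the paper's example:

```python
def p_detect(trees, fault_sats):
    """Probability that an error from the fault is detected by at least one
    of the chosen XOR-trees; a tree is a tuple of output indices, and its
    satisfying set is the symmetric difference of the outputs' sets."""
    universe = set().union(*fault_sats)
    caught = set()
    for tree in trees:
        X = set()
        for k in tree:
            X ^= fault_sats[k]
        caught |= X
    return len(caught) / len(universe) if universe else 1.0

def min_cover(all_trees, faults, p_min):
    """Procedure 1 (MIN_COVER): greedily add the XOR-tree maximizing the
    normalized column weight b of (8) until b = p_min * |faults|."""
    cover, target = [], p_min * len(faults)
    while True:
        best = max(all_trees, key=lambda t: sum(
            min(p_detect(cover + [t], fs), p_min) for fs in faults))
        cover.append(best)
        b = sum(min(p_detect(cover, fs), p_min) for fs in faults)
        if b >= target - 1e-12:
            return cover

# One fault with linearly dependent error functions on three outputs:
faults = [[{1, 2}, {2, 3}, {1, 3}]]
trees = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]
cover = min_cover(trees, faults, 1.0)   # one tree is not enough here
```

For this data no single XOR-tree detects every error (the full parity tree detects none), so the greedy loop runs twice.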
Since the number of columns of a matrix M grows exponentially with the number of
primary circuit outputs, an exhaustive enumeration of all matrix columns, i.e., linear
combinations of circuit outputs, and the computation of the respective weights bi is not
feasible for circuits with a large number of outputs.
The computation of a matrix column i with the largest weight bi is equivalent to the
problem of determining a point Hi with the maximum value bi for the objective function
in a discrete search space with 2^n elements. This search space is characterized by two
important properties:
- each element Hi is a possible (not necessarily optimal) solution;
- the value bj of a neighbor Hj of Hi is not much different from bi.
Definition 8 A neighbor of a set Hi of circuit outputs is a set Hj with d(I, J) = 1, where
d denotes the Hamming distance, and I and J are the binary representations of i and j,
respectively.
In other words, a neighbor of Hi is obtained by adding/omitting one output to/from
Hi. The problem to be solved is a classical combinatorial optimization problem which
is known to be NP-complete. For solving such problems efficient algorithms are known,
for example simulated annealing [9], genetic algorithms [7], or threshold accepting [3].
For finding the best matrix column we use an algorithm that is derived from a threshold
accepting algorithm. This algorithm is listed below.
Procedure 2 (BEST_COLUMN(C))
  choose a set Hi (initial set);
  choose an initial threshold T > 0;
  repeat
    repeat
      choose a neighbor Hj of Hi;
      if bi − bj < T then Hi := Hj;
    until bi has not increased for a long time
          or too many iterations have been performed;
    T := T · t, (0 < t < 1);
  until bi no longer changes.
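Procedure 2 can be sketched as a standard threshold-accepting search over bitmasks, where a neighbor flips a single output in or out of the candidate XOR-tree. The objective below is a toy stand-in for the column weight bi, and all parameter values are illustrative, not taken from the paper:

```python
import random

def best_column(objective, n, T0=2.0, t=0.8, inner_iters=200, seed=0):
    """Threshold accepting (cf. Procedure 2): accept a neighbor unless it is
    worse than the current solution by at least the threshold T, then lower
    T geometrically until it is negligible."""
    rng = random.Random(seed)
    h = rng.randrange(1, 2 ** n)                 # random nonempty initial subset
    b, T = objective(h), T0
    while T > 1e-3:
        for _ in range(inner_iters):
            neighbor = h ^ (1 << rng.randrange(n))   # flip one output bit
            nb = objective(neighbor)
            if b - nb < T:                       # accept small deteriorations too
                h, b = neighbor, nb
        T *= t
    return h, b

# Toy objective: the weight of a subset is simply its number of outputs,
# so the optimum is the set of all n outputs.
h, b = best_column(lambda m: bin(m).count("1"), n=8)
```

Early on, large T lets the search escape local optima by accepting slightly worse neighbors; as T shrinks the search degenerates into hill climbing and settles on an optimum.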
5 Experimental Results
In this section we present the results for the OSC using the algorithm developed in Section 4.
Experiments were performed on the ISCAS-85 benchmark circuits [2] to study the
effectiveness of the algorithm for varying values of Pmin and different fault set sizes.
The computation of a matrix column corresponding to an actually chosen set of XOR-trees
was done in the following way. First, the original circuit was fault simulated for
the given set of faults and a given number of random patterns. The simulation was
performed with the fault simulator COMSIM [11]. The simulation yields a list that
indicates how often each fault was detected. Then the netlist of the circuit was modified
according to the actual choice of the XOR-trees, i.e., the actual OSC function, and the
obtained circuit was resimulated. Then for each fault the frequency of detection after
modification of the circuit was divided by the respective value before modification, i.e.,
the fault detection frequency of the original circuit. After normalization with respect to
Pmin these values constituted the desired matrix column, and the respective value of bi
was computed according to (8).
For each circuit we considered a set of 100 randomly chosen single stuck-at faults.
Table 9 shows the results of the OSC algorithm for the benchmark circuits for different
values of Pmin. These values vary from Pmin > 0 (for each fault an error is detected with
probability Pdetect > 0) to Pmin = 1.00. Since for circuits with a large number of primary
inputs it is not feasible to simulate the circuit for every input pattern, we have to choose a
sample of random patterns. Therefore, the values Pdetect obtained through the simulation
are in reality estimates P̂detect. However, the results that would be obtained with the
exact values of Pdetect are not expected to differ much from the results obtained
with the estimates P̂detect. In the experiments, for each circuit we used sets of random
patterns of uniform size 10^5.
                     # XOR-trees for Pmin =
circuit  #PO   >0   0.5  0.6  0.7  0.8  0.9  0.95  0.99  1.0
c432       7    1    2    2    2    3    5    5     6     7
c499      32    2    2    2    2    3    3    4     5     6
c880      26    2    2    2    2    2    3    4     5     8
c1355     32    1    1    1    1    2    2    3     4     5
c1908     25    1    1    1    1    2    3    3     3     4
c2670    140    2    2    2    2    3    3    4     5     5
c3540     22    1    1    2    2    3    5    5     7    14
c5315    123    1    1    2    2    3    4    5     5     6
c6288     32    1    2    2    2    3    4    5     6     9
c7552    108    2    2    2    2    2    3    3     4     5

Table 9 OSC results for ISCAS-85 circuits.
Fig. 3 illustrates the working mechanism of the proposed algorithm. The algorithm
first tries to find the XOR-tree that yields the highest error detection probability. If the
found XOR-tree does not guarantee the required probability Pmin, the algorithm fixes this
XOR-tree and tries to find a second XOR-tree which together with the first one leads
to the highest error detection probability. This process is continued until the probability
Pmin is achieved. The search for the optimal XOR-tree is finished if either the value of
Pdetect does not change anymore or the threshold value is below a given small value. Fig. 3
illustrates the algorithm for the circuit c432 and Pmin = 0.95. The plotted values of Pdetect
correspond to the temporary values obtained when exiting the inner loop in Procedure 2.
It is interesting to observe that the overall error detection probability grows with increasing
values of Pmin. This effect can be seen in Fig. 4. The diagram shows that the number
of faults for which errors are detected with probability P is in general higher for higher
values of Pmin. Large improvements are achieved if the number of XOR-trees increases
in order to achieve the required value Pmin for the fault set Φ. The diagram shows the
overall error detection probability for the circuit c1355 for varying values of Pmin. Another
interesting fact is that for Pmin = 1.0 (5 XOR-trees) there are 1438 faults (of 1574
nonequivalent single stuck-at faults in c1355) for which each error is detected with the set
of 10^5 random patterns, although the compression function was only computed to detect
17
Pdetect
0.95
0.80
0.60
error detection probability
0.40
0
100
200
0.20
threshold value
0.00
XOR-tree 1 XOR-tree 2
XOR-tree 3
XOR-tree 4
Figure 3 Working mechanism of the algorithm.
XOR-tree 5
each error caused by a set of only 100 faults. The errors of 98% of all (nonequivalent)
single stuck-at faults are still detected with probability P 0:95.
In a second series of experiments we studied the influence of the size of the fault set on
the dimension of the space Z. The fault set sizes vary from 100 faults to a complete fault
set (the set of all single stuck-at faults). The experiments were again performed for values
of Pmin from Pmin > 0 to Pmin = 1.0. Table 10 shows the results for the circuit c1355.
These examples show that high output space compression ratios can be achieved even for
very high values of Pmin and large fault sets.
              # XOR-trees for Pmin =
#faults   > 0  0.5  0.6  0.7  0.8  0.9  0.95  0.99  1.0
 100       1    1    1    1    2    2     3     4    5
 500       1    1    1    1    2    3     4     6    6
1000       1    2    2    2    3    3     4     6    6
1300       2    3    3    3    4    4     5     6    6
1574       2    3    3    3    3    4     4     6    7

Table 10 Number of XOR-trees for different fault set sizes (circuit c1355).
[Plot omitted: number of faults (0 to 1600) versus detection probability (0.0 to 1.0) for circuit c1355, one curve per value of Pmin from Pmin > 0 to Pmin = 1.0.]
Figure 4 Error detection for all single stuck-at faults.
6 Conclusions
In this paper we presented a method for designing concurrent checkers based on a linear
compression of the output space of the circuit. The checker design is tailored to a given
set of target faults: errors caused by these faults have to be detected with a probability
that is equal to or above a given bound. To keep the complexity of the checker, especially
that of the prediction circuit, low, the main objective of the method is the minimization of
the number of outputs of the space compressor, i.e., the maximization of the compression
ratio.
In order to derive the design algorithm we first discussed the computation of the masking
probability of errors caused by a single fault under a given compression function. Second,
we showed that the problem of minimizing the number of outputs of the compression
function can be mapped to a set covering problem. For the solution of this problem we
used a simple greedy strategy, combined with a threshold accepting algorithm.
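The threshold accepting heuristic of Dueck and Scheuer [3] can be sketched generically as below; the cost landscape and neighborhood used here are toy stand-ins, not the paper's set covering formulation. Unlike pure greedy descent, a move that worsens the cost is accepted as long as the deterioration stays within the current threshold, which lets the search escape local minima before the threshold schedule reaches zero.

```python
def threshold_accepting(cost, neighbor, start, thresholds, steps=8):
    """Generic threshold accepting: accept a neighboring solution whenever
    it is at most `th` worse than the current one; `thresholds` decreases."""
    current = best = start
    for th in thresholds:                  # decreasing threshold schedule
        for _ in range(steps):
            cand = neighbor(current)
            if cost(cand) <= cost(current) + th:
                current = cand             # accepted (possibly slightly worse)
                if cost(current) < cost(best):
                    best = current         # track the best solution seen
    return best

# Toy landscape: x = 1 is a local minimum; the threshold 2.0 lets the
# search climb over cost 4 at x = 2 and reach the global minimum x = 3.
c = [3, 2, 4, 0]
best = threshold_accepting(lambda x: c[x], lambda x: (x + 1) % 4, start=1,
                           thresholds=[2.0, 0.0])
```

With thresholds=[0.0] alone the same search stays trapped at the local minimum x = 1.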
The effectiveness of the algorithm was studied on the ISCAS-85 benchmark circuits for
varying minimum error detection probabilities and different fault set sizes. The results
show that high compression ratios can be achieved, even for high minimum error detection
probabilities and large fault sets.
The algorithm can easily be modified if different faults are required to be detected with
different minimum error detection probabilities.
References
[1] D.A. Anderson and G. Metze: Design of Totally Self-Checking Check Circuits for
m-Out-of-n Codes, IEEE Transactions on Computers, vol. C-22, no. 3, March 1973,
pp. 263-269.
[2] F. Brglez and H. Fujiwara: A Neutral Netlist of 10 Combinational Benchmark Circuits and a Target Translator in Fortran, Proc. IEEE Symp. on Circuits and Systems,
June 1985, pp. 663-698.
[3] G. Dueck and T. Scheuer: Threshold Accepting: A General Purpose Optimization
Algorithm Appearing Superior to Simulated Annealing, Journal of Computational
Physics, vol. 90, 1990, pp. 161-175.
[4] E. Fujiwara, N. Mutoh, and K. Matsuoka: A Self-Testing Group-Parity Prediction
Checker and its Use for Built-In Testing, IEEE Transactions on Computers, vol.
C-33, no. 6, June 1984, pp. 578-583.
[5] E. Fujiwara and K. Matsuoka: A Self-Checking Generalized Prediction Checker and
Its Use for Built-In Testing, IEEE Transactions on Computers, vol. 36, no. 1, Jan.
1987, pp. 86-93.
[6] M. Goessel and S. Graf: Error Detection Circuits, McGraw-Hill, London, 1993.
[7] D.E. Goldberg: Genetic Algorithms in Search, Optimization, and Machine Learning,
Addison-Wesley, Reading, MA, 1989.
[8] J. Khakbaz and E.J. McCluskey: Self-Testing Embedded Parity Checkers, IEEE
Transactions on Computers, vol. C-33, no. 8, Aug. 1984, pp. 753-756.
[9] P.J.M. van Laarhoven and E.H.L. Aarts: Simulated Annealing: Theory and Applications, D. Reidel Publishing Company, Dordrecht, 1987.
[10] J.-C. Lo and E. Fujiwara: A Probabilistic Measurement for Totally Self-Checking
Circuits, Proc. IEEE Int. Workshop on Defect and Fault Tolerance in VLSI Systems,
Venice, Oct. 1993, pp. 263-270.
[11] U. Mahlstedt and J. Alt: Simulation of Non-Classical Faults on the Gate Level - The
Fault Simulator COMSIM, Proc. IEEE Int. Test Conference, Baltimore, Oct. 1993,
pp. 883-892.
[12] M. Nicolaidis and B. Courtois: Strongly Code Disjoint Checkers, IEEE Transactions
on Computers, vol. 37, no. 6, June 1988, pp. 751-756.
[13] J. Ortega, A. Prieto, A. Lloris, and F.J. Pelayo: Generalized Hopfield Neural Network
for Concurrent Testing, IEEE Transactions on Computers, vol. 42, no. 8, August
1993, pp. 898-912.
[14] W.W. Peterson and E.J. Weldon: Error Correcting Codes, MIT Press, 1981.
[15] S.C. Seth and K.L. Kodandapani: Diagnosis of Faults in Linear Tree Networks, IEEE
Transactions on Computers, vol. C-26, no. 1, Jan. 1977, pp. 29-33.
[16] J.E. Smith and G. Metze: Strongly Fault Secure Logic Networks, IEEE Transactions
on Computers, vol. C-27, no. 6, June 1978, pp. 491-499.
[17] E.S. Sogomonyan and M. Goessel: Design of Self-Testing and On-Line Fault Detection Combinational Circuits with Weakly Independent Outputs, Journal of Electronic
Testing, vol. 4, pp. 267-281, 1993.
[18] S. Tarnick: Fault-Dependent Output Space Compaction for Concurrent Checking and
BIST, Max Planck Society Technical Report MPI-I-93-601, May 1993.
[19] J.F. Wakerly: Partially Self-Checking Circuits and Their Use in Performing Logical
Operations, IEEE Transactions on Computers, vol. C-23, no. 7, July 1974, pp. 658-666.