
Bounding Error Masking in Linear
Output Space Compression Schemes
Steffen Tarnick
Max Planck Society
Fault-Tolerant Computing Group
at the University of Potsdam, Germany
Abstract
Based on the principle of linear output space compression, we present a design method for concurrent checkers such that the masking probability of errors caused by the faults of a given fault set stays below a given bound, while the space compression ratio, defined as the ratio of the number of circuit outputs to the number of outputs of the space compressor, is kept as high as possible. Experiments performed on the ISCAS-85 benchmark circuits show that the compression ratios achieved with compression functions computed by this method can be very high, even for very low bounds on the error masking probability and for large fault sets.
1 Introduction
Output space compression (OSC) [3, 4, 6] is a simple method to design self-testing or partially self-checking [7] circuits. The basic principle of OSC is shown in Fig. 1. A space compressor SC maps the n-dimensional output space of the circuit under check (CUC) to a k-dimensional space, k < n. The predictor P is designed to generate the same output responses as the compressor. The responses of the compressor and the predictor are then compared by a comparator C, which indicates an error if they differ. CUC and predictor together form a new circuit CUC', whose outputs are encoded in a systematic code. The compressor and the comparator together form a new concurrent checker CC, which is a conventional systematic code checker.
A special kind of OSC is linear OSC, in which the compressor consists simply of XOR-trees. In a linear OSC scheme the output responses of CUC' are therefore encoded as words of a linear code.
In general, concurrent checking schemes are designed to detect certain types of errors (e.g. single, double, or unidirectional errors) or all or a high percentage of single stuck-at-0/1 faults. In [6] a linear OSC scheme was presented that is tailored to a predefined set of arbitrary faults. This scheme guarantees that each fault of the given set will be detected, i.e., at least one error caused by each fault will be propagated to the outputs of the compressor. However, this scheme does not make it possible to evaluate the quality of the error detection. It is therefore useful to introduce a quality measure for error detection.
As a measure of error detection quality we consider the minimum error detection probability with respect to a fault set, defined as the minimum, over all faults of the set, of the ratio of the number of errors caused by the fault that are detected by the checker to the total number of errors caused by that fault.
In this paper we derive a method for computing the compression function of the OSC scheme so that a given minimum error detection probability is achieved for a given fault set, while the compression ratio is kept as high as possible, i.e., the number of outputs (XOR-trees) of the compressor is kept as low as possible.
First we show how to compute the masking probability of errors caused by a given fault with respect to a fixed compression function. In the second part of the paper we derive a procedure that computes a compression function with a desired minimum error detection probability for a set of faults. This problem can be mapped to a set covering problem, which we solve with a simple greedy strategy combined with a threshold accepting algorithm [2].
The derived algorithm is evaluated on the ISCAS-85 benchmark circuits [1]. Experimental results show the effectiveness of the developed procedure in terms of high output space compression ratios.
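Fig. 1. General OSC Scheme. (Figure: the m inputs x drive the CUC, whose n outputs y feed the space compressor SC, and the predictor P; CUC and P form CUC'. The k outputs z of SC and the k outputs z' of P enter the comparator C, which produces the error indication; SC and C form the checker CC.)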
2 Definitions and Problem Statement
Let $F: X \to Y$, $X \subseteq B^m$, $Y \subseteq B^n$, $B = \{0, 1\}$, be a Boolean function realised by the CUC, and $F_i: X \to B$ the function realised at output $i$ of the circuit. Let $\Phi$ denote a set of physical faults in the circuit. For our purposes we consider a functional fault model, i.e., each fault $\varphi \in \Phi$ is represented as a faulty circuit function, denoted by $F(x, \varphi)$. The fault-free circuit function is denoted by $F(x, \emptyset)$.
Definition 1 The space compression function is a Boolean function $h: Y \to Z$ that maps the circuit output space $Y \subseteq B^n$ to the space $Z \subseteq B^k$, $k < n$.
A linear space compression function can be expressed by an $n \times k$ matrix $C = (c_{ij})$, $c_{ij} \in B$, which is a submatrix of the generator matrix of the linear code. Therefore, the space compressor SC is fully defined by the matrix $C$, and we have $z = C^T y$, $z \in Z$, $y \in Y$.
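The following Python sketch illustrates Definition 1 under stated assumptions: `compress` is a hypothetical helper, and the 4 × 2 matrix is an invented example rather than one from this paper. Each column of $C$ selects the outputs feeding one XOR-tree, so $z = C^T y$ is evaluated modulo 2.

```python
from typing import List


def compress(C: List[List[int]], y: List[int]) -> List[int]:
    """Compute z = C^T y over GF(2) for an n x k 0/1 matrix C and output vector y."""
    n, k = len(C), len(C[0])
    if len(y) != n:
        raise ValueError("y must have one entry per row of C")
    # z_j is the parity of the outputs y_i with c_ij = 1, i.e. the value of XOR-tree j
    return [sum(C[i][j] & y[i] for i in range(n)) % 2 for j in range(k)]


# Hypothetical compressor for n = 4 outputs and k = 2 XOR-trees:
# tree 1 = F1 xor F2 xor F3, tree 2 = F3 xor F4
C = [[1, 0],
     [1, 0],
     [1, 1],
     [0, 1]]
print(compress(C, [1, 0, 0, 1]))  # [1, 1]
```

Because the map is linear, an error vector $e$ with $C^T e = 0$, such as $e = (1, 1, 0, 0)$ for this matrix, leaves $z$ unchanged; Section 3 calls this masking.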
The dimension $\dim(Z)$ of the space $Z$ corresponds to the number of outputs of the space compressor SC, and therefore to the number of outputs of the predictor P. Under the hypothesis that the complexity of the predictor function and the area required by the predictor P (and by the whole scheme) depend mainly on the number of outputs of P, the goal is to minimize the dimension of $Z$.
Problem Statement: Given a combinational circuit CUC and a fault set $\Phi$, find a linear mapping $h: Y \to Z$ with minimal $\dim(Z)$ such that, for every fault $\varphi \in \Phi$, the masking probability of an error caused by $\varphi$ does not exceed a given bound $1 - P_{\min}$, i.e., the error is detected with a probability $P \geq P_{\min}$.
Definition 2 [6] Two circuit outputs $F_i$ and $F_j$ are called weakly independent with respect to a fault $\varphi$ if there is an input $x$ such that
$$F_i(x, \varphi) \oplus F_j(x, \varphi) \neq F_i(x, \emptyset) \oplus F_j(x, \emptyset).$$
If two circuit outputs are not weakly independent with respect to $\varphi$, they are said to be dependent with respect to $\varphi$.
This definition can easily be extended to a set of circuit outputs [6]. If a set of outputs is weakly independent with respect to a fault $\varphi$, then there is at least one input $x \in X$ such that the fault causes an error in the parity of this set of outputs.
Definition 3 Let $F_i$ be a circuit output and $\varphi$ a fault in the circuit. The error function of $F_i$ with respect to $\varphi$ is defined as
$$\delta(F_i, \varphi) = F_i(x, \varphi) \oplus F_i(x, \emptyset).$$
As can easily be seen, a set $H$ of circuit outputs is weakly independent with respect to a fault $\varphi$ if $\bigoplus_{F_k \in H} \delta(F_k, \varphi) \neq 0$.
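One convenient way to experiment with Definitions 2 and 3 (a sketch, not the paper's implementation) is to store each error function $\delta(F_i, \varphi)$ as an integer bitmask over the input patterns: bit $x$ is set when the fault flips $F_i$ at input $x$. XORing masks then gives the error function of an XOR-tree, and a set $H$ is weakly independent exactly when that XOR is nonzero. The masks and the helper names `tree_error` and `weakly_independent` are hypothetical.

```python
from functools import reduce
from typing import Dict, Iterable


def tree_error(masks: Dict[str, int], H: Iterable[str]) -> int:
    """Error function of the XOR of the outputs in H, as a bitmask over inputs."""
    return reduce(lambda acc, name: acc ^ masks[name], H, 0)


def weakly_independent(masks: Dict[str, int], H: Iterable[str]) -> bool:
    # H is weakly independent w.r.t. the fault iff the XOR of its error functions is nonzero
    return tree_error(masks, H) != 0


# Hypothetical error functions of one fault over 8 input patterns (bit x <-> input x)
masks = {"F1": 0b00001111, "F2": 0b00001111, "F3": 0b00110011}
print(weakly_independent(masks, ["F1", "F2"]))  # False: the two errors always cancel
print(weakly_independent(masks, ["F1", "F3"]))  # True
```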
Let $H$ be an additional circuit output defined by XORing the set $H$ of circuit outputs (we use the notation $H$ interchangeably for a set of circuit outputs and for the output obtained by XORing the circuit outputs $F_k \in H$). If a set $H$ of circuit outputs is weakly independent with respect to a fault $\varphi$, then $\varphi$ can be detected at the output $H$. However, weak independence says nothing about the quality of the fault detection. It is therefore sensible to introduce a measure that quantifies the weak independence of circuit outputs.
Definition 4 The satisfying set $X_i$ of $\delta(F_i, \varphi)$ is the set of all inputs $x$ that can detect the fault $\varphi$ at the output $F_i$:
$$X_i = \{x \mid \delta(F_i, \varphi) = 1\}.$$
Definition 5 The degree of dependence of two circuit outputs $F_i$ and $F_j$ with respect to a fault $\varphi$ is defined as
$$\varrho_\varphi(F_i, F_j) = \frac{|X_i \cap X_j|}{|X_i \cup X_j|} = 1 - \frac{|X_i \,\triangle\, X_j|}{|X_i \cup X_j|},$$
where $\triangle$ denotes the symmetric difference of two sets. Two circuit outputs at which a fault $\varphi$ cannot be detected are defined to be dependent: $X_i = X_j = \emptyset \Rightarrow \varrho_\varphi(F_i, F_j) := 1$.
The degree of dependence of a set $H$ of circuit outputs with respect to $\varphi$ is
$$\varrho_\varphi(H) = 1 - \frac{\left|\triangle_{F_k \in H} X_k\right|}{\left|\bigcup_{F_k \in H} X_k\right|}.$$
The degree of dependence is the probability that an error caused by $\varphi$ will not be detected at the output $H$, given that it causes an error at one or more outputs $F_i \in H$ (all inputs being equally likely).
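With satisfying sets stored as input bitmasks (the set bits of a mask form $X_k$), Definition 5 reduces to a few bit operations. This is an illustrative sketch with hypothetical sets; returning 1 for an empty union extends the pairwise convention of Definition 5 to sets of outputs, which is an assumption.

```python
from functools import reduce
from typing import List


def popcount(v: int) -> int:
    return bin(v).count("1")


def dependence(X: List[int]) -> float:
    """Degree of dependence rho(H) for outputs with satisfying sets X (input bitmasks)."""
    union = reduce(lambda a, b: a | b, X, 0)
    if union == 0:
        return 1.0  # fault never visible at these outputs: treated as dependent (assumption)
    sym_diff = reduce(lambda a, b: a ^ b, X, 0)  # inputs at which the XOR-tree H flips
    return 1.0 - popcount(sym_diff) / popcount(union)


Xi, Xj = 0b00001111, 0b00111100  # hypothetical satisfying sets
# For two outputs, 1 - |Xi ^ Xj| / |Xi | Xj| equals |Xi & Xj| / |Xi | Xj| (here 1/3)
print(dependence([Xi, Xj]), popcount(Xi & Xj) / popcount(Xi | Xj))
```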
3 Error Masking
If we perform a linear compression of the output space of the functional circuit, we map the n-dimensional output space $Y$ of the circuit to a k-dimensional space $Z$, $k \leq n$. For $k < n$ there will be erroneous output vectors that are mapped to the same vector in $Z$ as the corresponding correct output vector. The effect that an erroneous circuit response $y' = y \oplus e$ is mapped to the same vector $z \in Z$ as the fault-free response $y$, $h(y') = h(y) = z$, is called masking of the error $e$. Since $h$ is linear, $h(y \oplus e) = h(y) \oplus h(e)$, so the error $e$ is masked exactly when $h(e) = C^T e = 0$. The goal of this section is to calculate the masking probability of an error caused by a given fault $\varphi$ for a given linear mapping $h$.
Definition 6 The probability that a fault $\varphi$ can be detected at the output $F_i$, under the condition that $\varphi$ causes an error, is denoted
$$P_{\mathrm{detect}}(F_i, \varphi) := \frac{|X_i|}{\left|\bigcup_{k=1}^{n} X_k\right|}.$$
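Definition 6 can be evaluated directly from the satisfying sets. The sketch below assumes equally likely input patterns and satisfying sets stored as input bitmasks; the values and the helper name `p_detect` are hypothetical.

```python
from functools import reduce
from typing import List


def popcount(v: int) -> int:
    return bin(v).count("1")


def p_detect(X: List[int], i: int) -> float:
    """P_detect(F_i, phi) = |X_i| / |X_1 u ... u X_n| for satisfying sets X (input bitmasks).
    Assumes equally likely inputs and that the fault causes at least one error."""
    erroneous = reduce(lambda a, b: a | b, X, 0)  # inputs at which some output is wrong
    if erroneous == 0:
        raise ValueError("the fault causes no error on any input")
    return popcount(X[i]) / popcount(erroneous)


X = [0b00001111, 0b00111100, 0b11000000]  # hypothetical satisfying sets of F1, F2, F3
print([p_detect(X, i) for i in range(3)])  # [0.5, 0.5, 0.25]
```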
For the computation of the error masking probability for the mapping $h$ we assume that we know the probability $P_{\mathrm{detect}}(F_i, \varphi)$ for each circuit output $F_i$ and the degree of dependence $\varrho_\varphi(F_i, F_j)$ for each pair of outputs. From these values we have to compute the respective probabilities of the outputs in $Z$.
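We first consider two outputs $F_i$ and $F_j$. The output $F_i \oplus F_j$ has the error detection probability
$$P_{\mathrm{detect}}(F_i \oplus F_j, \varphi) = \frac{|X_i \,\triangle\, X_j|}{\left|\bigcup_{k=1}^{n} X_k\right|}.$$
Using set operations, namely $|X_i \triangle X_j| = |X_i| + |X_j| - 2|X_i \cap X_j|$ and hence $\frac{1 - \varrho_\varphi(F_i, F_j)}{1 + \varrho_\varphi(F_i, F_j)} = \frac{|X_i \triangle X_j|}{|X_i| + |X_j|}$, we obtain
$$P_{\mathrm{detect}}(F_i \oplus F_j, \varphi) = \frac{1 - \varrho_\varphi(F_i, F_j)}{1 + \varrho_\varphi(F_i, F_j)} \left(P_{\mathrm{detect}}(F_i, \varphi) + P_{\mathrm{detect}}(F_j, \varphi)\right).$$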
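In a similar way we can compute the probability $P_{\mathrm{detect}}(F_i, F_j, \varphi)$ that $\varphi$ is detected at one or more of the outputs $F_i$ and $F_j$ (for $\varrho_\varphi(F_i, F_j) < 1$):
$$P_{\mathrm{detect}}(F_i, F_j, \varphi) = \frac{1}{1 - \varrho_\varphi(F_i, F_j)}\, P_{\mathrm{detect}}(F_i \oplus F_j, \varphi).$$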
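The two closed-form expressions above can be checked numerically against direct set counting. This Python sketch, an illustration with randomly generated, hypothetical satisfying sets over 16 input patterns, computes both sides and asserts that they agree; the second formula is only used when $\varrho_\varphi(F_i, F_j) < 1$.

```python
import random


def popcount(v: int) -> int:
    return bin(v).count("1")


def pair_probabilities(Xi: int, Xj: int, U: int):
    """Return ((direct, formula) for P_detect(Fi xor Fj), (direct, formula) for
    P_detect(Fi, Fj)); U is the union of the satisfying sets of all n outputs."""
    n_err = popcount(U)
    p_i, p_j = popcount(Xi) / n_err, popcount(Xj) / n_err
    rho = popcount(Xi & Xj) / popcount(Xi | Xj)      # degree of dependence
    direct_xor = popcount(Xi ^ Xj) / n_err            # XOR-tree flips on the sym. difference
    direct_any = popcount(Xi | Xj) / n_err            # at least one of Fi, Fj is wrong
    formula_xor = (1 - rho) / (1 + rho) * (p_i + p_j)
    formula_any = formula_xor / (1 - rho)             # valid only for rho < 1
    return (direct_xor, formula_xor), (direct_any, formula_any)


rng = random.Random(1)
for _ in range(1000):
    Xi, Xj = rng.getrandbits(16), rng.getrandbits(16)
    if Xi == Xj:
        continue  # rho = 1 (or undefined): the second formula does not apply
    U = Xi | Xj | rng.getrandbits(16)  # other outputs may add further erroneous inputs
    (dx, fx), (da, fa) = pair_probabilities(Xi, Xj, U)
    assert abs(dx - fx) < 1e-12 and abs(da - fa) < 1e-12
print("both formulas agree with the set-based values")
```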
In the same way we can compute the probability $P_{\mathrm{detect}}(H_1, \ldots, H_k, \varphi)$, $H_i = F_{i_1} \oplus \cdots \oplus F_{i_j}$, if we know the respective values of $P_{\mathrm{detect}}$ and $\varrho_\varphi$.
Definition 7 The masking probability of an error caused by a fault $\varphi$ for a given compression function $h$ is given by
$$P_{\mathrm{mask}}(H_1, \ldots, H_k, \varphi) = 1 - P_{\mathrm{detect}}(H_1, \ldots, H_k, \varphi),$$
where $h$ is determined by the XOR-trees $H_i$, $i = 1, \ldots, k$.
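When the full satisfying sets are available, as they would be after exhaustive fault simulation, Definition 7 can be evaluated directly: an erroneous input counts as detected if at least one XOR-tree $H_i$ flips there. The sketch below stores satisfying sets as input bitmasks and encodes each XOR-tree $H_j$ by the decimal index $j$ introduced in Section 4 (bit $i$ selects $F_{i+1}$); both encodings, the data, and the name `p_mask` are illustrative assumptions.

```python
from functools import reduce
from typing import List, Sequence


def popcount(v: int) -> int:
    return bin(v).count("1")


def p_mask(X: List[int], trees: Sequence[int]) -> float:
    """P_mask(H_1, ..., H_k, phi) = 1 - P_detect(H_1, ..., H_k, phi).
    X[i] is the satisfying set of output F_{i+1} as an input bitmask; each tree H is
    an output bitmask whose bit i selects F_{i+1} (the decimal index j of H_j)."""
    erroneous = reduce(lambda a, b: a | b, X, 0)
    detected = 0
    for H in trees:
        tree_err = 0
        for i, Xi in enumerate(X):
            if (H >> i) & 1:
                tree_err ^= Xi      # error function of the XOR-tree H
        detected |= tree_err        # the error is caught if any tree flips
    return 1 - popcount(detected) / popcount(erroneous)


X = [0b00001111, 0b00111100, 0b11000000]   # hypothetical satisfying sets of F1, F2, F3
print(p_mask(X, [0b011]))                  # H3 = F1 xor F2 alone: 0.5
print(p_mask(X, [0b011, 0b100]))           # adding H4 = F3: 0.25
```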
However, in general it is not possible to compute the values of $P_{\mathrm{detect}}$ for every combination of circuit outputs solely from the error detection probabilities of single outputs and the degrees of dependence of each output pair.
It can be shown that all outputs, where linear combinations of outputs are considered as outputs too, form an Abelian group $G$ under XOR. Since the error function is linear, $\delta(F_i \oplus F_j, \varphi) = \delta(F_i, \varphi) \oplus \delta(F_j, \varphi)$, the subset of outputs at which $\varphi$ cannot be detected forms a subgroup $S_0$ of $G$ defined by the fault $\varphi$. The elements of each coset of $S_0$ have the same error function and therefore the same satisfying set. Thus, the outputs of the same coset have the same error detection probability, and the degree of dependence of two outputs $F_i$ and $F_j$ depends only on the cosets they belong to. It is therefore sufficient to know the error detection probability of one element of each coset of $S_0$ and the degree of dependence between representatives of each pair of cosets of $S_0$.
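Example 1 Consider the circuit of Fig. 2. Let $\varphi_1$ be a stuck-at-1 fault at the input $x_3$. The subgroup $S_0$ and the cosets $S_1$, $S_2$, and $S_3$ defined by $\varphi_1$ are given below.
$$\begin{aligned}
S_0 &= \{0,\ F_4,\ F_1 \oplus F_2,\ F_1 \oplus F_2 \oplus F_4\}\\
S_1 &= \{F_1,\ F_2,\ F_1 \oplus F_4,\ F_2 \oplus F_4\}\\
S_2 &= \{F_3,\ F_3 \oplus F_4,\ F_1 \oplus F_2 \oplus F_3,\ F_1 \oplus F_2 \oplus F_3 \oplus F_4\}\\
S_3 &= \{F_1 \oplus F_3,\ F_2 \oplus F_3,\ F_2 \oplus F_3 \oplus F_4,\ F_1 \oplus F_3 \oplus F_4\}
\end{aligned}$$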
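The coset partition of Example 1 can be reproduced mechanically. This Python sketch does not derive $S_0$ from the circuit of Fig. 2, whose netlist is not given here; it takes the generators $F_4$ and $F_1 \oplus F_2$ read off the listed $S_0$ and enumerates the cosets. Each output combination is encoded as a 4-bit mask in which bit $i$ selects $F_{i+1}$, an assumed encoding that matches the decimal indices $H_j$ used in Section 4.

```python
from typing import Iterable, Set


def span(generators: Iterable[int]) -> Set[int]:
    """All XOR combinations of the given output bitmasks (bit i selects F_{i+1})."""
    elems = {0}
    for g in generators:
        elems |= {e ^ g for e in elems}
    return elems


def name(H: int, n: int = 4) -> str:
    terms = [f"F{i + 1}" for i in range(n) if (H >> i) & 1]
    return " xor ".join(terms) if terms else "0"


# S0 of Example 1 is generated by F4 (0b1000) and F1 xor F2 (0b0011)
S0 = span([0b1000, 0b0011])
cosets = {}
for H in range(16):
    rep = min(H ^ s for s in S0)        # canonical representative of the coset H + S0
    cosets.setdefault(rep, []).append(H)
for rep, members in sorted(cosets.items()):
    print(members, [name(H) for H in members])
```

The four printed groups correspond to $S_0$, $S_1$, $S_2$, and $S_3$ above.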
Fig. 2. Example Circuit. (Figure: a circuit with inputs x1, x2, x3, x4, internal lines f1, f2, f3, and outputs F1, F2, F3, F4.)
It is, however, not always possible to say to which coset an output (or a linear combination of outputs) belongs. In those cases we have to compute the probability $P_{\mathrm{detect}}$ in the same way as for a single circuit output, for example through fault simulation. But with each computation of $P_{\mathrm{detect}}$ for an output (or a combination of outputs) we obtain this probability for an entire coset of equivalent outputs.
       H1  H2  H3  H4  H5  H6  H7  H8  H9  H10 H11 H12 H13 H14 H15
φ1     1   1   0   1   1   1   1   0   1   1   0   1   1   1   1
φ2     0   0   0   1   1   1   1   1   1   1   1   1   1   1   1
φ3     1   1   1   0   1   1   1   1   1   1   1   1   1   1   1
φ4     0   0   0   1   1   1   1   0   0   0   0   1   1   1   1
φ5     1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
Table 1. Matrix $M_1$ for the circuit of Fig. 2.
       H1    H2    H3    H4    H5    H6    H7    H8    H9    H10   H11
φ1     0.66  0.66  0.00  0.66  0.66  0.66  0.66  0.00  0.66  0.66  0.00
φ2     0.00  0.00  0.00  0.80  0.80  0.80  0.80  0.29  0.29  0.29  0.29
φ3     0.25  0.75  0.80  0.00  0.25  0.75  0.80  0.75  0.75  0.25  0.25
φ4     0.00  0.00  0.00  0.80  0.80  0.80  0.80  0.00  0.00  0.00  0.00
φ5     0.14  0.43  0.57  0.43  0.57  0.43  0.57  0.43  0.43  0.80  0.80
b      1.05  1.84  1.37  2.69  3.08  3.44  3.63  1.47  2.13  2.00  1.34
Table 2. Matrix $M_2$ ($P_{\min} = 0.8$).
4 Algorithm
In this section we show how to compute a linear mapping $h: Y \to Z$ with minimal $\dim(Z)$ such that for each fault $\varphi \in \Phi$ an error caused by $\varphi$ is detected with a probability $P \geq P_{\min}$. We derive the algorithm step by step with the help of Example 1.
Example 1 (cont.) We consider the following faults: $\varphi_1$: $x_3$ stuck-at 1 (as above), $\varphi_2$: $x_4$ stuck-at 0, $\varphi_3$: $x_1$ stuck-at 0, $\varphi_4$: $F_3$ stuck-at 1, and the double fault $\varphi_5$: $x_1$ stuck-at 0 / $f_2$ stuck-at 1. Our goal is to find a mapping $h$ such that an error caused by a fault $\varphi \in \Phi = \{\varphi_1, \ldots, \varphi_5\}$ is detected with probability $P \geq P_{\min} = 0.8$.
As a first step we construct an $s \times t$ matrix $M_1$, where $s = |\Phi|$ and $t = 2^n - 1$. Row $i$ of $M_1$ corresponds to the fault $\varphi_i \in \Phi$. Column $j$ of $M_1$ corresponds to the decimal-encoded subset $H_j$ of outputs. For example, if $n = 4$, then $j = 10$ is the decimal equivalent of $(F_4, F_3, F_2, F_1) = (1, 0, 1, 0)$, which indicates that $H_{10} = \{F_2, F_4\}$. The matrix $M_1 = (m_{ij})$ is determined by
$$m_{ij} = \begin{cases} 1, & \text{if } \delta(H_j, \varphi_i) = \bigoplus_{F_k \in H_j} \delta(F_k, \varphi_i) \neq 0,\\ 0, & \text{otherwise.} \end{cases}$$
For the set $\Phi$ the matrix $M_1$ is shown in Table 1.
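The construction of $M_1$ can be sketched as follows. Each fault is given by hypothetical error masks over a few input patterns, since the actual error functions of the circuit in Fig. 2 are not listed here; the masks for $\varphi_1$ were chosen only to be consistent with its subgroup $S_0$ from Example 1, and those for $\varphi_4$ only reflect the assumption that a stuck-at fault on the output line $F_3$ corrupts nothing but $F_3$. The helper name `m1_row` is hypothetical.

```python
from typing import List


def m1_row(X: List[int]) -> List[int]:
    """Row of M1 for one fault with per-output error masks X (X[i] for F_{i+1}).
    Entry j-1 is 1 iff delta(H_j, phi) != 0, with bit i of j selecting F_{i+1}."""
    n = len(X)
    row = []
    for j in range(1, 2 ** n):
        err = 0
        for i in range(n):
            if (j >> i) & 1:
                err ^= X[i]          # delta of an XOR-tree is the XOR of the deltas
        row.append(1 if err else 0)
    return row


# Hypothetical masks consistent with S0 of phi_1 (F1, F2 fail together; F4 never fails)
phi1 = [0b0011, 0b0011, 0b0101, 0b0000]
# phi_4 (F3 stuck-at 1) can only corrupt F3; the mask value itself is hypothetical
phi4 = [0b0000, 0b0000, 0b1100, 0b0000]
print(m1_row(phi1))  # [1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1]
print(m1_row(phi4))  # [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1]
```

Both rows agree with the rows for $\varphi_1$ and $\varphi_4$ in Table 1.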
If we only require $P > 0$, i.e., that each fault be detectable at all, a linear mapping $h$ with minimal $\dim(Z)$ is obtained by computing a minimal column cover of $M_1$. A simple greedy strategy is to take at each step the matrix column that covers the most rows not yet covered, until a complete cover is found. Table 1 shows that there are 7 single columns ($H_5$, $H_6$, $H_7$, $H_{12}$, $H_{13}$, $H_{14}$, $H_{15}$) that already form a cover.
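The greedy column-cover strategy can be sketched in Python on the matrix $M_1$ of Table 1. The helper name `greedy_cover` is hypothetical, and ties are broken in favour of the lowest column index, which is an arbitrary choice.

```python
from typing import List

# Matrix M1 of Table 1: rows phi_1..phi_5, columns H_1..H_15
M1 = [
    [1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
]


def greedy_cover(M: List[List[int]]) -> List[int]:
    """Repeatedly pick the column covering the most uncovered rows.
    Returns 1-based column indices j, i.e. the XOR-trees H_j."""
    uncovered = set(range(len(M)))
    chosen = []
    while uncovered:
        best = max(range(len(M[0])), key=lambda j: sum(M[i][j] for i in uncovered))
        gain = {i for i in uncovered if M[i][best]}
        if not gain:
            raise ValueError("some fault cannot be detected by any XOR-tree")
        chosen.append(best + 1)
        uncovered -= gain
    return chosen


full = [j + 1 for j in range(15) if all(row[j] for row in M1)]
print(full)              # [5, 6, 7, 12, 13, 14, 15]
print(greedy_cover(M1))  # [5]
```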
       H12   H13   H14   H15
φ1     0.66  0.66  0.66  0.66
φ2     0.80  0.80  0.80  0.80
φ3     0.75  0.75  0.25  0.25
φ4     0.80  0.80  0.80  0.80
φ5     0.57  0.57  0.57  0.57
b      3.58  3.58  3.08  3.08
Table 2 (continued). Columns H12 to H15 of matrix $M_2$ ($P_{\min} = 0.8$).
For our purposes we have to modify this algorithm slightly. First we replace each matrix element by
$$m_{ij} = \begin{cases} P_{\mathrm{detect}}(H_j, \varphi_i), & \text{if } P_{\mathrm{detect}}(H_j, \varphi_i) < P_{\min},\\ P_{\min}, & \text{otherwise,} \end{cases}$$
and obtain the matrix $M_2$. For each column $m_j$ of $M_2$ we can compute a column weight $b_j = \sum_{i=1}^{|\Phi|} m_{ij}$ that characterizes the fault detection quality of the respective linear combination of outputs. Table 2 shows the matrix $M_2$ together with the column weights $b$.
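The column weights of Table 2 can be recomputed from the matrix entries. This sketch copies the entries of $M_2$ from Table 2 and applies the capping $\min(P_{\mathrm{detect}}, P_{\min})$, which is a no-op here because the table entries are already capped; rounding to two decimals mirrors the table's precision. The name `column_weights` is hypothetical.

```python
P_MIN = 0.8

# Entries of M2 copied from Table 2 (rows phi_1..phi_5, columns H_1..H_15)
M2 = [
    [.66, .66, .00, .66, .66, .66, .66, .00, .66, .66, .00, .66, .66, .66, .66],
    [.00, .00, .00, .80, .80, .80, .80, .29, .29, .29, .29, .80, .80, .80, .80],
    [.25, .75, .80, .00, .25, .75, .80, .75, .75, .25, .25, .75, .75, .25, .25],
    [.00, .00, .00, .80, .80, .80, .80, .00, .00, .00, .00, .80, .80, .80, .80],
    [.14, .43, .57, .43, .57, .43, .57, .43, .43, .80, .80, .57, .57, .57, .57],
]


def column_weights(M, p_min):
    """b_j = sum_i min(m_ij, p_min): detection beyond p_min earns no extra credit."""
    return [round(sum(min(row[j], p_min) for row in M), 2) for j in range(len(M[0]))]


b = column_weights(M2, P_MIN)
best = max(range(len(b)), key=b.__getitem__)
print(f"H{best + 1}", b[best])  # H7 3.63
```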
The next step of the algorithm consists of choosing the matrix column with the largest value $b$. In matrix $M_2$ this is column 7 with $b_7 = 3.63$. Therefore the first XOR-tree involves the set $H_7$. Errors caused by the faults $\varphi_2$, $\varphi_3$, and $\varphi_4$ are already detected with the required probability $P_{\min} = 0.8$, whereas the required error detection probability for $\varphi_1$ and $\varphi_5$ is not yet achieved by $H_7$. Therefore we have to choose the next column $i$ with the best value $b_i$.
Since $H_7$ now belongs to the column cover, we have to recompute the matrix values and column weights. The fault detection probability for $\varphi_k$ with $H_j$ is now determined by $P_{\mathrm{detect}}(H_7, H_j, \varphi_k)$. The new matrix $M_3$ is shown in Table 3.
A column weight $b_i = 4.0 = P_{\min} \cdot |\Phi|$, reached by the columns $H_{10}$ and $H_{13}$, indicates that $C = \{H_7, H_i\}$ is a column cover such that each fault of $\Phi$ is detected with a probability $P \geq P_{\min}$. Since no single column of $M_2$ reaches this weight (the largest is $b_7 = 3.63$), the smallest possible dimension of $Z$ such that an error caused by a fault $\varphi \in \Phi$ is detected with probability $P \geq P_{\min} = 0.8$ is $\dim(Z) = 2$.
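The stopping condition can be checked against Table 3 with a few lines of Python. The sketch copies the entries of $M_3$ from Table 3 and lists the columns whose weight reaches $P_{\min} \cdot |\Phi| = 4.0$; the small tolerance only guards against floating-point rounding.

```python
P_MIN, N_FAULTS = 0.8, 5

# Entries of M3 copied from Table 3: P_detect(H7, H_j, phi_i), capped at P_MIN
M3 = [
    [.80, .80, .66, .66, .80, .80, .66, .66, .80, .80, .66, .66, .80, .80, .66],
    [.80] * 15,
    [.80] * 15,
    [.80] * 15,
    [.57, .79, .79, .79, .79, .57, .57, .79, .79, .80, .80, .80, .80, .79, .79],
]

b = [sum(row[j] for row in M3) for j in range(15)]
target = P_MIN * N_FAULTS                      # b_i = Pmin * |Phi| = 4.0
complete = [j + 1 for j in range(15) if abs(b[j] - target) < 1e-9]
print(complete)  # [10, 13]: {H7, H10} and {H7, H13} both reach Pmin for all faults
```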
The algorithm for computing a mapping $h: Y \to Z$ with minimal $\dim(Z)$ is summarized below.
       H1    H2    H3    H4    H5    H6    H7    H8    H9    H10   H11   H12
φ1     0.80  0.80  0.66  0.66  0.80  0.80  0.66  0.66  0.80  0.80  0.66  0.66
φ2     0.80  0.80  0.80  0.80  0.80  0.80  0.80  0.80  0.80  0.80  0.80  0.80
φ3     0.80  0.80  0.80  0.80  0.80  0.80  0.80  0.80  0.80  0.80  0.80  0.80
φ4     0.80  0.80  0.80  0.80  0.80  0.80  0.80  0.80  0.80  0.80  0.80  0.80
φ5     0.57  0.79  0.79  0.79  0.79  0.57  0.57  0.79  0.79  0.80  0.80  0.80
b      3.77  3.99  3.85  3.85  3.99  3.77  3.63  3.85  3.99  4.00  3.86  3.86
Table 3. Matrix $M_3$ (values $m_{ij}$ for $\{H_7, H_i\}$).
Procedure 1 (MIN_COVER)
  C := ∅; d := 0;
  repeat
    d := d + 1;
    i := BEST_COLUMN(C);
    C := C ∪ {H_i};
  until b_i = P_min · |Φ|;
After termination of Procedure 1, the value $d$ is the dimension of $Z$, and $C$ defines a mapping $h: Y \to Z$ with $\dim(Z) = d$.
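The following Python sketch turns Procedure 1 into runnable code for small circuits. It assumes that each fault is given by its satisfying sets as input bitmasks (as exhaustive fault simulation would provide), computes $P_{\mathrm{detect}}$ of a partial cover directly from those sets, and replaces the heuristic BEST_COLUMN by an exhaustive enumeration of all $2^n - 1$ columns, which is only practical for a handful of outputs. The fault data and helper names are hypothetical.

```python
from typing import List, Tuple


def popcount(v: int) -> int:
    return bin(v).count("1")


def p_detect(X: List[int], trees: List[int]) -> float:
    """Probability that at least one XOR-tree in `trees` detects an error of the fault
    with satisfying sets X (input bitmasks); bit i of a tree selects output F_{i+1}."""
    erroneous, detected = 0, 0
    for Xi in X:
        erroneous |= Xi
    for H in trees:
        tree_err = 0
        for i, Xi in enumerate(X):
            if (H >> i) & 1:
                tree_err ^= Xi
        detected |= tree_err
    return popcount(detected) / popcount(erroneous)


def best_column(faults: List[List[int]], C: List[int], p_min: float) -> Tuple[float, int]:
    """Exhaustive BEST_COLUMN: return (b, H) maximising b = sum_phi min(P_detect, p_min)."""
    n = len(faults[0])
    best = (-1.0, 0)
    for H in range(1, 2 ** n):
        b = sum(min(p_detect(X, C + [H]), p_min) for X in faults)
        if b > best[0]:
            best = (b, H)          # ties keep the smaller column index
    return best


def min_cover(faults: List[List[int]], p_min: float, eps: float = 1e-9) -> List[int]:
    """Procedure 1 (MIN_COVER): add the best column until every fault reaches p_min.
    Returns the XOR-trees C; d = len(C) is dim(Z)."""
    C: List[int] = []
    target = p_min * len(faults)
    while True:
        b, H = best_column(faults, C, p_min)
        C.append(H)
        if b >= target - eps:       # floating-point version of b_i = Pmin * |Phi|
            return C


# Hypothetical faults of a 3-output circuit over 8 input patterns
faults = [
    [0b00001111, 0b00001111, 0b00000000],
    [0b00000000, 0b00110000, 0b11000000],
    [0b00000001, 0b00000010, 0b00000100],
]
print(min_cover(faults, 1.0))  # [6, 1]: H6 = F2 xor F3, then H1 = F1
```

Assuming every fault causes at least one error and $P_{\min} \leq 1$, each iteration strictly increases $b$ until the target is met, and a tree that is a linear combination of already chosen trees adds nothing, so the loop adds at most $n$ XOR-trees.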
In Procedure 1, BEST_COLUMN(C) computes the matrix column $i$ with the largest value $b_i$ with respect to the partial column cover $C$ computed so far. Since the number of columns of the matrix $M$ grows exponentially with the number of primary circuit outputs ($t = 2^n - 1$), an exhaustive enumeration of all matrix columns and the computation of the respective weights $b_i$ is not feasible for circuits with a large number of outputs.
Computing a matrix column $i$ with the largest weight $b_i$ is equivalent to determining a point $H_i$ with the maximum value $b_i$ of the objective function in a discrete search space with $2^n$ elements. This is a classical combinatorial optimization problem for which efficient heuristic search algorithms are known. To find the best matrix column we use an algorithm derived from a threshold accepting algorithm [2], listed below.
Procedure 2 (BEST_COLUMN(C))
  choose a set H_i (initial set);
  choose an initial threshold T > 0;
  repeat
    repeat
      choose a neighbor H_j of H_i;
      if b_i - b_j < T then H_i := H_j;
    until no increase of b_i for a long time
          or too many iterations;
    T := T · t, (0 < t < 1);
  until b_i no longer changes.
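A minimal Python sketch of Procedure 2 follows. The paper does not specify the neighbourhood, the threshold schedule, or the stopping constants, so these are assumptions: a neighbour of $H_i$ toggles one output, $T$ starts at 0.5 and is multiplied by $t = 0.8$, the inner loop stops after 200 attempts without a new best value, and the function returns the best set visited. In the full method the objective would be the column weight $b$ with respect to the current partial cover; the toy objective below is hypothetical.

```python
import random
from typing import Callable


def best_column_ta(weight: Callable[[int], float], n: int, T: float = 0.5,
                   t: float = 0.8, patience: int = 200, max_iter: int = 5000,
                   seed: int = 0) -> int:
    """Threshold-accepting search for a nonzero n-bit output set H with large weight(H).
    Assumed neighbourhood: toggle one output in or out of H."""
    rng = random.Random(seed)
    H = rng.randrange(1, 2 ** n)                 # initial set
    b = weight(H)
    best_H, best_b = H, b
    while True:
        b_start, idle = b, 0
        for _ in range(max_iter):                # inner loop at fixed threshold T
            Hj = H ^ (1 << rng.randrange(n))     # neighbour of H
            if Hj == 0:
                continue                         # the empty set is not an XOR-tree
            bj = weight(Hj)
            if b - bj < T:                       # accept unless worse by T or more
                H, b = Hj, bj
            if b > best_b:
                best_H, best_b, idle = H, b, 0
            else:
                idle += 1
                if idle >= patience:             # "no increase of b_i for a long time"
                    break
        T *= t                                   # lower the threshold, 0 < t < 1
        if b == b_start:                         # "b_i no longer changes"
            return best_H


# Toy objective with a unique maximum at H = 0b0111 (F1 xor F2 xor F3)
target = 0b0111
print(best_column_ta(lambda H: -bin(H ^ target).count("1"), n=4))  # 7
```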
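       H13   H14   H15
φ1     0.80  0.80  0.66
φ2     0.80  0.80  0.80
φ3     0.80  0.80  0.80
φ4     0.80  0.80  0.80
φ5     0.80  0.79  0.79
b      4.00  3.99  3.85
Table 3 (continued). Columns H13 to H15 of matrix $M_3$ (values $m_{ij}$ for $\{H_7, H_i\}$).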
5 Experimental Results
In this section we present the results for OSC using the algorithm developed in Section 4. Experiments were performed on the ISCAS-85 benchmark circuits [1] to study the effectiveness of the algorithm for varying values of $P_{\min}$.
To compute a matrix column corresponding to the currently chosen set of XOR-trees, the circuit with these XOR-trees and the original circuit were fault simulated for a given set of 100 faults and a given number of random patterns. The simulation was performed with the fault simulator COMSIM [5]. The values of $P_{\mathrm{detect}}$ were obtained as ratios of the corresponding fault detection frequencies, i.e., the number of patterns for which a fault was detected at the XOR-tree outputs divided by the number of patterns for which it caused an error at the circuit outputs.
Table 4 shows the results of the OSC algorithm for the benchmark circuits for different values of $P_{\min}$. Since it is not feasible to simulate circuits with a large number of primary inputs for every input pattern, we have to choose a sample of random patterns. Therefore, the values of $P_{\mathrm{detect}}$ obtained through simulation are in reality estimates $\hat{P}_{\mathrm{detect}}$. In the experiments we used, for each circuit, a set of $10^5$ random patterns.
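Estimating $P_{\mathrm{detect}}$ from random patterns can be illustrated on a toy example without a fault simulator. The 4-input circuit, the stuck-at-1 fault on its internal line `g`, and the sample size in this Python sketch are all hypothetical; it compares the exhaustive value with the estimate $\hat{P}_{\mathrm{detect}}$ obtained from $10^4$ random patterns.

```python
import random
from typing import Iterable, Optional, Tuple


def circuit(x: int, stuck_g: Optional[int] = None) -> Tuple[int, int, int]:
    """Hypothetical 4-input, 3-output circuit; stuck_g forces the internal line g."""
    a, b, c, d = [(x >> k) & 1 for k in range(4)]
    g = a & b if stuck_g is None else stuck_g
    return (g ^ c, g | d, c & d)


def parity(y: Tuple[int, ...], H: int) -> int:
    """Output of the XOR-tree H (bit i selects output i) for the response y."""
    return sum(v for i, v in enumerate(y) if (H >> i) & 1) % 2


def p_detect(H: int, patterns: Iterable[int], stuck_g: int = 1) -> float:
    """Fraction of error-producing patterns whose error is visible at the XOR-tree H."""
    errors = detected = 0
    for x in patterns:
        y, y_f = circuit(x), circuit(x, stuck_g)
        if y != y_f:                              # the fault causes an output error
            errors += 1
            detected += parity(y, H) != parity(y_f, H)
    return detected / errors


exact = p_detect(0b011, range(16))                # exhaustive: all 2^4 inputs
rng = random.Random(42)
sample = [rng.randrange(16) for _ in range(10_000)]
estimate = p_detect(0b011, sample)                # estimate from random patterns
print(exact, round(estimate, 3))
```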
Fig. 3 illustrates the working mechanism of the proposed algorithm for the circuit c432 and $P_{\min} = 0.95$. The plotted values of $P_{\mathrm{detect}}$ correspond to the intermediate values obtained when exiting the inner loop of Procedure 2.
circuit          # XOR-trees for Pmin =
name     #PO     > 0   0.7   0.8   0.9   0.95  0.99  1.0
c432       7       1     2     3     5     5     6     7
c499      32       2     2     3     3     4     5     6
c880      26       2     2     2     3     4     5     8
c1355     32       1     1     2     2     3     4     5
c1908     25       1     1     2     3     4     3     4
c2670    140       2     2     3     3     4     5     5
c3540     22       1     2     3     4     5     7    14
c5315    123       1     2     2     3     4     5     6
c6288     32       1     2     3     4     5     6     9
c7552    108       2     2     2     3     3     4     5
Table 4. OSC results for ISCAS-85 circuits.
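The compression ratios implied by Table 4 follow directly from the definition in the abstract (number of circuit outputs divided by number of compressor outputs). This short Python sketch copies the table's entries and computes the ratio for a chosen $P_{\min}$ column; the function name is hypothetical.

```python
P_MIN = [">0", "0.7", "0.8", "0.9", "0.95", "0.99", "1.0"]
# Table 4: circuit -> (number of primary outputs, XOR-trees for each Pmin column)
TABLE4 = {
    "c432": (7, [1, 2, 3, 5, 5, 6, 7]),
    "c499": (32, [2, 2, 3, 3, 4, 5, 6]),
    "c880": (26, [2, 2, 2, 3, 4, 5, 8]),
    "c1355": (32, [1, 1, 2, 2, 3, 4, 5]),
    "c1908": (25, [1, 1, 2, 3, 4, 3, 4]),
    "c2670": (140, [2, 2, 3, 3, 4, 5, 5]),
    "c3540": (22, [1, 2, 3, 4, 5, 7, 14]),
    "c5315": (123, [1, 2, 2, 3, 4, 5, 6]),
    "c6288": (32, [1, 2, 3, 4, 5, 6, 9]),
    "c7552": (108, [2, 2, 2, 3, 3, 4, 5]),
}


def compression_ratios(p_min: str) -> dict:
    """Space compression ratio n / k (circuit outputs per compressor output)."""
    col = P_MIN.index(p_min)
    return {name: po / trees[col] for name, (po, trees) in TABLE4.items()}


for name, ratio in sorted(compression_ratios("1.0").items(), key=lambda kv: -kv[1]):
    print(f"{name:6s} {ratio:6.1f}")
```

For example, at $P_{\min} = 0.99$ the 140 outputs of c2670 are compressed into 5 XOR-trees, a ratio of 28.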
Fig. 3. Working mechanism of the algorithm. (Plot: error detection probability versus threshold value, one curve for each of XOR-trees 1 to 5.)
Fig. 4. Error detection for all single stuck-at faults. (Plot for circuit c1355: number of faults versus error detection probability, one curve for each of Pmin > 0, 0.7, 0.8, 0.9, 0.95, 0.99, and 1.00.)
It is interesting to observe that the overall error detection probability grows with increasing values of $P_{\min}$. This effect can be seen in Fig. 4, which shows the overall error detection probability for the circuit c1355 for varying values of $P_{\min}$. The diagram shows that the number of faults whose errors are detected with probability at least $P$ is in general higher for higher values of $P_{\min}$; large improvements are achieved as the number of XOR-trees increases. Another interesting fact is that for $P_{\min} = 1.0$ (5 XOR-trees) there are 1438 faults (out of 1566 testable nonequivalent stuck-at faults in c1355) for which every error is detected with the set of $10^5$ random patterns, although the compression function was computed to detect each error caused by a set of only 100 faults. The errors of 98% of all testable nonequivalent single stuck-at faults are still detected with probability $P \geq 0.95$. This example shows that high output space compression ratios can be achieved even for very high values of $P_{\min}$ and large fault sets.
6 Conclusions
In this paper we presented a method for designing concurrent checkers based on a linear compression of the output space of the circuit. The checker design is tailored to a given set of target faults. Errors caused by these faults have to be detected with a probability that is equal to or above a given bound. To keep the complexity of the checker low, the main objective of the method is the minimization of the number of outputs of the space compressor.
The design method is based on the solution of a set covering problem, for which a simple greedy strategy was combined with a threshold accepting algorithm. The effectiveness of the algorithm was studied on the ISCAS-85 benchmark circuits. The results show that high compression ratios can be achieved, even for high minimum fault detection probabilities and large fault sets.
References
[1] F. Brglez and H. Fujiwara: A Neutral Netlist of
10 Combinational Benchmark Circuits and a Target Translator in Fortran, Proc. IEEE Symp. on
Circuits and Systems, June 1985, pp. 663-698.
[2] G. Dueck and T. Scheuer: Threshold Accepting: A
General Purpose Optimization Algorithm Appearing Superior to Simulated Annealing, J. of Computational Physics, vol. 90, 1990, pp. 161-175.
[3] E. Fujiwara, N. Mutoh, and K. Matsuoka: A Self-Testing Group-Parity Prediction Checker and Its
Use for Built-In Testing, IEEE Trans. Comput.,
vol. C-33, no. 6, June 1984, pp. 578-583.
[4] E. Fujiwara and K. Matsuoka: A Self-Checking
Generalized Prediction Checker and Its Use for
Built-In Testing, IEEE Trans. Comput., vol. 36,
no. 1, Jan. 1987, pp. 86-93.
[5] U. Mahlstedt and J. Alt: Simulation of non-classical Faults on the Gate Level - The Fault
Simulator COMSIM, Proc. IEEE Int. Test Conf.,
Baltimore, MD, Oct. 1993, pp. 883-892.
[6] E.S. Sogomonyan and M. Goessel: Design of Self-Testing and On-Line Fault Detection Combinational Circuits with Weakly Independent Outputs,
J. of Electronic Testing, vol. 4, 1993, pp. 267-281.
[7] J.F. Wakerly: Partially Self-Checking Circuits and
Their Use in Performing Logical Operations, IEEE
Trans. Comput., vol. C-23, no. 7, July 1974, pp.
658-666.