A Predicate-Based Software Testing Strategy*
K. C. Tai1, M. A. Vouk1, A. Paradkar1, and P. Lu2
Abstract
In this paper, we describe the basic theory underlying BOR (boolean operator) testing and BRO
(boolean and relational operator) test-selection criteria, and discuss experimental evidence that
shows that BOR testing requires a far smaller number of test-cases than some more traditional
strategies, while retaining fault-detection capabilities that are as good as, or better than these
strategies. The approach we describe can be used to develop test cases based on software
specifications, or based on the implementation, that is, it can be used for both functional and
structural testing. We evaluate the BOR strategy with respect to branch testing, traditional cause-effect graph testing, and informal functional and random testing. Two simulation experiments
showed that BOR testing is very effective at detecting faults in predicates. We also performed two
experiments where we applied BOR strategy to software specification cause-effect graphs of a
real-time control system and a set of N-version programs. Our results indicate that BOR testing is
practical, and has detected faults that the previously applied strategies have not.
1. Introduction
There are two basic ways of showing that code is 100% correct — program proving and
exhaustive testing [Adr82, And74, How87]. Neither technique, as yet, is an option when it comes
to major real-life systems. Techniques for proving software correct are not mature enough, while
exhaustive testing is not practical because the costs are prohibitive. To reduce the exhaustive testing
process to a feasible testing process, we must find criteria for choosing representative test-cases
that are comprehensive enough to span the problem domain and provide us with the required
degree of confidence that the software works correctly and reliably. At the same time, this test set
must be small enough so that the testing can be performed within the available schedule and cost
constraints. A common approach to software testing, referred to as predicate testing, is to require certain types of tests for each predicate (or condition) in a program or software specification. A number of predicate testing strategies have been proposed, including branch testing, domain testing, and others [How87, Bei90].
* Research supported in part by an IBM Center for Advanced Studies (Canada) research fellowship, NASA Grant No. NAG-1-983, and NSF Grant CCR-8907807.
1 Computer Science Department, Box 8206, North Carolina State University, Raleigh, NC 27695-8206, USA
2 IBM Canada Ltd., Toronto Lab, M/S: 21/894/844, 844 Don Mills Rd., North York, Ontario, Canada M3C 1V7
The field of evaluation of testing strategies is very active and a number of theoretical [e.g., Whi80,
Cla85, Fra88, Kor88, Ham90, Ham92, Zei92, Fra93a] and experimental [e.g., Dur84, Nta84,
Nta88, Voa91, For93, Fra93b] studies are available. Most of the time, the results cannot be
generalized, either because of the nature of the experiments, or because of the underlying
assumptions. For example, it has been shown in [Fra93a, Fra93b] that the structural metric of data
flow testing has better coverage and fault exposure properties than the decision metric of branch
testing. But the programs considered in these studies were small, and some of the defects in these
programs were deliberately manipulated to satisfy the constraints of the experiment. [For93]
reports results on the effectiveness of control and data flow testing strategies as applied to a subset
of defects observed in the development of the computer typesetting program TeX. It was shown
that even the simplest of the control flow or data flow testing strategies are effective at exposing a
reasonable number of faults. But it was also reported, that for those faults which could not be
exposed by these strategies, Extremal and Special (ESV) value testing was effective. The task of
generating these special and boundary value test cases is tedious as it involved careful inspection of
the code. On the other hand, Hamlet has put forward arguments which indicate that "in testing for
true reliability clever partition methods may be no better than random testing" [Ham92, Ham90].
In this paper we focus on evaluation of a particular set of predicate-based test generation and
selection criteria, called boolean operator (BOR) and boolean and relational operator (BRO)
testing, that have been proposed by Tai [Tai87, Tai93]. The BOR testing strategy for a predicate
requires a test set to guarantee the detection of boolean operator faults. The BRO testing strategy
for a predicate is to require a set of tests to guarantee the detection of boolean and relational
operator faults, where a relational operator fault refers to an incorrect relational operator. However,
we believe that in practice the strategy can detect a much broader class of problems, because many of them manifest, directly or indirectly, as execution of incorrect domains, which is detectable by predicate-based approaches. We are especially interested in two
properties of these test-selection strategies — their ability to detect faults within software, and their
stability, that is, their ability to consistently and reliably provide this fault detection capability for
different types of faults and across different software application areas.
We first present the basic theory underlying BOR/BRO test-selection criteria, and then discuss
some experiments. The experiments indicate that BOR testing requires a smaller number of test-cases than, for example, informal functional and random testing, while retaining fault-detection
capabilities that are as good as, or better than these strategies. The experiments were, in part,
performed as an exploratory study of the applicability of predicate-based testing with the intent of
extending this work to large scale database and compiler software3.
Section 2 provides an introduction to the predicate testing strategies related to our empirical studies.
In section 3 we discuss the use of BOR selection criteria with specification and implementation
cause-effect graphs. Sections 4 through 7 discuss the results of four experimental studies of the
BOR strategy. Two studies are based on simulations, and two use actual software. Section 8
summarizes this paper and provides conclusions.
2. Predicate Testing
2.1 Definitions
A predicate can be simple or compound. A simple predicate is a boolean variable or a relational
expression, possibly with one or more NOT ("~") operators. A relational expression is of the
form
E <rop> E'
where E and E' are arithmetic expressions and <rop> is one of six possible relational operators:
"<", "<=", "=", "/=" (non-equality), ">", and ">=". (Non-arithmetic expressions, such as
character strings and sets, are not considered.) A compound predicate consists of at least one
binary boolean operator, two or more operands, and possibly NOT operators and parentheses. The
binary boolean operators allowed in a predicate include OR ("|") and AND ("&").
If a predicate is incorrect, then one or more of the following types of faults occur:
(1) boolean operator fault (incorrect AND/OR operator or missing/extra NOT operator)
(2) incorrect relational operator
(3) incorrect parenthesis
(4) incorrect boolean variable
(5) incorrect arithmetic expression
(6) extra binary operator (and operand)
(7) missing binary operator (and operand)
In this paper, we focus on the detection of faults of types (1)-(3) and a special type of (4).
3Research supported in part by IBM Center for Advanced Studies (Canada)
A test set for a predicate C is said to detect the existence of faults in C if an execution of C on at
least one element of this test set produces (executes) an incorrect outcome (branch) of C. A test set
T for C is said to guarantee the detection of a certain type of faults in C if T detects the
existence of such faults in C, provided that C does not contain faults of other types. Assume that
predicate C* has the same set of variables as C and is not equivalent to C. A test set T is said to
distinguish C from C* if C and C* produce different outcomes on at least one element of T.
A test set T for a predicate C is said to satisfy a predicate testing strategy for C if the execution of
C using T satisfies the requirements of this strategy. For two predicate testing strategies S and S',
S (S') is said to be stronger (weaker) than S' (S) if any test set satisfying S for a predicate C
also satisfies S' for C, but not vice versa. S and S' are said to be incomparable if neither of
them is stronger than the other.
If any two test sets T1 and T2, chosen by S, are such that all elements of T1 are successful exactly
when all elements of T2 are successful, then S is reliable. In practice it may be that reliability of a
strategy is not 1, i.e., some test sets chosen by strategy S do not perform, on the same program P,
as well as other test sets chosen by the same strategy. If the criterion S can produce test sets that
uncover all faults in program P, then S is valid for P. An ideal strategy would be reliable and valid
for all P.
2.2 Strategies
In this section we provide an overview of several predicate testing strategies that are related to our
empirical studies. More details can be found in [Tai93].
The predicate
((E1<E2) & (E3>=E4)) | (E5=E6),
where Ei, 1<=i<=6, denotes an arithmetic expression, is shown in Figure 1. We will call it C#
and use it below to illustrate different predicate testing strategies. We distinguish the following
predicate testing strategies:
Branch Testing This strategy requires that both the true and false branches of each predicate be
executed (or covered) at least once.
[Figure 1. Example of a compound expression: the BDD of C# = ((E1<E2) & (E3>=E4)) | (E5=E6), with the true/false branches of each simple predicate labeled.]
Complete Branch Testing For a compound predicate C, this strategy requires that the true
and false branches of every simple or compound operand in C (including C itself) be executed at
least once. Although complete branch testing is stronger than branch testing, it can be satisfied for
a compound predicate by using two tests. The test set {t1, t2} shown below satisfies complete
branch testing for C#.
       (E1 < E2)   (E3 >= E4)   (E5 = E6)   outcome of C#
t1         t            t            t             t
t2         f            f            f             f
In the above table, the values of t1 and t2 are not given. Instead, each of t1 and t2 is specified in
terms of the outcome ("t" for true and "f" for false) of each simple predicate in C#. t1 is a test to
make (E1<E2) true, (E3>=E4) true, and (E5=E6) true. Similarly, t2 is a test to make (E1<E2)
false, (E3>=E4) false, and (E5=E6) false. t1 and t2 are said to satisfy constraints (t,t,t) and
(f,f,f), respectively, for C#. The outcome of C# on t1 (t2) can be determined from C# and the
constraint satisfied by t1 (t2). The constraint set {(t,t,t), (f,f,f)} is said to satisfy complete branch
testing for C#. Note that {(t,t,t), (f,f,f)} cannot distinguish C# from the following predicates:
((E1<E2) | (E3>=E4)) | (E5=E6),
((E1<E2) | (E3>=E4)) & (E5=E6),
(~(E1<E2) | (E3>=E4)) & (E5=E6),
((E1<E2) | ~(E3>=E4)) & (E5=E6), and
((E1<E2) & (E3>=E4)) & (E5=E6),
which differ from C# in boolean operators only.
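The claim can be checked mechanically. Below is a minimal sketch (the Python encoding and variable names are ours, not from the paper) that evaluates C# and the five mutants on the two complete-branch-testing constraints:

```python
# Check that the complete-branch-testing constraint set {(t,t,t), (f,f,f)}
# cannot distinguish C# from the five boolean-operator mutants listed above.

C_sharp = lambda a, b, c: (a and b) or c      # ((E1<E2) & (E3>=E4)) | (E5=E6)

mutants = [
    lambda a, b, c: (a or b) or c,
    lambda a, b, c: (a or b) and c,
    lambda a, b, c: ((not a) or b) and c,
    lambda a, b, c: (a or (not b)) and c,
    lambda a, b, c: (a and b) and c,
]

tests = [(True, True, True), (False, False, False)]   # (t,t,t) and (f,f,f)

for m in mutants:
    # on every test the mutant and C# agree, so the fault goes undetected
    assert all(m(*t) == C_sharp(*t) for t in tests)
print("no mutant is distinguished by {(t,t,t), (f,f,f)}")
```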
Exhaustive Branch Testing For a compound predicate C, this strategy requires that all
combinations of true and false branches of every simple predicate in C be executed at least once. If
C consists of n, n>0, AND/OR operators, then exhaustive branch testing for C requires 2**(n+1)
constraints and is impractical when n is large.
Boolean Operator Testing (or BOR Testing) A test (constraint) set T for a compound
predicate C is said to be a BOR test (constraint) set if T guarantees the detection of boolean
operator faults in C, including incorrect AND/OR operators and missing or extra NOT operators.
The test set {t3, t4, t5, t6} shown below is a BOR test set for C#:
       (E1 < E2)   (E3 >= E4)   (E5 = E6)   outcome of C#
t3         t            t            f             t
t4         t            f            t             t
t5         t            f            f             f
t6         f            t            f             f
The constraint set {(t,t,f), (t,f,t), (t,f,f), (f,t,f)} is called a BOR constraint set for C#. An
algorithm, called BOR_GEN, for generating a minimum BOR constraint set for a compound
predicate was given in [Tai93]. For a predicate with n, n>0, AND/OR operators, its minimum
BOR constraint set contains at most n+2 constraints. For a compound predicate, BOR testing
requires the coverage of a minimum BOR constraint set for this predicate.
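As a sanity check on the BOR property, the following sketch (our own encoding; the mutant list is illustrative, not exhaustive) confirms that the BOR constraint set above exposes each single boolean-operator mutant of C#:

```python
# Verify that the BOR constraint set {(t,t,f), (t,f,t), (t,f,f), (f,t,f)}
# distinguishes C# from each single boolean-operator mutant listed here.

C_sharp = lambda a, b, c: (a and b) or c

bor_mutants = [
    lambda a, b, c: (a or b) or c,            # & -> |
    lambda a, b, c: (a and b) and c,          # | -> &
    lambda a, b, c: ((not a) and b) or c,     # extra NOT on first operand
    lambda a, b, c: (a and (not b)) or c,     # extra NOT on second operand
    lambda a, b, c: (a and b) or (not c),     # extra NOT on third operand
    lambda a, b, c: (not (a and b)) or c,     # extra NOT on the AND subterm
    lambda a, b, c: not ((a and b) or c),     # extra NOT on the whole predicate
]

T = [(True, True, False), (True, False, True),
     (True, False, False), (False, True, False)]

for m in bor_mutants:
    # at least one test in T produces a different outcome than C#
    assert any(m(*t) != C_sharp(*t) for t in T), "mutant not detected"
print("all boolean-operator mutants detected")
```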
BDD Path Testing For a compound predicate C, this strategy requires that all paths in the BDD
(binary decision diagram) of C be covered at least once. The notion of BDDs has been used in
the representation, verification and testing of boolean functions and logical circuits [Ake78,
Bry86]. The BDD of a compound predicate is equivalent to the compound predicate modified by
replacing "AND" and "OR" with "AND THEN" and "OR ELSE", respectively. (For C1 AND
THEN C2, if C1 is false, then C2 is ignored. For C1 OR ELSE C2, if C1 is true, then C2 is
ignored.) Fig. 1 shows the BDD of C#. The test set {t7, t8, t9, t10} shown below satisfies BDD
path testing for C#.
       (E1 < E2)   (E3 >= E4)   (E5 = E6)   outcome of C#
t7         t            t            *             t
t8         t            f            t             t
t9         t            f            f             f
t10        f            *            f             f
In the above table, "*" denotes "don't care". For example, the constraint (t,t,*) satisfied by t7 makes both (E1<E2) and (E3>=E4) true, with no constraint on E5 and E6. Since "*" denotes "don't care," the value of ("t"&"*") is uncertain and so is that of ("f"|"*"). The constraint set {(t,t,*), (t,f,t), (t,f,f), (f,*,f)} does not guarantee to distinguish C# from the following two predicates
((E1<E2) | (E3>=E4)) & (E5=E6)
~((E1<E2) | (E3>=E4)) | (E5=E6),
which differ from C# in boolean operators only. A constraint generation algorithm for BDD path
testing was given in [Tai93]. For a predicate with n, n>0, AND/OR operators, this testing strategy
requires (n+2) or more, up to O(2**n), constraints. BDD path testing requires at least as many tests as BOR testing, but these two testing strategies are incomparable. For C#, {t3, t4, t5, t6} satisfies the constraint set satisfied by {t7, t8, t9, t10}, but not vice versa.
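The weakness introduced by "don't care" entries can be demonstrated directly. The sketch below (our encoding) searches the instantiations of the "*" positions and finds one under which neither of the two mutants above is distinguished from C#:

```python
# The BDD-path constraint set for C# contains "don't care" entries.
# Show that some instantiation of the don't-cares fails to distinguish
# C# from the two boolean-operator mutants given in the text.
from itertools import product

C_sharp = lambda a, b, c: (a and b) or c
mutants = [
    lambda a, b, c: (a or b) and c,
    lambda a, b, c: (not (a or b)) or c,
]

# constraints with None marking a "don't care" position
constraints = [(True, True, None), (True, False, True),
               (True, False, False), (False, None, False)]

def instantiations(cs):
    """Yield every way of filling the don't-care slots with True/False."""
    slots = [(i, j) for i, c in enumerate(cs) for j, v in enumerate(c) if v is None]
    for choice in product([True, False], repeat=len(slots)):
        filled = [list(c) for c in cs]
        for (i, j), v in zip(slots, choice):
            filled[i][j] = v
        yield [tuple(c) for c in filled]

undetected = any(
    all(all(m(*t) == C_sharp(*t) for t in tests) for m in mutants)
    for tests in instantiations(constraints)
)
assert undetected
print("some instantiation of the don't-cares misses both mutants")
```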
Elmendorf's Algorithm The notion of cause-effect graphs (CEGs) was developed for system
specification and test generation [Elm73, Mye79]. A CEG consists of causes, effects, and
graphical notations expressing logical relationships among causes and effects. A cause is an input
condition, an effect is an output condition, and logical operators include AND ("^"), OR ("V"),
and NOT ("~"). In [Mye79] a test generation algorithm for cause-effect graphs was described,
which was developed by Elmendorf [Elm73]. Since a cause-effect graph is a collection of
compound predicates, Elmendorf's algorithm can be applied to generate tests for a compound
predicate. For C#, Elmendorf's algorithm generates the test set {t11, ..., t17} shown below.
       (E1 < E2)   (E3 >= E4)   (E5 = E6)   outcome of C#
t11        t            t            f             t
t12        t            f            t             t
t13        f            t            t             t
t14        f            f            t             t
t15        t            f            f             f
t16        f            t            f             f
t17        f            f            f             f
It can be shown that for a predicate with n, n>0, AND/OR operators, Elmendorf's algorithm
generates a BOR constraint set, which contains (n+2) or more, up to O(2**n), constraints.
Relational Operator Testing For a relational expression, say (E <rop> E'), this strategy
requires three tests satisfying the following requirements [How82]:
(1) one test makes E > E',
(2) one test makes E < E', and
(3) one test makes E = E'.
If <rop> is incorrect and E and E' are correct, then this strategy guarantees the detection of the
incorrect <rop>. Relational operator testing is stronger than branch testing. For a compound
predicate C, relational operator testing for each relational expression in C does not guarantee the
detection of boolean operator faults. The test set {t18, t19, t20} shown below for C# satisfies
relational operator testing for each relational expression in C#.
       (E1 < E2)   (E3 >= E4)   (E5 = E6)   outcome of C#
t18        =            >            =             t
t19        <            <            <             f
t20        >            =            >             f
In the above table, a constraint for a relational expression is "<", "=", or ">", indicating that the
left-hand side of the expression is less than, equal to, or greater than, respectively, the right-hand
side of the expression. For example, the constraint (=,>,=) satisfied by t18 makes E1=E2, E3>E4, and E5=E6. The constraint set {(=,>,=), (<,<,<), (>,=,>)} satisfies relational operator testing for each relational expression in C#, but it cannot distinguish C# from the following predicates
((E1<=E2) & (E3>=E4)) | (E5=E6),
((E1= E2) & (E3>=E4)) | (E5=E6),
((E1< E2) & (E3> E4)) | (E5=E6), and
((E1= E2) & (E3> E4)) | (E5=E6),
which differ from C# in relational operators only.
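The three-test requirement itself is easy to validate: the six relational operators have pairwise distinct truth values on the inputs E>E', E<E', and E=E'. A small sketch (our encoding, using Python's operator module):

```python
# Relational operator testing: for (E <rop> E'), three tests making
# E>E', E<E', and E=E' distinguish any two distinct relational operators.
import operator

rops = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
        "/=": operator.ne, ">": operator.gt, ">=": operator.ge}

tests = [(1, 0), (0, 1), (0, 0)]   # E>E', E<E', E=E'

for name1, op1 in rops.items():
    for name2, op2 in rops.items():
        if name1 == name2:
            continue
        # every incorrect operator is exposed by at least one of the tests
        assert any(op1(x, y) != op2(x, y) for x, y in tests)
print("any single relational-operator fault is detected by the three tests")
```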
Boolean and Relational Operator Testing (or BRO Testing) A test (constraint) set T for
a compound predicate C is said to be a BRO test (constraint) set for C if T guarantees the detection
of boolean and/or relational operator faults in C, including incorrect AND/OR operators, missing
or extra NOT operators, and incorrect relational operators. The test set {t21, ..., t26} shown
below satisfies BRO testing for C# and is called a BRO test set for C#.
       (E1 < E2)   (E3 >= E4)   (E5 = E6)   outcome of C#
t21        <            >            <             t
t22        <            =            <             t
t23        <            <            =             t
t24        <            <            <             f
t25        =            =            >             f
t26        >            =            >             f
The constraint set {(<,>,<), (<,=,<), (<,<,=), (<,<,<), (=,=,>), (>,=,>)} is called a BRO
constraint set for C#. An algorithm, called BRO_GEN, for generating a minimum BRO constraint
set for a compound predicate was given in [Tai93]. For a predicate with n, n>0, AND/OR
operators, its minimum BRO constraint set contains at most 2*n+3 constraints. For a compound
predicate, BRO testing requires the coverage of a minimum BRO constraint set for this predicate.
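To make the BRO constraints concrete, each of "<", "=", ">" can be instantiated with a fixed integer pair. The sketch below (the pair choices and the mutant list are our illustrative assumptions) checks that the resulting six tests expose sample relational- and boolean-operator mutants of C#:

```python
# Instantiate the BRO constraint set for C# with concrete integers and
# check that it detects sample relational- and boolean-operator mutants.
# Pair choices per symbol are ours: "<" -> (0,1), "=" -> (0,0), ">" -> (1,0).
pair = {"<": (0, 1), "=": (0, 0), ">": (1, 0)}

bro = [("<", ">", "<"), ("<", "=", "<"), ("<", "<", "="),
       ("<", "<", "<"), ("=", "=", ">"), (">", "=", ">")]

tests = [pair[x] + pair[y] + pair[z] for x, y, z in bro]

C_sharp = lambda e1, e2, e3, e4, e5, e6: (e1 < e2 and e3 >= e4) or e5 == e6

mutants = [
    lambda e1, e2, e3, e4, e5, e6: (e1 <= e2 and e3 >= e4) or e5 == e6,  # < -> <=
    lambda e1, e2, e3, e4, e5, e6: (e1 < e2 and e3 > e4) or e5 == e6,    # >= -> >
    lambda e1, e2, e3, e4, e5, e6: (e1 < e2 and e3 >= e4) or e5 <= e6,   # = -> <=
    lambda e1, e2, e3, e4, e5, e6: (e1 < e2 or e3 >= e4) or e5 == e6,    # & -> |
    lambda e1, e2, e3, e4, e5, e6: (e1 < e2 and e3 >= e4) and e5 == e6,  # | -> &
]

for m in mutants:
    # some test in the BRO set produces a different outcome than C#
    assert any(m(*t) != C_sharp(*t) for t in tests)
print("all sample mutants detected by the BRO test set")
```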
It is important to point out the possible existence of infeasible constraints for a predicate.
Consider the predicate C#. If a constraint for C# can never be covered by any test for C#, it is said
to be an infeasible constraint for C#. More discussion on infeasible constraints can be found in
[Tai93].
3. Testing Based on Cause-Effect Graphs
The CEG for a software system can be analyzed to determine its completeness and consistency
[Mye79]. Also, the CEG provides a basis for deriving specification-based test cases for the
software system. One approach to test generation is to consider all possible combinations of
causes. This approach, referred to as exhaustive cause testing, is impractical since the number
of tests for a CEG is an exponential function of the number of causes of the CEG. A more practical
test generation algorithm for CEGs is the already mentioned Elmendorf's algorithm.
We will distinguish specification and implementation CEGs. A specification CEG is the CEG
derived on the basis of the software requirements or design specification. Since the specifications
are often not formal, for a particular specification it may be possible to construct many structurally
different, but functionally equivalent, CEGs. On the other hand, an implementation-CEG is derived from the code (such as C, Pascal, or Ada) that implements the specification. The implementation-CEG is very specific and explicit in terms of predicates and their relationships. However, note that a particular specification, if implemented by independent developers, will result in diverse, but functionally equivalent, implementations and thus in diverse implementation-CEGs.
3.1. Generation of a BOR Test Set for a Cause-Effect Graph
The BOR (boolean operator) testing strategy for a boolean expression requires a set of tests to
guarantee the detection of boolean operator faults, including incorrect AND and OR operators and
missing or extra NOT operators (with the assumption that the boolean expression does not contain
faults of other types). A test set T for a boolean expression B is said to be a BOR test set for B if
T satisfies the BOR testing strategy for B. In [Tai87] we showed that a BOR test set for a boolean
expression is very effective for detecting all types of boolean expression faults, including boolean
operator faults, incorrect boolean variables and parentheses, and their combinations.
The generation of tests for BOR or BRO testing of a predicate is based on constraints for this
predicate. For example, (t,f) and (<,=) are constraints for the predicate ((E1=E2) & (E3/=E4)),
where E1 through E4 are arithmetic expressions. The coverage of (t,f) for this predicate requires a
test making "E1=E2" to be true and "E3/=E4" false, and the coverage of (<,=) for this predicate
requires a test making E1<E2 and E3=E4. A constraint for BOR testing of a predicate consists of
"t" and "f", while a constraint for BRO testing of a predicate consists of "t", "f", ">", "<", and
"=".
A set S of constraints for a predicate C is said to be a BOR (BRO) constraint set for C
provided that if a test set T for C covers each constraint in S at least once, then T satisfies BOR
(BRO) testing of C. For a predicate with n, n>0, AND/OR operators, a minimum BOR (BRO)
constraint set can be easily constructed, which contains at most n+2 (2*n+3) constraints. For
((E1=E2) & (E3/=E4)), its minimum BOR constraint set is {(t,t), (f,t), (t,f)} and one of its
minimum BRO constraint sets is {(=,>), (=,<), (>,>), (<,>), (=,=)}. Experimental results indicate
that BOR and BRO testing are effective for the detection of various types of faults in a predicate
and provide more specific guidance than branch testing for test generation.
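For this predicate (n=1), the sets above have n+2 = 3 and 2n+3 = 5 constraints, as the bounds predict. A quick check (our encoding) that the minimum BOR set detects single boolean-operator faults:

```python
# For ((E1=E2) & (E3/=E4)) (one AND/OR operator), the minimum BOR
# constraint set {(t,t), (f,t), (t,f)} catches single boolean-operator faults.
pred = lambda a, b: a and b                 # a: E1=E2, b: E3/=E4

bor_mutants = [
    lambda a, b: a or b,                    # & -> |
    lambda a, b: (not a) and b,             # extra NOT on first operand
    lambda a, b: a and (not b),             # extra NOT on second operand
    lambda a, b: not (a and b),             # extra NOT on the whole predicate
]

T = [(True, True), (False, True), (True, False)]   # {(t,t), (f,t), (t,f)}

for m in bor_mutants:
    assert any(m(*t) != pred(*t) for t in T)
print("minimum BOR set {(t,t),(f,t),(t,f)} detects the operator faults")
```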
In a CEG, an effect node and its associated portion of the CEG denote a compound predicate in
terms of causes. Since a CEG represents a set of compound predicates, the BOR testing strategy
can be applied to generate tests. For the sake of simplicity, each cause node is viewed as a boolean
variable and has "t" or "f" as its value. In [Tai93], algorithm BOR_GEN produces a minimum
BOR constraint set for a compound predicate. Below we describe an adapted version of algorithm
BOR_GEN in order to produce a minimum BOR test set for a CEG.
The nodes in a CEG are visited from cause nodes to effect nodes. (Elmendorf's algorithm does
the opposite.) When a node is visited, a test set for this node and its associated CEG is
constructed. After the visit of an effect node, a test for the CEG associated with the node is
available. Before we show the test generation algorithm, we need to introduce a number of
definitions.
Let N be a node in a CEG. The value of N on test X is denoted as N(X). Assume that S is a test set
for N. S can be divided into two sets S_t and S_f, where S_t(N) = {X in S | N(X) = t} and S_f(N)
= {X in S | N(X) = f}. Let u=(u1,...,um) and v=(v1,...,vn), where m,n>0, be two lists. The
concatenation of u and v, denoted as (u,v), is (u1,...,um,v1,...,vn). Let A and B be two sets.
A$B denotes the union of A and B, A*B the product of A with B, and |A| the size of A. A%B,
called the onto from A to B, is a minimal set of (u,v) such that u and v are in A and B,
respectively, every element in A appears as u at least once, and every element in B appears as v at
least once. Thus, |A%B| is the maximum of |A| and |B|. If both A and B have two or more
elements, A%B has several possible values and returns any one of them. For example, assume that
C={(a),(b)} and D={(c),(d)}. C%D has two possible values: {(a,c),(b,d)} and {(a,d),(b,c)}.
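These operations are straightforward to implement; the sketch below (names and the tie-breaking in "%" are our choices, since any valid onto pairing is acceptable) reproduces the C%D example:

```python
# Sketch of the set operations used by BOR_GEN: "$" (union), "*" (pairwise
# concatenation), and "%" (the "onto" pairing). Tests are encoded as tuples.
from itertools import zip_longest

def union(a, b):                 # A$B
    return a | b

def prod(a, b):                  # A*B: concatenate every u in A with every v in B
    return {u + v for u in a for v in b}

def onto(a, b):                  # A%B: minimal pairing covering all of A and B
    a, b = sorted(a), sorted(b)
    out = set()
    for u, v in zip_longest(a, b):
        # reuse the last element of the shorter set once it is exhausted
        out.add((u or a[-1]) + (v or b[-1]))
    return out

C, D = {("a",), ("b",)}, {("c",), ("d",)}
result = onto(C, D)
assert len(result) == max(len(C), len(D))      # |A%B| = max(|A|, |B|)
assert {u for (u, v) in result} == {"a", "b"}  # every element of C appears
assert {v for (u, v) in result} == {"c", "d"}  # every element of D appears
print(sorted(result))
```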
Algorithm BOR_GEN
For a cause node, its minimum BOR test set is {(t), (f)}. Assume that N1 and N2 are cause nodes.
The minimum BOR test set for an AND node with N1 and N2 as inputs is {(t,t), (f,t), (t,f)}. (Note
that (f,f) is not included.) The minimum BOR test set for an OR node with N1 and N2 as inputs is
{(f,t), (t,f), (f,f)}. (Note that (t,t) is not included.) The following three rules show how to generate
a BOR test set for a node with one or both its inputs being intermediate nodes.
Assume that N1 and N2 are nodes in a CEG and that S1 and S2 are minimum BOR test sets for N1
and N2 respectively.
(1) A minimum BOR test set S for an AND node with N1 and N2 as inputs is constructed as follows:
    S_t = S1_t % S2_t
    S_f = (S1_f * {t2}) $ ({t1} * S2_f)
    where t1 is in S1_t, t2 is in S2_t, and (t1,t2) is in S_t.
(2) A minimum BOR test set S' for an OR node with N1 and N2 as inputs is constructed as follows:
    S'_f = S1_f % S2_f
    S'_t = (S1_t * {f2}) $ ({f1} * S2_t)
    where f1 is in S1_f, f2 is in S2_f, and (f1,f2) is in S'_f.
(3) A minimum BOR test set S" for a NOT node with N1 as the input is constructed as follows:
    S"_t = S1_f
    S"_f = S1_t
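Rules (1) and (2), together with the base sets for cause nodes, can be sketched as follows (our encoding; "%" is implemented with one arbitrary but valid onto pairing, which also guarantees the side condition that (t1,t2) lies in S_t). The usage at the bottom reproduces the N5, N6, and N7 sets derived in the worked example below:

```python
# A sketch of algorithm BOR_GEN's composition rules (encoding is ours).
# Tests are tuples of "t"/"f"; a node's set is the pair (S_t, S_f).

def onto(A, B):
    """Minimal pairing: every element of A and of B appears at least once."""
    A, B = sorted(A), sorted(B)
    big, small = (A, B) if len(A) >= len(B) else (B, A)
    pairs = []
    for i, x in enumerate(big):
        y = small[min(i, len(small) - 1)]
        pairs.append((x, y) if big is A else (y, x))
    return {u + v for u, v in pairs}

def and_node(S1, S2):                       # rule (1)
    (S1t, S1f), (S2t, S2f) = S1, S2
    t1, t2 = sorted(S1t)[0], sorted(S2t)[0]   # (t1,t2) is in S_t by construction
    St = onto(S1t, S2t)
    Sf = {f + t2 for f in S1f} | {t1 + f for f in S2f}
    return St, Sf

def or_node(S1, S2):                        # rule (2)
    (S1t, S1f), (S2t, S2f) = S1, S2
    f1, f2 = sorted(S1f)[0], sorted(S2f)[0]   # (f1,f2) is in S'_f by construction
    Sf = onto(S1f, S2f)
    St = {t + f2 for t in S1t} | {f1 + t for t in S2t}
    return St, Sf

cause = ({("t",)}, {("f",)})                # minimum BOR test set for a cause node
N5 = or_node(cause, cause)                  # N1 OR N2
N6 = and_node(cause, cause)                 # N3 AND N4
N7 = and_node(N5, N6)                       # (N1|N2) & (N3&N4)

assert N5 == ({("t", "f"), ("f", "t")}, {("f", "f")})
assert N6 == ({("t", "t")}, {("t", "f"), ("f", "t")})
assert len(N7[0]) == 2 and len(N7[1]) == 3  # five tests in all, as in the text
print(sorted(N7[0]), sorted(N7[1]))
```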
Figure 2 shows a CEG with nodes N1 through N4 denoting causes, nodes N5 and N6
intermediate nodes, and nodes N7 and N8 effects.
[Fig. 2. A Cause-Effect Graph: cause nodes N1-N4; intermediate nodes N5 = (N1 OR N2) and N6 = (N3 AND N4); effect nodes N7 = (N5 AND N6) and N8 (an OR node).]
Now we show how to apply the above algorithm to node N7 and its associated CEG in Figure 2,
which denotes the compound predicate ((N1|N2) & (N3&N4)). For 1<=i<=7, let Si be a
minimum BOR test set for node Ni.
For 1<=i<=4, Si_t = {(t)} and Si_f = {(f)}.
By applying rule (2) to node N5, we have
S5_t = {(t,f), (f,t)} and S5_f = {(f,f)}.
By applying rule (1) to node N6, we have
S6_t = {(t,t)} and S6_f = {(t,f), (f,t)}.
By applying rule (1) to node N7, we have
S7_t = {(t,f,t,t), (f,t,t,t)} and
S7_f = {(f,f,t,t), (t,f,t,f), (t,f,f,t)} or
{(f,f,t,t), (f,t,t,f), (f,t,f,t)}.
Algorithm BOR_GEN produces two minimum BOR test sets for node N7 in Figure 2, which are
referred to as N7_1 and N7_2. Each of N7_1 and N7_2 contains five tests. Applying Elmendorf's algorithm (algorithm CEG_Myers, described in section 2) to node N7 in Fig. 2 produces a test set which contains seven tests and is referred to as N7_3. Below are N7_1, N7_2, and N7_3, each in one column. N7_2 is a subset of N7_3.
N7_1         N7_2         N7_3
---------    ---------    ---------
(t,f,t,t)    (t,f,t,t)    (t,f,t,t)
(f,t,t,t)    (f,t,t,t)    (f,t,t,t)
(f,f,t,t)    (f,f,t,t)    (f,f,t,t)
(f,t,t,f)    (t,f,t,f)    (t,f,t,f)
(f,t,f,t)    (t,f,f,t)    (t,f,f,t)
                          (t,f,f,f)
                          (f,f,f,f)
In a CEG, an AND or OR node may have three or more inputs. Rules (1) and (2) above can be
easily extended for such nodes. Assume that an AND node N has nodes N1, N2, and N3 as
inputs, that is, N is (N1 AND N2 AND N3), which is equivalent to ((N1 AND N2) AND N3). By
applying rule (1) twice to node N, we have {(t,t,t), (f,t,t), (t,f,t), (t,t,f)} as its only minimum BOR
test set. (Elmendorf's Algorithm requires these four tests as well as (f,f,t), (t,f,f), (f,t,f), (f,f,f) for
the "t" and "f" outputs of node N.) Similarly, the only minimum BOR test set for (N1 OR N2 OR
N3) is {(f,f,f), (t,f,f), (f,t,f), (f,f,t)}. (Elmendorf's algorithm requires the same four tests for the "t" and "f" outputs of node N.)
The experiment discussed in the next section shows that algorithm BOR_GEN is almost as
effective as Elmendorf's algorithm for detection of a class of predicate faults, and that the average
number of tests produced by algorithm BOR_GEN is about one-half of that by Elmendorf's
algorithm.
Definition. A cause-effect graph G is said to be a singular cause-effect graph (or singular
CEG, or SBE) if for every pair of cause node N and effect node N' in G, there exists at most one
path from N to N'.
Consider the application of algorithm BOR_GEN to the non-singular graph shown in Figure 3.
[Fig. 3. A Non-Singular Cause-Effect Graph: N5 = (N1 OR N3), N6 = (N3 AND N4), N7 = (N5 AND N6); cause N3 is an input of both N5 and N6.]
As shown earlier for node N7 in Fig. 2, we have
S5_t = {(t,f), (f,t)} and S5_f = {(f,f)}, and
S6_t = {(t,t)} and S6_f = {(t,f), (f,t)}.
By applying rule (1) to node N7, we have two tests for S7_t, one is the merge of (f,t) for N5 and
(t,t) for N6 and the other is the merge of (t,f) for N5 and (t,t) for N6. Since N3 is both the second
input of N5 and the first input of N6, the first merge results in (f,t,t) for N7. The second merge,
however, fails since N3 has conflicting values in the two tests to be merged. Because of the
conflict, no test results from the second merge. Hence, S7_t contains only (f,t,t). Note that
algorithm BOR_GEN needs to be modified slightly in order to handle conflicts when tests are
merged. Whenever a conflict occurs during the merge of two or more tests, no test results from
the merge. Thus, we have the following theorem.
Theorem 1: Algorithm BOR_GEN produces a minimum BOR test set for a singular cause-effect
graph.
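The conflict rule can be sketched by representing a test as a partial assignment of causes (our encoding; node names follow Fig. 3):

```python
# Sketch of the conflict rule: in a non-singular CEG a cause can reach a
# node along two paths, so tests are merged per cause name; conflicting
# values ("t" vs "f") cancel the merge.

def merge(test1, test2):
    """Merge two partial tests (dicts cause->'t'/'f'); None on conflict."""
    merged = dict(test1)
    for cause, value in test2.items():
        if merged.get(cause, value) != value:
            return None                     # conflicting requirement on a cause
        merged[cause] = value
    return merged

# Fig. 3: N5 = (N1 OR N3), N6 = (N3 AND N4); S6_t needs N3='t', N4='t'.
n6_true = {"N3": "t", "N4": "t"}
first  = merge({"N1": "f", "N3": "t"}, n6_true)   # (f,t) for N5
second = merge({"N1": "t", "N3": "f"}, n6_true)   # (t,f) for N5

assert first == {"N1": "f", "N3": "t", "N4": "t"}  # yields (f,t,t)
assert second is None                              # conflict on N3: no test
print("S7_t contains only (f,t,t)")
```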
3.2. Measurement of BOR Coverage of a Cause-Effect Graph by a Test Set
Assume that a test set T for a CEG G already exists. We want to determine whether T is a BOR test
set for G, and if not, we want to construct a minimum set of additional tests in order to satisfy the
BOR testing strategy.
One intuitive solution is to apply algorithm BOR_GEN to generate a minimum BOR test set, say
T*, for G and then determine the BOR coverage of G by T as the coverage of T* by T. This
solution, however, is not desirable. If G has two or more minimum BOR test sets, the choice of
T* may significantly affect the coverage of T* by T. Furthermore, it is possible that the use of a
non-minimum BOR test set for G as T* gives T a higher BOR coverage. To illustrate this, consider the
three BOR test sets N7_1, N7_2, and N7_3 for node N7 in Figure 2. Both N7_1 and N7_2 have
five tests and N7_3 has seven tests. For each of N7_1, N7_2, and N7_3 as T*, below we show
the coverage of T* by each of the other two test sets.
T*      T               coverage
----    ------------    --------
N7_1    N7_2 or N7_3      3/5
N7_2    N7_1              3/5
N7_2    N7_3              5/5
N7_3    N7_1              3/7
N7_3    N7_2              5/7
The first author of this paper has developed an algorithm†, which determines whether a test set T
for a CEG G is a BOR test set, and if not, produces a minimum set T' of additional tests needed
for satisfying the BOR testing strategy. Let m and n be the sizes of T and T', respectively. The
BOR coverage of G by T is defined as m/(m+n). This algorithm also detects redundant tests in
T, which can be eliminated without any impact on the BOR coverage of G by T.
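The coverage figures in the table above are consistent with counting shared tests, |T ∩ T*| / |T*|; a quick check with our encoding of the three test sets:

```python
# Reproduce the coverage table for N7_1, N7_2, N7_3 by counting shared
# tests: coverage of T* by T is |T intersect T*| / |T*|.
N7_1 = {("t","f","t","t"), ("f","t","t","t"), ("f","f","t","t"),
        ("f","t","t","f"), ("f","t","f","t")}
N7_2 = {("t","f","t","t"), ("f","t","t","t"), ("f","f","t","t"),
        ("t","f","t","f"), ("t","f","f","t")}
N7_3 = N7_2 | {("t","f","f","f"), ("f","f","f","f")}

def coverage(T, T_star):
    return len(T & T_star), len(T_star)

assert coverage(N7_2, N7_1) == (3, 5)
assert coverage(N7_3, N7_1) == (3, 5)
assert coverage(N7_1, N7_2) == (3, 5)
assert coverage(N7_3, N7_2) == (5, 5)   # N7_2 is a subset of N7_3
assert coverage(N7_1, N7_3) == (3, 7)
assert coverage(N7_2, N7_3) == (5, 7)
print("table values reproduced")
```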
4. A Comparison between Algorithm BOR_GEN and Elmendorf's Algorithm by Using Boolean Expressions
For a predicate C with n, n>0, AND/OR operators, algorithm BOR_GEN produces a minimum
BOR constraint set, say St, and Elmendorf's algorithm generates a BOR constraint set, say Se. As
mentioned in section 2, |St| <= n+2 and |Se| >= n+2. (Note that St is not necessarily a subset of Se
since C may have several minimum BOR constraint sets.) Intuitively, Se is more effective than St
for detecting faults in C. We want to compare the sizes of Se and St as well as the effectiveness of
Se and St.
We conducted an experiment to compare algorithm BOR_GEN and Elmendorf's algorithm using
SBEs in order to focus on the detection of boolean operator faults, incorrect parentheses, and
interchanges of boolean variables4. A constraint set for a boolean expression is called a test set
since each constraint is actually a test. A set, called SBE_4, of fifty-one non-equivalent SBEs with
four boolean variables was constructed. For each boolean expression B in SBE_4,
(1) algorithm BOR_GEN was applied to generate a test set, say Tb, and
(2) we determined the size of Tb and the fault detection rate of Tb, which is defined as Db/50, where Db is the number of the remaining boolean expressions in SBE_4 that can be distinguished by Tb.
† This algorithm is being reviewed for patent application.
4 Note that a BOR constraint set for a predicate only guarantees the detection of boolean operator faults.
Then the average size of Tb and the average fault detection rate of Tb for boolean expressions in
SBE_4 were computed. Elmendorf's algorithm was applied similarly. Below we present, for each
of algorithm BOR_GEN and Elmendorf's algorithm, the average size of a test set for a boolean
expression in SBE_4 and the average fault detection rate of such a test set.
               size    fault detection
BOR_GEN        4.9         99.73%
Elmendorf's    9.2        100.00%
The above results show that:
(1) Elmendorf's algorithm almost guarantees the detection of boolean operator faults, incorrect parentheses, and interchanges of boolean variables. (A 100% fault detection rate is not guaranteed for every SBE.)
(2) Algorithm BOR_GEN is almost as effective as Elmendorf's algorithm for fault detection.
(3) The average number of tests produced by algorithm BOR_GEN is about one-half of that produced by Elmendorf's algorithm.
(4) A BOR constraint set for a predicate is very sensitive to the existence of boolean operator faults and incorrect parentheses in the predicate.
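To make the fault-detection-rate measurement concrete, the following sketch repeats it in miniature. The four expressions and the exhaustive test set here are hypothetical stand-ins for SBE_4 and the generated test sets, not the data of the experiment:

```python
from itertools import product

# Four mutually non-equivalent boolean expressions over (a, b, c),
# standing in for the fifty-one four-variable SBEs in SBE_4.
EXPRS = {
    "B0": lambda a, b, c: (a and b) or c,
    "B1": lambda a, b, c: (a or b) and c,
    "B2": lambda a, b, c: a and (b or c),
    "B3": lambda a, b, c: (a and b) and c,
}

def detection_rate(name, tests):
    """Fraction of the remaining expressions that some test in `tests`
    distinguishes from expression `name` (the Db/50 measure, scaled
    to this miniature set)."""
    base = EXPRS[name]
    others = [f for n, f in EXPRS.items() if n != name]
    detected = sum(any(base(*t) != f(*t) for t in tests) for f in others)
    return detected / len(others)

# An exhaustive test set distinguishes every pair of non-equivalent
# expressions, so the detection rate for B0 is 1.0.
all_tests = list(product([False, True], repeat=3))
print(detection_rate("B0", all_tests))  # 1.0
```

The experiment above applies the same measure with Tb (the BOR_GEN or Elmendorf test set) in place of the exhaustive set.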
5. A Comparison between BOR, BDD Path, and Branch Testing by Using Boolean Expressions
For a predicate with n, n>0, AND/OR operators, branch testing requires 2 tests, BOR testing requires n+2 or fewer tests, and BDD path testing requires between n+2 and O(2**n) tests. BOR and BDD path testing are stronger than branch testing and incomparable with each other.
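A small, self-contained illustration of why the extra BOR tests matter; the predicate and the injected fault are our own example, not taken from the study:

```python
# For C = a AND b (n = 1 AND/OR operator), one minimum BOR constraint
# set is {(t,t), (t,f), (f,t)} -- n + 2 = 3 tests -- while branch
# testing needs only one true-outcome and one false-outcome test.
correct = lambda a, b: a and b
faulty  = lambda a, b: a or b          # boolean operator fault: AND -> OR

bor_tests    = [(True, True), (True, False), (False, True)]
branch_tests = [(True, True), (False, False)]   # satisfies branch testing

def detects(tests):
    """True if some test makes the faulty predicate differ from the correct one."""
    return any(correct(*t) != faulty(*t) for t in tests)

print(detects(branch_tests))  # False: branch testing misses the fault
print(detects(bor_tests))     # True: (True, False) exposes it
```

Both branch tests evaluate the correct and faulty predicates to the same outcome, so the operator fault survives branch testing but not BOR testing.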
An empirical study was conducted to compare the effectiveness of these three predicate testing
strategies. The following sets of SBEs were constructed:
(a) a set of 48 mutually non-equivalent SBEs with three variables,
(b) a set of 366 mutually non-equivalent SBEs with four variables, and
(c) a set of 2624 mutually non-equivalent SBEs with five variables.
In each of (a), (b), and (c), these SBEs differ from each other in boolean operators and/or
parentheses. The following table shows, for each of (a), (b), and (c), the average rates of fault
detection by using BOR, BDD path, and branch testing, respectively. (For branch testing of a
boolean expression, two tests were chosen to satisfy complete branch testing.)
                    (a)      (b)      (c)
BOR testing         99.3%    99.7%    99.9%
BDD path testing    90.1%    90.4%    91.9%
branch testing      72.2%    72.5%    72.9%
Our results show that:
(1) Branch testing is far less effective than BOR or BDD path testing for the detection of faults in SBEs.
(2) BOR testing performs better than the more complex BDD path testing. Although BDD path testing requires at least as many tests as BOR testing, its use of "don't care" values reduces its effectiveness for fault detection. One advantage of BDD path testing, however, is that the construction of a BDD for a predicate and the selection of a constraint set from this BDD can easily be done by hand.
(3) As in the experiment reported in the previous section, BOR testing almost guarantees the detection of boolean operator faults and incorrect parentheses.
6. Case Study: A Boiler Control and Monitoring System
In this section, we describe an application of BOR testing to control software for a simplified real-time boiler control and monitoring system. The specifications were developed as part of the generic problem exercise conducted for the 1993 International Workshop on the Design and Review of Software Controlled Safety-Related Systems [IRR93]. A version of the boiler control and monitoring system was developed at North Carolina State University [Vou93a]. This software was used in our empirical study. A brief description of the boiler system is given below. The motivation, objectives, and procedures for our empirical study are described in section 6.1, and the details are provided in the sections that follow.
Figure 4 shows the context diagram for the boiler control and monitoring system. This simplified
boiler system consists of a natural-gas fired water-tube boiler producing saturated steam. The
steam flow may vary rapidly and irregularly between zero and maximum, following a varying
external demand. The water level in the boiler is regulated by the control of the inflow of
feedwater. The water level must be kept between an upper and a lower limit. If the water level is
above the upper limit, water will be carried over into the steam flow and cause damage. If the water
level is below the lower limit, boiler tubes will dry out and may overheat and burst. If the control
of water level is lost, the boiler is shut down.
Figure 4. Context Diagram for Boiler Control and Monitoring System
The water level and the steam flow are measured by an instrumentation system, which reports
sensor values. The readings from sensors are transmitted over an intrinsically unreliable
communication link to the control program. This control program is expected to perform the
following tasks:
(a) to regulate the water level by controlling the inflow of feedwater, turning pumps on or off at the required instants,
(b) to diagnose and isolate all potential errors and to issue a correction/repair request if any are discovered,
(c) to display at all times "best estimates" of various readings for the boiler operator, and
(d) to accept any appropriate operator commands.
6.1 Motivation, Objectives, and Procedures
During the development of the boiler system at NCSU, the original, informal specification of the system [IRR93] was re-written in terms of a number of extended finite-state machines5 (EFSMs), and test suites for the unit, integration, and system testing of the boiler system were constructed according to the boiler's EFSM specification [Par93]. These specification-based test suites were designed to ensure thorough testing of the boiler system. In addition to covering every state and branch of the individual EFSMs, great effort was made to construct additional test cases to cover special event situations. However, no well-defined strategy was used for testing combinations of EFSM predicates, that is, combinations of states.
The objectives of the study were:
i) to evaluate the previously developed test suites for the boiler system against the BOR testing criterion and then, if additional test cases were needed, to determine whether such test cases provide any benefits, and
ii) to evaluate the application of the BOR strategy to specification CEGs in terms of fault-detection effectiveness and in terms of the number of test cases needed to satisfy the strategy, in comparison with the more informal EFSM testing.
We considered both specification- and program-based BOR testing of the boiler system. Since the complete boiler system has about 4,500 lines of C code and no automatic tool was available for measuring the BOR coverage of C code6, we chose to experiment only with the most critical effect in the boiler system, the "boiler shutdown" effect, for which we derived a specification cause-effect graph (CEG) and analyzed its implementation (code) CEG (see section 6.2). From the EFSM test cases developed previously, we selected those related to the shutdown effect. The selected test set, referred to as the shutdown test set, contained 372 test cases. We then measured the BOR coverage that this set offered for the specification-CEG (see section 6.3) and for the implementation-CEG of one module, or C program (see section 6.4). The selected module has about 360 C statements and deals with the shutdown effect.7
6.2 Derivation of a Cause-Effect Graph for the Shutdown Effect
The specification-CEG for the shutdown effect, referred to as the shutdown CEG, is organized
in five levels. The level 1 (the highest level) specification-CEG for boiler shutdown is shown in
Figure 5. The annotations for nodes in level 1 CEG are given below.
5 An extended FSM is an FSM augmented with variables and predicates.
6A semi-automatic tool for analysis of Pascal BOR and BRO coverage is available.
7Note that the selected module is not the only module dealing with the shutdown effect.
Figure 5. Level 1 Specification Cause-Effect Graph for Boiler Control and Monitoring System
E    - boiler shutdown
C221 - externally initiated
C220 - internally initiated
C202 - operator initiated
C203 - instrumentation system initiated
C201 - bad startup
C200 - operational failure
C197 - confirmed keystroke entry
C198 - confirmed "shutnow" message
C196 - multiple pumps failure (more than one)
C195 - water level meter failure during startup
C194 - steam rate meter failure during startup
C193 - communication link failure
C192 - instrumentation system failure
C191 - C180 and C181
C190 - water level out of range
C180 - water level meter failure during operation
C181 - steam rate meter failure during operation
The cause nodes of the level 1 specification-CEG, including C180, C181, C190, and C192
through C198, are effect nodes of level 2 CEGs. Similarly, some of the cause nodes of level 2
CEGs are effect nodes of level 3 CEGs, and so on. CEGs of level 2 through 5 are not shown in
this paper.
6.3 Measurement of BOR Coverage of the Shutdown Cause-Effect Graph
When we attempted to measure the BOR coverage of the shutdown CEG by the shutdown test set,
we encountered a problem. Although algorithm BOR_GEN generates a minimum BOR constraint set for a compound predicate, such a minimum BOR constraint set is not unique. As a result, the selection of a minimum BOR constraint set for a compound predicate may have an impact on the BOR coverage of the predicate by a given test set. To solve this problem, a new algorithm called BOR_COV was developed.
For a given test set T and a predicate C, algorithm BOR_COV
(1) identifies the set T' of redundant tests in T, which can be removed from T without any impact on the BOR coverage of C by T,
(2) produces a minimum set T" of additional constraints that are needed for BOR testing of C, and
(3) computes the BOR coverage of C by T, which is defined as (|T| - |T'|) / (|T| - |T'| + |T"|).
We used the BOR_COV algorithm to measure the BOR coverage of the shutdown CEG by the shutdown test set. Of the 372 tests in the shutdown test set, 59 tests (about 1/6 of the total) were found to be redundant, and 24 more constraints were needed for BOR testing. So the BOR coverage of the shutdown CEG by the shutdown test set is (372-59)/(372-59+24) = 0.928. Most of the redundant tests deal with pump/flow monitor combinations. However, most of the additional tests needed for BOR testing also deal with pump/flow monitor combinations. The reason is that
the tests in the shutdown test set for combinations of pump and flow monitors were selected in an
ad hoc fashion. The additional tests needed for 100% BOR coverage of the shutdown CEG
detected a bug in a module in the boiler's implementation (see section 6.4).
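The coverage figure for the shutdown CEG follows directly from the BOR_COV definition; a quick check, with the values taken from the measurements above:

```python
def bor_coverage(total, redundant, additional):
    """BOR coverage as defined for algorithm BOR_COV:
    (|T| - |T'|) / (|T| - |T'| + |T''|)."""
    effective = total - redundant          # non-redundant tests in T
    return effective / (effective + additional)

# Shutdown CEG figures: 372 tests, 59 redundant, 24 constraints missing.
cov = bor_coverage(total=372, redundant=59, additional=24)
print(round(cov, 3))  # 0.929 (313/337, quoted above as 0.928)
```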
The shutdown test set was constructed earlier by three persons in a total of approximately 100 person-hours. In our study, the shutdown CEG was constructed by one person in about 20 hours, and CEG-based test-case generation can be automated. Also, CEGs can be analyzed for the detection of ambiguities and inconsistencies in a system specification [Mye79, Bei90]. It is obvious that the use of CEGs for software specification and test generation has significant advantages.
6.4 Measurement of BOR Coverage of a Module in the Boiler's Implementation
As mentioned earlier, we chose one module in the boiler control software implementation to sample
the BOR coverage. The selected module deals with the shutdown effect and contains 360
C-language statements and 34 predicates, of which 21 are simple predicates (that is, there are no
AND/OR operators), and the remainder are compound predicates with one AND/OR operator. We
manually transformed this module for the measurement of BOR coverage, and generated 81 BOR
constraints for the predicates of this module. (Two constraints are needed for each simple
predicate, and three constraints for each compound predicate with one AND/OR operator.)
The shutdown test set was used to execute the selected module8. Two of the 81 constraints were
not covered by the shutdown test set. So the BOR coverage of the selected module by the
shutdown test set is 79/81 = 0.975. When the two uncovered constraints were further investigated,
a bug in the selected module was discovered. This bug would have been uncovered had the
selected module been tested with 100% BOR coverage.
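The constraint and coverage figures for the selected module can be reproduced with a few lines of arithmetic (the numbers are taken from the measurements above):

```python
# 34 predicates in the selected module: 21 simple (2 BOR constraints
# each) and the remaining 13 compound with one AND/OR operator
# (3 constraints each).
simple = 21
compound = 34 - simple
constraints = simple * 2 + compound * 3
print(constraints)  # 81

# BOR coverage by the shutdown test set: 2 of the 81 constraints
# were left uncovered.
print(round((constraints - 2) / constraints, 3))  # 0.975
```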
7. Case Study: BOR Testing of an N-Version Program Set
In this section, we report an empirical study in which we applied BOR testing to a set of N-version programs9 written in Pascal. Our reason for using N-version programs is that they provide information about the behavior of a population of functionally equivalent versions written to the same specification, and thus provide insight into the self-consistency and validity of the strategy. A tool called BGG, developed at North Carolina State University to measure the test coverage of statements, branches, and various types of data-flow metrics for Pascal programs [Vou89], was extended to generate BOR and BRO constraint sets for the predicates in a Pascal program, and to measure the coverage of BOR and BRO constraints achieved in a Pascal program by a given test set.
8 Remember that only a portion of the shutdown test set is meant for the selected module.
9 N-version programming is a technique for increasing software reliability in which multiple versions of a software system are developed to a given specification and then executed at the same time so that their results can be compared.
Six functionally equivalent programs were produced as part of another study [Vou86a, Vou86b]. The programs solve a navigational problem, an extended version of the "Earth Satellite Problem" used by Nagel et al. [Nag82]. They were written independently, by graduate-level students, to the same specification. The programming language was Pascal. The sizes of the programs ranged from 400 to 800 Pascal statements. Acceptance testing of the developed software involved both random and special functional test cases. "White-box", or structural, testing was not part of the acceptance testing. All acceptance test data were generated on the basis of the functional problem specification. Special consideration was given to extremal and special values, explicit or implicit, in the specification (boundaries, singularities, etc.), as well as in the known problem-solution algorithms. Random data were generated using a uniform distribution for all input parameters. Failures observed in the components are described in [Vou86b]. For technical reasons, only five of those programs were used in the current experiment.
7.1 A Comparison of BRO Testing and Branch Testing
The purpose of this part of the experiment was to evaluate functional and random testing with respect to the coverage they provide for different predicate metrics. This experiment also established a baseline for the evaluation of CEG-based test strategies. Each of the five programs (using individual drivers) was tested with two different test sets, one with 1,000 randomly selected tests and the other with 103 manually selected functional tests (i.e., special-value functional tests). For each of the five programs, the coverages of p-uses [Fra88, Wey88], BRO constraints, branches, and statements were measured.
The results are summarized in Figures 6 to 9. The programs are marked L1 through L5. The figures show the BRO and branch code coverage achieved by functional and random specification-based test cases. We see the usual metric saturation behavior characteristic of a single-strategy approach, and one other interesting phenomenon: considerable variance in the measured coverages.
Figures 6 and 7. BRO and branch coverage provided by random testing.
Figures 8 and 9. BRO and branch coverage provided by functional (ESV) testing.
The design and implementation transformations of the single requirements specification that all versions used resulted in diverse but functionally equivalent programs and, as would be expected, these programs have different implementation CEGs. From the graphs, we see that this reflects as considerable variability in the coverage achieved by a particular test set over a population of functionally equivalent programs; for example, random testing results in BRO coverage that differs by as much as 20% between the L2 and L4 versions. Similar variance in coverage metrics has been seen in experiments with other software [Vou93b]. This indicates that any type of reliability modeling based on coverage metrics will have to account for this variability. For instance, one implication is that an evaluation of a test strategy based solely on the achievement of partial coverage (e.g., 90% of branches) may have no practical meaning in terms of fault-detection guarantees and reliability, unless it is transformed into a canonical form that is valid across the population.
The following table summarizes coverage achieved for a typical program. The table also shows an
entry for the BOR testing experiment discussed in the next sub-section.
Method      Statement    Branch      p-Use       BRO
            Coverage     Coverage    Coverage    Coverage
Random      0.848        0.634       0.608       0.48
ESV         0.960        0.896       0.872       0.69
CEG-BOR     0.963        0.896       0.881
Our results show that:
(1) The coverages of BRO constraints and branches for a program increase very rapidly over the first ten random tests or the first twenty functional tests. Afterwards, these coverages increase very slowly. In fact, the last 900 of the 1,000 random tests provide an insignificant increase in coverage.
(2) Functional testing provides 20% or more additional coverage of BRO constraints or branches compared with random testing. This result indicates that although random testing is easy to do, it must be supplemented with functional testing.
(3) For the same test set applied to a program, the coverage of BRO constraints is about 15-20% less than that of branches. This result provides some indication of the relative "strength" of BRO versus branch testing.
7.2 An Evaluation of BOR Testing
We used the same set of programs to evaluate the test selection strategy we will call CEG-BOR. We built the CEG using the requirements specifications for the N-version system and then selected test cases using the BOR approach. Satisfaction of the BOR criterion required the generation of 43 test cases. Full CEG coverage would have required 65 test cases. Some of the test cases dealing with precision in the software could not be derived from the CEG of the specification alone, but the requirements were implicitly mentioned and thus were deliberately worked into the CEG. (This is the subjective/educated development part of the CEG and accounts for possible diversity in specification-CEGs.) The coverage provided by the 43 CEG-BOR test cases was practically identical to that provided by the functional test suite of 103 test cases (see the table in sub-section 7.1). Since CEG-BOR required about half as many test cases, the specification-level use of BOR test selection may have considerable advantages in terms of cost.
Figure 10. Branch coverage provided by specification-based CEG-BOR strategy.
Figures 10 and 11 illustrate the observed branch and p-use coverage dynamics, respectively. It is interesting to note how diverse the coverage is across the five versions. But even more interesting is how deceptive metric counting strategies can be. For example, the branch coverage of program L1 is quite low compared to the other versions for all three testing strategies, but its p-use and BRO coverages are among the best. The reason is that program L1 contains many debug conditionals which are set to either TRUE or FALSE at compile time, and thus have not been accounted for in our p-use or BRO counts. It is also interesting to note that the p-use metric shows the least variance, because the total number of p-uses is typically several times larger than that of branches or BROs.
Figure 11. p-Use coverage provided by specification-based CEG-BOR strategy.
Of course, unless the fault-detection power of the CEG-BOR approach at least matches that of the more informal strategies, the CEG-BOR strategy would not be very useful in practice. In our experiments the CEG-BOR test set detected all known faults in all five programs. In fact, similar to the boiler experiment, the CEG-BOR-generated test cases excited a new error in 4 of the 5 programs. This error was not detected by either the random or the functional test suites. In addition, in the case of programs L2 and L3, neither functional testing alone nor random testing alone detected all of the other faults found in these programs; the combination of functional and random testing, however, detected all but one fault [Vou86b]. The experiment-specific fault-detection dynamics are illustrated in Figure 12 for program L2. The figure plots the total number of faults detected by CEG-BOR alone, by functional testing alone, and by random testing alone against the number of test cases run (in the order they were generated and used in the experiments). Similar results were obtained for the other programs.
Figure 12. Fault detection dynamics for program L2.
8. Summary
The BOR strategy can be used at all levels of granularity where a cause-effect graph can be
constructed. This means that in practice it can be used to develop test cases based on software
specifications, as well as on the code structure. Our simulation experiments indicate that the BOR approach can detect predicate faults nearly as well as Elmendorf's algorithm can, and better than branch testing. In addition, BOR-generated test suites are about half the size of those generated using Elmendorf's algorithm.
The preliminary experiments with actual software indicate that BOR-based test sets, when applied to software specifications, may have fault-detection properties at least comparable to those of thorough, but less formal, functional and random testing. For example, the shutdown test set, based on a limited EFSM testing strategy, provided 92.8% BOR coverage of the shutdown CEG and 97.5% BOR coverage of the selected module in the boiler's implementation, yet it failed to detect one program fault. Of the 372 tests in the EFSM shutdown test set, 59 tests were found to be redundant, but 24 more constraints were needed for BOR testing. So full BOR coverage of the shutdown CEG required 337 test cases, as opposed to the 372 that were generated using the EFSM approach. In the case of the N-version software, CEG-BOR detected more faults than the combined functional and random acceptance tests did. Fulfillment of the CEG-BOR criteria required only 43 test cases. In contrast, the functional test suite had 103 cases.
Future plans include statistical experiments to confirm the fault exposure properties of CEG-BOR
strategy, and experiments with real large-scale database and compiler software.
References
[Adr82] W.R. Adrion, M.A. Branstad, and J.C. Cherniavsky, "Validation, verification, and testing of computer software", ACM Computing Surveys, Vol. 14(2), 159-192, 1982.
[Ake78] S.B. Akers, "Binary decision diagrams", IEEE Trans. Computers, Vol. C-27, No. 6, June 1978, 509-516.
[And74] R. Anderson, Proving Programs Correct, J. Wiley, 1974.
[Bei90] B. Beizer, Software Testing Techniques, 2nd edition, Van Nostrand, 1990.
[Bry86] R.E. Bryant, "Graph-based algorithms for boolean function manipulation", IEEE Trans. Computers, Vol. C-35, No. 8, August 1986, 677-691.
[Cla85] L.A. Clarke, A. Podgurski, D. Richardson, and S. Zeil, "A comparison of data flow path selection criteria", Proc. 8th ICSE, 244-251, 1985.
[Dur84] J.W. Duran and S.C. Ntafos, "An evaluation of random testing", IEEE Trans. Software Eng., Vol. SE-10, 438-444, 1984.
[Elm73] W.R. Elmendorf, "Cause-effect graphs in functional testing", TR-00.2487, IBM Systems Development Division, Poughkeepsie, NY, 1973.
[For93] L.M. Foreman and S.H. Zweben, "A study of the effectiveness of control and data flow testing strategies", J. Systems Software, No. 21, 215-228, 1993.
[Fra88] P.G. Frankl and E.J. Weyuker, "An applicable family of data flow testing criteria", IEEE Trans. Software Eng., Vol. 14(10), 1483-1498, 1988.
[Fra93a] P.G. Frankl and E.J. Weyuker, "A formal analysis of the fault-detecting ability of testing methods", IEEE Trans. Software Eng., Vol. 19, No. 3, March 1993, 202-213.
[Fra93b] P.G. Frankl and S.N. Weiss, "An experimental comparison of the effectiveness of branch testing and data flow testing", IEEE Trans. Software Eng., Vol. 19, No. 8, August 1993, 774-787.
[Ham90] D. Hamlet and R. Taylor, "Partition testing does not inspire confidence", IEEE Trans. Software Eng., Vol. 16, No. 12, December 1990, 1402-1411.
[Ham92] D. Hamlet, "Are we testing for true reliability?", IEEE Software, July 1992, 21-27.
[How82] W.E. Howden, "Weak mutation testing and completeness of test cases", IEEE Trans. Software Eng., Vol. SE-8, No. 4, July 1982, 371-379.
[How87] W.E. Howden, Functional Program Testing and Analysis, McGraw-Hill, 1987.
[IRR93] Institute for Risk Research, "Generic problem competition", International Symposium on the Design and Review of Software Controlled Safety-Related Systems, Waterloo, Canada, May 1993.
[Kor88] B. Koren and J. Laski, "STAD - A system for testing and debugging: user perspective", Proc. Second Workshop on Software Testing, Verification, and Analysis, Banff, Canada, Computer Society Press, 13-20, 1988.
[Mye79] G.J. Myers, The Art of Software Testing, Wiley, 1979.
[Nag82] P.M. Nagel and J.A. Skrivan, "Software reliability: Repetitive run experimentation and modelling", BSC-40336, Boeing, Seattle, WA, 1982.
[Nta84] S.C. Ntafos, "On required element testing", IEEE Trans. Software Eng., Vol. SE-10, 793-803, 1984.
[Nta88] S.C. Ntafos, "A comparison of some structural testing strategies", IEEE Trans. Software Eng., Vol. SE-14(6), 868-874, 1988.
[Par93] A. Paradkar, I. Shields, and J. Waters, "The NCSU Solution to the Generic Problem Exercise: Boiler Control and Monitoring System", NCSU, May 1993.
[Tai87] K.C. Tai and H.K. Su, "Test generation for boolean expressions", Proc. COMPSAC (Computer Software and Applications) '87, 278-283, 1987.
[Tai93] K.C. Tai, "Predicate-based test generation for computer programs", Proc. Inter. Conf. on Software Engineering, May 1993, 267-276.
[Voa91] J.M. Voas, "Preliminary observations on program testability", Proc. Pacific Northwest Quality Conf. (PNQC), Portland, 1991, 235-247.
[Voa93] J.M. Voas and K.W. Miller, "Improving the software development process using testability research", Proc. Third International Symposium on Software Reliability Engineering, 1993, 114-121.
[Vou86a] M.A. Vouk, D.F. McAllister, and K.C. Tai, "An experimental evaluation of the effectiveness of random testing of fault-tolerant software", Proc. Workshop on Software Testing, Banff, Canada, IEEE CS Press, 74-81, July 1986.
[Vou86b] M.A. Vouk, M.L. Helsabeck, D.F. McAllister, and K.C. Tai, "On testing of functionally equivalent components of fault-tolerant software", Proc. COMPSAC (Computer Software and Applications) '86, October 1986, 414-419.
[Vou89] M.A. Vouk and R.E. Coyle, "BGG: A testing coverage tool", Proc. 7th Northwest Software Quality Conference, 1989, 212-233.
[Vou93a] M.A. Vouk and A. Paradkar, "Design and Review of Software Controlled Safety-Related Systems: The NCSU Experience With the Generic Problem Exercise", Proc. Inter. Invitational Workshop on the Design and Review of Software Controlled Safety-Related Systems, Ottawa, June 1993.
[Vou93b] M.A. Vouk, "Fault and failure analysis", Proc. Third Workshop on Issues in Software Reliability, U.S. West Advanced Technologies, Boulder, Colorado, November 1-2, 1993.
[Wey88] E.J. Weyuker, "An empirical study of the complexity of data flow testing", Proc. Second Workshop on Software Testing, Verification, and Analysis, Banff, Canada, Computer Society Press, 188-195, 1988.
[Whi80] L.J. White and E.I. Cohen, "A domain strategy for computer program testing", IEEE Trans. Software Eng., Vol. SE-6, May 1980, 247-257.
[Zei92] S.J. Zeil, F.H. Afifi, and L.J. White, "Detection of linear errors via domain testing", ACM Trans. Software Eng. and Methodology, Vol. 1(4), 422-451, October 1992.