How to Do Tie-breaking in the Prioritization of Interaction Test Suites?

Rubing Huang*, Jinfu Chen
School of Computer Science and Telecommunication Engineering
Jiangsu University
Zhenjiang, Jiangsu 212013, P.R. China

Rongcun Wang, Deng Chen
School of Computer Science and Technology
Huazhong University of Science and Technology
Wuhan, Hubei 430074, P.R. China

*Corresponding author: [email protected].

Abstract—The prioritization of interaction test suites has received increasing attention in the field of combinatorial interaction testing, especially when testing resources are limited so that only part of the combinatorial test cases can be executed. Many strategies have been proposed to prioritize interaction test suites according to different evaluation measures. However, most of these strategies may face a challenge: more than one "best" candidate may have the largest evaluation measure value. In this case, there is a tie among all "best" candidates. How should this tie be broken? Intuitively, random tie-breaking is a reasonable choice, and it has been applied in many research papers. In this paper, we investigate different tie-breaking techniques, including random tie-breaking, first-element tie-breaking, last-element tie-breaking, higher-strength tie-breaking, and lower-strength tie-breaking, and conduct experiments on a well-known prioritization strategy of interaction test suites, namely interaction coverage based prioritization, in order to present a guideline for choosing tie-breaking techniques in practical testing. The experimental results show that although no tie-breaking technique always performs best, in many cases random tie-breaking and last-element tie-breaking have the best performance, so they would be the best choices for testers in the prioritization of interaction test suites.

Keywords-Combinatorial interaction testing, interaction test suite, test case prioritization, tie-breaking, guideline

I. INTRODUCTION

Combinatorial interaction testing (CIT) [1] aims at generating an interaction test suite [1] in order to identify failures that are caused by parameter interactions. Intuitively, CIT presents a tradeoff between testing effectiveness and testing efficiency, because it focuses only on interaction coverage of fixed strengths (the levels of interaction among parameters) rather than coverage of all strengths.

When an interaction test suite T has been constructed by CIT, traditional CIT would directly run T without considering the execution order of test cases in T. However, due to limited testing resources in practice, only part of the test cases in T may be executed. In such a case, the execution order of test cases in T is critical for the whole testing process, because a well-prioritized execution order may identify failures earlier, and thus enable fault characterization, diagnosis, and revision as early as possible [1]. The process of determining the order of test cases in T is generally called test case prioritization. A prioritized interaction test suite is also called an interaction test sequence. As shown in Figure 1, the testing process inside the dashed box is the traditional CIT process, and the enhanced testing process adds test case prioritization between test suite construction and test suite execution in the traditional CIT [2].

[Figure 1. Traditional CIT with test case prioritization [2]: Generating T, then Prioritizing T, then Executing T, where the traditional CIT process (Generating T, then Executing T) lies inside the dashed box.]

To date, many strategies have been proposed to guide the prioritization of interaction test suites according to different evaluation measures, for example interaction coverage based prioritization [2-8] and incremental interaction coverage based prioritization [9, 10]. However, most prioritization strategies may face a challenge¹: during the prioritization process, there may exist more than one "best" candidate, each having the same largest evaluation measure value. In this case, there is a tie among all "best" candidates. How should this tie be broken? In this paper, we investigate different tie-breaking techniques, such as random tie-breaking [5, 7, 10], first-element tie-breaking, last-element tie-breaking, higher-strength tie-breaking, and lower-strength tie-breaking, and conduct experiments on the ICBP prioritization algorithm to analyze the effectiveness of these tie-breaking techniques, so as to present a guideline for choosing tie-breaking techniques in practical testing. The experimental results indicate that although there is no universally best tie-breaking technique, in many cases random tie-breaking and last-element tie-breaking have the best performance, so they would be the best choices for testers in the prioritization of interaction test suites.

¹Some prioritization strategies do not face this challenge, for example random test case prioritization [6], because it prioritizes an interaction test suite in a random manner. In this paper, therefore, we focus on prioritization strategies that do face this challenge.

The remainder of this paper is organized as follows: Section II briefly introduces background information about CIT and test case prioritization. Section III investigates different tie-breaking techniques that can be used in the prioritization of interaction test suites. Section IV presents an empirical study that evaluates the tie-breaking techniques investigated in Section III. Finally, Section V concludes and discusses future work.
II. BACKGROUND

In this section, we describe some background information, including CIT, interaction test suites, and test case prioritization.

A. CIT and Interaction Test Suite

CIT aims at covering specific combinations of parameter values using an interaction test suite with a small number of test cases, so as to detect failures that are triggered by parameter interactions.

To describe the relevant notions clearly, we first model the test object in CIT. Assume the SUT has k parameters that constitute a parameter set P = {p1, p2, ..., pk}, and each parameter pi takes its valid values (or levels) from a finite set Vi (i = 1, 2, ..., k). In practice, parameters may represent any factors that influence the behavior of the SUT, such as components, configuration options, user inputs, etc. Let C be the set of constraints on combinations of parameter values.

Definition 1 (Test profile). A test profile, denoted TP(k, |V1||V2|...|Vk|, C), is the model of the SUT, including k parameters, |Vi| parameter values for the i-th parameter, and value combination constraints C.

Unless stated otherwise, the definitions in this paper are based on a TP(k, |V1||V2|...|Vk|, C).

Definition 2 (Covering array). A covering array (CA), denoted CA(N; τ, k, |V1||V2|...|Vk|), is an N × k matrix that satisfies the following properties: (1) each column i (1 ≤ i ≤ k) contains elements only from the set Vi; and (2) the rows of each N × τ sub-matrix cover all possible τ-tuples (referred to as τ-wise value combinations or τ-wise value schemas [1]) from the τ columns at least once.

Here, τ is called the strength. Since the strength of a covering array is fixed, a covering array of strength τ is usually called a τ-wise covering array.

In a covering array, each row represents a test case, while each column represents a parameter. Generally speaking, in the field of combinatorial interaction testing, a covering array represents an interaction test suite. In this paper, therefore, we treat a covering array as equivalent to an interaction test suite.

B. Test Case Prioritization

Test case prioritization seeks to schedule test cases so that those with the highest priority, according to some criterion, are executed earlier in testing than lower-priority test cases. When testing resources are limited or insufficient to execute all test cases in a test suite, a well-designed execution order of test cases is especially significant. A prioritized test suite is generally called a test sequence. The problem of test case prioritization is defined as follows [11].

Definition 5 (Test case prioritization). Given a tuple (T, Ω, f), where T is a test suite, Ω is the set of all possible permutations of T, and f is a function from Ω to the real numbers, the test case prioritization problem is to find a test sequence S ∈ Ω such that:

(∀S′)(S′ ∈ Ω)(S′ ≠ S)[f(S) ≥ f(S′)].   (1)

According to Rothermel's investigation [11], there are many possible goals of prioritization (that is, different functions f). For example, a well-known function, the average percentage of faults detected (APFD), is related to fault detection and is widely used in regression testing. The APFD function requires the fault-detection capability of each executed test case.
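Definition 2 can be made concrete with a small script that checks whether a given array is a τ-wise covering array. This is an illustrative sketch of ours, not part of the paper's experiments; the example array below is a minimal 2-wise covering array for three binary parameters.

```python
from itertools import combinations, product

def is_covering_array(rows, values, tau):
    """Check Definition 2: every N x tau sub-matrix of `rows` covers all
    possible tau-wise value combinations of the chosen tau columns."""
    k = len(values)
    for cols in combinations(range(k), tau):
        covered = {tuple(row[c] for c in cols) for row in rows}
        required = set(product(*(values[c] for c in cols)))
        if not required <= covered:
            return False
    return True

# Three binary parameters (|V1| = |V2| = |V3| = 2); these four rows cover
# every 2-wise value combination, so they form a CA(4; 2, 3, 2^3).
values = [(0, 1), (0, 1), (0, 1)]
rows = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(is_covering_array(rows, values, 2))       # True
print(is_covering_array(rows[:3], values, 2))   # False: a (1, 0) pair is lost
```

Note that the four rows here match the lower bound for 2-wise coverage of three binary parameters; dropping any row breaks coverage.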
III. TIE-BREAKING TECHNIQUES IN PRIORITIZATION OF INTERACTION TEST SUITES

In this section, we investigate different tie-breaking techniques used in the prioritization of interaction test suites. To describe the techniques clearly, we apply each of them to a well-known prioritization strategy of interaction test suites, namely interaction coverage based prioritization (ICBP for short). Figure 2 presents the detailed ICBP algorithm, in which the "best" element is chosen from the candidates in T as the next test case in S such that it covers the largest number of τ-wise value combinations not yet covered by S.

Input: A τ-wise CA, denoted T.
Output: An interaction test sequence S of T.
1: Initialize S;
2: while (T is not empty)
3:     Select the "best" element e from T;
4:     Remove e from T;
5:     Insert e into S;
6: end while
7: return S.

Figure 2. The detailed algorithm description of ICBP.

Consider an interaction test suite T (whose original test sequence T′ is obtained according to its generation order), the prioritization strength τ for the ICBP algorithm (that is, ICBP uses τ-wise interaction coverage, or the uncovered τ-wise value combinations distance (UVCDτ for short) [12], to guide the prioritization), and an interaction test sequence of T prioritized by ICBP. We describe the different tie-breaking techniques as follows.

A. Random Tie-breaking

Random tie-breaking is an intuitive technique: it randomly breaks the tie among all "best" candidates. For example, if two combinatorial test cases tc1 and tc2 in T have the same UVCDτ value, ICBP will randomly choose one of tc1 and tc2 as the next test case in S. In other words, tc1 and tc2 have the same probability of being selected.

B. First-element Tie-breaking

With first-element tie-breaking, among all "best" candidates, ICBP selects as the next test case the candidate that occurs first in the original test sequence T′.

C. Last-element Tie-breaking

Similar to first-element tie-breaking, last-element tie-breaking chooses as the next test case the element that occurs last in the original test sequence T′ when facing the challenge of more than one "best" candidate.

As noted above, ICBP uses a fixed strength τ to guide the prioritization of interaction test suites. When two candidates tc1 and tc2 both have the largest UVCDτ value, the strength τ cannot distinguish tc1 from tc2. In this case, it is reasonable to consider other strengths to assist ICBP in choosing the next test case. Therefore, we consider the following two cases: (1) a tie-breaking technique using a strength higher than τ, namely higher-strength tie-breaking; and (2) a tie-breaking technique using a strength lower than τ, namely lower-strength tie-breaking. We describe them in turn.

D. Higher-strength Tie-breaking

When the strength τ cannot assist ICBP in selecting the next test case from the "best" candidates, higher-strength tie-breaking uses strength (τ + 1) to further evaluate each "best" candidate, and then chooses one as the next test case in S. Note that if strength (τ + 1) is also unable to break the tie (that is, after using higher-strength interaction coverage there still exists more than one "best" candidate), ICBP randomly chooses an element from the remaining "best" candidates as the next test case in S.

E. Lower-strength Tie-breaking

In contrast to higher-strength tie-breaking, lower-strength tie-breaking uses strength (τ − 1) to further guide the prioritization when strength τ cannot break the tie. Similar to higher-strength tie-breaking, if strength (τ − 1) is also unable to break the tie, ICBP randomly chooses one of the remaining "best" candidates as the next test case.

Among all the tie-breaking techniques, first-element tie-breaking and last-element tie-breaking are deterministic, which means that given an original interaction test sequence, each of them obtains a unique interaction test sequence. The other three tie-breaking techniques are non-deterministic, because they may randomly choose an element from the "best" candidates as the next test case.

IV. EMPIRICAL CASE STUDY

In this section, an empirical case study is presented to analyze the effectiveness of the different tie-breaking techniques in the prioritization of interaction test suites, in terms of fault detection. We designed the empirical study to answer the following research questions:

RQ1: Among all tie-breaking techniques, which technique is best for the prioritization of interaction test suites?

RQ2: How should the tie-breaking technique be chosen in practical testing?

A. Setup

We use a medium-sized real-life program, a lexical analyzer system (flex), obtained from the Software-artifact Infrastructure Repository (SIR) [13]. This program has five versions, contains 9,581 to 11,470 uncommented lines of C code, and is augmented with a seeded fault library. In this paper, we used 34 seeded faults. According to Petke et al.'s investigation [8], the test profile of flex is TP(9, 2^6 3^2 5^1, C), where C ≠ ∅.

The original covering arrays were generated by two widely used tools: the Advanced Combinatorial Testing System (ACTS) [14] and Pairwise Independent Combinatorial Testing (PICT) [15]. Both are based on greedy algorithms, implemented by the In-Parameter-Order (IPO) method and the one-test-at-a-time approach, respectively. We focused on covering arrays of strengths τ = 2, 3, 4, 5. Since some tie-breaking techniques, such as random tie-breaking, involve randomization, we ran the experiment 100 times for each interaction test suite and report the average.

In practical testing, testing resources may be so limited that only part of an interaction test sequence can be executed. In this study, therefore, we consider different budgets by considering different percentages (p) of each interaction test sequence: p = 5%, 10%, 25%, 50%, 75%, and 100%.

B. Metrics

Generally speaking, APFD has been used to evaluate different prioritization techniques. However, it has two requirements [2, 6, 9, 10] that prevent it from being used as the evaluation metric here. Therefore, in this study we used an alternative to APFD, namely the Normalized APFD (NAPFD) [6], to evaluate the rate of fault detection for each tie-breaking technique.

C. Results and Analysis

Table I presents the NAPFD metric values for ICBP with different tie-breaking techniques when executing a given percentage of each interaction test sequence. The largest value in each sub-column of the table is shown in bold. From this table, we can make the following observations.

1) Among all tie-breaking techniques, which technique is best for the prioritization of interaction test suites? According to the data in Table I, there is clearly no single best tie-breaking technique for the prioritization of interaction test suites. In other words, each tie-breaking technique performs best in some cases. For example, random tie-breaking obtains the best NAPFD values for ACTS covering arrays at strengths τ = 2 and τ = 4 when p is high; first-element tie-breaking performs best for the PICT covering array at strength τ = 5 regardless of the value of p; last-element tie-breaking has the best performance when prioritizing the PICT covering array at strength τ = 3; higher-strength tie-breaking has the best rates of fault detection for ACTS covering arrays at strength τ = 3 when p is low; and lower-strength tie-breaking achieves the best NAPFD values for the PICT covering array at τ = 2 when p is high.

On the whole, however, first-element tie-breaking performs worst in many cases, and is therefore the last choice. Higher-strength tie-breaking could be a better choice when executing a small number of combinatorial test cases in the interaction test sequence. Additionally, random tie-breaking and last-element tie-breaking could be best in many cases.

2) How should the tie-breaking technique be chosen in practical testing? By definition, the higher-strength and lower-strength tie-breaking techniques are more time-consuming than the other tie-breaking techniques, because they need to count coverage information at a higher (or lower) strength. As discussed above, random tie-breaking and last-element tie-breaking would be better choices in terms of fault detection rates. As a consequence, in practical testing, when testing resources are sufficient, testers could choose the higher-strength tie-breaking technique for the prioritization of interaction test suites, despite its greater prioritization time; when testing resources are limited, random tie-breaking and last-element tie-breaking would be better alternatives, because they require less prioritization time. On the other hand, random tie-breaking and higher-strength tie-breaking are non-deterministic, while last-element tie-breaking is deterministic. Therefore, when testers need to compare their methods with deterministic algorithms, last-element tie-breaking is a better choice; otherwise, random tie-breaking or higher-strength tie-breaking would be better.
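The ICBP loop of Figure 2 together with the five tie-breaking techniques can be sketched as follows. This is our own minimal illustration, not the authors' implementation: the function names, the string labels for the techniques, and the recomputation of covered combinations on each pass are simplifying assumptions.

```python
import random
from itertools import combinations

def uvcd(tc, selected, tau):
    """UVCD_tau: number of tau-wise value combinations of `tc` that are not
    yet covered by the already-prioritized test cases in `selected`."""
    covered = {(cols, tuple(s[c] for c in cols))
               for s in selected for cols in combinations(range(len(s)), tau)}
    return sum(1 for cols in combinations(range(len(tc)), tau)
               if (cols, tuple(tc[c] for c in cols)) not in covered)

def icbp(suite, tau, tie_break="random", rng=None):
    """ICBP (Figure 2) with a pluggable tie-breaking technique."""
    rng = rng or random.Random(0)
    remaining = list(enumerate(suite))   # keep each test's position in T'
    sequence = []
    while remaining:
        scores = [uvcd(tc, sequence, tau) for _, tc in remaining]
        best = max(scores)
        ties = [i for i, s in enumerate(scores) if s == best]
        if len(ties) > 1 and tie_break in ("higher-strength", "lower-strength"):
            # re-score only the tied candidates at strength tau+1 (or tau-1)
            t2 = tau + 1 if tie_break == "higher-strength" else tau - 1
            sub = [uvcd(remaining[i][1], sequence, t2) for i in ties]
            ties = [i for i, s in zip(ties, sub) if s == max(sub)]
        if tie_break == "first-element":
            pick = min(ties, key=lambda i: remaining[i][0])
        elif tie_break == "last-element":
            pick = max(ties, key=lambda i: remaining[i][0])
        else:  # random tie-breaking, or a residual tie after (tau +/- 1)
            pick = rng.choice(ties)
        sequence.append(remaining.pop(pick)[1])
    return sequence

suite = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
# All four tests initially cover 3 uncovered pairs, so the first selection
# is decided purely by the tie-breaking technique:
print(icbp(suite, 2, tie_break="last-element")[0])    # (1, 1, 0)
print(icbp(suite, 2, tie_break="first-element")[0])   # (0, 0, 0)
```

Recomputing the covered set from `sequence` on every call is quadratic but keeps the sketch short; an efficient implementation would maintain the covered-combination set incrementally, and for higher-strength tie-breaking would track (τ + 1)-wise coverage alongside the τ-wise coverage.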
D. Threats to Validity

Despite our best efforts, our experiments may face some threats to validity.

The first threat is the selection of experimental data: in this paper only one medium-sized subject program has been used to investigate the effectiveness of the different tie-breaking techniques. Additionally, the two tools used to construct interaction test suites are widely used, but both of them are greedy. Finally, we applied the different tie-breaking techniques to only the ICBP algorithm. To address this threat, additional
Table I
THE NAPFD METRIC (%) FOR ICBP WITH DIFFERENT TIE-BREAKING TECHNIQUES FOR SUBJECT PROGRAM FLEX WHEN EXECUTING THE GIVEN PERCENTAGE OF THE INTERACTION TEST SEQUENCE.

p of ACTS interaction test sequence executed:

Strength  Method                         5%     10%    25%    50%    75%    100%
τ = 2     Random tie-breaking            15.12  31.10  62.96  76.10  80.29  82.35
          First-element tie-breaking      5.88  21.32  59.56  74.66  79.41  81.70
          Last-element tie-breaking      13.24  25.74  55.15  72.86  78.24  80.83
          Higher-strength tie-breaking   15.82  31.29  62.15  75.13  79.37  82.26
          Lower-strength tie-breaking    15.38  31.18  62.83  75.87  80.14  82.24
τ = 3     Random tie-breaking            43.57  62.09  77.66  83.66  85.97  87.88
          First-element tie-breaking     35.29  57.84  76.84  82.71  84.87  87.25
          Last-element tie-breaking      33.82  57.60  76.47  82.75  85.50  87.70
          Higher-strength tie-breaking   43.60  62.10  77.66  83.66  85.95  87.95
          Lower-strength tie-breaking    41.84  61.06  77.39  83.67  86.10  88.02
τ = 4     Random tie-breaking            59.98  74.36  83.96  88.33  90.54  92.72
          First-element tie-breaking     54.66  70.81  82.22  88.01  90.54  92.94
          Last-element tie-breaking      59.56  72.40  81.80  86.49  88.04  90.84
          Higher-strength tie-breaking   60.41  74.69  83.95  88.22  90.53  92.79
          Lower-strength tie-breaking    59.96  74.35  83.67  88.07  90.20  92.48
τ = 5     Random tie-breaking            71.39  80.69  87.50  91.03  93.02  94.67
          First-element tie-breaking     65.37  76.73  86.44  90.31  91.95  93.98
          Last-element tie-breaking      70.05  81.46  89.18  91.67  93.26  94.96
          Higher-strength tie-breaking   70.56  80.14  87.28  90.71  92.53  94.32
          Lower-strength tie-breaking    70.51  80.18  87.40  90.98  92.91  94.63

p of PICT interaction test sequence executed:

Strength  Method                         5%     10%    25%    50%    75%    100%
τ = 2     Random tie-breaking            14.94  29.69  61.31  75.12  79.62  82.66
          First-element tie-breaking      8.83  21.32  57.60  73.98  78.48  81.62
          Last-element tie-breaking       7.35  19.85  56.62  74.32  79.64  82.75
          Higher-strength tie-breaking   14.24  28.96  61.12  75.13  79.55  82.49
          Lower-strength tie-breaking    15.29  30.27  61.68  75.12  79.54  82.57
τ = 3     Random tie-breaking            41.41  60.42  76.65  82.44  84.65  86.18
          First-element tie-breaking     25.00  47.55  70.96  79.60  82.48  84.32
          Last-element tie-breaking      49.51  65.20  79.60  83.92  86.00  87.35
          Higher-strength tie-breaking   41.74  60.95  77.06  82.64  84.62  85.95
          Lower-strength tie-breaking    41.71  60.71  76.96  82.65  84.78  86.28
τ = 4     Random tie-breaking            61.04  73.40  82.97  86.72  88.86  90.10
          First-element tie-breaking     61.03  73.16  82.77  86.97  89.31  90.54
          Last-element tie-breaking      62.01  73.04  84.33  87.82  89.92  90.99
          Higher-strength tie-breaking   59.47  72.73  82.86  86.57  88.64  89.89
          Lower-strength tie-breaking    60.63  73.36  83.05  86.80  88.82  90.00
τ = 5     Random tie-breaking            69.31  79.40  86.58  90.49  92.95  94.71
          First-element tie-breaking     70.00  79.55  87.42  91.41  94.28  95.72
          Last-element tie-breaking      70.00  79.55  86.11  90.15  92.44  94.24
          Higher-strength tie-breaking   69.25  79.25  86.67  90.75  93.16  94.85
          Lower-strength tie-breaking    68.79  78.86  86.34  90.13  92.47  94.33
studies will be conducted in the future using more real-life
programs, more interaction test suite construction tools, and
more prioritization algorithms of interaction test suites.
Another threat is the evaluation of the experimental results: a single metric, NAPFD, was used to evaluate the rates of fault detection of the different tie-breaking techniques. The NAPFD metric is, however, commonly used in studies of test case prioritization.
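As a concrete illustration, the NAPFD computation can be sketched as below. This follows the commonly cited formulation of NAPFD (cf. Qu et al. [6]); the function name and the toy fault data are our own assumptions, not the paper's.

```python
def napfd(detect_positions, n, m):
    """Normalized APFD, as commonly formulated:
        NAPFD = p - (TF1 + ... + TFm) / (n * m) + p / (2 * n)
    where n is the number of executed test cases, m the number of faults,
    TFi the 1-based position of the first test case detecting fault i
    (0 if the fault is never detected), and p the fraction of faults
    detected by the executed part of the sequence."""
    assert len(detect_positions) == m
    p = sum(1 for tf in detect_positions if tf > 0) / m
    return p - sum(detect_positions) / (n * m) + p / (2 * n)

# 5 executed test cases, 4 faults: three faults are first found by tests
# 1, 2, and 2 respectively; one fault is never detected (TF = 0).
print(round(napfd([1, 2, 2, 0], n=5, m=4), 4))   # 0.575

# When every fault is found by the very first test case, NAPFD coincides
# with APFD: 1 - 4/20 + 1/10 = 0.9.
print(round(napfd([1, 1, 1, 1], n=5, m=4), 4))   # 0.9
```

Unlike APFD, NAPFD remains well defined when only part of the sequence is executed and some faults go undetected, which is exactly the truncated-budget setting (p = 5%, ..., 100%) of this study.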
V. CONCLUSIONS AND FUTURE WORK

The prioritization of interaction test suites has been widely studied in recent years, especially when testing resources are limited. During the prioritization of an interaction test suite, the algorithm may face a challenge: there may exist more than one "best" candidate, and the next test case must somehow be chosen from among them (that is, the tie must be broken). In this paper, we investigate different techniques for choosing the next test case from the "best" candidates, including random tie-breaking, first-element tie-breaking, last-element tie-breaking, higher-strength tie-breaking, and lower-strength tie-breaking. An empirical study shows that random tie-breaking and last-element tie-breaking would be better choices for testers than the other tie-breaking techniques.

This study investigates the different tie-breaking techniques on only one prioritization strategy (namely, interaction coverage based prioritization); it is necessary to apply these tie-breaking techniques to other prioritization strategies of interaction test suites. We will study this in the future.

VI. ACKNOWLEDGMENTS

We would like to thank D. R. Kuhn for providing us with the ACTS tool, and the Software-artifact Infrastructure Repository (SIR) [13], which provided the source code and fault data for the subject program (flex). This work is in part supported by the National Natural Science Foundation of China (Grant No. 61202110), the Natural Science Foundation of Jiangsu Province (Grant No. BK2012284), and the Senior Personnel Scientific Research Foundation of Jiangsu University (Grant No. 14JDG039).

REFERENCES

[1] C. Nie and H. Leung, "A survey of combinatorial testing," ACM Computing Surveys, vol. 43, no. 2, pp. 11:1-11:29, January 2011.
[2] R. Huang, J. Chen, Z. Li, R. Wang, and Y. Lu, "Adaptive random prioritization for interaction test suites," in Proceedings of the 29th Symposium on Applied Computing (SAC'14), 2014, pp. 1058-1063.
[3] R. C. Bryce and C. J. Colbourn, "Test prioritization for pairwise interaction coverage," in Proceedings of the 1st International Workshop on Advances in Model-based Testing (A-MOST'05), 2005, pp. 1-7.
[4] R. C. Bryce and C. J. Colbourn, "Prioritized interaction testing for pairwise coverage with seeding and constraints," Information and Software Technology, vol. 48, no. 10, pp. 960-970, October 2006.
[5] R. C. Bryce and A. M. Memon, "Test suite prioritization by interaction coverage," in Proceedings of the Workshop on Domain Specific Approaches to Software Test Automation (DoSTA'07), 2007, pp. 1-7.
[6] X. Qu, M. B. Cohen, and K. M. Woolf, "Combinatorial interaction regression testing: A study of test case generation and prioritization," in Proceedings of the 23rd International Conference on Software Maintenance (ICSM'07), 2007, pp. 255-264.
[7] Z. Wang, L. Chen, B. Xu, and Y. Huang, "Cost-cognizant combinatorial test case prioritization," International Journal of Software Engineering and Knowledge Engineering, vol. 21, no. 6, pp. 829-854, September 2011.
[8] J. Petke, S. Yoo, M. B. Cohen, and M. Harman, "Efficiency and early fault detection with lower and higher strength combinatorial interaction testing," in Proceedings of the Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE'13), 2013, pp. 26-36.
[9] R. Huang, J. Chen, T. Zhang, R. Wang, and Y. Lu, "Prioritizing variable-strength covering array," in Proceedings of the IEEE 37th Annual Computer Software and Applications Conference (COMPSAC'13), 2013, pp. 502-601.
[10] R. Huang, X. Xie, D. Towey, T. Y. Chen, Y. Lu, and J. Chen, "Prioritization of combinatorial test cases by incremental interaction coverage," International Journal of Software Engineering and Knowledge Engineering, vol. 23, no. 10, pp. 1427-1457, 2014.
[11] G. Rothermel, R. H. Untch, C. Chu, and M. J. Harrold, "Prioritizing test cases for regression testing," IEEE Transactions on Software Engineering, vol. 27, no. 10, pp. 929-948, October 2001.
[12] R. Huang, X. Xie, T. Y. Chen, and Y. Lu, "Adaptive random test case generation for combinatorial testing," in Proceedings of the IEEE 36th Annual Computer Software and Applications Conference (COMPSAC'12), 2012, pp. 52-61.
[13] H. Do, S. G. Elbaum, and G. Rothermel, "Supporting controlled experimentation with testing techniques: An infrastructure and its potential impact," Empirical Software Engineering, vol. 10, no. 4, pp. 405-435, 2005.
[14] Y. Lei, R. Kacker, D. R. Kuhn, and V. Okun, "IPOG/IPOG-D: Efficient test generation for multi-way combinatorial testing," Software Testing, Verification and Reliability, vol. 18, no. 3, pp. 125-148, 2008.
[15] J. Czerwonka, "Pairwise testing in real world: Practical extensions to test case generators," in Proceedings of the 24th Pacific Northwest Software Quality Conference (PNSQC'06), 2006, pp. 419-430.