How to Do Tie-breaking in Prioritization of Interaction Test Suites?

Rubing Huang*, Jinfu Chen
School of Computer Science and Telecommunication Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013, P.R. China

Rongcun Wang, Deng Chen
School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P.R. China

*Corresponding author: [email protected]

Abstract—The prioritization of interaction test suites has received increasing attention in the field of combinatorial interaction testing, especially when testing resources are limited so that only part of the combinatorial test cases can be executed. Many strategies have been proposed to prioritize interaction test suites according to different evaluation measures. However, most of these strategies may face a challenge: there may be more than one "best" candidate with the largest evaluation measure value. In this case, there is a tie among all "best" candidates. How should the tie be broken? Intuitively, random tie-breaking is a reasonable choice, and it has been adopted in many research papers. In this paper, we investigate different tie-breaking techniques, including random tie-breaking, first-element tie-breaking, last-element tie-breaking, higher-strength tie-breaking, and lower-strength tie-breaking, and conduct experiments on a well-known prioritization strategy for interaction test suites, namely interaction coverage based prioritization, in order to present a guideline for choosing tie-breaking techniques in practical testing. The experimental results show that although no tie-breaking technique always performs best, in many cases random tie-breaking and last-element tie-breaking have the best performance, so they would be the best choices for testers in the prioritization of interaction test suites.

Keywords-Combinatorial interaction testing, interaction test suite, test case prioritization, tie-breaking, guideline

I. INTRODUCTION

Combinatorial interaction testing (CIT) [1] aims at generating an interaction test suite in order to identify failures that are caused by parameter interactions. Intuitively, CIT presents a trade-off between testing effectiveness and testing efficiency, because it only considers interaction coverage at fixed strengths (the levels of interaction among parameters) rather than at all strengths.

Once an interaction test suite T has been constructed by CIT, traditional CIT directly runs T without considering the execution order of the test cases in T. However, due to limited testing resources in practice, only part of the test cases in T may actually be executed. In such a case, the execution order of the test cases in T becomes critical for the whole testing process, because a well-prioritized execution order may identify failures earlier, and thus enable fault characterization, diagnosis, and revision as early as possible [1]. The process of determining the order of test cases in T is generally called test case prioritization, and a prioritized interaction test suite is also called an interaction test sequence. As shown in Figure 1, the process inside the dashed box is traditional CIT; the enhanced process adds test case prioritization between test suite construction and test suite execution [2].

[Figure 1. Traditional CIT with test case prioritization [2]: Generating T, then Prioritizing T, then Executing T; the path from generation directly to execution (dashed box) is the traditional CIT process.]

To date, many strategies have been proposed to guide the prioritization of interaction test suites according to different evaluation measures, for example interaction coverage based prioritization [2-8] and incremental interaction coverage based prioritization [9, 10]. However, most prioritization strategies may face a challenge(1): during the prioritization process, there may exist more than one "best" candidate with the same largest evaluation measure value. In this case, there is a tie among all "best" candidates. How should the tie be broken? In this paper, we investigate different tie-breaking techniques, including random tie-breaking [5, 7, 10], first-element tie-breaking, last-element tie-breaking, higher-strength tie-breaking, and lower-strength tie-breaking, and conduct experiments on the interaction coverage based prioritization (ICBP) algorithm to analyze the effectiveness of these techniques, so as to present a guideline for choosing tie-breaking techniques in practical testing. The experimental results indicate that although there is no universally best tie-breaking technique, in many cases random tie-breaking and last-element tie-breaking perform best, so they would be the best choices for testers in the prioritization of interaction test suites.

(1) Some prioritization strategies do not face this challenge, for example random test case prioritization [6], because it orders an interaction test suite in a purely random manner. In this paper, therefore, we focus on prioritization strategies that do face this challenge.

The remainder of this paper is organized as follows: Section II introduces background information about CIT and test case prioritization. Section III presents the tie-breaking techniques investigated in our study. Section IV reports an empirical study that evaluates these techniques. Finally, Section V concludes and discusses future work.

II. BACKGROUND

In this section, we describe background information on CIT, interaction test suites, and test case prioritization.

A. CIT and Interaction Test Suites

CIT aims at covering specific combinations of parameter values using an interaction test suite with a small number of test cases, so as to detect failures that are triggered by parameter interactions. To describe the relevant notions clearly, we first model the test object in CIT. Assume the system under test (SUT) has k parameters that constitute a parameter set P = {p1, p2, ..., pk}, and each parameter pi takes its valid values or levels from the finite set Vi (i = 1, 2, ..., k). In practice, parameters may represent any factors that influence the behavior of the SUT, such as components, configuration options, and user inputs. Let C be the set of constraints on combinations of parameter values.

Definition 1 (Test profile). A test profile, denoted TP(k, |V1||V2|...|Vk|, C), is the model of the SUT, including k parameters, |Vi| parameter values for the i-th parameter, and value combination constraints C.

Unless stated otherwise, the definitions in this paper are based on a TP(k, |V1||V2|...|Vk|, C).

Definition 2 (Covering array). A covering array, denoted CA(N; τ, k, |V1||V2|...|Vk|), is an N x k matrix that satisfies the following properties: (1) each column i (1 <= i <= k) contains elements only from the set Vi; and (2) the rows of each N x τ sub-matrix cover all possible τ-tuples (referred to as τ-wise value combinations, or τ-wise value schemas [1]) from the τ columns at least once.

Here, τ is called the strength. Since the strength of a covering array is fixed, a covering array at strength τ is usually called a τ-wise covering array. In a covering array, each row represents a test case and each column represents a parameter. Generally speaking, in the field of CIT a covering array represents an interaction test suite; in this paper, therefore, we treat a covering array as equivalent to an interaction test suite.

B. Test Case Prioritization

Test case prioritization seeks to schedule test cases so that those with the highest priority, according to some criterion, are executed earlier than lower-priority test cases. When testing resources are limited or insufficient to execute all test cases in a test suite, a well-designed execution order of test cases is especially important. A prioritized test suite is generally called a test sequence. The problem of test case prioritization is defined as follows [11].

Definition 5 (Test case prioritization). Given a tuple (T, Ω, f), where T is a test suite, Ω is the set of all possible permutations of T, and f is a function from Ω to the real numbers, the test case prioritization problem is to find a test sequence S ∈ Ω such that:

(∀S′)(S′ ∈ Ω)(S′ ≠ S)[f(S) ≥ f(S′)].   (1)

According to Rothermel et al. [11], there are many possible goals of prioritization (that is, different choices of f). For example, a well-known function, the average percentage of faults detected (APFD), measures the rate of fault detection and is widely used in regression testing. The APFD function requires the fault-detection information of each executed test case.

III. TIE-BREAKING TECHNIQUES IN PRIORITIZATION OF INTERACTION TEST SUITES

In this section, we investigate different tie-breaking techniques used in the prioritization of interaction test suites. To describe the techniques clearly, we apply them to a well-known prioritization strategy for interaction test suites, namely interaction coverage based prioritization (ICBP).
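ICBP scores each candidate by the number of τ-wise value combinations it covers that are still uncovered, the UVCDτ measure [12]. As a minimal illustration of this measure, the following Python sketch (hypothetical helper names of our own, not the authors' implementation) enumerates the τ-wise value schemas of a test case and counts the uncovered ones:

```python
from itertools import combinations

def tau_wise_schemas(test_case, tau):
    """Enumerate all tau-wise value schemas of a test case.

    A schema is a frozenset of (parameter index, value) pairs, so two
    test cases sharing a parameter-value combination produce the same
    schema object.
    """
    indexed = list(enumerate(test_case))  # (parameter, value) pairs
    return {frozenset(c) for c in combinations(indexed, tau)}

def uvcd(test_case, covered, tau):
    """Number of tau-wise schemas of test_case not yet in `covered`."""
    return len(tau_wise_schemas(test_case, tau) - covered)

# Example: pairwise (tau = 2) schemas of a 3-parameter test case.
tc = ("a", "b", "c")
assert len(tau_wise_schemas(tc, 2)) == 3       # C(3, 2) parameter pairs
covered = tau_wise_schemas(("a", "b", "x"), 2) # shares only (p1=a, p2=b)
assert uvcd(tc, covered, 2) == 2
```

A prioritizer would add the selected test case's schemas to `covered` after each pick, so that later UVCD values reflect what the growing test sequence has already covered.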
Figure 2 presents the ICBP algorithm: the "best" element is chosen from the candidates in T as the next test case in S, that is, the candidate covering the largest number of τ-wise value combinations not yet covered by S. Consider an interaction test suite T (whose original test sequence T′ is given by its generation order), a prioritization strength τ for ICBP, which means that ICBP uses τ-wise interaction coverage, or the uncovered τ-wise value combinations distance (UVCDτ) [12], to guide the prioritization, and an interaction test sequence of T prioritized by ICBP.

Input: A τ-wise CA, denoted T.
Output: An interaction test sequence S of T.
1: Initialize S as an empty sequence;
2: while (T is not empty)
3:   Select the "best" element e from T;
4:   Remove e from T;
5:   Insert e into S;
6: end while
7: return S.
Figure 2. The ICBP algorithm.

We describe the different tie-breaking techniques in turn.

A. Random Tie-breaking

Random tie-breaking is the most intuitive technique: it randomly breaks the tie among all "best" candidates. For example, if two combinatorial test cases tc1 and tc2 in T have the same UVCDτ value, ICBP randomly chooses one of them as the next test case in S; in other words, tc1 and tc2 have the same probability of being selected.

B. First-element Tie-breaking

With first-element tie-breaking, among all "best" candidates ICBP selects the one that occurs first in the original test sequence T′.

C. Last-element Tie-breaking

Similarly, last-element tie-breaking chooses as the next test case the candidate that occurs last in the original test sequence T′ when facing more than one "best" candidate.

ICBP uses a fixed strength τ to guide the prioritization of interaction test suites. When two candidates tc1 and tc2 both have the largest UVCDτ value, the strength τ cannot distinguish them. In this case, it is reasonable to consider other strengths to assist ICBP in choosing the next test case. We therefore consider two further techniques: (1) tie-breaking using a strength higher than τ, namely higher-strength tie-breaking; and (2) tie-breaking using a strength lower than τ, namely lower-strength tie-breaking. We describe them in turn.

D. Higher-strength Tie-breaking

When the strength τ cannot distinguish the "best" candidates, higher-strength tie-breaking uses strength (τ + 1) to re-evaluate each "best" candidate, and then chooses one as the next test case in S. If strength (τ + 1) is also insufficient (that is, after using the higher interaction coverage there still exists more than one "best" candidate), ICBP randomly chooses an element from the remaining "best" candidates.

E. Lower-strength Tie-breaking

In contrast to higher-strength tie-breaking, lower-strength tie-breaking uses strength (τ − 1) to further guide the prioritization when strength τ cannot distinguish the candidates. As with higher-strength tie-breaking, if strength (τ − 1) is also insufficient, ICBP randomly chooses one of the remaining "best" candidates as the next test case.

Among all these techniques, first-element and last-element tie-breaking are deterministic: given an original interaction test sequence, each of them produces a unique interaction test sequence. The other three techniques are non-deterministic, because they may randomly choose an element from the "best" candidates.

IV. EMPIRICAL CASE STUDY

In this section, an empirical case study is presented to analyze the effectiveness of the different tie-breaking techniques in the prioritization of interaction test suites, in terms of fault detection. The study is designed to answer the following research questions:

RQ1: Among all tie-breaking techniques, which performs best for the prioritization of interaction test suites?
RQ2: How should a tie-breaking technique be chosen in practical testing?

A. Setup

We use a medium-sized real-life program, the lexical analyzer generator flex, obtained from the Software-artifact Infrastructure Repository (SIR) [13]. The program has five versions, contains between 9,581 and 11,470 uncommented lines of C code, and is accompanied by a seeded-fault library; in this paper, we used 34 seeded faults. According to Petke et al. [8], the test profile of flex is TP(9, 2^6 3^2 5^1, C), where C ≠ ∅.

The original covering arrays were generated by two widely used tools: the Advanced Combinatorial Testing System (ACTS) [14] and Pairwise Independent Combinatorial Testing (PICT) [15]. Both are greedy tools, implementing the In-Parameter-Order (IPO) method and the one-test-at-a-time approach, respectively. We consider covering arrays with strengths τ = 2, 3, 4, 5. Since some tie-breaking techniques involve randomization, we ran each experiment 100 times for each interaction test suite and report the averages.

In practical testing, resources may allow only part of an interaction test sequence to be executed. We therefore consider different budgets, given as the percentage p of each interaction test sequence that is executed: p = 5%, 10%, 25%, 50%, 75%, and 100%.

B. Metrics

APFD is commonly used to evaluate prioritization techniques. However, APFD assumes that the whole test suite is executed and that all faults are detected [2, 6, 9, 10], which does not hold when only part of an interaction test sequence is run. Therefore, we use an alternative, the Normalized APFD (NAPFD) [6], to evaluate the fault detection rate of each tie-breaking technique.

C. Results and Analysis

Table I presents the NAPFD values for ICBP with each tie-breaking technique when executing a given percentage of each interaction test sequence; in each sub-column, the largest value indicates the best-performing technique. From this table, we make the following observations.

1) Among all tie-breaking techniques, which performs best? According to the data in Table I, no tie-breaking technique is best for the prioritization of interaction test suites in all cases; each technique performs best in some cases. For example, random tie-breaking obtains the best NAPFD values for ACTS covering arrays at strengths τ = 2 and τ = 4 when p is high; first-element tie-breaking performs best for the PICT covering array at strength τ = 5 regardless of p; last-element tie-breaking performs best for the PICT covering array at strength τ = 3; higher-strength tie-breaking has the best fault detection rates for ACTS covering arrays at strength τ = 3 when p is low; and lower-strength tie-breaking achieves the best NAPFD values for the PICT covering array at τ = 2 when p is high.

On the whole, however, first-element tie-breaking performs worst in many cases and is therefore the last choice. Higher-strength tie-breaking can be a good choice when only a small number of combinatorial test cases in the interaction test sequence will be executed. Random tie-breaking and last-element tie-breaking are best in many cases.

2) How should a tie-breaking technique be chosen in practical testing? By definition, higher-strength and lower-strength tie-breaking are more time-consuming than the other techniques, because they must count coverage information at an additional (higher or lower) strength. As discussed above, random tie-breaking and last-element tie-breaking are good choices in terms of fault detection rates. Consequently, when testing resources are sufficient, testers may choose higher-strength tie-breaking, since they can afford its additional prioritization time; when testing resources are limited, random tie-breaking and last-element tie-breaking are better alternatives, because they require less prioritization time. On the other hand, random tie-breaking and higher-strength tie-breaking are non-deterministic, while last-element tie-breaking is deterministic. Therefore, when testers need to compare their methods against deterministic algorithms, last-element tie-breaking is the better choice; otherwise, random tie-breaking or higher-strength tie-breaking may be preferred.

D. Threats to Validity

Despite our best efforts, our experiments face some threats to validity. The first threat is the selection of experimental data: only a single medium-sized subject program was used to investigate the effectiveness of the tie-breaking techniques. Additionally, the two tools used to construct interaction test suites are widely used, but both are greedy. Finally, we applied the tie-breaking techniques only to the ICBP algorithm.
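For concreteness, the NAPFD metric can be computed as in the following sketch. This is a hypothetical implementation based on the standard definition of NAPFD by Qu et al. [6], not code from the paper: for n executed test cases and m known faults, NAPFD = p − (TF1 + ... + TFm)/(n·m) + p/(2n), where TFi is the 1-based position of the first executed test case that reveals fault i (0 if the fault is not revealed within the executed prefix) and p is the fraction of faults detected.

```python
def napfd(fault_matrix, n_executed):
    """Normalized APFD of a prioritized test sequence.

    fault_matrix[i][j] is True if test case i (in execution order)
    detects fault j; only the first n_executed test cases are run.
    """
    m = len(fault_matrix[0])                        # total number of faults
    tf = [0] * m                                    # first-detection positions
    for pos, row in enumerate(fault_matrix[:n_executed], start=1):
        for j in range(m):
            if row[j] and tf[j] == 0:
                tf[j] = pos
    p = sum(1 for x in tf if x > 0) / m             # fraction of faults found
    return p - sum(tf) / (n_executed * m) + p / (2 * n_executed)

# Two faults, three executed tests: fault 0 found by test 1, fault 1 by test 3.
matrix = [[True, False], [False, False], [False, True]]
assert abs(napfd(matrix, 3) - 0.5) < 1e-9
```

When the whole sequence is executed and every fault is detected, p = 1 and NAPFD reduces to the classical APFD.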
To address these threats, additional studies will be conducted in the future using more real-life programs, more interaction test suite construction tools, and more prioritization algorithms for interaction test suites. Another threat is the evaluation of the experimental results: the NAPFD metric was used to evaluate the fault detection rates of the different tie-breaking techniques. NAPFD is commonly used in studies of test case prioritization.

Table I. The NAPFD metric (%) for ICBP with different tie-breaking techniques for subject program flex, when executing a percentage p of the interaction test sequence.

ACTS interaction test sequences:

| τ | Method          |    5% |   10% |   25% |   50% |   75% |  100% |
|---|-----------------|-------|-------|-------|-------|-------|-------|
| 2 | Random          | 15.12 | 31.10 | 62.96 | 76.10 | 80.29 | 82.35 |
| 2 | First-element   |  5.88 | 21.32 | 59.56 | 74.66 | 79.41 | 81.70 |
| 2 | Last-element    | 13.24 | 25.74 | 55.15 | 72.86 | 78.24 | 80.83 |
| 2 | Higher-strength | 15.82 | 31.29 | 62.15 | 75.13 | 79.37 | 82.26 |
| 2 | Lower-strength  | 15.38 | 31.18 | 62.83 | 75.87 | 80.14 | 82.24 |
| 3 | Random          | 43.57 | 62.09 | 77.66 | 83.66 | 85.97 | 87.88 |
| 3 | First-element   | 35.29 | 57.84 | 76.84 | 82.71 | 84.87 | 87.25 |
| 3 | Last-element    | 33.82 | 57.60 | 76.47 | 82.75 | 85.50 | 87.70 |
| 3 | Higher-strength | 43.60 | 62.10 | 77.66 | 83.66 | 85.95 | 87.95 |
| 3 | Lower-strength  | 41.84 | 61.06 | 77.39 | 83.67 | 86.10 | 88.02 |
| 4 | Random          | 59.98 | 74.36 | 83.96 | 88.33 | 90.54 | 92.72 |
| 4 | First-element   | 54.66 | 70.81 | 82.22 | 88.01 | 90.54 | 92.94 |
| 4 | Last-element    | 59.56 | 72.40 | 81.80 | 86.49 | 88.04 | 90.84 |
| 4 | Higher-strength | 60.41 | 74.69 | 83.95 | 88.22 | 90.53 | 92.79 |
| 4 | Lower-strength  | 59.96 | 74.35 | 83.67 | 88.07 | 90.20 | 92.48 |
| 5 | Random          | 71.39 | 80.69 | 87.50 | 91.03 | 93.02 | 94.67 |
| 5 | First-element   | 65.37 | 76.73 | 86.44 | 90.31 | 91.95 | 93.98 |
| 5 | Last-element    | 70.05 | 81.46 | 89.18 | 91.67 | 93.26 | 94.96 |
| 5 | Higher-strength | 70.56 | 80.14 | 87.28 | 90.71 | 92.53 | 94.32 |
| 5 | Lower-strength  | 70.51 | 80.18 | 87.40 | 90.98 | 92.91 | 94.63 |

PICT interaction test sequences:

| τ | Method          |    5% |   10% |   25% |   50% |   75% |  100% |
|---|-----------------|-------|-------|-------|-------|-------|-------|
| 2 | Random          | 14.94 | 29.69 | 61.31 | 75.12 | 79.62 | 82.66 |
| 2 | First-element   |  8.83 | 21.32 | 57.60 | 73.98 | 78.48 | 81.62 |
| 2 | Last-element    |  7.35 | 19.85 | 56.62 | 74.32 | 79.64 | 82.75 |
| 2 | Higher-strength | 14.24 | 28.96 | 61.12 | 75.13 | 79.55 | 82.49 |
| 2 | Lower-strength  | 15.29 | 30.27 | 61.68 | 75.12 | 79.54 | 82.57 |
| 3 | Random          | 41.41 | 60.42 | 76.65 | 82.44 | 84.65 | 86.18 |
| 3 | First-element   | 25.00 | 47.55 | 70.96 | 79.60 | 82.48 | 84.32 |
| 3 | Last-element    | 49.51 | 65.20 | 79.60 | 83.92 | 86.00 | 87.35 |
| 3 | Higher-strength | 41.74 | 60.95 | 77.06 | 82.64 | 84.62 | 85.95 |
| 3 | Lower-strength  | 41.71 | 60.71 | 76.96 | 82.65 | 84.78 | 86.28 |
| 4 | Random          | 61.04 | 73.40 | 82.97 | 86.72 | 88.86 | 90.10 |
| 4 | First-element   | 61.03 | 73.16 | 82.77 | 86.97 | 89.31 | 90.54 |
| 4 | Last-element    | 62.01 | 73.04 | 84.33 | 87.82 | 89.92 | 90.99 |
| 4 | Higher-strength | 59.47 | 72.73 | 82.86 | 86.57 | 88.64 | 89.89 |
| 4 | Lower-strength  | 60.63 | 73.36 | 83.05 | 86.80 | 88.82 | 90.00 |
| 5 | Random          | 69.31 | 79.40 | 86.58 | 90.49 | 92.95 | 94.71 |
| 5 | First-element   | 70.00 | 79.55 | 87.42 | 91.41 | 94.28 | 95.72 |
| 5 | Last-element    | 70.00 | 79.55 | 86.11 | 90.15 | 92.44 | 94.24 |
| 5 | Higher-strength | 69.25 | 79.25 | 86.67 | 90.75 | 93.16 | 94.85 |
| 5 | Lower-strength  | 68.79 | 78.86 | 86.34 | 90.13 | 92.47 | 94.33 |

V. CONCLUSIONS AND FUTURE WORK

The prioritization of interaction test suites has been widely studied in recent years, especially when testing resources are limited. During the prioritization process, the algorithm may face the challenge that there exists more than one "best" candidate; how should one of them be chosen (that is, how should the tie be broken)? In this paper, we investigated different techniques for choosing the next test case from the "best" candidates, including random tie-breaking, first-element tie-breaking, last-element tie-breaking, higher-strength tie-breaking, and lower-strength tie-breaking. An empirical study shows that random tie-breaking and last-element tie-breaking are better choices for testers than the other techniques. This study investigated the tie-breaking techniques on only one prioritization strategy (interaction coverage based prioritization); applying them to other prioritization strategies for interaction test suites remains future work.

VI. ACKNOWLEDGMENTS

We would like to thank D. R. Kuhn for providing the ACTS tool, and the Software-artifact Infrastructure Repository (SIR) [13], which provided the source code and fault data for the subject program (flex). This work is in part supported by the National Natural Science Foundation of China (Grant No. 61202110), the Natural Science Foundation of Jiangsu Province (Grant No. BK2012284), and the Senior Personnel Scientific Research Foundation of Jiangsu University (Grant No. 14JDG039).

REFERENCES

[1] C. Nie and H. Leung, "A survey of combinatorial testing," ACM Computing Surveys, vol. 43, no. 2, pp. 11:1-11:29, January 2011.
[2] R. Huang, J. Chen, Z. Li, R. Wang, and Y. Lu, "Adaptive random prioritization for interaction test suites," in Proceedings of the 29th Symposium on Applied Computing (SAC'14), 2014, pp. 1058-1063.
[3] R. C. Bryce and C. J. Colbourn, "Test prioritization for pairwise interaction coverage," in Proceedings of the 1st International Workshop on Advances in Model-based Testing (A-MOST'05), 2005, pp. 1-7.
[4] R. C. Bryce and C. J. Colbourn, "Prioritized interaction testing for pairwise coverage with seeding and constraints," Information and Software Technology, vol. 48, no. 10, pp. 960-970, October 2006.
[5] R. C. Bryce and A. M. Memon, "Test suite prioritization by interaction coverage," in Proceedings of the Workshop on Domain Specific Approaches to Software Test Automation (DoSTA'07), 2007, pp. 1-7.
[6] X. Qu, M. B. Cohen, and K. M. Woolf, "Combinatorial interaction regression testing: A study of test case generation and prioritization," in Proceedings of the 23rd International Conference on Software Maintenance (ICSM'07), 2007, pp. 255-264.
[7] Z. Wang, L. Chen, B. Xu, and Y. Huang, "Cost-cognizant combinatorial test case prioritization," International Journal of Software Engineering and Knowledge Engineering, vol. 21, no. 6, pp. 829-854, September 2011.
[8] J. Petke, S. Yoo, M. B. Cohen, and M. Harman, "Efficiency and early fault detection with lower and higher strength combinatorial interaction testing," in Proceedings of the 12th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE'13), 2013, pp. 26-36.
[9] R. Huang, J. Chen, T. Zhang, R. Wang, and Y. Lu, "Prioritizing variable-strength covering array," in Proceedings of the IEEE 37th Annual Computer Software and Applications Conference (COMPSAC'13), 2013, pp. 502-601.
[10] R. Huang, X. Xie, D. Towey, T. Y. Chen, Y. Lu, and J. Chen, "Prioritization of combinatorial test cases by incremental interaction coverage," International Journal of Software Engineering and Knowledge Engineering, vol. 23, no. 10, pp. 1427-1457, 2014.
[11] G. Rothermel, R. H. Untch, C. Chu, and M. J. Harrold, "Prioritizing test cases for regression testing," IEEE Transactions on Software Engineering, vol. 27, no. 10, pp. 929-948, October 2001.
[12] R. Huang, X. Xie, T. Y. Chen, and Y. Lu, "Adaptive random test case generation for combinatorial testing," in Proceedings of the IEEE 36th Annual Computer Software and Applications Conference (COMPSAC'12), 2012, pp. 52-61.
[13] H. Do, S. G. Elbaum, and G. Rothermel, "Supporting controlled experimentation with testing techniques: An infrastructure and its potential impact," Empirical Software Engineering, vol. 10, no. 4, pp. 405-435, 2005.
[14] Y. Lei, R. Kacker, D. R. Kuhn, and V. Okun, "IPOG/IPOG-D: Efficient test generation for multi-way combinatorial testing," Software Testing, Verification and Reliability, vol. 18, no. 3, pp. 125-148, 2008.
[15] J. Czerwonka, "Pairwise testing in real world: Practical extensions to test case generators," in Proceedings of the 24th Pacific Northwest Software Quality Conference (PNSQC'06), 2006, pp. 419-430.