Learning-based Software Testing -- Marriage between "Learning" and "Testing"

Dan Hao ([email protected]), Peking University, 2015.5.28

About Me
• Associate Professor, Peking University
• Education background
  – 2002.9-2008.1, Peking University, Ph.D.
  – 1998.9-2002.7, Harbin Institute of Technology, B.S.
• Research interest: software testing
• Homepage: sei.pku.edu.cn/~haod

Software is everywhere, and software contains bugs; hence, software testing.

Simplified Software-Testing Process
A test case supplies a test input to the SUT; executing the SUT produces the actual output, while the test oracle supplies the expected output. Comparing the two either reveals faults or reveals none.

Important Problems in Software Testing
• Test input generation
• Test oracle generation
• Test repair
• Test selection/reduction/prioritization
• ……
How do we solve these problems? With program analysis, machine learning, search algorithms, ……

Learning: An Outsider's Perspective
• Machine learning: learn rules from historical data, then apply those rules to new data.
• Search-based algorithms: find the optimum quickly in the solution space.
• ……

Marriage between Learning & Testing
• Search-based test generation
• Search-based test prioritization/reduction/selection
• Automated program repair
• Machine learning based bug prediction
• ……

Our Work in the Marriage between Learning & Testing
• Test oracle generation [ASE14]
• Obsolete test identification [ECOOP12]
• Test effectiveness measurement [ISSTA16]
• ……

Test Oracle Generation [ASE14]: Metamorphic Relation Inference + PSO
What is the test oracle problem? Obtaining test oracles is widely recognized as a difficult problem.
[ASE14] Jie Zhang, Junjie Chen, Dan Hao, Yingfei Xiong, Bing Xie, Lu Zhang, Hong Mei. Search-based Inference of Polynomial Metamorphic Relations. ASE 2014.

Metamorphic Relation: A Specific Type of Oracle
An MR states that a particular change to the input "changes" the output in a predictable way: RI(I1, I2) => RO(O1, O2), where I1, I2 are inputs and O1, O2 the corresponding outputs.
Examples:
• Testing the cosine function cos(x): cos(x) + cos(π - x) = 0
• Testing the minimum function min(x, y): min(x, y) - min(y, x) = 0

Statistics on Metamorphic Relations
Among the MRs we surveyed, about 60% are polynomial. About 50% are 1-MRs (RI is a linear equation, RO is a linear equation); others are 2-MRs (RI is a linear equation, RO is a quadratic equation).

Polynomial MR Forms
Given RI: I2 = αI1 + β and RO: c1·O1 + c2·O2 + c3 = 0, for a program P:
• 1-MR: c1·P(I1) + c2·P(αI1 + β) + c3 = 0
• 2-MR: c1·P²(I1) + c2·P²(αI1 + β) + c3·P(I1)·P(αI1 + β) + c4·P(I1) + c5·P(αI1 + β) + c6 = 0

PSO-based MR Inference
We use the PSO algorithm (Particle Swarm Optimization, which simulates the foraging behavior of birds) to search for the coefficients of 1-MRs and 2-MRs:
• Each candidate solution is called a particle.
• Each particle has a velocity and a location, which keep changing.
• A fitness function evaluates how close a particle's location is to an optimal location:
  Fitness(L) = Σ_{k=1}^{M} f(L, k), summing the per-input violation f(L, k) over M test inputs.
Why PSO?
• Very effective for searching continuous spaces
• Can help a particle escape locally optimal locations
• Simple, with few parameters to adjust
The search proceeds as follows: (a) initially, the N particles are assigned locations L (initial values of the parameters); (b) the particles keep updating their velocities and locations; (c) when the termination threshold is reached, the search stops.
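To make the search loop concrete, here is a minimal sketch of PSO-based inference of a 1-MR, using cos(x) as the program under test and the fixed input relation I2 = -I1 + π. It fixes c1 = 1 to rule out the trivial all-zero solution; the class name, swarm constants, and fitness definition are illustrative assumptions, not the ASE14 implementation, which uses its own constraints and parameter settings.

import java.util.Random;

// Minimal PSO sketch (illustrative): search c2, c3 such that
// P(I1) + c2*P(alpha*I1 + beta) + c3 ~= 0, with c1 fixed to 1.
public class MrPsoSketch {
    static final int SWARM = 30, DIM = 2, ITERS = 300;
    static final double W = 0.7, C1 = 1.5, C2 = 1.5; // inertia and acceleration weights
    static final Random RAND = new Random(42);

    // Program under test: cos(x), whose known 1-MR is cos(x) + cos(pi - x) = 0.
    static double p(double x) { return Math.cos(x); }

    // Fitness: average violation of the candidate MR over sampled inputs (lower is better).
    static double fitness(double[] c, double alpha, double beta, double[] inputs) {
        double sum = 0;
        for (double x : inputs) sum += Math.abs(p(x) + c[0] * p(alpha * x + beta) + c[1]);
        return sum / inputs.length;
    }

    public static void main(String[] args) {
        double alpha = -1, beta = Math.PI; // fixed input relation I2 = -I1 + pi
        double[] inputs = new double[50];
        for (int i = 0; i < inputs.length; i++) inputs[i] = RAND.nextDouble() * 10 - 5;

        double[][] pos = new double[SWARM][DIM], vel = new double[SWARM][DIM];
        double[][] pBest = new double[SWARM][];
        double[] pBestFit = new double[SWARM];
        double[] gBest = null;
        double gBestFit = Double.MAX_VALUE;

        // (a) Assign initial locations (parameter values) to the N particles.
        for (int i = 0; i < SWARM; i++) {
            for (int d = 0; d < DIM; d++) pos[i][d] = RAND.nextDouble() * 4 - 2;
            pBest[i] = pos[i].clone();
            pBestFit[i] = fitness(pos[i], alpha, beta, inputs);
            if (pBestFit[i] < gBestFit) { gBestFit = pBestFit[i]; gBest = pos[i].clone(); }
        }
        // (b) Particles keep updating velocities and locations until (c) the budget is spent.
        for (int t = 0; t < ITERS; t++) {
            for (int i = 0; i < SWARM; i++) {
                for (int d = 0; d < DIM; d++) {
                    vel[i][d] = W * vel[i][d]
                            + C1 * RAND.nextDouble() * (pBest[i][d] - pos[i][d])
                            + C2 * RAND.nextDouble() * (gBest[d] - pos[i][d]);
                    pos[i][d] += vel[i][d];
                }
                double f = fitness(pos[i], alpha, beta, inputs);
                if (f < pBestFit[i]) { pBestFit[i] = f; pBest[i] = pos[i].clone(); }
                if (f < gBestFit) { gBestFit = f; gBest = pos[i].clone(); }
            }
        }
        // Expect roughly c2 = 1, c3 = 0, i.e., cos(x) + cos(pi - x) = 0.
        System.out.printf("Inferred 1-MR: P(x) + %.3f*P(-x + pi) + %.3f = 0 (fitness %.6f)%n",
                gBest[0], gBest[1], gBestFit);
    }
}

On a run with these settings, the swarm converges toward c2 ≈ 1 and c3 ≈ 0, recovering the known cosine MR.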
MR Filtering
• Motivation: the quality of an inferred MR depends on the number of test inputs used during inference, and using many inputs is costly. Instead, we assure the quality of inferred MRs through a second step, MR filtering.
• Approach: remove low-quality MRs via statistics-based filtering: if an MR is violated by a non-negligible fraction of fresh test inputs, it is discarded.

Evaluation Results
Applied to 189 scientific functions from the JDK, Apache, Matlab, and GSL.

Obsolete Test Identification [ECOOP12]
[ECOOP12] Dan Hao, Tian Lan, Hongyu Zhang, Chao Guo, Lu Zhang. Is This a Bug or an Obsolete Test? ECOOP 2012.

Consider a test class T written against a program P, which later evolves into P':

public class Testcases {
    Account a;
    protected void setUp() { a = new Account(100.0, "user1"); }
    protected void tearDown() { }
    public void test1() {
        a.transfer(50.0, "user2");
        a.withdraw(40.0);
        assertEquals(9.5, a.getBalance());
    }
    public void test2() {
        a.withdraw(40.0);
        assertEquals(56, a.getBalance()); // should be 60
    }
    ...
}

Problem Description
• Obsolete test identification: given a failing execution, is it caused by a bug in the source code or by an obsolete test case?
• Importance: without knowing the cause of a failure, how can we decide between repairing the test [Daniel:ASE09, Daniel:ISSTA10] and debugging the source code [Jones:ASE06, Liblit:PLDI03, Weimer:ICSE09, Kim:ICSE13]?
"Determining this reason for failure is the critical first step before any corrective action can be taken…" -- Krishna Ratakonda (IBM Fellow, FOSE 2014)

Best-First Decision Tree for Obsolete Test Identification
We cast the problem as binary classification over the Account example above: should we fix the test T, or the new program P'? The pipeline has three steps: failure-inducing test collection, feature collection, and classifier building (a minimal sketch of the last step follows the feature list below).

Features
The features fall into three groups -- complexity features, change features, and testing features -- and include:
• Maximum depth of the call graph
• Number of methods in the call graph
• File change
• Type of failure
• Count of plausible nodes in the graph
• Existence of a highly fault-prone node
• Product innocence
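As a concrete illustration of the classifier-building step, here is a minimal sketch using Weka. The four feature names follow the list above, but the training values and the class ObsoleteTestSketch are invented for illustration, and J48 (C4.5) stands in for the best-first decision tree algorithm the paper actually uses.

import java.util.ArrayList;
import weka.classifiers.Classifier;
import weka.classifiers.trees.J48;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;

// Sketch: obsolete-test identification as binary classification.
public class ObsoleteTestSketch {
    public static void main(String[] args) throws Exception {
        ArrayList<Attribute> attrs = new ArrayList<>();
        attrs.add(new Attribute("maxCallGraphDepth"));
        attrs.add(new Attribute("numMethodsInCallGraph"));
        attrs.add(new Attribute("fileChange"));          // 1 if the covered files changed
        attrs.add(new Attribute("plausibleNodeCount"));
        ArrayList<String> cause = new ArrayList<>();
        cause.add("bug_in_code");   // the failure points to P'
        cause.add("obsolete_test"); // the failure points to T
        attrs.add(new Attribute("cause", cause));

        Instances train = new Instances("failures", attrs, 0);
        train.setClassIndex(train.numAttributes() - 1);
        // Hypothetical labelled failing executions used for training.
        train.add(new DenseInstance(1.0, new double[] {4, 20, 1, 6, 0}));
        train.add(new DenseInstance(1.0, new double[] {1, 2, 0, 0, 1}));
        train.add(new DenseInstance(1.0, new double[] {3, 12, 1, 4, 0}));
        train.add(new DenseInstance(1.0, new double[] {2, 3, 0, 1, 1}));

        Classifier tree = new J48();
        tree.buildClassifier(train);

        // Classify a fresh failing execution by its features.
        DenseInstance failure = new DenseInstance(1.0, new double[] {2, 5, 0, 1, 0});
        failure.setDataset(train);
        int label = (int) tree.classifyInstance(failure);
        System.out.println("Predicted cause: " + train.classAttribute().value(label));
    }
}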
Evaluation Results
Evaluated both within the same version and between versions: the approach is effective when applied within the same version, between versions, and even across projects.

Test Effectiveness Measurement [ISSTA16]: Predictive Mutation Testing
[ISSTA16] Jie Zhang, Yiling Lou, Lingming Zhang, Dan Hao, Lu Zhang, Hong Mei. Predictive Mutation Testing. ISSTA 2016.

Mutation testing measures test effectiveness by checking whether each mutant is killed by a test suite and by computing the mutation score. For example, a mutant may change a single condition in the program:

Program:
begin
    int x, y;
    input(x, y);
    if (x < y)
        output(x + y);
    else
        output(x * y);
end

Mutant (x < y changed to x <= y):
begin
    int x, y;
    input(x, y);
    if (x <= y)
        output(x + y);
    else
        output(x * y);
end

Challenge in Mutation Testing
Mutation testing is costly: e.g., a 512-LOC program has 23,848 mutants, and both mutant generation and mutant execution are expensive. The literature reduces this cost by "do fewer" (run fewer mutants) and "do faster" (run them more cheaply). Our idea: don't run the mutants at all.

Predictive Mutation Testing
• Mutation testing results: whether each mutant is killed by the test suite, and the mutation score, i.e., the percentage of killed mutants.
• Prediction: whether a mutant is killed or survives, without executing it.

Random-Forest-based Mutation Testing
Whether a test suite kills a mutant is cast as a binary classification problem (killed vs. survived), guided by the PIE theory: a mutant can only be killed if the mutated code is Executed, the program state is Infected, and the infection Propagates to an observable output.

Features
The classification features are designed around this PIE view (execution, infection, and propagation); a sketch follows.
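Here is a minimal sketch of how such a classifier might be built with Weka's RandomForest. The three features, their values, and the class PmtSketch are illustrative assumptions loosely mirroring the PIE view; they are not the paper's actual feature set.

import java.util.ArrayList;
import weka.classifiers.Classifier;
import weka.classifiers.trees.RandomForest;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;

// Sketch: predictive mutation testing as killed-vs-survived classification.
public class PmtSketch {
    public static void main(String[] args) throws Exception {
        ArrayList<Attribute> attrs = new ArrayList<>();
        attrs.add(new Attribute("numExecutions"));    // execution: how often the mutated code runs
        attrs.add(new Attribute("numInfectedTests")); // infection proxy (hypothetical)
        attrs.add(new Attribute("numAssertions"));    // propagation proxy (hypothetical)
        ArrayList<String> outcome = new ArrayList<>();
        outcome.add("killed");
        outcome.add("survived");
        attrs.add(new Attribute("result", outcome));

        Instances train = new Instances("mutants", attrs, 0);
        train.setClassIndex(train.numAttributes() - 1);
        // Training mutants whose kill status was obtained by real execution.
        train.add(new DenseInstance(1.0, new double[] {12, 3, 5, 0})); // killed
        train.add(new DenseInstance(1.0, new double[] {0, 0, 0, 1}));  // survived: never executed
        train.add(new DenseInstance(1.0, new double[] {8, 2, 4, 0}));  // killed
        train.add(new DenseInstance(1.0, new double[] {2, 0, 1, 1}));  // survived

        Classifier forest = new RandomForest();
        forest.buildClassifier(train);

        // Predict a new mutant's result without executing it.
        DenseInstance mutant = new DenseInstance(1.0, new double[] {9, 2, 4, 0});
        mutant.setDataset(train);
        int label = (int) forest.classifyInstance(mutant);
        System.out.println("Predicted: " + train.classAttribute().value(label));
    }
}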
Evaluation Results
• Prediction results with high precision: in the cross-version scenario, precision is mostly about 90%.
• Predictions are made quickly.

Commonality Analysis
The three pieces of work map testing problems to learning algorithms, all relying on sampling:
• Test oracle generation -- PSO
• Obsolete test identification -- decision tree
• Test effectiveness measurement -- random forest

Learning-based Software Testing
Problems:
• test generation
• test-execution optimization
• defect prediction
• bug fixing
• ……
Algorithms:
• genetic algorithms
• PSO
• hill climbing
• random forest
• ……

Challenges in Learning-based Software Testing
Testing perspective: transform a testing problem into a typical learning problem.
Learning perspective:
• Design of the fitness function
• Influence of imbalanced data
• Choice of algorithms and parameter values
• ……

Summary

Thanks!