Lecture 05 Test Coverage

CS 4723: Lecture 5
Test Coverage
Test Coverage

After we have done some testing, how do we know the testing is enough?

The most straightforward measure: input coverage

# of inputs tested / # of possible inputs

Unfortunately, the # of possible inputs is typically infinite

Measuring input coverage directly is not feasible, so we need approximations…
Test Coverage

Code Coverage

Input Combination Coverage

Specification Coverage

Mutation Coverage
Code Coverage

Basic idea:

Bugs in code that has never been executed will not be exposed

So a test suite that leaves code unexecuted is definitely not sufficient

Definition:

Divide the code into elements

Calculate the proportion of elements that are executed by the test suite
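The definition above can be sketched as a ratio. This is only an illustration: the class name, the method name, and the statement numbers are hypothetical, and real tools collect the executed set through instrumentation.

```java
import java.util.Set;

class StatementCoverage {
    // coverage = number of executed elements / total number of elements
    static double coverage(Set<Integer> executedStatements, int totalStatements) {
        return (double) executedStatements.size() / totalStatements;
    }

    public static void main(String[] args) {
        // Hypothetical program with 8 statements; the test suite executed 6 of them.
        Set<Integer> executed = Set.of(1, 2, 3, 4, 5, 6);
        System.out.println(coverage(executed, 8)); // prints 0.75
    }
}
```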
Control Flow Graph
How many test cases are needed to achieve full statement coverage?
Statement Coverage in Practice

Microsoft reports 80-90% statement coverage

Safety-critical software must achieve 100% statement coverage

Usually about 85% coverage; 100% for large systems is usually very hard
Statement Coverage: Example
Branch Coverage

Cover the branches in a program

A branch is considered executed when both (all) outcomes are executed

Also called decision coverage (multiple-condition coverage is a different, stronger criterion)
Control Flow Graph
How many test cases are needed to achieve full branch coverage?
Branch Coverage: Example
Branch Coverage: Example
An untested flow of data from an assignment to a use of the assigned value could hide an erroneous computation
Even though we have 100% statement and branch coverage
Data Flow Coverage

Cover all def-use pairs in a program

Def: write to a variable

Use: read of a variable

A use u and a def d are paired when d is the most recent write to the variable before u in some execution
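A small sketch of why def-use pairs matter. The method and its inputs are hypothetical, not taken from the slides. Two tests reach every statement and every branch outcome, yet the pair (second def of x, use of x in the division) is never exercised, and that pair is the one that fails.

```java
class DefUseGap {
    static int scale(boolean reset, boolean divide) {
        int x = 1;            // def d1 of x
        if (reset) {
            x = 0;            // def d2 of x
        }
        int y = 10;
        if (divide) {
            y = y / x;        // use u of x
        }
        return y;
    }

    public static void main(String[] args) {
        // These two tests give 100% statement and branch coverage.
        System.out.println(scale(true, false));   // covers d2, but not u
        System.out.println(scale(false, true));   // covers the pair (d1, u)
        // The pair (d2, u) is only exercised here, and it divides by zero.
        try {
            scale(true, true);
        } catch (ArithmeticException e) {
            System.out.println("uncovered def-use pair fails: " + e.getMessage());
        }
    }
}
```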
Data Flow Coverage

Formula: # of def-use pairs executed / # of def-use pairs in the program

Not easy to locate all def-use pairs

Easy within a procedure (inside a method)

Very difficult across procedures

Consider the write to a field variable in one method, and the read of it in another method
Path coverage

The strongest code coverage criterion

Try to cover all possible execution paths in a program

Covers all previous coverage criteria?

Usually not feasible

Exponentially many paths in acyclic programs

Infinitely many paths in some programs with loops
Path coverage

N conditions

$2^N$ paths

Many are not feasible

e.g., L1L2L3L4L6
X = 0 => L1L2L3L4L5L6
X = -1 => L1L3L4L6
X = -2 => L1L3L4L5L6
Control Flow Graph
How many paths?
How many test cases are needed to cover them?
Path coverage, not enough

main() {
    int x, y, z, w;
    read(x);
    read(y);
    if (x != 0)
        z = x + 10;
    else
        z = 1;
    if (y > 0)
        w = y / z;    // z is 0 when x == -10
    else
        w = 0;
}

• Test Requirements:
– 4 paths
• Test Cases
– (x = 1, y = 22)
– (x = 0, y = 10)
– (x = 1, y = -22)
– (x = 0, y = -10)
• We are still not exposing the fault!
• Faulty if x = -10 (and y > 0)
– Structural coverage cannot reveal this error
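The program above can be ported to Java to check the claim. In this sketch the read(x) and read(y) calls are replaced by method parameters, and w is returned so it can be observed; both are assumptions made for illustration. Four inputs that take all four paths run cleanly, while x = -10 with a positive y divides by zero.

```java
class PathCoverageGap {
    // Java port of the slide's program; inputs are passed as parameters.
    static int compute(int x, int y) {
        int z;
        int w;
        if (x != 0) {
            z = x + 10;
        } else {
            z = 1;
        }
        if (y > 0) {
            w = y / z;   // z == 0 when x == -10
        } else {
            w = 0;
        }
        return w;
    }

    public static void main(String[] args) {
        // One input per path: (then, then), (else, then), (then, else), (else, else).
        int[][] pathTests = { {1, 22}, {0, 10}, {1, -22}, {0, -10} };
        for (int[] t : pathTests) {
            System.out.println(compute(t[0], t[1]));
        }
        try {
            compute(-10, 5);
        } catch (ArithmeticException e) {
            System.out.println("fault exposed only by x = -10: " + e.getMessage());
        }
    }
}
```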
Code Coverage

Questions

Statement coverage and basic block coverage: are they the same?

Branch coverage (cover all edges in a control flow graph): is it the same as basic block coverage?
Method coverage

So far, all examples are intra-method

Quite useful in unit testing

It is very hard to achieve 100% statement coverage in system testing

Need higher-level code elements

Method coverage

Similar to statements

Node coverage : method coverage

Edge coverage : method invocation coverage

Path coverage : stack trace coverage
Method coverage
Code coverage: summary

Coverage of code elements and their connections

Node coverage: Class/method/statement/predicate coverage

Edge coverage: Branch/Dataflow/MethodInvocation

Path coverage: Path/UseDefChain/StackTrace
Code coverage: limitations

Not enough

Some bugs cannot be revealed even with full path coverage

Cannot reveal bugs due to missing code
Code coverage: practice

Though not perfect, code coverage is the most widely used technique for test evaluation

Also used to measure progress made in testing

The criteria used in practice are mainly:

Method coverage

Statement coverage

Branch coverage

Loop coverage with heuristic (0, 1, many)
Code coverage: practice

Far from perfect

The commonly used criteria are the weakest, recall our examples

A lot of corner cases can never be found (they are not really corner cases if they are merely missed by statement coverage)

100% code coverage is rarely achieved

Mature commercial software products are released with 85% to 90% statement coverage

Some commercial software products are released with around 60% statement coverage

Many open source software projects are even lower than 50%
Input Combination Coverage

Basic idea

Originates from the most straightforward idea

In theory, achieving 100% coverage is a proof of 100% correctness

In practice, feasible only on very trivial cases

Main problems

Combinations are exponential

Possible values are infinite
Input Combination Coverage

An example on a simple automatic sales machine

Accepts only a single $1 bill at a time, and all beverages are $1

Coke, Sprite, Juice, Water

Icy or normal temperature

Want receipt or not

All combinations = 4*2*2 = 16 combinations

Trying all 16 combinations will make sure the system works correctly
Input Combination Coverage

Sales Machine Example

Input 1      Input 2      Input 3
Coke         Icy          Receipt
Sprite       Normal       No-Receipt
Juice
Water
Combination Explosion

Combinations are exponential in the number of inputs

Consider an annual tax report system with 50 yes/no questions to generate a customized form for you

$2^{50}$ combinations = about $10^{15}$ test cases

Running 1000 test cases per second -> about 30,000 years
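The estimate can be checked with a short calculation. The only assumption added here is the length of a year, taken as about $3.15 \times 10^{7}$ seconds.

```latex
\begin{align*}
2^{50} &= 1{,}125{,}899{,}906{,}842{,}624 \approx 1.13 \times 10^{15} \text{ test cases} \\
\frac{1.13 \times 10^{15} \text{ tests}}{10^{3} \text{ tests/s}} &\approx 1.13 \times 10^{12} \text{ s} \\
\frac{1.13 \times 10^{12} \text{ s}}{3.15 \times 10^{7} \text{ s/year}} &\approx 3.6 \times 10^{4} \text{ years}
\end{align*}
```

This is the same order of magnitude as the 30,000 years quoted above, which follows from rounding $2^{50}$ down to $10^{15}$.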
Observation

When there are many inputs, a relationship among inputs usually involves only a small number of them

The previous example: maybe icy applies only to coke and sprite, but receipt is independent
Example of Tax Report

Input 1: Family combined report or Single report

Input 2: Home loans or not

Input 3: Receive gift or not

Input 4: Age over 60 or not

…

Input 1 is related to all other inputs

Other inputs are independent of each other
Studies

A long-term study from NIST (National Institute of Standards and Technology)

A combination width of 4 to 6 is enough for detecting almost all errors
N-wise coverage

Coverage on N-wise combinations of the possible values of all inputs
Example: 2-wise combinations

(coke, icy), (sprite, icy), (water, icy), (juice, icy)

(coke, normal), (sprite, normal), …

(coke, receipt), (sprite, receipt), …

(coke, no-receipt), (sprite, no-receipt), …

(icy, receipt), (normal, receipt)

(icy, no-receipt), (normal, no-receipt)

20 combinations in total

With full (3-wise) combination we had 16 combinations, now we have 20. Did it get worse?
N-wise coverage

Note: One test case may cover multiple N-wise combinations

E.g., (Coke, Icy, Receipt) covers 3 2-wise combinations

(Coke, Icy), (Coke, Receipt), (Icy, Receipt)

100% N-wise coverage will fully cover 100% (N-1)-wise coverage, is this true?

For k Boolean inputs

Full combination coverage = $2^k$ combinations: exponential

Full n-wise coverage = $2^n \cdot k(k-1)\cdots(k-n+1)/n!$ combinations: polynomial in k; for 2-wise combinations, $2k(k-1)$
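The counting formula for k Boolean inputs can be evaluated directly. A minimal sketch, with a hypothetical class and method name, that computes $2^n$ times "k choose n" and compares the 2-wise count with the full combination count for k = 50:

```java
class NWiseCount {
    // Number of n-wise value combinations over k Boolean inputs: 2^n * C(k, n).
    static long nWiseCombinations(int k, int n) {
        long choose = 1;
        for (int i = 1; i <= n; i++) {
            // After step i, choose == C(k - n + i, i), so the division is exact.
            choose = choose * (k - n + i) / i;
        }
        return (1L << n) * choose;
    }

    public static void main(String[] args) {
        System.out.println(nWiseCombinations(50, 2));   // 2*k*(k-1) = 4900
        System.out.println(nWiseCombinations(50, 50));  // full combination: 2^50
    }
}
```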
N-wise coverage: Example

How many test cases for 100% 2-wise coverage of our sales machine example?

(coke, icy, receipt), covers 3 new 2-wise combinations

(sprite, icy, no-receipt), cover 3 new …

(juice, icy, receipt), covers 2 new …

(water, icy, receipt), covers 2 new …

(coke, normal, no-receipt), covers 3 new …

(sprite, normal, receipt), cover 3 new …

(juice, normal, no-receipt), covers 2 new …

(water, normal, no-receipt), covers 2 new …

8 test cases cover all 20 2-wise combinations
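The claim can be checked mechanically. The sketch below enumerates every 2-wise combination for the sales machine and confirms that the eight test cases listed above cover all of them. The class name and the string encoding of a pair are illustrative choices.

```java
import java.util.HashSet;
import java.util.Set;

class PairwiseCheck {
    static final String[][] DOMAINS = {
        {"coke", "sprite", "juice", "water"},
        {"icy", "normal"},
        {"receipt", "no-receipt"}
    };

    // The eight test cases from the slide.
    static final String[][] TESTS = {
        {"coke", "icy", "receipt"},
        {"sprite", "icy", "no-receipt"},
        {"juice", "icy", "receipt"},
        {"water", "icy", "receipt"},
        {"coke", "normal", "no-receipt"},
        {"sprite", "normal", "receipt"},
        {"juice", "normal", "no-receipt"},
        {"water", "normal", "no-receipt"}
    };

    // Every value pair for every pair of inputs.
    static Set<String> allPairs() {
        Set<String> pairs = new HashSet<>();
        for (int i = 0; i < DOMAINS.length; i++)
            for (int j = i + 1; j < DOMAINS.length; j++)
                for (String a : DOMAINS[i])
                    for (String b : DOMAINS[j])
                        pairs.add(i + "=" + a + "," + j + "=" + b);
        return pairs;
    }

    // The value pairs that a set of test cases actually exercises.
    static Set<String> coveredPairs(String[][] tests) {
        Set<String> pairs = new HashSet<>();
        for (String[] t : tests)
            for (int i = 0; i < t.length; i++)
                for (int j = i + 1; j < t.length; j++)
                    pairs.add(i + "=" + t[i] + "," + j + "=" + t[j]);
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(allPairs().size());                            // 20
        System.out.println(coveredPairs(TESTS).containsAll(allPairs()));  // true
    }
}
```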
Combination Coverage in Practice

2-wise combination coverage is very widely used

Pair-wise testing

All pairs testing

Mostly used in configuration testing

Example: configuration of gcc

A lot of variables

Several options for each variable

For command line tools: add or remove an option
Input model

What happens if an input has infinite possible values?

Integer

Float

Character

String

Note: all these are actually finite, but the possible value set is too large, so they are deemed infinite

Idea: map infinite values to finite value baskets (ranges)
Input model

Equivalence class partition

Partition the possible value set of an input into several value ranges

Transform numeric variables (integer, float, double, character) to enumerated variables

Example:

int exam_score => {less than 0}, {0, 59}, {60, 69}, {70, 79}, {80, 89}, {90, 100}, {greater than 100}

char c => {a, z}, {A, Z}, {0, 9}, {other}
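A partition like the exam score example can be written as a function from the raw value to its class, after which one representative per class is enough for a test input. This sketch assumes the classes are: below 0, 0-59, 60-69, 70-79, 80-89, 90-100, and above 100. The class labels are made up for illustration.

```java
class ScorePartition {
    // Maps an exam score to the label of its equivalence class.
    static String classify(int score) {
        if (score < 0) return "below 0";
        if (score <= 59) return "0-59";
        if (score <= 69) return "60-69";
        if (score <= 79) return "70-79";
        if (score <= 89) return "80-89";
        if (score <= 100) return "90-100";
        return "above 100";
    }

    public static void main(String[] args) {
        // One representative input per class.
        int[] representatives = { -5, 30, 65, 75, 85, 95, 150 };
        for (int score : representatives) {
            System.out.println(score + " -> " + classify(score));
        }
    }
}
```

Boundary values such as -1, 0, 59, 60, 100 and 101 are natural extra representatives, since they sit where the class changes.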
Input model

Feature extraction

For string and structure inputs

Split the possible value set with a certain feature

Example:
String passwd => {contains space}, {no space}

It is possible to extract multiple features from one input

Example:
String name => {capitalized first letter}, {not}
=> {contains space}, {not}
=> {length >10}, {2-10}, {1}, {0}
One test case may cover multiple features
Input model

Feature extraction: structure input

A Word Binary Tree (Data at all nodes are strings)

Depth : integer -> partition {0, 1, 1+}

Number of leaves : integer -> partition {0, 1, <10, 10+}

Root: null / not

A node with only left child / not

A node with only right child / not

Null value data on any node / not

Root value: string -> further feature extraction

Value on the leftmost leaf: string -> further feature extraction
…
Input model

Infeasible feature combination?

Example:
String name => {capitalized first letter}, {not}
=> {contains space}, {not}
=> {length >10}, {2-10}, {1}, {0}
Length = 0 ^ contains space
Length = 0 ^ capitalized first letter
Length = 1 ^ contains space ^ capitalized first letter
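Infeasible combinations can be filtered out before test generation. The sketch below encodes the three constraints listed above for the String name features and counts how many of the 4*2*2 = 16 feature combinations remain. The class name and the string labels for the length classes are illustrative.

```java
class NameFeatureCombos {
    // Returns false for the feature combinations no real string can satisfy.
    static boolean feasible(String length, boolean capitalized, boolean hasSpace) {
        if (length.equals("0")) {
            // An empty string has no first letter and no space.
            return !capitalized && !hasSpace;
        }
        if (length.equals("1")) {
            // A single character cannot be both a capital letter and a space.
            return !(capitalized && hasSpace);
        }
        return true;
    }

    static int countFeasible() {
        String[] lengths = { "0", "1", "2-10", ">10" };
        boolean[] flags = { true, false };
        int count = 0;
        for (String length : lengths)
            for (boolean capitalized : flags)
                for (boolean hasSpace : flags)
                    if (feasible(length, capitalized, hasSpace)) count++;
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countFeasible() + " of 16 combinations are feasible"); // 12 of 16
    }
}
```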
Input combination coverage

Summary:

Try to cover the combination of possible values of inputs

Exponential combinations:

N-wise coverage

2-wise coverage is most popular, all pairs testing

Infinite possible values:

Input partition

Input feature extraction

Coverage is usually 100% once adopted

It is easy to achieve, compared with code coverage

Models are not easy to write
Specification Coverage

A type of input coverage

Covers the written formal specification in the requirement document

Example

When a number smaller than 0 is fed in, the system should report error => testcase: -1

Sometimes can be a sequence of inputs

When you input the correct user name, a passwd prompt is shown; after you input the correct passwd, the user profile will be shown, …
=> testcase: xiaoyin, xxxxx, …
Specification Coverage

Widely used in industry

Advantages

Targets the specification

No need for writing oracles

Usually can achieve 100% coverage

Disadvantages

Very hard to automate

Can only be automated with formal specifications

No guarantee to be complete

Quality highly depends on the specification
Test coverage

So far, covering inputs and code

The final goal of testing

Find all bugs in the software

So there should be a bug coverage

The coverage that best represents the adequacy of a test suite

50% bug coverage = half done!

100% bug coverage = done!

But it is impossible

Bugs are unknown

Otherwise we do not need testing

So we have the number of bugs found, but we do not know what to divide it by
One possible solution

Estimation

1-10 bugs in 1 KLOC

Depends on the type of software and the stage of development, imprecise

When you find many bugs, do you think all bugs have been found, or that the code is really of low quality?
Mutation coverage

How can we know how many bugs there are in the code?

If only we plant those bugs!

Mutation coverage checks the adequacy of a test suite by how many human-planted bugs it can expose
Concepts

Mutant

A software version with planted bugs

Usually each mutant contains only one planted bug, why?

Mutant Kill

Given a test suite S and a mutant m, if there is a test case t in S such that execute(original, t) != execute(m, t), we say that S can kill m

Basically, a test suite killing a mutant means that the test suite is able to detect the planted bug represented by the mutant
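Illustration
[Figure: the test cases are run on the original program and on Mutant 1, Mutant 2, ..., Mutant n; each mutant's results are compared with the original's results (the oracles). Same results: the mutant survived. Different results: the mutant is killed.]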
Concepts

Mutation coverage = # of mutants killed / # of mutants generated
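The kill relation and the coverage ratio can be sketched in a few lines. Here the original is z = x*y + 1 and the three mutants are hand-written variants of it; a real tool would generate mutants automatically. The class and method names are hypothetical.

```java
import java.util.function.IntBinaryOperator;

class MutationScore {
    static final IntBinaryOperator ORIGINAL = (x, y) -> x * y + 1;

    // Hand-written mutants of the original expression.
    static final IntBinaryOperator[] MUTANTS = {
        (x, y) -> x * y - 1,   // arithmetic operator replaced
        (x, y) -> x + y - 1,   // arithmetic operators replaced
        (x, y) -> x * x + 1    // variable replaced
    };

    // A mutant is killed if some test input makes its result differ from the original's.
    static boolean kills(int[][] tests, IntBinaryOperator mutant) {
        for (int[] t : tests) {
            if (ORIGINAL.applyAsInt(t[0], t[1]) != mutant.applyAsInt(t[0], t[1])) {
                return true;
            }
        }
        return false;
    }

    // mutation coverage = # of mutants killed / # of mutants generated
    static double score(int[][] tests) {
        int killed = 0;
        for (IntBinaryOperator mutant : MUTANTS) {
            if (kills(tests, mutant)) killed++;
        }
        return (double) killed / MUTANTS.length;
    }

    public static void main(String[] args) {
        // With x == y the third mutant computes the same value and survives.
        System.out.println(score(new int[][] { {2, 2} }));
        // Adding an input with x != y kills it as well.
        System.out.println(score(new int[][] { {2, 2}, {2, 3} }));
    }
}
```

The first suite kills two of the three mutants; the second kills all three, so a higher score reflects inputs that distinguish more planted bugs.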
Mutant generation

Traditional mutation operators

Statement deletion

Replace Boolean expression with true/false

Replace arithmetic operators (+, -, *, /, …)

Replace comparison relations (>=, ==, <=, !=)

Replace variables

…
Mutation Example: Operator

Mutation operator                      In original     In mutant
Statement deletion                     z=x*y+1;        (statement removed)
Boolean expression to true | false     if (x<y)        if(true)      if(false)
Replace arithmetic operators           z=x*y+1;        z=x*y-1       z=x+y-1
Replace comparison operators           if(x<y)         if(x<=y)      if(x==y)
Replace variables                      z=x*y+1;        z = z*y+1     z = x*x+1
Mutant generation

Object-oriented mutation operators

Insert/Delete overriding method

Add/delete “this”

Instantiation as child class

Cast to subtype

…
Mutation Example: Object-Oriented

Insert/Delete overriding method

Original:
class Shape{
    public void setID(String id){
        this.id = id;
    }
    public void draw(){
        ...
    }
}
class Circle extends Shape{
    public void draw(){
        ...
    }
}

Mutant 1 (insert overriding method): Circle overrides setID with an empty body
class Shape{
    public void setID(String id){
        this.id = id;
    }
    public void draw(){
        ...
    }
}
class Circle extends Shape{
    public void setID(String id){
    }
    public void draw(){
        ...
    }
}

Mutant 2 (delete overriding method): draw is removed from Circle; Shape.draw appears as protected in this column of the slide
class Shape{
    public void setID(String id){
        this.id = id;
    }
    protected void draw(){
        ...
    }
}
class Circle extends Shape{
}
Problems of mutation testing

Large amount of time overhead

Need to run the test suite over a large number of mutants

Causes extra burden for collecting test coverage

Equivalent mutants

A mutant that will not affect the behavior of the software
Time overhead

For n mutants, requires n times the overhead

How to reduce time overhead?

Reuse execution info

Early rule out

Mutants that are not covered

Mutants that cannot be killed
Reduce Time Overhead

original:
int index = read;
while (…)
{
    …;
    index++;
    if (index == 10) {
        break;
    }
}
return value > 0;

m1: same as original except the last statement
return value < 0;
Reuse the program states before the return statement

m2: same as original except the body of the if
if (index == 10) {
    return true;
}
If index reads 20, the mutant is not covered

m3: same as original except the last statement
return value +1 >0;
If value is not 0, nothing is changed
Equivalent mutants

Another main problem in mutation coverage is equivalent mutants

A mutant is an equivalent mutant if its semantics is identical to that of the original software

Original:
int index = 0;
while (…)
{
    …;
    index++;
    if (index == 10) {
        break;
    }
}

=> Mutant:
int index = 0;
while (…)
{
    …;
    index++;
    if (index >= 10) {
        break;
    }
}
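Because index starts at 0 and grows by exactly 1, it reaches 10 before it can exceed 10, so == and >= break at the same iteration. The sketch below checks this on sample inputs. The parameter limit is a hypothetical stand-in for the elided loop condition, and agreement on samples is evidence of equivalence, not a proof.

```java
class EquivalentMutantDemo {
    // Original: leaves the loop when index == 10.
    static int original(int limit) {
        int index = 0;
        while (index < limit) {
            index++;
            if (index == 10) {
                break;
            }
        }
        return index;
    }

    // Mutant: == replaced by >=.
    static int mutant(int limit) {
        int index = 0;
        while (index < limit) {
            index++;
            if (index >= 10) {
                break;
            }
        }
        return index;
    }

    // True if both versions return the same value for every limit in 0..maxLimit.
    static boolean agreeUpTo(int maxLimit) {
        for (int limit = 0; limit <= maxLimit; limit++) {
            if (original(limit) != mutant(limit)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agreeUpTo(1000)); // true: no test input can kill this mutant
    }
}
```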
Equivalent mutants

Equivalent mutants cause mutation coverage to never reach 100%

So you do not know whether there are too many equivalent mutants, or the test suite is not adequate
Reduce equivalent mutants

Using compiler optimization

Check whether the compiled bytecode is the same as that of the original software

Mutating dead code

Mutating unused variable

After the mutated code, write a conditional path, and check whether the path is feasible

//result = a + b;
result = a - b;
=>
//result = a + b;
result = a - b;
if(a + b != a - b){
    not equivalent;
}
Mutation testing tools

MILU
http://www0.cs.ucl.ac.uk/staff/Y.Jia/#tools

MuJava
http://cs.gmu.edu/~offutt/mujava/

Javalanche
https://github.com/david-schuler/javalanche/
Summary on all coverage measures

Code coverage

Target: code

Adequacy: no -> 100% code coverage != no bugs

Approximation: dataflow, branch, method/statements

Usability: medium (requires code for instrumentation)

Preparation: none

Overhead: low (instrumentation causes some overhead)
Summary on all coverage measures

Input combination coverage

Target: inputs

Adequacy: yes -> 100% input coverage == no bugs

Approximation: n-wise coverage, input partition, input feature extraction

Usability: none

Preparation: hard (requires equivalence class partition)

Overhead: none
Summary on all coverage measures

Mutation coverage

Target: bugs

Adequacy: no -> 100% mutant coverage != no bugs

Approximation: mutation is already approximation

Usability: medium (requires code changes for mutants)

Preparation: none

Overhead: very high (execution on instrumented mutated versions)