Testing and Verification
Methods for Many-Core
Concurrency
Part 2: Metamorphic Testing for
(Graphics) Compilers
Halmstad Summer School on Testing, 2016
Alastair Donaldson, Imperial College London
www.doc.ic.ac.uk/~afd
Importance of compiler reliability
Everyone depends on reliable compilers
Formal verification folks (like me!) often implicitly
assume compiler reliability
Two successful methods for compiler testing
Random differential testing
Equivalence modulo inputs testing
Random differential testing
Generate random programs
Try them with many compilers
Result mismatches indicate bugs
[Diagram: Random.c is compiled and run by gcc, clang, the Intel compiler and the Microsoft compiler, yielding 24, 42, 24 and 24 respectively; clang's 42 is a mismatch]
Pioneered by Csmith, University of Utah (PLDI’11)
Compiler testing and undefined behaviour
[Diagram: Random.c yields 24 under gcc but 42 under clang]
The mismatch is not erroneous if Random.c exercises undefined behaviour
If an execution exercises undefined behaviour, the entire execution has no meaning
Any result is acceptable
Example: saturating add
Let’s try to implement saturating addition for signed
integers in C
x + y is clamped to an extreme value if it falls outside
the signed integer range
int saturating_add(int x, int y) {
  if(x > 0 && y > 0 && x + y < 0)
    return INT_MAX;
  if(x < 0 && y < 0 && x + y > 0)
    return INT_MIN;
  return x + y;
}
The compiler’s “thought process”
“I will assume this program does not exhibit undefined behaviours, because if it does then it matters not what code I emit.”
int saturating_add(int x, int y) {
  if(x > 0 && y > 0 && x + y < 0)
    return INT_MAX;
  if(x < 0 && y < 0 && x + y > 0)
    return INT_MIN;
  return x + y;
}
“I know x + y does not overflow: this would be a UB. So if x and y are positive, x + y must be positive. The first condition is equivalent to false. Excellent!!”
“By similar reasoning, the second condition is equivalent to false.”
The compiler’s “thought process”
“I can simplify the program to:”
int saturating_add(int x, int y) {
  if(false)
    return INT_MAX;
  if(false)
    return INT_MIN;
  return x + y;
}
“Or better still, to:”
int saturating_add(int x, int y) {
  return x + y;
}
Full marks, compiler
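For contrast, a saturating add can be written without undefined behaviour by testing against the limits before performing the addition. This is my own minimal sketch, not from the slides:

```c
#include <limits.h>

/* Sketch of a saturating add that avoids signed-overflow UB by
   comparing against the limits *before* performing the addition. */
int saturating_add_safe(int x, int y) {
    if (x > 0 && y > INT_MAX - x)   /* x + y would exceed INT_MAX */
        return INT_MAX;
    if (x < 0 && y < INT_MIN - x)   /* x + y would fall below INT_MIN */
        return INT_MIN;
    return x + y;                   /* now guaranteed not to overflow */
}
```

Because the overflow checks no longer depend on computing an overflowed result, the compiler has no licence to discard them.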
Impact of undefined behaviour on
program analysers
Many program analysis tools re-use compiler front-ends: this saves enormous front-end development effort
Two examples:
• GPUVerify uses Clang to translate OpenCL kernels into LLVM bitcode; GPUVerify then works on the LLVM bitcode
• SMACK translates LLVM bitcode into the Boogie verification language. A successful C-based analyser uses Clang to convert C to LLVM bitcode, and then applies SMACK
Problem: the compiler front-end can generate arbitrary
code for inputs that exhibit UBs
Example: GPUVerify and UBs
GPUVerify says that this kernel is race-free
kernel void foo(global int * A) {
  if(2/0 == 2) {
    A[0] = get_global_id(0);
  }
}
Reason: the Clang front-end regards the write to A as
unreachable, since it would require a UB in order to be
reached
Lesson: relying on compiler front-ends that exploit UBs
can render a would-be sound tool unsound
(In this case Clang does at least warn about the UB)
Equivalence modulo inputs testing
Le et al. PLDI’14
Take a program P and an input I. Using a profiler, partition P’s statements into those covered by I and those not covered by I.
[Diagram: from P, generate variants P1, P2, P3, … differing from P only in the statements not covered by I. Compile each variant with a single compiler and execute it on I; e.g. outputs 24, 42, 24, 24, where the 42 is a mismatch]
Mismatches indicate bugs
An example of metamorphic testing
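The idea can be illustrated with a toy example in plain C (the names are mine, not from the paper): for the input 5, the else-branch of P is not covered, so a variant that mutates only that branch is equivalent to P modulo the input 5.

```c
/* Original program under test. */
int P(int i)  { if (i > 0) return i * 2; else return i - 1; }

/* EMI variant: differs from P only in a statement not covered by
   the profiled input i = 5, so P and P2 must agree on that input. */
int P2(int i) { if (i > 0) return i * 2; else return i + 999; }
```

Compiling P and P2 with the same compiler and comparing their outputs on the profiled input cross-checks the compiler against itself; on other inputs the two programs are free to differ.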
Many-Core Compiler Fuzzing – PLDI’15
We applied C compiler fuzzing methods in the OpenCL domain:
- Random differential testing [Yang et al., PLDI’11]
- Equivalence modulo inputs testing [Le et al., PLDI’14]
Assess quality of industrial OpenCL compilers:
- We tested 21 OpenCL (device, compiler) configurations
- Discovered 50+ compiler bugs, from a range of vendors
- Many are now fixed
OpenCL uses runtime compilation; the compiler is a core component of the device driver
Lifting EMI to OpenCL
EMI requires input-dependent code
Typical OpenCL kernels do not contain much input-dependent code
Our idea: manufacture input-dependent code (applicable beyond OpenCL)
Dead-by-construction code injection
Take an existing kernel… add an extra input… inject conditional code
At runtime: DEAD = [0,1,2,3,…]
kernel void BFS(global int *q1, global int *q2, global int *DEAD)
{
  if(get_local_id(0) == 0){
    *local_q_tail = 0; // initialize the tail of w-queue
  }
  barrier(CLK_LOCAL_MEM_FENCE);
  if(DEAD[43] < DEAD[21]) {  // injection is dead by construction
    int tid = get_global_id(0);
    if(tid < no_of_nodes)
    {
      int pid = q1[tid];
      // ARBITRARY CODE:
      g_color[pid] = BLACK;
      int cur_cost = g_cost[pid];
      struct Node cur_node = g_graph_nodes[pid];
      for(int i = cur_node.x; i < cur_node.y + cur_node.x; i++) {
        struct Edge cur_edge = g_graph_edges[i];
        int id = cur_edge.x;
        …
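The same trick can be sketched in plain C (the function and array names below are illustrative, not those of the real injector):

```c
/* 'dead' holds 0,1,2,... at runtime, so dead[43] < dead[21] is always
   false, but the compiler sees only an opaque array parameter and
   cannot prove this, so it must keep the injected code. */
enum { DEAD_SIZE = 64 };

int dead_values[DEAD_SIZE];

void init_dead(void) {
    for (int i = 0; i < DEAD_SIZE; ++i)
        dead_values[i] = i;
}

int original(int x) { return x * 2; }

int with_injection(int x, const int *dead) {
    int r = x * 2;
    if (dead[43] < dead[21]) {   /* false at runtime: dead[i] == i */
        r = -1;                  /* arbitrary injected code, never executed */
    }
    return r;
}
```

On the real DEAD values the injected variant must behave exactly like the original; any observable difference is a compiler bug.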
Detour: metamorphic testing
EMI is an instance of metamorphic testing
T.Y. Chen, S.C. Cheung, and S.M. Yiu, "Metamorphic testing: a
new approach for generating next test cases", Technical Report
HKUST-CS98-01, Department of Computer Science, Hong
Kong University of Science and Technology, Hong Kong (1998).
Metamorphic testing
Suppose we want to implement sin(x) efficiently
Imagine we do not have a reference
implementation (oracle) to check against
If our implementation says that sin(x) = y, how do
we know whether this is correct?
Metamorphic testing: example
Property of sin: for all x, sin(x) = sin(x + 2π)
If we find sin(x) ≠ sin(x + 2π), our implementation
must be broken
x → x + 2π is a metamorphic transformation for
sin
Generally, suppose:
- we are trying to implement f
- we know that f(g(x)) = f(x) should hold for all x (g is a metamorphic transformation)
Then we can test f(g(x)) = f(x) for our implementation
Mismatches indicate bugs
Metamorphic compiler testing
Let’s view compiling and running a program as a function:
  compileAndRun(P, I): returns the output generated by P when compiled and run on input I
If we fix input I, we get a function over programs:
  compileAndRunOnI(P): returns the output generated by P when compiled and run on input I
If program transformation f preserves the semantics of P for input I, we have:
  compileAndRunOnI(P) = compileAndRunOnI(f(P))
So f is a metamorphic transformation
Metamorphic testing with identity functions
Replacing program expression e with (e+0) should not affect program behaviour
We have applied the identity function: it is a metamorphic transformation
But surely the compiler will replace (e+0) with e…
Manufacturing zeros
Recall: DEAD = [0,1,2,3,…] at runtime
Allows us to manufacture zero in a manner that
is opaque to the compiler
E.g.: DEAD[0]
DEAD[1] + DEAD[2] - DEAD[3]
DEAD[0]*(DEAD[41]+DEAD[42])
etc.
Using a manufactured zero
Instead of replacing e with (e + 0), replace e with (e + manufactured_zero),
e.g. (e + (DEAD[1] + DEAD[2] - DEAD[3]))
Compiler is oblivious to contents of DEAD
Lots of identity functions we can try:
- e + manufactured_zero
- e - manufactured_zero
- e * manufactured_one
- e / manufactured_one
- rotate(rotate(e, identity(x)), identity(-x))  (e must not modify x)
- clamp(e, identity(e), identity(e))  (e must be side-effect free)
Many other ways to encode identity function
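A plain-C sketch of applying such identities through opaque values (the array and function names are illustrative):

```c
/* 'opaque' holds 0,1,2,... at runtime; the compiler sees only an
   opaque pointer, so it cannot fold the transformed expression
   back to e. */
static int opaque_vals[8];

static void init_opaque(void) {
    for (int i = 0; i < 8; ++i)
        opaque_vals[i] = i;
}

/* Semantically the identity function on e:
   (e + manufactured_zero) * manufactured_one. */
int transformed(int e, const int *opaque) {
    int zero = opaque[1] + opaque[2] - opaque[3];  /* 1 + 2 - 3 == 0 */
    int one  = opaque[1];                          /* == 1 */
    return (e + zero) * one;
}
```

Any input on which the transformed expression disagrees with e exposes a miscompilation.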
Metamorphic testing using opaque values
We have looked at using opaque values (DEAD) to:
- inject dead-by-construction code
- apply identity functions to expressions
Opaque values prevent the compiler from optimising
away the compiled code
Many further metamorphic transformations are enabled
based on opaque value injection
Our hypotheses related to metamorphic
compiler testing
(1) Opaque value injection is a general idea:
easy to apply to practically any language
Start of OpenGL compiler testing demo
Let’s look at opaque value injection for GLSL
and kick off some metamorphic testing
Our hypotheses related to metamorphic
compiler testing
(2) Test case reduction is easy
Test case reduction
[Diagram: starting from an original program taken from a high-value application, with inputs known to be important, metamorphic injections produce Variant 1, Variant 2, …, Variant n. Executing the original gives the expected output; executing each variant gives Output 1, Output 2, …, Output n, which are compared against the expected output]
Test case reduction
[Diagram: given a mismatching Variant i, repeatedly undo an injection to obtain a reduced variant; execute it and compare against the expected output. If the outputs still mismatch, keep the reduction; if they match, re-try with a different injection undone. The end result is a minimal variant exposing the bug]
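A greedy sketch of this reduction loop in plain C, where a stand-in replaces "compile and run" and a single hypothetical injection (number 2) triggers the bug:

```c
#define N_INJECTIONS 4

/* Stand-in for "compile and run the variant": the miscompilation
   manifests exactly when (hypothetical) injection 2 is enabled. */
int run_variant(const int enabled[N_INJECTIONS], int expected) {
    return enabled[2] ? expected + 1 : expected;
}

/* Greedily undo injections while the mismatch persists; returns the
   number of injections left enabled in the minimal variant. */
int reduce(int enabled[N_INJECTIONS], int expected) {
    for (int i = 0; i < N_INJECTIONS; ++i) {
        if (!enabled[i]) continue;
        enabled[i] = 0;                           /* try undoing injection i */
        if (run_variant(enabled, expected) == expected)
            enabled[i] = 1;                       /* mismatch gone: put it back */
    }
    int count = 0;
    for (int i = 0; i < N_INJECTIONS; ++i) count += enabled[i];
    return count;
}
```

Reduction is easy here because every injection can be undone independently; the real reducer works the same way on actual variants.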
Our hypotheses related to metamorphic
compiler testing
(3) The approach may be good at finding “bugs
that matter”
Testing framework
From a high-value application, via inject / test / minimize, we get a minimal diff to the high-value application, exposing a compiler bug
Our hypotheses related to metamorphic
compiler testing
(4) The approach is well-suited to testing compilers
for language features where implementation
differences are allowed
GLSL spec says:
“Any denormalized value … can be flushed to 0. The rounding mode
cannot be set and is undefined. NaNs are not required to be
generated.”
“Without any [precision] qualifiers, implementations are permitted to
perform such optimizations that effectively modify the order or number
of operations used to evaluate an expression, even if those
optimizations may produce slightly different results relative to
unoptimized code."
OpenGL testing demo again
Summary
Compilers, like any other software, can be buggy
Random differential testing involves cross-checking
compilers
RDT requires well-defined programs
Undefined behaviour is a complex topic that has
implications for program analysis
Equivalence modulo inputs testing involves cross-checking equivalent programs
EMI testing is an example of metamorphic testing
We have applied metamorphic testing to GLSL compilers through semantics-preserving transformations
This offers a number of advantages over other approaches
Open problems include:
How does metamorphic testing compare with
fuzzing?
Are we finding “bugs that matter”?
Can we guide the mutation process, to avoid the
duplicate bug problem?
Can we predict duplicate bugs according to how the
program output changes?
Can we use these techniques to test program
analysis tools?
Not everyone is keen on compiler fuzzing!
Derek Jones, The Shape of Code:
http://shape-of-code.coding-guidelines.com/tag/compiler-fault/
“When compiler fuzzers started to attract a lot of attention, a
few years ago, the teams working on gcc and llvm were quick
to react and fix many of the reported bugs (Csmith is the fuzzer
that led the way).”
“These days finding bugs in compilers using fuzzing is old news
and I suspect that the teams working on gcc and llvm don’t
need to bother too much about new academic papers claiming
that the new XYZ technique or tool finds lots of compiler bugs.”
“In fact I would suggest that compiler developers stop
responding to researchers working toward publishing papers on
these techniques/tools; responses from compiler maintainers is
becoming a metric for measuring the performance of
techniques/tools, so responding just encourages the trolls.”
(my emphasis)
Key papers
Recommended reading on (state-of-the-art) compiler
testing:
Xuejun Yang, Yang Chen, Eric Eide, John Regehr. “Finding and understanding
bugs in C compilers”. PLDI 2011.
Vu Le, Mehrdad Afshari, Zhendong Su. “Compiler Validation via Equivalence
Modulo Inputs”. PLDI 2014.
John Regehr, Yang Chen, Pascal Cuoq, Eric Eide, Chucky Ellison, Xuejun Yang.
“Test-case reduction for C compiler bugs”. PLDI 2012.
Our recent contributions to this field:
Christopher Lidbury, Andrei Lascu, Nathan Chong, Alastair F. Donaldson. “Many-Core Compiler Fuzzing”. PLDI 2015.
Alastair F. Donaldson, Andrei Lascu. “Metamorphic Testing for (Graphics)
Compilers”. MET 2016.