Creating Formally Verified Components for
Layered Assurance with an LLVM-to-ACL2
Translator
Jennifer Davis, David Hardin, Jedidiah McClurg
December 2013
© 2013 Rockwell Collins
All rights reserved.
Introduction
• Research objectives:
– Reduce need to trust compiler optimizations by reasoning about
post-optimization intermediate representations
– Reason about small fragments of code written in assembly
language
• We wish to create a library of formally verified software
component models for layered assurance.
• In our current work, the components are in LLVM intermediate
form.
• We wish to translate them to a theorem proving language, such
as ACL2.
• We built an LLVM-to-ACL2 translator
Motivating Work
• Jianzhou Zhao (U Penn) et al. produced several different
formalizations of operational semantics for LLVM in Coq. (2012)
– Intention to produce a verified LLVM compiler
• Magnus Myreen’s (Cambridge) “decompilation into logic” work
(2009)
– Imperative machine code (PPC, x86, ARM) -> HOL4
– Extracts functional behavior of imperative code
– Assures decompilation process is sound
• Andrew Appel (Princeton) observed that “SSA is functional
programming” (1998)
– This inspired us to build a translator from LLVM to ACL2
LLVM
• LLVM is the intermediate form for many common compilers,
including clang.
• LLVM code generation targets exist for a wide variety of machines
• LLVM is a register-based intermediate representation in Static Single
Assignment (SSA) form (each variable is assigned exactly once).
• An LLVM program consists of a list of entities.
– There are eight types of entity, including:
• function declarations
• function definitions
• Our software component models are created from code that has been
compiled into the LLVM intermediate form.
ACL2
• A Computational Logic for Applicative Common Lisp (ACL2)
• Highly automated theorem proving system
• Functional language with admission criteria
• Executable subset of language
• Rich set of legal identifiers (@foo, %.06, @bar_%.0)
• Side-effect free subset of Lisp, so it inherits Lisp peculiarities
– Function definition: (defun funname (parm1 parm2 parm3)
(<body>))
– Function invocation: (funname x y z)
– let binds variables to values within a function body
– Multiway conditionals use the cond form
– Lists are the fundamental data structure
– ACL2 supports integers and rationals
– Lisp predicate names are traditionally given a suffix of “p”
LLVM-to-ACL2 Translation Toolchain
[Figure: translation toolchain diagram; one labeled stage is the theorem prover]
Example—C Source
long sumarr(unsigned int n, long sum, long *array) {
    unsigned int j = 0;
    for (j = 0; j < n; j++) {
        sum += array[j];
    }
    return sum;
}
• We can produce LLVM from C source as follows:
clang -O4 -S -emit-llvm sumarr.c
Example—LLVM
C Source:
long sumarr(unsigned int n, long sum, long *array) {
unsigned int j = 0;
for (j = 0; j < n; j++) {
sum += array[j];}
return sum;}
define i64 @sumarr(i32 %n, i64 %sum, i64* nocapture %array) nounwind uwtable readonly {
  %1 = icmp eq i32 %n, 0
  br i1 %1, label %._crit_edge, label %.lr.ph
.lr.ph:                                           ; preds = %.lr.ph, %0
  %indvars.iv = phi i64 [ %indvars.iv.next, %.lr.ph ], [ 0, %0 ]   ; j
  %.06 = phi i64 [ %4, %.lr.ph ], [ %sum, %0 ]                     ; sum
  %2 = getelementptr inbounds i64* %array, i64 %indvars.iv
  %3 = load i64* %2, align 8, !tbaa !0
  %4 = add nsw i64 %3, %.06
  %indvars.iv.next = add i64 %indvars.iv, 1
  %lftr.wideiv = trunc i64 %indvars.iv.next to i32
  %exitcond = icmp eq i32 %lftr.wideiv, %n
  br i1 %exitcond, label %._crit_edge, label %.lr.ph
._crit_edge:                                      ; preds = %.lr.ph, %0
  %.0.lcssa = phi i64 [ %sum, %0 ], [ %4, %.lr.ph ]
  ret i64 %.0.lcssa
}
Translation Snippet
• Each block within an LLVM function contains a list of
instructions in SSA form with type information.
• Hence we can readily convert a list of instructions into an
appropriate let construct.
%2 = getelementptr inbounds i64* %array, i64 %indvars.iv
%3 = load i64* %2, align 8, !tbaa !0
%4 = add nsw i64 %3, %.06
(let ((%2 (getelementptr %array %indvars.iv 8)))
(let ((%3 (load-i64l %2 st)))
(let ((%4 (ifix (+ %3 %.06))))
...)))
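The nesting step above can be sketched in Python. This is only an illustration, not the translator's own code: the helper name nest_lets is hypothetical, and the sketch assumes each instruction's right-hand side has already been rendered as an ACL2 expression string.

```python
def nest_lets(bindings, body):
    """Wrap `body` in one let form per SSA binding.

    `bindings` is a list of (name, acl2_expr) pairs in instruction order.
    SSA assigns each name exactly once, so a later binding may refer to an
    earlier one; nesting the lets gives exactly that scoping.
    """
    result = body
    # Build from the inside out: the last instruction is the innermost let.
    for name, expr in reversed(bindings):
        result = f"(let (({name} {expr})) {result})"
    return result


# The three instructions from the snippet above (hypothetical pre-rendered form).
bindings = [
    ("%2", "(getelementptr %array %indvars.iv 8)"),
    ("%3", "(load-i64l %2 st)"),
    ("%4", "(ifix (+ %3 %.06))"),
]
nested = nest_lets(bindings, "%4")
print(nested)
```

The body passed in ("%4" here) stands for whatever the block's terminator turns into, such as a return value or a call to another promoted block.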
Main Translator Algorithm
[Figure: translator pipeline. LLVM AST -> Translator (Get Function Names, Remove Aliases, Promote Blocks to Functions, Translate to ACL2) -> ACL2 Code]
Remove Aliases
• Aliases allow new names (e.g., @foo1 and @foo2) to be used
for globals and functions
@bar = global i32 123
@foo1 = alias i32* @bar
@foo2 = alias i32* @bar
• We eliminate these aliases, so only the original name (here, @bar) remains.
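One plausible way to eliminate aliases is to resolve each alias to the entity it ultimately names and substitute that name wherever the alias is used. The slides do not say how the translator does this, so the Python sketch below is an assumption; resolve_aliases and substitute are hypothetical helper names.

```python
def resolve_aliases(aliases):
    """Map every alias name to the non-alias entity it ultimately names.

    `aliases` maps alias name -> target name; a target may itself be an alias.
    """
    resolved = {}
    for name in aliases:
        target = aliases[name]
        seen = {name}
        # Follow the chain until we reach a name that is not an alias.
        while target in aliases:
            if target in seen:
                raise ValueError(f"alias cycle through {target}")
            seen.add(target)
            target = aliases[target]
        resolved[name] = target
    return resolved


def substitute(tokens, resolved):
    """Replace alias names in a token list with their resolved targets."""
    return [resolved.get(tok, tok) for tok in tokens]


# The aliases from the slide: @foo1 and @foo2 both name @bar.
aliases = {"@foo1": "@bar", "@foo2": "@bar"}
resolved = resolve_aliases(aliases)
rewritten = substitute(["load", "i32*", "@foo2"], resolved)
print(rewritten)
```

After substitution the alias definitions themselves can be dropped, since nothing refers to them any longer.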
Promote Blocks to Functions
• As we have seen, LLVM functions often contain inner blocks and
branch instructions
• Each of these blocks is pulled out as a new function.
• For each block, the phi instructions denote variables that
become parameters for that new function.
%.06 = phi i64 [ %4, %.lr.ph ], [ %sum, %0 ]
• The phi instructions also tell us the parameter values that must
be used at the new function’s call site(s)
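The role of the phi instructions can be sketched in Python using the %.lr.ph block of the sumarr example. The data layout and the helper names phi_parameters and call_arguments are hypothetical; the sketch covers only phi-defined names, not how the translator collects other variables the block needs.

```python
def phi_parameters(phis):
    """Names defined by a block's phi instructions become its parameters."""
    return [name for name, _incoming in phis]


def call_arguments(phis, predecessor):
    """Values a given predecessor block must pass when it calls the block."""
    return [incoming[predecessor] for _name, incoming in phis]


# Phi instructions of block %.lr.ph from the sumarr example:
# each entry is (defined name, {predecessor label: incoming value}).
lr_ph_phis = [
    ("%indvars.iv", {"%.lr.ph": "%indvars.iv.next", "%0": "0"}),
    ("%.06", {"%.lr.ph": "%4", "%0": "%sum"}),
]

params = phi_parameters(lr_ph_phis)                  # parameters of the new function
entry_args = call_arguments(lr_ph_phis, "%0")        # call from the entry block
loop_args = call_arguments(lr_ph_phis, "%.lr.ph")    # recursive call from the loop
print(params, entry_args, loop_args)
```

The entry block therefore calls the new function with 0 and %sum, while the block's call to itself passes %indvars.iv.next and %4.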
Dealing with Order of Declarations
• ACL2 requires functions and constants to be defined before
they are used
• We do a topological sort on each of the call/dependency graphs
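A minimal sketch of the ordering step, using Python's standard graphlib module (Python 3.9 or later). The names @f, @g, and @k are made up for illustration, and the sketch assumes the dependency graph has already been extracted; direct self-recursion is left out of the graph because it does not constrain definition order.

```python
from graphlib import TopologicalSorter


def definition_order(uses):
    """Return names ordered so that each name follows everything it uses.

    `uses` maps a name to the set of names its definition refers to.
    A cyclic graph raises graphlib.CycleError (a subclass of ValueError).
    """
    return list(TopologicalSorter(uses).static_order())


# Hypothetical dependency graph: @f uses @g and @k, and @g uses @k.
uses = {
    "@f": {"@g", "@k"},
    "@g": {"@k"},
    "@k": set(),
}
order = definition_order(uses)
print(order)
```

A cycle in the graph makes the sort fail, which is the case where mutually recursive definitions would have to be emitted together instead.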
Translate to ACL2
• Function declaration -> ACL2 function stub
• Function definition with instruction list -> ACL2 defun construct
with a nested let-bound expression
• Memory -> ACL2 single-threaded object (stobj) for efficient
execution
• Floating-point number -> corresponding rational number
1.234 -> (+ 1 (/ 2 10) (/ 3 100) (/ 4 1000))
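The floating-point rule can be checked with a short Python sketch that renders a decimal literal digit by digit and compares the result against an exact rational. The helper decimal_to_acl2 is hypothetical and handles only simple non-negative literals such as 1.234; how the translator treats exponents or hexadecimal float literals is not covered here.

```python
from fractions import Fraction


def decimal_to_acl2(literal):
    """Render a non-negative decimal literal as an ACL2 sum of rationals."""
    whole, _, frac_digits = literal.partition(".")
    terms = [whole]
    # The digit at position p after the decimal point contributes digit / 10**p.
    for position, digit in enumerate(frac_digits, start=1):
        terms.append(f"(/ {digit} {10 ** position})")
    return "(+ " + " ".join(terms) + ")"


expr = decimal_to_acl2("1.234")
exact = Fraction("1.234")  # the same value as an exact rational
digit_sum = 1 + Fraction(2, 10) + Fraction(3, 100) + Fraction(4, 1000)
print(expr, exact)
```

Because every term is an exact rational, the ACL2 expression denotes the written decimal value exactly, with no rounding.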
Example—ACL2
.lr.ph:                                           ; preds = %.lr.ph, %0
  %indvars.iv = phi i64 [ %indvars.iv.next, %.lr.ph ], [ 0, %0 ]
  %.06 = phi i64 [ %4, %.lr.ph ], [ %sum, %0 ]
  %2 = getelementptr inbounds i64* %array, i64 %indvars.iv
  %3 = load i64* %2, align 8, !tbaa !0
  %4 = add nsw i64 %3, %.06
  %indvars.iv.next = add i64 %indvars.iv, 1
  %lftr.wideiv = trunc i64 %indvars.iv.next to i32
  %exitcond = icmp eq i32 %lftr.wideiv, %n
  br i1 %exitcond, label %._crit_edge, label %.lr.ph
(defun @sumarr_%.lr.ph (%.06 %indvars.iv %n %array st)
  (declare (xargs :stobjs st
                  :guard (and (integerp %.06) (natp %indvars.iv) (natp %n)
                              (natp %array))))
  (let ((%2 (getelementptr %array %indvars.iv 8)))
    (let ((%3 (load-i64l %2 st)))
      (let ((%4 (ifix (+ %3 %.06))))
        (let ((%indvars.iv.next (nfix (+ %indvars.iv 1))))
          (let ((%exitcond (if (= %indvars.iv.next %n) 1 0)))
            (if (= %exitcond 1) %4
                (@sumarr_%.lr.ph %4 %indvars.iv.next %n %array st))))))))
Example—ACL2
(defun @sumarr_%.lr.ph (%.06 %indvars.iv %n %array st)
  (declare (xargs :measure (nfix (- (nfix %n) (nfix %indvars.iv)))
                  :stobjs st
                  :guard (and (integerp %.06) (natp %indvars.iv) (natp %n)
                              (natp %array) (< %indvars.iv %n))))
  (if (not (and (mbt (integerp %.06)) (mbt (natp %indvars.iv)) (mbt (natp %n))
                (mbt (natp %array)) (mbt (< %indvars.iv %n))))
      %.06
    (let ((%2 (getelementptr %array %indvars.iv 8)))
      (let ((%3 (load-i64l %2 st)))
        (let ((%4 (ifix (+ %3 %.06))))
          (let ((%indvars.iv.next (nfix (+ %indvars.iv 1))))
            (let ((%exitcond (if (= %indvars.iv.next %n) 1 0)))
              (if (= %exitcond 1) %4
                  (@sumarr_%.lr.ph %4 %indvars.iv.next %n %array st)))))))))
Tail Recursion
• Note that the translated function for the LLVM loop becomes a
tail-recursive function (uses an accumulator) in ACL2.
• Tail-recursive functions are nice for execution, since an
arbitrary number of recursive tail calls can be made without
exhausting the stack.
• However, tail-recursive functions are not convenient for
reasoning because they pollute the induction scheme.
• We can generate non-tail-recursive functions operating over
simple lists from tail-recursive, stobj-based functions. This
technique is called Hardin’s Bridge*.
*in memory of Scott Hardin, a civil engineer who designed several physical bridges,
and a man who valued rigor. He was the father of one of the authors.
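The two shapes of recursion can be illustrated with a small Python sketch. It is only an analogy for the ACL2 functions: the names sum_tail and sum_list are hypothetical, plain integers stand in for memory contents, and Python itself does not optimize tail calls.

```python
def sum_tail(values, k, acc):
    """Tail-recursive form: the running total is carried in `acc`."""
    if k >= len(values):
        return acc
    # The recursive call is the last action, so no work remains after it.
    return sum_tail(values, k + 1, acc + values[k])


def sum_list(values):
    """Non-tail-recursive form: the addition happens after the call returns."""
    if not values:
        return 0
    return values[0] + sum_list(values[1:])


data = [3, -1, 4, 1, -5]
print(sum_tail(data, 0, 0), sum_list(data))
```

An inductive proof about sum_list follows the structure of the list directly, whereas a proof about sum_tail must first be generalized over every possible value of the accumulator.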
Hardin's Bridge
[Figure: Hardin’s Bridge relates four forms of the same computation. Steps numbered 1, 2, and 3 connect the forms, and the label defiteration appears beside the two forms with mutable state.]
Form: Imperative and operating over an array
for(k=0; k< *SZ*; k++) {
  res = op(d[k], res);
}
Form: Tail recursive with mutable state
(x-tail k res st)
Form: Non-tail-recursive with mutable state
(x-iter j res st)
Form: Non-tail-recursive and operating over a list
(defun x (res d)
  …
  (if (endp d) res
      (op (car d)
          (x res (cdr d)))))
Applying Hardin’s Bridge
• We use the bridge technique to prove that @sumarr_%.lr.ph is
equal to the following non-tail-recursive function:
(defun sumlist64 (res lst)
  (declare (xargs :measure (len lst)))
  (cond ((not (true-listp lst)) (ifix res))
        ((endp lst) (ifix res))
        (t (+ (ifix (load-i64ll (take 8 lst)))
              (sumlist64 res (nthcdr 8 lst))))))
• Proving properties about @sumarr_%.lr.ph can then be
accomplished by proving them instead about sumlist64, a
non-tail-recursive function better suited for theorem proving.
Current State of the Translator
• We use the def::ung macro to automatically define the
domain of recursive functions.
– This allows recursive functions to be admitted in ACL2 without
manually adding measures.
• We added support for modular arithmetic (e.g., fixed width
addition).
• We have rerun the sumarr example with the current translator.
– No editing of the translated code was needed.
– Recursive function admitted automatically via def::ung
– Fixed width addition is preserved in the non-tail-recursive spec.
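Fixed-width addition can be modeled by reducing an unbounded sum to a two's-complement value of the given width. The Python sketch below shows that arithmetic only; wrap_signed and add_i64 are hypothetical names, and the slides do not show the ACL2 form the current translator emits. For comparison, the ifix wrapper in the earlier sumarr example coerces its argument to an integer but does not bound its width.

```python
def wrap_signed(value, bits=64):
    """Reduce an unbounded integer to a two's-complement `bits`-wide value."""
    modulus = 1 << bits
    value %= modulus                # now in the range [0, 2**bits)
    if value >= modulus >> 1:       # top bit set, so the value is negative
        value -= modulus
    return value


def add_i64(a, b):
    """64-bit addition that wraps on overflow instead of growing."""
    return wrap_signed(a + b, 64)


print(add_i64(2**63 - 1, 1))  # wraps around to the most negative 64-bit value
```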
Limitations of the Translator
• Exceptions
• Indirect call instructions
Future Work
• Stack analysis and data structure analysis
• LLVM DataLayout directives (global endianness,
alignment/padding specification)
• LLVM intrinsic functions (there are a large number of these).
• Variable-length argument lists
• Attempt to eliminate cycles in the call graph by code rewrites
when possible (rather than just blindly emitting mutual-recursion).
Conclusion
• Built an LLVM-to-ACL2 translator
– Produced an executable ACL2 specification
• Tail recursion
• Efficient execution with in-place updates via ACL2’s stobj mechanism
– Demonstrated that the translation produces working ACL2 code for
a recursive example program
• Presented technique for reasoning about tail-recursive ACL2
functions that execute in-place
– Uses the formally proven Hardin’s Bridge to relate them to
non-tail-recursive versions operating on lists
• Tested examples with global variables, pointers, and string
constants.