
On Trojan Horses in Compiler Implementations
Wolfgang Goerigk*
Institut für Informatik und Praktische Mathematik, Christian-Albrechts-Universität
zu Kiel, Preußerstraße 1-9, D-24105 Kiel, Germany. [email protected]

* The work reported here has been supported by the Deutsche Forschungsgemeinschaft
(DFG) in the Verifix and VerComp projects on Correct Compilers and Techniques for
Compiler Implementation Verification.
Abstract. This paper presents a security-related motivation for
compiler verification, and in particular for binary compiler implementation
verification. We will prove that source level verification is not sufficient
to guarantee compiler correctness. For this, we adopt the scenario of a
well-known attack on Unix operating system programs by means of Trojan
Horses intruded into compiler executables. Such a compiler will pass nearly
every test, state-of-the-art compiler validation, the strong bootstrap test,
and any amount of source code inspection and verification, but for all that
it might nevertheless eventually cause a catastrophe. We will show such a
program in detail; it is surprisingly easy to construct. In that, we share a
common experience with Ken Thompson, who first documented this kind of attack.
1 Introduction and Motivation
In 1984, Ken Thompson, the inventor of Unix, devoted his Turing Award lecture
[19] to security problems due to Trojan Horses intruded by compiler
implementations. He shows a certain kind of attack in some detail: a small
piece of virus code, a Trojan Horse, hidden in the binary implementation of a
concrete C compiler, not visible in the compiler source code, but reproducing
itself when this source code is recompiled in a bootstrapping process, and
intruding a back-door into the Unix login command. This article relates Ken
Thompson's example to compiler verification.
The problem has been known at least since Ken Thompson's lecture in 1984.
However, in the programming languages and compiler community, L.M. Chirica and
D.F. Martin ([1], 1986) were the first to point out, and in 1988 J Moore [16]
did so as well, that full compiler verification has to verify not only the
transformation (the mathematical mapping from source to target programs, the
compiling specification), but also the compiler implementation. Usually,
implementation proceeds in two steps: first, the compiler program is
constructed in a high level implementation language, and then that program is
implemented using an existing compiler for the implementation language
(compiler bootstrapping, cf. Section 3). J Moore already suspected that, for
theoretical reasons, some properties of the binary machine code implementation
also have to be proved manually. After over 30 years
of research on compiler verification, it is now high time to realize the
impact of Ken Thompson's example, i.e. that source level verification alone is
by no means sufficient to convince users of compiler correctness. The main
focus of this paper is to show why, and to sketch a possible solution: a
practical and feasible approach to low level compiler implementation
verification [7, 11].
Our paper is organized as two stories. The first story (Sections 2, 3, 4)
tells in detail about the problem and its relation to compiler verification
and validation. The second story (Section 5) tells about the solution. We hope
that the first story is interesting to the reader in its own right, since due
to lack of space we cannot give more than a brief sketch of a full compiler
correctness proof including binary machine code implementation correctness.
Thus, for the second story we will mainly refer to work presented elsewhere
[14, 15, 13, 8, 6, 7, 11]. That work is part of the Verifix and VerComp
projects on compiler verification and compiler implementation verification at
the universities of Karlsruhe, Ulm, and Kiel.
The first story starts with some exercises in writing self-reproducing
programs (Section 2). After some remarks on compiler bootstrapping and the
so-called compiler bootstrap test [21] (Section 3), we turn our attention to
self-reproducing compilers (Section 4), in particular to compilers which
reproduce their (incorrect) machine code if applied to a correct version of
their source code. We will give a concrete example. The programs we study in
this paper share a common pattern: they are conditionally self-reproducing
(Section 2.2), with a normal, a reproduction, and a catastrophic case. The
code for the latter will be well hidden within the implementation and may
cause unexpected results; hence we call it a Trojan Horse.
At the end of the first story there will be two concrete compiler programs for
C, written in C and in machine code respectively: a provably correct source
program and an incorrect implementation of it which passes the compiler
bootstrap test. We can regard these programs as witnesses for a proof of the
fact that source level verification is not sufficient to guarantee compiler
correctness (Theorem 3 in Section 4).
As a matter of fact, we need an explicit additional compiler implementation
correctness proof in order to guarantee trustworthiness of compilers with
sufficient mathematical rigour. And this is what the second story tells about.
At first glance it sounds very cumbersome, as if we had an additional program
verification job, now for a large machine program. Fortunately, it turns out
that exactly one test is sufficient [10, 14, 13]. Unfortunately, however, it
is the bootstrap test, and we have to verify that its result (the compiler
machine program) has been generated as expected (and verified semantically).
Fortunately again, we can exploit the correctness of the specification and of
the high level implementation to show that a purely syntactical code check
suffices, i.e. that we may use a technique which we call a posteriori code
inspection based on syntactical code comparison [11, 7]. So there is a correct
way out, a practically usable proof technique for proving the correctness of
low level compiler machine executables.
2 Self-reproducing Programs
We start our first story with some exercises in writing self-reproducing
programs, that is, programs which print or return their own source code when
executed. Generations of students and programmers have successfully worked on
such exercises before, and from a theoretical point of view it is not very
surprising that such programs exist: we know about the possibility in
principle from recursion and fixed point theory. But we are not primarily
interested in the programs themselves, nor in winning an award for the
shortest or most beautiful one in whatever competition we could imagine. We
want to learn some lessons and to prepare some prerequisites which we will
later use to construct self-reproducing compilers. In particular, we want to
point out that the technique we use generalizes to a construction process for
self-reproducing programs, or, even more generally, for reflective programs.
The latter can not only reproduce but also introspect (know about, compute
with) their own source code.
2.1 Self-reproduction by Substitution
Let us start by studying a very small C program, consisting of only one parameterless (main) procedure definition with two statements:
main(){
char *b = "main(){
char *b = %c%s%c;
printf(b,34,b,34);
}";
printf(b,34,b,34);
}
The first statement assigns the string constant "main(){ ... }" to the
variable b, and the second statement is printf(b,34,b,34). Let us try to
understand what this program does. We do not want to argue formally; instead
we appeal to the everyday programmer's understanding of the operational
semantics of C programs. This program prints a certain string, namely the
value of b, replacing the occurrences of the character and string place
holders %c, %s, and %c by the character with character-code 34 (which is the
character '"'), the string value of b itself, and the character with code 34
again. We get the string
main(){
char *b = %c%s%c;
printf(b,34,b,34);
}
but with %c replaced by " and %s replaced by exactly that string. Thus, the
printed result will be
main(){
char *b = "main(){
char *b = %c%s%c;
printf(b,34,b,34);
}";
printf(b,34,b,34);
}
Our small C program reproduces its own source code character by character.
Just to demonstrate this once more, and to avoid the misunderstanding that
this is only possible in, or due to specialities of, machine oriented
languages like C, we show a simple Lisp function which reproduces its own
(term-)syntax in the same way:
(defun selfrep ()
(let ((b '(defun selfrep ()
(let ((b '2000)) (subst b (+ 1999 1) b)))))
(subst b (+ 1999 1) b)))
So what is the technique we used? In general, source programs cannot literally
contain a copy of their own source code, since this would cause an infinite
syntactical recursion. They can only produce it. The key idea is to break the
syntactic recursion by substitution: we copied the program text into a program
constant assigned to a variable b, but replaced the repeated occurrence of the
text itself by a place holder, namely %s in the C code and 2000 in Lisp. Then,
whenever the source text is to be computed, we return the content of b after
substituting the content of b into it. (In C we had to take care of
occurrences of '"' as well.) The function printf substitutes implicitly,
whereas in Lisp we need a different syntactical representation of the place
holder, for instance (+ 1999 1) for 2000, because the occurrences to be
replaced by substitution must syntactically differ from those occurring in the
substitution form itself or elsewhere in the program.
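If the reader wants to run the C example, note that strict C does not allow
literal line breaks inside string constants; the listings above are typeset
with line breaks for readability. The following single-line variant (our
sketch, not from the original) applies exactly the same substitution trick,
additionally passing the character code 10 (line break) to printf:

#include<stdio.h>
char*b="#include<stdio.h>%cchar*b=%c%s%c;%cint main(){printf(b,10,34,b,34,10,10);return 0;}%c";
int main(){printf(b,10,34,b,34,10,10);return 0;}

Compiled and run, this program prints exactly its own three source lines. Note
that it deliberately contains no comments: a comment would be part of the
source text and would have to be reproduced, too.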
2.2 Conditionally Self-reproducing Programs
We now proceed similarly to the way Ken Thompson did in [19]. If we carefully
reread the previous paragraph, we will not only find an explanation of how
these programs work; we can also learn how to construct them. We can add the
ability for self-reproduction to a program as follows: first, wherever we want
to compute the source code, we add the pattern
(let ((b '2000))
... (subst b (+ 1999 1) b) ... )
Then, when we are about to finish the program, we use a text editor, copy the
entire program, remove the place holder (2000 above), and paste the copied
program as a program constant into that position. The result is a program in
which every call of (subst b (+ 1999 1) b) located within the let block will
reproduce the program source code.
Let us try this once again before we finally leave Lisp, return to C, and
proceed with our story. Suppose we want to write a function of one argument
which dispatches over the concrete value, selecting one of three cases: if the
argument is 'ident, the function shall return its source code (the
reproduction case); for the argument 'login it returns a special constant (the
catastrophe); and in any other case it behaves like the identity function (the
normal case):
(defun ident (x)
(cond ((equal x 'ident) ... )
((equal x 'login) 'Oops)
(t x)))
Following our procedure, we first add the pattern above. We could add it
within the conditional, but we prefer a surrounding block. This enables us to
replace "..." by (subst b (+ 1999 1) b), although the resulting program will
not yet reproduce its code in that case (instead, it would return 2000):
(defun ident (x)
(let ((b '2000))
(cond ((equal x 'ident) (subst b (+ 1999 1) b))
((equal x 'login) 'Oops)
(t x))))
The last step is to remove the place holder 2000 and paste the entire function
definition we have so far into that position:
(defun ident (x)
(let ((b '(defun ident (x)
(let ((b '2000))
(cond ((equal x 'ident) (subst b (+ 1999 1) b))
((equal x 'login) 'Oops)
(t x))))))
(cond ((equal x 'ident) (subst b (+ 1999 1) b))
((equal x 'login) 'Oops)
(t x))))
Now b contains a copy of the program as it was before the final step, which
consisted of substituting b for the place holder 2000 within b, i.e.
(subst b (+ 1999 1) b). This function serves as an example of our construction
principle here, but it also shares a common pattern with the programs we are
going to study in the rest of this paper. Suppose we regard this function as
an implementation of the identity function. This implementation works
correctly in every normal case, i.e. with exactly two exceptions, called
catastrophe and reproduction: if applied to 'login, it returns 'Oops; if
applied to 'ident, it returns its own code; and in the infinitely many other
cases it returns the correct result. We will see later, after generalizing to
conditionally self-reproducing compilers, that we will not even notice the
incorrect results (target programs) unless we either completely inspect every
generated result (target program) or, by accident, guess the input which
causes the catastrophe.
But let us now come back to the story. There is a second important point to
note: programs which are able to reproduce themselves or to compute with their
own source code can contain additional stuff, either used or unused (even
comments may be reproduced literally). Let us have a look at the following C
program, which is again a conditionally self-reproducing program, now written
in the C language:
/*---------------------------------------------
  File: reproduce.c (W. Goerigk, 25.11.1998)
-----------------------------------------------*/
char* buf = "
/*---------------------------------------------
  File: reproduce.c (W. Goerigk, 25.11.1998)
-----------------------------------------------*/
char* buf = %c%s%c;
int main (int argc, char *argv[]) {
if (argv[1] && (strcmp(argv[1],%cident%c) == 0))
printf(buf,34,buf,34,34,34,34,34,34,34);
else if (argv[1] && (strcmp(argv[1],%clogin%c) == 0))
printf(%cOops%c);
else
printf(argv[1]);
}
void cheat () {}
";
int main (int argc, char *argv[]) {
if (argv[1] && (strcmp(argv[1],"ident") == 0))
printf(buf,34,buf,34,34,34,34,34,34,34);
else if (argv[1] && (strcmp(argv[1],"login") == 0))
printf("Oops");
else
printf(argv[1]);
}
void cheat () {}
Recalling the general strategy for writing such programs, we may simply ignore
everything that is assigned as a string constant " ... " to the variable buf
(we know that it has been constructed by copy and paste; there are six
additional occurrences of '"') and concentrate on the definition of main in
order to understand this program. The program takes a string argument and
dispatches on it: it returns the string content of buf with the appropriate
substitutions if the argument is "ident" (reproduction); it returns "Oops" if
the argument is "login" (catastrophe); and otherwise it returns the argument
string (normal). Thus, this program is a C version of the above Lisp function.
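To convince oneself that the reproduction case really works, one can compare
the program's output against its own source file. Here is a small test harness
in the same system()-calling style as the paper's examples (the executable and
file names are our assumptions):

#include <stdlib.h>

/* Run the reproduction case and compare its output with the source
   file; the exit status is 0 iff the two are identical. */
int main(void) {
  return system("./reproduce ident | cmp - reproduce.c");
}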
This finishes the first chapter of our first story. We now have enough
prerequisites to manage and construct self-reproducing C programs, or, more
generally, reflective programs. We are interested in self-reproducing
compilers. In particular, we are not looking for programs printing their
source code. Instead, we want programs which reproduce their own binary
machine code implementation, i.e. which pass the so-called compiler bootstrap
test.
"login"
3 Compiler Bootstrapping and the Bootstrap Test
Compiler bootstrapping is a phrase used for implementing compiler programs
using compilers. It is all the more like Münchhausen's bootstrapping (German:
"am eigenen Schopf aus dem Sumpf ziehen", pulling oneself out of the swamp by
one's own hair) if implementation language and source language are the same.
Many people prefer to use the word bootstrapping only in this case, because
then we could (in principle) apply the compiler to itself and thus produce a
compiler executable "magically". But there is no magic. Somehow we need an
implementation of the implementation language: an interpreter, a compiler for
the subset used in the compiler, or a compiler producing inefficient code or
running on another machine. N. Wirth gives a lot of interesting applications
of this kind of compiler bootstrapping in [21]. In particular, he proposes the
so-called compiler bootstrap test:
[Figure: a chain of T-diagrams showing m[CSL] = m0 (m running on machine M),
m0[CSL] = m1, m1[CSL] = m2, and the test m2 = m1.]
Fig. 1. The Bootstrap Test. We use McKeeman's T-diagrams to show repeated
compiler applications: every T-shaped box represents a compiler program (e.g.
one named m, implemented in M's machine language ML, compiling SL-programs to
TL-programs). Compiler inputs (programs) appear at the left hand side of the
box, outputs at the right hand side. Compiling a compiler (hopefully) returns
a compiler, so that we can apply it again, playing a kind of dominoes game
with these boxes.
Let CSL be the compiler source program. Suppose we use an existing compiler m
from SL to TL on a machine M in order to generate an initial implementation
m0. If this happens to work correctly, then we can use m0 on the target
machine, compile CSL again, and generate m1. We may not know exactly what m0
looks like, because it has been generated by an unknown (existing) compiler,
but m1 is now a TL-program generated according to CSL. Let us furthermore
assume that CSL, and hence m1, are deterministic programs. Then we may repeat
this procedure, applying m1 to CSL again. If all compilers work correctly, we
get m1 back, i.e. m2 = m1. The bootstrap test succeeds.
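In practice the test is easy to automate. Here is a minimal driver sketch in
the style of this paper's examples; the executable names m0, m1, m2 are our
assumptions, and compile.c contains CSL:

#include <stdio.h>
#include <stdlib.h>

/* Bootstrap test: m1 = m0[CSL], m2 = m1[CSL];
   the test succeeds iff m1 and m2 are bit-identical. */
int main(void) {
  system("./m0 compile.c && mv compile m1");
  system("./m1 compile.c && mv compile m2");
  if (system("cmp -s m1 m2") == 0) { puts("bootstrap test passed"); return 0; }
  puts("bootstrap test FAILED");
  return 1;
}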
If the test does not succeed, something has gone wrong. This happens very
often during compiler development, and compiler constructors therefore esteem
this test highly as a means of uncovering bugs. But if the compilers are
correct, we can prove that the bootstrap test will succeed. This is a
consequence of the following bootstrapping theorem [14, 3, 11], which holds if
the notion of compiler correctness in use at least implies that the compiler
preserves partial program correctness [18, 9]:
Theorem 1 (Bootstrapping Theorem). If m0 and CSL are both correct, if m0,
applied to CSL, terminates with regular result m1, and if the underlying
hardware worked correctly, then m1 is correct. □
Let us assume that m0 and CSL are both correct and deterministic. Then m1 is
the one and only correct result of applying m0 to CSL. But then m1 and CSL are
both correct (and deterministic); hence we can apply the bootstrapping theorem
again and conclude that m2 is the one and only correct result of applying m1
to CSL. Correctness (actually preservation of partial correctness) now
implies, after regular termination of m0 and m1, that
m2 = m1[CSL] = CSL[CSL] = m0[CSL] = m1. Thus, we can formulate the following
bootstrap test theorem:
Theorem 2 (Bootstrap Test Theorem). If m0 and CSL are both correct and
deterministic, if m0, applied to CSL, terminates with regular result m1, if
m1, applied to CSL, terminates with regular result m2, and if the underlying
hardware worked correctly, then m1 = m2. □
In particular, we have m1 = m1[CSL]; hence m1 reproduces itself when applied
to the correct compiler source program CSL: m1 is a self-reproducing compiler.
As a matter of fact, however, this property alone does not tell us anything
about the correctness of m1 (or m0 or m2). A successful bootstrap test does
not imply correctness, even if CSL is correct. And that brings us back to our
story. We are now going to construct a (correct) source program CSL and an
(incorrect) compiler implementation m0 which passes the bootstrap test.
It is easy, by the way, to write an incorrect compiler source program which,
after compilation with m, passes this test: just consider a source language
feature which is compiled incorrectly but not used in the compiler itself.
Hence, the challenge here is to construct CSL correctly and nevertheless find
an incorrect m0 passing the test. This is the reason why we need our earlier
experiments with self-reproducing programs.
4 Self-reproducing Compilers
We have already seen self-reproducing compilers in the previous section: every
correct compiler written in its own source language yields an example. Before
we start to construct an incorrect example, we show a small C-program which we
use as an example of the "correct" compiler source program CSL. For this
paper, we prefer to write concrete and complete small programs which the
reader can type into the machine and run to see the effect. We do not even
write a real C-compiler; as a short cut, we just call an existing one:
#include<string.h>   /* strcat */
#include<stdlib.h>   /* system */

char cmdbuf[255] = "make CC=gcc `basename ";
int main (int argc, char *argv[]) {
strcat(cmdbuf, argv[1]); strcat(cmdbuf," .c`");
system(cmdbuf);
}
This program just calls the system's GNU C compiler to perform the actual
compilation. Given a string argument "program.c", we call the operating system
to execute the command line "make CC=gcc `basename program.c .c`", which runs
gcc on the file named program.c and produces an executable named program.
Anyway, we hope the reader can imagine this system call replaced by a real
compiler-function call.
Let us now proceed with our story. If we carefully reread Section 2.2, it
becomes obvious that there is no problem in principle with constructing a
wrong implementation of CSL: we could apply the tricks we used so far to
machine programs as well, and the resulting machine program could be just as
incorrect as our implementation of the identity function before. However, our
story would then become rather uninteresting. So let us make a different
attempt, one which exploits the fact that we are actually writing compilers:
if we are able to construct reflective programs dealing with their own source
code, and if our programs are compilers as well, then why shouldn't we try to
produce the incorrect machine code just by correctly compiling incorrect
source code?
What are the requirements for an incorrect implementation m0 of the above
compiler? First of all, it shall pass the bootstrap test, i.e. whenever called
on the source program above, it shall return itself (reproduction). Then, for
another particular program LSL, we want m0 to return wrong code LTL as well
(catastrophe). Finally, m0 should behave like CSL in any other case (normal).
Thus, we get the following equation, which we will use as a pattern to
construct an (incorrect) C-program ĈSL, just as we constructed the incorrect
identity function before. Finally, we will produce m0 by compiling ĈSL. For
any program π:
ĈSL[π] = if π = CSL then m0 /* = ĈSL[ĈSL] */
         else if π = LSL then LTL
         else CSL[π]
Obviously, the resulting program becomes reflective if we use ĈSL[ĈSL] to
construct m0 in the reproduction case: the compiler will compile its own
source code. But we know how to construct such programs:
/*--------------------------------------------------------
  File: compile-incorrect.c (W. Goerigk, 25.11.1998)
----------------------------------------------------------*/
#include<string.h>
#include<stdio.h>
#include<stdlib.h>   /* system */
char login[255] =
"void main () {printf(\"Cheating Login (Oops): \");}";
char* buf = "...
";
char cmdbuf[255] = "make CC=gcc `basename ";
FILE* handle;
void main (int argc, char *argv[]) {
if (argv[1] && (strcmp(argv[1],"compile.c") == 0)) {
system("mv compile.c .compile.c.orig");
if ((handle = fopen("compile.c", "w+")) != NULL) {
fprintf(handle,buf,
34,92,34,92,34,34,34,buf,34,34,34,
34,34,34,34,34,34,34,34,34,34,34,34,
34,34,34,34,34,34,34,34,34,34,34,34,34,34);
fclose(handle);
system("make CC=gcc compile");
system("mv .compile.c.orig compile.c");
};
}
else if (argv[1] && (strcmp(argv[1],"login.c") == 0)) {
system("mv login.c .login.c.orig");
if ((handle = fopen("login.c", "w+")) != NULL) {
fprintf(handle,login);
fclose(handle);
system("make CC=gcc login");
system("mv .login.c.orig login.c");
};
}
else {
strcat(cmdbuf, argv[1]); strcat(cmdbuf," .c`");
system(cmdbuf);
}
}
First of all, we had to decide how to identify CSL and LSL respectively. For
simplicity, we just assume that the files to be compiled in these cases are
called "compile.c" and "login.c". We could use any other predicate, for
instance a literal comparison, a checksum (see the sketch below), or a special
magic number. Then we had to decide how to compile; we use gcc again. In the
normal case we can use the same code as in CSL. In the reproduction and
catastrophic cases we have to write the wrong source code into a file, compile
it, and then remove the file. Well, that's it.
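To sketch just one of these alternatives (our illustration, not part of the
original program): the compiler source could be recognized by a simple
checksum instead of its file name, with a hypothetical constant standing for
the precomputed sum over compile.c:

#include <stdio.h>

/* Hypothetical recognizer: a simple rolling checksum over the file. */
unsigned long checksum(const char *path) {
  unsigned long sum = 0;
  int c;
  FILE *f = fopen(path, "rb");
  if (!f) return 0;
  while ((c = getc(f)) != EOF) sum = sum * 31 + (unsigned char) c;
  fclose(f);
  return sum;
}

/* In main, instead of the strcmp test:
   if (checksum(argv[1]) == 123456789UL) ... reproduction case ... */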
The actual program is twice as long, because here we dropped the string
repeating the program text. In that copy, we have to replace 34 occurrences of
'"' and two occurrences of '\' by %c as well. The above program reproduces m0
when applied to "compile.c", compiles a bug into the program generated for
"login.c", and in any other case it compiles just like CSL, i.e. as correctly
as gcc does. By the way, in order to show the effect, we have to invent a
correct content of "login.c" (for instance the sketch below). But actually, we
may write whatever we want into that file: our compiler will generate a
program that prints "Cheating Login (Oops): " regardless. We can easily
imagine the catastrophic case generating a target program that eventually
causes a real catastrophe somehow.
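For the demonstration, a perfectly innocent login.c might read as follows (our
invention; m0 discards this source anyway and emits the cheating version
instead):

#include <stdio.h>

/* A hypothetical, correct login program. */
int main(void) {
  printf("Real Login: ");
  return 0;
}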
[Figure: T-diagrams showing that each application of m0 to CSL yields m0
again: m0[CSL] = m0.]
Fig. 2. Passing the Bootstrap Test. By construction of m0 we have established
the equation m0 = m0[CSL]. Thus, m0 will pass the bootstrap test arbitrarily
often.
The programs we have shown here do not exactly meet the requirements we need
to prove Theorem 3 below: we did not prove CSL correct; we did not even write
a real C-compiler; there are many more than just two source programs which our
machine implementation compiles incorrectly; and moreover, we generated the
incorrect m0 by compiling a corresponding source program, focussing a bit more
on how to construct such programs. But anyhow, we want to summarize the result
in the following theorem, and we hope that the reader can imagine the
adjustments necessary to meet the requirements exactly. A formal proof of this
theorem for a non-trivial real compiler into the code of an abstract machine
will be given in [4], using the Boyer/Moore theorem prover ACL2 [12], although
that article will also not contain the entire compiler correctness proof.
Theorem 3 (No source level verification will protect us). There exist a
provably correct compiler program CSL from SL to TL written in SL, a compiler
machine program m0 written in TL, and a particular SL-program LSL with
incorrect implementation LTL ≠ CSL[LSL], such that for every π ≠ CSL,
π ≠ LSL we have

(a) m0[π] = CSL[π]
(b) m0[CSL] = m0
(c) m0[LSL] = LTL

provided the above machine program applications returned regular results.
Thus, source level verification is not sufficient to guarantee compiler
correctness. □
To summarize the first story's message: we assumed a (small) compiler to be
verified on the source code level; we used an implementation of it in order to
bootstrap a machine implementation. The new compiler executable, generated by
compiling the verified source code, passed the bootstrap test, i.e. it was
identical to the executable we used to generate it. Probably it will pass any
other test we may try. But for all that, we finally got an incorrect result.
Something was missing. By the way, the only tests which could find the hidden
error would be to guess (by accident) the catastrophic case (and wait for the
catastrophe to happen), or to perform the bootstrap test with sufficient
mathematical rigour, i.e. to really verify its result. The latter is what the
second story will tell us about: an explicit binary compiler implementation
correctness proof.
If we apply the compiler to itself, thus triggering the reproduction case, we
again get a compiler which works correctly in all but the two exceptional
cases. The reproduction case does not even show an effect unless we apply the
resulting compiler to the catastrophic case. It is highly unlikely that
classical compiler validation can uncover such a bug. Compiler validation is
based on a carefully selected (and published) test suite; in order to pass a
validation procedure, a compiler must succeed on the test suite. But that
means that the compiler results are again tested only by running them on some
selected inputs. Our Trojan Horse is well hidden within the incorrect
implementation; it shows up for only one particular source program. And since
we are really bad guys, we won't tell which.
5 Avoiding Trojan Horses: Full Compiler Correctness
Obviously, transformation verification and source level verification of the
compiler implementation are not sufficient to avoid a bug or virus. Something
is missing, and it is clear from the previous story that we have to
concentrate on the process of generating the compiler machine executable.

m0 is definitely not a correct implementation of CSL. Otherwise, it would
compile the login-program correctly, and, even more important, it would
reproduce the correct implementation (call it m̄0), and not m0, when applied
to CSL. So there must be a (syntactical) mismatch between m0 and m̄0. If we
carefully looked through m0, comparing it instruction by instruction to what
we would expect as the result of compiling CSL, we would find the mismatch.
This is the idea of our second story. But of course, we do not only want to
find our particular error; we want to guarantee that there is no error at all.
Actually, in order to focus on the missing binary compiler implementation
correctness proof, let us assume the correctness of the source program CSL. To
be more precise: let CC_SL,TL be a (semantically) correct compiling relation
between source and target language, and let CSL be a (correct) refinement of
CC_SL,TL. In either case, by correctness (refinement) we mean again at least
preservation of partial correctness, which captures the intuitive requirement
that lower level implementations return at most correct results w.r.t. higher
level implementations or specifications. (At the very end, it guarantees that
we can trust machine programs in this sense.) The following theorem from [11]
can easily be proved by transitivity (or compositionality) of the refinement
relation:
Theorem 4 (Syntactical Code Inspection is Sufficient). If CC_SL,TL is correct,
if CSL is a correct implementation of CC_SL,TL, and if (CSL, m) ∈ CC_SL,TL,
then m is a correct implementation of CC_SL,TL as well. Thus, m is a correct
compiler (executable) from SL to TL. □
That means that if the bootstrap test succeeds in a stronger sense, i.e. if we
can assure that this one execution of the compiler on itself generated the
expected target code m, then we can guarantee correct compilation for any
other program. Note that this theorem reduces the semantical question of
correct compilation to a final, purely syntactical a posteriori code
inspection based on code comparison between CSL and m. m might be mechanically
generated by any initial unsafe implementation of CSL; however, if we tried to
generate it by applying m0 to CSL, the test would fail.
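Just to illustrate the flavour of the final check (the actual technique of
[11] inspects the code instruction by instruction against the compiling
specification, not merely against a second binary), the comparison itself can
be as simple as a byte-wise check; the file names are our assumptions:

#include <stdio.h>

/* Byte-wise comparison of the generated compiler executable with the
   independently justified expected code. Returns 1 iff identical. */
int identical(const char *p, const char *q) {
  FILE *a = fopen(p, "rb"), *b = fopen(q, "rb");
  int x = EOF, y = EOF, same = (a != NULL && b != NULL);
  while (same) {
    x = getc(a); y = getc(b);
    same = (x == y);
    if (x == EOF) break;
  }
  if (a) fclose(a);
  if (b) fclose(b);
  return same;
}

int main(void) {
  return identical("m.generated", "m.expected") ? 0 : 1;
}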
Transformation verification for CC_SL,TL, a proof that CSL refines CC_SL,TL,
and one final syntactical code inspection together guarantee that the checked
executable m is correct, i.e. that it neither contains nor (incorrectly)
generates any bug or virus. Great. But unfortunately, our story is not yet
finished here, and we have to leave it open-ended in this paper. Without
further investigation we would not be able to syntactically double-check, say,
100 KByte of binary machine code just by comparing it to the corresponding
source program. That would be cumbersome and error-prone. It turns out that
there is a technique for such proofs which exploits modularization into
adequate intermediate layers. A diagonal argument allows for trusted machine
support to generate large parts without any need for checking [11]. This can
be seen as an application of the work of Goodenough and Gerhart [10] on
software testing [13]. We also use result-checking techniques [20], for
verification [5], but also for further reduction of the code inspection work
load [11, 7]. There is a lot to be gained without weakening the rigorous
correctness requirement.
6 Conclusions and Related Work
Our paper shows in detail why source level verification is not sufficient to
guarantee compiler correctness, and we sketch a correct way out, proposing the
proof technique of a posteriori code inspection based on syntactical code
comparison. The second story is very closely related to Paul Curzon's work on
compiler verification [2]. The crucial difference is that he uses and trusts a
theorem prover (HOL) both to carry out the proofs and to execute the compiling
specification, similar to the way J Moore [16, 17] and others [22] use ACL2 or
its predecessor Nqthm in order to prove the correctness of a compiler program
which is executable within the prover.
Now that we have modularized the compiler verification task into three steps,
transformation verification, high level implementation verification, and
finally low level binary implementation verification, we could, from a
pragmatic point of view, allow machine support at least for the final step.
There are users who trust a machine execution of the code checking more than a
hand proof; that is their responsibility. But note that, in principle, we then
need another full verification of the checker program, in particular if we
recall what could happen without it. The situation is somewhat comparable to
using pocket calculators: we trust them. But if we no longer learned how to
(in principle) manually double-check their results, we (even the experts)
would hopelessly depend on the skill and good-will of the manufacturers.
So there is good reason for us to continue to insist on the possibility, in
principle, of providing a complete and completely documented mathematical
proof of the correctness of compiler executables, depending only on hardware
correctness, even though somebody might remonstrate with us "on drifting away
from the real problem towards the pointlessly paranoid". Sure, the crucial
work in compiler verification has been for over 30 years, is, and will remain
the semantical correctness of the transformation. But if everybody is
pragmatic, and nobody seriously asks the rigorous mathematical question of
what we additionally have to prove, and have to be able to prove, for the
correctness of compiler executables, then we will hopelessly remain sitting in
the present situation, which is best characterized by the moral of Ken
Thompson's Turing Award lecture in 1984: "You can't trust code that you did
not totally create yourself. (Especially code from companies that employ
people like me.) No amount of source-level verification or scrutiny will
protect you from using untrusted code."
References
1. L.M. Chirica and D.F. Martin. Toward Compiler Implementation Correctness
   Proofs. ACM Transactions on Programming Languages and Systems,
   8(2):185–214, April 1986.
2. Paul Curzon. The Verified Compilation of Vista Programs. Internal Report,
   Computer Laboratory, University of Cambridge, January 1994.
3. Wolfgang Goerigk. An Exercise in Program Verification: The ACL2 Correctness
   Proof of a Simple Theorem Prover Executable. Technical Report
   Verifix/CAU/2.4, CAU Kiel, 1996.
4. Wolfgang Goerigk. Compiler Verification Revisited. In Matt Kaufmann, Peter
   Manolios, and J Strother Moore, editors, Using the ACL2 Theorem Prover: A
   Tutorial Introduction and Case Studies. Kluwer Academic Publishers, 1999.
   In preparation.
5. Wolfgang Goerigk, Thilo Gaul, and Wolf Zimmermann. Correct Programs without
   Proof? On Checker-Based Program Verification. In Proceedings ATOOLS'98
   Workshop on "Tool Support for System Specification, Development, and
   Verification", Advances in Computing Science, Malente, 1998. Springer
   Verlag.
6. Wolfgang Goerigk and Ulrich Hoffmann. Compiling ComLisp to Executable
   Machine Code: Compiler Construction. Technical Report Nr. 9812, Institut
   für Informatik, CAU, Kiel, October 1998.
7. Wolfgang Goerigk and Ulrich Hoffmann. Rigorous Compiler Implementation
   Correctness: How to Prove the Real Thing Correct. In Proceedings
   FM-TRENDS'98 International Workshop on Current Trends in Applied Formal
   Methods, Lecture Notes in Computer Science, Boppard, 1998. To appear.
8. Wolfgang Goerigk and Ulrich Hoffmann. The Compiling Specification from
   ComLisp to Executable Machine Code. Technical Report Nr. 9713, Institut für
   Informatik, CAU, Kiel, December 1998.
9. Wolfgang Goerigk and Markus Müller-Olm. Erhaltung partieller Korrektheit
   bei beschränkten Maschinenressourcen. Eine Beweisskizze. Technical Report
   Verifix/CAU/2.5, CAU Kiel, 1996.
10. J.B. Goodenough and S.L. Gerhart. Toward a Theory of Test Data Selection.
    SIGPLAN Notices, 10(6):493–510, June 1975.
11. Ulrich Hoffmann. Compiler Implementation Verification through Rigorous
    Syntactical Code Inspection. PhD thesis, Technische Fakultät der
    Christian-Albrechts-Universität zu Kiel, Kiel, 1998.
12. M. Kaufmann and J S. Moore. Design Goals of ACL2. Technical Report 101,
    Computational Logic, Inc., August 1994.
13. H. Langmaack. Contribution to Goodenough's and Gerhart's Theory of
    Software Testing and Verification: Relation between Strong Compiler Test
    and Compiler Implementation Verification. In Foundations of Computer
    Science: Potential - Theory - Cognition, LNCS 1337, pages 321–335, 1997.
14. Hans Langmaack. Softwareengineering zur Zertifizierung von Systemen:
    Spezifikations-, Implementierungs-, Übersetzerkorrektheit.
    Informationstechnik und Technische Informatik it+ti, 97(3):41–47, 1997.
15. Hans Langmaack. Theoretische Informatik ist Grundlage für das sichere
    Beherrschen realistischer Software und Systeme. In 25 Jahre Informatik an
    der Universität Hamburg. Informatik: Stand, Trends, Visionen, pages 47–62,
    1997.
16. J S. Moore. Piton: A Verified Assembly Level Language. Technical Report
    22, Computational Logic, Inc., Austin, Texas, 1988.
17. J S. Moore. Piton, A Mechanically Verified Assembly-Level Language. Kluwer
    Academic Publishers, 1996.
18. Markus Müller-Olm. Three Views on Preservation of Partial Correctness.
    Technical Report Verifix/CAU/5.1, CAU Kiel, October 1996.
19. Ken Thompson. Reflections on Trusting Trust. Communications of the ACM,
    27(8):761–763, 1984. Also in ACM Turing Award Lectures: The First Twenty
    Years 1965-1985, ACM Press, 1987, and in Computers Under Attack:
    Intruders, Worms, and Viruses, ACM Press, 1990.
20. Hal Wasserman and Manuel Blum. Software Reliability via Run-Time
    Result-Checking. Journal of the ACM, 44(6):826–849, November 1997.
21. N. Wirth. Compilerbau. Springer, Berlin, 1986.
22. W.D. Young. A Verified Code Generator for a Subset of Gypsy. Technical
    Report 33, Computational Logic, Inc., Austin, Texas, 1988.