On Trojan Horses in Compiler Implementations

Wolfgang Goerigk*

Institut für Informatik und Praktische Mathematik, Christian-Albrechts-Universität zu Kiel, Preußerstraße 1-9, D-24105 Kiel, Germany. [email protected]

Abstract. This paper presents a security-related motivation for compiler verification, and in particular for binary compiler implementation verification. We will prove that source level verification is not sufficient to guarantee compiler correctness. For this, we will adopt the scenario of a well-known attack on Unix operating system programs by means of Trojan Horses intruded into compiler executables. Such a compiler will pass nearly every test: state of the art compiler validation, the strong bootstrap test, any amount of source code inspection and verification. But for all that, it nevertheless might eventually cause a catastrophe. We will show such a program in detail, and it is surprisingly easy to construct. In that, we share a common experience with Ken Thompson, who initially documented this kind of attack.

1 Introduction and Motivation

In 1984, Ken Thompson, the inventor of Unix, devoted his Turing Award lecture [19] to security problems due to Trojan Horses intruded by compiler implementations. He shows a certain kind of attack in some detail: a small piece of virus code, a Trojan Horse, hidden in the binary implementation of a concrete C compiler, not visible in the compiler source code, but reproducing itself when this source code is recompiled in a bootstrapping process, intruding a back-door into the Unix login command. This article relates Ken Thompson's example to compiler verification. The problem has been known at least since Ken Thompson's lecture in 1984. However, in the programming languages and compiler community, L.M. Chirica and D.F.
Martin ([1], 1986) were the first, and in 1988 J Moore [16] pointed out, that full compiler verification has not only to verify the transformation (the mathematical mapping from source to target programs, the compiling specification), but also the compiler implementation. Usually, implementation proceeds again in two steps: first, the compiler program is constructed in a high level implementation language, and then that program is implemented using an existing compiler for the implementation language (compiler bootstrapping, cf. section 3). J Moore already suspected that for theoretical reasons some properties also of the binary machine code implementation have to be proved manually. After over 30 years of research on compiler verification it is now high time to realize the impact of Ken Thompson's example, i.e. that source level verification is not sufficient at all to convince users of compiler correctness. The main focus of this paper is to show why, and to sketch a possible solution: a practical and feasible approach to low level compiler implementation verification [7, 11].

* The work reported here has been supported by the Deutsche Forschungsgemeinschaft (DFG) in the Verifix and VerComp projects on Correct Compilers and Techniques for Compiler Implementation Verification.

Our paper is organized in two stories. The first story (sections 2, 3, 4) tells in detail about the problem and its relation to compiler verification and validation. The second story (section 5) tells about the solution. We hope that the first story is interesting to the reader in its own right, since due to lack of space we are not able to give more than a brief sketch of a full compiler correctness proof including binary machine code implementation correctness. Thus, for the second story we will mainly refer to work presented elsewhere [14, 15, 13, 8, 6, 7, 11].
That work is part of the Verifix and VerComp projects on compiler verification and on compiler implementation verification at the universities of Karlsruhe, Ulm, and Kiel.

The first story will start with some exercises in writing self-reproducing programs (section 2). After some remarks on compiler bootstrapping and the so-called compiler bootstrap test [21] (section 3) we will turn our attention to self-reproducing compilers (section 4), in particular to compilers which reproduce their (incorrect) machine code if applied to a correct version of their source code. We will give a concrete example. The programs we study in this paper share a common pattern: they are conditionally self-reproducing (section 2.2) with a normal, a reproduction, and a catastrophic case. The code for the latter will be well hidden within the implementation and may cause unexpected results; thus we want to call it a Trojan Horse.

At the end of the first story there will be two concrete compiler programs for C, written in C resp. in machine code: a provably correct source program and an incorrect implementation of it which passes the compiler bootstrap test. We can look at these programs as witnesses for a proof of the fact that source level verification is not sufficient to guarantee compiler correctness (theorem 3 in section 4). As a matter of fact, we need an explicit additional compiler implementation correctness proof in order to guarantee trustworthiness of compilers with sufficient mathematical rigour. And this is what the second story tells about. At first glance it sounds very cumbersome, as if we had an additional program verification job, now for a large machine program. Fortunately, it turns out that exactly one test is sufficient [10, 14, 13]. Unfortunately, however, it is the bootstrap test, and we have to verify that its result (the compiler machine program) has been generated as expected (and verified semantically).
Fortunately again, we can exploit the correctness of the specification and the high level implementation in order to show that a purely syntactical code check suffices, i.e. that we may use a technique which we call a posteriori code inspection based on syntactical code comparison [11, 7]. So there is a correct way out, a practically usable proof technique for proving the correctness of low level compiler machine executables.

2 Self-reproducing Programs

We will start our first story with some exercises in writing self-reproducing programs, actually programs which print or return their source code when executed. Generations of students and programmers have successfully worked on such exercises before, and from a theoretical point of view it is not very surprising that such programs exist: we know about the possibility in principle from recursion and fixed point theory. But we are not primarily interested in the programs themselves, nor in winning an award for the shortest or most beautiful one in whatever competition we could imagine. We want to learn some lessons and to prepare some prerequisites which we will later use in order to construct self-reproducing compilers. In particular, we want to point out that the technique we use generalizes to a construction process for self-reproducing programs, or even more generally for reflective programs. The latter can not only reproduce but also introspect (know about, compute with) their own source code.

2.1 Self-reproduction by Substitution

Let us start by studying a very small C program, consisting of only one parameterless (main) procedure definition with two statements:

main(){ char *b = "main(){ char *b = %c%s%c; printf(b,34,b,34); }"; printf(b,34,b,34); }

The first statement assigns the string constant "main(){ ... }" to the variable b, and the second statement is printf(b,34,b,34). Let us try to understand what this program does.
We do not want to argue formally; therefore we appeal to the everyday programmer's understanding of the operational semantics of C programs: this program prints a certain string, actually the value of b, replacing the occurrences of the character and string place holders %c, %s, and %c by the character with character code 34 (which is the character '"'), the string value of b itself, and the character with code 34 again. We get the string

main(){ char *b = %c%s%c; printf(b,34,b,34); }

but with %c replaced by " and %s replaced by exactly that string. Thus, the printed result will be

main(){ char *b = "main(){ char *b = %c%s%c; printf(b,34,b,34); }"; printf(b,34,b,34); }

Our small C program reproduces its own source code character by character. Just in order to demonstrate this once more, and to avoid the misunderstanding that it is only possible in or due to specialities of machine oriented languages like C, we will show a simple Lisp function which in the same way reproduces its own (term-)syntax:

(defun selfrep ()
  (let ((b '(defun selfrep ()
              (let ((b '2000))
                (subst b (+ 1999 1) b)))))
    (subst b (+ 1999 1) b)))

So what is the technique we used? In general, source programs cannot contain a copy of their own source code literally, since this would cause an infinite syntactical recursion. They can only produce it. The key idea is to break the syntactic recursion by substitution: we copied the program text into a program constant assigned to a variable b, but replaced the repeated occurrence of the text itself by a place holder, actually by %s in the C code, and by 2000 in Lisp. Then, whenever the source text is to be computed, we return the content of b after substituting the content of b into it. (In C we also had to take care of the occurrences of '"'.)
The function printf substitutes implicitly, whereas in Lisp we need another, syntactically different representation for the place holder 2000, for instance (+ 1999 1), because the occurrences to be replaced by substitution must syntactically differ from those occurring in the substitution form itself or elsewhere in the program.

2.2 Conditionally Self-reproducing Programs

We now proceed similarly to the way Ken Thompson did in [19]. If we carefully reread the previous paragraph, we will not only find an explanation of how these programs work; we can also learn how to construct them. We can add the ability for self-reproduction to a program as follows. First, wherever we want to compute the source code, we add the pattern

(let ((b '2000))
  ... (subst b (+ 1999 1) b) ... )

Then, when we are about to finish the program, we use a text editor, copy the entire program, remove the place holder (2000 above) and paste the copied program as a program constant into that position. The result will be a program in which every call of (subst b (+ 1999 1) b) located within the let block will reproduce the program source code.

Let us try this once again, before we finally leave Lisp, return to C and proceed with our story. Suppose we want to write a function of one argument which dispatches over the concrete value, selecting one of three cases: if the argument is 'ident, the function shall return its source code (the reproduction case); for the argument 'login it returns a special constant (the catastrophe); and in any other case it behaves like the identity function (the normal case):

(defun ident (x)
  (cond ((equal x 'ident) ... )
        ((equal x 'login) 'Oops)
        (t x)))

Following our procedure we first add the pattern above. We could add it within the conditional, but we prefer a surrounding block. This will enable us to replace "..."
by (subst b (+ 1999 1) b), but the resulting program will not yet reproduce its code in that case (instead, it would return 2000):

(defun ident (x)
  (let ((b '2000))
    (cond ((equal x 'ident) (subst b (+ 1999 1) b))
          ((equal x 'login) 'Oops)
          (t x))))

The last step is to replace 2000, pasting the entire function definition we have so far into that position:

(defun ident (x)
  (let ((b '(defun ident (x)
              (let ((b '2000))
                (cond ((equal x 'ident) (subst b (+ 1999 1) b))
                      ((equal x 'login) 'Oops)
                      (t x))))))
    (cond ((equal x 'ident) (subst b (+ 1999 1) b))
          ((equal x 'login) 'Oops)
          (t x))))

Now b contains a copy of the program up to the final step, which was substituting b for the place holder 2000 within b, i.e. (subst b (+ 1999 1) b). This function serves as an example for our construction principle here, but it also shares a common pattern with the programs we are going to study in the rest of this paper. Suppose we want to look at this function as an implementation of the identity function. This implementation works correctly in any normal case, i.e. with exactly two exceptions, called catastrophe and reproduction. If applied to 'login, it returns 'Oops; if applied to 'ident, it returns its own code; and in the infinitely many other cases it will return the correct result. We will see later, after generalizing to conditionally self-reproducing compilers, that we won't even be able to notice the incorrect results (target programs) unless we either entirely inspect every generated (target program) result or, by accident, guess the input which causes the catastrophe.

But let us now come back to the story. There is a second important point to note: programs which are able to reproduce themselves or to compute with their own source code can contain additional stuff, either used or unused (even comments may be reproduced literally).
Let us have a look at the following C program, which is again a conditionally self-reproducing program, now written in the C language:

/*---------------------------------------------
  File: reproduce.c (W. Goerigk, 25.11.1998)
-----------------------------------------------*/
char* buf = " /*---------------------------------------------
  File: reproduce.c (W. Goerigk, 25.11.1998)
-----------------------------------------------*/
char* buf = %c%s%c;
int main (int argc, char *argv[]) {
  if (argv[1] && (strcmp(argv[1],%cident%c) == 0))
    printf(buf,34,buf,34,34,34,34,34,34,34);
  else if (argv[1] && (strcmp(argv[1],%clogin%c) == 0))
    printf(%cOops%c);
  else printf(argv[1]);
}
void cheat () {} ";
int main (int argc, char *argv[]) {
  if (argv[1] && (strcmp(argv[1],"ident") == 0))
    printf(buf,34,buf,34,34,34,34,34,34,34);
  else if (argv[1] && (strcmp(argv[1],"login") == 0))
    printf("Oops");
  else printf(argv[1]);
}
void cheat () {}

Recalling the general strategy for writing such programs, we may just ignore everything that is assigned as a string constant " ... " to the variable buf (we know that it has been constructed by copy and paste; there are six additional occurrences of '"') and concentrate on the definition of main in order to understand this program. This program takes a string argument. It dispatches on this argument and returns the string content of buf with the appropriate substitutions if the argument is "ident" (reproduction). It returns "Oops" if the argument is "login" (catastrophe), and otherwise it returns the argument string (normal). Thus, this program is a C version of the above Lisp function.

This finishes the first chapter of our first story. We have enough prerequisites now to manage and construct self-reproducing C programs or, more generally, reflective programs. We are interested in self-reproducing compilers. In particular, we are not looking for programs printing their source code.
Instead we want programs which reproduce their own binary machine code implementation, i.e. which pass the so-called compiler bootstrap test.

3 Compiler Bootstrapping and the Bootstrap Test

Compiler bootstrapping is a phrase used for implementing compiler programs using compilers. It is a bit like Münchhausen's bootstrapping ("am eigenen Schopf aus dem Sumpf ziehen", pulling oneself out of the swamp by one's own hair) if implementation language and source language are the same. Many people prefer to use the word bootstrapping only in this case, because then we could (in principle) apply the compiler to itself and thus produce a compiler executable "magically". But there is no magic. Somehow we need an implementation for the implementation language: an interpreter, a compiler for the subset used in the compiler, or a compiler producing inefficient code or running on another machine. N. Wirth gives a lot of interesting applications for this kind of compiler bootstrapping in [21]. In particular, he proposes the so-called compiler bootstrap test:

[Fig. 1. The Bootstrap Test. We use McKeeman's T-diagrams to show repeated compiler applications: every T-shaped box represents a compiler program (e.g. named m, implemented in M's machine language ML, compiling SL-programs to TL-programs). Compiler input (programs) appears at the left hand side of the box, outputs at the right hand side. Compiling a compiler (hopefully) returns a compiler, so that we can apply it again, playing a kind of dominoes game with these boxes.]

Let CSL be the compiler source program. Suppose we use an existing compiler m from SL to TL on a machine M in order to generate an initial implementation m0. If this happens to work correctly, then we can use m0 on the target machine, compile CSL again and generate m1.
We may not know exactly what m0 looks like, because it is generated by an unknown (existing) compiler, but m1 is now a TL-program generated according to CSL. Let us furthermore assume that CSL and hence m1 are deterministic programs. Then we may now repeat this procedure, applying m1 to CSL again. If all compilers work correctly, we get m1 back, i.e. m2 = m1. The bootstrap test succeeds. If not, something has gone wrong. This happens very often in a compiler development; therefore, compiler constructors esteem this test highly as a means to uncover bugs. But if the compilers are correct, we can prove that the bootstrap test will succeed. This is a consequence of the following bootstrapping theorem [14, 3, 11], which holds if the notion of compiler correctness in use at least implies that the compiler preserves partial program correctness [18, 9]:

Theorem 1 (Bootstrapping Theorem). If m0 and CSL are both correct, if m0, applied to CSL, terminates with regular result m1, and if the underlying hardware worked correctly, then m1 is correct.

Let us assume that m0 and CSL are both correct and deterministic. Then m1 is the one and only correct result of applying m0 to CSL. But then, m1 and CSL are both correct (and deterministic); hence, we can apply the bootstrapping theorem again and conclude that m2 is the one and only correct result of applying m1 to CSL. Correctness (actually preservation of partial correctness) now implies, after regular termination of m0 and m1, that m2 = m1[CSL] = CSL[CSL] = m0[CSL] = m1. Thus, we can formulate the following bootstrap test theorem:

Theorem 2 (Bootstrap Test Theorem). If m0 and CSL are both correct and deterministic, if m0, applied to CSL, terminates with regular result m1, if m1, applied to CSL, terminates with regular result m2, and if the underlying hardware worked correctly, then m1 = m2.
In particular, we have m1 = m1[CSL]; hence m1 reproduces itself when applied to the correct compiler source program CSL: m1 is a self-reproducing compiler. As a matter of fact, however, this property alone does not tell us anything about the correctness of m1 (or m0 or m2). A successful bootstrap test does not imply correctness, even if CSL is correct. And that brings us back to our story. We are now going to construct a (correct) source program CSL and an (incorrect) compiler implementation m0 passing the bootstrap test. It is easy, by the way, to write an incorrect compiler source program which after compilation with m passes this test: just consider a source language feature which is compiled incorrectly but not used in the compiler itself. Hence, the challenge here is that we want to construct CSL correctly but nevertheless find an incorrect m0 passing the test. Actually, this is the reason why we needed our earlier experiments with self-reproducing programs.

4 Self-reproducing Compilers

We have seen self-reproducing compilers in the previous section: every correct compiler written in its own source language provides an example. Before we start to construct an incorrect example, we show a small C-program which we use as an example for the "correct" compiler source program CSL. For this paper, we prefer to write concrete and complete small programs which the reader could type into the machine and run to see the effect. We do not even write a real C-compiler; as a short cut we just call an existing one:

char cmdbuf[255] = "make CC=gcc `basename ";

int main (int argc, char *argv[]) {
  strcat(cmdbuf, argv[1]);
  strcat(cmdbuf," .c`");
  system(cmdbuf);
}

This program just calls the system's GNU C compiler to perform the actual compilation. Given a string argument "program.c", we call the operating system to execute the command line "make CC=gcc `basename program.c .c`", which runs gcc on the file named program.c and produces an executable named program.
Anyway, we hope the reader can imagine this system call replaced by a real compiler-function call. Let us now proceed with our story. If we carefully reread section 2.2, it becomes obvious that there is no problem in principle with constructing a wrong implementation for CSL, because we could apply the tricks we used so far to machine programs as well, and the resulting machine program could be as incorrect as our implementation of the identity function before. However, our story would then become kind of uninteresting. So let us make a different attempt which exploits the fact that we are actually writing compilers: if we are able to construct reflective programs dealing with their own source code, and if our programs are actually compilers as well, then why shouldn't we try to produce the incorrect machine code just by correctly compiling incorrect source code?

What are the requirements for an incorrect implementation m0 of the above compiler? First of all, it shall pass the bootstrap test, i.e. whenever called on the source program above, it shall return itself (reproduction). Then, for another particular program LSL we want m0 to return wrong code LTL as well (catastrophe). Finally, m0 should behave like CSL in any other case (normal). Thus, we get the following equation, which we will use as a pattern in order to construct a C-program C'SL just like we constructed the incorrect identity function before. Finally, we will produce m0 by compilation.

C'SL[p] = if p = CSL then m0 /* = C'SL[CSL] */
          else if p = LSL then LTL
          else CSL[p]

Obviously, the resulting program will become reflective if we use C'SL[CSL] in order to construct m0 in the reproduction case: the compiler will compile its own source code. But we know how to construct such programs:

/*--------------------------------------------------------
  File: compile-incorrect.c (W. Goerigk, 25.11.1998)
----------------------------------------------------------*/
#include <string.h>
#include <stdio.h>

char login[255] =
  "void main () {printf(\"Cheating Login (Oops): \");}";
char* buf = "... ";
char cmdbuf[255] = "make CC=gcc `basename ";
FILE* handle;

void main (int argc, char *argv[]) {
  if (argv[1] && (strcmp(argv[1],"compile.c") == 0)) {
    system("mv compile.c .compile.c.orig");
    if ((handle = fopen("compile.c", "w+")) != NULL) {
      fprintf(handle,buf,
              34,92,34,92,34,34,34,buf,34,34,34,
              34,34,34,34,34,34,34,34,34,34,34,34,
              34,34,34,34,34,34,34,34,34,34,34,34,34,34);
      fclose(handle);
      system("make CC=gcc compile");
      system("mv .compile.c.orig compile.c");
    };
  } else if (argv[1] && (strcmp(argv[1],"login.c") == 0)) {
    system("mv login.c .login.c.orig");
    if ((handle = fopen("login.c", "w+")) != NULL) {
      fprintf(handle,login);
      fclose(handle);
      system("make CC=gcc login");
      system("mv .login.c.orig login.c");
    };
  } else {
    strcat(cmdbuf, argv[1]);
    strcat(cmdbuf," .c`");
    system(cmdbuf);
  }
}

First of all we had to decide how to identify CSL resp. LSL. For simplicity, we just assume that the files to be compiled in these cases are called "compile.c" resp. "login.c". We could use any other predicate, for instance a literal comparison, a checksum, a special magic number, or whatever. Then, we had to decide how to compile. We use gcc again. In the normal case we can use the same code as in CSL. In the reproduction and catastrophic cases we have to write the wrong source code into a file, compile it, and then remove the file. Well, that's it. The actual program is twice as long, because we dropped the string repeating the program text; in that copy, we have to replace 34 occurrences of '"' and two occurrences of '\' by %c as well. The above program reproduces m0 when applied to "compile.c", compiles a bug into the program generated for "login.c", and in any other case it will compile just like CSL, i.e. as correctly as gcc does.
By the way, in order to show the effect, we have to invent a correct content of "login.c". But actually, we may write whatever we want into that file; our compiler will generate a program that prints "Cheating Login (Oops): ". We can easily imagine the catastrophic case generating a target program that eventually causes a catastrophe somehow.

[Fig. 2. Passing the Bootstrap Test. By construction of m0 we have established the equation m0 = m0[CSL]. Thus, m0 will pass the bootstrap test arbitrarily often.]

The programs we have shown here do not exactly meet the requirements we need to prove theorem 3 below: we did not prove CSL correct; we did not even write a real C-compiler; and there are many more than only two source programs which our machine implementation compiles incorrectly. Moreover, we generated the incorrect m0 by compiling a corresponding source program, focusing a bit more on how to construct such programs. But anyhow, we want to summarize the result in the following theorem, and we hope that the reader can imagine the adjustments necessary to exactly meet the requirements. There will be a formal proof of this theorem for a non-trivial real compiler into the code of an abstract machine in [4], using the Boyer/Moore theorem prover ACL2 [12], although that article will also not contain the entire compiler correctness proof.

Theorem 3 (No source level verification will protect us). There exists a provably correct compiler program CSL from SL to TL written in SL, a compiler machine program m written in TL, and a particular SL-program LSL with incorrect implementation LTL ≠ CSL[LSL], such that for any p with p ≠ CSL and p ≠ LSL we have

(a) m[p] = CSL[p]
(b) m[CSL] = m
(c) m[LSL] = LTL

provided the above machine program applications returned regular results. Thus, source level verification is not sufficient to guarantee compiler correctness.
In order to summarize the first story's message: we assumed a (small) compiler to be verified on source code level; we used an implementation of it in order to bootstrap a machine implementation. The new compiler executable, generated by compiling the verified source code, passed the bootstrap test, i.e. it was identical to the executable we used to generate it. Probably it will pass any other test we may try. But for all that, we finally got an incorrect result. Something was missing. By the way, the only tests which could find the hidden error would be to guess (by accident) the catastrophic case (and wait for the catastrophe to happen), or to perform the bootstrap test with sufficient mathematical rigour, i.e. to really verify the result. The latter is what the second story will tell us about: an explicit binary compiler implementation correctness proof.

If we apply the compiler to itself, thus triggering the reproduction case, we will again get a compiler which works correctly in any but the two exceptional cases. The reproduction case does not even show an effect unless we apply the result in the catastrophic case. It is highly unlikely that classical compiler validation can uncover such a bug. Compiler validation is based on a carefully selected (and published) test suite. In order to pass a validation procedure, a compiler must succeed on the test suite. But that means that the compiler results are again tested by running them on some selected inputs. Our Trojan Horse is very well hidden within the incorrect implementation; it only shows up for one particular source program. And since we are really bad guys, we won't tell which.

5 Avoiding Trojan Horses: Full Compiler Correctness

Obviously, transformation verification and source level verification of the compiler implementation are not sufficient in order to avoid a bug or virus.
Something is missing, and it is clear from the previous story that we have to concentrate on the process of generating the compiler machine executable. The executable m0 is definitely not a correct implementation of CSL: otherwise, it would compile the login-program correctly, and, even more importantly, it would reproduce a correct implementation of CSL, and not m0 itself, when applied to CSL. So there must be a (syntactical) mismatch between m0 and the code we expect. If we carefully looked through m0, comparing it instruction by instruction to what we would expect as the result of compiling CSL, we would find the mismatch. This is the idea of our second story, but of course, we do not only want to find our error. We want to guarantee that there is no error at all.

Actually, in order to focus on the missing binary compiler implementation correctness proof, let us assume the correctness of the source program CSL. To be more precise: let CC_SL,TL be a (semantically) correct compiling relation between source and target language, and let CSL be a (correct) refinement of CC_SL,TL. In either case, by correctness (refinement) we mean again at least preservation of partial correctness, which captures the intuitive requirement that lower level implementations return at most correct results w.r.t. higher level implementations or specifications. (At the very end, it guarantees that we can trust machine programs in this sense.) The following theorem from [11] can easily be proved by transitivity (or compositionality) of the refinement relation:

Theorem 4 (Syntactical Code Inspection is Sufficient). If CC_SL,TL is correct, if CSL is a correct implementation of CC_SL,TL, and if (CSL, m) ∈ CC_SL,TL, then m is a correct implementation of CC_SL,TL as well. Thus, m is a correct compiler (executable) from SL to TL.
That means that if the bootstrap test succeeds in a stronger sense, i.e. if we can assure that this one execution of the compiler applied to itself generated the expected target code m = m0, then we can guarantee correct compilation for any other program. Note that this theorem reduces the semantical question of correct compilation to a final, purely syntactical a posteriori code inspection based on code comparison between CSL and m. Here m might be mechanically generated by any initial unsafe implementation of CSL; however, if we tried to generate it by applying the Trojan Horse compiler to CSL, the test would fail. Transformation verification for CC_SL,TL, a proof that CSL refines CC_SL,TL, and one final syntactical code inspection together guarantee that m is correct and does not contain or (incorrectly) generate any bug or virus.

Great. But unfortunately, our story is not yet finished here, and we have to leave it open-ended in this paper. Without further investigation we won't be able to syntactically double-check, say, 100 KByte of binary machine code just by comparing it to the corresponding source program. That would be cumbersome and error-prone. It turns out that there is a technique for such proofs which exploits modularization into adequate intermediate layers. A diagonal argument allows for trusted machine support to generate large parts without any need for checking [11]. This can be seen as an application of the work of Goodenough and Gerhart [10] on software testing [13]. We also use result-checking techniques [20], for verification [5], but also for further reduction of the code inspection work load [11, 7]. There is a lot to win without weakening the rigorous correctness requirement.

6 Conclusions and Related Work

Our paper shows in detail why source level verification is not sufficient to guarantee compiler correctness, and we sketch a correct way out, proposing a proof technique of a posteriori code inspection based on syntactical code comparison.
The second story is very closely related to Paul Curzon's work on compiler verification [2]. The crucial difference is that he uses and trusts a theorem prover (HOL) both to carry out the proofs and to execute the compiling specification, similar to the way J Moore [16, 17] and others [22] use ACL2 or its predecessor Nqthm in order to prove correctness of a compiler program which is executable within the prover. Now that we have modularized the compiler verification task into three steps (transformation verification, high level, and finally low level binary implementation verification), we could, from a pragmatic point of view, allow machine support at least for the final step. There are users who trust a machine execution of the code checking more than a hand proof. It is their responsibility. But note: in principle we need another new full verification of the checker program, in particular if we recall what could happen without it. The situation is somewhat comparable to using pocket calculators: we trust them, but if we no longer learned how to (in principle) manually double-check their results, we (even the experts) would hopelessly depend on the skill and good-will of the manufacturers. So there is a good reason for us to further insist on the possibility in principle of providing a complete and completely documented mathematical proof of the correctness of compiler executables, depending only on hardware correctness, although somebody could remonstrate with us "on drifting away from the real problem towards the pointlessly paranoid". Sure, the crucial work in compiler verification has been for over 30 years, is, and will remain the semantical correctness of the transformation.
But if everybody is pragmatic, and nobody seriously asks the rigorous mathematical question of what we additionally have to prove, and have to be able to prove, for the correctness of compiler executables, then we would hopelessly remain stuck in the present situation, which is best characterized by the moral of Ken Thompson's Turing Award lecture in 1984: "You can't trust code that you did not totally create yourself. (Especially code from companies that employ people like me.) No amount of source-level verification or scrutiny will protect you from using untrusted code."

References

1. L.M. Chirica and D.F. Martin. Toward Compiler Implementation Correctness Proofs. ACM Transactions on Programming Languages and Systems, 8(2):185-214, April 1986.
2. Paul Curzon. The Verified Compilation of Vista Programs. Internal Report, Computer Laboratory, University of Cambridge, January 1994.
3. Wolfgang Goerigk. An Exercise in Program Verification: The ACL2 Correctness Proof of a Simple Theorem Prover Executable. Technical Report Verifix/CAU/2.4, CAU Kiel, 1996.
4. Wolfgang Goerigk. Compiler Verification Revisited. In Matt Kaufmann, Peter Manolios, and J Strother Moore, editors, Using the ACL2 Theorem Prover: A Tutorial Introduction and Case Studies. Kluwer Academic Publishers, 1999. In preparation.
5. Wolfgang Goerigk, Thilo Gaul, and Wolf Zimmermann. Correct Programs without Proof? On Checker-Based Program Verification. In Proceedings ATOOLS'98 Workshop on "Tool Support for System Specification, Development, and Verification", Advances in Computing Science, Malente, 1998. Springer Verlag.
6. Wolfgang Goerigk and Ulrich Hoffmann. Compiling ComLisp to Executable Machine Code: Compiler Construction. Technical Report Nr. 9812, Institut für Informatik, CAU, October 1998.
7. Wolfgang Goerigk and Ulrich Hoffmann. Rigorous Compiler Implementation Correctness: How to Prove the Real Thing Correct.
In Proceedings FM-TRENDS'98 International Workshop on Current Trends in Applied Formal Methods, Lecture Notes in Computer Science, Boppard, 1998. To appear.
8. Wolfgang Goerigk and Ulrich Hoffmann. The Compiling Specification from ComLisp to Executable Machine Code. Technical Report Nr. 9713, Institut für Informatik, CAU, Kiel, December 1998.
9. Wolfgang Goerigk and Markus Müller-Olm. Erhaltung partieller Korrektheit bei beschränkten Maschinenressourcen. - Eine Beweisskizze -. Technical Report Verifix/CAU/2.5, CAU Kiel, 1996.
10. J.B. Goodenough and S.L. Gerhart. Toward a Theory of Test Data Selection. SIGPLAN Notices, 10(6):493-510, June 1975.
11. Ulrich Hoffmann. Compiler Implementation Verification through Rigorous Syntactical Code Inspection. PhD thesis, Technische Fakultät der Christian-Albrechts-Universität zu Kiel, Kiel, 1998.
12. M. Kaufmann and J S. Moore. Design Goals of ACL2. Technical Report 101, Computational Logic, Inc., August 1994.
13. H. Langmaack. Contribution to Goodenough's and Gerhart's Theory of Software Testing and Verification: Relation between Strong Compiler Test and Compiler Implementation Verification. Foundations of Computer Science: Potential-Theory-Cognition. LNCS, 1337:321-335, 1997.
14. Hans Langmaack. Softwareengineering zur Zertifizierung von Systemen: Spezifikations-, Implementierungs-, Übersetzerkorrektheit. Informationstechnik und Technische Informatik it-ti, 97(3):41-47, 1997.
15. Hans Langmaack. Theoretische Informatik ist Grundlage für das sichere Beherrschen realistischer Software und Systeme. 25 Jahre Informatik an der Universität Hamburg. Informatik: Stand, Trends, Visionen, pages 47-62, 1997.
16. J S. Moore. Piton: A verified assembly level language. Technical Report 22, Comp. Logic Inc., Austin, Texas, 1988.
17. J S. Moore. Piton, A Mechanically Verified Assembly-Level Language. Kluwer Academic Publishers, 1996.
18. Markus Müller-Olm. Three Views on Preservation of Partial Correctness. Technical Report Verifix/CAU/5.1, CAU Kiel, October 1996.
19. Ken Thompson. Reflections on Trusting Trust. Communications of the ACM, 27(8):761-763, 1984. Also in ACM Turing Award Lectures: The First Twenty Years 1965-1985, ACM Press, 1987, and in Computers Under Attack: Intruders, Worms, and Viruses, ACM Press, 1990.
20. Hal Wasserman and Manuel Blum. Software reliability via run-time result-checking. Journal of the ACM, 44(6):826-849, November 1997.
21. N. Wirth. Compilerbau. Springer, Berlin, 1986.
22. W.D. Young. A verified code generator for a subset of Gypsy. Technical Report 33, Comp. Logic Inc., Austin, Texas, 1988.