
Proof Pearl: Proving a Simple Von Neumann
Machine Turing Complete
J Strother Moore
Dept. of Computer Science, University of Texas, Austin, TX, USA
[email protected]
http://www.cs.utexas.edu
Abstract. In this paper we sketch an ACL2-checked proof that a simple
but unbounded Von Neumann machine model is Turing Complete, i.e.,
can do anything a Turing machine can do. The project formally revisits
the roots of computer science. It requires re-familiarizing oneself with the
definitive model of computation from the 1930s, dealing with a simple
“modern” machine model, thinking carefully about the formal statement
of an important theorem and the specification of both total and partial
programs, writing a verifying compiler, including implementing an X86-like call/return protocol and implementing computed jumps, codifying a code proof strategy, and a little “creative” reasoning about the non-termination of two machines.
Keywords: ACL2, Turing machine, Java Virtual Machine (JVM), verifying compiler
1 Prelude
I have often taught an undergraduate course at the University of Texas at Austin
entitled A Formal Model of the Java Virtual Machine. In the course, students
are taught how to model sophisticated computing engines and, to a lesser extent,
how to prove theorems about such engines and their programs with the ACL2
theorem prover [5]. The course starts with a pedagogical (“toy”) JVM-like model
which the students elaborate over the semester towards a more realistic model,
which is then compared to an accurate JVM model [9]. The pedagogical model
is called M1: a stack-based machine providing a fixed number of registers (JVM’s
“locals”), an unbounded operand stack, and an execute-only program providing
the following bytecode instructions ILOAD, ISTORE, ICONST, IADD, ISUB, IMUL,
IFEQ, GOTO, and HALT, with unbounded arithmetic.
This set of opcodes was chosen to allow students to easily implement and
verify some simple M1 programs. On the last class day before Spring Break,
2012, the students complained that it was very hard to program M1; that in fact,
it was “probably impossible” to do “real computations” with it because it lacks
a “less than” comparator and procedures! (Such judgements are obviously naive
and ill-informed; any machine with a branch-if-0, a little arithmetic, and some
accessible infinite resource is Turing Complete.)
My response was “Well, M1 can do anything a Turing machine can do.” But on
my way home that evening, I felt guilty:
M1 is a pedagogical device, designed to introduce formal modeling to
the students and inculcate the idea that expectations on hardware and
software can often be formalized and proved. I shouldn’t just say it’s
Turing Complete. I should show them how we can prove it with the tools
they’re using.
Fortunately, I had Spring Break ahead of me and thus was born this project.
2 Source Files
The complete set of scripts for this project is part of ACL2’s Community Books.
See the Community Books link on the ACL2 home page [6]. After downloading
and installing the books, visit models/jvm/m1/. See the README file there. References to *.lisp files below are to that directory. If you have a running ACL2
session you could (include-book "models/jvm/m1/find-k!" :dir :system)
and (in-package "M1") to inspect everything with ACL2 history commands.
This paper is a guide.
3 Related Work
Turing Completeness proofs for various computational models have been a staple
of computer science since the time of Turing and Church. Mechanically checked
proofs of other important theorems in meta-mathematics (the Church-Rosser
theorem, the Cook-Levin theorem, and Gödel’s First Incompleteness Theorem)
are less common but have been done with several provers. Here I focus on mechanically checked formal proofs of the computational completeness of a programming language.
As far as I am aware, the first and only such proof was done in 1984 [2], when
Boyer and I proved that Pure Lisp was Turing Complete, using the prover that
would become Nqthm. We were asked to prove completeness by a reviewer of
[3], in which we proved that the halting problem for Pure Lisp was undecidable;
the reviewer objected that we had not proved Pure Lisp Turing Complete.
An important distinction between [2] and the present work is that the “suspect” computational model in the former is the lambda calculus with general
recursion (Pure Lisp), whereas here it is a very simple Von Neumann machine
(or imperative programming language) similar to the contemporary JVM and
its bytecode language [8].
While I’m unaware of other mechanically checked proofs that a given programming language is Turing Complete, this work also involves proofs of properties of low-level assembly code and a verifying compiler. This tradition goes
back at least as far as the mechanically checked proof of a compiler by Milner
and Weyhrauch in 1972 [10]. Highlights of subsequent systems verification work
involving such mechanically checked reasoning include the “CLI verified (hardware/software) stack” of [1], and of course the even more realistic results of the
seL4 microkernel [7] and VCC projects [4]. But even with a verified program one
must prove that the specification is Turing Complete.
4 Turing Machines
The present work uses the same Turing machine model as [2] (ported from
Nqthm to ACL2) which was accepted by the reviewers of that paper. The model
is based on Rogers’ classic [12] formalization. A Turing machine description,
tm (sometimes called an “action table”), is a finite list of 4-tuples, or cells,
⟨st_in, sym, op, st_out⟩. Rogers represents a tape as a pair of half tapes, each being
a (finite but extensible) list of 0s and 1s. The concatenation of these two half
tapes corresponds to the intuitive notion of a tape (extensible in both directions)
with a read/write head “in the middle.” Rogers shows one may start with an
extensible finite tape. The read/write head is thought of as positioned on the
first symbol of the right half. The interpretation of each cell in description tm is
“if, while in state st_in, sym is read from the tape, perform operation op on the
tape and enter state st_out.” Here, st_in and st_out are symbolic state names, sym
is 0 or 1, and op is one of four values meaning write a 0, write a 1, shift left, or
shift right. The machine halts when the current state and the symbol read from
the tape do not match any st_in and sym in tm.
We define tmi (“Turing machine interpreter”) to take a Turing machine state
name, a tape, a Turing machine description, and a number of steps, n. Tmi
returns either nil (“false”), meaning the machine did not reach a halted state in
n steps, or the final tape produced after n steps. By our choice of tape representation, a tape can never be nil, so tmi indicates both whether the
computation halted within n steps and, if it did, the final tape. See the definition
of tmi in tmi-reductions.lisp. I will colloquially refer to tmi as our “official”
model of Turing machines.
In our official model, Turing machine descriptions and cells are lists constructed with cons, state names are Lisp symbols (e.g., Q1, LP, TEST), “symbols”
on the tape are integers 0 or 1, and operations are Lisp objects 0, 1, L, or R. See
the definition of *rogers-program* in tmi-reductions.lisp for an example.
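For concreteness, here is a hedged example of a description in this representation. It is invented for this guide (it is not one of the project’s machines): a two-cell machine that shifts right past a block of 1s, writes a 1 over the first 0 it finds, and then halts, because no cell matches state Q1.

(defconst *example-tm*
  '((Q0 1 R Q0)    ; in Q0 reading 1: shift right, stay in Q0
    (Q0 0 1 Q1)))  ; in Q0 reading 0: write a 1, enter Q1 (then halt)

Evaluating (tmi 'Q0 tape *example-tm* n) with a suitable tape and a large enough n then returns the final tape; with too small an n it returns nil.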
5 M1
M1 is defined in a similar style but takes an M1 state and a number of steps. An
M1 state contains a program counter (“pc”), a list of integers denoting register
values, a stack of integers, and a program; all components of an M1 state are
represented with lists, symbols, and numbers in the obvious way. The integers
are unbounded and the stack may grow without bound. An arbitrary number of
registers may be provided, but the number of allocated registers never grows
larger than the largest register index used in the program. Programs are finite
and fixed (“execute only”).
Programs are lists of instructions as described below. The notation “reg[i]”
denotes the contents of register (JVM local variable) i; “reg[i] ← v” denotes assignment to a register; “pc ← v” denotes assignment to the program counter.
The notation “. . . , x, y, a ⇒ . . . , v” describes the manipulation of the stack as
per [8] and means that three objects, x, y, and a, are popped from the stack
(with a being the topmost) and v is pushed in their place. That portion of the
stack (“. . .”) deeper than x is unaffected. The first six instructions below always
increment the pc by 1, i.e., pc ← pc + 1 is implicit.
instruction     stack                          description
(ILOAD n)   :   ... ⇒ ..., reg[n]
(ISTORE n)  :   ..., v ⇒ ...                   reg[1] ← v
(ICONST k)  :   ... ⇒ ..., k
(IADD)      :   ..., x, y ⇒ ..., x + y
(ISUB)      :   ..., x, y ⇒ ..., x − y
(IMUL)      :   ..., x, y ⇒ ..., x × y
(GOTO d)    :   ... ⇒ ...                      pc ← pc + d
(IFEQ d)    :   ..., v ⇒ ...                   pc ← pc + (if v = 0 then d else 1)
(HALT)      :   ... ⇒ ...                      no change to state
Note that by not changing the state, the HALT instruction causes the machine
to stop. We consider an M1 state halted if the pc points to a HALT instruction.
To step an M1 state, the instruction at the pc is fetched from the program and
executed as described above. We define (M1 s n) to step state s n times and
return the final state. See m1.lisp for complete details of the M1 model.
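The interpreter thus has the usual clocked, tail-recursive shape. Here is a minimal sketch of that shape, assuming m1-step as a stand-in for the model’s single-step function (the real definitions, including the instruction dispatch, are in m1.lisp):

(defun m1 (s n)
  ; run state s for n steps by repeatedly applying the single-step
  ; function, which fetches and executes the instruction at the pc
  (if (zp n)
      s
    (m1 (m1-step s) (- n 1))))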
An example of an M1 program to compute the factorial of register 0 and leave
the result on top of the stack is:
'((ICONST 1)  ;  pc  0
  (ISTORE 1)  ;  pc  1   reg[1] ← 1
  (ILOAD 0)   ;  pc  2
  (IFEQ 10)   ;  pc  3   if reg[0] = 0, then jump to 13
  (ILOAD 1)   ;  pc  4
  (ILOAD 0)   ;  pc  5
  (IMUL)      ;  pc  6
  (ISTORE 1)  ;  pc  7   reg[1] ← reg[1] × reg[0]
  (ILOAD 0)   ;  pc  8
  (ICONST 1)  ;  pc  9
  (ISUB)      ;  pc 10
  (ISTORE 0)  ;  pc 11   reg[0] ← reg[0] − 1
  (GOTO -10)  ;  pc 12   jump to 2
  (ILOAD 1)   ;  pc 13
  (HALT))     ;  pc 14   halt with reg[1] on top of stack
This program runs forever (never reaches the HALT) if reg[0] is negative.
If we require as a precondition that reg[0] is a natural number, a statement
of total correctness can be paraphrased as: If s is an M1 state with pc 0, the
natural number n in reg[0] and the list above as the program, then there exists
a natural number i such that (M1 s i) is a halted state with n! on top of the
stack.
To state and prove such a theorem it is convenient to define a witness for
the existentially quantified i. This witness is delivered by a user-defined clock
function that takes n as an argument and returns a natural number.
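For the factorial program above, for example, such a clock can be obtained just by counting instructions: 2 setup steps, 11 steps per loop iteration, and 3 steps to exit the loop and load the result. The following is a hedged sketch rather than the project’s actual events; *fact-program* is assumed to name the listing above, make-state to build a state from a pc, registers, stack, and program, and ! to be factorial:

(defun fact-clock (n)
  ; 2 setup steps + 11 steps per iteration + 3 exit steps
  (+ 2 (* 11 n) 3))

(defthm example-fact-program-correct   ; sketch of the statement only
  (implies (and (natp n)
                (equal s (make-state 0            ; pc
                                     (list n 0)   ; reg[0] and reg[1]
                                     nil          ; empty stack
                                     *fact-program*)))
           (let ((s_f (m1 s (fact-clock n))))
             (and (haltedp s_f)
                  (equal (top (stack s_f)) (! n))))))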
The ACL2 Community Books directory models/jvm/m1/ contains many example M1 programs along with machine-checked proofs of their correctness via
such clock functions and other methods (see [11]).
6 The Correspondence Conventions
To state Turing equivalence I followed the approach of [2]. Paraphrasing it into
the M1 setting, I set up a correspondence between official Turing machine representations of certain objects (e.g., machine descriptions and state names) and
their M1 representations. The former are composed of lists, symbols, and integers;
the latter are strictly numeric.
Consider an arbitrary cell, ⟨st_in, sym, op, st_out⟩, in a Turing machine description tm. Given that tm contains only a finite number of state name symbols, we
can allocate a unique natural to each and represent these naturals in binary in a
field of width w (which depends on the number of state names in tm). We could
represent each of the four possible op as naturals in 2 bits but we allocate 3 bits.
Let the numeric encodings of the four elements of the cell be st′_in, sym′, op′, and st′_out,
respectively. Then the encoded cell is

cell′ = st′_in + 2^w·sym′ + 2^(w+1)·op′ + 2^(w+4)·st′_out.

Using this convention we can represent a list of cells, tm, as follows. The
empty list is represented as an encoded “cell” of 0s with op′ = 4 (using the
otherwise unnecessary 3rd bit of op′). We call this value nnil. A non-empty
list whose first cell is represented by cell′ and whose remaining elements are
recursively represented by tail is represented by cell′ + 2^(2w+4)·tail.
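These conventions are directly computable. The following hedged sketch is consistent with the formulas above, though the project’s actual encoding functions in tmi-reductions.lisp may use different names and shapes:

(defun encode-cell (st-in sym op st-out w)
  ; cell' = st'_in + 2^w sym' + 2^(w+1) op' + 2^(w+4) st'_out
  (+ st-in
     (* (expt 2 w) sym)
     (* (expt 2 (+ w 1)) op)
     (* (expt 2 (+ w 4)) st-out)))

(defun encode-cell-list (cells w)
  ; cells is a list of already-encoded cell numbers; the empty list
  ; is the marked "cell" nnil (op' = 4) and each cell occupies
  ; 2w + 4 bits
  (if (endp cells)
      (encode-cell 0 0 4 0 w)
    (+ (car cells)
       (* (expt 2 (+ 4 (* 2 w)))
          (encode-cell-list (cdr cells) w)))))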
The tape (which, recall, also encodes the read/write head “in the middle”)
is represented on M1 as two natural numbers, one specifying (via its binary
expansion) the contents of the tape and the other specifying the head position
(via the number of bits in the left-half tape). Henceforth I use these conventions:
level      variable   value
Official   tm       : a Turing machine description
           st       : a Turing machine state name
           tape     : a Turing machine tape (with encoded head)
M1         w        : width of a state symbol encoding req’d by st and tm
           nnil     : the marked encoded “cell” (op′ = 4) (wrt w)
           tm′      : the M1 (numeric) representation of tm (wrt w and nnil)
           st′      : the M1 (numeric) representation of st
           tape′    : the M1 (numeric) representation of tape contents
           pos′     : the M1 (numeric) representation of the head position
           s_0      : the initial M1 state described below
The initial M1 state s_0 is an M1 state with program counter 0, thirteen registers
set to 0, the stack on which st′, tape′, pos′, tm′, w, and nnil have been pushed,
and finally, as the program, a certain fixed list of M1 instructions. That list of
M1 instructions, called Ψ and described below, is (allegedly) a Turing machine
interpreter in the programming language of M1. Note that s_0 does not specify
how long the Turing machine is to run.

The macro with-conventions in theorems-a-and-b.lisp formally defines
these conventions. The macro binds the ACL2 variable s_0 to the value
above, in terms of the variables tm, st, and tape. Technically, it binds w, nnil,
tm′, st′, tape′, and pos′ as specified in terms of tm, st, and tape, and binds s_0
in terms of those auxiliary variables.
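Its shape can be sketched as follows; everything here is hedged, with hypothetical stand-in names for the project’s encoding functions (see theorems-a-and-b.lisp for the real macro), and make-state assumed to build a state from a pc, registers, stack, and program:

(defmacro with-conventions (body)
  ; bind the auxiliary variables and s_0 around body, referring
  ; anaphorically to the free variables tm, st, and tape
  `(let* ((w     (state-width st tm))          ; hypothetical helpers
          (nnil  (nnil w))
          (tm!   (encode-tm tm w))             ; tm'
          (st!   (encode-st st tm))            ; st'
          (tape! (encode-tape-contents tape))  ; tape'
          (pos!  (encode-head-pos tape))       ; pos'
          (s_0   (make-state 0                 ; pc 0
                             '(0 0 0 0 0 0 0 0 0 0 0 0 0) ; 13 registers
                             (push nnil        ; nnil topmost, st' deepest
                              (push w
                               (push tm!
                                (push pos!
                                 (push tape!
                                  (push st! nil))))))
                             *Psi*)))
     ,body))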
7 Theorems Proved
The discussion in [2] requires us to prove:
Theorem A. If tmi runs forever on st, tape, and tm, then M1 runs forever on
s_0. More precisely, we phrase this in the contrapositive and say that if M1 halts
on s_0 in i steps, then there exists a j such that tmi halts in j steps.

Theorem B. If tmi halts on st, tape, and tm in n steps, there exists a k
such that M1 halts on s_0 in k steps and computes the corresponding tape.
(defthm theorem-A
  (with-conventions
   (implies (natp i)
            (let ((s_f (m1 s_0 i)))
              (implies (haltedp s_f)
                       (tmi st tape tm (find-j st tape tm i))))))
  :hints ...)

(defthm theorem-B
  (with-conventions
   (implies (and (natp n)
                 (tmi st tape tm n))
            (let ((s_f (m1 s_0 (find-k st tape tm n))))
              (and (haltedp s_f)
                   (equal (decode-tape-and-pos
                           (top (pop (stack s_f)))
                           (top (stack s_f)))
                          (tmi st tape tm n)))))))
Note that when a tmi expression is used as a literal (e.g., in the conclusion
of theorem-A and the hypothesis of theorem-B), it asserts termination (a non-nil
returned value) of the tmi run. When tmi is used in the equality, we know the
value is a tape and the equality checks the correspondence with what M1 computes.
In formalizing these statements there is an opportunity to subvert our goal
by defining a devious sense of correspondence! The correspondence has access
to the full power of the logic and could, for example, compute the right answer
from tm, st, and tape and encode it into s_0. The correspondence above is not
“devious.”
It remains to explain the fixed M1 program, Ψ, and the witness functions
find-j and find-k which constructively establish the existence of the step
counts mentioned in the informal statements of the theorems. But first, it is
convenient to refine tmi into a function that operates on the kind of data M1
has: numbers.
8 Refinement
We refine the official definition of tmi into a function named tmi3 and verify
that it corresponds to tmi modulo the representational issues. The proof is done
in several steps which successively implement the change of representations of
tm and tape.
– tmi1 is like tmi but for a renamed tm with numeric state names
– tmi2 is like tmi1 but for tm′ , w and nnil
– tmi3 is like tmi2 but for tape′ and pos′
This concludes with the theorem tmi3-is-tmi in tmi-reductions.lisp. It
is tmi3 we will implement on M1.
9 The M1 Program Ψ
Key to our proof is the definition of an M1 program Ψ for interpreting arbitrary
Turing machine descriptions on a given starting state and tape. Ψ either runs
forever or HALTs; and when it halts, the representation of the official final tape
can be recovered from the M1 state.
Given the limited instruction set of M1, it is necessary to implement some
simple arithmetic utilities as M1 programs. Ψ is then the concatenation of all
these programs together with “glue code” permitting procedure call and return.
name    stack                              description
LESSP : ..., x, y ⇒ ..., v                 v = (if x < y then 1 else 0)
MOD   : ..., x, y ⇒ ..., (x mod y)
FLOOR : ..., x, y, a ⇒ ..., (a + ⌊x/y⌋)
LOG2  : ..., x, a ⇒ ..., (a + log2(x))
EXPT  : ..., x, y, a ⇒ ..., (a + x^y)
We underline program names to help the reader; MOD names an M1 program,
mod names an ACL2 function. (ACL2 is case insensitive; formally MOD is ’MOD.)
For brevity, the descriptions above do not include the effects of these programs
on the pc or registers. In addition, certain obvious preconditions obtain (e.g.,
for FLOOR, all operands are natural numbers and y is non-0).
With these primitives and subroutine call/return it is not difficult to define
slightly higher-level M1 programs for accessing encoded Turing machine descriptions, states, and tapes. The names below are all prefixed with ‘n’ because these
programs are the numeric correspondents of functions in the official model of tmi.
In the following, cell′ is the numeric encoding of some cell ⟨st_in, sym, op, st_out⟩;
st′_in, sym′, op′, and st′_out are the corresponding numeric encodings; tm is assumed
non-empty (and so its car is a cell with encoding car′, its cdr is a list of
cells with encoding cdr′, and tm′ is not nnil); w is the width of the state symbol
encoding; and nnil is the marked cell.
name      stack
NST-IN  : ..., cell′, w ⇒ ..., st′_in
NSYM    : ..., cell′, w ⇒ ..., sym′
NOP     : ..., cell′, w ⇒ ..., op′
NST-OUT : ..., cell′, w ⇒ ..., st′_out
NCAR    : ..., tm′, w ⇒ ..., car′
NCDR    : ..., tm′, w ⇒ ..., cdr′
With these programs we can implement the M1 programs realizing the
numeric version of tmi.

NCURRENT-SYM : ..., tape′, pos′ ⇒ ..., sym′
Description: sym′ is the symbol at position pos′ of tape′

NINSTR1 : ..., a, b, tm′, w, nnil ⇒ ..., cell′
Description: cell′ is the first encoded cell in tm with st′_in = a and sym′ = b,
if any, or -1 if no such cell exists

NEW-TAPE2 : ..., op′, tape′, pos′ ⇒ ..., tape′_nx, pos′_nx
Description: op′ is the encoding of a tape operation; tape′_nx and pos′_nx are
produced by performing that operation on tape′ and pos′

TMI3 : ..., st′, tape′, pos′, tm′, w, nnil ⇒ ..., tape′_nx, pos′_nx
Description: This is the M1 program that interprets the encoded Turing machine
tm′, starting in state st′ on the tape represented by tape′ and pos′. The program
returns the tape′_nx and pos′_nx representing the final tape if the machine halts,
or runs forever otherwise.
Note that “tmi3” is both the name of an M1 program and of a function defined
in ACL2 as part of our refinement of tmi to the M1 representations. However,
the program TMI3 takes the six arguments listed above, while the function tmi3
takes an additional argument: the number of steps to take, n. (Strictly speaking,
the function tmi3 takes six arguments, not seven, because it does not need nnil:
that value is determined by w.) The program TMI3 may run forever. The function
tmi3 is total.

MAIN : ..., st′, tape′, pos′, tm′, w, nnil ⇒ ..., tape′_nx, pos′_nx
Description: By convention, our compiler starts execution with the MAIN program, and our MAIN just calls TMI3 above.
10 Verifying Compiler
Writing the sixteen programs above is tedious if done directly. Perhaps the main
problem is that M1 does not support subroutine call and return: M1 operates on
one “flat” program space! Furthermore, the machine does not provide “computed
jumps” like the JVM’s JSR (which pops the stack into the pc). There is a strict
separation of data from pcs. Every GOTO and IFEQ is pc-relative, but the distance
skipped is always some constant specified in the instruction. Of course, writing
the programs is only part of the battle: they must also be verified to implement
the ACL2 function tmi3.
I thus decided to write a compiler from a simple “Toy Lisp” subset to M1
code. The compiler takes as input a system description, containing source code
and specifications for every subroutine.
The verifying compiler is called defsys (see defsys.lisp). Ψ is generated by
the defsys expression in implementation.lisp. Every subroutine to be compiled is given a name, a list of :formals, an :input precondition, an :output
specification describing the top of the stack, and the Toy Lisp source :code. It
was sufficient and convenient to support only tail-recursive source code functions.
As illustrated by main below, a subroutine may return multiple values and provision is made via so-called “ghosts” to model partial programs with total functions. Finally, optional arguments :ld-flg and :edit-commands allow the user
to debug and modify the generated events. Inspection of implementation.lisp
will reveal that three edit commands were used to augment the automatically
generated commands. These generally inserted additional lemmas to prove before certain automatically generated theorems.
(defsys :ld-flg nil                ; debugging aid
 :modules
 ((LESSP :formals (x y)
         :input (and (natp x)
                     (natp y))
         :output (if (< x y) 1 0)
         :code (IFEQ Y
                     0
                     (IFEQ X
                           1
                           (LESSP (- X 1) (- Y 1)))))
  (MOD :formals (x y)
       :input (and (natp x)
                   (natp y)
                   (not (equal y 0)))
       :output (mod x y)
       :code (IFEQ (LESSP X Y)
                   (MOD (- X Y) Y)
                   X))
  ...
  (MAIN :formals (st tape pos tm w nnil)
        :input (and (natp st)
                    (natp tape)
                    (natp pos)
                    (natp tm)
                    (natp w)
                    (equal nnil (nnil w))
                    (< st (expt 2 w)))
        :output (tmi3 st tape pos tm w n)
        :output-arity 4
        :code (TMI3 ST TAPE POS TM W NNIL)
        :ghost-formals (n)
        :ghost-base-value (MV 0 st tape pos)))
 :edit-commands ...)               ; user-added modifications
Toy Lisp is just the subset of ACL2 composed of variable symbols, quoted
numeric constants, the function symbols +, -, and * (primitively supported by M1),
the form (MV a1 ... an) for returning multiple values, the form (IFEQ a b c)
(which is just ACL2’s (if (equal a 0) b c)), and calls of primitive and defined
Toy Lisp functions.
In addition to producing the M1 object code, the compiler (a) provides a
call/return protocol, (b) links symbolic names to actual pcs (and generates appropriate relative jumps), and (c) produces the ACL2 commands (definitions
and theorems) establishing that the object code is correct with respect to the
Toy Lisp and that the Toy Lisp implements the :output specification.
If the maximum number of registers required by any subroutine’s body is
max, the call/return protocol requires 2max + 1 registers. We divide them into
max so-called A-registers and max + 1 B-registers. The A-registers are for use by
the subroutine body and the B-registers are used by the call/return protocol.
For simplicity we assume (and check) that the maximum number of registers
used by a subroutine body is equal to the number of input parameters of the
subroutine.
Note that of the sixteen programs sketched above, TMI3 has the most parameters: 6. Thus, we need 13 registers.
The basic protocol for calling a subroutine subr of n arguments with arguments a_1, ..., a_n is as follows: the caller pushes a_1, ..., a_n, and the pc to which
subr should return. The caller then jumps to the pc of subr. At that pc, a prelude
for subr pops a_1, ..., a_n, and the return pc into the B-registers. It then protects
the caller’s environment by pushing the first n A-registers onto the stack, followed
by the return pc from the B-registers. Finally, it moves the other B-registers
(containing a_1, ..., a_n) to the first n A-registers. (The only way to move a value
from one register to another is via the stack; only the topmost item on the stack
can be accessed per instruction.) A symmetric postlude supports returning
k ≤ n values on the stack. At the conclusion of the postlude, the code jumps to
the return pc.
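Concretely, a call expands along the following lines. This is a hedged illustration with invented pcs and offsets, not literal defsys output:

; Calling LESSP on reg[0] and reg[1], returning to (invented) pc 105.
; Suppose the call site occupies pcs 101-104 and LESSP's prelude
; starts at pc 7; the final GOTO is pc-relative.
(ILOAD 0)      ; pc 101: push a1
(ILOAD 1)      ; pc 102: push a2
(ICONST 105)   ; pc 103: push the return pc
(GOTO -97)     ; pc 104: jump to the prelude at pc 104 - 97 = 7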
But how can M1 jump to a pc found on the stack if the ISA firmly separates
“data” from “pcs”? The answer is quite tedious: the compiler keeps track of
every call of each subroutine; the postlude for each subroutine concludes with a
“big switch” which compares the “return pc” (data on the stack) to the known
pc of each call and then jumps to the appropriate pc.
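A hedged sketch of such a switch, again with invented numbers: suppose the prelude saved the return pc in B-register 9 and LESSP is called from exactly two sites, returning to pcs 105 and 212 (d1 and d2 stand for the concrete relative distances the compiler computes):

(ILOAD 9)      ; fetch the saved return pc
(ICONST 105)
(ISUB)         ; 0 iff the return pc is 105
(IFEQ d1)      ; if so, jump (pc-relatively) to pc 105
(ILOAD 9)      ; otherwise compare against the next call site
(ICONST 212)
(ISUB)
(IFEQ d2)      ; if so, jump to pc 212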
The compiler works in several passes. The first pass compiles the object code
but includes symbolic labels and pseudo-instructions for CALL and RET. The
second pass expands the CALL and RET “instructions” into appropriate sequences
of M1 code. The last pass removes the labels, replacing references to them by
relative jumps to the appropriate pcs. The compiler saves the output of the three
passes in the ACL2 constants *ccode*, *acode*, and *Psi*, respectively. These
may be inspected after the compiler is run.
The key to generating the clock functions is just to count instructions in the
prelude, loop, and postlude of each subroutine.
Defsys generates certain definitions and theorems for each subroutine, admits the definitions under the logic’s definitional principle, and proves the theorems. The important ones are noted below for LESSP. (It is easiest to inspect the
results by loading the project into ACL2 (Section 2) and typing (pe ’name),
where name is the name of an event mentioned here.) Recall that LESSP takes
two arguments, x and y. The :input condition of the module is that both x
and y are naturals. The :output condition is that 1 or 0 is on top of the stack,
depending on whether x < y. The source :code for the module is shown above.
When rpc is mentioned below it is the return pc from some call of LESSP in Ψ.
When s is mentioned it is an M1 state with program Ψ. Toy Lisp translations to
ACL2 have names beginning with “!”.
Def (!lessp x y): the ACL2 function !lessp is defined

(defun !lessp (x y)
  (if (and (natp x) (natp y))          ; :input condition
      (if (equal y 0)                  ; translated Toy Lisp :code
          0
        (if (equal x 0)
            1
          (!lessp (- x 1) (- y 1))))
    nil))
Def (lessp-loop-clock x y): defined to compute the number of M1 steps from
the loop in LESSP to the postlude
Def (lessp-clock rpc x y): defined to compute the number of M1 steps to get
from the top of the prelude in LESSP through the return to rpc
Thm lessp-loop-is-!lessp: if the pc in s is at the top of the loop in LESSP,
with x and y (satisfying the stated :input conditions) in the first two A-registers,
then after (lessp-loop-clock x y) steps the pc is at the postlude, all of the
A-registers except the first two are unchanged, and (!lessp x y) has been pushed
on the stack
Thm lessp-is-!lessp: if the pc in s is poised at the pc of LESSP and the
stack contains at least three values, x, y, and rpc, where x and y satisfy the
:input conditions on lessp and rpc is a known return pc from LESSP, then after
(lessp-clock rpc x y) steps the pc is rpc, the A-registers are unchanged, and
(!lessp x y) has been pushed onto the stack obtained by popping off x, y, and
rpc
Thm !lessp-spec: if x and y satisfy the :input conditions for lessp, then
(!lessp x y) is as specified by the :output, i.e., it is 1 or 0 depending on
(x < y).
Putting the last two theorems together allows ACL2 to deduce that every
jump to LESSP in Ψ just advances the pc to the return pc, pops the arguments
and the return pc off the stack, and pushes 1 or 0 according to the specification,
without changing the A-registers.
Defsys compiles M1 code and generates and proves analogous definitions
and theorems for every module. Thus, it compiles TMI3 and proves that running
that code produces the results specified by tmi3. The only wrinkle in this story
is that tmi3 takes a step-count argument while the program TMI3 does not.
However, provision is made for this via the user-supplied “ghost” parameters
of defsys. The clock function tmi3-clock and the :code function !tmi3 are
augmented by an additional formal parameter, the user-supplied :ghost-formal
n. In recursion (once per iteration), these functions decrement n and halt if n = 0.
No such parameter exists in the compiled code. But defsys proves that the code,
when run according to tmi3-clock, returns the same result as tmi3 (both wrt
n), or else is left “still running” at the top of its loop.
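To make the ghost mechanism concrete, here is a hedged toy example, invented for this exposition and not a module of implementation.lisp. Imagine a module whose loop never changes x, so its compiled code runs forever whenever x ≠ 0; the ghost formal n makes the logical function total even though no counter exists in the M1 code:

(defun !spin (x n)
  ; n is a ghost step count: when it runs out we return the ghost
  ; base value (here just x), signalling "still running"
  (if (zp n)
      x
    (if (equal x 0)
        0
      (!spin x (- n 1)))))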
11 Finishing the Proof
From the theorems in tmi-reductions.lisp we get that the official Turing
machine interpreter, tmi, is equal to tmi3 modulo the representations, for any
Turing step count n.

From implementation.lisp we get theorems about MAIN, its ACL2 analogue
!main, and its :output specification function tmi3. In particular, the theorem
main-is-!main tells us that if invoked appropriately and run for main-clock
M1 steps (for exactly n iterations), the result is exactly described by the Lisp analogue !main: if !main reports halting after n iterations, then the final M1 state
has as its pc the return pc of the call of MAIN in Ψ, and the stack contains the same
tape and position computed by !main; and if !main reports that it did not halt
(in n iterations), the M1 state is poised at the top of the loop in the TMI3 program.
Meanwhile, !main-spec tells us that !main computes the same thing as tmi3.
Since main-clock starts counting from the pc of MAIN and Ψ just pushes the
return pc, jumps to MAIN, and HALTs, we define (find-k st tape tm n) to be
just 2 more than main-clock on the corresponding arguments st′, tape′, pos′,
tm′, w, nnil, and n.
In theorems-a-and-b.lisp we combine these results in the simulation
theorem, which states that an M1 run starting in the initial state s_0 and taking
(find-k st tape tm n) steps is halted precisely if tmi halts in n steps and,
furthermore, that if tmi halts in n steps, then the answer in the final M1 state
corresponds to the tape computed by tmi.
Now we wish to prove theorems A and B. In fact, theorem-B (see Section 7)
follows easily from the simulation theorem.

Theorem A (Section 7) requires more work. Recall that it deals with the
non-termination of the two machines. Informally, it says that if tmi fails to terminate,
then so does M1. But we phrased it in the contrapositive: if M1 terminates, then
so does tmi.
Here we know that M1 halts on s_0 after i steps and we must define find-j
to return a number of steps sufficient to ensure that tmi halts. Notice that the
previously defined find-k counts M1 steps and now we seek to count tmi steps.
Two observations are important in defining find-j. The first is a theorem
called find-k-monotonic in theorems-a-and-b.lisp which states that if tmi
has not halted after n steps then (find-k st tape tm n) < (find-k st tape
tm (+ n 1)). This is actually an interesting, non-trivial theorem to prove, whose
proof involved the only use of traces in the script.
The second observation is an easy one called m1-stays-halted: once M1 has
halted, it stays halted. Thus, if M1 is halted after i steps it is halted after any
greater number of steps.
We can then define find-j to find a j at which (tmi st tape tm j) is halted,
given that we know (M1 s_0 i) is halted. The definition searches upwards from
j = 0: if tmi is halted at j, return j; if (find-k st tape tm j) ≥ i, return j;
else search from j + 1.

This is a well-defined function: the recursion terminates because the find-k
expression grows monotonically and will therefore eventually reach the fixed
i, if the earlier exit is not taken first.
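A hedged sketch of such a definition follows; the project’s actual find-j in theorems-a-and-b.lisp may differ, and the measure shown is just one way to justify admission using find-k-monotonic. The find-j of theorem-A corresponds to starting the search at j = 0:

(defun find-j1 (st tape tm i j)
  ; search upward from j; the measure decreases on recursion because
  ; find-k grows strictly with j while tmi has not halted
  (declare (xargs :measure (nfix (- i (find-k st tape tm j)))))
  (cond ((tmi st tape tm j) j)             ; first exit: tmi halted at j
        ((>= (find-k st tape tm j) i) j)   ; second exit: find-k reached i
        (t (find-j1 st tape tm i (+ j 1)))))

(defun find-j (st tape tm i)
  (find-j1 st tape tm i 0))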
It is easy to see that if M1 is halted at i, then tmi is halted at (find-j st
tape tm i): either find-j returns a j (in the first exit) known to be sufficient
or else it returns a j such that (find-k st tape tm j) ≥ i. But our second
observation above shows that M1 must thus be halted at (find-k st tape tm
j). And if M1 is halted there, then tmi must be halted at j, by the simulation
theorem.
That completes our proof sketch of theorem A.
12 Efficiency Considerations
M1 is an executable operational model which ACL2 can execute at about 500,000
M1 bytecodes/sec. We can therefore run Ψ to simulate Turing machines. The
clock function find-k tells us exactly how long we must run it to simulate a
given tmi run of n steps.
Consider Rogers’ Turing machine description for doubling the number on the
tape [12]. Suppose the tape initially contains Rogers’ representation of 4. Running
tmi experimentally reveals that it takes 78 Turing steps to reach termination
and compute a tape representing 8. We can use find-k to determine how long
it takes M1 to simulate this computation. And the answer is:
103,979,643,405,139,456,340,754,264,791,057,682,257,947,240,629,585,359,596
or slightly more than 10^56 steps!
The primary reason our implementation is so inefficient is that tapes and
Turing machine descriptions are represented as large (bit-packed) integers and
must be unpacked on M1 with programs that use LESSP. But the only way to
answer the question “is x < y?” for two naturals x and y on M1 is to subtract 1
from each until one or the other becomes 0, because the only test M1 programs
can perform is equality against 0. Thus it takes exponential time to unpack.
(ACL2 itself would also take exponential time evaluating find-k were it not for
the theorems in find-k!.lisp.)
The efficiency of our M1 Turing machine interpreter would be much improved
if M1 provided the JVM instruction IFLT (branch if negative) or IF_ICMPLT
(branch if x < y). Further improvement could be made by having IDIV (floor),
or bit-packing operations like ISHR (shift right), IAND (bit-wise and), etc., and
perhaps arrays (with IALOAD and IASTORE) to represent the tape. Minor further
improvements could be had by supporting JSR or even INVOKESTATIC or
INVOKEVIRTUAL to make call/return simpler. All of these features are supported
on our most complete JVM model, M6 [9].
Another obvious approach would have been to compile the Turing machine
description tm into an M1 program. Had I done so, the proofs of theorems A and
B would have required proving that the Turing machine compiler was correct
for all possible Turing machine descriptions. By representing Turing machine
descriptions as data to be interpreted, I could limit my compiler’s task to proving
that its output was correct on the 16 Toy Lisp modules discussed. Put succinctly,
it is easier to write a verifying compiler than to verify a compiler.
13 Project History
I developed M1 in 1997 to teach my JVM modeling course, which I subsequently
taught about ten times. While the ISA of M1 changed annually to make homeworks harder or easier, programming M1 and proving correctness of my programs
became almost second nature to me.
The question of M1’s computational power arose in class in March, 2012. I
completed the first version of this proof March 10–18, 2012 after coding Ψ by
hand in 804 M1 instructions and manually typing the specifications and lemmas.
I was helped enormously by the 1984 paper [2] and my experience with M1.
After Spring Break, I gave two talks on the proof: one to the Austin ACL2
research group and one to my undergraduate JVM class. Neither talk went
smoothly and I learned a lot about the difficulty of presenting the work. A few
weeks later, in early April, 2012, I decided to implement the verifying compiler.
The present version of the proof was polished by April 14, 2012.
14 Conclusion
Aside from the satisfaction of formally revisiting the roots of computer science,
this work allowed me to go back into class after Spring Break and say:
M1 can do anything a Turing machine can do. Here’s a proof.
References
1. W. Bevier, W. A. Hunt, Jr., J. S. Moore, and W. Young. Special issue on system
verification. Journal of Automated Reasoning, 5(4):409–530, 1989.
2. R. S. Boyer and J. S. Moore. A mechanical proof of the Turing completeness of
Pure Lisp. In W. W. Bledsoe and D. W. Loveland, editors, Contemporary Mathematics: Automated Theorem Proving: After 25 Years, volume 29, pages 133–168,
Providence, Rhode Island, 1984. American Mathematical Society.
3. R. S. Boyer and J. S. Moore. A mechanical proof of the unsolvability of the halting
problem. Journal of the Association for Computing Machinery, 31(3):441–458,
1984.
4. E. Cohen, M. Dahlweid, M. A. Hillebrand, D. Leinenbach, M. Moskal, T. Santen,
W. Schulte, and S. Tobies. VCC: A practical system for verifying concurrent C. In
Theorem Proving in Higher Order Logics, 22nd International Conference, TPHOLs
2009. Springer, 2009.
5. M. Kaufmann, P. Manolios, and J. S. Moore. Computer-Aided Reasoning: An
Approach. Kluwer Academic Press, Boston, MA., 2000.
6. M. Kaufmann and J. S. Moore. The ACL2 home page.
http://www.cs.utexas.edu/users/moore/acl2/. Dept. of Computer Sciences, University of Texas at Austin, 2014.
7. G. Klein, K. Elphinstone, G. Heiser, J. Andronick, D. Cock, P. Derrin, D. Elkaduwe,
K. Engelhardt, R. Kolanski, M. Norrish, T. Sewell, H. Tuch, and S. Winwood. seL4:
Formal verification of an OS kernel. In ACM Symposium on Operating Systems
Principles, pages 207–220, October 2009.
8. T. Lindholm and F. Yellin. The Java Virtual Machine Specification, 2nd edition.
Prentice Hall, 1999.
9. H. Liu. Formal Specification and Verification of a JVM and its Bytecode Verifier.
PhD thesis, University of Texas at Austin, 2006.
10. R. Milner and R. Weyhrauch. Proving compiler correctness in a mechanized logic.
In Machine Intelligence 7, pages 51–72. Edinburgh University Press, 1972.
11. S. Ray and J. S. Moore. Proof styles in operational semantics. In A. J. Hu and
A. K. Martin, editors, Formal Methods in Computer-Aided Design (FMCAD-2004),
volume 3312 of Lecture Notes in Computer Science, pages 67–81. Springer, 2004.
12. H. Rogers. A Theory of Recursive Functions and Effective Computability.
McGraw-Hill, 1967.