Lists and why they are useful

By M. V. Wilkes*
Computers have long been in general use for solving numerical problems, and pioneering interest
has now switched to their use for non-numerical work, that is, for manipulating symbols. Examples
are compiling, studies in artificial intelligence, layout problems, etc. List-processing was a
breakthrough in symbol manipulation since it provided a flexible way of organizing the computer
memory. This paper explains in an expository manner what goes on in the computer memory
when list-processing operations are performed, and takes as an example the formal differentiation
of an algebraic expression written in Polish notation.
* Director, University Mathematical Laboratory, Corn Exchange St., Cambridge.

Ten years ago much effort was being devoted to the use of computers for solving numerical
problems. This subject is now well advanced, and pioneering interest has switched to the use
of computers for solving non-numerical problems, that is, for manipulating symbols.

Programming advances often follow on the introduction of some technical device that facilitates
the organization of, or the cross-referencing of, the computer memory. For example, the first
programmers wrote all their addresses in absolute form and were forced to re-number when
insertions were made in the program; the introduction of floating or symbolic addresses, which
were replaced automatically by absolute addresses when the program was assembled, was the
first step towards freeing the programmer from limitations imposed by the consecutive nature
of a computer memory. The introduction of lists and list structures was a further step in the
same direction.

Technical devices such as those mentioned are often so successful that the modern programmer
is unaware that any difficulty ever existed. This is particularly so as he is screened by
developments in programming languages from what is going on in the computer. In the more
highly developed list-processing languages, for example, the programmer is insulated from the
details of the list-manipulating operations that are brought about for him by the system;
LISP (McCarthy, 1962) in particular is "mathematician oriented" and appeals by its formal
qualities to those who have been trained in the rigour of abstract thought. The subject of this
paper, however, is "Lists and why they are useful," and not "List-processing languages and how
to use them." It will, therefore, be concerned with the details of what is going on in the
memory of the computer when list-processing operations are performed. For this purpose it is
necessary to use a simple language in which the operations can be followed in detail; the one
that will be used here may be described as an assembly language for lists.

Lists

The word list is used in a very technical sense, and refers to a sequence of memory registers
strung together in a particular way. Each register is divided into two equal parts and, except
for the end register of a list, the second part contains the address of the next register in
the sequence; the second part of the end register contains an indicating symbol here taken to
be 0. Note that the registers do not have to be consecutive in the memory; herein lies the
merit of the system since an extra register can easily be inserted at any point in a list,
without disturbing the others, simply by acquiring a register so far unused, placing its
address in the second part of the register after which it is to be inserted, and putting in its
second half the address of the following register. So that it shall be easy to find a free
register, all available registers are linked together on what is known as the free list; this
is a list like any other list, and registers are taken from it when required and returned to it
when no longer required.

I shall follow McCarthy and refer to the first half of a register as the CAR of that register
and to the second half as the CDR (pronounced "cudder"). In a simple list such as has just been
described, each CDR contains an address pointing to the next register in the list, while the
CAR is free and may be used to hold a symbol. For example, Fig. 1 shows a list representation
of the mathematical expression A + B. An alternative representation is given in Fig. 2, which
shows the same expression in Polish notation with the sign coming before the operands.

Fig. 1. List representation of A + B
Fig. 2. A + B in Polish notation
Fig. 3. A + P.Q
Fig. 5. A + (C + D)^2
Fig. 6. List structures showing base registers; CARs from which no arrows start contain arbitrary symbols

The CAR of a register may alternatively contain an address and thus point to a sub-list. We
then have a list structure. An example is given in Fig. 3 which is derived from Fig. 2 by
replacing B by a list representing P.Q. The list structure thus stands for A + P.Q.
Expressions of any complexity may be handled in this
way, and changes may easily be made to their component
parts. Further examples are shown in Figs. 4 and 5.
Note that in the latter there is a common sub-list.
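
The mechanism described in this section may be easier to follow in a present-day notation. What follows is a minimal sketch in Python, not part of the original paper: the class Memory, its methods, and the convention that an integer in a CAR is an address while a string is a symbol are all assumptions made for illustration. It models registers with a CAR half and a CDR half, a free list, the insertion of a register after a given one, and the replacement of B by the sub-list P.Q.

```python
NIL = 0  # end-of-list marker in a CDR, taken to be 0 as in the text


class Memory:
    """Hypothetical model: register n has a CAR half car[n] and a CDR half cdr[n]."""

    def __init__(self, size):
        # Register 0 is never used, so that address 0 can serve as the end marker.
        self.car = [None] * (size + 1)
        self.cdr = [NIL] * (size + 1)
        for addr in range(1, size):      # link registers 1..size into the free list
            self.cdr[addr] = addr + 1
        self.free = 1

    def acquire(self):
        """Take one register from the beginning of the free list."""
        if self.free == NIL:
            raise MemoryError("free list exhausted")
        addr = self.free
        self.free = self.cdr[addr]
        self.car[addr], self.cdr[addr] = None, NIL
        return addr

    def release(self, addr):
        """Return a register to the beginning of the free list."""
        self.car[addr] = None
        self.cdr[addr] = self.free
        self.free = addr

    def insert_after(self, addr, symbol):
        """Insert a new register holding `symbol` after register `addr`."""
        new = self.acquire()
        self.car[new] = symbol
        self.cdr[new] = self.cdr[addr]   # new register points to the old successor
        self.cdr[addr] = new             # predecessor now points to the new register
        return new

    def build(self, items):
        """Build a list; a nested Python list becomes a sub-list (an address in a CAR)."""
        head = NIL
        for item in reversed(items):
            addr = self.acquire()
            self.car[addr] = self.build(item) if isinstance(item, list) else item
            self.cdr[addr] = head
            head = addr
        return head

    def read(self, head):
        """Follow the CDR chain and return the list as nested Python lists."""
        out = []
        while head != NIL:
            value = self.car[head]
            out.append(self.read(value) if isinstance(value, int) else value)
            head = self.cdr[head]
        return out


mem = Memory(16)
expr = mem.build(["+", "A", "B"])                  # A + B in Polish notation
operand = mem.cdr[mem.cdr[expr]]                   # the register holding "B"
mem.car[operand] = mem.build([".", "P", "Q"])      # replace B by a sub-list for P.Q
print(mem.read(expr))                              # ['+', 'A', ['.', 'P', 'Q']]

letters = mem.build(["A", "C"])
mem.insert_after(letters, "B")                     # no other register is disturbed
print(mem.read(letters))                           # ['A', 'B', 'C']
```

Note that the registers making up a list need not be consecutive in this model either; only the CDR chain determines the order.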
A list must start somewhere, and the address of its
first register is stored in one of a sequence of fixed
memory registers known as base registers. These
registers may be given names, and these names may be
used also to refer to the lists which start from them.
Here the names used will be capital letters with or without suffixes. Base registers can also point to sub-lists
forming parts of list structures; in this way sub-lists
may be given names.
Examples of list structures showing how they are
connected to base registers are given in Fig. 6. Normally,
it is not necessary to show the base registers in diagrams
of lists, and the example shown would normally appear
as in Fig. 7.
Atoms
Basic symbols, such as A, B, C, ..., or A1, A2, A3, ...,
are referred to as atoms. In Fig. 7, CAR A is atomic,
whereas CAR C is non-atomic.
The following relationships hold between the lists in
Fig. 6 or Fig. 7:
C = B, D = CAR B, E = CDR D.
Statements of this type may also be regarded as commands in a program. For example, the statement
C = B implies that C is to become an alternative name
for the list B; its programming significance is to copy
the content of the base register corresponding to B
into the base register corresponding to C. Similarly,
E = CDR D associates the name E with the list whose
first member is the second member of the list D. If we
start with the configuration of Fig. 7 and execute the
following statements
R = CDR C
R = CDR R
CAR R = A
we arrive at the configuration shown in Fig. 8, in which
the list A is now a sub-list of B.
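
The effect of such statements can be imitated in Python. The sketch below is an illustration only: the figures are not reproduced here, so the particular lists (with lower-case symbols p, q, u, v, w, x) are a hypothetical configuration chosen merely to satisfy the relationships stated above, and the names Register, make_list, to_python and base are assumptions. Base registers are modelled as a dictionary from names to first registers.

```python
class Register:
    """One list register: CAR holds a symbol or a sub-list, CDR the next register."""

    def __init__(self, car, cdr=None):
        self.car = car
        self.cdr = cdr          # None plays the part of the end marker


def make_list(*items):
    """Chain the items into a list and return its first register."""
    head = None
    for item in reversed(items):
        head = Register(item, head)
    return head


def to_python(reg):
    """Return the list starting at `reg` as nested Python lists, for display."""
    out = []
    while reg is not None:
        out.append(to_python(reg.car) if isinstance(reg.car, Register) else reg.car)
        reg = reg.cdr
    return out


base = {}                                              # the base registers
base["A"] = make_list("p", "q")                        # CAR A is atomic
base["B"] = make_list(make_list("u", "v"), "w", "x")   # CAR B points to a sub-list
base["C"] = base["B"]                                  # C = B
base["D"] = base["B"].car                              # D = CAR B
base["E"] = base["D"].cdr                              # E = CDR D

# The three statements from the text.
base["R"] = base["C"].cdr                              # R = CDR C
base["R"] = base["R"].cdr                              # R = CDR R
base["R"].car = base["A"]                              # CAR R = A

print(to_python(base["B"]))                            # [['u', 'v'], 'w', ['p', 'q']]
```

Since C = B copies only the content of a base register, C and B name the same registers, and the change made through R is visible under both names.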
Fig. 7. As Fig. 6 with base registers not shown
Fig. 8. The result of performing the following operations on the list structures shown in Fig. 7: R = CDR C, R = CDR R, CAR R = A
It will be noted that symbols drawn from the same
alphabet have been used both as the symbols being
manipulated and as the names of lists. One meets
similar situations in other formal systems, and it is
necessary to distinguish between cases in which symbols
are the names of other entities and cases in which they
stand for nothing but themselves. For example, in
ordinary mathematics, we might have
$$z = y^2 + 1 \quad \text{where } y = x,$$
$$\frac{dz}{dx} = 2x, \qquad \frac{\partial z}{\partial x} = 0.$$
In the case of the ordinary derivative, y is regarded as
standing for x whereas, in the case of the partial derivative, y is regarded as standing for itself. It might be
thought that in list processing the difficulty could be
avoided by having two sets of symbols, and using one
for the names of lists and the other for the symbols
being manipulated. It would soon be found, however,
that this would not work, and that situations would
arise when a symbol used as the name of a list had to be
referred to in its own right. In what follows, symbols
will be enclosed in quotation marks when they are to
be regarded as standing for themselves.
Uses of list processing

List processing may be applied to any problem in which symbol manipulation is involved. An
obvious example is the writing of a compiler. Here a stream of symbols representing the program
in source language must be accepted and processed so as to yield another stream of symbols
representing the same program in target language. An example of how this is done may be found
in Wilkes (1964), in which a simple list-processing language, very similar to the one used
here, is implemented in terms of a compiler written in itself. The same language has been
applied to a layout problem encountered in connection with the design of deposited wiring for
high-speed computing circuits (Wiseman, 1964).

Studies in artificial intelligence call for powerful list-processing techniques, and involve
such operations as the placing of items on lists, the searching of lists for items according to
specified keys, and so on. Languages in which such operations can be easily specified are
indicated. The pioneer language in this regard was IPL developed by Newell, Simon and Shaw
(see Newell, 1961).

Recursion

In symbol manipulation much use is made of recursive subroutines. A recursive subroutine is one
that can use itself, and for this to be possible two things are necessary. One is that, each
time the subroutine is called in, it should make use of a different part of the memory for
working space; the second is that the subroutine should communicate with the rest of the
program through one or more stacks or their equivalent. A stack, or a push-down list as it is
sometimes called, works on the last in, first out, principle. One can place a new item on the
top of the stack, or one can take off the item that happens at that moment to be at the top of
the stack.

It is possible to use a single stack to control a recursive subroutine. When the subroutine is
called in, a link is first placed on the stack; this gives the point in the program to which
control is to be returned when the subroutine has done its work. Next, the arguments are placed
on the stack in order, and control is sent to the subroutine. During operation of the
subroutine, the arguments and link are removed from the stack, and the results are placed
there. Control is then returned to the calling-in program which takes the results from the
stack. The state of the stack is now the same as it was before the subroutine was called in.
Such a subroutine may call itself in during the course of the calculation, and each time it
does so extra arguments and links get piled on the stack; if the subroutine has been correctly
constructed, however, everything works out properly, the stack always being found to contain
the right information at the right moment. Although one stack is sufficient, it is frequently
convenient to use several; in particular, a separate stack is often used to contain the links.

A stack can be conveniently and efficiently constructed by making use of a sequence of
consecutive registers in the memory. If desired, however, a list of the list-processing kind
can be pressed into service. For this purpose two operations are required. One, PUSH DOWN A,
takes a register from the beginning of the free list and inserts it at the beginning of list A,
and the other, POP UP A, performs the reverse operation. Thus, by writing PUSH DOWN A,
CAR A = "X", one can put the symbol "X" on the stack formed by the list A. Similarly, by
writing CAR B = CAR A, POP UP A, one can remove the symbol from the stack and place it in
CAR B. The operation PUSH DOWN A can be expressed in terms of elementary list-processing
operations in the manner shown below. F is the name of the free list.

D = F
F = CDR F
CDR D = A
A = D.

Example: formal differentiation

An example frequently taken to illustrate symbol manipulation and the use of recursive
subroutines is that of formal differentiation. A program is given below for differentiating an
algebraic expression stored in the computer as a Polish list, and it is hoped that this example
will help to make clear what has been said above. The operations allowed in the algebraic
expression are addition, subtraction, and multiplication; division, exponentiation, and
trigonometrical functions could easily have been included at the expense of making the program
longer. Differentiation is with respect to X. The result is left in a rather rough form. For
example, the result of differentiating X.Y is given as 1.Y + X.0. It is not difficult to write
an editing routine which will remove the unnecessary ones and zeros, and the interested reader
is referred to Wilkes (1964) where such a routine is given.

The program is based on the following rules. In the first place, there is the rule for
differentiating an expression consisting of a single symbol, namely, that the symbol should be
replaced by "1" if it is "X", and by "0" otherwise. Secondly, there are rules for
differentiating sums, differences, and products. These latter rules are always written
recursively, that is, the symbol for differentiation appears on both sides of the equation.
This recursive form is reflected in the differentiating routine which, at appropriate points,
contains instructions calling in itself.

To call in a routine it is necessary to put on list M, which is used as a stack, first, a list
which will receive the result and, second, the Polish list to be differentiated. The beginning
of the routine is given the label 10 and, after placing the appropriate quantities on the
stack, the subroutine may be called in by the instruction TO 10 AND BACK which causes an
appropriate link to be stored on a private stack inaccessible to the programmer.
RETURN causes control to be sent to the place indicated
by the link standing at the top of this stack, and the
stack to be popped up. Conditional statements are
written in a form reminiscent of ALGOL with, however,
round brackets instead of the words begin and end to
enclose compound statements. F is the free list. It is
hoped that with these few words of explanation the
routine will prove comprehensible.
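
For readers more at home in a present-day language, the rules and the calling convention described above can be sketched in Python. This is an illustration under stated assumptions, not a transcription of the routine: the names Register, Stack, polish, show, differentiate and derivative are hypothetical, Python's own allocator stands in for the free list F, and Python's call stack plays the part of the private stack of links. The product rule is written as u'.v + u.v' so as to agree with the example 1.Y + X.0 quoted earlier.

```python
class Register:
    """One list register with a CAR half and a CDR half (None marks the end)."""

    def __init__(self, car=None, cdr=None):
        self.car = car
        self.cdr = cdr


class Stack:
    """A list used as a push-down stack, in the manner of list M."""

    def __init__(self):
        self.top = None

    def push_down(self, value):
        # PUSH DOWN M, CAR M = value (a fresh Register replaces one taken from F)
        self.top = Register(value, self.top)

    def pop_up(self):
        # value = CAR M, POP UP M
        value, self.top = self.top.car, self.top.cdr
        return value


def polish(expr):
    """Build a Polish list from a nested tuple such as ('+', 'X', ('.', 'X', 'Y'))."""
    if isinstance(expr, str):
        return expr
    head = None
    for item in reversed(expr):
        head = Register(polish(item), head)
    return head


def show(value):
    """Convert a Polish list back to nested tuples for display."""
    if isinstance(value, str):
        return value
    out = []
    while value is not None:
        out.append(show(value.car))
        value = value.cdr
    return tuple(out)


M = Stack()


def differentiate():
    """Pop the argument and the destination from M; leave the derivative in CAR of the destination."""
    a = M.pop_up()                       # register whose CAR is the expression
    d = M.pop_up()                       # register whose CAR receives the result
    if isinstance(a.car, str):           # CAR A is an atom
        d.car = "1" if a.car == "X" else "0"
        return
    b1 = a.car                           # operator, then the two operands
    b2 = b1.cdr
    b3 = b2.cdr
    l3 = Register()
    l2 = Register(cdr=l3)
    l1 = Register(cdr=l2)
    d.car = l1                           # three-register list for the result
    if b1.car in ("+", "-"):
        l1.car = b1.car
        jobs = [(l2, b2), (l3, b3)]
    else:                                # product: d(u.v) = u'.v + u.v'
        l1.car = "+"
        r3 = Register(b3.car)            # v is shared, not copied
        r2 = Register(cdr=r3)
        l2.car = Register(".", r2)
        s3 = Register()
        s2 = Register(b2.car, s3)        # u is shared, not copied
        l3.car = Register(".", s2)
        jobs = [(r2, b2), (s3, b3)]
    for dest, arg in jobs:               # first the destination, then the argument
        M.push_down(dest)
        M.push_down(arg)
    for _ in jobs:
        differentiate()                  # corresponds to TO 10 AND BACK


def derivative(expr):
    arg, result = Register(polish(expr)), Register()
    M.push_down(result)
    M.push_down(arg)
    differentiate()
    return show(result.car)


print(derivative((".", "X", "Y")))       # ('+', ('.', '1', 'Y'), ('.', 'X', '0'))
```

Each call removes exactly the two items placed on M for it, so the stack is left as it was found, which is the property the text requires of a correctly constructed recursive subroutine.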
10  A = CAR M, POP UP M
    D = CAR M, POP UP M
    IF CAR A = ATOM THEN
        (IF CAR A = "X" THEN CAR D = "1" OTHERWISE
        CAR D = "0", RETURN)
    OTHERWISE
        (B1 = CAR A, B2 = CDR B1, B3 = CDR B2
        CAR D = F
        L1 = CAR D, L2 = CDR L1, L3 = CDR L2
        F = CDR L3, CDR L3 = "0")
    IF CAR B1 = "+" or "-" THEN
        (CAR L1 = CAR B1
        PUSH DOWN M, CAR M = L2
        PUSH DOWN M, CAR M = B2
        PUSH DOWN M, CAR M = L3
        PUSH DOWN M, CAR M = B3
        TO 10 AND BACK, TO 10 AND BACK
        RETURN) OTHERWISE
        (CAR L2 = F
        R1 = CAR L2, R2 = CDR R1, R3 = CDR R2
        F = CDR R3, CDR R3 = "0"
        CAR L3 = F
        S1 = CAR L3, S2 = CDR S1, S3 = CDR S2
        F = CDR S3, CDR S3 = "0"
        CAR L1 = "+", CAR R1 = ".", CAR S1 = "."
        CAR R2 = CAR B2, CAR S3 = CAR B3
        PUSH DOWN M, CAR M = S2
        PUSH DOWN M, CAR M = B2
        PUSH DOWN M, CAR M = R3
        PUSH DOWN M, CAR M = B3
        TO 10 AND BACK, TO 10 AND BACK
        RETURN)

Acknowledgement

This paper is based on an expository lecture given at the 1964 Meeting of the Association for
Computing Machinery, held in Philadelphia. The paper was originally published in the
Proceedings of that meeting, and I am grateful to the Association for permission to reprint.

References
MCCARTHY, J. et al. (1962). LISP 1.5 Programmer's Manual, M.I.T. Press.
NEWELL, ALLEN (Ed.) (1961). Information Processing Language-V Manual, The RAND Corp.
WILKES, M. V. (1964). "An experiment with a self-compiling compiler for a simple list-processing language," Annual Review of
Automatic Programming, Vol. 4, Pergamon Press, Oxford.
WISEMAN, N. E. (1964). "Application of list-processing methods to the design of interconnections for a fast logic system," The
Computer Journal, Vol. 6, p. 321.
Correspondence
The Editor,
The Computer Journal,

Sir,
May I make two comments on the article "The ISO character code" by H. McG. Ross, in the
October Journal? Firstly, I am sorry to see that Backspace will be used "to prepare composite
symbols and for underlining as in ALGOL." Although this use of backspace is common, it is far
inferior to the use of a non-escaping underline, as anyone who has experienced both systems
will testify. The labour of punching ALGOL programs is greatly reduced by the provision of a
non-escaping key with vertical bar and underline, as on the MC-ALGOL Flexowriter used at the
Mathematical Centre, Amsterdam.

Secondly, I would propose the perhaps heretical view that standardization of character codes
is not as important as is sometimes made out. Code translation is an easy process for a
computer to carry out, and the computing system should be designed to deal with any code,
translating by software or a software/hardware combination to or from a common internal code.
An essential corollary to such a system is the acceptance of the printed record as
authoritative: a particular representation on tape or cards may be imposed by the limitations
of peripheral equipment or transmission systems, but this is not relevant to the user, and he
should not have to bother about it. The user should be able to say "print a letter A" or "tell
me what the next character of the input stream is," leaving it to the system to sort out the
details of the physical representation. Such facilities can be provided by a sophisticated
software system, and they can be provided now, with existing peripheral machines, not at some
undetermined date in the future when we have all been standardized.

Yours etc.,
D. W. BARRON.
The University Mathematical Laboratory,
Corn Exchange Street,
Cambridge.
9 November 1964.