IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. SE-5, NO. 4, JULY 1979
Symbolic Evaluation and the Analysis of Programs
THOMAS E. CHEATHAM, JR., GLENN H. HOLLOWAY, AND JUDY A. TOWNLEY
Abstract-Symbolic evaluation is a form of static program analysis in
which symbolic expressions are used to denote the values of program
variables and computations. It does not require the user to specify
which path at a conditional branch to follow nor how many cycles of
a loop to consider. Instead, a symbolic evaluator uses conditional
expressions to represent the uncertainty that arises from branching
and develops and attempts to solve recurrence relations that describe
the behavior of loop variables.
We describe a symbolic evaluator for part of the EL1 language [23],
with particular emphasis on techniques for handling conditional data
sharing patterns, the behavior of array variables, and the behavior of
variables in loops and during procedure calls. An expression simplifier,
which is the heart of the system, is described in some detail. Potential
applications of the symbolic evaluator to problems in program validation,
verification, and optimization are mentioned.
Index Terms-Automatic program analysis, expression simplification,
first-order recurrence relations, program optimization, program verification, symbolic evaluation.
I. INTRODUCTION
THE productivity of programmers and the quality of their
products depends significantly on their programming
tools. Good tools for program validation and optimization
depend, in turn, on good mechanical analysis of programs. We
have developed techniques for program analysis using symbolic
evaluation and have implemented them in a system that derives
a precise description of the values computed and side effects
created by a given program. The resulting analysis can serve
as the basis for tools supporting several aspects of the programming process.
Smart programming aids, that can help find bugs, that can
detect and tune regions critical to performance, that can insure
assumed constraints on data are not violated, and that can help
tailor a generic program to suit a specific application, are
regrettably rare. Detecting possible bugs such as division by
zero, dereferencing of a null pointer, or use of an array index
out of range, depends upon knowledge of the values of variables. Most compilers make little use of facts available in
the context of expressions. For instance, an expression
MAX(X, Y) may appear in a context in which the relation
between X and Y can be determined from path information,
as in¹
X LT Y -> MAX(X, Y)
Many compilers would compute the MAX by inserting an
unnecessary comparison, rather than replacing it by Y.
Likewise, the liberal use of procedures and functions as
structuring devices often degrades performance more than
would be expected as a result of call and return overhead.
Few compilers make a serious effort to assess the possible
effects of procedures on the call environment or to analyze
function values in terms of their inputs.
The user of a program verifier usually must provide inductive
assertions for all loops, even when they appear obvious
or redundantly express what the program has computed. This
is itself an error-prone and tedious activity. For instance,
following a loop such as
FOR I TO LENGTH(INDICES)
REPEAT INDICES[I] <- I END
the programmer would like the verifier to assume that
∀j, 1 ≤ j ≤ LENGTH(INDICES), INDICES[j] = j
In fact, the programmer must usually explicitly propose this
as an assertion to be proved, and provide an inductive assertion
for the loop in order for the verifier to carry out the proof.
Likewise, little assistance is now provided in performing
the specialization of a general-purpose program or in checking
the results. The tailoring may require simple deductions based
on the semantics of the programming language and simple
facts about the data structures employed, some of which could
be deduced mechanically. A specialization facility, even one
with limited initiative, heavily dependent on human guidance,
would be an improvement over hand manipulation of programs:
the tedious bookkeeping tasks could be handled for
the programmer, and semantic checks on the validity of transformations
would help insure that the functionality of the
original algorithm is preserved.
Of course, developing smarter programming aids is not
simple. In general, for practical programming languages,
Manuscript received May 15, 1978; revised March 27, 1979. This
work was supported in part by the Advanced Research Projects Agency
under Contract N00039-78-G-0020.
The authors are with the Center for Research in Computing Technology, Harvard University, Cambridge, MA 02138.
¹Program examples in this paper are written in EL1 [23]. Unusual features of the language will be explained in footnotes as they arise. The
expression P -> E is a conditional form equivalent to
if P then E
in Algol.
0098-5589/79/0700-0402$00.75 © 1979 IEEE
CHEATHAM et al.: SYMBOLIC EVALUATION
most of the interesting analytical questions are either undecidable or costly to answer. Some language designers [21]
suggest that we restrict our languages severely in order to
minimize the potential ambiguities and thereby improve
compilation and verification. Such restrictions can be clumsy
to implement, however, since they are not always enforceable
during compilation. This direction is inherently limited in
any case, since only so many restrictions can be tolerated
before the language becomes unusable.
We have chosen instead to try to develop better program
analysis techniques, and to provide the results in a form that
can be shared by code generators and verifiers, as well as by
other tools not yet commonly available, such as static error
checkers, source-to-source optimizers, performance analyzers,
and protection mechanisms. We use the phrase symbolic
evaluation to describe this program analysis.
We have developed a symbolic evaluator for EL1, the base
language of the ECL programming system [23]. The output
of the EL1 symbolic evaluator is a program database. The
database includes an annotated version of each original procedure, in which all implicit computations (e.g., data type
conversion) are made explicit and each expression is tagged
with the mode and value of its result. Values are described by
symbolic expressions. Patterns of potential sharing among
storage locations are recorded, and the history of each location
is represented by the set of values that have been assigned to
it, each tagged with an indication of the context in which its
assignment appears. The various contexts are related by a
context graph, which encodes the flow paths through the
program. A detailed description of the program database
is given in Sections II and III.
Our approach, though related to earlier work on symbolic
execution [1], [5], [15], [20] and logical analysis of programs [18], [27], [28], differs from it in three important
respects.
1) Most other work has focused on simpler languages.
EL1 has compound data structures and pointers, recursive
procedures, a powerful data type abstraction facility, and
several ways to exploit storage sharing (through reference
parameters, through pointers, and through function calls,
blocks and other expressions that may have locative results²).
Although EL1 does not permit unrestricted go-to's, to handle
languages that do would only require the incorporation of
interval analysis techniques to untangle the control structure
of such programs.
2) We attempt to analyze the behavior of loops. For each
variable x, whose value varies within a loop, we develop a
recurrence relation describing the value of x and then attempt
to solve the relation, deriving a solution for x as a function,
x(k), of a symbolic value k, which represents the cycle index
of the loop. We also obtain a symbolic expression for the
number of cycles taken by a loop. Most symbolic execution
systems either do not treat loops at all or limit the analysis
to a fixed number of cycles. Others have mentioned, in conjunction with generating loop invariants [10], [18], [28],
the idea of using recurrence relations to express the behavior
of loop variables. We have, however, developed techniques
for solving recurrence relations that arise frequently, including
those describing array values.
²In EL1, a BEGIN-block or a function call yields not just a value, but a
location containing a value. Such a locative expression may usefully appear as the target of an assignment, or as the binding of a SHARED variable.
Subsequent assignments to such a variable will affect the location computed by its initializing expression.
3) We are concerned with analyzing procedures in such a
way that the analysis can be used efficiently to assess the
effects of a procedure call. Given a procedure P, we develop
descriptions of P's result and its side effects, expressed as
functions of its input parameters and the initial values of any
free variables it uses. Then to analyze a call on P we need
only substitute the actual (symbolic) inputs into the functions
and reduce them. Most symbolic execution systems either
depend upon the in-line substitution of procedures or do not
handle user-defined procedures at all.
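As a small illustration of item 2), consider the simplest kind of first-order recurrence that a loop variable can satisfy, x(k+1) = a*x(k) + b. The sketch below is written in Python rather than EL1. The names solve_affine and iterate, and the restriction to integer coefficients, are assumptions made for this example; it is not the solver used in the system, only one instance of replacing a recurrence by a closed form in the cycle index k.

```python
def solve_affine(a: int, b: int, x0: int):
    """Return a closed form x(k) for x(k+1) = a*x(k) + b with x(0) = x0."""
    if a == 1:
        # Pure accumulation: x(k) = x0 + k*b
        return lambda k: x0 + k * b
    # Geometric case: x(k) = a**k * x0 + b * (a**k - 1) / (a - 1).
    # (a**k - 1) is always a multiple of (a - 1), so // is exact here.
    return lambda k: a**k * x0 + b * ((a**k - 1) // (a - 1))


def iterate(a: int, b: int, x0: int, k: int) -> int:
    """Reference semantics: execute the loop body k times."""
    x = x0
    for _ in range(k):
        x = a * x + b
    return x


# A loop body such as "I <- I + 2", entered with I = 1, gives I(k) = 1 + 2*k.
i_of = solve_affine(1, 2, 1)
print([i_of(k) for k in range(4)])  # [1, 3, 5, 7]
```

Comparing solve_affine against iterate for a few cycle counts is a cheap way to check a proposed closed form before relying on it.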
The value of our approach is based on the fact that most
computations in most programs are not pathologically complicated. Even when recurrence relations arise that we cannot
solve, the expressions developed by the symbolic evaluator
are useful representations of a program's behavior.
II. THE SYMBOLIC INTERPRETER
The controlling component of the symbolic evaluator is the
symbolic interpreter. Its purpose is to scan an EL1 program
and build a program database. It uses a simplifier to reduce
and normalize symbolic expressions and calls on other components to analyze the effects of loops and procedure calls.
The symbolic interpreter makes only one pass over the input
text; whatever repetitive processing may be required to analyze
loops and procedures is performed by those components.
A. The Program Database
The program database has three basic parts: the context
graph, the environment (or ENV), and the shadow. The context graph captures the flow of control and records the symbolic predicates that give rise to branching. It serves as a frame
of reference, or coordinate system, with respect to which the
program's behavior is described. The ENV describes the variables of the program and the sharing among them. The
shadow gives the detailed meaning of each expression in the
input program.
1) The Context Graph. A context is a basic block of the
program, a sequence of computations with no control branching in or out. If there is a flow path through the program in
which control can pass directly from context P to c, then
we say that P is a possible predecessor of c. We make an
exception, however, for repetitive control structures. If P
is the last context in a loop and c is the first, then by convention P is not explicitly represented as a possible predecessor
of c. The loop analyzer instead encodes the effects of the
flow.
For each input procedure, the symbolic interpreter builds
a context graph in which nodes represent contexts and edges
represent possible predecessor relationships. Because of the
definition of a possible predecessor, the graph is directed and
acyclic. Attached to each node resulting from a control
branch is the symbolic value of the condition that diverts
control into the corresponding context. Every context is
given a unique context number, chosen so that P's number
is less than c's if P is a possible predecessor of c.
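The numbering requirement just stated (every possible predecessor receives a smaller context number) is satisfied by any topological order of the acyclic context graph. The following Python sketch shows one way to obtain such a numbering; the function name number_contexts and the four-node example graph are hypothetical, and the paper does not say that its interpreter computes the numbers this way.

```python
from graphlib import TopologicalSorter


def number_contexts(preds):
    """Give each context a number larger than those of its possible predecessors.

    `preds` maps a context name to the set of its possible predecessors.
    """
    order = TopologicalSorter(preds).static_order()  # predecessors come first
    return {ctx: n for n, ctx in enumerate(order, start=1)}


# Hypothetical graph: a two-way branch whose arms rejoin.
preds = {
    "entry": set(),
    "then": {"entry"},
    "else": {"entry"},
    "join": {"then", "else"},
}
numbers = number_contexts(preds)
print(numbers)
```

Because loop back edges are excluded from the graph by convention, the graph has no cycles and a topological order always exists.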
To distinguish among points within a single context, a
second coordinate called a time number is used. The symbolic
interpreter increments its time counter whenever there is a
change in the environment through assignment or variable
declaration. Each variable value in the program database is
given a context tag consisting of the context number and the
time number of the point at which it takes effect. When the
symbolic value of a variable v at some particular point P is
needed, the context tag of P is used to select the correct
value from the list of all values that have been assigned to v
in the program. (The value-finding algorithm is described in
Section II-B and Appendix A.)
2) The Environment (ENV). The ENV associates with every
variable name in the program a set of locations, each marked
with a scope label and time number. A location is a data item
used by the symbolic interpreter to describe the properties
and behavior of a storage area, such as that used by a variable.
A scope label is a reference to a node in a tree that encodes the
nested block structure of the program. The location corresponding to an identifier in a given scope and time number is
found by searching for the one with a) the scope label that is
equal to, or the most recent ancestor of, the given scope label
and b) the most recent time number in that scope not exceeding the given time number. A location provides the declared
mode of the corresponding variable, its name, and, if it is
compound, its declared dimensions. Any of these quantities
may be expressed symbolically. If the variable is possibly
shared with other variables or with unnamed locative expressions, its location will include sharing links to those other
locations. Each sharing link is given a context tag to identify
the conditions under which sharing takes place. For example,
if X is an integer (INT) variable and Y is declared by³
DECL Y:INT SHARED [) P => X; 2 (]
then Y will be shared with X if condition P is TRUE and will
be an independent variable initialized to 2 if P is FALSE. The
sharing links between Y's location and X's will hold the
context tag obtained from the first block exit. A sharing link
may also be tagged with a list of symbolic component indices,
called a selection vector. The selection vector is used to
indicate that one location is shared with a component of
another. For this reason sharing links are directional.
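The ENV lookup rule given earlier in this subsection (nearest enclosing scope first, then the latest time number not exceeding the query) can be sketched in Python as follows. The classes Scope, Location, and Env are hypothetical names introduced for illustration, and the sketch omits modes, dimensions, and sharing links; it is one reading of the rule, not the system's actual data structure.

```python
from dataclasses import dataclass, field


@dataclass
class Scope:
    name: str
    parent: "Scope | None" = None  # enclosing block, None for the outermost


@dataclass
class Location:
    var: str
    scope: Scope
    time: int  # time number at which the declaration takes effect


@dataclass
class Env:
    locations: list = field(default_factory=list)

    def declare(self, var, scope, time):
        loc = Location(var, scope, time)
        self.locations.append(loc)
        return loc

    def lookup(self, var, scope, time):
        """Find the location of `var` visible from `scope` at `time`."""
        s = scope
        while s is not None:  # a) the scope itself, then nearest ancestors
            visible = [l for l in self.locations
                       if l.var == var and l.scope is s and l.time <= time]
            if visible:       # b) latest declaration not after `time`
                return max(visible, key=lambda l: l.time)
            s = s.parent
        return None
```

For example, with X declared in an outer block at time 1 and redeclared in an inner block at time 4, a query from the inner block at time 3 resolves to the outer location, while a query at time 5 resolves to the inner one.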
Also accessible from a location is a list of the values assigned
to the location at various points in the program. Each member
of a value list is called a value cell and includes a context tag
for the point of assignment. These tags are used to determine
which value is valid for a given location at any program point.
Values are normalized symbolic expressions (see Section III).
The value of a component of a compound object is recorded
as part of the value of the entire object rather than in a separate
ENV location. For instance, after the assignment
V[I] <- E
the symbolic value of array variable V becomes
store(v, <i>, e)
where v, i, and e stand for the previous symbolic values of V,
I, and E, respectively. (The "store" symbolic expression is
described in Section III.)
³In EL1, [) ... (] is shorthand for BEGIN ... END, and p => e means "exit
the enclosing block, loop, or case expression with value e when condition
p holds." There can be any number of these exit-conditional statements
within a block.
3) The Shadow. The shadow is a computation tree representation of the input program in which implicit computations
are made explicit and the attributes of each expression are
linked to its tree node. The implicit computations explicated
include type conversion (which may be user-defined), automatic dereferencing of pointers in certain contexts, hidden
manipulations of loop variables, and the like. The information
attached to each shadow expression depends on the nature of
the expression. The attributes of a simple locative expression,
such as a variable or the component of a variable, include the
location computed (ENV entry), the mode of the result, and a
selection vector if the expression is a selection. The mode
appears in the shadow because, for variables declared with
a generic mode,⁴ it may be possible to give a more precise
mode than that recorded in the ENV.
For multiple exit expressions, such as BEGIN-blocks, the
shadow information includes a result list. Each entry on the
result list refers to a possible exit value of the block and
includes a context tag and the enabling predicate for the
corresponding block exit. Iteration forms and procedure
calls have shadow attributes that aid in determining values
for loop and procedure variables and procedure results. These
are discussed in Sections IV and V.
B. Finding Values for Variables
As it processes declarations and assignments, the symbolic
interpreter records enough information to permit the values
of variables to be retrieved correctly. However it does not
attempt to keep the value at every ENV location up to date
at every moment during analysis. The reason for this is
efficiency: we want to defer computing the value of an expression until we know we need it. The effect of an assignment
to a variable, say Y, is recorded in Y's location only and not
immediately reflected in any other location for a variable,
say X, with which it may be shared. Later, if the value of X
is needed at some program point P, we consider both the
values associated with X's location and those of Y and any
other potentially shared location. From this set of values,
one or more are selected as being the most recent ones that
could be valid at P. If there is more than one possible value
for X, the symbolic interpreter creates a conditional expression
associating each value with a predicate describing the
distinguishing circumstances under which it is the value of
X. For a value obtained indirectly through an assignment
to a shared variable Y, this predicate will reflect both the
conditions for sharing and the control predicates that enable
Y's most recent assignment. We describe the method for
determining values in more detail in Appendix A and provide
an example of its operation.
⁴A generic mode in EL1 represents a set of mode alternatives. For example, a procedure parameter may have the declared type ONEOF(INT, REAL),
meaning that either an integer or a real value will be bound on procedure
entry. Once binding has occurred, however, the mode of the parameter is
fixed for the lifetime of the procedure activation. Analogous rules hold for
local variables.
C. An Example of Symbolic Interpretation
As a simple example of symbolic interpretation, consider the
following program segment, in which ellipses represent parts
that have no effect on the variables whose declarations are
shown:
DECL X:REAL BYVAL INPUT(REAL);                 <1,1>
DECL Y:REAL BYVAL INPUT(REAL);                 <1,2>
BEGIN                                          <1,3>
   DECL LEFT:REAL SHARED
      BEGIN
         X LE Y => X;                          <2,3>
         Y;                                    <3,3>
      END;
   DECL RIGHT:REAL SHARED
      BEGIN
         X LE Y => Y;                          <5,4>
         X;                                    <6,4>
      END;                                     <7,5>
   /* 'Shift origin to leftmost point';
   RIGHT <- RIGHT - LEFT;                      <7,6>
   LEFT <- 0;                                  <7,7>
   SQRT(X + Y)                                 <7,7>
END;
Fig. 1. Example context graph. (Branch labels: x0 ≤ y0 and x0 > y0.)
The pairs of integers <c,t> are not part of the program.
They are the values of the context number and time number
representing the point of control immediately following the
statements with which they are associated. (Omitted portions
of the program have been ignored in assigning context and
time numbers.) The context graph for this program segment
appears in Fig. 1.
The declarations of X and Y cause locations to be added to
the ENV for these variables. Each is initialized with distinct
symbolic values representing the results of the two INPUT
expressions. Call these symbolic values x0 and y0.
Interpretation of the conditional expression in LEFT's
declaration causes two nodes, labeled 2 and 3, to be added
to the context graph. Each has node 1 as its predecessor. To
each node, the branching condition that gives rise to entry of
the corresponding context is attached: x0 ≤ y0 to node 2
and x0 > y0 to node 3. The shadow entry for the block
[) X LE Y => X; Y (] is also created, with a result list of two
entries: the location for X, with its context tag <2,3>, and
that for Y, tagged <3,3>. Had this expression appeared in a
value-requiring setting, as an argument to an arithmetic operator,
perhaps, the symbolic interpreter would use the result
list to obtain a symbolic value, in general a conditional expression with predicates extracted from the context graph. The
variable LEFT, however, is declared SHARED, so the interpreter
instead uses the result list to record sharing links between the
new location it adds to the ENV for LEFT and the existing
locations with which it may be shared. Each such sharing
link is labeled by the corresponding context tag taken from
the result list.
Fig. 2. ENV after declarations of X, Y, LEFT, and RIGHT.
The declaration of RIGHT is next processed similarly. At
this point, the ENV locations for X, Y, LEFT and RIGHT appear as shown in Fig. 2. The circles represent location entries;
the rectangles, the values of the locations to which they are
attached. In general, a string of several symbolic values is
associated with a location, each labeled with the context tag
for the program point at which it is bound to the location.
The arcs between locations are sharing links, labeled with
context tags identifying the control paths that bring about
the sharing relationships. Note that LEFT and RIGHT have
no initial value cells of their own; each is defined entirely
by its sharing pattern. Value cells are added only when a
value is actually needed or as a result of direct assignment.
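To make the sharing set up by these two declarations concrete, the Python sketch below runs the program segment on actual numbers instead of symbolic values. It is not a symbolic evaluator: the class Cell and the function run_segment are hypothetical names, and binding a Python name to an existing Cell is only an assumed stand-in for an EL1 SHARED declaration.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """A storage location; a SHARED variable is bound to an existing Cell."""
    value: float


def run_segment(x0: float, y0: float) -> float:
    """Execute the example segment and return the argument passed to SQRT."""
    x, y = Cell(x0), Cell(y0)
    # DECL LEFT:REAL SHARED BEGIN X LE Y => X; Y END
    left = x if x.value <= y.value else y
    # DECL RIGHT:REAL SHARED BEGIN X LE Y => Y; X END
    right = y if x.value <= y.value else x
    right.value = right.value - left.value  # RIGHT <- RIGHT - LEFT
    left.value = 0.0                         # LEFT <- 0
    return x.value + y.value                 # X + Y, as seen through the aliases


print(run_segment(1.0, 3.0), run_segment(5.0, 2.0))  # 2.0 3.0
```

On each sample input the result equals the distance between the two inputs, so the argument to SQRT is never negative in these runs; a symbolic analysis aims to establish the same fact for all inputs at once.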
The assignment RIGHT <- RIGHT - LEFT, for instance,
causes the generation of explicit values for RIGHT and LEFT,
and then causes a second value, representing the difference, to
be added to the value list for RIGHT. To obtain a value for
RIGHT in context 7, the interpreter considers all paths in the
context graph from that context to the entry of the program.
Each path passes either through context 5, in which case RIGHT
shares with Y, whose most recent value along that path is y0,
or through context 6, in which case RIGHT shares with X,
whose initial value is x0. The condition distinguishing between
contexts 5 and 6 is represented by the expression x0 ≤ y0,
so the value obtained for RIGHT is cond(x0 ≤ y0, y0, x0).⁵
At the same point LEFT evaluates to cond(x0 ≤ y0, x0, y0).
The difference after simplification is thus cond(x0 ≤ y0,
y0 - x0, x0 - y0). This value is attached to the shadow
entry for the program expression RIGHT - LEFT, and is added
to the value list for RIGHT when the assignment is interpreted.
After the subsequent assignment to LEFT, the ENV entries for
RIGHT and LEFT are as shown in Fig. 3. No effects on X and
Y are recorded in their ENV entries as a result of assignments
to RIGHT and LEFT, however. The assessment of side effects
on shared variables is deferred until their values are needed,
in this case at the call of SQRT.
Fig. 3. ENV after assignments to LEFT and RIGHT. (Value cells for LEFT: <7,5> cond(x0 ≤ y0, x0, y0) and <7,7> 0. Value cells for RIGHT: cond(x0 ≤ y0, y0, x0) and <7,6> cond(x0 ≤ y0, y0 - x0, x0 - y0).)
Evaluation of the SQRT expression forces the symbolic
interpreter to update the values for X and Y. Viewed syntactically, there would appear to be paths on which X is
shared with both LEFT and RIGHT and others on which it is
shared with neither. Using the branch conditions in the context graph, however, the evaluator is able to construct the
precise sharing possibilities. Combining each condition with
the latest value for a shared location along the corresponding
path, we compute the value of X as
cond(x0 > y0,
     cond(x0 ≤ y0, y0 - x0, x0 - y0),        from RIGHT
     cond(x0 ≤ y0 and x0 ≤ y0,
          0,                                 from LEFT
          x0))                               from X
which simplifies to
cond(x0 > y0, x0 - y0, 0).
Similarly, the value of Y is cond(x0 ≤ y0, y0 - x0, 0). Thus
the value of the argument to SQRT is
cond(x0 > y0, x0 - y0, 0) + cond(x0 ≤ y0, y0 - x0, 0)
which simplifies to
cond(x0 > y0, x0 - y0, y0 - x0).
Thus, if there had been an input assertion associated with
SQRT, requiring that the argument be nonnegative, we would
know directly from the symbolic value of the argument that
this call is valid.
⁵The expression cond(p, t, f) represents a conditional value equal to t
when the Boolean predicate p is TRUE and to f when p is FALSE.
III. SYMBOLIC EXPRESSIONS AND THEIR SIMPLIFICATION
Symbolic expressions called "SEXPR's" are an algebraic notation used by the symbolic evaluator to represent and reason
about program values. The normal form for SEXPR's and the
design of the simplifier, which reduces and normalizes them,
are similar in many respects to those of [9], [19]. Several
factors influenced the choice of SEXPR representation. 1) The
evaluator frequently tests pairs of symbolic values for equality.
Also, equality relationships are among the strongest facts for
the simplifier. Therefore two expressions having the same
value should have the same representation if possible, and
moreover it should be very efficient to test for identical
expressions. 2) The representation should be as compact
as possible, since a large portion of the program database
consists of symbolic values. 3) For ease of manipulation,
the meaning of symbolic expressions should not depend on
contextual information or on internal side effects. 4) The
symbolic evaluator often needs to describe computations,
such as a sum of arithmetic terms, or structures, such as a
homogeneous array of values, whose form is regular but
whose extent is either large or unknown. Notations involving
quantification are useful for expressing such values in closed,
manipulable form. 5) Finally, there must be some general,
recursive means of expressing values which cannot be put in
closed form.
A. Representation of Symbolic Expressions
Symbolic expressions are composed of manifest constants,
value tokens, and operators called SEXPR functions. Manifest
constants denote known fixed values, such as integers, real
numbers, Booleans, strings, and so on. Value tokens denote
specific values which are for some reason indeterminate. A
value token may represent the result of an input operation,
for example, or it may stand for a generic member of a set
of values. SEXPR functions include the usual arithmetic,
relational, and logical operators, as well as others which we
will describe below. Symbolic expressions are composed by
applying SEXPR functions to constants, value tokens, and
other applications of SEXPR functions. We use the phrase
symbolic value as a shorthand for "the symbolic expression
that is the value" of some program expression.
In order to give equal values equal representations where
feasible, symbolic expressions are put in a normal form.
Algebraic properties of the functions, such as associativity,
commutativity, distributivity, and transitivity, are used to
eliminate syntactic variants of semantically equivalent forms.
Nested binary expressions are linearized, and like terms are
combined. Arguments in the resulting n-ary expressions are
in a standard order: constant operands are first, then simple
value tokens, and then function applications. Each value
token and SEXPR function is given a unique order number by
the simplifier; these numbers are used to order the tokens and
applications of different functions. Two applications of the
same SEXPR function are ordered according to the first pair
of corresponding operands that have different order, with
empty arguments taking precedence over nonempty ones.
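The normalization steps just described (linearizing nested sums, combining like terms, and placing constants before value tokens before function applications) can be sketched for the plus operator alone. In this Python sketch, expressions are integers, strings (value tokens), or tuples headed by an operator name. The functions rank and plus are hypothetical, and lexical ordering is used as an assumed stand-in for the order numbers that the simplifier assigns.

```python
from collections import Counter


def rank(e):
    """Sort key: constants first, then value tokens, then applications."""
    if isinstance(e, int):
        return (0, str(e))
    if isinstance(e, str):
        return (1, e)
    return (2, repr(e))


def plus(*args):
    """Build a normalized n-ary sum from possibly nested arguments."""
    const, terms = 0, Counter()
    stack = list(args)
    while stack:
        a = stack.pop()
        if isinstance(a, tuple) and a[0] == "plus":
            stack.extend(a[1:])   # linearize nested sums
        elif isinstance(a, int):
            const += a            # fold constants
        else:
            terms[a] += 1         # count like terms
    out = [t if n == 1 else ("times", n, t) for t, n in terms.items()]
    out.sort(key=rank)
    if const != 0:
        out.insert(0, const)
    if not out:
        return 0
    return out[0] if len(out) == 1 else ("plus", *out)


print(plus("y", ("plus", 2, "x"), 3, "x"))  # ('plus', 5, 'y', ('times', 2, 'x'))
```

Two syntactic variants of the same sum, such as plus("x", "y") and plus("y", "x"), come out as the same tuple, which is the property the normal form is meant to provide.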
The simplifier insures that every normalized symbolic
expression is uniquely represented in the database. That is,
if two SEXPR's appear identical when printed, their internal
representations actually share the same list structure. Thus,
the expressions plus(47, times(X, Y)) and le(times(X, Y),
10) have times(X, Y) as a shared sublist. Besides contributing
to storage compactness, this scheme allows the test for syntactically identical SEXPR's to be implemented as a simple
pointer comparison.
Each representative SEXPR is entered in a hash table. Thus,
the simplifier can quickly determine whether an expression is
already reduced and normalized just by looking it up in the
hash table. If so, it simply returns the given expression.
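The unique-representation scheme described above is what is now usually called hash-consing. A minimal Python sketch follows; the class SExpr, the table _TABLE, and the constructor mk are hypothetical names, atoms are assumed to be integers or strings, and the sketch does not reduce or normalize its arguments the way the real simplifier does.

```python
class SExpr:
    """An interned application node; construct only through mk()."""
    __slots__ = ("op", "args")

    def __init__(self, op, args):
        self.op = op
        self.args = args

    def __repr__(self):
        return f"{self.op}({', '.join(map(repr, self.args))})"


_TABLE = {}  # hash table holding the single representative of each expression


def _key(arg):
    # Sub-expressions are already interned, so their identity is a sufficient key.
    return ("node", id(arg)) if isinstance(arg, SExpr) else ("atom", arg)


def mk(op, *args):
    """Return the unique node for op(args), creating it on first use."""
    key = (op, tuple(_key(a) for a in args))
    node = _TABLE.get(key)
    if node is None:
        node = _TABLE[key] = SExpr(op, args)
    return node


a = mk("plus", 47, mk("times", "X", "Y"))
b = mk("le", mk("times", "X", "Y"), 10)
print(a.args[1] is b.args[0])  # True
```

Since every node is reachable from the table, the identities used as keys stay valid, and equality of two interned expressions reduces to the `is` test.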
The unique representation of SEXPR's also facilitates the
association of properties with SEXPR's. For example, since
the logical negation of a Boolean expression is a relatively
expensive operation that occurs frequently, the negation of
an expression is associated through a hash table with the
original expression when the negation is first computed; it
can then later simply be looked up. Similarly, with an arithmetic expression, we record its negative when we first compute it.
B. The Simplifier
The simplifier's overall strategy is to simplify the operands
of an expression and then call a specific simplification routine
for the given operator.
We describe the individual simplification routines in the
following subsections. Each subsection begins with a description of the formats of a class of SEXPR's, together with their
interpretations in conventional notation or in English. The
format of each kind of SEXPR is shown as an applicative
expression (an operator followed by a parenthesized list of
operands separated by commas) since this form corresponds
directly with the list structure representation used internally.
We use angle brackets <...> to denote operator-less lists.
For example, the SEXPR representing component selection
has the form sel(F, <i1, ..., in>). The second operand of
the "sel" function is a list of selection index expressions. In
discussing the simplification of SEXPR's, we will not always
adhere to the applicative notation, but will use more conventional
forms for readability. For instance, the component
selection just given would be written F[i1, ..., in].
1) Simple Arithmetic Functions
plus(T1, ..., Tn)           T1 + ... + Tn
times(F1, ..., Fn)          F1 * ... * Fn
divide(N, D)                N/D
expo(B, E)                  B**E
The simplifier reduces arithmetic expressions by performing
constant computations where possible, by combining like
terms and factors in sums and products, and eliminating common
factors from the numerator and denominator of each
quotient. Factor elimination is performed only when the
denominator has a single term; the normalized numerator is
scanned for factors common to both numerator and denominator
and these are dropped. To obtain a normal form, we
impose certain sign conventions. For example, we force the
denominator of a quotient to be positive. We also eliminate
negative factors from all but the first factor of a times(...)
expression, by replacing it with times(-1, ...) when necessary. Neither plus(...) nor times(...) has a plus application as
an argument; plus(..., plus(...), ...) is eliminated by absorbing the inner terms into the outer expression, and times(...,
plus(...), ...) is eliminated by symbolic multiplication.
2) Quantified Arithmetic Functions
finite\sum(j, L, U, t(j))       t(L) + t(L+1) + ... + t(U)
finite\product(j, L, U, f(j))   f(L) * f(L+1) * ... * f(U)
least(j, L, U, p(j))            the least j such that L ≤ j ≤ U and p(j) = TRUE, or
                                else U, if p(j) = FALSE for L ≤ j ≤ U
In each of these forms, j is a value token which acts as a
bound variable ranging between L and U. The operands t(j),
f(j), and p(j) represent expressions that, in general, contain
occurrences of j.
To reduce a finite\sum, the simplifier first reduces its
summand, t(j). If the result is a plus(...), the finite\sum is
distributed over t(j) and the terms are simplified individually.
If the difference U-L can be determined to be two or less, the
finite\sum is replaced by its expansion. Otherwise the reduced
summand t(j) is factored as t(j) = I * D(j), where I is independent of j and D(j) is either unity or depends on j. If D(j) = 1,
the finite\sum is replaced by (U-L+1)*I, which is then simplified. If D(j) is rational in powers of j and exponentials of
the form b**j, where b is independent of j, the simplifier may
be able to replace the finite\sum by a closed form expression.
Its algorithm for determining whether a closed form exists is
based on the decision procedure in [17].
Simplification rules for finite\product(j, L, U, f(j)) are similar to those for finite\sum, except that Karr's algorithm is not
invoked. If f(j) is a simple product, then the finite\product
operator is distributed over its factors. If f is independent of
j, the expression is replaced by the result of simplifying
expo(f, U-L+1).
The form least(j, L, U, p(j)) is simplified by trying to absorb
the predicate p(j) into the upper limit U. For example,
least(j, 1, N, le(M, j))
becomes
least(j, 1, min(M, N), FALSE)
which becomes
max(1, min(M, N))
(min, max, and le are discussed in the next two subsections.)
We use a unique value token, called "infinity," to represent
an integer such that min(x, infinity) may be replaced by x,
for any x. As will be discussed in Section IV, this token is
used by the loop analyzer as an argument to the least function.
3) Relational Expressions
le(L, R)                    L ≤ R
eq(L, R)                    L = R
ne(L, R)                    L ≠ R
Strict inequalities of the form x < y are replaced by weak
inequalities using one of the following transformations:
    i < j becomes i ≤ j - 1, for integer i, j,
    x < y becomes x ≤ y - epsilon, for real x, y,
where epsilon is a unique value token that stands for a very
small real value. (Mixed-mode inequalities, such as x < i, do
not appear as SEXPR's because the symbolic interpreter converts such expressions to a consistent form in accordance with
EL1 semantics.)
For relational expressions, normal forms have been chosen
for convenience in simplifying conjunctions. In the case of
le(L, R), this means moving nonconstant terms from R into
the first operand, and constant terms from L into R. Quick
recognition of inequalities with the same nonconstant part is
useful since two such relations must reduce to a single expression when conjoined. Quick recognition of inequalities of
the form le(L, c1) and le(-L, c2) is also useful, since they
must conflict when conjoined if c1 < -c2.
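The normal form for le can be sketched in Python as follows. The representation is an assumption made for illustration only: a linear expression is a pair (coeffs, const), where coeffs maps a variable name to an integer coefficient. The function names are hypothetical and are not those of the simplifier.

```python
def normalize_le(lhs, rhs):
    """Normalize le(lhs, rhs) to (terms, bound), meaning sum(coef * var) <= bound.

    Nonconstant terms of rhs move to the left; the constant of lhs moves right.
    """
    (left_coeffs, left_const), (right_coeffs, right_const) = lhs, rhs
    terms = dict(left_coeffs)
    for var, coef in right_coeffs.items():
        terms[var] = terms.get(var, 0) - coef
    terms = {var: coef for var, coef in sorted(terms.items()) if coef != 0}
    return terms, right_const - left_const

def negate_terms(terms):
    """Return the nonconstant part -L for a nonconstant part L."""
    return {var: -coef for var, coef in terms.items()}

def conjoin_le(a, b):
    """Conjoin two normalized relations.

    Returns a single relation when the nonconstant parts agree, False when
    le(L, c1) and le(-L, c2) conflict (c1 < -c2), and None otherwise.
    """
    (terms_a, bound_a), (terms_b, bound_b) = a, b
    if terms_a == terms_b:
        return terms_a, min(bound_a, bound_b)
    if terms_a == negate_terms(terms_b) and bound_a < -bound_b:
        return False
    return None
```

For instance, x + 3 ≤ y + 10 normalizes to x - y ≤ 7, and two relations with the same nonconstant part collapse to the one with the smaller bound.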
An equality eq(L, R) is arranged so that L will be a good
candidate for replacement by R in other expressions conjoined with the equality. We want to replace less simple
expressions by simpler ones when possible, and must avoid
replacing an expression L by another containing L as a subexpression. To determine the candidate for substitution, we
put the terms of L-R in normal order (using the ordering
algorithm described in Section III-A) and, ignoring any negative signs, we pick the first nonconstant term that is not contained in some other term. Thus eq(L, R) is reduced to
eq(L', R'), in which R' contains no instances of L', and L'
is chosen to be a value token if possible. For example,
eq(y, v[x]) and eq(v[x], x) are both in normal form.
ne(L, R) is reduced in exactly the same way as eq(L, R);
thus, negating a normalized not-equal expression simply
amounts to changing the operator.
4) Conditional Expressions
    case(p1, e1, ..., pn, en)    the ei such that pi is TRUE. By assumption the pi are
                                 mutually exclusive and exhaustive. Thus, or(p1, ..., pn) = TRUE
                                 and for i ≠ j, and(pi, pj) = FALSE.
    cond(p, e1, e2)              case(p, e1, not(p), e2)
    min(x, y)                    cond(le(x, y), x, y)
    max(x, y)                    cond(le(x, y), y, x)
The cond, min, and max operators are provided as a convenience for the various components of the symbolic evaluator
that build conditional expressions. Each is converted to case
form during normalization.
For expository purposes, we write case expressions as
    case(p1 => e1, ..., pn => en).
The arms of a case expression are ordered on the predicates
pi, using the ordering scheme described above in Section III-A.
Nested case expressions are flattened. For example
    case(p1 => case(p2 => e2, p3 => e3), ...)
becomes
    case(p1 and p2 => e2, p1 and p3 => e3, ...).
If all ei are Boolean expressions, then
    case(p1 => e1, ..., pn => en)
becomes
    or(p1 and e1, ..., pn and en),
which is then further simplified.
When a conditional expression appears as an operand of
another SEXPR function, the function is distributed over the
arms of the conditional unless bound variable definitions make
this impossible. For example,
    case(x ≤ 10 => x, x > 10 => 10) ≤ 10
becomes
    case(x ≤ 10 => x ≤ 10, x > 10 => 10 ≤ 10)
which reduces to TRUE.
5) Compound Value Expressions
    array(<j1, ..., jn>, <s1, ..., sn>, E(j1, ..., jn))
        the array of rank n with extent si in dimension i and
        whose element indexed by <j1, ..., jn> is E(j1, ..., jn).
    store(A, <j1, ..., jn>, x)
        a copy of the compound value A in which the component
        indexed by <j1, ..., jn> has been replaced by x.
    sel(A, <j1, ..., jn>)
        the component of A indexed by <j1, ..., jn>.
In the array expression, each ji acts as a bound variable
ranging between 1 and si. An example is the identity matrix
of order n:
    array(<j1, j2>, <n, n>, cond(j1 = j2, 1, 0)).
Array expressions are normalized by normalizing their operands
and then looking for obvious simplifications such as replacing
    array(<j1, ..., jm>, <s1, ..., sm>,
          array(<jm+1, ..., jn>, <sm+1, ..., sn>, E(j1, ..., jn)))
by
    array(<j1, ..., jn>, <s1, ..., sn>, E(j1, ..., jn)).
This transformation derives from the view of a multidimensional matrix as a composition of one-dimensional vectors.
For the same reason, a selection from an array expression is
reduced as follows:
    sel(array(<j1, ..., jm>, <s1, ..., sm>, E(j1, ..., jm)), <i1, ..., in>)
becomes
    E(i1, ..., in),                                  if n = m,
or
    sel(E(i1, ..., im), <im+1, ..., in>),            if n > m,
or
    array(<jn+1, ..., jm>, <sn+1, ..., sm>,
          E(i1, ..., in, jn+1, ..., jm)),            if n < m.
The store operator is provided as a convenience, but it is not
maintained by the simplifier. Before normalization, store(A,
<i1, ..., in>, x) is converted to
    array(<j1, ..., jn>, <s1, ..., sn>,
          cond(j1 = i1 and ... and jn = in, x, E(j1, ..., jn)))
where the value of A was
    array(<j1, ..., jn>, <s1, ..., sn>, E(j1, ..., jn)).
6) Boolean Expressions
    and(C1, ..., Cn)    conjunction of C1, ..., Cn
    or(D1, ..., Dn)     disjunction of D1, ..., Dn
    not(P)              negation of P.
Boolean expressions are represented in conjunctive normal
form (CNF). CNF is not, however, usually used in program
simplifiers. Both King [19] and Deutsch [9], for example,
use disjunctive normal form. There are two reasons for this
departure. First, the predicates used in reasoning about a
particular program point often derive from the path conditions enabling control to be at that point. Since a path condition is a conjunction of the predicates from each branch
along a path, conjunctive form seems natural and efficient.
Second, our preliminary design studies suggest that we can
develop a resolution-based theorem prover [25] that avoids
many of the efficiency problems of conventional resolution-based systems by taking advantage of the knowledge built
into the simplifier. Resolution operates on expressions in
conjunctive form, so CNF is a natural choice for this reason
as well.
In CNF, the operands of a conjunction are never themselves
conjunctions, the operands of a disjunction are neither conjunctions nor disjunctions, and negations are pushed inside of
"and" and "or" expressions. Negation disappears altogether
when applied to a constant or to a relational expression,
being replaced by the appropriate inverse. For example,
not(le(L, 10)), where L is integer-valued, is replaced by
le(-L, -11). In general, constant operands of a Boolean
expression will either be dropped or will subsume the entire
expression. Redundant operands are also discarded.
a) Simplification of Conjunctions
The operands of a conjunction are partitioned into three
groups: equality relations, other unit conjuncts, and disjunctions. Equalities precede units, and units precede disjunctions,
in the normal order. The simplification of a conjunction is
performed by successively calling conjoin(c, B) where c is a
conjunction, and B is a new conjunct. That is, c has the form
and(E1, ..., Ee, U1, ..., Uu, D1, ..., Dd) where the Ej, Uj,
and Dj are equalities, units, and disjunctions, respectively.
Conjoin(c, B) basically performs the simplification of the
logical conjunction of c and B. It may, however, be directed
to infer, in addition, the effects of transitivity. The steps in
conjoin(c, B) are as follows.
Step 1: The equalities E1, ..., Ee are substituted into B:
for each Ej, which has the form eq(Lj, Rj), each occurrence of
Lj in B is replaced by Rj, and the result is simplified. If this
reduces B to TRUE it is ignored and if to FALSE, the whole
conjunction becomes FALSE.
Step 2: If B is an equality, it is substituted into each conjunct of c and then added to the equality conjuncts of c.
Again, if any conjuncts reduce to TRUE or FALSE they are
eliminated or the whole is reduced to FALSE, respectively.
Step 3: If B is a unit conjunct (i.e., not a disjunction), then
we proceed as follows.
Step 3a: B is first conjoined with each unit U1, ..., Uu of
c. Several outcomes are possible.
i) The result is FALSE (for example, Uj = le(L, a) and
B = le(-L, -(a+1)) or Uj = eq(L, R) and B = ne(L, R)) in which
case the whole is reduced to FALSE.
ii) The two are incomparable, in which case B is added to
c.
iii) The conjunction of B with some Uj leads to a strengthened relation (for example, Uj = le(L, a) and B = ne(L, a) lead
to le(L, a-1), or Uj = le(L, 0) and B = le(L, -5) leaves just
le(L, -5)). In this case the strengthened relation is added to
c and Uj and B are eliminated.
The simplifier will also, if so directed, conjoin to c any new
relations that can be inferred by transitivity (footnote 6). For example
from le(L-R, 0) and le(R-S, 0) we can infer le(L-S, 0) and
add it to c.
Step 3b: B is then i-resolved (footnote 7) against each disjunction
D1, ..., Dd. For example, if
    B  = le(L, a)
    Dj = or(le(-L, -a), d2, ..., dk),
then resolving B and Dj results in
    or(eq(L, a), d2, ..., dk)
which replaces Dj (by subsumption). Again at this point, if
the user so chooses, the simplifier will produce the disjunctions that result from transitivity. For example, resolving
    B  = le(L-R, 0)
    Dj = or(le(R-S, 0), d2, ..., dk)
produces a new disjunction
    or(le(L-S, 0), d2, ..., dk)
which would be added to c.
Footnote 6: Our current methods for inferring the results of transitivity are too
expensive to be used generally. We expect, however, to add a linear solver to
the simplifier which would derive these inferences less expensively. In
the meantime, a flag is used to indicate whether the simplifier should consider transitivity implications.
Footnote 7: By i-resolution we mean an extension of the technique of pure resolution [25] to take advantage of the known properties of certain operators,
such as arithmetic, relational, and logical functions. We have built-in
knowledge of their transitivity, commutativity, associativity, their inverse functions, and the like. I-resolution in general will simply result in
a stronger fact, rather than a contradiction. This result will presumably
be easier to refute subsequently, but in any case is necessary to the derivation of a normal form for the conjunction of two disjunctions.
Step 4: If B is a disjunction, all unit conjuncts U1, ..., Uu
of c are resolved against B before it is added to c (with appropriate actions if B reduces to a single disjunct).
Note that the simplifier does not resolve pairs of disjunctions (only units versus disjunctions). This step is being left
to the theorem prover, unless experience shows that we must
include it in the simplifier for efficiency.
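The three outcomes of Step 3a can be illustrated with a small Python sketch for integer-valued terms. The tuple encoding is an assumption made for this example: ("le", sign, name, c) stands for sign*name ≤ c with sign equal to 1 or -1, while ("eq", name, c) and ("ne", name, c) compare a term with a constant. The name conjoin_units is hypothetical, and the sketch covers only the cases named in the text.

```python
FALSE = "FALSE"
INCOMPARABLE = "INCOMPARABLE"

def conjoin_units(u, b):
    """Conjoin two unit relations; return FALSE, a strengthened unit,
    or INCOMPARABLE (in which case both units are kept)."""
    for p, q in ((u, b), (b, u)):          # try both argument orders
        if p[0] == "le" and q[0] == "le" and p[2] == q[2]:
            _, sign_p, name, c_p = p
            _, sign_q, _, c_q = q
            if sign_p == sign_q:           # same nonconstant part: keep tighter bound
                return ("le", sign_p, name, min(c_p, c_q))
            if c_p < -c_q:                 # le(L, c1) and le(-L, c2) with c1 < -c2
                return FALSE
            if c_p == -c_q:                # the two bounds meet in a single value
                return ("eq", name, sign_p * c_p)
        if p[0] == "le" and q[0] == "ne" and p[2] == q[1]:
            _, sign_p, name, c_p = p
            if sign_p * q[2] == c_p:       # excluded value lies exactly on the bound
                return ("le", sign_p, name, c_p - 1)
        if p[0] == "eq" and q[0] == "ne" and p[1:] == q[1:]:
            return FALSE
    return INCOMPARABLE
```

With a = 4, the call conjoin_units(("le", 1, "L", 4), ("le", -1, "L", -5)) reproduces outcome i), and conjoining le(L, 4) with ne(L, 4) yields le(L, 3) as in outcome iii).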
b) Simplification of Disjunctions
Given a disjunction or(D1, ..., Dn), the simplifier first
normalizes the disjuncts Dj and then eliminates disjunctions
and conjunctions from the resulting operands. Disjunctive
operands are absorbed into the set {Dj}. Conjunctions are
moved outside of the given disjunction by distributing the
"or" over the "and's." And, when given a disjunction of
expressions that are themselves neither "and's" nor "or's,"
the simplifier negates the given disjunction by conjoining the
negations of its disjuncts: and(not(D1), ..., not(Dn)). This
expression is then simplified by the conjunction simplifier.
Finally, the result is negated again, producing, in general, a
reduced disjunction. The operands of a disjunction are ordered
so that we can obtain the disjunction from its conjunctive
negation without reordering terms, simply by inverting operands and changing the operator.
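The negate, conjoin, negate scheme for disjunctions can be sketched in Python for the special case of integer bounds. The encoding is an assumption for illustration: a unit (sign, name, c) stands for sign*name ≤ c, so that not(le(L, 10)) becomes le(-L, -11) as in the text. The conjoin function below is a deliberately small stand-in for the conjunction simplifier, and all names are hypothetical.

```python
def negate(unit):
    """not(sign*name <= c) over the integers is (-sign)*name <= -(c + 1)."""
    sign, name, c = unit
    return (-sign, name, -(c + 1))

def conjoin(units):
    """Conjoin integer bounds: False on conflict, otherwise a sorted list."""
    best = {}
    for sign, name, c in units:
        key = (sign, name)
        best[key] = min(c, best.get(key, c))     # keep the tighter bound
    for (sign, name), c in best.items():
        opposite = best.get((-sign, name))
        if opposite is not None and c < -opposite:
            return False                         # L <= c1 and -L <= c2, c1 < -c2
    return sorted((s, n, c) for (s, n), c in best.items())

def simplify_or(disjuncts):
    """Simplify a disjunction of bounds by negating, conjoining, negating."""
    negated = conjoin([negate(d) for d in disjuncts])
    if negated is False:
        return True                              # the disjunction is a tautology
    return sorted(negate(unit) for unit in negated)
```

Under this encoding, or(le(L, 3), le(L, 7)) reduces to le(L, 7), and or(le(L, 5), le(-L, -3)) reduces to TRUE because the conjunction of the negations is contradictory.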
7) Recursive Function Expressions
    lambda(j, h(j))    the function of j defined by the expression h(j)
The lambda expression is used by the loop analyzer to
describe symbolic values that cannot conveniently be put
in closed form (footnote 8). A lambda expression is associated with a
value token, called a recursive function value. If R is such a
value token then R may be used like a SEXPR function with
one integer valued argument. If the expression lambda(j,
h(j)) is associated with R, then the meaning of the application R(e) is given by the expression h(e). Of course, h(j)
may itself refer to R. For instance, h(j) might be cond(j = 0,
1, R(j-1) * j), in which case R represents the integer factorial
function. The creation of lambda expressions and recursive
function value tokens is discussed in Section IV-G.
Footnote 8: In principle, the array SEXPR (Section III-B5) could be replaced by a
generalized version of the lambda expression described here. The decision
to use separate notations was made for the sake of readability and to avoid
over-burdening the lambda form used by the loop analyzer to express
unsolved recurrence relations.
IV. LOOP ANALYSIS
A. Goals
To a major extent, the power of the symbolic evaluator derives from its ability to analyze loops. When we speak about
analyzing loops, we have two goals in mind. To begin with,
we associate with each loop a new value token k, which serves
as an iteration counter for the loop. Then, for any quantity,
say x, whose value may be changed on some cycle of the loop,
we want to determine that function x of k, denoted x(k),
such that x(k) is the value of x at the beginning of the k-th
cycle of the loop. The second goal is to determine the number of cycles taken by the loop, denoted kL.
B. Strategy
The basic strategy for loop analysis is as follows. For each
variable x whose value might change during a cycle of a loop,
we install a value token x_k (called a possible induction value,
or PIV for short) to represent the value of x at the beginning
of the k-th iteration through the loop body. We then do a
symbolic evaluation of the body, at the conclusion of which
we compute the following:
1) For each quantity x, which has been changed, we determine its value under an assumption of cycling again; this
value is denoted x_{k+1} and is referred to as the again value
of x.
2) Let P1, ..., Pn be the explicit exit conditions given in
the loop body. We compute the symbolic expression pj representing the value of each exit condition Pj.
For each quantity x which may change we also determine
(in addition to x_k and x_{k+1}) the symbolic value x_1 which
is the value of x at the point just prior to the first cycle of
the loop. The two quantities x_k, x_{k+1} are treated as a recurrence relation with boundary value x_1. The solution to the
recurrence relation, denoted x(k), is the value desired, that
of x at the beginning of a general, or k-th cycle.
We then take each exit condition pj and substitute for each
PIV x_k occurring in pj, its solution x(k). We thereby obtain
a solution for pj, denoted pj(k), which is that function of k
defining the j-th exit condition. Let lim be the successor of
the upper limit of the range of the loop parameter, if one has
been explicitly provided, and the unique value token "infinity"
otherwise. The value of kL, the number of the cycle
on which exit occurs, is then
    least(j, 1, lim, p1(j) or ... or pn(j)).
Of course, solving the recurrence relations defined by x_k,
x_{k+1} and x_1 is not always straightforward. In the sections
below we will describe the kinds of recurrence relations that
result from program loops and discuss some of the techniques
we employ for solving them.
C. Recurrence Relations
The recurrence relations with which we are concerned are
first-order relations, but there are a number of complications
which can arise. We will describe some of these, indicating the
kinds of program constructs which give rise to the various
complications.
First, the recurrence relations may not be independent. For
example, any variable playing the role of a counter in a loop
will often occur in the recurrence relations for other variables.
Consider the following simple loop to add the first N elements
of an array X:
    DECL S:INT BYVAL 0;
    DECL I:INT BYVAL 1;
    REPEAT
        I GT N => S;
        S <- S + X[I];
        I <- I + 1;
    END
The again values are
    I_{k+1} = I_k + 1
    S_{k+1} = S_k + X_k[I_k]
    X_{k+1} = X_k
with the recurrence relation for S indicating its dependence
upon the previous value of S as well as X and I. We call these
simultaneous recurrence relations.
Second, the recurrence relations may be conditional. For
example, if the first cycle of a loop initializes certain variables, the recurrence relations for the variables involved will
be conditional. Similarly if the loop performs some operation on a subset of the components of a compound object,
certain recurrence relations will reflect the conditionality of
whether an element is in the subset or not.
Third, the recurrence relations may be dependent upon
several value tokens, so that they become what we call multivariate recurrence relations. For example, the following loop
does a cyclic shift of the elements of an array A:
    FOR I TO LENGTH(A)-1
    REPEAT
        TEMP <- A[I+1];
        A[I+1] <- A[I];
        A[I] <- TEMP;
    END
After we have solved for the counter I, obtaining I(k) = k, the
again value for A has the form
    A_{k+1} = array(<j>, <u>,
                    case(k = j => A_k[k+1],
                         k = j-1 => A_k[k],
                         k ≠ j and k ≠ j-1 => A_k[j]))
where u represents the length of A. If we consider the recurrence relation for the j-th element of A, we have
    A_{k+1}[j] = case(k = j => A_k[k+1],
                      k = j-1 => A_k[k],
                      k ≠ j and k ≠ j-1 => A_k[j])
which involves both k and j, so we require a solution A(k, j)
that depends on both.
D. Solving Arithmetic Recurrence Relations
Arithmetic recurrence relations have been the subject of considerable study. However, as with research on differential equations, these efforts have usually been concerned with developing techniques for people to use in solving recurrence relations.
The methods result in finite approximations (in the form of
truncated series) rather than closed form solutions. Indeed, it
is only very recently that a decision procedure for finite sums
has been developed, defining the conditions under which some
finite sum has a closed form representation [17] (footnote 9).
The current symbolic evaluator includes the following methods for solving arithmetic recurrence relations.
1) Invariant and Almost Invariant Relations: An invariant
recurrence relation is one in which x_{k+1} = x_k. The solution
is immediate, namely x(k) = x_1. If, however, x depends
upon some parameter j, there are a number of recurrence
relations, which we term "almost invariant," that can also
be solved directly. An example is
    x_{k+1}(j) = x_k(h(j))
    x_1(j) = b(j).
In this case, we have the solution
    x(k, j) = b(h^k(j))
where h^p(j) denotes the p-fold application of h to j.
2) Finite Sums: Given the recurrence relation
    x_{k+1} = x_k + g(k)
the solution is
    x(k) = x_1 + finite\sum(j, 1, k-1, g(j)).
When we have a finite sum we attempt to replace it by the
closed form solution using the decision procedure developed
by Karr. If the summand g(j) of a finite sum is rational in
j and exponentials of the form b^j we are able to determine
whether or not there exists a closed form of the sum (which
will, again, be rational in j and exponentials of j) and if so,
what it is.
3) Finite Products: If the recurrence relation has the form
    x_{k+1} = x_k * g(k)
we can, by taking logarithms, reduce this to the sum case.
Taking antilogarithms we convert the solution obtained for
the sum to that appropriate for the product (footnote 10).
Footnote 9: Karr's work was motivated by our efforts to solve recurrence relations
arising from program loops.
Footnote 10: We do not expect to employ this transformation in all cases, but rather
detect and handle the simple, common cases by table look-up.
E. Solving Simultaneous Recurrence Relations
The recurrence relations derived by symbolic evaluation of
a loop body may be mutually dependent. It is often the
case, however, that they are in fact degenerate in the sense
that not all the variables depend on all the others. Thus we
attempt to isolate those recurrence relations that can be
solved directly, substitute the solutions obtained, and thereby
reduce the number of apparent simultaneous relations. For
example, given
    X_{k+1} = F(X_k, I_k, k)
    I_{k+1} = G(I_k, k)
we may be able to solve the second equation to obtain I(k).
Substituting this for I_k in the first equation yields
    X_{k+1} = F(X_k, I(k), k) = F1(X_k, k)
which we can now attempt to solve directly.
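One way to picture the isolation of directly solvable relations is as an ordering problem on a dependency graph. The following Python sketch is an illustration only, not the evaluator's method: it assumes that each loop variable is listed with the variables appearing in its again value, and it uses the standard-library graphlib module to produce an order in which each relation is attempted after the relations it depends on. The function name solving_order is hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

def solving_order(depends_on):
    """Return loop variables ordered so that dependencies come first.

    depends_on maps each variable to the variables occurring in its again
    value. Self-dependence (x_{k+1} defined from x_k) is the normal case
    for a first-order recurrence and is ignored here. Relations that are
    genuinely mutually dependent raise graphlib.CycleError.
    """
    graph = {var: set(deps) - {var} for var, deps in depends_on.items()}
    return list(TopologicalSorter(graph).static_order())
```

For the array-summing loop, the input {"S": {"S", "X", "I"}, "I": {"I"}, "X": {"X"}} places I and X ahead of S, which matches the order of solution used for that example.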
Looking back at the example summing the elements of an
array, we find that the recurrence relations are of this sort.
X is invariant, I can be solved as I(k) = k and then S solved
as
    S(k) = finite\sum(j, 1, k-1, X[j]).
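These solutions can be checked against a direct simulation of the summing loop. The Python sketch below is a test harness written for this purpose, not part of the symbolic evaluator: it records the values of I and S at the beginning of each cycle k and compares them with I(k) = k and S(k) = finite\sum(j, 1, k-1, X[j]). The array is held in a list whose index 0 is unused, so that indices start at 1 as in the EL1 loop; the function names are hypothetical.

```python
def run_loop(X, N):
    """Simulate the summing loop; return (result, exit cycle, history).

    history[k] holds (I, S) at the beginning of cycle k.
    """
    S, I = 0, 1
    history = {}
    k = 1
    while True:
        history[k] = (I, S)
        if I > N:              # exit test: I GT N => S
            return S, k, history
        S = S + X[I]
        I = I + 1
        k += 1

def closed_form(X, k):
    """Solved recurrences: I(k) = k and S(k) = X[1] + ... + X[k-1]."""
    return k, sum(X[j] for j in range(1, k))
```

With X = [None, 5, 7, 11, 13] and N = 3, the loop exits on cycle 4 with the value 23, which agrees with kL = N + 1 and with S(kL).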
F. Solving Conditional and Multivariate
Recurrence Relations
As we noted above, conditional recurrence relations also
commonly result from program loops. To our knowledge
these recurrence relations have not yet been studied.
We are developing the following approach to solving conditional recurrence relations. We first factor the conditionality
into its dependence on k in such a fashion that within ranges
of values of k the functions are unconditional. Consider the
example of the pairwise interchange of array elements presented above. We can cast the recurrence relation for the
value of the j-th element of A on the k-th cycle as a ranged
conditional value (footnote 11):
    A_k[j] = ranged\value(k ≤ 1 => A_1[j],                  (A)
                          1 < k ≤ j-1 => A_{k-1}[j],        (B)
                          k = j and 1 < k => A_{k-1}[j-1],  (C)
                          k = j+1 => A_{k-1}[j+1],          (D)
                          j+1 < k => A_{k-1}[j])            (E)
Note that in this form the conditionality has been factored
into five distinct and ordered ranges for k, some of which
may be empty. Note also that we have employed equality
facts about j and k to obtain indices that are functions of j.
We may now be able to solve each (nonconditional) recurrence relation and then by combining resulting solutions, get
a solution for A_k[j]. Consider the ranges [labeled (A)-(E)]
in turn:
(A) k ≤ 1
The solution is A_1[j].
(B) 1 < k ≤ j-1
The solution is again A_1[j].
(C) k = j and 1 < k
Here we have A_k[j] = A_{k-1}[j-1]. We consider the values
of the parameters on the right side to see whether, under the
conditions associated with the range, the expression reduces
to cases already solved. Thus if we take k' = k-1 and j' = j-1,
and use the fact that k = j and hence k' = j', we have
    A_{k'}[j'] = A_{k'-1}[j'-1]    if k' > 1
               = A_1[j']           if k' ≤ 1
[that is, the recurrence "stays" in range (C) until k' reaches
one and it then jumps to range (A)]. This can be solved as an
almost invariant relation to yield the solution A_1[1] (j'
counts down with k' until both reach one).
(D) k = j+1
We have A_k[j] = A_{k-1}[j+1]. We again consider the right
side. Writing k' = k-1, j' = j+1 and converting the predicate
k = j+1 to k' = j'-1, we see that A_{k'}[j'] falls into either range
(A) or (B) and hence the solution is A_1[j']. Converting back
to j and k we have the solution A_k[j] = A_1[j+1] for k = j+1.
(E) k > j+1
Here the relation is invariant until k falls into range (D), so
that the solution for range (E) is
    A_k[j] = A_1[j+1]
Putting all the solutions together, we have
    A(k)[j] = case(j < k => A_1[j+1],
                   k = j and 1 < k => A_1[1],
                   k ≤ 1 or k ≤ j-1 => A_1[j])
When the loop completes we have kL equal to the symbolic
value of LENGTH(A), i.e., the u-th cycle is the first one not
taken, so that the value of A following the loop (assuming that
u is greater than 1) is
    array(<j>, <u>, cond(j = u, A_1[1], A_1[j+1]))
which describes the array after u-1 swaps of successive overlapping pairs of elements.
In general, given some conditional recurrence relation
f(k, j) = cond(...) we attempt to map it into a ranged conditional value
    ranged\value(k ≤ t1(j) => F0(j),
                 t1(j) < k ≤ t2(j) => F1(j),
                 ...
                 tN(j) < k => FN(j))
with t1(j) < t2(j) < ... < tN(j) and the Fp(j) not conditional. We then attempt to solve the unconditional relations
given by the Fp(j), using the solution for the range k ≤ tp(j)
as the boundary condition for the next range tp(j) < k ≤
tp+1(j). We will not consider the method in any further detail
here; the interested reader is referred to [3] for a discussion of
these techniques for solving conditional recurrence relations.
Footnote 11: A ranged conditional value is a symbolic expression for array values
used only by the loop analyzer. The general form of the j-th element of an
array on the k-th cycle would be
    ranged\value(k ≤ t1(j) => F0(k, j),
                 t1(j) < k ≤ t2(j) => F1(k, j),
                 ...
                 tN(j) < k => FN(k, j))
where t1(j) < t2(j) < ... < tN(j) and the Fp(k, j) are unconditional.
G. Forcing Solutions
There may, of course, be recurrence relations that cannot
be solved by any of the above methods. We then "force" a
solution in the form of a recursion equation. This at least
permits us to capture what happens to the variable on each
cycle of the loop.
Suppose that the value of x is a recurrence relation that
we cannot solve and that its again value is
    x_{k+1} = f(x_k, k)
and initial value x_1. We then construct a recursion equation
to represent its value as follows. We introduce a new function
symbol xR and give it the definition
    xR = lambda(j, cond(j = 1, x_1, f(xR(j-1), j-1)))
which incorporates the initial value and the again value of x
on each cycle. We then take as the solution for x
    x(k) = xR(k)
In general, there may be several variables that we cannot solve
and their again values may involve the PIV's for several other
such variables. In this case we would have
    x_{k+1} = f(x_k, y_k, ..., z_k, k)
and we would introduce new function symbols xR, yR, ...,
zR to define a set of mutual recursion equations.
V. PROCEDURE ANALYSIS
A. Basic Strategy
Given a procedure P which has formal parameters x1, ...,
xn and free variables F1, ..., Fm (footnote 12), the procedure analyzer's
task is to develop information that will let us determine, for
any particular call of P, what value will be computed and
what the effects will be on the locations corresponding to the
shared parameters and free variables. If the procedure P were
not recursive, we could simply substitute the body of P for
each call, with the appropriate bindings for actual parameters.
This approach is not only unsuitable for recursive procedures,
it would also be expensive since the same procedure body
might be reanalyzed in many similar contexts.
One alternative is to perform a single analysis of the procedure in isolation, using special declarations of the modes
of free variables and analyzing all possible patterns of sharing
among formal parameters and free variables. This analysis
could then be specialized for each particular call. In practice,
however, this approach would involve a good deal of unnecessary effort, since the full potential for sharing is rarely exploited by an actual set of procedure calls.
A method we find more efficient and more convenient for
the user (since there is no need for supplementary declarations) is the following. When a call to an as yet unanalyzed
procedure is found, that procedure is analyzed in the environment of the call. A template is created and associated with
the analysis. The template gives a generalized description of
the call environment, including the modes of actual parameter values and free variables, and the sharing patterns among
them, but not the values themselves. Analysis of the called
procedure reflects this generalized (weakened) description
of the environment. Once the analysis is complete and symbolic expressions describing the procedure's outputs have
been obtained, the actual input values are substituted in the
output expressions to determine the precise effects on the
calling environment.
When a subsequent call of the same procedure is encountered, the new call environment is compared with the template.
If they match, i.e., if formal parameter and free variable modes
agree, and if the sharing takes the same form in both, then
the analysis associated with the template is reused. Otherwise, a second template is created, and the procedure is reanalyzed in the new environment. In general, a set of templates
is scanned when a call to a procedure is encountered,
and a previous analysis is reused if applicable. Hantler and
King [13] have proposed a similar approach in dealing with
the problem of parameter aliasing.
The environment in which a procedure body is analyzed
contains a location for each formal parameter and free variable. Each of these locations is given a special value token
representing the unknown initial value of the corresponding
variable. Then the body is symbolically evaluated, producing
a result location, with its symbolic value, as well as new values
for each of the free variable and formal parameter locations.
These value expressions are functional descriptions of the
effects of the procedure. When a given call is analyzed, the
actual input values are substituted for the value tokens created
during analysis and the resulting expressions become, after
simplification, the new values of the actual inputs and the
value of the procedure result.
If a call to a procedure currently being analyzed is encountered, the symbolic evaluator avoids starting a new analysis
and marks the procedure as recursive. Variable behavior patterns arising in recursive procedures are not solved in the
current version of the symbolic evaluator. At the point of a
recursive call, the shared parameters of the procedure and each
variable in the environment potentially modified by the procedure are given symbolic value tokens denoting unknown
values.
Footnote 12: A variable F is said to be free in some procedure P if there is a use of
F in P which lies outside the scope of any declaration in P of a formal parameter or local variable named F.
B. Analysis of Recursive Procedures
A useful and readily identifiable subclass of recursive procedures can be handled by our loop analysis techniques. We
say that procedure P is a simple recursive procedure if it has
at most one recursive call along any flow path (including
loop cycles) from its entry to an exit. Suppose, for example,
that P has a single argument x and two paths of control, one
with a recursive call P(a):
[Figure: flow diagram of P(x) with labeled points A through E and the recursive call P(a).]
Viewed this way, the behavior of P can be thought of as a loop
A-C of recursive descents, a computation A-B when control
has reached the "bottom," and then a loop D-E to "unwind"
the recursive descent. In the loop A-C the initial value of the
parameter x will be the value of the actual parameter and the
cycle again value will be a. In the loop D-E the initial value
will be that computed on the A-B path, and the again value
will be that computed at point E. The number of cycles taken
for each loop is determined by the predicate which causes
path A-B to be taken in favor of path A-C. Thus we can in
principle employ the loop analysis machinery described earlier
to analyze the pair of loops to determine the effects of a call
on P.
The issues which make this analysis slightly more complicated arise from the fact that procedures in general return locations as values. We consider the following paradigms.
1) If the formal parameter x is bound by value, then on
each recursive call a new copy of the parameter is created. In
the unwind loop (D-E) we have to realize that, at point D, the
value of x on the i-th cycle of unwinding is the value of x at
point C during the (kL - i)-th cycle of descent (kL being the
number of cycles taken).
2) If x is bound by reference (and reoccurs in its position
as an actual argument in the recursive call), then the effects
on x during the recursive descent and unwinding accumulate,
much like what happens to the variables in a regular loop.
3) If neither 1) nor 2) holds or if the location resulting
from a recursive call on P is the target of an assignment, then
loop analysis techniques are not directly applicable.
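The by-value case in paradigm 1) can be illustrated with a small simulation. This is only a sketch in Python, not the analyzer itself: the procedure `p`, the predicate `bottom`, and the function `step` are hypothetical stand-ins, and the cycle that reaches the bottom is assumed to be counted as the kL-th descent cycle.

```python
def run(x0, bottom, step):
    """Simulate a recursive procedure whose parameter x is bound by value.

    Returns (descent, unwind): the value of x seen on each descent cycle,
    and the value of x seen at the start of each unwinding cycle.
    """
    descent, unwind = [], []

    def p(x):
        descent.append(x)      # value of x on this cycle of descent
        if bottom(x):          # path A-B: the recursion bottoms out
            return
        p(step(x))             # path A-C: recursive call receives a fresh copy
        unwind.append(x)       # point D: this activation's own x is unchanged

    p(x0)
    return descent, unwind


descent, unwind = run(10, bottom=lambda x: x <= 1, step=lambda x: x // 2)
kL = len(descent)              # number of descent cycles, bottom cycle included

# On the i-th unwinding cycle, x is the value from descent cycle kL - i
# (cycles are numbered from 1, Python lists from 0).
matches = all(unwind[i - 1] == descent[kL - i - 1] for i in range(1, kL))
```

Under these assumptions the unwinding loop simply revisits the saved copies of x in reverse order, which is why the descent values must be indexed by cycle rather than accumulated as in an ordinary loop.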
VI. CONCLUSION
First a remark on the status of the implementation of the
symbolic evaluator. We have working versions of the symbolic interpreter, the simplifier, and those parts of the loop
analyzer which solve scalar and array variables. We are in
the midst of adding facilities to handle pointers inside a
loop, and recognizing those which behave like lists, trees,
and so on [32]. We have an initial implementation of the
analyzer for nonrecursive, user-defined procedures. We plan
to incorporate a full solver for linear inequalities in the simplifier, using an adaptation of the simplex method similar to
that used in the Stanford Pascal verifier [24]. We are investigating ways of exploiting the simplex approach to facilitate instantiation of quantified variables in theorem-proving
applications of the simplifier. We expect to report on progress
in all the above areas as we proceed.
We began this paper by discussing the need for better programming tools for debugging, validation, verification, and
optimization. The EL1 symbolic evaluator is designed to act
as the program analysis component of tools in each of these
areas. We envision it as the heart of an integrated collection of
program development aids, providing a common database of
facts about the meaning and behavior of programs under
development.
As mentioned in the Introduction, the symbolic evaluator is
a static analyzer, not a dynamic interpreter that propagates
symbolic values. Nevertheless, some of the goals of the work
on symbolic execution [1], [5], [15], [20], e.g., symbolic
testing under a variety of assumptions about input values,
branch conditions and loop bounds, could easily be achieved
through symbolic evaluation. For example, suppose that a
programmer wants to validate a series expansion loop. Depending on the complexity of a generic term in the series,
the result of symbolic evaluation may be in closed form or
it may be a recursive formulation. If it is in closed form, it
can be compared directly with the intended series. Often in
fact, the recursive formulation is a sufficiently clear representation of the result to make further investigation unnecessary.
If not, the user could ask to see the result evaluated for particular inputs, to see it simplified under interesting constraints
on the inputs, or to see the recursive result expanded to as
many terms as needed to convince him that the loop is correct.
Moreover, these specializations of the general result could be
provided without reinterpretation of the program, as would
be necessary with strictly path-oriented tools.
Symbolic execution is also used in a number of program
verifiers based on the Floyd-Hoare inductive proof technique [9], [13]. Since our symbolic evaluator develops
value formulas for program variables throughout a program,
verification conditions can be generated from user-provided
assertions without a special reinterpretation of the program
for this purpose. We have built a verification condition generator [4] using a symbolic evaluator for a subset of EL1, and
we have extended our simplifier for use as a theorem prover
in first-order logic [33].
Another aspect of the usefulness of the symbolic evaluator
in verification concerns the derivation of invariant assertions
directly from a program. We have in mind here our use of
recurrence relations to describe the effects of iterations. In
simple cases this allows the user to omit inductive assertions
from loops altogether; in more complex cases he can concentrate on subtle invariants and omit the more straightforward
ones.
Tools for software validation, or fault detection, [11], [12],
[30] play a role complementary to that of verification. A
verifier tries to prove that a program correctly matches its
specifications; a fault detector tries to find possible violations
of the semantics of the programming language. Examples of
such faults include division by zero, dereferencing a null
pointer, component selection with an index out of range,
data generation with negative dimensions, and inconsistent
use of variant-type variables. To detect them, the symbolic
evaluator would generate validity conditions as it interprets
the program. A typical validity condition, generated on interpretation of the selection A[I], would assert that the
symbolic value of I must lie in the range [1, LENGTH(A)].
Later, the proof of such validity conditions can be attempted
using the facts developed by the symbolic evaluator. If it
succeeds, the program expressions giving rise to validity
checks can be marked valid, resulting in the suppression of
runtime checks by the code generator. If the proof fails,
we may nevertheless be able to state the conditions on input values under which the program would fail. Reference
[34] discusses this idea of fault detection in coordination
with symbolic evaluation, and we hope to implement a fault
detection facility in the near future.
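As an illustration of how a validity condition for a selection A[I] might be discharged, the following Python sketch classifies the condition 1 <= i <= LENGTH(A). It assumes, purely for illustration, that the facts known about the symbolic index have been reduced to an integer interval; the names `Interval` and `selection_validity` are hypothetical, and a real simplifier would reason over symbolic expressions rather than numeric bounds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Assumed summary of what is known about a symbolic index value."""
    lo: int
    hi: int


def selection_validity(index: Interval, length: int) -> str:
    """Classify the validity condition 1 <= i <= length for a selection A[I]."""
    if 1 <= index.lo and index.hi <= length:
        return "proved"      # the run-time check could be suppressed
    if index.hi < 1 or index.lo > length:
        return "violated"    # every execution reaching this point is faulty
    return "unknown"         # the run-time check must be kept


always_ok = selection_validity(Interval(1, 5), length=10)
maybe_bad = selection_validity(Interval(0, 5), length=10)
always_bad = selection_validity(Interval(11, 12), length=10)
```

The "unknown" outcome corresponds to the case in which the proof fails but conditions on the inputs that lead to a fault might still be reported.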
Current trends in programming methodology and programming language design have amplified the need for excellent
optimization and specialization tools. Increasing use of structured programming, development by stepwise refinement,
and modular design through data abstraction and encapsulation facilities is changing the nature of the code optimization
problem. More code is being produced by mechanical means
(e.g., macro substitution, specialization of library algorithms,
combination of separately defined abstractions), and the attention of the programmer is turning to improving clarity and
maintainability, rather than efficiency. Optimizers must
therefore detect inefficiencies that might in the past have
been ignored on the assumption that no "reasonable" programmer would create them. In short, more reasoning power
is needed in optimization tools.
The symbolic evaluator is well-suited to serve as the basis of
such an optimizer. Reasoning power is supplied by the simplifier. The development of symbolic values for each variable
at each program point, fully reflecting the path condition leading to that point, is excellent flow analysis. Constant propagation, common subexpression and dead code recognition are
implicit. Transformations predicated on the appearance of
syntactic patterns with semantic constraints [22], [29] can be
applied to the program shadow produced by the symbolic interpreter. Because shadow expressions and symbolic values
have a standard form, it is less often necessary to provide
multiple patterns for a single transformation, as is the case
with a conventional pattern activated optimizer. Transformed
program segments can be reanalyzed in context to ensure that
they produce the intended results.
We have begun work on a source-to-source code optimizer
for EL1 based on the symbolic evaluator. Generation of optimized source-level code will allow programmers to assess
the results of optimization and to guide the code improvement
process. Ultimately, we envision a system in which the optimizer acts as the programmer's mechanical partner throughout the refinement of a program from its most abstract level,
through the choice of efficient representations, to an efficient
realization. Generation of machine code is thus just the last
in a series of specialization steps.
APPENDIX A
VALUE DETERMINATION
Let us consider the problem of determining the value of a variable X at some program point P. To ensure that the value of each location shared with X is incorporated correctly, the symbolic interpreter considers the flow paths to P individually. A flow path to P is represented in the context graph by a sequence of possible predecessor arcs leading from the context node for P to the entry node. Along each such path the sharing relationships and the assignments to variables are unconditional; it is path branching that gives rise to conditionality. Assume for the moment that X is not shared with a component of a larger object and that no other variable shares a component of X. Each path to P is considered in turn and the most recent value of X or any location shared with X along that path is obtained. The most recent value is that having the largest time number not exceeding the time of P.
If the values on all paths are the same, then this is the value of X at P. Otherwise we create a conditional expression that combines the values for individual paths under the predicates that distinguish one path from another. To obtain the enabling predicate for the value for a particular path, we conjoin the predicates that comprise the path, excluding those that are common to all paths to P. The results are combined in a case expression, which, after simplification, is placed in the ENV entry for X with the context tag of P.
When X may be shared with a component of a larger object A, or when another variable C shares a component of X, the representation of the sharing relationships and the task of obtaining a symbolic value for X become more complicated. The sharing link created by the symbolic interpreter between X and A reflects both the context in which sharing is established and the symbolic index values (called the selection vector) that lead to X when applied to A. For example, after the declaration¹³
    DECL X:M1 SHARED [) Q => A[I, J]; CONST(M1) (]
the locations for X and A would be linked as follows:
[Figure: directed sharing link between the locations for X and A, labeled with the context tag <c, t> and the selection vector [i, j].]
¹³ The expression CONST(M1) produces a new default value of type M1. Thus, in this declaration, if Q is FALSE, X will not be shared with another variable.
Here <c, t> is the context tag for the right side of the exit conditional Q => A[I, J]. From the context tag and the context graph, the Boolean condition for sharing can be obtained. [i, j] is the selection vector relating X to A. (Throughout this example, lower case identifiers denote the symbolic values of the corresponding upper case variable names.) The direction of the arrow between the locations gives the sense of the containment, i.e., X is part of A.
As it explores the sharing graph in search of locations potentially shared with X, the value finding algorithm may traverse directed links in either direction. As it does so it creates a net selection vector representing the cumulative effects of selections along the path from X, plus the net direction of containment and a Boolean predicate over index expressions describing the conditions under which the sharing will take place. For instance, if, after the declaration above, there were further declarations
    DECL W:M2 SHARED A[K];
    DECL T:M3 SHARED W[J, M]
the sharing graph would become
[Figure: the sharing graph extended with links for W and T.]
The search for locations shared with X would reveal that, if X shares with A[i, j] and if i = k, then X also shares with W[j] and X[m] shares with T.
Subsequent to the declarations of W and T above, we have two paths to consider in determining the value of X. Along the path on which q is FALSE, there are no locations shared with X and the value lookup for X is simple. For the path on which q is TRUE, sharing with A, W, and T must be considered. Suppose that an assignment to W occurs, followed by a use of X:
    W <- W2;
    OUTPUT(X)
On the path with q = TRUE, X shares W[j] provided i = k, and X shares A[i, j] unconditionally. The value for X derived by the interpreter,
    cond(i = k, w2[j], a[i, j]),
exhibits both the conditionality and the containment. On the other hand, suppose an assignment to T precedes the first use of X.
    T <- T2;
    OUTPUT(X)
T shares the m-th component of X, provided i = k, on the paths with q = TRUE. Thus at the OUTPUT statement, the value of X is
    cond(i = k, store(a[i, j], <m>, t2), a[i, j]),
which becomes
    array(<n>,
          <length(a[i, j])>,
          case(i = k and n = m => t2,
               i # k and n # m => a[i, j])).
In general, the value finding procedure will sort the values
for partially or conditionally shared locations along each path
by time number. It then builds a value expression as illustrated by these examples, using conditionals, selections, and
store expressions to relate possible values in the proper order.
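The relationship between the cond/store form and the array form in the last example can be checked on concrete data. The Python sketch below is only an illustration: it models a[i, j] as a list, treats `cond` and `store` as ordinary functions, and assumes that every component not covered by the first arm of the case expression keeps its old value from a[i, j]. The function names are hypothetical.

```python
def cond(test, then_value, else_value):
    """cond(c, u, v): u when c holds, otherwise v."""
    return then_value if test else else_value


def store(vector, index, value):
    """store(v, <m>, t): a copy of v whose m-th (1-based) component is t."""
    updated = list(vector)
    updated[index - 1] = value
    return updated


def value_cond_form(a_ij, i, k, m, t2):
    # cond(i = k, store(a[i, j], <m>, t2), a[i, j])
    return cond(i == k, store(a_ij, m, t2), list(a_ij))


def value_array_form(a_ij, i, k, m, t2):
    # array(<n>, <length(a[i, j])>, case(...)), built component by component
    return [t2 if (i == k and n == m) else a_ij[n - 1]
            for n in range(1, len(a_ij) + 1)]


a_ij = [1.0, 2.0, 3.0]
shared = value_cond_form(a_ij, i=4, k=4, m=2, t2=9.0)      # i = k: T overlays X[m]
unshared = value_cond_form(a_ij, i=4, k=5, m=2, t2=9.0)    # i != k: no overlay
```

With these inputs both forms agree: the m-th component is replaced by t2 only when i = k, and the value is a[i, j] unchanged otherwise.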
The value finding algorithm for the symbolic evaluator is
complicated because of the flexible sharing rules in the EL1
language. However, it has the property that little extra expense is incurred in analyzing programs that make little use of
sharing. We prefer to implement an intricate though infrequently used lookup procedure in our analyzer rather than
restrict the use of sharing in the language and reduce its expressive power.
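The path-merging rule stated at the start of this appendix (conjoin the predicates along each path, excluding those common to all paths) can be sketched as follows. This is a simplified Python illustration, not the interpreter's algorithm: predicates and values are plain strings, no simplification is attempted, and the name `merge_path_values` is hypothetical.

```python
def merge_path_values(paths):
    """Combine per-path values of a variable at a program point.

    paths: non-empty list of (predicates, value) pairs, where predicates
    is the set of branch conditions that comprise the path.
    """
    values = [value for _, value in paths]
    if all(v == values[0] for v in values):
        return values[0]                      # same value on every path

    # Predicates shared by every path do not distinguish one path from another.
    common = set.intersection(*(set(preds) for preds, _ in paths))

    arms = []
    for preds, value in paths:
        enabling = sorted(set(preds) - common)
        arms.append((" and ".join(enabling) or "TRUE", value))
    return ("case", arms)


merged = merge_path_values([
    ({"n > 0", "q"}, "a[i, j]"),
    ({"n > 0", "not q"}, "x0"),
])
same = merge_path_values([({"q"}, "x0"), ({"not q"}, "x0")])
```

Here the predicate `n > 0` is dropped because it holds on both paths, leaving a case expression that distinguishes the paths by `q` alone.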
APPENDIX B
AN EXAMPLE OF SYMBOLIC EVALUATION
We now present an example of a program involving a loop. The symbolic evaluator is intended to support a variety of tools, including a query facility to allow users to inspect the program database. Until that query facility is implemented, however, it is impractical to present transcripts of actual use of the evaluator that would be intelligible to the general reader. Nevertheless, we will describe in some detail the actions performed and the results developed by the analyzer.
Consider an EL1 function that raises a REAL value B to an integer power P, when P is nonnegative:¹⁴
    EXP <-
      EXPR(B:REAL, P:INT BYVAL; REAL)
        BEGIN
          DECL E:REAL BYVAL 1;
          REPEAT
            P LE 0 => 'Exit loop';
            E <- E * B;
            P <- P - 1;
          END;
          E
        END
¹⁴ The keyword EXPR denotes an explicit procedure heading in EL1. In this case, formal parameters B, with type REAL, and P, with type INT, are declared, the latter bound by value. The result type of the procedure, in this case REAL, follows the semicolon in the procedure heading.
The symbolic interpreter establishes ENV locations for B, P, and E. B and P are given unknown-value tokens as initial values. Call these b and p, respectively. E's location is initialized to the REAL constant 1.0, type conversion having been implicitly performed in the shadow. At the beginning of the loop, a PIV value token is added to the value list of each location that is referenced inside the loop. Thus the current values of B, P, and E become, say, b_k, p_k, and e_k, respectively. (As in Section IV, the subscript k is used to suggest the value of the corresponding location at the beginning of a general cycle, i.e., the k-th cycle.) The symbolic interpreter then processes the loop body. At the end of the loop body, the values for B, P, and E, expressed as functions of the loop entry values, are
    b_{k+1} = b_k
    p_{k+1} = p_k - 1
    e_{k+1} = e_k * b_k.
A loop termination condition is also derived:
    p_k <= 0.
The number of cycles taken is thus
    kL = least(k, 1, infinity, p_k <= 0).
The recurrence relations above, together with the constraint 1 <= k <= kL, are passed to the recurrence relation solver. Since B is unchanged by the loop, the solution for B as b is obtained trivially. The solution for P, represented as p(k), is determined to be p - k + 1. e(k) may also be solved as
    case(k = 1 => 1,
         k > 1 => finite\product(j, 1, k - 1, b)),
which becomes
    case(k = 1 => 1,
         k > 1 => expo(b, k - 1)),
and finally
    expo(b, k - 1).
The number of cycles taken, kL, can now be expressed as
    least(k, 1, infinity, p - k + 1 <= 0),
which simplifies to
    max(1, p + 1).
Thus the values of P and E on loop exit are
    p_{kL} = p - kL + 1,
which is normalized as
    case(p < 0 => p,
         p >= 0 => 0),
and
    e_{kL} = expo(b, max(0, p)),
which becomes
    case(p < 0 => 1,
         p >= 0 => expo(b, p)).
e_{kL} in turn becomes the result of the whole procedure, whose
output is thereby expressed as a function of its inputs.
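The closed forms derived in this appendix can be compared against a direct execution of the loop. The Python sketch below is an informal check rather than part of the evaluator: `exp_loop` transcribes the EXP loop, counting the cycle on which the exit test succeeds as a cycle, and `closed_form` restates the solved expressions for the exit values of E and P and for kL. Both function names are hypothetical.

```python
def exp_loop(b, p):
    """Transcription of the EXP loop; returns (e, p_exit, cycles)."""
    e = 1.0
    cycles = 0
    while True:
        cycles += 1            # the k-th cycle begins
        if p <= 0:             # exit test: P LE 0
            break
        e = e * b              # E <- E * B
        p = p - 1              # P <- P - 1
    return e, p, cycles


def closed_form(b, p):
    """Exit values predicted by the solved recurrence relations."""
    kL = max(1, p + 1)                     # number of cycles taken
    return b ** max(0, p), p - kL + 1, kL  # expo(b, max(0, p)), p(kL), kL


# The powers of 2.0 used here are exactly representable, so the
# comparison can be exact.
agree = all(exp_loop(2.0, p) == closed_form(2.0, p) for p in range(-3, 6))
```

For example, with b = 2.0 and p = 3 the loop runs four cycles and exits with E = 8.0 and P = 0, in agreement with max(1, p + 1) and expo(b, max(0, p)).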
REFERENCES
[1] R. S. Boyer, B. Elspas, and K. N. Levitt, "SELECT-A formal system for testing and debugging programs by symbolic execution," in Proc. Int. Conf. on Reliable Software, Los Angeles, CA, Apr. 21-23, 1975, pp. 234-245.
[2] T. E. Cheatham, Jr., and J. A. Townley, "Symbolic evaluation of programs-A look at loop analysis," in Proc. ACM Symp. Symbolic and Algebraic Computation, 1976, pp. 90-96.
[3] T. E. Cheatham, Jr., and D. B. Washington, "Program loop analysis by solving first order recurrence relations," Cent. Res. Comput. Tech., Harvard Univ., Cambridge, MA, Tech. Rep. 13-78, May 1978.
[4] T. E. Cheatham, Jr., "Semantic models for programming languages," Cent. Res. Comput. Tech., Harvard Univ., Cambridge, MA, Tech. Memo, July 1978.
[5] L. A. Clarke, "A system to generate test data and symbolically execute programs," IEEE Trans. Software Eng., vol. SE-2, pp. 215-222, Sept. 1976.
[6] W. R. Conrad, "Rewrite user's guide," Cent. Res. Comput. Tech., Harvard Univ., Cambridge, MA, Tech. Memo, Aug. 1976.
[7] -, "COST user's guide," Cent. Res. Comput. Tech., Harvard Univ., Cambridge, MA, Tech. Memo, Nov. 1976.
[8] -, "PROBE user's guide," Cent. Res. Comput. Tech., Harvard Univ., Cambridge, MA, Tech. Memo, June 1976.
[9] L. P. Deutsch, "An interactive program verifier," Ph.D. dissertation, Univ. California, Berkeley, May 1973.
[10] B. Elspas, "The semiautomatic generation of inductive assertions for proving program correctness," Stanford Res. Inst., Menlo Park, CA, Res. Rep., July 1974.
[11] L. D. Fosdick and L. J. Osterweil, "Data flow analysis in software reliability," Comput. Surveys, vol. 8, pp. 305-330, Sept. 1976.
[12] S. M. German, "Automating proofs of the absence of common runtime errors," in Conf. Rec. 5th Annu. ACM Symp. Principles of Programming Languages, Tucson, AZ, Jan. 23-25, 1978, pp. 105-118.
[13] S. L. Hantler and J. C. King, "An introduction to proving the correctness of programs," Comput. Surveys, vol. 8, pp. 331-353, Sept. 1976.
[14] G. H. Holloway, "User's guide to the expression analyzer and query facility," Cent. Res. Comput. Tech., Harvard Univ., Cambridge, MA, Tech. Memo, May 1976.
[15] W. E. Howden, "Symbolic testing and the DISSECT symbolic evaluation system," IEEE Trans. Software Eng., vol. SE-3, pp. 266-278, July 1977.
[16] S. Igarashi, R. L. London, and D. C. Luckham, "Automatic program verification 1: A logical basis and its implementation," Acta Informatica, vol. 4, pp. 145-182, 1975.
[17] M. Karr, "Summation in finite terms," Massachusetts Computer Associates, Wakefield, MA, Tech. Rep., Feb. 1976.
[18] S. Katz and Z. Manna, "Logical analysis of programs," Commun. Assoc. Comput. Mach., vol. 19, pp. 188-206, Apr. 1976.
[19] J. C. King, "A program verifier," Ph.D. dissertation, Dep. Comput. Sci., Carnegie-Mellon Univ., Pittsburgh, PA, June 1969.
[20] -, "Symbolic execution and program testing," Commun. Assoc. Comput. Mach., vol. 19, pp. 385-394, July 1976.
[21] B. W. Lampson, J. J. Horning, R. L. London, J. G. Mitchell, and G. J. Popek, "Report on the programming language Euclid," Sigplan Notices, vol. 12, Feb. 1977.
[22] D. Loveman, "Program improvement by source-to-source transformation," J. Assoc. Comput. Mach., vol. 24, pp. 121-145, Jan. 1977.
[23] "ECL programmer's manual," Cent. Res. Comput. Tech., Harvard Univ., Cambridge, MA, Tech. Rep. 23-74, Dec. 1974.
[24] C. G. Nelson and D. C. Oppen, "A simplifier based on efficient decision algorithms," in Conf. Rec. 5th Annu. ACM Symp. Principles of Programming Languages, Tucson, AZ, Jan. 23-25, 1978, pp. 141-150.
[25] J. A. Robinson, "A machine oriented logic based on the resolution principle," J. Assoc. Comput. Mach., vol. 12, Jan. 1965, pp. 23-41.
[26] J. F. Rulifson, J. A. Derkson, and R. J. Waldinger, "QA4: A procedure calculus for intuitive reasoning," Artificial Intelligence Cent., Stanford Res. Inst., Menlo Park, CA, Tech. Note 73, Nov. 1972.
[27] G. R. Ruth, "Analysis of algorithm implementations," MIT Project MAC, Cambridge, MA, Tech. Rep. TR-130, May 1974.
[28] R. L. Sites, "Proving that computer programs terminate cleanly," Ph.D. dissertation, Dep. Comput. Sci., Stanford Univ., Stanford, CA, May 1974.
[29] T. Standish, D. Harriman, D. Kibler, and J. Neighbors, "The Irvine program transformation catalogue," Dep. Inform. and Comput. Sci., Univ. California, Irvine, Jan. 1976.
[30] N. Suzuki and K. Ishihata, "Implementation of an array bound checker," in Conf. Rec. 4th Annu. ACM Symp. Principles of Programming Languages, Los Angeles, CA, Jan. 17-19, 1977, pp. 132-143.
[31] J. A. Townley, "A symbolic interpreter for EL1," Cent. Res. Comput. Tech., Harvard Univ., Cambridge, MA, Tech. Memo, Nov. 1976.
[32] -, "The analysis of pointers in programs," Cent. Res. Comput. Tech., Harvard Univ., Cambridge, MA, Tech. Memo, Oct. 1978.
[33] -, "An incremental approach to resolution-based theorem proving," Cent. Res. Comput. Tech., Harvard Univ., Cambridge, MA, Tech. Rep. 15-78, Aug. 1978.
[34] -, "Program analysis techniques for software reliability," presented at the ACM Workshop on Reliable Software, Bonn Univ., Germany, Sept. 1978.
Thomas E. Cheatham, Jr. received the B.S. and
M.S. degrees in mathematics from Purdue University, Lafayette, IN, in 1951 and 1953,
respectively.
Since 1969 he has been Gordon McKay Professor of Computer Science and Director of the
Center for Research in Computing Technology
at Harvard University, Cambridge, MA. His current research interests include symbolic evaluation of programs, mechanical theorem proving
for program verification, and the construction
of systems for program development and maintenance.
Prof. Cheatham is a Fellow of the American Academy of Arts and
Sciences and a member of Sigma Xi and the Association for Computing
Machinery.
Glenn H. Holloway received the B.S. degree in
physics from Yale University, New Haven, CT,
in 1966 and the A.M. degree in physics from
Harvard University, Cambridge, MA, in 1968.
From 1974 to 1978, he was a member of the
research staff at Harvard's Center for Research
in Computing Technology. He is now a Ph.D.
candidate at Harvard. His research interests include program transformation, symbolic evaluation of programs, and code optimization.
Judy A. Townley received the B.A. degree in
mathematics from the University of Texas,
Austin, in 1967 and the Ph.D. degree from
Harvard University, Cambridge, MA, in 1973.
Since 1973 she has been a Research Fellow
at Harvard's Center for Research in Computing Technology. She has also held posts as
Lecturer on Applied Mathematics and as
Director of the Master of Information Sciences Program at Harvard. Her current research
activities are in the areas of language design,
program verification, symbolic evaluation, high-level optimization, and
systems for program development and maintenance.
Dr. Townley is a member of the Association for Computing Machinery.