Underapproximation of Procedure Summaries for Integer Programs

Pierre Ganty^1, Radu Iosif^2, and Filip Konecny^2
^1 IMDEA Software Institute, Madrid, Spain
^2 VERIMAG/CNRS, Grenoble, France
Abstract. We show how to underapproximate the procedure summaries of recursive programs over the integers using off-the-shelf analyzers for non-recursive
programs. The novelty of our approach is that the non-recursive program we compute may capture unboundedly many behaviors of the original recursive program
for which stack usage cannot be bounded. Moreover, we identify a class of recursive programs on which our method terminates and returns the precise summary
relations without underapproximation. Doing so, we generalize a similar result
for non-recursive programs to the recursive case. Finally, we present experimental results of an implementation of our method applied to a number of examples.
1 Introduction
Procedure summaries are relations between the input and return values of a procedure,
resulting from its terminating executions. Computing summaries is important, as they
are a key enabler for the development of modular verification techniques for interprocedural programs, such as checking safety, termination or equivalence properties.
Summary computation is, however, challenging in the presence of recursive procedures
with integer parameters, return values, and local variables. While many analysis tools
exist for non-recursive programs, only a few address the problem of recursion.
In this paper, we propose a novel technique to generate arbitrarily precise underapproximations of summary relations. Our technique is based on the following idea. The
control flow of procedural programs is captured precisely by the language of a context-free grammar. A $k$-index underapproximation of this language (where $k \geq 1$) is obtained
by filtering out those derivations of the grammar that exceed a budget, called the index, on
the number of nonterminals (at most $k$) occurring at each derivation
step. As expected, the higher the index, the more complete the coverage of the underapproximation. From there we define the k-index summary relations of a program by
considering the k-index underapproximation of its control flow.
Our method then reduces the computation of k-index summary relations for a recursive program to the computation of summary relations for a non-recursive program,
which is, in general, easier to compute because of the absence of recursion. The reduction was inspired by a decidability proof [4] in the context of Petri nets.
The contributions of this paper are threefold. First, we show that, for a given index,
recursive programs can be analyzed using off-the-shelf analyzers designed for non-recursive programs. Second, we identify a class of recursive programs, with possibly
unbounded stack usage, on which our technique is complete, i.e., it terminates and returns the precise result. Third, we present experimental results of an implementation of
our method applied to a number of examples.
Related Work The problem of analyzing recursive programs handling integers (in general, unbounded data domains) has gained significant interest with the seminal work
of Sharir and Pnueli [24]. They proposed two approaches for interprocedural dataflow
analysis. The first one keeps precise values (call strings) up to a limited depth of the recursion stack. In contrast to the methods based on the call strings approach, our method
can also analyse precisely certain programs for which the stack is unbounded.
The second approach of Sharir and Pnueli is based on computing the least fixed
point of a system of recursive dataflow equations (the functional approach). This approach to interprocedural analysis is based on computing an increasing Kleene sequence
of summaries for control paths in the program of increasing, but bounded length. Recently [11], the Newton sequence was shown to converge at least as fast as the Kleene
sequence. The intuition behind the Newton sequence is to consider control paths in the
program of increasing index, and unbounded length. Our contribution can be seen as
a technique to compute the iterates of the Newton sequence for programs with integer
parameters, return values, and local variables.
The complexity of the functional approach was shown to be polynomial in the size
of the (finite) abstract domain, in the work of Reps, Horwitz and Sagiv [23]. This result is achieved by computing summary information, in order to reuse previously computed information during the analysis. Following up on this line of work, most existing
abstract analyzers, such as INTERPROC [19], also use relational domains to compute
overapproximations of function summaries – typically widening operators are used to
ensure termination of fixed point computations. The main difference of our method with
respect to static analyses is the use of underapproximation instead of overapproximation. If the final purpose of the analysis is program verification, our method will not
return false positives. Moreover, the coverage can be increased by increasing the bound
on the derivation index.
Several other teams have applied model checking based on abstraction refinement
to recursive programs. One such method, known as nested interpolants, represents programs as nested word automata [3], which have the same expressive power as the visibly pushdown grammars used in our paper. Also based on interpolation is the WHALE
algorithm [2], which combines partial exploration of the execution paths (underapproximation) with the overapproximation provided by a predicate-based abstract post operator, in order to compute summaries that are sufficient to prove a given safety property.
Another technique, similar to WHALE, although not handling recursion, is the SMASH
algorithm [15], which combines may- and must-summaries for compositional verification of safety properties. These approaches are, however, different in spirit from ours,
as their goal is proving given safety properties of programs, as opposed to computing
the summaries of procedures independently of their calling context, which is our case.
We argue that summary computation can be applied beyond safety checking, e.g., to
prove termination [5], or program equivalence.
2 Preliminaries
Grammars A context-free grammar (or simply grammar) is a tuple $G = (\mathcal{X}, \Sigma, \delta)$ where
$\mathcal{X}$ is a finite nonempty set of nonterminals, $\Sigma$ is a finite nonempty alphabet and $\delta \subseteq \mathcal{X} \times (\Sigma \cup \mathcal{X})^*$ is a finite set of productions. The production $(X, w)$ may also be written
$X \to w$. We also define $\mathit{head}(X \to w) = X$ and $\mathit{tail}(X \to w) = w$. Given two strings $u, v \in (\Sigma \cup \mathcal{X})^*$, we define a step $u \Rightarrow v$ if there exist a production $(X, w) \in \delta$ and some words
$y, z \in (\Sigma \cup \mathcal{X})^*$ such that $u = yXz$ and $v = ywz$. We use $\Rightarrow^*$ to denote the reflexive
transitive closure of $\Rightarrow$. The language of $G$ produced by a nonterminal $X \in \mathcal{X}$ is the set
$L_X(G) = \{ w \in \Sigma^* \mid X \Rightarrow^* w \}$, and we call any sequence of steps from a nonterminal
$X$ to $w \in \Sigma^*$ a derivation from $X$. Given $X \Rightarrow^* w$, we call the sequence $\gamma \in \delta^*$ of
productions used in the derivation a control word and write $X \stackrel{\gamma}{\Longrightarrow} w$ to denote that the
derivation conforms to $\gamma$.
Visibly Pushdown Grammars To model the control flow of procedural programs we
use languages generated by visibly pushdown grammars, a subset of context-free grammars. In this setting, words are defined over a tagged alphabet $\widehat{\Sigma} = \Sigma \cup \langle\Sigma \cup \Sigma\rangle$, where
$\langle\Sigma = \{ \langle a \mid a \in \Sigma \}$ intuitively represents procedure call sites and $\Sigma\rangle = \{ a\rangle \mid a \in \Sigma \}$ represents procedure return sites. Formally, a visibly pushdown grammar $G = (\mathcal{X}, \widehat{\Sigma}, \delta)$ is a
grammar that has only productions of the following forms, for some $a, b \in \Sigma$:
$X \to a$
$X \to a\, Y$
$X \to \langle a\, Y\, b\rangle\, Z$
It is worth pointing out that, for our purposes, we do not need a visibly pushdown grammar
to generate the empty string $\varepsilon$. Each tagged word generated by a visibly pushdown grammar is associated with a nested word [3], the definition of which we briefly recall. Given a
finite alphabet $\Sigma$, a nested word over $\Sigma$ is a pair $(w, \leadsto)$, where $w = a_1 a_2 \ldots a_n \in \Sigma^*$, and
$\leadsto\ \subseteq \{1, 2, \ldots, n\} \times \{1, 2, \ldots, n\}$ is a set of nesting edges (or simply edges) where:
1. $i \leadsto j$ only if $i < j$, i.e. edges only go forward;
2. $|\{ j \mid i \leadsto j \}| \leq 1$ and $|\{ i \mid i \leadsto j \}| \leq 1$, i.e. no two edges share a position;
3. if $i \leadsto j$ and $k \leadsto \ell$ then it is not the case that $i < k \leq j < \ell$, i.e. edges do not cross.
Intuitively, we associate a nested word to a tagged word as follows: there is an edge between tagged symbols $\langle a$ and $b\rangle$ iff both are generated at the same derivation step. For
instance, looking forward to Ex. 2, consider the tagged word $w = \tau_1 \tau_2 \langle\tau_3 \tau_1 \tau_5 \tau_6 \tau_7 \tau_3\rangle \tau_4$
resulting from a derivation $Q_1^{init} \Rightarrow^* w$. The nested word associated with $w$ is
$(\tau_1 \tau_2 \tau_3 \tau_1 \tau_5 \tau_6 \tau_7 \tau_3 \tau_4, \{3 \leadsto 8\})$. Finally, let $\mathit{w\_nw}$ denote the mapping which, given a
tagged word in the language of a visibly pushdown grammar, returns the nested word
thereof.
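The mapping from tagged words to nested words can be illustrated with a small sketch. It assumes a hypothetical encoding of tagged symbols as `(kind, symbol)` pairs and matches calls with returns using a stack; for words generated by a visibly pushdown grammar this should coincide with pairing the symbols produced at the same derivation step, but that equivalence is an assumption of the sketch rather than a statement from the text.

```python
def to_nested_word(tagged):
    """Map a tagged word to (plain word, nesting edges), positions 1-based.

    `tagged` is a list of (kind, symbol) pairs with kind in
    {"int", "call", "ret"}; this encoding is an assumption of the sketch.
    """
    word, edges, pending = [], set(), []
    for pos, (kind, symbol) in enumerate(tagged, start=1):
        word.append(symbol)
        if kind == "call":
            pending.append(pos)          # remember the open call position
        elif kind == "ret":
            if not pending:
                raise ValueError("unmatched return at position %d" % pos)
            edges.add((pending.pop(), pos))
    if pending:
        raise ValueError("unmatched call at position %d" % pending[-1])
    return word, edges


def is_nesting(edges, n):
    """Check conditions 1-3 on a set of edges over positions 1..n."""
    in_range = all(1 <= i <= n and 1 <= j <= n for i, j in edges)
    forward = all(i < j for i, j in edges)                      # condition 1
    sources = [i for i, _ in edges]
    targets = [j for _, j in edges]
    unshared = (len(set(sources)) == len(sources)
                and len(set(targets)) == len(targets))          # condition 2
    crossing = any(i < k <= j < l
                   for i, j in edges for k, l in edges)         # condition 3
    return in_range and forward and unshared and not crossing


# The tagged word w used in the running example.
W = [("int", "t1"), ("int", "t2"), ("call", "t3"), ("int", "t1"), ("int", "t5"),
     ("int", "t6"), ("int", "t7"), ("ret", "t3"), ("int", "t4")]
WORD, EDGES = to_nested_word(W)
```

On the example word, the only edge links position 3 (the call) to position 8 (the matching return).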
Integer Relations We denote by $\mathbb{Z}$ the set of integers. Let $\mathbf{x} = \{x_1, x_2, \ldots, x_d\}$ be a set of
variables for some $d > 0$. Define $\mathbf{x}'$, the primed variables of $\mathbf{x}$, to be $\{x_1', x_2', \ldots, x_d'\}$. All
variables range over $\mathbb{Z}$. We denote by $\vec{y}$ an ordered sequence $\langle y_1, \ldots, y_k \rangle$ of variables,
and by $|\vec{y}|$ its length $k$. By writing $\vec{y} \subseteq \mathbf{x}$ we mean that each variable in $\vec{y}$ belongs to
$\mathbf{x}$. For sequences $\vec{y}$ and $\vec{z}$ of length $k$, let $\vec{y} = \vec{z}$ stand for the equality $\bigwedge_{i=1}^{k} y_i = z_i$.
A linear term $t$ is a linear combination of the form $a_0 + \sum_{i=1}^{d} a_i x_i$, where $a_0, a_1, \ldots, a_d \in \mathbb{Z}$.
An atomic proposition is a predicate of the form $t \leq 0$, where $t$ is a linear term. We
consider formulae in the first-order logic over atomic propositions $t \leq 0$, also known as
Presburger arithmetic.
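A valuation of $\mathbf{x}$ is a function $\nu : \mathbf{x} \to \mathbb{Z}$. The set of all valuations of $\mathbf{x}$ is denoted
by $\mathbb{Z}^{\mathbf{x}}$. If $\vec{y} = \langle y_1, \ldots, y_k \rangle$ is an ordered sequence of variables, we denote by $\nu(\vec{y})$
the sequence of integers $\langle \nu(y_1), \ldots, \nu(y_k) \rangle$. An arithmetic formula $R(\mathbf{x}, \mathbf{y}')$ defining a
relation $R \subseteq \mathbb{Z}^{\mathbf{x}} \times \mathbb{Z}^{\mathbf{y}}$ is evaluated with
respect to two valuations $\nu_1 \in \mathbb{Z}^{\mathbf{x}}$ and $\nu_2 \in \mathbb{Z}^{\mathbf{y}}$, by replacing each $x \in \mathbf{x}$ by $\nu_1(x)$ and each
$y' \in \mathbf{y}'$ by $\nu_2(y)$ in $R$. The composition of two relations $R_1 \subseteq \mathbb{Z}^{\mathbf{x}} \times \mathbb{Z}^{\mathbf{y}}$ and $R_2 \subseteq \mathbb{Z}^{\mathbf{y}} \times \mathbb{Z}^{\mathbf{z}}$
is denoted by $R_1 \circ R_2 = \{ (u, v) \in \mathbb{Z}^{\mathbf{x}} \times \mathbb{Z}^{\mathbf{z}} \mid \exists t \in \mathbb{Z}^{\mathbf{y}} .\ (u, t) \in R_1 \text{ and } (t, v) \in R_2 \}$. For a
subset $\mathbf{y} \subseteq \mathbf{x}$, we denote by $\nu{\downarrow}_{\mathbf{y}} \in \mathbb{Z}^{\mathbf{y}}$ the projection of $\nu$ onto the variables $\mathbf{y}$.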
3 Integer Recursive Programs
We consider in the following that programs are collections of procedures that call each
other, possibly according to recursive schemes. Formally, an integer program is an indexed tuple $\mathcal{P} = \langle P_1, \ldots, P_n \rangle$, where $P_1, \ldots, P_n$ are procedures. Each procedure is a tuple^3
$P_i = \langle \mathbf{x}_i, \vec{x}_i^{\,in}, \vec{x}_i^{\,out}, S_i, q_i^{init}, F_i, \Delta_i \rangle$, where $\mathbf{x}_i$ are the local variables of $P_i$ ($\mathbf{x}_i \cap \mathbf{x}_j = \emptyset$
for all $i \neq j$), $\vec{x}_i^{\,in}, \vec{x}_i^{\,out} \subseteq \mathbf{x}_i$ are the ordered tuples of input and output variables, $S_i$ are
the control states of $P_i$ ($S_i \cap S_j = \emptyset$, for all $i \neq j$), $q_i^{init} \in S_i$ is the initial state, $F_i \subseteq S_i$ are
the final states of $P_i$, and $\Delta_i$ is a set of transitions of one of the following forms:
– $q \xrightarrow{R(\mathbf{x}_i, \mathbf{x}_i')} q'$ is an internal transition, where $q, q' \in S_i$, and $R(\mathbf{x}_i, \mathbf{x}_i')$ is a Presburger
arithmetic relation involving only the local variables of $P_i$;
– $q \xrightarrow{\vec{z}\,' = P_j(\vec{u})} q'$ is a call, where $q, q' \in S_i$, $P_j$ is the callee, $\vec{u}$ are linear terms over
$\mathbf{x}_i$, and $\vec{z} \subseteq \mathbf{x}_i$ are variables, such that $|\vec{u}| = |\vec{x}_j^{\,in}|$ and $|\vec{z}| = |\vec{x}_j^{\,out}|$.
The call graph of a program $\mathcal{P} = \langle P_1, \ldots, P_n \rangle$ is a directed graph with vertices $P_1, \ldots, P_n$
and edges $(P_i, P_j)$ if and only if $P_i$ has a call to $P_j$. A program is said to be recursive if
its call graph has at least one cycle, and non-recursive if its call graph is a dag. Finally,
let $\mathit{nF}(P_i)$ denote the set $S_i \setminus F_i$ of non-final states of $P_i$. Also, $\mathit{nF}(\mathcal{P}) = \bigcup_{i=1}^{n} (S_i \setminus F_i)$.
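The recursive/non-recursive distinction is a cycle check on the call graph. The following minimal sketch assumes the call graph is given as a dictionary from procedure names to sets of callees (a representation chosen here for illustration) and uses a standard depth-first search with three colours.

```python
def is_recursive(calls):
    """Return True iff the call graph `calls` (proc -> set of callees) has a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited / on the DFS stack / done
    colour = {p: WHITE for p in calls}

    def visit(p):
        colour[p] = GREY
        for q in calls.get(p, ()):
            state = colour.get(q, WHITE)
            if state == GREY:             # back edge: q is on the current path
                return True
            if state == WHITE and visit(q):
                return True
        colour[p] = BLACK
        return False

    return any(colour[p] == WHITE and visit(p) for p in list(calls))


SELF_CALL = {"P": {"P"}}                        # a procedure calling itself
MUTUAL = {"A": {"B"}, "B": {"A"}}               # mutual recursion
DAG = {"main": {"f", "g"}, "f": {"g"}, "g": set()}
```

A program whose call graph passes this check as acyclic can be summarized bottom-up, callees first.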
Simplified syntax To ease the description of programs defined in this paper, we use a
simplified, human-readable, imperative language such that each procedure of the program conforms to the following grammar:^4
P ::= proc $P_i$(id*) begin var id* S end
S ::= S; S | assume f | id^n ← t^n | id ← $P_i$(t*) | $P_i$(t*) | return (id + ε) | goto ℓ+ | havoc id+
The local variables occurring in P are denoted by id, linear terms by t, Presburger
formulae by f, and control labels by ℓ. Each procedure consists of local declarations
followed by a sequence of statements. Statements may carry a label. Program statements
can be either assume statements^5, (parallel) assignments, procedure calls (possibly with
a return value), return to the caller (possibly with a value), non-deterministic jumps and
^3 Observe that there are no global variables in the definition of integer program. Those can be
encoded as input and output variables to each procedure.
^4 Our simplified syntax does not seek to capture the generality of integer programs. Instead, our
goal is to give a convenient notation for the programs given in this paper and only those.
^5 assume f is executable if and only if the current values of the variables satisfy f.
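Program in the simplified syntax:
    proc P(x)
    begin
      var z;
      assume x ≥ 0;
      goto then or else;
    then:
      assume x > 0;
      z ← P(x − 1);
      z ← z + 2;
      return z;
    else:
      assume x ≤ 0;
      z ← 0;
      havoc x;
      return z;
    end
Transitions of the corresponding integer program:
    t1: $q_1^{init} \to q_2$ with $x \geq 0 \wedge x' = x \wedge z' = z$
    t2: $q_2 \to q_3$ with $x > 0 \wedge x' = x \wedge z' = z$
    t3: $q_3 \to q_4$ with $z' = P(x - 1) \wedge x' = x$
    t4: $q_4 \to q_f^1$ with $z' = z + 2 \wedge x' = x$
    t5: $q_2 \to q_6$ with $x \leq 0 \wedge x' = x \wedge z' = z$
    t6: $q_6 \to q_7$ with $z' = 0 \wedge x' = x$
    t7: $q_7 \to q_f^2$ with $z' = z$
Fig. 1. Example of a simplified imperative program and its corresponding integer program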
havoc statements^6. We consider the usual syntactic requirements (used variables must
be declared, jumps are well defined, no jumps outside procedures, etc.). We do not
define them formally; it suffices to know that all simplified programs in this paper comply with
the requirements. A program using the simplified syntax can be easily translated into
the formal syntax, as shown in Fig. 1.
Example 1. Figure 1 shows a program in our simplified imperative language and its corresponding integer program $\mathcal{P}$. Formally, $\mathcal{P} = \langle P \rangle$ where $P$ is defined as:
$\langle \{x, z\}, \langle x \rangle, \langle z \rangle, \{q_1^{init}, q_2, q_3, q_4, q_6, q_7, q_f^1, q_f^2\}, q_1^{init}, \{q_f^1, q_f^2\}, \{t_1, t_2, t_3, t_4, t_5, t_6, t_7\} \rangle$.
Since $P$ calls itself ($t_3$), this program is recursive.
Semantics We are interested in computing the summary relation between the values of
the input and output variables of a procedure. To this end, we give the semantics of a
program $\mathcal{P} = \langle P_1, \ldots, P_n \rangle$ as a tuple of relations $R_q$ describing, for each non-final control
state $q \in \mathit{nF}(P_i)$, the effect of the program when started in $q$ upon reaching a state in
$F_i$. An interprocedurally valid path is represented by a tagged word over an alphabet $\widehat{\Theta}$,
which maps each internal transition $t$ to a symbol $\tau$, and each call transition $t$ to a pair
of symbols $\langle\tau, \tau\rangle \in \widehat{\Theta}$. In the sequel, we denote by $Q$ the variable corresponding to the
control state $q$, and by $\tau \in \Theta$ the alphabet symbol corresponding to the transition $t$ of
$\mathcal{P}$. Formally, we associate to $\mathcal{P}$ a visibly pushdown grammar, denoted in the rest of the
paper by $G_{\mathcal{P}} = (\mathcal{X}, \widehat{\Theta}, \delta)$, such that:
– $Q \in \mathcal{X}$ if and only if $q \in \mathit{nF}(\mathcal{P})$;
– $Q \to \tau\, Q' \in \delta$ if and only if $t : q \xrightarrow{R} q'$ and $q' \in \mathit{nF}(\mathcal{P})$;
– $Q \to \tau \in \delta$ if and only if $t : q \xrightarrow{R} q'$ and $q' \notin \mathit{nF}(\mathcal{P})$;
– $Q \to \langle\tau\, Q_j^{init}\, \tau\rangle\, Q' \in \delta$ if and only if $t : q \xrightarrow{\vec{z}\,' = P_j(\vec{u})} q'$.
^6 havoc $x_1, x_2, \ldots, x_n$ assigns non-deterministically chosen integers to $x_1, x_2, \ldots, x_n$.
It is easily seen that interprocedurally valid paths in $\mathcal{P}$ and tagged words in $G_{\mathcal{P}}$ are
in one-to-one correspondence. In fact, each interprocedurally valid path of $\mathcal{P}$ between
state $q \in \mathit{nF}(P_i)$ and a state of $F_i$, where $1 \leq i \leq n$, corresponds exactly to one tagged
word of $L_Q(G_{\mathcal{P}})$.
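The construction of the grammar from the transitions can be sketched as follows. The encoding is hypothetical: a transition is a tuple `(name, source, target, callee)` with `callee` set to `None` for internal transitions, tagged symbols are written `"<t"` and `"t>"`, and the sketch assumes (as in the running example) that the target of a call transition is a non-final state.

```python
def nonterminal(state):
    """Name of the grammar variable for a control state, e.g. q2 -> Q2."""
    return "Q" + state[1:]


def grammar_of(transitions, finals, init_of):
    """Build the productions (head, tail) of the grammar from the transitions.

    transitions: list of (name, source, target, callee or None)
    finals:      set of final control states
    init_of:     maps a procedure name to its initial control state
    """
    productions = []
    for name, src, dst, callee in transitions:
        head = nonterminal(src)
        if callee is not None:                      # Q -> <t Q_init t> Q'
            tail = ("<" + name, nonterminal(init_of[callee]),
                    name + ">", nonterminal(dst))
        elif dst in finals:                         # Q -> t
            tail = (name,)
        else:                                       # Q -> t Q'
            tail = (name, nonterminal(dst))
        productions.append((head, tail))
    return productions


# Transitions of the procedure P shown in Fig. 1.
TRANSITIONS = [
    ("t1", "q1init", "q2", None), ("t2", "q2", "q3", None),
    ("t3", "q3", "q4", "P"),      ("t4", "q4", "qf1", None),
    ("t5", "q2", "q6", None),     ("t6", "q6", "q7", None),
    ("t7", "q7", "qf2", None),
]
PRODUCTIONS = grammar_of(TRANSITIONS, {"qf1", "qf2"}, {"P": "q1init"})
```

Each transition yields exactly one production, so the grammar has as many productions as the program has transitions.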
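Example 2. (continued from Ex. 1) The visibly pushdown grammar $G_{\mathcal{P}}$ corresponding
to $\mathcal{P}$ consists of the following variables and labelled productions:
$p_1^b \stackrel{def}{=} Q_1^{init} \to \tau_1\, Q_2$
$p_2^b \stackrel{def}{=} Q_2 \to \tau_2\, Q_3$
$p_3^c \stackrel{def}{=} Q_3 \to \langle\tau_3\, Q_1^{init}\, \tau_3\rangle\, Q_4$
$p_4^a \stackrel{def}{=} Q_4 \to \tau_4$
$p_5^b \stackrel{def}{=} Q_2 \to \tau_5\, Q_6$
$p_6^b \stackrel{def}{=} Q_6 \to \tau_6\, Q_7$
$p_7^a \stackrel{def}{=} Q_7 \to \tau_7$
$L_{Q_1^{init}}(G_{\mathcal{P}})$ includes the word $w = \tau_1 \tau_2 \langle\tau_3 \tau_1 \tau_5 \tau_6 \tau_7 \tau_3\rangle \tau_4$, defining the nested word $\mathit{w\_nw}(w) =
(\tau_1 \tau_2 \tau_3 \tau_1 \tau_5 \tau_6 \tau_7 \tau_3 \tau_4, \{3 \leadsto 8\})$. The word $w$ corresponds to an interprocedurally valid
path where $P$ calls itself once. Let $\gamma_1 = p_1^b p_2^b p_3^c p_4^a p_1^b p_5^b p_6^b p_7^a$ and $\gamma_2 = p_1^b p_2^b p_3^c p_1^b p_5^b p_6^b p_7^a p_4^a$
be two control words; we have $Q_1^{init} \stackrel{\gamma_1}{\Longrightarrow} w$ and $Q_1^{init} \stackrel{\gamma_2}{\Longrightarrow} w$.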
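The semantics of a program is the union of the semantics of the nested words corresponding to its executions, each of the latter being a relation over input and output
variables. To define the semantics of a nested word, we first associate to each $\tau \in \widehat{\Theta}$ an
integer relation $\rho_\tau$, defined as follows:
– for an internal transition $t : q \xrightarrow{R} q' \in \Delta_i$, let $\rho_\tau \equiv R(\mathbf{x}_i, \mathbf{x}_i') \subseteq \mathbb{Z}^{\mathbf{x}_i} \times \mathbb{Z}^{\mathbf{x}_i}$;
– for a call transition $t : q \xrightarrow{\vec{z}\,' = P_j(\vec{u})} q' \in \Delta_i$, we define a call relation $\rho_{\langle\tau} \equiv (\vec{x}_j^{\,in\,\prime} = \vec{u}) \subseteq \mathbb{Z}^{\mathbf{x}_i} \times \mathbb{Z}^{\mathbf{x}_j}$,
a return relation $\rho_{\tau\rangle} \equiv (\vec{z}\,' = \vec{x}_j^{\,out}) \subseteq \mathbb{Z}^{\mathbf{x}_j} \times \mathbb{Z}^{\mathbf{x}_i}$ and a frame
relation $\phi_\tau \equiv \bigwedge_{x \in \mathbf{x}_i \setminus \vec{z}} x' = x \subseteq \mathbb{Z}^{\mathbf{x}_i} \times \mathbb{Z}^{\mathbf{x}_i}$.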
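We define the semantics of the program $\mathcal{P} = \langle P_1, \ldots, P_n \rangle$ in a top-down manner. Assuming a fixed ordering of the non-final states in the program, i.e. $\mathit{nF}(\mathcal{P}) = \langle q_1, \ldots, q_m \rangle$,
the semantics of the program $\mathcal{P}$, denoted $\llbracket \mathcal{P} \rrbracket$, is the tuple of relations $\langle \llbracket q_1 \rrbracket, \ldots, \llbracket q_m \rrbracket \rangle$.
For each non-final control state $q \in \mathit{nF}(P_i)$ where $1 \leq i \leq n$, we denote by $\llbracket q \rrbracket \subseteq \mathbb{Z}^{\mathbf{x}_i} \times \mathbb{Z}^{\mathbf{x}_i}$
the relation (over the local variables of procedure $P_i$) defined as $\llbracket q \rrbracket = \bigcup_{\alpha \in L_Q(G_{\mathcal{P}})} \llbracket \alpha \rrbracket$.
It remains to define $\llbracket \alpha \rrbracket$, the semantics of the tagged word $\alpha$. Because it is more convenient, we define the semantics of its corresponding nested word $\mathit{w\_nw}(\alpha) = (\tau_1 \ldots \tau_\ell, \leadsto)$
over alphabet $\Theta$. For a nesting relation $\leadsto\ \subseteq \{1, \ldots, \ell\} \times \{1, \ldots, \ell\}$, we denote by $\leadsto_{i,j}$
the relation $\{ (s - (i - 1), t - (i - 1)) \mid (s, t) \in\ \leadsto \cap\ \{i, \ldots, j\} \times \{i, \ldots, j\} \}$, for some $i, j \in
\{1, \ldots, \ell\}$, $i < j$. Finally, we define $\llbracket (\tau_1 \ldots \tau_\ell, \leadsto) \rrbracket \subseteq \mathbb{Z}^{\mathbf{x}_i} \times \mathbb{Z}^{\mathbf{x}_i}$ (recall that $\alpha \in L_Q(G_{\mathcal{P}})$
and $q$ is a state of $P_i$) as follows:
$$\llbracket (\tau_1 \ldots \tau_\ell, \leadsto) \rrbracket = \begin{cases}
\rho_{\tau_1} & \text{if } \ell = 1 \\
\rho_{\tau_1} \circ \llbracket (\tau_2 \ldots \tau_\ell, \leadsto_{2,\ell}) \rrbracket & \text{if } \ell > 1 \text{ and } 1 \not\leadsto j \text{ for all } 1 \leq j \leq \ell \\
R_\tau \circ \llbracket (\tau_{j+1} \ldots \tau_\ell, \leadsto_{j+1,\ell}) \rrbracket & \text{if } \ell > 1 \text{ and } 1 \leadsto j \text{ for some } 1 \leq j \leq \ell
\end{cases}$$
where, in the last case, which corresponds to a call transition $t : q \xrightarrow{\vec{z}\,' = P_d(\vec{u})} q' \in \Delta_i$, we
have $\tau_1 = \tau_j = \tau$ and $R_\tau = \left( \rho_{\langle\tau} \circ \llbracket (\tau_2 \ldots \tau_{j-1}, \leadsto_{2,j-1}) \rrbracket \circ \rho_{\tau\rangle} \right) \cap \phi_\tau$.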
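Example 3. (continued from Ex. 2) Given the nested word $\theta = (\tau_1 \tau_2 \tau_3 \tau_1 \tau_5 \tau_6 \tau_7 \tau_3 \tau_4, \{3 \leadsto 8\})$,
its semantics $\llbracket \theta \rrbracket$ is a relation between valuations of $\{x, z\}$, given by:
$$\rho_{\tau_1} \circ \rho_{\tau_2} \circ \left( \left( \rho_{\langle\tau_3} \circ \rho_{\tau_1} \circ \rho_{\tau_5} \circ \rho_{\tau_6} \circ \rho_{\tau_7} \circ \rho_{\tau_3\rangle} \right) \cap \phi_{\tau_3} \right) \circ \rho_{\tau_4}$$
One can verify that $\llbracket \theta \rrbracket \equiv x = 1 \wedge z' = 2$, i.e. the result of calling $P$ with an input valuation $x = 1$ is the output valuation $z = 2$.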
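This computation can be replayed mechanically on a finite window of integers. The sketch below is an illustration only: it restricts every variable to the window -3..3 (an assumption made so that relations become finite sets), represents a valuation of {x, z} as a pair `(x, z)`, and exploits the fact that caller and callee are the same procedure, so a single kind of valuation suffices.

```python
from functools import reduce
from itertools import product

D = range(-3, 4)                       # bounded window: an assumption of the sketch
STATES = list(product(D, D))           # valuations (x, z)


def rel(pred):
    """Finite relation containing the pairs of valuations that satisfy pred."""
    return {(s, t) for s in STATES for t in STATES if pred(s, t)}


def compose(r1, r2):
    """Relational composition r1 ; r2 (first r1, then r2)."""
    successors = {}
    for t, v in r2:
        successors.setdefault(t, set()).add(v)
    return {(u, v) for u, t in r1 for v in successors.get(t, ())}


def seq(*rels):
    """Composition of several relations, left to right."""
    return reduce(compose, rels)


rho = {
    "t1": rel(lambda s, t: s[0] >= 0 and t == s),
    "t2": rel(lambda s, t: s[0] > 0 and t == s),
    "t4": rel(lambda s, t: t == (s[0], s[1] + 2)),
    "t5": rel(lambda s, t: s[0] <= 0 and t == s),
    "t6": rel(lambda s, t: t == (s[0], 0)),
    "t7": rel(lambda s, t: t[1] == s[1]),          # havoc x: x' is unconstrained
}
call3 = rel(lambda s, t: t[0] == s[0] - 1)         # callee input x' = x - 1
ret3 = rel(lambda s, t: t[1] == s[1])              # caller z' = callee z
frame3 = rel(lambda s, t: t[0] == s[0])            # caller x is preserved

inner = seq(call3, rho["t1"], rho["t5"], rho["t6"], rho["t7"], ret3)
theta = seq(rho["t1"], rho["t2"], inner & frame3, rho["t4"])
```

Within the window, every pair in `theta` starts from x = 1 and ends with z = 2; the initial value of z is unconstrained, and the frame relation additionally keeps x equal to 1 in the output.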
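Finally, we introduce a few useful notations. By $\llbracket \mathcal{P} \rrbracket_q$ we denote the component
of $\llbracket \mathcal{P} \rrbracket$ corresponding to $q \in \mathit{nF}(\mathcal{P})$. Slightly abusing notations we define $L_{P_i}(G_{\mathcal{P}})$ as
$L_{Q_i^{init}}(G_{\mathcal{P}})$ and $\llbracket \mathcal{P} \rrbracket_{P_i}$ as $\llbracket \mathcal{P} \rrbracket_{q_i^{init}}$. Finally, define $\llbracket \mathcal{P} \rrbracket_{P_i}^{i/o} = \{ \langle I{\downarrow}_{\vec{x}_i^{\,in}}, O{\downarrow}_{\vec{x}_i^{\,out}} \rangle \mid \langle I, O \rangle \in \llbracket \mathcal{P} \rrbracket_{P_i} \}$.
4 Underapproximating the Program Semantics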
In this section we define a family of underapproximations of $\llbracket \mathcal{P} \rrbracket$ called bounded-index
underapproximations. Then we show that each $k$-index underapproximation of the semantics of a (possibly recursive) program $\mathcal{P}$ coincides with the semantics of a non-recursive program computable from $\mathcal{P}$ and $k$. The central notion of bounded-index
derivation is introduced below, followed by some of its basic properties.
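Definition 1. Given a grammar $G$ with relation $\Rightarrow$ between strings, for every $k \geq 1$
we define the subrelation $\stackrel{(k)}{\Longrightarrow}$ of $\Rightarrow$ as follows: $u \stackrel{(k)}{\Longrightarrow} v$ iff $u \Rightarrow v$ and both $u$ and $v$
contain at most $k$ occurrences of variables. We denote by $\stackrel{(k)}{\Longrightarrow}{}^{\star}$ the reflexive transitive
closure of $\stackrel{(k)}{\Longrightarrow}$. Hence, given $X$ and $k$, define $L_X^{(k)}(G) = \{ w \in \Sigma^* \mid X \stackrel{(k)}{\Longrightarrow}{}^{\star} w \}$, and we
call the derivation of $w \in \Sigma^*$ from $X$ a $k$-index derivation. A grammar $G$ is said to have
index $k$ whenever $L_X(G) = L_X^{(k)}(G)$ for each $X \in \mathcal{X}$.^7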
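To make the definition concrete, the following sketch decides by exhaustive search whether a given word has a k-index derivation. It uses the grammar of Example 2 with shortened names (`Q1` stands for the initial variable, and `<t3`, `t3>` for the tagged symbols); these names and the pruning strategy are choices made for this illustration. The search is finite because every production of a visibly pushdown grammar emits at least one terminal, so sentential forms longer than the target word can be discarded.

```python
from collections import deque

# Productions of the grammar of Example 2 (head, tail).
PRODS = [
    ("Q1", ("t1", "Q2")), ("Q2", ("t2", "Q3")), ("Q2", ("t5", "Q6")),
    ("Q3", ("<t3", "Q1", "t3>", "Q4")), ("Q4", ("t4",)),
    ("Q6", ("t6", "Q7")), ("Q7", ("t7",)),
]
NONTERMINALS = {head for head, _ in PRODS}


def plausible(form, word, k):
    """Keep only forms that respect the index k and can still yield `word`."""
    if sum(s in NONTERMINALS for s in form) > k or len(form) > len(word):
        return False
    for s, w in zip(form, word):       # terminal prefix must agree with word
        if s in NONTERMINALS:
            break
        if s != w:
            return False
    return True


def derivable(start, word, k):
    """True iff `word` has a k-index derivation from `start` (k >= 1)."""
    word = tuple(word)
    seen = {(start,)}
    queue = deque(seen)
    while queue:
        form = queue.popleft()
        if form == word:
            return True
        for i, sym in enumerate(form):
            for head, tail in PRODS:
                if head != sym:
                    continue
                new = form[:i] + tail + form[i + 1:]
                if new not in seen and plausible(new, word, k):
                    seen.add(new)
                    queue.append(new)
    return False


def word_n(n):
    """The word in which P calls itself n times."""
    return ["t1", "t2", "<t3"] * n + ["t1", "t5", "t6", "t7"] + ["t3>", "t4"] * n
```

On this grammar, the word without any call needs only index 1, whereas any word with at least one call needs index 2, since the call production introduces two variables at once.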
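Lemma 1. For every grammar the following properties hold: (1) $\stackrel{(k)}{\Longrightarrow}\ \subseteq\ \stackrel{(k+1)}{\Longrightarrow}$ for all
$k \geq 1$; (2) $\Rightarrow\ =\ \bigcup_{k=1}^{\infty} \stackrel{(k)}{\Longrightarrow}$; (3) $BC \stackrel{(k)}{\Longrightarrow}{}^{\star} w \in \Sigma^*$ iff there exist $w_1, w_2$ such that $w =
w_1 w_2$ and either (i) $B \stackrel{(k-1)}{\Longrightarrow}{}^{\star} w_1$, $C \stackrel{(k)}{\Longrightarrow}{}^{\star} w_2$, or (ii) $C \stackrel{(k-1)}{\Longrightarrow}{}^{\star} w_2$ and $B \stackrel{(k)}{\Longrightarrow}{}^{\star} w_1$.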
The main intuition behind our method is to filter out interprocedurally valid paths
which cannot be produced by $k$-index derivations. Our analysis is then carried out on
the remaining paths produced by $k$-index derivations only. We argue that this underapproximation technique is more general than bounding the stack space of the program,
which corresponds to filtering out derivations which are either non-leftmost^8 or not $k$-index.
Example 4. (continued from Ex. 2) $P$ is a (non-tail) recursive procedure and $G_{\mathcal{P}}$ models
its control flow. Inspecting $G_{\mathcal{P}}$ reveals that $L_{Q_1^{init}}(G_{\mathcal{P}}) = \{ (\tau_1 \tau_2 \langle\tau_3)^n\, \tau_1 \tau_5 \tau_6 \tau_7\, (\tau_3\rangle \tau_4)^n \mid n \geq 0 \}$.
For each value of $n$ we give a 2-index derivation capturing the word: repeat
$n$ times the steps $Q_1^{init} \stackrel{p_1^b p_2^b p_3^c}{\Longrightarrow} \tau_1 \tau_2 \langle\tau_3\, Q_1^{init}\, \tau_3\rangle Q_4 \stackrel{p_4^a}{\Longrightarrow} \tau_1 \tau_2 \langle\tau_3\, Q_1^{init}\, \tau_3\rangle \tau_4$ followed by the
steps $Q_1^{init} \stackrel{p_1^b p_5^b p_6^b p_7^a}{\Longrightarrow} \tau_1 \tau_5 \tau_6 \tau_7$. Therefore the 2-index approximation of $G_{\mathcal{P}}$ shows that
$L_{Q_1^{init}}^{(2)}(G_{\mathcal{P}}) = L_{Q_1^{init}}(G_{\mathcal{P}})$. However, bounding the number of times $P$ calls itself up to 2
results in 3 interprocedurally valid paths (for $n = 0, 1, 2$).
^7 Gruska [17] proved that it is undecidable whether $L_X^{(k)}(G) = L_X(G)$ for some $k \geq 1$.
^8 A leftmost derivation is a derivation where, at each step, the production that is applied rewrites
the leftmost nonterminal.
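Given $k \geq 1$, we define the $k$-index semantics of $\mathcal{P}$ as $\llbracket \mathcal{P} \rrbracket^{(k)} = \langle \llbracket q_1 \rrbracket^{(k)}, \ldots, \llbracket q_m \rrbracket^{(k)} \rangle$,
where the $k$-index semantics of a non-final control state $q$ of a procedure $P_i$ is the relation
$\llbracket q \rrbracket^{(k)} \subseteq \mathbb{Z}^{\mathbf{x}_i} \times \mathbb{Z}^{\mathbf{x}_i}$, defined as $\llbracket q \rrbracket^{(k)} = \bigcup_{\alpha \in L_Q^{(k)}(G_{\mathcal{P}})} \llbracket \alpha \rrbracket$.
4.1 Computing Bounded-index Underapproximations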
In what follows, we define a source-to-source transformation that takes as input a recursive program $\mathcal{P}$ and an integer $k \geq 1$, and returns a non-recursive program $\mathcal{H}^k$ which has
the same semantics as $\llbracket \mathcal{P} \rrbracket^{(k)}$ (modulo projection on some variables). Therefore every
off-the-shelf tool that computes the summary semantics for a non-recursive program
can be used to compute the $k$-index semantics of $\mathcal{P}$, for any given $k \geq 1$.
Let $\mathcal{P} = \langle P_1, \ldots, P_n \rangle$ be a program, and $\mathbf{x} = \bigcup_{i=1}^{n} \mathbf{x}_i$ be the set of all variables in
$\mathcal{P}$. As we did previously, we assume a fixed ordering $\langle q_1, \ldots, q_m \rangle$ on the set $\mathit{nF}(\mathcal{P})$.
Let $G_{\mathcal{P}} = (\mathcal{X}, \widehat{\Theta}, \delta)$ be the visibly pushdown grammar associated with $\mathcal{P}$, such that
each non-final state $q$ of $\mathcal{P}$ is associated with a nonterminal $Q \in \mathcal{X}$. Then we define a non-recursive program $\mathcal{H}^K$ that captures the $K$-index semantics of $\mathcal{P}$ (Algorithm 1), for
$K \geq 1$. Formally, we define $\mathcal{H}^K = \prod_{k=0}^{K} \langle \mathit{query}^k_{Q_1}, \ldots, \mathit{query}^k_{Q_m} \rangle$, where:
– for each $k = 0, \ldots, K$ and each control state $q \in \mathit{nF}(\mathcal{P})$, we have a procedure $\mathit{query}^k_Q$;
– in particular, each of $\mathit{query}^0_{Q_1}, \ldots, \mathit{query}^0_{Q_m}$ consists of a single assume false statement;
– each procedure $\mathit{query}^k_Q$ has five sets of local variables, all of the same cardinality as
$\mathbf{x}$: two sets, named $\mathbf{x}_I$ and $\mathbf{x}_O$, are used as input variables, whereas the other three
sets, named $\mathbf{x}_J$, $\mathbf{x}_K$ and $\mathbf{x}_L$, are used locally by $\mathit{query}^k_Q$. Besides, $\mathit{query}^k_Q$ has a local
variable called PC. There are no output variables.
Observe that each procedure $\mathit{query}^k_Q$ calls only procedures $\mathit{query}^{k-1}_{Q'}$ for some $Q'$, hence
the program $\mathcal{H}^K$ is non-recursive, and therefore amenable to summarization techniques
that cannot handle recursion. Also, the hierarchical structure of $\mathcal{H}^K$ enables modular summarization by computing the summaries ordered by increasing values of $k =
0, 1, \ldots, K$. The summaries of $\mathcal{H}^{K-1}$ are reused to compute $\mathcal{H}^K$. Finally, it is routine
to check that the size of $\mathcal{H}^K$ (viz. the number of statements) is in $O(K \cdot \sharp\mathit{Prod})$ where
$\sharp\mathit{Prod}$ is the number of productions of $G_{\mathcal{P}}$. Consequently the time needed to generate
$\mathcal{H}^K$ is linear in the product $K \cdot \sharp\mathit{Prod}$.
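Given that $\mathit{query}^k_Q$ has two copies of $\mathbf{x}$ as input variables, and no output variables,
the input-output semantics $\llbracket \mathcal{H} \rrbracket^{i/o}_{\mathit{query}^k_Q} \subseteq \mathbb{Z}^{\mathbf{x} \times \mathbf{x}}$ is a set of tuples, rather than a (binary)
relation. Given two valuations $I, O \in \mathbb{Z}^{\mathbf{x}}$, we denote by $I \cdot O \in \mathbb{Z}^{\mathbf{x} \times \mathbf{x}}$ their concatenation.
Thm. 1 relates the semantics of $\mathcal{H}^K$ and the $K$-index semantics of $\mathcal{P}$. Given $k$,
$1 \leq k \leq K$, and a control state $q$ of $\mathcal{P}$, we show equality between $\llbracket \mathcal{H}^K \rrbracket^{i/o}_{\mathit{query}^k_Q}$ and $\llbracket \mathcal{P} \rrbracket^{(k)}_q$
over common variables. Before starting, we fix an arbitrary value for $K$ and require that
each $k$ is such that $1 \leq k \leq K$. Hence, we drop $K$ in $\mathcal{H}^K$ and write $\mathcal{H}$.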
Algorithm 1: proc query^k_Q(x_I, x_O) for k ≥ 1
    begin
      var PC, x_J, x_K, x_L;
      PC ← Q;
    start:
      goto p^a_1 or ··· or p^a_{n_a} or p^b_1 or ··· or p^b_{n_b} or p^c_1 or ··· or p^c_{n_c};
    p^a_1:
      assume (PC = head(p^a_1)); assume ρ_{tail(p^a_1)}(x_I, x_O); return;
      ...
    p^a_{n_a}:
      assume (PC = head(p^a_{n_a})); assume ρ_{tail(p^a_{n_a})}(x_I, x_O); return;
    p^b_1:
      assume (PC = head(p^b_1)); [ paste code for case tail(p^b_1) ∈ Θ × X ];
      ...
    p^b_{n_b}:
      assume (PC = head(p^b_{n_b})); [ paste code for case tail(p^b_{n_b}) ∈ Θ × X ];
    p^c_1:
      assume (PC = head(p^c_1)); [ paste code for case tail(p^c_1) ∈ ⟨Θ × X × Θ⟩ × X ];
      ...
    p^c_{n_c}:
      assume (PC = head(p^c_{n_c})); [ paste code for case tail(p^c_{n_c}) ∈ ⟨Θ × X × Θ⟩ × X ];
    end

case: tail(p^b_i) = τ Q' ∈ Θ × X
      havoc (x_J);
      assume ρ_τ(x_I, x_J);
      x_I ← x_J;
      PC ← Q'; goto start;            // query^k_{Q'}(x_I, x_O); return

case: tail(p^c_i) = ⟨τ Q_j^init τ⟩ Q' ∈ ⟨Θ × X × Θ⟩ × X
      havoc (x_J, x_K, x_L);
      assume ρ_{⟨τ}(x_I, x_J);        /* call relation */
      assume ρ_{τ⟩}(x_K, x_L);        /* return relation */
      assume φ_τ(x_I, x_L);           /* frame relation */
      goto ord or rod;
    ord:                              /* in order exec. */
      query^{k−1}_{Q_j^init}(x_J, x_K);
      x_I ← x_L;
      PC ← Q'; goto start;            // query^k_{Q'}(x_I, x_O); return
    rod:                              /* out of order exec. */
      query^{k−1}_{Q'}(x_L, x_O);
      x_I ← x_J;
      x_O ← x_K;
      PC ← Q_j^init; goto start;      // query^k_{Q_j^init}(x_I, x_O); return

In Alg. 1, p^α_i where α ∈ {a, b, c} refers to a production of the
visibly pushdown grammar $G_{\mathcal{P}}$. The same symbol in boldface
refers to the labelled statements in Alg. 1. The superscript α ∈
{a, b, c} differentiates the productions according to whether they are of the
form Q → τ, Q → τ Q' or Q → ⟨τ Q_j^init τ⟩ Q', respectively.
One way to prove Thm. 1 consists in first unfolding the definitions of the semantics
as follows: $\llbracket \mathcal{H} \rrbracket_{\mathit{query}^k_Q} = \bigcup_{\alpha \in L_{\mathit{query}^k_Q}(G_{\mathcal{H}})} \llbracket \alpha \rrbracket$, $\llbracket \mathcal{P} \rrbracket^{(k)}_q = \bigcup_{\beta \in L_Q^{(k)}(G_{\mathcal{P}})} \llbracket \beta \rrbracket$, then establishing a
relationship between the $\alpha$'s and the $\beta$'s that implies the equivalence of their semantics
over common variables. Instead, we follow an equivalent, but more intuitive, approach
in which the semantics of $\mathcal{H}$ is obtained by interpreting directly its code. After all,
the interprocedurally valid paths in procedure $\mathit{query}^k_Q$ are in one-to-one correspondence
with the words of $L_{\mathit{query}^k_Q}(G_{\mathcal{H}})$.
An inspection of the code of $\mathcal{H}$ reveals that $\mathcal{H}$ simulates $k$-index depth-first derivations of $G_{\mathcal{P}}$ and interprets the statements of $\mathcal{P}$ on its local variables while applying
derivation steps. Because it considers derivations that are not necessarily leftmost, $\mathcal{H}$ may interpret the
statements of $\mathcal{P}$ in an order which differs from the expected one.
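The behaviour of the query procedures can be explored on the running example with a small enumerative interpreter. This is only a sketch under strong assumptions: variables are restricted to the window -1..4 so that havoc can be enumerated, valuations are pairs `(x, z)`, the tail-recursive reading given in the comments of Alg. 1 is used instead of the PC/goto loop, and the function returns `True` when some execution of the corresponding call returns. It terminates here because x strictly decreases along recursive calls; in general such an enumeration need not terminate, and an actual analysis would hand the generated non-recursive program to an off-the-shelf analyzer instead.

```python
from functools import lru_cache
from itertools import product

D = range(-1, 5)                          # bounded window: an assumption
STATES = list(product(D, D))              # valuations (x, z)

RHO = {                                   # relations of internal transitions
    "t1": lambda s, t: s[0] >= 0 and t == s,
    "t2": lambda s, t: s[0] > 0 and t == s,
    "t4": lambda s, t: t == (s[0], s[1] + 2),
    "t5": lambda s, t: s[0] <= 0 and t == s,
    "t6": lambda s, t: t == (s[0], 0),
    "t7": lambda s, t: t[1] == s[1],      # havoc x
}
CALL = {"t3": lambda s, t: t[0] == s[0] - 1}    # call relation
RET = {"t3": lambda s, t: t[1] == s[1]}         # return relation
FRAME = {"t3": lambda s, t: t[0] == s[0]}       # frame relation

PRODS = {                                 # productions, grouped by head
    "Q1": [("b", "t1", "Q2")],
    "Q2": [("b", "t2", "Q3"), ("b", "t5", "Q6")],
    "Q3": [("c", "t3", "Q1", "Q4")],
    "Q4": [("a", "t4")],
    "Q6": [("b", "t6", "Q7")],
    "Q7": [("a", "t7")],
}


@lru_cache(maxsize=None)
def query(k, Q, I, O):
    """True iff some execution of query^k_Q(I, O) returns (within the window)."""
    if k == 0:
        return False                      # query^0 is 'assume false'
    for prod in PRODS[Q]:
        kind, tau = prod[0], prod[1]
        if kind == "a":                   # Q -> tau
            if RHO[tau](I, O):
                return True
        elif kind == "b":                 # Q -> tau Q'
            if any(RHO[tau](I, J) and query(k, prod[2], J, O) for J in STATES):
                return True
        else:                             # Q -> <tau Q_init tau> Q'
            q_init, q_next = prod[2], prod[3]
            for J in (s for s in STATES if CALL[tau](I, s)):
                for L in (s for s in STATES if FRAME[tau](I, s)):
                    for K in (s for s in STATES if RET[tau](s, L)):
                        in_order = (query(k - 1, q_init, J, K)
                                    and query(k, q_next, L, O))
                        out_of_order = (query(k - 1, q_next, L, O)
                                        and query(k, q_init, J, K))
                        if in_order or out_of_order:
                            return True
    return False
```

With index 2 the interpreter accepts the pair x = 2, z' = 4, which requires two nested calls, by repeatedly taking the out-of-order branch; with index 1 only the non-recursive path (x = 0) is accepted.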
Example 5. Let us consider an execution of query for the call $\mathit{query}^2_{Q_1^{init}}((1\ 0), (1\ 2))$ following
$Q_1^{init} \stackrel{p_1^b p_2^b p_3^c}{\Longrightarrow} \tau_1 \tau_2 \langle\tau_3\, Q_1^{init}\, \tau_3\rangle Q_4 \stackrel{p_4^a}{\Longrightarrow} \tau_1 \tau_2 \langle\tau_3\, Q_1^{init}\, \tau_3\rangle \tau_4 \stackrel{p_1^b p_5^b p_6^b p_7^a}{\Longrightarrow} \tau_1 \tau_2 \langle\tau_3 \tau_1 \tau_5 \tau_6 \tau_7 \tau_3\rangle \tau_4$.
In the table below, the first row (labelled k/PC) gives the caller (1 = $\mathit{query}^1_{Q_4}$, 2 =
$\mathit{query}^2_{Q_1^{init}}$) and the value of PC when control hits the labelled statement given in the
second row (labelled ip). The third row (labelled x_I/x_O) represents the content of the
two arrays. x_I/x_O = (a b)(c d) says that, in x_I, x has value a and z has value b; in x_O, x
has value c and z has value d.

k/PC     2/Q_1^init   2/Q_1^init   2/Q_2        2/Q_2        2/Q_3        2/Q_3        2/Q_3
ip       start        p^b_1        start        p^b_2        start        p^c_3        rod
x_I/x_O  (1 0)(1 2)   (1 0)(1 2)   (1 0)(1 2)   (1 0)(1 2)   (1 0)(1 2)   (1 0)(1 2)   (1 0)(1 2)

k/PC     1/Q_4        1/Q_4        2/Q_1^init   2/Q_1^init   2/Q_2        2/Q_2        2/Q_6
ip       start        p^a_4        start        p^b_1        start        p^b_5        start
x_I/x_O  (1 0)(1 2)   (1 0)(1 2)   (0 0)(42 0)  (0 0)(42 0)  (0 0)(42 0)  (0 0)(42 0)  (0 0)(42 0)

k/PC     2/Q_6        2/Q_7        2/Q_7
ip       p^b_6        start        p^a_7
x_I/x_O  (0 0)(42 0)  (0 0)(42 0)  (0 0)(42 0)

The execution of $\mathit{query}^2_{Q_1^{init}}$ starts on row 1, column 1 and proceeds until the call to
$\mathit{query}^1_{Q_4}$ at row 2, column 1 (the out of order case). The latter ends at row 2, column
2, where the execution of $\mathit{query}^2_{Q_1^{init}}$ resumes. Since the execution is out of order, and
the previous havoc(x_J, x_K, x_L) results in x_J = (0 0), x_K = (42 0) and x_L = (1 0) (this
choice complies with the call relation), the values of x_I/x_O are updated to (0 0)/(42 0).
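The choice for equal values (0) of z in both x_I and x_O is checked in row 3, column 3.
Theorem 1. Let $\mathcal{P} = \langle P_1, \ldots, P_n \rangle$ be a program and let $q \in \mathit{nF}(P_i)$ be a non-final control
state of some $P_i = \langle \mathbf{x}_i, \vec{x}_i^{\,in}, \vec{x}_i^{\,out}, S_i, q_i^{init}, F_i, \Delta_i \rangle$. Then, for any $k \geq 1$, we have:
$$\llbracket \mathcal{H} \rrbracket^{i/o}_{\mathit{query}^k_Q} = \{ I \cdot O \in \mathbb{Z}^{\mathbf{x} \times \mathbf{x}} \mid \langle I{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i} \rangle \in \llbracket \mathcal{P} \rrbracket^{(k)}_q \}.$$
Consequently, we also have:
$$\llbracket \mathcal{P} \rrbracket^{(k)}_q = \{ \langle I{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i} \rangle \mid I \cdot O \in \llbracket \mathcal{H} \rrbracket^{i/o}_{\mathit{query}^k_Q} \}.$$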
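The proof of Thm. 1 is based on the following lemma.
Lemma 2. Let $k \geq 1$, $q$ be a non-final control state of $P_i$ and $I, O \in \mathbb{Z}^{\mathbf{x}}$. If the call to
$\mathit{query}^k_Q(I, O)$ returns then $\langle I{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i} \rangle \in \llbracket \mathcal{P} \rrbracket^{(k)}_q$. Conversely, if $\langle I{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i} \rangle \in \llbracket \mathcal{P} \rrbracket^{(k)}_q$ then
there exist $I', O' \in \mathbb{Z}^{\mathbf{x}}$ such that $I'{\downarrow}_{\mathbf{x}_i} = I{\downarrow}_{\mathbf{x}_i}$, $O'{\downarrow}_{\mathbf{x}_i} = O{\downarrow}_{\mathbf{x}_i}$ and $\mathit{query}^k_Q(I', O')$ returns.
Proof: First we consider a tail-recursive version of Algorithm 1 which is obtained by replacing every two statements of the form PC ← X; goto start; by query^k_X(x_I, x_O); return;
(as it appears in the comments of Alg. 1). The equivalence between Algorithm 1 and its
tail-recursive variant is an easy exercise.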
10
pk q
“ð” Let xI Óxi , OÓxi y P JP Kq . By definition of k-index semantics, there exists α P
pkq
LQ pGP q such that xIÓxi , OÓxi y P JαK. Let p1 be the first production used in the derivation of α and let ` ě 1 be the length (in number of productions used) of the derivation.
Our proof proceeds by induction on `. If ` “ 1 then we find that p1 must be of the form
Q Ñ τ and that α “ τ. Therefore we have JαK “ JτK “ ρτ and moreover xIÓxi , OÓxi y P ρτ .
pkq
Since k ě 1, we let I 1 “ I and O1 “ O we find that queryQ pI 1 , O1 q returns by choosing to
jump to the label corresponding to p1 , then executing the assume statement and finally
the return statement. When ` ą 1, the proof divides in two parts.
1. If p1 is of the form Q Ñ τ Q1 then we find that α “ τ β. Moreover, xI Óxi , OÓxi y P
JαK “ ρτ ˝ JβK by definition of the semantics. This implies that there exists J P Zx such
pkq
that xIÓxi , JÓxi y P ρτ and xJÓxi , OÓxi y P JβK. Hence, we conclude from β P LQ1 pGP q, that
pk q
xJÓxi , OÓxi y P JP KQ1 . Applying the induction hypothesis on this last fact, we find that the
pkq
call queryQ1 pJ, Oq returns. Finally consider the call querykQ pI, Oq where at label start
the jump goes to label corresponding to p1 . At this point in the execution havocpxJ q
returns J. Next assume ρτ pI, Jq succeeds. Finally we find that the call to querykQ pI, Oq
returns because so does the call querykQ1 pJ, Oq which is followed by return.
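2. If $p_1$ is of the form $Q \to \langle\tau\, Q^{init}_j\, \tau\rangle\, Q'$, then $\alpha = \langle\tau\, \beta'\, \tau\rangle\, \beta$ for some $\beta', \beta$. Lemma 1 (prop. 3) shows that either $\beta' \in L^{(k-1)}_{Q^{init}_j}(G_{\mathcal{P}})$ and $\beta \in L^{(k)}_{Q'}(G_{\mathcal{P}})$, or $\beta' \in L^{(k)}_{Q^{init}_j}(G_{\mathcal{P}})$ and $\beta \in L^{(k-1)}_{Q'}(G_{\mathcal{P}})$. We assume the former case, the latter being treated similarly. Moreover, $\langle I{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}\rangle \in [\![\alpha]\!] = R_\tau \circ [\![\beta]\!]$. The leftmost relation can be rewritten as
$\big((\rho_{\langle\tau} \circ [\![\beta']\!] \circ \rho_{\tau\rangle}) \cap \phi_\tau\big) \circ [\![\beta]\!]$,
which by definition of $\beta$, $\beta'$ and the semantics is included in
$\big((\rho_{\langle\tau} \circ [\![\mathcal{P}]\!]^{(k-1)}_{q^{init}_j} \circ \rho_{\tau\rangle}) \cap \phi_\tau\big) \circ [\![\mathcal{P}]\!]^{(k)}_{Q'}$.
We conclude from the previous relation that there exist $J, K, L \in \mathbb{Z}^{\mathbf{x}}$ such that $\langle I{\downarrow}_{\mathbf{x}_i}, J{\downarrow}_{\mathbf{x}_i}\rangle \in \rho_{\langle\tau}$, $\langle J{\downarrow}_{\mathbf{x}_i}, K{\downarrow}_{\mathbf{x}_j}\rangle \in [\![\mathcal{P}]\!]^{(k-1)}_{q^{init}_j}$, $\langle K{\downarrow}_{\mathbf{x}_j}, L{\downarrow}_{\mathbf{x}_j}\rangle \in \rho_{\tau\rangle}$, and $\langle L{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}\rangle \in [\![\mathcal{P}]\!]^{(k)}_{q'}$. Applying the induction hypothesis, we obtain that the calls $\mathit{query}^{k-1}_{Q^{init}_j}(J, K)$ and $\mathit{query}^{k}_{Q'}(L, O)$ return. Given those facts, it is routine to check that $\mathit{query}^{k}_{Q}(I', O')$ returns by choosing to jump to the label corresponding to $p_1$, then having $\mathit{havoc}(\mathbf{x}_J, \mathbf{x}_K, \mathbf{x}_L)$ return $(J, K, L)$, and we are done.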
The proof of the only-if direction is in appendix A.2. □
As a last point, we prove that the bounded-index sequence $\{[\![\mathcal{P}]\!]^{(k)}\}_{k=1}^{\infty}$ satisfies several conditions that advocate for its use in program analysis, as an underapproximation sequence. The subset order and set union is extended to tuples of relations, point-wise.
$[\![\mathcal{P}]\!]^{(k)} \subseteq [\![\mathcal{P}]\!]^{(k+1)}$ for all $k \ge 1$    (A1)
$[\![\mathcal{P}]\!] = \bigcup_{k=1}^{\infty} [\![\mathcal{P}]\!]^{(k)}$    (A2)
Condition (A1) requires that the sequence is monotonically increasing, the limit of this increasing sequence being the actual semantics of the program (A2). Both conditions follow immediately from the first two points of Lemma 1. To decide whether the limit $[\![\mathcal{P}]\!]$ has been reached by some iterate $[\![\mathcal{P}]\!]^{(k)}$, it is enough to check that the tuple of relations in $[\![\mathcal{P}]\!]^{(k)}$ is inductive with respect to the statements of $\mathcal{P}$. This can be implemented as an SMT query.
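The monotonicity condition (A1) can be observed at the level of languages on a toy grammar. The following is a minimal sketch, not the implementation used in the paper: it enumerates the words of bounded length that have a derivation in which every sentential form contains at most k nonterminals. The function name `k_index_words`, the grammar encoding, and the length bound `max_len` are assumptions made for illustration only.

```python
from collections import deque

def k_index_words(grammar, start, k, max_len):
    """Words of length <= max_len derivable from `start` by derivations in
    which every sentential form has at most k nonterminal occurrences.

    grammar: dict mapping each nonterminal to a list of bodies (tuples of
    symbols). Any symbol that is not a key of `grammar` is a terminal.
    """
    seen = {(start,)}
    queue = deque(seen)
    words = set()
    while queue:
        form = queue.popleft()
        positions = [i for i, s in enumerate(form) if s in grammar]
        if not positions:
            words.add("".join(form))
            continue
        for i in positions:
            for body in grammar[form[i]]:
                new = form[:i] + tuple(body) + form[i + 1:]
                n_nt = sum(1 for s in new if s in grammar)
                # Enforce the index budget and the (hypothetical) length bound.
                if n_nt > k or len(new) - n_nt > max_len:
                    continue
                if new not in seen:
                    seen.add(new)
                    queue.append(new)
    return words

# Toy grammar S -> a S b S | epsilon (well-nested words over {a, b}).
DYCK = {"S": [("a", "S", "b", "S"), ()]}
L1, L2, L3 = (k_index_words(DYCK, "S", k, 6) for k in (1, 2, 3))
```

On this grammar the word `aabbab` needs three simultaneous nonterminals, so it appears in `L3` but not in `L2`, while `L1`, `L2`, `L3` form an increasing chain.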
5 Completeness of Underapproximations for Bounded Programs
In this section we define a class of recursive programs such that the precise summary semantics of each program in that class is effectively computable. We show for each program $\mathcal{P}$ in the class that (a) $[\![\mathcal{P}]\!] = [\![\mathcal{P}]\!]^{(k)}$ for some value of $k \ge 1$, and moreover (b) the semantics of $\mathcal{H}^{k}$ is effectively computable (and so is that of $[\![\mathcal{P}]\!]^{(k)}$ by Thm. 1).
Periodic Relations. Given an integer relation $R \subseteq \mathbb{Z}^n \times \mathbb{Z}^n$, we define $R^0$ as the identity relation on $\mathbb{Z}^n$, and $R^{i+1} = R^i \circ R$, for all $i \ge 0$. The closed form of $R$ is a formula defining a relation $\widehat{R} \subseteq \mathbb{N} \times \mathbb{Z}^n \times \mathbb{Z}^n$ such that, for each $n \ge 0$, we have $\widehat{R}(n) = R^n$. In general, the closed form of a relation is not definable within decidable subsets of integer arithmetic, such as Presburger arithmetic. In this section we consider two classes of relations, called periodic, for which this is possible, namely octagonal relations, and finite monoid affine relations. The formal definitions are deferred to appendix A.3.
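As a small illustration of a closed form, consider the one-variable relation $R = \{(x, x') \mid 1 \le x' - x \le 2\}$, which is given by difference constraints. The sketch below compares the n-fold composition of R, computed by brute force, with the candidate closed form $n \le x' - x \le 2n$. The relation and all function names are chosen for this example and do not come from the paper.

```python
def step(x):
    """Successors of x under R = {(x, x') | 1 <= x' - x <= 2}."""
    return {x + 1, x + 2}

def power(x, n):
    """All x' such that (x, x') is in R^n, by n-fold composition."""
    frontier = {x}
    for _ in range(n):
        frontier = {y for v in frontier for y in step(v)}
    return frontier

def closed_form(n, x):
    """Candidate closed form: (x, x') in R^n iff n <= x' - x <= 2n."""
    return set(range(x + n, x + 2 * n + 1))

# Compare both definitions on a small window of inputs and exponents.
agree = all(power(x, n) == closed_form(n, x)
            for x in range(-3, 4) for n in range(6))
```

The check only samples finitely many pairs (x, n); it supports the closed form but is not a proof of it.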
Bounded languages. We define a bounded expression $b$ to be a regular expression of the form $b = w_1^* \ldots w_k^*$, where $k \ge 1$ and each $w_i$ is a non-empty word. A language (not necessarily context-free) $L$ over alphabet $\Sigma$ is said to be bounded if and only if $L$ is included in (the language of) a bounded expression $b$.
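Membership of a word in a bounded expression is an ordinary regular-expression match. The sketch below builds the pattern for $w_1^* \ldots w_k^*$ with Python's `re` module and tests a finite sample of a language against it. The helper names are hypothetical, and a finite sample can only refute boundedness by a given expression, not establish it.

```python
import re

def bounded_expression(words):
    """Compile the bounded expression w_1* ... w_k* for non-empty words."""
    if not words or any(w == "" for w in words):
        raise ValueError("need k >= 1 non-empty words")
    return re.compile("".join(f"(?:{re.escape(w)})*" for w in words))

def sample_is_bounded_by(sample, words):
    """True iff every word of the finite sample matches w_1* ... w_k*."""
    regex = bounded_expression(words)
    return all(regex.fullmatch(w) is not None for w in sample)

# A sample of {a^n b^n | n >= 0} is included in a* b*.
anbn = {"a" * n + "b" * n for n in range(5)}
```

For instance, `sample_is_bounded_by(anbn, ["a", "b"])` holds, whereas the sample `{"ab", "ba"}` is not included in `a* b*`.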
Theorem 2 ([21]). Let $G = (\mathcal{X}, \Sigma, \delta)$ be a grammar, and $X \in \mathcal{X}$ be a nonterminal, such that $L_X(G)$ is bounded. Then $L_X(G) = L^{(k)}_X(G)$ for some $k \ge 1$.
The class of programs for which our method is complete is defined below:
Definition 2. Let $\mathcal{P}$ be a program and $G_{\mathcal{P}} = (\mathcal{X}, \widehat{\Theta}, \delta)$ be its corresponding visibly pushdown grammar. Then $\mathcal{P}$ is said to be bounded periodic if and only if:
1. $L_X(G_{\mathcal{P}})$ is bounded for each $X \in \mathcal{X}$;
2. each relation $\rho_\tau$ occurring in the program, for some $\tau \in \widehat{\Theta}$, is periodic.
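Example 6. (continued from Ex. 4) Recall that $L_{Q^{init}_1}(G_{\mathcal{P}}) = L^{(2)}_{Q^{init}_1}(G_{\mathcal{P}})$, which equals the set
$\{(\tau_1 \tau_2 \langle\tau_3)^n\, \tau_1 \tau_5 \tau_6\, (\tau_7\, \tau_3\rangle\, \tau_4)^n \mid n \ge 0\} \subseteq (\tau_1 \tau_2 \langle\tau_3)^*\, \tau_1^*\, \tau_5^*\, \tau_6^*\, (\tau_7\, \tau_3\rangle\, \tau_4)^*$.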
Condition 1 is decidable [14], and previous work [16] defined a class of programs following a recursion scheme which ensures boundedness of the set of interprocedurally valid paths. Moreover, when condition 1 does not hold, one can still pick a bounded expression $b$ and enforce boundedness by replacing $G_{\mathcal{P}}$ with a grammar $G'_{\mathcal{P}}$ such that $L_X(G'_{\mathcal{P}}) = L_X(G_{\mathcal{P}}) \cap b$. Hence $G'_{\mathcal{P}}$ satisfies condition 1, although at the price of coverage, since interprocedurally valid paths not in $b$ have been filtered out.
This section shows that the underapproximation sequence $\{[\![\mathcal{P}]\!]^{(k)}\}_{k=1}^{\infty}$, defined in Section 4, when applied to any bounded periodic program $\mathcal{P}$, always yields $[\![\mathcal{P}]\!]$ in finitely many steps, and moreover each iterate $[\![\mathcal{P}]\!]^{(k)}$ is computable and Presburger definable. Furthermore the method can be applied as it is to bounded periodic programs, without prior knowledge of the bounded expression $b \supseteq L_Q(G_{\mathcal{P}})$.
The proof goes as follows. Because $\mathcal{P}$ is bounded periodic, Thm. 2 shows that the semantics $[\![\mathcal{P}]\!]$ of $\mathcal{P}$ coincides with its k-index semantics $[\![\mathcal{P}]\!]^{(k)}$ for some $k \ge 1$. Hence, Thm. 1 shows that for each $q \in nF(\mathcal{P})$, the k-index semantics $[\![\mathcal{P}]\!]^{(k)}_q$ is given by the semantics $[\![\mathcal{H}]\!]_{\mathit{query}^{k}_{Q}}$ of procedure $\mathit{query}^{k}_{Q}$ of the program $\mathcal{H}$. Then, because $\mathcal{P}$ is bounded periodic, we show in Thm. 3 that every procedure $\mathit{query}^{k}_{Q}$ of program $\mathcal{H}$ is flattable (Def. 3). Finally, since all transitions of $\mathcal{H}$ are periodic and each procedure $\mathit{query}^{k}_{Q}$ is flattable, $[\![\mathcal{P}]\!]$ is computable in finite time by existing tools, such as FAST [6] or FLATA [9, 8]. In fact, these tools are guaranteed to terminate provided that (a) the input program is flattable; and (b) loops are labeled with periodic relations.
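Definition 3. Let $\mathcal{P} = \langle P_1, \ldots, P_n\rangle$ be a non-recursive program and $G_{\mathcal{P}} = (\mathcal{X}, \widehat{\Theta}, \delta)$ be its corresponding visibly pushdown grammar. Procedure $P_i$ is said to be flattable if and only if there exists a bounded and regular language $R$ over $\widehat{\Theta}$, such that $[\![\mathcal{P}]\!]_{P_i} = \bigcup_{\alpha \in L_{P_i}(G_{\mathcal{P}}) \cap R} [\![\alpha]\!]$.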
Notice that a flattable program is not necessarily bounded (Def. 2), but its semantics can be computed by looking only at a bounded subset of the interprocedurally valid sequences of statements.
Theorem 3. Let $\mathcal{P} = \langle P_1, \ldots, P_n\rangle$ be a bounded periodic program, and let $q \in nF(\mathcal{P})$. Then, for any $k \ge 1$, procedure $\mathit{query}^{k}_{Q}$ of program $\mathcal{H}$ is flattable.
The proof of Thm. 3 roughly goes as follows: recall that we have $[\![\mathcal{P}]\!]_q = [\![\mathcal{P}]\!]^{(k)}_q$ for each $q \in nF(\mathcal{P})$ and so it is sufficient to consider the set $L^{(k)}_Q(G_{\mathcal{P}})$ of interprocedurally valid paths. We further show (Thm. 4) that a strict subset of the k-index derivations of $G_{\mathcal{P}}$ is sufficient to capture $L^{(k)}_Q(G_{\mathcal{P}})$. Moreover this subset of derivations is characterizable by a bounded expression $b_\Gamma$ over the productions of $G_{\mathcal{P}}$. Then we use $b_\Gamma$ to give a subset $f(b_\Gamma)$ of the interprocedurally valid paths of procedure $\mathit{query}^{k}_{Q}$ of $\mathcal{H}$ that is sufficient to capture $[\![\mathcal{H}]\!]_{\mathit{query}^{k}_{Q}}$. Finally, using existing results, we show (Thm. 5) that $f(b_\Gamma)$ is a bounded and regular set. Hence we conclude that each $\mathit{query}^{k}_{Q}$ is flattable. A full proof of Thm. 3 is given in appendix A.6.
Control Sets. Given a grammar $G = (\mathcal{X}, \Sigma, \delta)$, we call any subset of $\delta^*$ a control set. Let $\Gamma$ be a control set; we denote by $L_X(\Gamma, G) = \{w \in \Sigma^* \mid \exists \gamma \in \Gamma : X \xRightarrow{\gamma} w\}$ the set of words resulting from derivations with control word in $\Gamma$.
Depth-first Derivations are defined as expected:
Definition 4 ([22]). Let $D \equiv X = w_0 \Longrightarrow^* w_n = w$ be a derivation. Let $k > 0$, $x_i \in \Sigma^*$, $A_i \in \mathcal{X}$ such that $w_m = x_0 A_1 x_1 \cdots A_k x_k$; and for each $i$, $1 \le i \le k$, let $f_m(i)$ denote the index of the first word in $D$ in which the particular occurrence of variable $A_i$ appears. Let $A_j$ be the nonterminal replaced in step $w_m \Longrightarrow w_{m+1}$ of $D$. Then $D$ is said to be depth-first if and only if for all $m$, $0 \le m < n$, we have $f_m(i) \le f_m(j)$, for all $1 \le i \le k$.
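The two notions above, derivations driven by a control word and the depth-first restriction, can be sketched together. The code below is an illustrative toy, not part of the paper's tooling: it applies the productions of a control word in order, trying every occurrence of the head, and optionally keeps only steps that rewrite a nonterminal occurrence of maximal birth index, which is how Definition 4 is read here. It assumes every nonterminal is the head of at least one production; the function name `derive` and the example grammar are hypothetical.

```python
def derive(productions, start, control, depth_first=False):
    """Terminal words obtained from `start` by applying, in order, the
    productions whose indices are listed in `control`.

    productions: list of (head, body) pairs, body being a tuple of symbols.
    Each occurrence in a sentential form is stored as (symbol, birth), where
    birth is the index of the first form containing that occurrence.
    """
    heads = {h for h, _ in productions}
    forms = {((start, 0),)}
    for step, idx in enumerate(control, start=1):
        head, body = productions[idx]
        next_forms = set()
        for form in forms:
            newest = max((b for s, b in form if s in heads), default=0)
            for pos, (sym, birth) in enumerate(form):
                if sym != head:
                    continue
                # Depth-first: only a most recently created nonterminal
                # occurrence may be rewritten.
                if depth_first and birth != newest:
                    continue
                fresh = tuple((s, step) for s in body)
                next_forms.add(form[:pos] + fresh + form[pos + 1:])
        forms = next_forms
    return {"".join(s for s, _ in f) for f in forms
            if all(s not in heads for s, _ in f)}

# p0: S -> A B, p1: A -> a C, p2: C -> c, p3: B -> b
PRODS = [("S", ("A", "B")), ("A", ("a", "C")), ("C", ("c",)), ("B", ("b",))]
```

With these productions, the control words `p0 p1 p2 p3` and `p0 p1 p3 p2` both yield `acb`, but only the first corresponds to a depth-first derivation, since in the second `B` is rewritten while the more recent `C` is still pending.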
We define the set $DF_X(G)$ ($DF^{(k)}_X(G)$) of words produced using only depth-first derivations (of index at most $k$) in $G$ starting from $X$. Clearly, $DF_X(G) \subseteq L_X(G)$ and $DF^{(k)}_X(G) \subseteq L^{(k)}_X(G)$ for all $k \ge 1$. We further define the set $DF_X(\Gamma, G)$ ($DF^{(k)}_X(\Gamma, G)$) of words produced using depth-first derivations (of index at most $k$) with control words from $\Gamma$.
The following theorem shows that $L^{(k)}_Q(G_{\mathcal{P}})$ is captured by a subset of depth-first derivations whose control words belong to some bounded expression.
Theorem 4. Let $G = (\mathcal{X}, \Theta, \delta)$ be a visibly pushdown grammar, $X_0 \in \mathcal{X}$ be a nonterminal such that $L_{X_0}(G)$ is bounded. Then for each $k \ge 1$ there exists a bounded expression $b_\Gamma$ over $\delta$ such that $DF^{(k)}_{X_0}(b_\Gamma, G) = L^{(k)}_{X_0}(G)$.
Finally, to conclude that $\mathit{query}^{k}_{Q}$ is flattable, we map the k-index depth-first derivations of $G$ into the interprocedurally valid paths of $\mathit{query}^{k}_{Q}$. Then, applying Thm. 5 on that mapping, we conclude the existence of a bounded and regular set of interprocedurally valid paths of $\mathit{query}^{k}_{Q}$ sufficient to capture its semantics.
Theorem 5. Given two alphabets $\Sigma$ and $\Delta$, let $f$ be a function from $\Sigma^*$ into $\Delta^*$ such that (i) if $u$ is a prefix of $v$ then $f(u)$ is a prefix of $f(v)$; (ii) there exists an integer $M$ such that $|f(wa)| - |f(w)| \le M$ for all $w \in \Sigma^*$ and $a \in \Sigma$; (iii) $f(\varepsilon) = \varepsilon$; (iv) $f^{-1}(R)$ is regular for all regular languages $R$. Then $f$ preserves regular sets. Furthermore, for each bounded expression $b$ we have that $f(b)$ is bounded.
6 Experiments
We have implemented the proposed method in the FLATA verifier [18] and experimented with several benchmarks. First, we have considered several programs, taken from [1], that perform arithmetic and logical operations in a recursive way such as plus (addition), timesTwo (multiplication by two), leq (comparison), and parity (parity checking). It is worth noting that these programs have finite index and stabilization of the underapproximation sequence is thus guaranteed. Our technique computes summaries by verifying that $[\![\mathcal{P}]\!]^{(2)} = [\![\mathcal{P}]\!]^{(3)}$ for all these benchmarks, see Table 1 (the platform used for experiments is Intel® Core™ 2 Duo CPU P8700, 2.53GHz with 4GB of RAM).

Program     Time [s]   k
timesTwo    0.7        2
leq         0.7        2
parity      0.8        2
plus        3.4        2
F_{a=2}     3.7        3
F_{a=8}     45.1       4
G_{b=12}    5.7        3
G_{b=13}    19.1       3
G_{b=14}    24.2       3
Table 1. Experiments.

$$F_a(x) = \begin{cases} x - 10 & \text{if } x \ge 101 \\ (F_a)^a(x + 10 \cdot a - 9) & \text{if } x \le 100 \end{cases} \qquad G_b(x) = \begin{cases} x - 10 & \text{if } x \ge 101 \\ G(G(x + b)) & \text{if } x \le 100 \end{cases}$$
Next, we have considered the generalized McCarthy 91 function [10], a well-known verification benchmark that has long been a challenge. We have automatically computed precise summaries of its generalizations $F_a$ and $G_b$ above for $a = 2, \ldots, 8$ and $b = 12, \ldots, 14$. The indices of the recursive programs implementing the $F_a$, $G_b$ functions are not bounded; however, the sequence reached the fixpoint after at most 4 steps.
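For readers who want to experiment with the two benchmark families, here is a minimal executable sketch of $F_a$ and $G_b$. It is not the input given to FLATA. It reads $(F_a)^a$ as a-fold composition and the inner calls `G(G(x + b))` as calls to $G_b$, both of which are assumptions about the notation, and it replaces recursion by a counter of pending applications.

```python
def F(a, x):
    """F_a(x) = x - 10 if x >= 101, else the a-fold application of F_a
    to x + 10*a - 9 (assumed reading of (F_a)^a)."""
    pending = 1  # applications of F_a still to be performed
    while pending:
        if x >= 101:
            x -= 10
            pending -= 1
        else:
            x += 10 * a - 9
            pending += a - 1  # one pending call becomes a nested calls
    return x

def G(b, x):
    """G_b(x) = x - 10 if x >= 101, else G_b(G_b(x + b))."""
    pending = 1
    while pending:
        if x >= 101:
            x -= 10
            pending -= 1
        else:
            x += b
            pending += 1  # one pending call becomes two nested calls
    return x
```

For example, `F(2, x)` is the classical McCarthy 91 function, and `G(12, 100)` evaluates to 92 while `G(12, 99)` evaluates to 91.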
References
1. Termination Competition 2011. http://termcomp.uibk.ac.at/termcomp/home.seam
2. Albarghouthi, A., Gurfinkel, A., Chechik, M.: Whale: An interpolation-based algorithm for
inter-procedural verification. In: VMCAI ’12. LNCS, vol. 7148, pp. 39–55. Springer (2012)
3. Alur, R., Madhusudan, P.: Adding nesting structure to words. JACM 56(3), 16 (2009)
4. Atig, M.F., Ganty, P.: Approximating petri net reachability along context-free traces. In:
FSTTCS ’11. LIPIcs, vol. 13, pp. 152–163. Schloss Dagstuhl (2011)
5. Cook, B., Podelski, A., Rybalchenko, A.: Summarization for termination: no return! Formal Methods
in System Design 35, 369–387 (2009)
6. Bardin, S., Finkel, A., Leroux, J., Petrucci, L.: FAST: Fast acceleration of symbolic transition
systems. In: CAV ’03. LNCS, vol. 2725, pp. 118–121. Springer (2003)
7. Boigelot, B.: Symbolic Methods for Exploring Infinite State Spaces. Ph.D. thesis, University
of Liège (1998)
8. Bozga, M., Iosif, R., Konečný, F.: Fast acceleration of ultimately periodic relations. In:
CAV ’10. LNCS, vol. 6174, pp. 227–242. Springer (2010)
9. Bozga, M., Iosif, R., Lakhnech, Y.: Flat parametric counter automata. Fundamenta Informaticae 91(2), 275–303 (2009)
10. Cowles, J.: Computer-aided reasoning. chap. Knuth’s generalization of McCarthy’s 91 function, pp. 283–299 (2000)
11. Esparza, J., Kiefer, S., Luttenberger, M.: Newtonian program analysis. JACM 57(6), 33:1–
33:47 (2010)
12. Finkel, A., Leroux, J.: How to compose presburger-accelerations: Applications to broadcast
protocols. In: FSTTCS ’02. LNCS, vol. 2556, pp. 145–156. Springer (2002)
13. Ganty, P., Majumdar, R., Monmege, B.: Bounded underapproximations. Formal Methods in
System Design 40(2), 206–231 (2012)
14. Ginsburg, S.: The Mathematical Theory of Context-Free Languages. McGraw-Hill, Inc.,
New York, NY, USA (1966)
15. Godefroid, P., Nori, A.V., Rajamani, S.K., Tetali, S.: Compositional may-must program analysis: unleashing the power of alternation. In: POPL ’10. pp. 43–56. ACM (2010)
16. Godoy, G., Tiwari, A.: Invariant checking for programs with procedure calls. In: SAS ’09.
LNCS, vol. 5673, pp. 326–342. Springer (2009)
17. Gruska, J.: A few remarks on the index of context-free grammars and languages. Information
and Control 19(3), 216–223 (1971)
18. Hojjat, H., Konecný, F., Garnier, F., Iosif, R., Kuncak, V., Rümmer, P.: A verification toolkit
for numerical transition systems - tool paper. In: FM. pp. 247–251 (2012)
19. Lalire, G., Argoud, M., Jeannet, B.: Interproc.
http://popart.inrialpes.fr/people/bjeannet/bjeannet-forge/interproc/index.html
20. Latteux, M.: Mots infinis et langages commutatifs. Informatique Théorique et Applications
12(3) (1978)
21. Luker, M.: A family of languages having only finite-index grammars. Information and Control 39(1), 14–18 (1978)
22. Luker, M.: Control sets on grammars using depth-first derivations. Mathematical Systems
Theory 13, 349–359 (1980)
23. Reps, T., Horwitz, S., Sagiv, M.: Precise interprocedural dataflow analysis via graph reachability. In: POPL ’95. pp. 49–61. ACM (1995)
24. Sharir, M., Pnueli, A.: Two approaches to interprocedural data flow analysis. In: Program
Flow Analysis: Theory and Applications, chap. 7, pp. 189–233. Prentice-Hall, Inc. (1981)
A Missing material
A.1 Proof of Lemma 1
Proof: The proof of Properties 1 and 2 follows immediately from the definition of $\xRightarrow{(k)}$. Let us now turn to the proof of Property 3 (only if). First we define $w_1$ and $w_2$. Take the derivation $BC \xRightarrow{(k)}{}^{*} w$ and look at the last step. It must be of the form $xYz \xRightarrow{(k)} xyz = w$ and one of the following must hold: either $Y$ has been generated from $B$ or from $C$. Suppose that $Y$ stems from $C$ (the other case is treated similarly). In this case, transitively remove from the derivation all the steps transforming the rightmost occurrence of $C$. Hence we obtain a derivation $BC \xRightarrow{(k)}{}^{*} w_1 C$. Then $w_2$ is the unique word satisfying $w = w_1 w_2$. Since $BC \xRightarrow{(k)}{}^{*} w_1 C$, we find by removing the occurrence of $C$ in rightmost position at every step that $B \xRightarrow{(k-1)}{}^{*} w_1$ and we are done. Having $Y$ stemming from $B$ yields $C \xRightarrow{(k-1)}{}^{*} w_2$. For the proof of the other direction (if), assuming (i) (the other case is similar), it is easily seen that $BC \xRightarrow{(k)}{}^{*} w_1 C \xRightarrow{(k)}{}^{*} w_1 w_2$. □
A.2 Proof of Lemma 2 only if direction
Proof: Recall that in this proof we use the tail-recursive version of Algorithm 1, which is obtained by replacing every two statements of the form $PC \leftarrow X$ ; goto start ; by $\mathit{query}^{k}_{X}(\mathbf{x}_I, \mathbf{x}_O)$ ; return ; (as it appears in the comments of Alg. 1).
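“$\Rightarrow$” Let $I \cdot O \in \mathbb{Z}^{\mathbf{x} \times \mathbf{x}}$ such that the call to $\mathit{query}^{k}_{Q}(I, O)$ returns, that is, with parameters $I$ and $O$ procedure $\mathit{query}^{k}_{Q}$ has an execution that terminates with an empty call stack. We show that $\langle I{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}\rangle \in [\![\mathcal{P}]\!]^{(k)}_q$ by induction on the number of times $\ell \ge 1$ a procedure of $\mathcal{H}$ is invoked. If $\ell = 1$ then the only invocation is $\mathit{query}^{k}_{Q}(I, O)$. So it is necessarily the case that, at the non-deterministic jump labelled start, the destination has the form $p^a_i$ for $1 \le i \le n_a$. Further, label $p^a_i$ corresponds to a production of the form $Q \to \tau$ of $\delta$, hence we find that $\tau \in L^{(k)}_Q(G_{\mathcal{P}})$ since $k \ge 1$. Next, because the assume statement succeeds, we find that $\langle I{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}\rangle \in \rho_\tau$, hence that $\langle I{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}\rangle \in [\![\tau]\!]$, next that $\langle I{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}\rangle \in \bigcup_{\alpha \in L^{(k)}_Q(G_{\mathcal{P}})} [\![\alpha]\!]$, and finally that $\langle I{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}\rangle \in [\![\mathcal{P}]\!]^{(k)}_q$ by definition of k-index semantics and we are done.
If $\ell > 1$, there are two possibilities for the first call to a procedure of $\mathcal{H}$ following the call $\mathit{query}^{k}_{Q}(I, O)$.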
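– We are in the case $tail(p^b_i) = \tau\, Q'$ for some $1 \le i \le n_b$ and so $\mathit{query}^{k}_{Q}(I, O)$ executes $\mathit{havoc}(\mathbf{x}_J)$, assume $\rho_\tau(\mathbf{x}_I, \mathbf{x}_J)$, $\mathbf{x}_I \leftarrow \mathbf{x}_J$, followed by $\mathit{query}^{k}_{Q'}(\mathbf{x}_I, \mathbf{x}_O)$ then return. Let us denote by $I$ and $J$ the content of $\mathbf{x}_I$ before and after the assignment. By induction hypothesis, we find that $\langle J{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}\rangle \in [\![\mathcal{P}]\!]^{(k)}_{q'}$, hence that there exists $\alpha \in L^{(k)}_{Q'}(G_{\mathcal{P}})$ such that $\langle J{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}\rangle \in [\![\alpha]\!]$. Next, $p^b_i$ corresponds to a production of the form $Q \to \tau\, Q'$ of $\delta$, hence we find that $\tau\alpha \in L^{(k)}_Q(G_{\mathcal{P}})$ since $k \ge 1$. Then, since $\langle I{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}\rangle = \langle I{\downarrow}_{\mathbf{x}_i}, J{\downarrow}_{\mathbf{x}_i}\rangle \circ \langle J{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}\rangle \in \rho_\tau \circ [\![\alpha]\!]$, we find that $\langle I{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}\rangle \in [\![\tau\alpha]\!]$ by definition of the semantics, hence that $\langle I{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}\rangle \in \bigcup_{\alpha \in L^{(k)}_Q(G_{\mathcal{P}})} [\![\alpha]\!]$ and finally that $\langle I{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}\rangle \in [\![\mathcal{P}]\!]^{(k)}_q$ by definition and we are done.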
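– We are in the case $tail(p^c_i) = \langle\tau\, Q^{init}_j\, \tau\rangle\, Q'$ for some $1 \le i \le n_c$. We further assume that the rod branch is executed (the ord being treated similarly). Therefore $\mathit{query}^{k}_{Q}(I, O)$ executes $\mathit{havoc}(\mathbf{x}_J, \mathbf{x}_K, \mathbf{x}_L)$, assume $\rho_{\langle\tau}(\mathbf{x}_I, \mathbf{x}_J)$, assume $\rho_{\tau\rangle}(\mathbf{x}_K, \mathbf{x}_L)$, assume $\phi_\tau(\mathbf{x}_I, \mathbf{x}_L)$, $\mathbf{x}_I \leftarrow \mathbf{x}_J$, $\mathbf{x}_O \leftarrow \mathbf{x}_K$, followed by the calls $\mathit{query}^{k-1}_{Q'}(\mathbf{x}_L, \mathbf{x}_O)$, $\mathit{query}^{k}_{Q^{init}_j}(\mathbf{x}_I, \mathbf{x}_O)$ and then return. Call $J, K, L \in \mathbb{Z}^{\mathbf{x}}$ the values picked by havoc.
Following the induction hypothesis, we find that $\langle L{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}\rangle \in [\![\mathcal{P}]\!]^{(k-1)}_{q'}$ and $\langle J{\downarrow}_{\mathbf{x}_j}, K{\downarrow}_{\mathbf{x}_j}\rangle \in [\![\mathcal{P}]\!]^{(k)}_{q^{init}_j}$. This implies that there exist $\alpha \in L^{(k-1)}_{Q'}(G_{\mathcal{P}})$ and $\alpha' \in L^{(k)}_{Q^{init}_j}(G_{\mathcal{P}})$ such that $\langle L{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}\rangle \in [\![\alpha]\!]$ and $\langle J{\downarrow}_{\mathbf{x}_j}, K{\downarrow}_{\mathbf{x}_j}\rangle \in [\![\alpha']\!]$. Moreover, the definition of $p^c_i$ and Lem. 1 (prop. 3) show that $\langle\tau\, \alpha'\, \tau\rangle\, \alpha \in L^{(k)}_Q(G_{\mathcal{P}})$.
Next, $\langle I{\downarrow}_{\mathbf{x}_i}, J{\downarrow}_{\mathbf{x}_j}\rangle \in \rho_{\langle\tau}$, $\langle J{\downarrow}_{\mathbf{x}_j}, K{\downarrow}_{\mathbf{x}_j}\rangle \in [\![\alpha']\!]$, $\langle K{\downarrow}_{\mathbf{x}_j}, L{\downarrow}_{\mathbf{x}_i}\rangle \in \rho_{\tau\rangle}$ and $\langle I{\downarrow}_{\mathbf{x}_i}, L{\downarrow}_{\mathbf{x}_i}\rangle \in \phi_\tau$ show that $\langle I{\downarrow}_{\mathbf{x}_i}, L{\downarrow}_{\mathbf{x}_i}\rangle \in R_\tau$ as given in the semantics. Again by the semantics, we find that $\langle I{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}\rangle \in [\![\langle\tau\, \alpha'\, \tau\rangle\, \alpha]\!]$, hence that $\langle I{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}\rangle \in \bigcup_{\gamma \in L^{(k)}_Q(G_{\mathcal{P}})} [\![\gamma]\!]$, and finally that $\langle I{\downarrow}_{\mathbf{x}_i}, O{\downarrow}_{\mathbf{x}_i}\rangle \in [\![\mathcal{P}]\!]^{(k)}_q$ by definition of k-index semantics and we are done. □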
A.3 Examples of Periodic Relations
An octagonal relation is defined by a finite conjunction of constraints of the form $\pm x \pm y \le c$, where $x$ and $y$ range over the set $\mathbf{x} \cup \mathbf{x}'$, and $c$ is an integer constant. The transitive closure of any octagonal relation has been shown to be Presburger definable and effectively computable [8].
A linear affine relation is defined by a formula $R(\mathbf{x}, \mathbf{x}') \equiv C\mathbf{x} \ge d \wedge \mathbf{x}' = A\mathbf{x} + b$, where $A \in \mathbb{Z}^{n \times n}$, $C \in \mathbb{Z}^{p \times n}$ are matrices and $b \in \mathbb{Z}^n$, $d \in \mathbb{Z}^p$. $R$ is said to have the finite monoid property if and only if the set $\{A^i \mid i \ge 0\}$ is finite. It is known that the finite monoid condition is decidable [7], and moreover that the transitive closure of a finite monoid affine relation is Presburger definable and effectively computable [12, 7].
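The finite monoid property can be explored on small matrices by enumerating powers until one repeats. The following sketch does exactly that with a step bound; it is a bounded search written for illustration, not the decision procedure of [7], and a `None` result only means that no repetition was found within `max_steps`. All names are hypothetical.

```python
def mat_mul(A, B):
    """Product of two square integer matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def powers_if_finite(A, max_steps=64):
    """Return the list of distinct powers A^0, A^1, ... if a power repeats
    within max_steps multiplications, otherwise None (inconclusive)."""
    n = len(A)
    current = [[int(i == j) for j in range(n)] for i in range(n)]  # A^0
    seen = []
    for _ in range(max_steps):
        if current in seen:
            return seen
        seen.append(current)
        current = mat_mul(current, A)
    return None

SWAP = [[0, 1], [1, 0]]    # x' = y, y' = x: powers alternate, monoid finite
SHEAR = [[1, 1], [0, 1]]   # x' = x + y, y' = y: powers never repeat
```

Here `powers_if_finite(SWAP)` returns the two matrices $A^0$ and $A^1$, whereas `powers_if_finite(SHEAR)` returns `None`.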
A.4 Proof of Theorem 5
Definition 5 ([14]). A generalized sequential machine, abbreviated gsm, is a 6-tuple $S = (K, \Sigma, \Delta, \delta, \lambda, q_1)$ where
– $K$ is a finite nonempty set (of states).
– $\Sigma$ is an alphabet (of inputs).
– $\Delta$ is an alphabet (of outputs).
– $\delta$ (the next-state function) is a mapping of $K \times \Sigma$ into $K$.
– $\lambda$ (the output function) is a mapping of $K \times \Sigma$ into $\Delta^*$.
– $q_1$ is a distinguished element of $K$ (the start state).
The functions $\delta$ and $\lambda$ are extended by induction to $K \times \Sigma^*$ by defining for every state $q$, every word $x \in \Sigma^*$, and every $y$ in $\Sigma$
– $\delta(q, \varepsilon) = q$ and $\lambda(q, \varepsilon) = \varepsilon$.
– $\delta(q, xy) = \delta[\delta(q, x), y]$ and $\lambda(q, xy) = \lambda(q, x)\, \lambda[\delta(q, x), y]$.
It is readily seen that the second item holds for all words $x$ and $y$ in $\Sigma^*$.
Definition 6 ([14]). Let $S = (K, \Sigma, \Delta, \delta, \lambda, q_1)$ be a gsm. The operation defined by $S(x) = \lambda(q_1, x)$ for each $x \in \Sigma^*$ is called a gsm mapping.
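Definitions 5 and 6 translate directly into a few lines of code. The sketch below runs a gsm given by two dictionaries for $\delta$ and $\lambda$; the particular two-state machine is an invented example, chosen only to show that outputs may depend on the current state.

```python
def run_gsm(delta, lam, q1, word):
    """Compute S(word) = lambda(q1, word) for a gsm with next-state
    function `delta` and output function `lam`, both keyed by (state, symbol)."""
    state, out = q1, []
    for sym in word:
        out.append(lam[(state, sym)])
        state = delta[(state, sym)]
    return "".join(out)

# Example machine over inputs {a, b}: reading 'a' toggles the state.
DELTA = {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}
LAM = {(0, "a"): "x", (1, "a"): "yy", (0, "b"): "z", (1, "b"): ""}

def S(word):
    return run_gsm(DELTA, LAM, 0, word)
```

By construction `S("")` is the empty word and `S(u)` is a prefix of `S(uv)`, matching conditions (iii) and (i) of Theorem 5; each input symbol contributes at most two output symbols, matching condition (ii) with M = 2.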
Theorem 6 (Theorem 3.4.1 (if direction), [14]). Let $f$ be a function from $\Sigma^*$ into $\Delta^*$ such that (i) $f$ preserves prefixes, that is, if $u$ is a prefix of $v$ then $f(u)$ is a prefix of $f(v)$; (ii) $f$ has bounded outputs, that is, there exists an integer $M$ such that $|f(wa)| - |f(w)| \le M$ for all $w \in \Sigma^*$ and $a \in \Sigma$; (iii) $f(\varepsilon) = \varepsilon$; (iv) $f^{-1}(R)$ is regular for all regular languages $R$. Then $f$ is a gsm mapping.
Theorem 7 (Theorem 3.3.2, [14]). Each gsm mapping preserves regular sets.
Lemma 3 (Lemma 5.5.3, [14]). $S(w_1^* \ldots w_n^*)$ is bounded for each gsm $S$ and all words $w_1, \ldots, w_n$.
Finally, Theorem 5 is an easy consequence of the above facts.
A.5 Proof of Theorem 4
The proof is long but technically not difficult. First, we need to introduce some new material. The Szilard language of a grammar $G = (\mathcal{X}, \Sigma, \delta)$, denoted $Sz_X(G) \subseteq \delta^*$, is the set of control words used in the derivations of $G$ starting with $X \in \mathcal{X}$. We denote by $Sz^{df}_X(G)$ the set of control words used in the depth-first derivations of $G$ starting with $X$. Moreover let $Sz^{df}_X(G, k)$ denote the set of control words used in depth-first k-index derivations of $G$ starting with $X$. Next, we recall a couple of known results [22, 20].
Lemma 4 ([22]). For all $k \ge 1$, we have $DF^{(k)}_X(G) = L^{(k)}_X(G)$ and $Sz^{df}_X(G, k)$ is regular.
Given an alphabet $\Sigma = \{u_1, \ldots, u_k\}$, let $Pk(u_i) = e_i$ be the k-dimensional vector having 1 on the i-th position and 0 everywhere else. We define $Pk(\varepsilon) = 0$, $Pk(u_{i_1} \ldots u_{i_n}) = \sum_{j=1}^{n} Pk(u_{i_j})$ for any word $u_{i_1} \ldots u_{i_n} \in \Sigma^*$, and $Pk(L) = \{Pk(w) \mid w \in L\}$ for any language $L \subseteq \Sigma^*$. The following result was proved in [13]:
Theorem 8 (Thm. 1 from [13], also in [20]). For every regular language $L$ there exists a bounded expression $b_\Gamma$ such that $Pk(L \cap b_\Gamma) = Pk(L)$.
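A small sanity check of the statement of Theorem 8 can be run on a finite language. The sketch below computes Parikh images by counting symbols and compares $Pk(L \cap b)$ with $Pk(L)$ for the bounded expression `a* b*`. The sample language and the helper names are assumptions made for the example, and symbols are assumed to be single characters.

```python
import re

def parikh(word, alphabet):
    """Parikh image of `word`: number of occurrences of each symbol."""
    return tuple(word.count(u) for u in alphabet)

def parikh_image(language, alphabet):
    return {parikh(w, alphabet) for w in language}

ALPHABET = "ab"
SAMPLE = {"ab", "ba", "aabb", "abab"}          # a finite (hence regular) language
IN_B = {w for w in SAMPLE if re.fullmatch(r"a*b*", w)}  # SAMPLE intersected with a* b*
```

Here `IN_B` keeps only `ab` and `aabb`, yet it has the same Parikh image as `SAMPLE`: the words dropped by the intersection are permutations of words that remain.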
Next we prove a result characterizing a subset of derivations sufficient to capture a bounded context-free language.
Lemma 5. Let $G = (\mathcal{X}, \Sigma, \delta)$ be a grammar and $X \in \mathcal{X}$ be a nonterminal, such that $L_X(G) \subseteq a_1^* \ldots a_d^*$ where $a_1, \ldots, a_d$ are distinct symbols of $\Sigma$. Then, for each $k \ge 1$ there exists a bounded expression $b_\Gamma$ over $\delta$ such that $DF^{(k)}_X(b_\Gamma, G) = L^{(k)}_X(G)$.
Proof: We first establish the claim that for each $k \ge 1$, there exists a bounded expression $b_\Gamma$ over $\delta$ such that $Pk(Sz^{df}_X(G, k) \cap b_\Gamma) = Pk(Sz^{df}_X(G, k))$. By Lemma 4, $Sz^{df}_X(G, k)$ is a regular language, and by Theorem 8, there exists a bounded expression $b_\Gamma$ over $\delta$ such that $Pk(Sz^{df}_X(G, k) \cap b_\Gamma) = Pk(Sz^{df}_X(G, k))$, which proves the claim. Next we prove that $DF^{(k)}_X(b_\Gamma, G) = L^{(k)}_X(G)$.
Let $\delta = \langle X_i \to v_i\rangle_{i=1}^{m}$ be the sequence of productions of $G$, taken in some fixed order. For each right-hand side $v_i$ of a production in $\delta$, let $pk(v_i) \in \mathbb{Z}^d$ be the Parikh image of the subword obtained by taking the projection of $v_i$ on $a_1, \ldots, a_d$. Let $\Pi = [pk(v_i)]_{i=1}^{m}$ be the $m \times d$ matrix whose rows are the $pk(v_i)$ vectors. Let $X \xRightarrow{\gamma} w$. Then we have $Pk(w) = Pk(\gamma) \times \Pi$, and consequently, $Pk(\gamma_1) = Pk(\gamma_2)$ implies that $Pk(w_1) = Pk(w_2)$ for any two derivations $X \xRightarrow{\gamma_i} w_i$ of $G$, $i = 1, 2$. Moreover, the assumption $L_X(G) \subseteq a_1^* \ldots a_d^*$ where $a_1, \ldots, a_d$ are distinct symbols shows that we further have $w_1 = w_2$.
We prove that $L^{(k)}_X(G) \subseteq DF^{(k)}_X(b_\Gamma, G)$, the other direction being immediate. By Lemma 4, we have $L^{(k)}_X(G) = DF^{(k)}_X(G)$. Let $w \in DF^{(k)}_X(G)$ be a word, and $X \xRightarrow{\gamma} w$ be a depth-first derivation of $w$. Since $Pk(Sz^{df}_X(G, k) \cap b_\Gamma) = Pk(Sz^{df}_X(G, k))$, there exists a control word $\beta \in Sz^{df}_X(G, k) \cap b_\Gamma$ such that $Pk(\beta) = Pk(\gamma)$, hence $X \xRightarrow{\beta} w'$ and $w' = w$ as shown above. □
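The identity $Pk(w) = Pk(\gamma) \times \Pi$ used in the proof of Lemma 5 can be checked numerically on a toy grammar. The sketch below uses the grammar $X \to a X b \mid \varepsilon$, whose language is included in $a^* b^*$; the grammar, the control word, and all helper names are chosen for illustration.

```python
PRODUCTIONS = [("X", ["a", "X", "b"]), ("X", [])]  # p0: X -> a X b, p1: X -> eps
LETTERS = ["a", "b"]                               # a_1, a_2

def parikh(seq, alphabet):
    """Occurrence count of each alphabet element in the sequence."""
    return [sum(1 for s in seq if s == u) for u in alphabet]

def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def apply_control(start, control):
    """Apply the productions of `control` in order, always rewriting the
    leftmost occurrence of the head (enough for this linear grammar)."""
    form = [start]
    for idx in control:
        head, body = PRODUCTIONS[idx]
        pos = form.index(head)
        form[pos:pos + 1] = body
    return form

# Pi has one row per production: the Parikh image of its right-hand side
# projected on the letters a_1, ..., a_d.
PI = [parikh(body, LETTERS) for _, body in PRODUCTIONS]

gamma = [0, 0, 1]                                   # control word p0 p0 p1
w = apply_control("X", gamma)                       # the derived word
lhs = parikh(w, LETTERS)                            # Pk(w)
rhs = vec_mat(parikh(gamma, range(len(PRODUCTIONS))), PI)  # Pk(gamma) x Pi
```

For this control word the derived word is `aabb` and both sides of the identity evaluate to the vector (2, 2).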
For the rest of this section, let $G = (\mathcal{X}, \Theta, \delta)$ be a visibly pushdown grammar (we ignore for the time being the distinction between tagged and untagged alphabet symbols), and $X_0 \in \mathcal{X}$ be an arbitrarily chosen nonterminal, and let $b = w_1^* \ldots w_d^*$ be a bounded expression, where $w_i = b^{(i)}_1 \ldots b^{(i)}_{j_i} \in \Theta^*$, for every $1 \le i \le d$. Let $G^b = (\mathcal{X}^b, \Theta, \delta^b)$ be the regular grammar, where $\mathcal{X}^b = \{q^{(s)}_r \mid 1 \le s \le d \wedge 1 \le r \le j_s\}$ and:
$\delta^b = \{q^{(s)}_i \to b^{(s)}_i\, q^{(s)}_{i+1} \mid 1 \le s \le d \wedge 1 \le i < j_s\}$    (1)
$\quad \cup\ \{q^{(s)}_{j_s} \to b^{(s)}_{j_s}\, q^{(s')}_1 \mid 1 \le s \le s' \le d\}$    (2)
$\quad \cup\ \{q^{(s)}_1 \to \varepsilon \mid 1 \le s \le n\}$.
It is routine to check that $\bigcup_{s=1}^{d} L_{q^{(s)}_1}(G^b) = w_1^* \ldots w_d^*$. Next, we define $G^{\bowtie} = (\mathcal{X}^{\bowtie}, \Theta, \delta^{\bowtie})$:
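– $\mathcal{X}^{\bowtie} = \{X_0^{\bowtie}\} \cup \{[q^{(s)}_r X q^{(x)}_y] \mid X \in \mathcal{X} \wedge q^{(s)}_r \in \mathcal{X}^b \wedge q^{(x)}_y \in \mathcal{X}^b \wedge s \le x\}$
– $\delta^{\bowtie}$ contains, for every $1 \le s \le x \le n$, a production $X_0^{\bowtie} \to [q^{(s)}_1 X_0 q^{(x)}_1]$, and:
  • for every production $X \to \tau \in \delta$:
    $[q^{(s)}_r X q^{(x)}_y] \to \tau \in \delta^{\bowtie}$ if $q^{(s)}_r \to \tau\, q^{(x)}_y \in \delta^b$    (3)
  • for every production $X \to \tau\, Y \in \delta$:
    $[q^{(s)}_r X q^{(x)}_y] \to \tau\, [q^{(z)}_t Y q^{(x)}_y] \in \delta^{\bowtie}$ if $q^{(s)}_r \to \tau\, q^{(z)}_t \in \delta^b$    (4)
  • for every production $X \to \tau\, Z\, \sigma\, Y \in \delta$:
    $[q^{(s)}_r X q^{(x)}_y] \to \tau\, [q^{(z)}_t Y q^{(v)}_u]\, \sigma\, [q^{(\ell)}_k Z q^{(x)}_y]$ if $q^{(s)}_r \to \tau\, q^{(z)}_t \in \delta^b$ and $q^{(v)}_u \to \sigma\, q^{(\ell)}_k \in \delta^b$    (5)
The set $\delta^{\bowtie}$ contains no other productions. For each nonterminal $[q^{(s)}_r X q^{(x)}_y] \in \mathcal{X}^{\bowtie}$, we define $\xi([q^{(s)}_r X q^{(x)}_y]) = X$. Further $\xi(X_0^{\bowtie}) = X_0$. This notation is extended to productions from $\delta^{\bowtie}$, hence to sequences of productions, in the obvious way. Further we define $\xi(\Gamma) = \{\xi(d) \mid d \in \Gamma\}$ where $\Gamma$ is a control set over $\delta^{\bowtie}$. Finally, for a derivation $D^{\bowtie} \equiv X_0^{\bowtie} \Rightarrow [q^{(s)}_1 X_0 q^{(x)}_1] \Longrightarrow^* w$ in $G^{\bowtie}$, let $\xi(D^{\bowtie}) \equiv X_0 \Longrightarrow^* w$ be the derivation of $G$ obtained by applying $\xi$ to each production $p$ in $D^{\bowtie}$.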
Lemma 6. Let $G = (\mathcal{X}, \Theta, \delta)$ be a visibly pushdown grammar, $X_0 \in \mathcal{X}$ be a nonterminal such that $L_{X_0}(G) \subseteq b$ for a bounded expression $b = w_1^* \ldots w_d^*$. Then for every $k \ge 1$, the following hold:
1. $L^{(k)}_{X_0}(G) = L^{(k)}_{X_0^{\bowtie}}(G^{\bowtie})$;
2. given a control set $\Gamma$ over $\delta^{\bowtie}$ such that $DF^{(k)}_{X_0^{\bowtie}}(\Gamma, G^{\bowtie}) = L^{(k)}_{X_0^{\bowtie}}(G^{\bowtie})$, the control set $\Gamma' = \xi(\Gamma)$ over $\delta$ satisfies $DF^{(k)}_{X_0}(\Gamma', G) = L^{(k)}_{X_0}(G)$.
Proof: (sketch) The proof of point 1 is by induction. So we actually show the following stronger statement. Let $k \ge 1$ and let $w \in \Sigma^*$. We show that $[q^{(s)}_r X q^{(u)}_v] \xRightarrow[G^{\bowtie}]{(k)}{}^{*} w$ iff $q^{(s)}_r \Rightarrow^* w\, q^{(u)}_v$ and $X \xRightarrow[G]{(k)}{}^{*} w$. The proof for the if direction is by induction on the length $i$ of $X \xRightarrow[G]{(k)}{}^{*} w$.
“$i = 1$” Then $X \xRightarrow[G]{(k)} w$ for some production $X \to \tau$ of $\delta$ with $w = \tau$. Also $q^{(s)}_r \to \tau\, q^{(u)}_v$ is in $\delta^b$ and so by definition of $G^{\bowtie}$ we have $[q^{(s)}_r X q^{(u)}_v] \to \tau$ in $\delta^{\bowtie}$ and we are done.
“$i > 1$” We do a case analysis according to the tail of the first production in $X \xRightarrow[G]{(k)}{}^{*} w$.
– $X \xRightarrow[G]{(k)} \tau\, X' \xRightarrow[G]{(k)}{}^{*} \tau\, w' = w$, which implies that $X \to \tau\, X'$ is in $\delta$. Further, $q^{(s)}_r \Rightarrow^* w\, q^{(u)}_v$ shows that there exists $q^{(s)}_r \Rightarrow \tau\, q^{(s')}_{r'} \Rightarrow^* \tau\, w'\, q^{(u)}_v$, hence that $q^{(s)}_r \to \tau\, q^{(s')}_{r'}$ is in $\delta^b$, and finally that $[q^{(s)}_r X q^{(u)}_v] \to \tau\, [q^{(s')}_{r'} X' q^{(u)}_v]$ belongs to $\delta^{\bowtie}$. Also we conclude from the hypothesis that $X' \xRightarrow[G]{(k)}{}^{*} w'$ and $q^{(s')}_{r'} \Rightarrow^* w'\, q^{(u)}_v$ and so, by induction hypothesis, we find that $[q^{(s')}_{r'} X' q^{(u)}_v] \xRightarrow[G^{\bowtie}]{(k)}{}^{*} w'$ and we are done.
– $X \xRightarrow[G]{(k)} \tau\, X_1\, \sigma\, X_2 \xRightarrow[G]{(k)}{}^{*} \tau\, w_1\, \sigma\, w_2 = w$ and so there exists $X \to \tau\, X_1\, \sigma\, X_2$ in $\delta$. Moreover, since $q^{(r)}_s \Rightarrow^* w\, q^{(u)}_v$ we find that there exist $q^{(r)}_s \Rightarrow \tau\, q^{(b)}_a \Rightarrow^* \tau\, w_1\, q^{(b')}_{a'} \Rightarrow \tau\, w_1\, \sigma\, q^{(d)}_c \Rightarrow^* \tau\, w_1\, \sigma\, w_2\, q^{(u)}_v$. Hence, the definition of $G^{\bowtie}$ shows that $[q^{(r)}_s X q^{(u)}_v] \to \tau\, [q^{(b)}_a X_1 q^{(b')}_{a'}]\, \sigma\, [q^{(d)}_c X_2 q^{(u)}_v]$. On the other hand, since $X_1 X_2 \xRightarrow[G]{(k)}{}^{*} w_1 w_2$ (simply delete $\tau$ and $\sigma$), Lemma 1 shows that either $X_1 \xRightarrow[G]{(k-1)}{}^{*} w_1$ and $X_2 \xRightarrow[G]{(k)}{}^{*} w_2$; or $X_1 \xRightarrow[G]{(k)}{}^{*} w_1$ and $X_2 \xRightarrow[G]{(k-1)}{}^{*} w_2$. Let us assume the latter holds (the other being treated similarly). Applying the induction hypothesis, we find that $[q^{(b)}_a X_1 q^{(b')}_{a'}] \xRightarrow[G^{\bowtie}]{(k)}{}^{*} w_1$ and $[q^{(d)}_c X_2 q^{(u)}_v] \xRightarrow[G^{\bowtie}]{(k-1)}{}^{*} w_2$, hence we conclude the case with the k-index derivation $[q^{(r)}_s X q^{(u)}_v] \xRightarrow[G^{\bowtie}]{(k)} \tau\, [q^{(b)}_a X_1 q^{(b')}_{a'}]\, \sigma\, [q^{(d)}_c X_2 q^{(u)}_v] \xRightarrow[G^{\bowtie}]{(k)}{}^{*} \tau\, [q^{(b)}_a X_1 q^{(b')}_{a'}]\, \sigma\, w_2 \xRightarrow[G^{\bowtie}]{(k)}{}^{*} \tau\, w_1\, \sigma\, w_2$.
To conclude the “if” case, observe that $L^{(k)}_{X_0}(G) \subseteq w_1^* \cdots w_d^*$ implies that for every $w \in L^{(k)}_{X_0}(G)$ we have $X_0 \xRightarrow[G]{(k)}{}^{*} w$ and also $q^{(s)}_1 \Rightarrow^* w\, q^{(s')}_1$, hence that $w \in L^{(k)}_{X_0^{\bowtie}}(G^{\bowtie})$.
Using a similar induction on the length of the derivation $[q^{(s)}_r X q^{(u)}_v] \xRightarrow[G^{\bowtie}]{(k)}{}^{*} w$, the “only if” direction is easily proved.
For the proof of point 2, the “$\subseteq$” direction is obvious by definition of depth-first derivation. For the reverse direction “$\supseteq$”, point 1 shows that $L^{(k)}_{X_0}(G) = L^{(k)}_{X_0^{\bowtie}}(G^{\bowtie})$, hence using the assumption we find that $DF^{(k)}_{X_0^{\bowtie}}(\Gamma, G^{\bowtie}) = L^{(k)}_{X_0}(G)$. So let $D \equiv X_0^{\bowtie} \xRightarrow[G^{\bowtie}]{(k)}{}^{*} w$ be a depth-first k-index derivation of $G^{\bowtie}$ with control word conforming to $\Gamma$. Now consider $\xi(D)$; it defines again a depth-first k-index derivation. Further, the definition of $\xi$ shows that the word generated by $\xi(D)$ is $w$. □
Let $\mathcal{A} = \{a_1, \ldots, a_d\}$ be an alphabet disjoint from $\Theta$, and let $h : \mathcal{A} \to \Theta^*$ be the language homomorphism defined as $h(a_i) = w_i$, for all $1 \le i \le d$. We now obtain from $G^{\bowtie}$ a grammar $G^a$, over $\mathcal{A}$, such that $L_{X_0}(G^{\bowtie}) = h(L_{X_0}(G^a))$. Define $G^a = (\mathcal{X}^{\bowtie}, \mathcal{A}, \delta^a)$ as the result of applying onto $G^{\bowtie}$ the following transformation on every $p \in \delta^{\bowtie}$: if $p$ was defined using a production $q^{(s)}_r \to \gamma\, q^{(y)}_x \in \delta^b$ where $r = j_s$ then replace the corresponding occurrence of $\gamma$ in $p$ by $a_s$, else ($r \ne j_s$) replace the corresponding occurrence of $\gamma$ by $\varepsilon$. In this way we can map the productions of $G^{\bowtie}$ onto productions of $G^a$. This mapping is extended to the derivations of $G^{\bowtie}$. The information kept within the nonterminals of $\mathcal{X}^{\bowtie}$ is sufficient to also define the reverse mapping, from the productions (derivations) of $G^a$ back to the productions (derivations) of $G^{\bowtie}$. We define the mapping $\nu : \delta^a \to \delta^{\bowtie}$ as follows, for $a, b \in \mathcal{A} \cup \{\varepsilon\}$:
– $\nu\big([q^{(s)}_r X q^{(x)}_y] \to a\big) = [q^{(s)}_r X q^{(x)}_y] \to b^{(s)}_r$
– $\nu\big([q^{(s)}_r X q^{(x)}_y] \to a\, [q^{(z)}_t Y q^{(x)}_y]\big) = [q^{(s)}_r X q^{(x)}_y] \to b^{(s)}_r\, [q^{(z)}_t Y q^{(x)}_y]$
– $\nu\big([q^{(s)}_r X q^{(x)}_y] \to a\, [q^{(z)}_t Y q^{(v)}_u]\, b\, [q^{(\ell)}_k Z q^{(x)}_y]\big) = [q^{(s)}_r X q^{(x)}_y] \to b^{(s)}_r\, [q^{(z)}_t Y q^{(v)}_u]\, b^{(v)}_u\, [q^{(\ell)}_k Z q^{(x)}_y]$
Lemma 7. Let $G = (\mathcal{X}, \Theta, \delta)$ be a visibly pushdown grammar, $X_0 \in \mathcal{X}$ be a nonterminal, and $w_1^* \ldots w_d^*$ be a bounded expression over $\Theta$. Also let $\mathcal{A} = \{a_1, \ldots, a_d\}$ be an alphabet disjoint from $\Theta$, and $h : \mathcal{A} \to \Theta^*$ be the homomorphism defined as $h(a_i) = w_i$, for all $1 \le i \le d$. Then for every $k \ge 1$, the following hold:
1. $L^{(k)}_{X_0^{\bowtie}}(G^a) = h^{-1}(L^{(k)}_{X_0^{\bowtie}}(G^{\bowtie})) \cap a_1^* \cdots a_d^*$;
2. given a control set $\Gamma^a$ over $\delta^a$ such that $DF^{(k)}_{X_0^{\bowtie}}(\Gamma^a, G^a) = L^{(k)}_{X_0^{\bowtie}}(G^a)$, the control set $\Gamma' = \nu(\Gamma^a)$ over $\delta^{\bowtie}$ satisfies $DF^{(k)}_{X_0^{\bowtie}}(\Gamma', G^{\bowtie}) = L^{(k)}_{X_0^{\bowtie}}(G^{\bowtie})$.
Proof: (sketch) The proof of point 1 is by induction, showing the following stronger statement: let $w \in \Theta^*$; then we have $[q^{(s)}_r X q^{(u)}_v] \xRightarrow[G^{\bowtie}]{(k)}{}^{*} w$ iff $[q^{(s)}_r X q^{(u)}_v] \xRightarrow[G^a]{(k)}{}^{*} h^{-1}(w) \cap a_1^* \cdots a_d^*$. The proof is done by induction on the length of the derivations, similarly to Lemma 6. It follows that $L^{(k)}_{X_0^{\bowtie}}(G^a) = \{a_1^{i_1} \cdots a_d^{i_d} \mid w_1^{i_1} \cdots w_d^{i_d} \in L^{(k)}_{X_0^{\bowtie}}(G^{\bowtie})\}$, hence that $h(L^{(k)}_{X_0^{\bowtie}}(G^a)) = L^{(k)}_{X_0^{\bowtie}}(G^{\bowtie})$ by definition of $h$ and since $L^{(k)}_{X_0^{\bowtie}}(G^{\bowtie}) \subseteq w_1^* \cdots w_d^*$. For point 2, applying $h$ on both sides of the assumption, we obtain $h(DF^{(k)}_{X_0^{\bowtie}}(\Gamma^a, G^a)) = h(L^{(k)}_{X_0^{\bowtie}}(G^a))$, hence $h(DF^{(k)}_{X_0^{\bowtie}}(\Gamma^a, G^a)) = L^{(k)}_{X_0^{\bowtie}}(G^{\bowtie})$ by point 1. To conclude the proof, it is sufficient to show that $h(DF^{(k)}_{X_0^{\bowtie}}(\Gamma^a, G^a)) = DF^{(k)}_{X_0^{\bowtie}}(\nu(\Gamma^a), G^{\bowtie})$. Again an induction proof is called for. □
Before proving Theorem 4 we recall the following result about homomorphisms and bounded languages. Let $g : \Sigma \to \mathcal{A}^*$ be a homomorphism that maps each symbol of $\Sigma$ into a word over $\mathcal{A}$, and $L \subseteq w_1^* \ldots w_d^*$ where $w_1^* \cdots w_d^*$ is a bounded expression. Then $g(L)$ is also bounded.⁹
Finally, the actual proof of Theorem 4 goes as follows.
Proof: Let $\mathcal{A} = \{a_1, \ldots, a_d\}$ be an alphabet disjoint from $\Theta$, and let $h : \mathcal{A} \to \Theta^*$ be the language homomorphism defined by $h(a_i) = w_i$, for all $1 \le i \le d$. By applying Lemma 6 (first point), and then Lemma 7 (first point), we find that $L^{(k)}_{X_0}(G) = h(L^{(k)}_{X_0^{\bowtie}}(G^a))$. Next, applying Lemma 5 on $L^{(k)}_{X_0^{\bowtie}}(G^a)$ we obtain a bounded expression $b_{\Gamma^a}$ over $\delta^a$ such that $DF^{(k)}_{X_0^{\bowtie}}(b_{\Gamma^a}, G^a) = L^{(k)}_{X_0^{\bowtie}}(G^a)$. Our next step is to apply the results of Lemma 7 (second point) and Lemma 6 (second point), in that order, to obtain that $L^{(k)}_{X_0}(G) = DF^{(k)}_{X_0}(\xi(\nu(b_{\Gamma^a})), G)$. Finally, since $b_{\Gamma^a}$ is a bounded expression, and $\xi$ and $\nu$ are homomorphisms (and so is the composition $\xi \circ \nu$), we have that $\xi(\nu(b_{\Gamma^a}))$ is bounded, hence included in a bounded expression, and we are done. □
A.6 Proof of Theorem 3
Lemma 8. Let $G = (\mathcal{X}, \Theta, \delta)$ be a visibly pushdown grammar such that for all productions $p \in \delta$ all nonterminals occurring in $tail(p)$ are distinct. Let $X \in \mathcal{X}$ and $\gamma \in \delta^*$; then there exists at most one depth-first derivation of $G$ with control word $\gamma$, hence at most one word resulting from it.
⁹ Alternatively, it can also be shown using Theorem 5.
Proof: By contradiction, suppose there exist two depth-first derivations from $X$ with
control word $p_1 \ldots p_n$. This means that there exists an $i$, $1 \le i \le n$, such that
$X = w_0 \stackrel{p_1}{\Longrightarrow} w_1 \cdots w_{i-1} \stackrel{p_i}{\Longrightarrow} w_i$ and $w_i$ contains two occurrences of the nonterminal $head(p_i)$, that is,
$w_i = \alpha A_1 \beta A_2 \gamma$ where $A_1 = A_2 = head(p_i)$ and $\alpha, \beta, \gamma \in (\Sigma \cup \Theta^*)$. Two cases arise:
1. $A_1$ and $A_2$ result from the occurrence of some $p_j$ with $j < i$, which contradicts that
all nonterminals occurring in $tail(p_j)$ are distinct.
2. $A_1$ and $A_2$ result from the occurrences of $p_k$ and $p_l$ with $k \neq l$, respectively. Following
the definition of depth-first derivation, $p_i$ must be applied to $A_1$ if $k > l$, and to $A_2$
if $k < l$. In either case $p_i$ can be applied to only one of the two occurrences, which
contradicts the existence of two depth-first derivations. □
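The selection rule used in case 2 of the proof can be replayed mechanically. The sketch below is illustrative only: the productions `P1`, `P2`, `P3` and the function `replay` are hypothetical, and the code implements just the rule stated in the proof (rewrite the most recently created occurrence of the head), which may be weaker than the full definition of depth-first derivation given earlier in the paper. When every tail has distinct nonterminals, two occurrences of the same nonterminal never share a creation step, so the choice at each step is forced.

```python
# Hypothetical productions, written as (head, tail) pairs.
P1 = ("X", ["a", "X", "Y"])
P2 = ("X", ["b"])
P3 = ("Y", ["c"])


def replay(start, control_word):
    """Replay a control word from the nonterminal `start`.

    Each step rewrites the most recently created occurrence of the
    production's head. Returns the resulting list of symbols, or None if
    some production has no occurrence of its head to rewrite. Raises
    ValueError if the choice of occurrence is ambiguous.
    """
    word = [(start, 0)]  # pairs (symbol, step at which it was created)
    for step, (head, tail) in enumerate(control_word, start=1):
        candidates = [i for i, (sym, _) in enumerate(word) if sym == head]
        if not candidates:
            return None
        newest = max(word[i][1] for i in candidates)
        picks = [i for i in candidates if word[i][1] == newest]
        if len(picks) != 1:
            # Only possible if some tail repeats a nonterminal.
            raise ValueError("ambiguous occurrence of " + head)
        i = picks[0]
        word[i:i + 1] = [(sym, step) for sym in tail]
    return [sym for sym, _ in word]
```

For instance, `replay("X", [P1, P1, P2, P3, P3])` rewrites the two occurrences of `Y` in order of decreasing creation step, whereas a production with tail `["Y", "Y"]` would make the next rewrite of `Y` ambiguous and trigger the `ValueError`.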
Note that, because the grammars of this paper stem from programs, we can
assume without loss of generality that the condition on $tail(p)$ holds for every
production $p$ of every grammar in this paper.
Finally, the proof of Thm. 3 goes as follows:
Proof: (sketch) Since $P$ is bounded periodic, we can apply Theorem 4, showing that there
exists a bounded expression $b_\Gamma$ over $\delta$ such that $DF^{(k)}_Q(b_\Gamma, G_P) = L^{(k)}_Q(G_P)$. Hence we
find that
\[ [\![P]\!]^{(k)}_q = \bigcup_{\alpha \in L^{(k)}_Q(G_P)} [\![\alpha]\!] = \bigcup_{\alpha \in DF^{(k)}_Q(b_\Gamma, G_P)} [\![\alpha]\!] . \]
Let $\alpha \in DF^{(k)}_Q(b_\Gamma, G_P)$ and let $\gamma$ be the control word of the derivation $D_\gamma$ thereof,
which is unique by Lemma 8. We then prove that $D_\gamma$ corresponds to a unique interprocedurally valid path $\beta$ of $query^k_Q$, that is, $\beta \in L_{query^k_Q}(G_H)$. This is, however, easily
seen by looking at the code of $query^k_Q$, whose control flow follows precisely a depth-first
$k$-index derivation. Because the control word $\gamma$ over $\delta$ uniquely determines $D_\gamma$, and hence $\beta$,
we conclude that there exists a function $f$ that associates with each word over $\widehat{\Theta}$ a unique
word over $\widehat{\Delta}$ (call $\widehat{\Delta}$ the alphabet of $G_H$). Moreover, define $f(\varepsilon) = \varepsilon$. Basically, $f$ maps each
production $p$ of $D_\gamma$ to a labelled statement $p$ in $H$. Moreover, between two consecutive
labelled statements $p$ and $p'$, $f$ stuffs a sequence of statements of $H$, which is unique
because $D_\gamma$ is unique.
Next, we show that $f(b_\Gamma)$ is a bounded regular set over $\widehat{\Delta}$ using Thm. 5. To this end,
we need to show that $f$ satisfies the properties (i) to (iv) in Thm. 5. Since, as explained above, $f$ stuffs sequences of statements between consecutive
productions, it is seen that (i) holds. Also, (ii) holds because the number of statements
added between any two consecutive productions is bounded, (iii) holds by definition,
and finally (iv) holds because $f^{-1}$ consists in deleting statements not referring to productions, which clearly preserves regularity.
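The two observations about $f$ used above (a bounded number of statements is added between consecutive productions, and $f^{-1}$ merely deletes non-production statements) can be illustrated on a toy instance. The sketch below is an assumption-laden simplification: the labels `p1`, `p2`, `p3`, the statements `s1`, `s2`, `s3` and the `FILLER` table are all hypothetical, and the filler is taken to depend only on the pair of consecutive productions, whereas in the proof it is determined by the derivation $D_\gamma$.

```python
# Hypothetical production labels and filler statements of H.
PRODUCTIONS = {"p1", "p2", "p3"}
FILLER = {
    ("p1", "p1"): ["s3"],
    ("p1", "p2"): ["s1", "s2"],
    ("p2", "p3"): [],
}

# Bound on the number of statements stuffed between two productions.
MAX_FILLER = max(len(stmts) for stmts in FILLER.values())


def f(word):
    """Map a word of productions to a statement sequence by stuffing the
    filler statements between each pair of consecutive productions."""
    out = []
    for i, p in enumerate(word):
        out.append(p)
        if i + 1 < len(word):
            out.extend(FILLER.get((p, word[i + 1]), []))
    return out


def f_inverse(statements):
    """Recover the word of productions by deleting all other statements."""
    return [s for s in statements if s in PRODUCTIONS]
```

In this toy setting `f([])` is the empty sequence, `f_inverse(f(w))` returns `w`, and the length of `f(w)` is at most `(1 + MAX_FILLER) * len(w)`.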
We then conclude from Thm. 5 that $f(b_\Gamma)$ is a bounded and regular language. Back
to $[\![H]\!]_{query^k_Q}$, we find that
\[ [\![H]\!]_{query^k_Q} = \bigcup_{\gamma \in L_{query^k_Q}} [\![\gamma]\!] = \bigcup_{\gamma \in L_{query^k_Q} \cap f(b_\Gamma)} [\![\gamma]\!] \]
and that $[\![H]\!]_{query^k_Q}$ is flattable since $f(b_\Gamma)$ is a bounded regular set. □