Converting Nested Algebra
Flat Algebra Expressions
Expressions
into
JAN PAREDAENS
University
of Antwerp
and
DIRK VAN GUCHT
Indiana
University
Nested relations generalize ordinary flat relations by allowing tuple values to be either atomic
or set valued. The nested algebra is a generalization
of the flat relational
algebra to manipulate
nested relations. In this paper we etudy the expressive power of the nested algebra relative to its
operation on flat relational
databases. We show that the flat relational
algebra is rich enough to
extract the same “flat information”
from a flat database as the nested algebra does. Theoretically, this result implies that recursive queries such as the transitive
closure of a binary relation
cannot be expressed in the neeted algebra. Practically,
this result is relevant to (flat) relational
query optimization.
Categories and Subject Descriptors:
guages; H.3.3 [Information
Storage
H.2.3 [Database
and Retrieval]:
Management]:
Languages—query
lanInformation
Search and Retrieval—query
formulation
General
Terms: Algorithms,
Languages,
Theory
Additional
Key Words and Phrases: Algebraic
calculus, nested relations, relational
databases
1.
transformation,
nested
algebra,
nested
INTRODUCTION
In 1977 Makinouchi
by
query
removing
the
[29] proposed
first
normal
generalizing
form
introduced
a generalization
of the
relations
with set-valued
attributes
assumption.
the relational
Jaeschke
ordinary
relational
and by adding two
database
and
model
Schek
model
by
restructuring
[24]
allowing
opera-
tors, the nest and the unnest
operators,
to manipulate
such homogeneously
nested relations
of powerset
type. A similar
generalization
was proposed by
Authors’ addresses: J. Paredaens, Department
of Mathematics
and Computer Science, University of Antwerp,
B-261O Antwerpen,
Belgium;
D. Van Gucht, Computer Science Department,
Indiana University,
Bloomington,
IN 47405.
Permission to copy without fee all or part of thie material is granted provided that the copies are
not made or distributed
for direct commercial advantage, the ACM copyright notice and the title
of the publication
and its date appear, and notice is given that copying is by permission of the
Association for Computing Machinery.
To copy otherwise, or to republish, requires a fee and/or
specific permission.
An abstract of this paper appeared as “Possibilities
and Limitations
of Using Flat Operators in
of the 7th ACM
SIGACT–SIGMOD–
SIGART
in Proceedings
Nested Algebra Expressions,”
Symposium
on Principles
of Database
Systems (Austin, Tex.), 1988, pp. 29-38.
@ 1992 ACM 0362-5915/92/0300-0065
$01.50
ACM Transactions
on Database Systems, Vol. 17, No. 1, March 1992, Pages 65-93.
66
J. Paredaens and D. Van Gucht
.
Ozsoyoglu,
manipulate
Ozsoyoglu,
one-level
and Mates [31].
nested relations
(In addition,
and showed
they defined a calculus to
equivalence
of their alge-
braic and calculus-like
data manipulation
languages.)
Thomas
and Fischer
[43] generalized
Jaeschke and Schek’s model and algebra
and allowed
nested
relations
of arbitrary
(but fixed) depth (see also [2, 4, 12, 18, 21, 32, 381).
Roth,
the
Korth,
nested
and Silberschatz
relational
[37] defined
model
of Thomas
SQL-like
query
languages
guages [22], and datalog-like
duced
for this
[7, 13-17,
relational
model
a calculus-like
and
query
Fischer.
Since
[28, 34, 35, 37], graphical-oriented
languages
[1, 5, 6, 9, 25, 30] have
or for slight
generalizations
of it.
Also,
for
numerous
query
lanbeen intro-
various
27, 33, 39, 411 have started with the implementation
database
model, some on top of an existing
database
system and others from scratch.
In this paper we focus on the nested algebra
as proposed
Fischer
[43]. The key problem
we address
is the following:
language
then,
groups
of the nested
management
by Thomas
and
“What
is the
expressive
power of the nested algebra
when we are only interested
in its
operation
on flat relations
and when the result is also flat?”
In other words,
“IS it possible to write
queries
in the nested algebra
with flat operands
as
input and with a flat output
that cannot be expressed
relational
algebra?”
We show that the answer to this
Hence,
from
the flat
a flat
theoretical
algebra
database
as well
is rich
as the
enough
nested
as practical
to extract
algebra
in the ordinary
(flat)
question
is negative.
the same
does. This
“flat
result
information”
has interesting
consequences:
—Recursive
queries such as the transitive
closure of a binary
relation
cannot
be expressed
in the nested algebra,
because if they could, the transitive
closure could be expressed in the flat algebra,
contradicting
and Unman
[3]. In contrast,
when the powerset
operator
a result of Aho
is added to the
nested algebra,
then it is possible, as shown by Abiteboul
and Beeri [1] and
by Gyssens and Van Gucht [19, 20], to express recursive
queries.
Hence,
the nested algebra
with
the powerset
operator
is strictly
more powerful
than the classical
nested algebra
studied
in this paper. In this light,
it is
also interesting
to mention
the results
of Hull
and Su [23] and of Kuper
and Vardi [26] regarding
the expressiveness
of the powerset
algebra.
They
show that
there
exists
set algebra
expressions.
a noncollapsing
In particular,
hierarchy
of classes of powerthey
show
that
the
class of
powerset
algebra
expressions
in which one can use up to n (n = O) (nested)
applications
of the powerset
operator
is strictly
less expressive
than the
class of nested
algebra
expressions
that
can use up to n + 1 (nested)
applications
of the powerset
operator.
Our result
shows that an analogous
property
does not hold for the nested
algebra;
that
is, the number
allowed
(nested)
applications
of the nest operator
plays no role in
expressive
power of the nested algebra
expressions
(for a related result,
of
the
see
[81).
—Several
researchers
systems that
flat relational
[33, 41] have
proposed
building
database
management
support
the nested relational
database
model and provide
a
interface.
Our result
indicates
that such systems have the
ACM Transactions on Database Systems, Vol 17, No. 1, March 1992
Converting Nested Algebra Expressions
potential
to optimize
the user’s
language,
by using
it is an interesting
relational
result.
properties
research
query
query,
expressed
in a flat
.
relational
67
query
of the nested algebra
[10, 40]. Alternatively,
problem
to study the degree to which current
optimizers
can
be extended
to
take
advantage
of this
2. FORMALISM
In this
section
and
nested
2.1
Scheme
we define
relation
schemes,
relation
instances,
nested
algebra,
calculus.
of a Relation
We define
schemes to be lists of attributes.
Two kinds
of attributes
are
considered:
atomic
attributes,
represented
by their
name,
and structured
attributes,
represented
by their name followed
by their scheme.
( Attribute)
Such an attribute
is called
+ (Identifier)
atomic,
and
(Identifier)
is called
the
name
of the
attribute.
(Attribute)
Such
an attribute
is called
+ (Identifier)
structured,
and
(Scheme)
(Identifier)
is called
the
name
of
the attribute.
(List-of-attributes)
+ (Empty
-1ist)
I (Non_ empty _list-of-attributes)
(Non_ empty -list-o
f_attributes)
lists
A ~ and
+ ( Attribute)
I ( Attribute),
(Non_ empty-list-of-attributes)
Two
of attributes
Az are called
compatible
(i.e.,
the
attributes
have the same nesting
structure)
if and only if (iff) they
empty,
or Al = CYl, A’l and
Az = az, h’z with
A’l, A’z compatible
attributes
with
and
compatible
al and
az both
atomic
attributes
or both
structured
(ordered)
are both
lists
of
attributes
schemes.
(Scheme)
All of the identifiers
compatible
iff their
+ (_(Non-empty-list-o
f-attributes)_)
in a scheme have to be different.
Two schemes are called
respective
lists of attributes
are compatible.
A scheme is
called fZat iff all of its attributes
are atomic.
These are some examples
of schemes:
(A,
(A,
A database
scheme
2.2
Instances
Let
S be the
atomic
B,c)
E(D(B),
C))
is a set of schemes.
of a Relation
scheme (al,
or structured.
The
{s I s is a finite
Scheme
. . . , a.),
instances
subset
where
ai stands for an attribute,
either
of S, denoted
Inst(S),
are the set
of ualues(al)
ACM Transactions
x . . . x ualues(am)},
on Database Systems, Vol. 17, No, 1, March 1992.
68
J. Paredaens and D. Van Gucht
.
A
E(D(B)
C)
ABC’
0
000
100
101
I
110
111
1
200
201
0
1
(b)
(a)
(a) Instance
o
El
210
Fig. 1.
u
El
1
.sl of lnst( A, B, C); (b) instance
S2 of Znst( A, E(D/)(B),
C))
where
ualues( A) is the set of natural
numbers
for
and zmlues( A( h)) = Inst(( h)), where A is a nonempty
We need to make the following
remarks:
A, an atomic
attribute,
list of attributes.
—All
namely,
atomic
natural
–Two
schemes
In Figure
Inst(
2.3
attributes
have
the
same
ualues
set,
the
set of the
numbers.
are compatible
1 we show
A, lZ(D(B),
Nested
We define
iff their
an instance
sets of instances
s ~ of Inst(
are equal.
A, B, C) and
an instance
Sz of
C’)), respectively.
Algebra
a nested
algebra
for
manipulating
schemes
and
their
instances,
similar
to the one introduced
by Thomas and Fischer
[43]. It should be noted
that the empty operator
and the renaming
operator
were not considered
in
[43]. The empty
operator
is introduced
here to create
empty
structured
component
because
values;
nesting
an instance
with
of nine operators,
(1)
the
nest
an empty
operator
instance
cannot
again
create
yields
such
a single tuple equal to the empty
which are defined
as follows:
operator
–: Let
(x ~) and
intuitively
instance
set. The
Union
operator
U: Let (h ~) and (Az) be compatible
SI e Inst(( A ~)) and Sz e lnst(( Az )). Then SI U S2 is the
SI and SZ and is an instance
of the scheme (A ~).
(2) Difference
values,
an empty
and
algebra
schemes,
(standard)
( Az) be compatible
and let
union of
schemes,
SI e Inst(( A ~)) and Sz e lnst(( AZ )). Then SI – Sz is the (standard)
of SI and s~ and is an instance
of the scheme (l ~).
not
consists
and
let
difference
(3) Cross product
operator
x: Let (h ~) and ( Az) be schemes with no common
identifiers,
and let SI e lnst(( h ~)) and SZe lnst(( Az )). Then SI x SZ is the
(standard)
(A,, A2).
cross product
of SI and
SZ and
is an instance
ACM Transactions on Database Systems, Vol 17, No. 1, March 1992.
of the
scheme
70
J. Paredaens and D. Van Gucht
.
A
D(B)
000
C
El
1
Fig.
3.
(a) Instance
PE(D(B), ~)(sz); (b) in-
D
()
()
stance p~cEJp~c~c~),c)(sz))
()
ABC
—.
100
000
n
1
010
1
El
100
(b)
(a)
(8) Empty
operator
@: Let
(A) be
a scheme,
and
let
A
be
no
identifier
of A. Let s e Inst(( h)). Then DA(s) is an instance
of the scheme ( A( h))
that has only one element,
being the empty instance
of the scheme (h).
(9) Renaming
operator
p: Let
,0(s, A, B) is the (standard)
the scheme
Algebraic
way.
(A), with
the identifier
expressions
An
expression
Ffexpression)
of expressions
(h) be a scheme, and let s e Inst((h)).
Then
renaming
of A by B in s. It is an instance
of
in
of the
the
nested
nested
n,(a2=,(vc;
rrl(sl)
first,
2.4
second,
of the
Nested
are
is called
defined
a
and
second,
third
third,
in
flat-flat
B.
the
usual
expression
are some examples
SI of Figure
1,
x ~2(%)
P~,C,(~D(~)(a2=
results
algebra
algebra
by the identifier
iff its operands
and its result are flat. Here
in the nested algebra.
Reconsider
instance
~1(%)
The
A replaced
3(vC;~(v~;D(sl)
E(vB;
x
OD(sl)
examples
and
fourth
))))
D(s1))))
x ~3(%)
are
ff-expressions;
expressions
the
are
fourth
shown
in
is not.
Figure
The
4.
Calculus
We define a calculus
for manipulating
schemes and their instances,
similar
to the ones introduced
by Roth, Korth,
and Silberschatz
[371 and by Abiteboul
and Beeri [1].
A query in the nested calculus
has the form
{t[h]/f(t)}
list of
t is an identifier,
called the target variable;
X IS a nonempty
The scheme of this
attributes,
called the scheme of t;and f(t)is a formula.
query is (A). The free variables
of this query are those of f, except for t.
where
ACM TransactIons on Database Systems, Vol. 17, No. 1, March 1992
Converting Nested Algebra Expressions
A
D(B)
000
.
69
C
n
A
El
El
El
D(B)
E(C)
100
1
‘mm
mm
101
1
2ENII
200
1
201
n
2EH21
(b)
(a)
Fig. 2.
(a) Instance
VB,~(sl);
(b) instance
VC,~(u~, ~(sl)).
operator
II:
(4) Projection
(k > 1) be the indexes
Let (h) be the scheme (al, . . . . an), let il, . . . . ‘k
of different
attributes
a,,, . . . . a,, of h, and let
,,, (s) is the (standard)
projection
on the a,,
s e Inst((h)).
Then
H,,,
through
u ,k attributes
and is an instance
of the scheme (a ,1, ,aLh)
(5) Selection
different
subset
operator
Vi,, A(s)
is
the
the
scheme
and let
(standard)
of the
with
equal
the
A be no identifier
nest
of
s on
i and
j
h)). Then
(~) have
(A ~, A(XZ)).
scheme
let
s e Inst((
of elements
V: Let
list,
(h) be a scheme;
of ( h), and let
of s consisting
(6) Nest operator
not the empty
in~tance
u: Let
attributes
the
v~zi A(s)
be the
indexes
of
o, =~(s) is the largest
and j components.
i
form
of
(h 1, Az), where
h. Let
attributes
is the
s ● Inst((h)).
of
A2
set of all
X2 is
Then
is
an
elements
and
t,
t in s such that t.restricted
to h ~ is equal
for which there is an element
to Al, and tv(
A(XJ)
= { u restricted
to Xz I u 6s and u
to t restricted
to A ~}. Notice
that
in this
restricted
to h ~ is equal
to tv restricted
definition
an end
we have
sequence
SI of Figure
Figure
2.
(7)
made
the notational
of h and
1. The
not
instances
simplifying
a subsequence
V& D( S1) and
assumption
of h. Reconsider
Vc, ~(VBi
D( s1))
are
that
Az is
instance
Shown
in
Unnest operator
p.’ Let the scheme (N have the form (A 1, @ Az)), and let
s e Inst(( x)). Then
p~(k,)(s)
is the (standard)
unnest
of s on the A( X2 )
attribute
of k and is an instance
of the scheme (X 1, Az). pA(~z)( S) is the set
tk for which
there
is a element
t in s such that
tK
of all elements
restricted
to h ~ is equal to t restricted
to A ~ and tw restricted
to h2 is an
element
of t( A( A2)). Observe that the application
of the unnest operator
can lose information
when applied
to instances
containing
empty-valued
tuple
components.
Reconsider
instance
S2 of Figure
1. The instances
~mmm,
CJ(S2) and
~U~)(PE(~(m,
C)(SZ)) are shown in Figure
3.
ACM Transactions on Database Systems, Vol 17, No
1, March 1992
Converting Nested Algebra Expressions
A
o
0
D
1
000
1
A
2
101
110
0
111
1
Fig. 4.
A formula
can have
of attributes,
the forms
called
is a query
2
Results of some nested algebraic
o
El
1
Cl
o
u
1
u
expressions.
one of the forms
where all f,s are formulas;
ts are different
and differ
A term
L.J
1
71
C
o
ABC’
100
D(B)
.
t is an identifier,
called
from the target
variable);
the scheme
of t.Furthermore,
(term,)
= (term,)
(terml)
e (term,).
a tuple variable
(all these
and h is a nonempty
list
a formula
can have
one of
and has one of the forms
TRUE
FALSE
t
t(a)
(t(a)
is called
a component
oft)
R
where
t is a tuple
bute (atomic
attribute,
is
variable,
R is a relation
identifier,
or structured).
The scheme of t(A),
A. The scheme
of t(A( h)), where
where
A is
and
A
a
a is an attriis an atomic
nonempty
list
of attributes
and A(A) is a structured
attribute,
is (h).
Note that (terml)
= ( termz) is only a formula
if the schemes of (terml)
and
(term,)
are compatible,
and that
(term,) = (term,)
is only a formula
if the
scheme of (terml)
is A and the scheme of (termz) is compatible
with (N.
ACM Transactions
on Database Systems, Vol. 17, No. 1, March 1992
72
J. Paredaens and D, Van Gucht
.
Here
are some examples
{t[A]
of queries;
13t1[A,
B(C,
{~[A,
{tl[C]
l~t,[B(C)](t,(B(
2.5
D)](tleR~
B(C,
C))
The first query is a projection,
In the third query, note how
R has the
scheme
t(A)
B(C,
(A,
D)),
= tl(A)))
D)]lmteR}
= {t,[C]
I t,et,(B(C))}
Atlet,(B(C)))}
and the second, an (unsafe) complementation.
tz(B(C))
is defined in terms of itself.
Flat Algebra
An expression
of the
nested
algebra
of every
relation
is also an expression
of the
flat
algebra
iff
(1) every
attribute
(notice
that
this
eliminates
such an expression),
(2) neither
2.6
example
nor the empty
in Section
expression
from
operator
occurs
is atomic
being
present
in
in the expression.
2.3 is an expression
of the flat
of the nested
is also a query
flat
calculus
(1) every
attribute
of every
(2) every
attribute
of the scheme
atomic,
algebra.
OF THE
section
we
translated
into
the nested
algebra.
occurs
of every
a relation
with
to
We suppose
that
respectively,
in the query
variable
that
occurs
iff
is atomic,
in the query
is
how
ALGEBRA
expressions
calculus.
scheme
(h)
TO THE
of the
Let
rzae, nael,
(R
denotes
nested
and
an
NESTED
CALCULUS
can
be
rzaez be expressions
of
element
algebra
of
lnst((h)))
is
{ t[ A] I t e R}.
the
translated
nael
NESTED
show
the nested
translated
– Union:
that
calculus
is a query.
3. TRANSLATION
this
relation
of the
and
(3) no term
–R,
in the
operator
Flat Calculus
A query
In
occurs
unnest
and
the nest operator
The first
that
the
U naez
nested
to
algebraic
{ t[h]
(where
I f(t)},
A ~ and
expressions
{ t[hll
I fl(t)},
Xz are
nae,
and
nael,
and
nae2
are,
{ t[ Aql I fz(f)}.
compatible)
is translated
to
is translated
to
{~[~,ll(f,(~)vfz(t))}.
–Difference:
{t[hl]l(fl(t)
ACM Transactions
nael
– naez (where
h ~ and
Az are compatible)
A~f2(t))}.
on Database Systems, Vol
17, No. 1, March 1992
Converting Nested Algebra Expressions
– Cross product:
nael
is translated
Al and
Az have
(nae)
k = 1 and
no common
73
identifiers)
to
–Projection:
of
x naez (where
o
Hi, ....,
different
(where
attributes
ail,
il, . . . . i~
of
. . , u,~
h)
are
is
the
indexes
translated
to
t(~i,)= t~(~~,))}.
{t[~~,).
...~i,]/3t~(A)(f(t~)~~=~
—Selection:
ui.J( nae) (where i and j are the indexes of different
and aj of h) is translated
to { t[ N I ( fl t) A t( a,) = ~(~j))}.
–Nest:
Suppose
attributes),
that
h has the
and let
x /yY)
— Unnest:
Suppose
form
that
t[i,
aA(nae)
Let
A2 is a nonempty
of h. Vkz;A( nae) is translated
pyd
= t,(~)
h has the form
hz] 13tJA]u2[A7
{
—Empty:
Al, h2 (where
A be no identifier
~, A(h2).
of
‘o
(f(h)~ t(~)
C@?
(h) be the
is translated
scheme
of
nae,
and
let
A be no identifier
EXPRESSIONS
we
of
L
to
4. TRANSLATION
OF ff-EXPRESSIONS
FLAT ALGEBRA
section
list
to
pA(~2)( nae) is translated
the
this
a,
W)})}~
=
—Renaming:
,0( nae, A, B) is translated
to
every occurrence
of A is replaced
by B.
In
attributes
show
that
INTO
ff-expressions
translation
of the
nested
of
nae
where
OF THE
algebra
can
be
translated
into equivalent
expressions
of the flat algebra.
To establish
this
result, we first translate
the ff-expression
into a calculus
query that satisfies
ACM Transactions on Database Systems,Vol. 17, No. 1, March 1992.
74
J. Paredaens and D. Van Gucht
.
certain
syntactic
translate,
query
into
finally
calculus
The
by
the
a safe
use
the
to find
syntactic
the
flat
algebra.
that
of calculus
441
We
h,=
such that
and
c lJ
the
on the
relevant
queries
tl(al)
of the
flat
[441)
equivalence
the
would
like
to
strategy
state
that
this
We then
constructive
of
to achieve
intermediate
calculus
would
calculus
formulas
form is broad
but restricted
that
calculus.
We
the
safe
flat
an
open
it
is
result.
otherwise
enough
enough
require
existential
queries
are
in
are captured
a certain
normal
to allow the formulation
to disallow
the formulaa powerset
normal
form
operation
in
(enf) iff
k~l,
..vh~),
i, 1 < i < k,
-“”
3u,~t[h,~L](c/?
d, is in enf or
=
this
of Unman
setting
(see also [11).
calculus)
formula
g is in
~Uzl[h,l]
constructive.
rules,
Form
this normal
expressions,
for all
a query
about
algebraic
g=(hlv.
such that
such
transformation
sense
[11,
restrictions
requiring
an algebraic
A (nested
(in
call
several
theorem
Normal
Existential
We
of
a completely
form. Intuitively,
of nested algebra
tion
help
query
Codd’s
and
problem
4.1
restrictions.
with
=
d,
is
‘--
FALSE
~c,@~d,),
(in which
n,>
case “A
~ d,”
0,1,
>O,
is not written)
t2(ci2),
or
ciJ = t1et2
(a),
or
where
f is in enf,
= tl(A(h)),
where
f is in enf,
f}>
where
f is in enf,
The formulas
of the three examples
We now define
the disjunction,
existential
quantification
again are in enf. Let
in Section 2.4 are in enf.
the conjunction,
the negation,
for formulas
in enf so that
g =
,~lh,
()
ACM Transactions
on Database Systems, Vol
17, No. 1, March 1992
the
resulting
and
the
formulas
Converting Nested Algebra Expressions
.
75
with
h, = 3ui1[AtJ
““”
LZL
~~ %,[
and let
g’=
,:lh~
()
with
with
where
h;
= ~ t[h]h,.
In the following
LEMMA
1.
lemma,
we establish
For each formula
fl
some simple
If t2 is not a free variable
of f,,
equivalent
to 3 tz[ h]( fl A f2( ta)).
(B)
If tz is not a free variable
of fl(tl)
then
and
then
(3 tl[hllfl(tl)
A ~ t2[A21 f2(t2))
3tl[h113t2[A21(
fl(tJAfJta)).
3 t[N( fl(t)
LEMMA
PROOF.
4.2
2.
(resp.
(gVg’)
v f2(t))
(g y g’)
(g Ag’),
This
Constructive
is logically
equivalent
(resp.
(g A g’),
~ g, i’~[~]g).
results
from
Lemma
equivalences:
and f2, we have
(A)
(C)
logical
~ g,
1 and
tl
(f,
A ~ tz[ N f2(t2))
is
is not a free variable
is
logically
tO (3 d Nfl(t)
?t[~lg)
is
elementary
logically
of fa(ta),
equivalent
to
v 5 t[Nf2(t)).
logically
logic.
equivalent
to
❑
Queries
We start
this
section
class of queries
admits
by defining
constructive
the formulation
of all
queries.
Intuitively,
this
nested algebra
expressions
ACM Transactions on DatabaseSystems,Vol 17, No 1, March 1992.
76
J. Paredaens and D. Van Gucht
.
(Lemma
6); however,
algebraic
setting
In constructive
tuple
variables
cursive
have
(acyclic)
variables
property;
these
in
established
4.3,
The
in Lemma
into
or sub query
to the result
.@ety
algebra
(in
will
then
the
the components
have
to satisfy
this
by F.
constructive
[Ill
query
sense
regarding
the
a flat
be
translation
of
the
facilitate
into
of [441) will
final
step in the
. . . . u z(aPz)) be a set 1 of components
the variables
of F) with
are called
...
~U,~,[hz~L](c,l/?
the constructibility
the
graph
set of possible
values
possible values of x. The constructibility
is the directed
graph with the following
(1) all
Also,
in a rzorzre-
..vh~),
h, = ~uzl[hzl]
that
query
F = { ul(aP,),
g=(hlv.
nodes.
in an
quantified
graph
of a query
every
theorem
Ul, . . ., ul (these
operations
in a set denoted
of this
8. Codd’s
powerset
in a constructibility
we transform
g be in enf, and let
We now define
require
as constructive
queries.
components
of existentially
are collected
relational
of the variables
informally
would
relation
components
query.
safe queries
translation.
Let
from
corresponding
Section
constructive
that
to be reachable
way
of tuple
Later,
queries
cannot be expressed
queries,
all of the
components
of the
tuple
‘-4 AC,l,A
of ( h,, F). An edge (x, y) denotes
of y is “bounded”
graph of (h,,
nodes:
variables
~dz).
U,l,
by the
F), notated
. . . . Uzn,
and
set of
as C(h,,
all
the
F),
elements
of F;
(2) all of the tuple
(3) all queries
(4) all relations
and with
(1) If
R that
= tz(az)
U,l, . . . . u,., and the variables
appear
is one
(Q, t( A(N))
(4) if
= Q or
hL on the highest
of the
c,,s,
C,JS, then
Q = t( A(h))
Ul, . . . . Ul;
in some
level
in some
c,~; and
C,J;
then
t,(a,))is an edge
(tl(al),
if
is
(t2(a),
tl)is an edge
one
of the
if
t2(a)# fi
CZ,S, Q being
a query,
then
is an edge;
te R is one of the
(5) if t= Q is one of the
(6) if
in
level
tl(al))
is an edge if tz(az) # F;
# F and (tz(az),
t(A(h))
h, on the highest
edges:
(2) if t,G t,(a)is one of the
(3) if
in
appear
the following
tl(al)
tl(al)
variables
Q that
t is a variable
and
C,js, then
(R,
C,JS, Q being
t) is an edge;
a query,
t(a) is a component
(Q, t)is an edge; and
then
of t,then
(t, t(a))is an edge.
t(a)is called reachable in a subgraph
C’(h,, F) of C(h,, F) iff
A component
a query
Q or a relation
R to t(a).A
there
is a path in C’(h,, F) from
component
t(a)is called
reachable
in (h,,
F) iff t(a) is reachable
1 The notions of constructibility
graph, reachability,
acyclicity,
F These variable
relative
to a set of variable
components
variable components of the involved enfs (see, in particular,
query { d N I g} in Section 4.2).
ACM TransactIons on Database Systems, Vol
in C( h,, F).
and constructiveness
are defined
components correspond to free
the definition
of a constructive
17, No, 1, March 1992
Converting Nested
Algebra
Expressions
.
77
R
o
tl(A)
t(A)
t
\
+—
o~o~o
tl(B(C,
D))
● tl
o
(a)
t/-(A)
R
e
e
\
●
t(B(C,D))
(b)
t,(c)
‘1
/“+2
.~a~e
+2( B(C))
\
●
{t3[cl
I t3 E t2(B(C))}
(c)
Fig. 5.
Constructibility
R A t(A)
= tl(A)),
graphs of the three examples
{t(A)});
(b)
C(7 t e R, {t(A),
●
in Section 2.4. (a) C(3 tl[ A, B(C, D)](tl
@(C’, D))});
(C)
C(3
tz[B(C)l(tJB(C))
=
{QCI I ~3G~JB(C))} A ~1GtJMC)), {tI(C)}).
The component
(h,, F), for all
reachable.
tl(C)
In Figure
5b no component
in (g, F) iff it is reachable
t(A),
tl(A),
and tl(B(C,
D))
is reachable.
In Figure
5C tz( B(C))
in
are
and
are reachable.
Informally,
elements
(hi,
t(a)
is called
reachable
5a
i, 1 s i s k. In Figure
F) is called
(1) all
(h,,
F) is called
of F are reachable
of the
acyclic
iff
C(hz,
components
variables
in the
A(h,, F); and
acyclic
of the
sequel)
if all
in an acyclic
and
of its
u variables
subgraph
F) has a subgraph
tuple
all
variables
of the
elements
and
of C( h,, F).
A(h,,
all
of the
Formally,
F) such that
ULI, . . . . Vz., (called
of F are
the
u
reachable
in
(2) A(hi, F) augmented
with all the edges (t(a), Q), with
t(a) appearing
the query Q, is acyclic. We call the latter
graph
A U( h,, F).
in
Informally,
(h,, F) is called constructive
if it is acyclic and if the negated
part, di, and all of the subqueries
that occur in h, are constructive.
Formally,
(h,, F) is called
(1) (h,,
constructive
iff
F) is acyclic;
(2) (o?i, @) is constructive;
ACM Transactions
on Database Systems, Vol. 17, No. 1, March 1992.
78
J. Paredaens
.
(3) if some
c,Jhas
and D. Van Gucht
the form
u(a)
= {Lv[~]
I f},
or
{~[7]
I f}
= u(~),
or
Ue{w[y]l
then
{ W[Y] I f}
(4) if some
then
f},
is constructive;
c,] has the form
{ W1[Y 11I f’l}
(g, F) is called
and
{~1[~11
I f-l}
=
{ wz[-r ~1 I tz}
are constructive.
constructive
iff for
all
{%[y,]lf,},
i, 1 s i < k, (h,,
F) is constructive.
is called
constructive
iff (g, F) is constructive,
where
F = { u(a) I a ● h}. Note that,
whenever
G C F, (g, G) is constructive
if (g, F)
is also constructive.
recursiue
(see
The definition
of the constructiveness
of (h,, F) is clearly
The
{ U[ h] I g}
query
points (2), (3), and (4)). However,
it is clear that in spite
(purely
syntax-oriented)
definition
is well defined.
The first
example
of Section
2.4 is constructive;
of the recursion
the
second
this
is
not
of t is reachable
in (= t e R, { t( A), t(B(C,
D))}))
(since neither
component
be cyclic
and
(Figure
5b), and neither
is the third (since the graph
AU would
therefore
In
does
not
exist,
3-5
we
Lemmas
as can
show
be seen
how
from
Figure
constructiveness
5c).
is
preserved
in
various
situations.
LEMMA
(A)
If
3.
(g,
Let g and g’ be formulas,
F)
and
(g’,
F)
are
and let F and F
both
be sets of components.
constructive,
then
(g ~ g’, F)
is
constructive.
(B)
If (g, F) and (g’, F)
no common
components
are both constructive
or common
variables,
and
then
if g and g’ have
is
(g ~ g’, F U F)
(C)
If (g, F) and
constructive.
(D)
If (g, F) is constructive
and F contains
the set of all of the components
is constructive,
F,
being
the
the variable
t, then
(~ t[ AIg, F,)
of components
of F that are not components
oft.
constructive.
(g’,
@)
are
both
constructive,
then
(g ~
~ g’, F)
is
of
set
PROOF
(A)
Trivial,
(B)
Clearly,
F U F’)
since
(gvg’)
C(h ,,,
= A(hl,
= (g ~ g’).
F U F’) = C(h,, F) U C(h:., F’).
F) U A(h~,, F’);
then
AU(hii,,
Now
take
A(htc,
F U F’) = AU(hL,
or common
variF) U A U( h~, F’). Since g and g’ have no components
ables (and, hence, clearly
no common
queries),
and since A U( h,, F) and
ACM TransactIons on Database Systems, Vol. 17, No 1, March 1992
Converting Nested Algebra Expressions
(C)
AU(h;,,
F’) are both
F’) that
contains
Clearly,
(~ g’, a)
ness),
that
equal
to
which
is an acyclic
@)
A(hi,
LEMMA
4.
constructive
tz(az)), F)
PROOF.
is
This
FJ
It
(g
since
Let
5.
F)
clear
if
that
(h:,
a query
that
(h;,
occurs
FJ
, Fl)
is acyclic.
A(EJ,
Fl)
= Ai7(h,,
F),
We know
that
as a term,
is constructive,
it
and
remains
therefore,
❑
= t2(a2))
is in enf and ((hi A tl(al)
= tz(a~)),
= tz(az)),
F) = A(h,, F).
choose
A((hi A tl(al)
L tl(al)
we
can
❑
F) is constructive.
g be a formula,
and
let F be a set of components.
Let
be constructive.
in
(B)
If t( A(AI))
of t and no component
is a component
in
~t[hlg,
constructive.
then
L tl(a)
~t[h]g,
(~t[hl(g
Q is a constructive
(St[N(g
R.
of constructive-
AU(h;
Fl)
If tl(al)
does not appear
is constructive.
{ tl(a,)})
If
F U
We can choose
(A)
(C)
AU(h,z,,
since no arc ends in
Let g be a formula,
and let F be a set of components.
If (g, F) is
and tl(al)
and tz(az) are two elements of F, then ((g ~ tl(al)
=
is constructive.
Clearly,
LEMMA
be
Also,
((hi A tl( CYJ = tz(az)),
(1 t[ Ng,
in
79
: g’, F) is constructive.
and hence,
proves
be a cycle
is constructive.
should
graph,
is constructive.
acyclic,
Hence,
F).
only
is impossible
(see (2) of the definition
(g L
(h”, FJ
is constructive.
(2 t[ Ng,
could
is constructive
to prove
constructive.
F)
there
R. This
so by (B) we see that
(D) We have
(di,
acyclic,
a relation
.
(~t[ h](g
tlG t(A(Al))),
L
query
then
and
tl(a)
~ t(a)
= tl(al)),
of tl or tl itself
F U {tl(a)l
does not
appear
F U
appears
a e Al})
in I t[ Ng,
is
then
= Q), F U { tl(a)})is constructive.
PROOF
(A)
Let
be the ith component
of ~t[hl(g
We have to prove that
{ t,(al)}.
that
it is acyclic.
We can choose
and
~ t[a)= tl(al)),
(i,, F) is constructive.
A(~,,
~) as A(3 t[ A]hi,
let ~ be F U
We first show
F) plus
the edge
tl(al)).
Hence,
AU(~i,
~) is AU(3 t[~lhi,
F) plus the edge (t(a,
It is easy to see that all of the components
of the u variables,
the
tl(al)).
t and the gle~ents
of F and tl(al)are reachable
in A(~,, ~).
variable
tl(al)does not appear in
Furthermore,
A U(hi, F) is acyclic
because
(t(a),
(B)
~ t[ A]g.
Clearly
remain
constructive.
(d,,
@)
is
constructive,
and
queries
appearing
in
~,
Let
be
the
ith
component
I~a G Al}. We
{ tl(ci)
have
of
~t[h](g
to
show
ACM Transactions
~ t, e t( A(AJ)),
that
(~,, ~)
and
is
let
~ = F U
constructive.
We
on Database Systems, Vol. 17, No. 1, March 1992
.
80
J. Paredaens and D. Van Gucht
first
show that
it is acyclic.
the edge (t(A(hI)),
A U(3 d h]hz, F) plus
the
of
components
{ tl(a)
We can choose
tl) and the edges (tl,
the edges just
of the
I a e A ~} and
of
F
are
A U(%,, ~) is acyclic
because
Clearly,
constructive,
(d,,
@)
is
for all
mentioned.
variables,
u
A(%z, ~) as A(3 t[ h]h,,
tl(a))
the
reachable
and
~) is
It is easy to see that
all of
in
t and
A(Z,,
~).
queries
appearing
the
elements
Furthermore,
of t or t itself
no components
F) plus
AU(%,,
variable
a e Al.
app~ars
in
in g.
h remain
constructive.
Let
(c)
~,=
be the
ith
component
We have
acyc~c.
to show
ye
variables,
in A(~,,
appear
in g. Clearly,
constructive.
3-5
(d,,
~) as A(3 t[hlhi,
~ = F U { t,(a)}.
show
that
it is
F) plus the edge (Q, tl(a)).
and queries
appearing
in ~,
Section
into
3, where
a nested
we discussed
calculus
query,
allow
translation
of a
us to establish
expression
of the nested
algebra
can be translated
into
a
query. z
Let
nae
by induction
Induction
be an expression
on the number
hypothesis.
Every
n (n > O) operators
Basis
let
We first
@) is constructive,
with
expression
Every
6.
ROOF.
than
= Q), and
result:
constructive
lemma
A(~,,
L t,(a)
is constructive.
❑
together
algebra
LEMMA
F)
the variable
tl and the elements
of F and tl(
a) are reachable
A U(Z,, ~) is acyclic
because
tl(ct)
does not
~). Furthermore,
the following
be
can choose
remain
Lemmas
(h,,
and the edges ( x, Q),
F) is AU(3 t[hlh,,
F) plus the edge (Q, tl(a))
x appears
in Q. It is easy to see that the components
of the u
All(h,,
where
nested
of_qt[~](g
that
= Q)
3t[~](hLAt1(a)
step.
into
the
expression
query
into
nae = R (with
{
nested
in
of the
can be translated
(n = O). Then
translated
of the
of operators
t[h]Ite R},
algebra.
nested
algebra
a constructive
scheme
which
We
prove
the
nae.
(h)).
This
is clearly
with
fewer
query.
expression
can
a constructive
query.
Induction
Binary
step.
operators.
Suppose
nae has
n operators
(n > O).
Then
nae = nael
U naez,
nae = nael
– naez,
nae = nael
x nae2.
or
or
2 Although
it is not necessary for our results, we conjecture that the converse of this lemma is
also true. A possible way to prove this would be to compare constructive
queries with the
calculus queries introduced in [37] and with the strictly safe calculus queries introduced in [1].
ACM Transactions
on Database Systems, Vol. 17, No 1, March 1992,
Converting Nested Algebra Expressions
Since nael and
the constructive
(i)
(ii)
nae can be translated
nae = nael
U nae2.
is constructive
nae = nael
L ~,(tl))}.
(iii)
naez have fewer than n operators,
they can be translated
queries { tl[ Al] I ~l(tl)}
and { tz[ Az 1I ~z( tz)}, respectively.
This
query
– naez.
This
nae = nael
identifiers.
because
nae
query
can
be
is constructive
Unary
because
operators.
into
+JCY)
of Lemmas
y fz(tl))}.
L
3(C).
Az have
no common
= t,(a))
~a,h,t(cl)
because
into
{ tl[ All I ( fl( tl)
of Lemma
case h ~ and
A2]l(:t,[hJ(f,(tJ
is constructive
{ tl[ A ~1I ( ~l(tl)
81
3(A).
translated
Notice that in this
nae can be translated
into
~:t2[A2](f2(t2)
query
into
of Lemma
X naez.
{t[h,,
This
.
= t,(a)))}.
3(D),
5(A),
and 3(B).
Then
nae = II ,I,,,,,,(nael),
or
nae -
u, =J ( nael),
or
~ae = W;A(n%))
or
nae = fl~,,)(nael),
or
nae = @~( nael),
or
nae = p(nael).
Since
nael
constructive
(i)
nae - II,,,,,,
This
(ii)
has fewer
than
n operators,
A ~1I fl( tl)}.
query { tl[
query
(nael).
because
(iii)
query
is constructive
nae = v~,, ~(nael),
{t[h’,
with
t(A(hg))
= t,(~)
ACM Transactions
3(D)
the
and 5(A),
3(B).
nae can be translated
‘~~~lt(a)
= {tz[~2]/sts[~](f(t3)
~aEA2t2(@)
into
= h(~,))}
of Lemma
Al = Al, h2. Then
A(h2)]lstl[~](~(tl)
translated
into
A h(%)
because
be
into
of Lemmas
nae can be translated
{~l[~l]l(fl(h)
This
can
nae can be translated
is constructive
nae = O,=J(nael).
it
= tl(a)
into
L
~ue~’t[a)
= ~3(~))})}.
on Database Systems, Vol. 17, No. 1, March 1992.
82
J Paredaens and D. Van Gucht
.
This query
fact that
is constructive
is a constructive
(iv)
query
nae = ~~(~l)(nael),
{[)+,
(which
with
)i2]
(v)
query
ncze = p(rzael,
4.3
because
The
A, B).
Between
of Lemmas
=
3(D),
and the
and 5(A)).
into
b(cY))).
5(A),
and 5(B).
into
of Lemmas
renaming
Logically
query
te {d N I f(u)}
form
3(D)
5(C),
~aexl(a)
of
a
3(C) and 5(C).
constructive
Equivalent
query
is
a
transformation
rules
Formulas
that
ness and that will allow us to “flatten”
constructive
rules are considered:
(1) The query elimination
(Tl)
the
5(A),
❑
query.
several
because
Lemmas
‘a,x2t(cY)
rme can be translated
Transformations
We now define
from
3(D),
h’. nae can be translated
~ t2et,(Aoi1))
is constructive
constructive
follows
13t,[h1]3t2[h2]
(f,(t,)
—
is constructive
nae = 0~( nael).
This
(vi)
query
of Lemmas
Al = A(hl),
= t,(a)
This
because
by the
equivalent
f(t)
preserve
constructive-
calculus
queries.
Four
replaces subformulas
of
subformulas.
(2) The
q~er~
propagation
(T2) substitutes,
if t(a) = Q, every occurrence
of t(a) by Q.
Applying
the query propagation
repeatedly
results in a query in which each
structured
component
occurs at most once. (3) The structured
component
elimination
(T3) then allows the removal
of these components.
After applying
the structured
component
elimination,
the query no longer
contains
structured components.
The only reason for a query to be nonflat
is occurrences
of
subformulas
of the form Q1 = Q’ (where Q1 and Q’ are subqueries).
(4) The
query equality
elimination
can then be used to transform
these subformulas.
Let g be a formula,
let F be a set of components,
and let (g, 1’) be
constructive,
(Tl)
query
Query
with
membership
membership
g’=(hlv
~=3u1[AJ
ACM Transactions
elimination.
elimination
. ..vh._lvh;
””” 3un[~n]
rule
vh,+lv”
((cl
A”””
Let
substitutes
CJ = t1e{t2[A211
g by g’, with
““vh~),
AcJ_lAcj+l
A””
on Database Systems, Vol. 17, No. 1, March 1992
.Ac1A7~)A~(tl)).
f(t2)}.
The
Converting Nested
(T2)
Query
Q - {tz[~zl
propagation
where
propagation.
I ~(~2)},
rule
CL ( p #j)
(1) Delete
the term
the attribute
(3) If the scheme
LEMMA
7.
.
83
Cj = Q = tl(ci)where
of
A(h,,
~).
The
query
with
every
occurrence
the
formula
h,,
from
the scheme
empty,
of u~.
remove
elimination.
Query
structured
3 Un[ 1.
h’,, and let
equality
membership
and
hi.
A(h)
equality
Consider
term in which
u~( A( h)) occurs. The
substitutes
g by g’ in three steps:
c, from
formula
The query
query
only
rule
elimination.
of Vn becomes
the resulting
f2(t2)}.
or
edge
is CP(d, respectively)
component
(2) Delete
Query
= Q
Expressions
by Q.
Structured
(T4)
Cj = tl(u)
(d’, respectively)
suppose that
Cj is the
component
elimination
Call
Let
(Q, h(~))
be an
g by g’, with
let
substitutes
of tl(a)replaced
(T3)
and
Algebra
Let
elimination
substitutes
propagation,
query
I fl(tl)}
= {tz[~zl
I
g by g’, with
structured
and
elimination,
c, = {tl[yl]
component
equality
elimination,
elimination
preserve
constructiveness.
ROOF.
rules
(Tl),
In this
(T2),
proof
(T3),
we use the notation
let
graphs,
Therefore,
F be a set of components,
C(h,,
1’) contains
in the transformation
To show that
query
membership
we first need to establish
a property
(Tl)
Query membership
elimination.
elimination
preserves
constructiveness,
of constructibility
introduced
and (T4).
a subgraph
—that
satisfies
the reachability
definition
of constructibility
let
and assume
A’(h,,
that
(h,,
3’) is constructive;
then
F)
and acyclicity
graphs; and
conditions
mentioned
in the
ACM Transactions on Database Systems, Vol. 17, No, 1, March 1992.
84
J, Paredaens
.
and D, Van Gucht
–for
which,
for each component
of the u variables
and
there
exists
a unique
query
or relation
from which
element
can be reached in A’(hi,
F).
each element
of F,
the component
or
To see that this is so, consider
C(h,, F), and let x(a) be a component
there
a u variable
or an element
of F. Since (h,, F) is constructive,
subgraph
A( h,,
F) of C’( h,, F) that
satisfies
the
reachability
and
of either
exists
a
acyclicit
y
conditions
mentioned
in the definition
of the constructibility
graph. Suppose
there exist two different
nodes Ql, Qz (each of which is either
a query or a
relation)
in A( h,, F) from which
x(a) is reachable
(see Figure
6). Let y be
the node where the paths from QI and Qz to x(a) join for the first time,
let (z, y) be the edge of the path from Qz to x(a) pointing
to y. Consider
subg-raph
of
that this
mentioned
subgraph
in the
that
in
A( h,, F) where
this
the
edge
(z, y) is removed.
still satisfies
the reachability
definition
of constructibilit
y
graph
x(a)
can
be reached
via
It
should
and
the
be clear
and acyclicity
conditions
graphs
and, furthermore,
one
less
in A( hi, F). Repeated removal
of such edges eventually
A’(h,, F) with the desired properties.
(This establishes
path
than
it
could
results in a subgraph
a result
about con-
structibility
graphs that will be necessary
in the rest of the proof about the
preservation
of constructiveness
by query membership
elimination.)
Let Q = {tz[~z] I f(tz)}
(remember
that CJ = tl e {tz[hzl
I f(tz)}
in (Tl) and,
hence,
that
Cj = tl ~ Q, let
f(t2)
= (p,(t2)v
““”
vpm(t2)v
““’
Vpr(t,)),
and let
Then
h;=
Let
him
j
aul[~l]
m=l
be the
constructive.
rrzth
Consider
““ . 3un[An]3wm1[AmJ
disjunct
h, and
“ - “ %Jm.m]
to prove
that
(h~m, l’)
of h;. We have
p~( tz). We know that (h,, F) and ( Pn(ta),
(where
G = { tz( 13)I tz(i3) e Az } )
A( hi, F) and A( pn( tz), G). There
are
constructive.
are two cases:
Consider
the
is
G)
graphs
7). Consider
what happens
in this
(1) (Q, tl) is an edge of A’(h,, F) (Figure
situation
when
we “join”
the graphs
A’(hi,
F) and
A( p~(tz), G) by
applying
the query membership
elimination
rule.
Figure
8 shows the
resulting
graph
A U(hjm, F). Clearly,
AU(hjm,
F) is A(h’lm, F) plus
ACM Transactions
on Database Systems, Vol
17, No. 1, March 1992.
Converting Nested
Fig, 6.
x(a) is reachable
“\
.
85
from both QI and Q2
●+-----e-
●
Q1(t2)
•~o
/
•~*
A’(hi
ExpressIons
tz(pl)
tl (al)
Q
Algebra
,F)
e~e
●
tl (Ciq)
Qq(t2)
;~’d
A(pm(t2
),G)
(a)
(b)
Fig. 7.
(a) Graph
A’(h,,
F); (b) graph
A( pm(tz),
G).
the edges of the form
( z(-y), P) if .z(T) occurs in the query
P. (It
important
to observe that t2 is replaced by tl in these graphs and that
has disappeared.)
We have to show the reachability
the w variables,
and the elements
AU(hl~,
F). Let
of F. We consider
(a)
(b)
of the components
of the u variables,
of Fin
A(h’Lm, F), and the acyclicity
of
z(y) be a component
two cases:
Z(Y) is not reachable
the reachability
from
of a u variable
tl(CYU). Then
of Z(T) in
is
Q
or of an element
the reachability
follows
from
A( h,, F).
z(y) is reachable
from tl(a”).
Then the reachability
reachability
of tz( @v) in A( pn(tz), G).
follows
Let z(-y) be a component
of a w variable.
The reachability
from the reachability
of z(y) in A( Pn(tz), G).
from
of z(y)
the
follows
Next we have to show that A U( h;n, F) is acyclic. To show this, we first
identify
two parts of the graph A U( h:m, F), called L and R, respectively.
L is the subgraph
of AU(h;~,
F) corresponding
to A’ U( h,, F), and R is
the
subgraph
of
A U(h’in,
F)
corresponding
to
A U( p~( tz ), G)
(see
ACM Transactions on Database Systems, Vol. 17, No. 1, March 1992
86
J. Paredaens and D. Van Gucht
.
‘2(P1)
●+-----exl
Q1(t2)
o~e
(3)
Xq (5 )
A’(hi
,F)
;~”d
+ (aq)
Qq(t2)
M+Jt2
))G)
(b)
(a)
Graph
Fig. 8
Figure
8). It should
.Z(7) occurs
e~e
be clear
in the query
that
P, might
(a) Z(7) is not a component
rise to cyclicity:
of tl.
A U(h~m,
only
edges of the form
create
cycles.
We consider
(i) z(y) occurs in L and appears
path
in AU(h~m, F), giving
a path necessarily
since z(y) occurs
F).
(z(y),
We consider
two
P), where
two cases:
cases that
in P in R. Suppose that
rise to a cycle. Notice
may
give
there is a
that
such
has to go through
a component
of tl. Now,
in Q before the query
in P, it also occurred
membership
elimination
that
this would
have
rule
was applied.
given
rise to a cycle
It is easily
seen
in A’ U( h,, F), a
contradiction.
(ii)
(b) z(~)
z(y) occurs in R and appears
in P in L. Notice
that the only
Because of
paths from L to R originate
in tl or in its components.
the special characteristics
of A’(h,, F), there is no path from P to
tl or to any components
of tl;hence, the edge (z(y), P) cannot
create a cycle in A U(h: ~, F).
There
of tl,say, tl(au).
is a component
are two cases:
(i) Suppose that
P is in L. Then the edge (tl(CYU), P) was added to
F). This edge cannot give rise to a cycle, since in A’(h,, F)
A(h;~,
there is not
A’77(hz, F’).
(ii)
suppose
A(h’,m,
that
a path
from
P is in
F). There
P to
R. Then
tl( au) because
the
edge
(tl(au),
of the
acyclicity
P) was
added
of
to
are two cases to consider:
(ii.1)
tl(au) occurred
in P before the query membership
elimination rule was applied.
In this case, tl(aU) occurred
in Q,
giving
rise to a cycle in A’ U( h,, F), a contradiction.
(ii.2)
tz(DU) occurred
in P before the query membership
elimination
rule
was applied.
In this
case, the
acyclicity
of
ACM TransactIons on Database Systems, Vol. 17, No. 1, March 1992
Converting Nested Algebra Expressions
L
Q1(t2)
‘1/”
“~o—”
“k
Fw
,F)
( pn(tz),
AU(h;m,
Qq(t2)
Graphs
G) guarantees
F).
(2) (Q, tl) does not appear
in
queries
that
(Q, tl)
case.
A’(h,,
so no reachability
of variable
existence
of Q; in particular,
other
F) and A(p~(t2), G).
A(h,,
We have thus shown that, when
acyclic. We now turn to the other
from
.—.
‘1 (aq)
Fig. 9.
results
87
R
+(~1)
A(h’im
.
there
can
be
is an edge of A(hl,
F). Hence,
in
A’(h,,
no
F),
cycle
in
(h’im, F)
is
F) no edges leave
components
results as a consequence
the reachability
of the components
or relations
(see Figure
Q,
of the
of tl
9).
For A(h~m, F) we propose using the graph shown in Figure
10. Clearly,
AU(h’im,
F) is A(h’im, F) plus the edges (z(-y), P) if z(~) appears in the
query P. Again,
we introduce
the notation
L and R.
It should be clear that the reachability
of the components
of tl and the
components
of the u variables
follows from their reachability
in A’( hi, F).
The reachability
of the components
of the w variables
follows from their
reachability
in
cause there
Hence,
(hjm,
A( pm( t2 ), G).
are no paths
1’) is acyclic
from
in this
The
acyclicity
of
A U( h’i ~, F)
L to R and vice versa
case also. Furthermore,
is constructive,
it follows
from Lemma
3(C)
Hence, (h:, F) and (g’, F) are constructive.
that
follows
(see Figure
since
( hjm, F)
be-
10).
(d V eJ(tl),
Ql)
is constructive.
(T2) Query propagation.
We have to prove that (h;, F) is constructive.
tl(a) (remember
that
tl(a) = Q or Q = tl(a)
in (T2)) can
First
note that
occur within
a subquery
P of h ~. In P, tl( u) is certainly
a free variable
component.
constructibility
This
implies
graphs,
and (2) in the edge part
that
at the query
we have
definition
that
P, when
no edge leaves
of constructibility
we consider
tl(a)
its associated
(see conditions
graphs)
(1)
and, therefore,
of cyclicity
in the
that the substitution
of tl(a)by Q cannot lead to problems
subquery
P. The
reachability
of the
components
of the
u variables
and the elements
of F in A( h’,, F) follows
from the reachability
of these
ACM Transactions on Database Systems, Vol. 17, No. 1, March 1992.
.
88
J. Paredaens and D. Van Gucht
L
%
‘“F‘:,7“
Xq (8)
e
=---—e_____e__e
and
“
“d” ~
Fig. 10.
components
o~o
xl (a)
‘\
‘~
Q,(L)
.--)
elements
in
Graph
A(hjm,
A( h,, F).
The
Qq(t2)
F).
only
interesting
observation
to
from a query P and on the path from P
make is that, when x(8) is reachable
F) the reachability
of x(~)
will
to x(6) the node tl(a) occurs, then in A(hj,
result from the fact that there is a path from Q to x(/3) in A( hi, I’). We now
turn
to the acyclicity
occurrence
of
consider
F). We first
of (h’,,
tl(a);otherwise,
two cases in which
remark
that
the
query
Q contains
no
there
would
be a cycle
in A U( h,, F). We
substitution
of tl(cx) may potentially
give rise
the
to cyclicity:
(a) Let
x( f?) be reachable
occurs
on the
path
from
a query
P in
P to x((3). After
from
A U(h,,
substituting
F),
tl(a)
such
by
that
tl(a)
Q, there
is
a path from
Q to x(/3). Suppose that
x(8) occurs in Q. But then there
F), namely,
the path consisting
of the edge
would be a cycle in A ll(h,,
going
from
going
edge
Q to
from
tl( a),
x(6)
the
path
going
from
tl(a)
to
x(~),
and
the
to Q, a contradiction.
from a query P in A i7(h L, F), and suppose
(b) Let x(@) be reachable
of tl(
a). Hence, after the substitution,
contains
an occurrence
that
P
P
is
transformed
into a query P’ in which tl(a) is replaced by Q. Suppose that
of
x((3).
This would
introduce
a cycle in
Q contains
an occurrence
A U(h;, F). This cannot happen, however,
because there would be a cycle
in
A U( h,, F),
tl(a),
the
namely,
the edge going
edge going
from
the
Hence, (h;, 1’) is acyclic. Also,
for tl(CY) in d does not affect
structive
sub queries
are not
Hence, (hj, F) is constructive,
(T3)
eliminate
Structured
terms,
ACM Transactions
component
attributes,
path
consisting
of the
from tl(a) to P, the path
x(~) to Q.
going
edge going
from
from
P to x(6),
it can easily be seen that the substitution
the constructiveness
of (d’, @). Finally,
affected
by the substitution
of Q for
and therefore,
so is (g’, F’).
elimination.
and,
eventually,
This
is obvious
quantifiers.
on Database Systems, Vol. 17, No. 1, March 1992.
since
we
Q to
and
of
Q
contl( a).
only
Converting Nested
(T4)
Query
structive.
equality
We first
choose for
elimination.
show
A( h’i, F) the
that
We
(h:,
graph
Algebra
have
to show
F) is acyclic.
that
This
follows
the
nodes
A( h ~, F) without
and
{
the queries
{ tl[y ~1I ~l(tl)}
queries are preserved
by the query
Expressions
.
(h;,
F)
because
89
is
corresponding
t2[-y21I f2(t2)}.
Clearly,
constructive
elimination
rule. We only
equality
con-
we can
to
subneed
to show that
is constructive.
LEMMA
This
from
A constructive
8.
PROOF.
follows
According
Lemmas
query
to Unman
3(C), 3(D),
in the flat
[44],
calculus
a query
{
❑
and 3(A).
is safe.
t[h]I f}
of the flat
calculus
is
safe iff
(1) whenever
t satisfies
relation
f
each
of the database;
component
of
t occurs
somewhere
in
a
and
(2) whenever
3 u[yl fl occurs in the query,
if fl is satisfied
by u for any
values of the other free variables
in fl, then each component
of u occurs
somewhere
in a relation
of the database.
query in the flat calculus.
Let { t[N I f} be a constructive
( f,F) is constructive,
where F = { t(a) I a e X}. In particular,
(1) all of the elements
of F, hence,
(f, F), which
implies
components;
and
(2) in
a subformula
item
3 v[-YI fl
all
of the
since
4.4
t only
components
( fl, Fl), F1 = { u(a) I a e ~}, which implies
components
of the u variables
are atomic
implies
that
of t,are reachable
all the components
(1) above,
This
has atomic
of
u are
in
attribute
reachable
item (2) above, since
attribute
components.
in
all of the
❑
Translate
Algorithm
We now present an algorithm
to construct
a query in the flat algebra that is
logically
equivalent
to a formula
of the nested algebra with flat operands and
a flat
result.
Algorithm
translate
Input.
An ff-expression
Output.
An expression
of the nested
algebra.
of the flat algebra.
Method
(1) Transform
6.
(2) Repeat
steps
(2. 1) Repeat
(2.2)
the given
ff-expression
(2. 1) and (2.2)
qu..y
until
membership
Repeat
query
propagation
elimination
until
no longer
into
a constructive
no longer
elimination
immediately
possible.
calculus
query,
using
Lemma
possible.
until
no longer
followed
by
possible.
structured
component
ACM Transactions on Database Systems, Vol. 17, No. 1, March 1992.
90
J. Paredaens and D. Van Gucht
.
(3) Repeat
query
equality
elimination
(4) Apply Codd’s theorem
algebra
expression.
We illustrate
step
following
constructive
until
for translating
(2) of the
query:
{t[A]l~t,[B(C)]
no longer
algorithm
with
~t2[D(E(F))](te
t2(D(E(F)))
possible.
a safe calculus
query
an
tl(B(C))
so we apply
{t[A]\~tl[B(C)]
flat
Consider
the
example.
= {t4[I]lt4~
step (2.2).
(tetl(B(C))
an equivalent
At1et2(D(E(F)))A
= {t3[G(H)]lt3(G(H))
Step (2. 1) is not applicable,
into
Atl=
This
R}})}.
yields
the query
{t,[G(@]lt,(G(~))
={t4[I]lt4GR}})}.
Now
we can apply
step (2.1),
{t[A]l~t,[B(C)]
Now
we apply
and we get the query
(tGtl(B(C))
step (2.2),
apply
= {t4[I]\t4GR})}.
and we get
{t[A]lte
We finally
Atl(B(C))
{tA[l]ltleR}}.
step (2.1) to get the q~ery
{t[A]lteR}
LEMMA
Algorithm
9.
Translate,
taking
nested algebra,
stops and produces
equivalent
to the given ffiexpression.
PROOF.
prove
performed.
nent
First
that
it
The
of every
component
we
ends.
result
prove
be
the
algorithm
7 indicates
is a constructive
variable
can
that
Lemma
that
left
occurs
after
as input
an expression
is correct,
how
query.
in
step
the
query
(2).
Nor
t(a) G Q, t(a) = Q or Q = t(a)
left.
the form
QI = Qz are eliminated
by step
created by this step, all of the queries are
the result
after step (3) is a constructive
query
is safe by Lemma
8 and can be
form
an ffiexpression
of the flat
if
steps
(2)
Since
every
is reachable,
are
any
it
and
of the
algebra
ends.
(3)
no
Then
have
structured
formulas
that
to
is
we
be
compostructured
of
the
Since
all of the formulas
of
(3), and since no new query
is
eliminated
after step (3). Hence,
query in the flat calculus.
This
transformed
into
an equivalent
expression
in the flat algebra.
In order to prove that the algorithm
ends, we
call n~ the number
of queries and n~ the number
of occurrences
of structured
components
in the query.
Suppose
that
we apply
step (2) each time
on
sub queries
without
occurrences
of structured
components.
Each time
step
(2. 1) or step (2.2) is executed, the value n~ + ns becomes smaller,
since every
❑
(h,, F) is acyclic. Each time step (3) is applied,
n~ becomes smaller.
ACM Transactions
on Database Systems, Vol
17, No. 1, March 1992
Converting Nested
We are now able to formulate
For
THEOREM.
equivalent
every
expression
ffiexpression
in the flat
The transitive
COROLLARY.
our main
Algebra
Expressions
.
91
result:
in
the
nested
algebra,
there
is
an
algebra.
closure
is not expressible
in the nested
algebra.
5. CONCLUSION
Our main result
states that the flat algebra
is as expressive
as the nested
algebra
operating
on a flat database
and with a flat relation
as a result.
A
simple consequence
is, therefore,
that recursive
queries,
such as the transitive
closure,
are
not
expressible
in the
nested
separates the nested algebra and the powerset
that recursive
queries
are indeed expressible
again
suggests
flat algebra,
Although
(either
that
whereas
we did
atomic
the
nested
algebra
the powerset
not consider
or structured)
algebra.
This
is a conservative
algebra
is not.
selection
operations
or selection
result
clearly
algebra
since it is well known
in the powerset
algebra.
This
operations
extension
involving
involving
of the
constants
subset
condi-
tions, it is straightforward
to generalize
our results to include these options.
This generalization
notably
expands the range of practical
queries that are
affected by our results.
We believe that our
mization.
tional
This
queries
results
will
clearly
can
be
can contribute
depend
processed
on how
in
to (flat)
efficiently
relational
Related to this issue, it would be interesting
proof of our main result,
that is, a proof
relational
flat-to-flat
database
query
opti-
nested
rela-
implementations.
to obtain
a complete
algebraic
without
having
to introduce
an
intermediate
calculus.
Finally,
algebraic
query languages
have also been introduced
for objectoriented
databases
[421. It would be interesting
to investigate
if and how our
translation
strategies
and results generalize
to such databases.
ACKNOWLEDGMENTS
The authors
are grateful
to the referees
for their
many
excellent
comments
and suggestions.
REFERENCES
1, ABITEBOUL, S., AND BEERI, S. On the power of languages for the manipulation
of complex
objects. INRIA Tech. Rep. 846. 1988.
2. ABITEBOUL, S., AND BIDOIT, N. Non first normal form relations: An algebra allowing data
restructuring.
J. Comput. Syst. Sci. 33, 3 (Dec. 1986), 361-393.
of
3. AHO, A. V., AND ULLMAN, J. D. Universality
of data retrieval languages, In Proceedings
the 6th ACM Symposium
on Princ~ples
of Programming
Languages
(Jan. 1979, San Antonio,
Tex.). ACM, New York, 1979, pp. 110-117.
4. ARISAWA, H., MORIYA, K., AND MIURA, T. Operations
and the properties
on non-firstof the 9dz VLDB
(Florence).
1983, pp.
normal-form
relational
databaaes. In Proceedings
197-205.
ACM
SIGMOD
Rec. 15, 3
5. BANCILHON, F. A logic-programming/object-oriented
cocktail.
(Sept. 1986), 11-20.
ACM Transactions on Database Systems, Vol. 17, No. 1, March 1992.
92
J. Paredaens and D. Van Gucht
.
6. BANCILHON, F., AND KHOSHAFIAN, S.
5th
ACM
A calculus
SIGACT-SIGMOD-SIGART
for complex
Sympos~um
on
objects. In %oceedmgs
Princ~ples
of
Database
of the
Systems
(Mar 1986, Cambridge, Mass.). ACM, New York, 1986, pp. 53-59.
On line processing of compacted relations
In
7 BANCILHON, F., RICHARD, P., AND SCHOLL, M
Proceedings
of the 8th VLDB
(Sept. 1982, Mexico City). VLDB Endowment,
1982, pp.
263-269.
of
8. BEERI, C., AND KORNATZ~Y, Y. The many faces of query monotonicity.
In Proceeclmgs
Aduances
in Database
Technology
–EDBT
’90 (Mar. 1990, Venice)
Lecture Notes in Computer Science, vol. 416. Springer-Verlag,
New York, 1990, pp. 120-135
9. BEERI, C., NAQVI, S., RAMAKRISHNAN, R., SHMUELI, O., AND TSUR, S. Sets and negation in a
of the 6th ACM SIGACT-SIGMOD-SIGART
logic database language (LDL1). In Proceedings
SYmposzum
on Principles
of Database
Systems (San Diego, Calif.)
ACM, New York, 1987,
pp. 21-37.
10. BIDOIT, N.
The Verso algebra
Sci. 35, 3 (Dec. 1987),
11.CODD, E. F.
Rustin,
Relational
Ed. Prentice-Hall,
12 COLBY, L.
A recursive
of the A CM– SIGMOD
1989, Portland,
and how to answer
queries
without
joins.
J. Comput.
Syst.
321-364.
completeness of database sublanguages.
In Database
Englewood Cliffs, N.J , 1972, pp. 65-98.
algebra
and query optimization
International
Ore.). ACM,
Conference
New York,
for nested relations.
on Management
of Data
R.
Systems,
In proceedings
(May 1–June 2,
1989, pp 273-283.
13 DADAM, P. Advanced
information
management
IEEE Data Eng 11, 3 (Dec. 1988), 4-14
relations.
(AIM):
Research
m
extended
nested
14. DADAM, P , KUESPERT, K., ANDERSEN, F., BLANKEN, H., ERBE, R , GUENAUER, J , LUM, V.,
PISTOR, P., AND WALCH, G
A DBMS prototype to support extended NF2 relatlons:
An
of the ACM- SIGMOD
integrated
view on flat tables and hierarchies.
In Proceedings
Internat~onal
Conference
on Management
of Data (May 1986, Austin, Tex.). ACM, New York,
1986, pp. 356-366.
A storage
15. DEPPISCH, U., PAUL, H. B., AND SCHE~, H. J
Proceedings
of the
Grove, Calif.).
International
Workshop
and
of the GI Conference
Scientific
system
Object-Oriented
for complex
Database
objects. In
(Pacific
Systems
1986, pp. 183-195
16. DESHPANDE, A., AND VAN GUCHT, D.
Proceedings
on
Applications
(Apr.
A storage
on Database
structure
Systems
1987, Darmstadt,
for
for unnormalized
Office
Germany)
relations.
Automation,
Inf
In
Engineering
Fach berzchte
136
(1987),
481-486.
17. DESHPANDE, A., AND VAN GUCHT, D.
Proceedings
of
the 14th
18. DESHPANDE, V., AND LARSON, P. A.
Dept. of Computer
Science, Univ.
19. GYSSENS,M., AND VAN GUCHT, D.
constructs to the nested relational
tional
Conference
An implementation
VLDB (Los Angeles, Calif.).
on Management
An algebra
of Waterloo,
for nested relational
1988, pp 76-88.
for nested relatlons.
Ontario,
databases
Res. Rep
In
CS-87-65,
1987.
The powerset algebra as a result of adding programming
of the ACM- SIGMOD
Internaalgebra
In Proceedings
of Data (June 1988, Chicago, 111.). ACM, New York, 1988,
pp. 225-232
20. GYSSENS,M., AND VAN GUCHT, D. The powerset algebra
J. Comput.
SYst. SCZ. To be published.
database relatio~s.
as a natural
tool to handle
nested
21. GYSSENS, M., PAREDAENS, J., AND VAN GUCHT, D. A umform approach towards handling
atomic and structured
information
in the nested relational
database. J. ACM 36, 4 (Ott.
1989), 790-825.
22. HOUBEN, G., AND PAREDAENS, J. The Rz-algebra: An extension of an algebra for nested
relatlons.
Tech. Rep., Computer Science Dept., Techmcal Univ., Eindhoven,
The Netherlands, 1987.
23. HULL, R., AND Su, J. On the expressive power of database queries with intermediate
types.
of the 7th ACM
SIGACT-SIGMOD-SIGART
Symposium
on Principles
of
In Proceedings
Database
Systems (Austin, Tex. ). ACM, New York, 1988, pp. 39-51.
24. JAESCHKE, G., AND SCHEK, H.
ACM Transactions
J.
Remarks
on the
algebra
on Database Swtems, VO1 17, No. 1, March 1992.
on non
first
normal
form
Converting Nested Algebra Expressions
relations.
In
Principles
of Database
Proceedings
of
the
1st
ACM
SIGACT-
SIGMOD-
(Los Angeles, Calif,). ACM,
programming
with sets. J. Comput.
Systems
25. KUPER, G. M. Logic
44-64.
26. KUPER, G. M., AND VARDI, M.
Syst.
93
Symposium
on
1982, pp. 124-138.
41, 1 (Aug. 1990),
Sci.
in the logical data model. In
Theory
(Bruges, Belgium).
Lecture Notes in Computer Science, vol. 326. Springer-Verlag,
New York, 1988, pp. 267-280.
27. LARSON, P, A. The data model and query language of Laurel, IEEE Data Eng. 11, 3 (Sept,
1988), 23-30.
28. LINNEMANN, 1’. Non first normal form relations
and recursive queries: An SQL-based
Proceedings
of the 2nd
On the complexity
SIGART
New York,
.
International
Conference
of queries
on Database
of the3rd
IEEE International
Conference
on Data Engineering
approach. In proceedings
Angeles, Calif.). IEEE, New York, 1987, pp. 591-598.
29. MAKINOUCHI, A. Aconsideration
ofnormal
form ofnot-necessarily-normalizedrelationsin
of the 3rd VLDB
(Oct. 1977, Tokyo).
the relational
data model. In Proceedings
Endowment,
1977, pp. 447-453.
(Los
VLDB
ALogical
Language
for Data and Knowledge
Databases.
Freeman,
30. NAQW,S., ANDTSUR,S.
San Francisco, Calif., 1989.
31. OZSOYOGLU, G., OZSOYOGLU, Z. M., AND MATOS, V. Extending
relational
algebra and
relational calculus with set-valued attributes andag~egate
functions, ACM Trans. Database
Syst.12,4
(Dec.
32. PAREDAENS, J.,
Relational
Model.
1987),566-593.
DE BRA, P., GYSSENS, M.,
EATCS
Monographs
AND VAN GUCHT, D.
on Theoretical
Computer
The
Science.
Structure
of
the
Springer-Verlag,
New York, 1989.
33. PAUL, H. B., SCHEK, H. J., SCHOLL, M. H., WEIKUM, G., AND DEPPWCH, U. Architecture
and
Proceedings
of the
implementation
of the Darmstadt
database
kernel
system.
In
ACM- SZGMOD International
Conference
on Management
of Data (May 1987, San Francisco,
Calif.) ACM, New York, 1987, pp. 196-207.
34. PISTOR, P., AND ANDERSEN, F. Designing
a generalized
NF 2 model with an SQL-type
of the 12th VLDB
(Aug. 1986, Kyoto). VLDB Endowlanguage interface. In Proceedings
ment, 1986, pp. 278-288.
35. PISTOR, P., AND TRAUNMUELLER, R. A database language for sets, lists and tables. Zn~ Syst.
11, 4 (1986),
323-336.
36. ROTH, M. A.,
KORTH, H. F., AND BATORY, D. S. SQL/NF:
relational
databases. Znfi Syst. 12, 1 (1987), 99-114.
37. ROTH, M. A., KORTH, H. F., AND SILBERSCHATZ,A. Extended
Syst. 13, 4 (Dec.
relational
databases. ACM Trans. Database
38. SCHEK, H. J., AND SCHOLL, M. H. The relational
model with
Syst.
11, 2 (1986),
A query
language
for ~ lNF
algebra and calculus for nested
1988), 389-417.
Infi
relation-valued
attributes.
137-147.
39. SCHEK, H.
40.
41.
42.
43.
44.
J., PAUL, H. B,, SCHOLL, M. H., AND WEIKUM, G. The DASDBS
project:
Data Eng. 2, 1 (June
Objectives, experiences, and future prospects. IEEE Trans. Knowledge
1990), 25-43.
SCHOLL, M. H. Theoretical
foundation
of algebraic optimization
utilizing
unnormalized
of the 1st International
Conference
on Database
Theory (Aug. 1986,
relations. In Proceedings
Rome). Lecture Notes in Computer
Science, vol. 243. Springer-Verlag,
New York, pp.
380-396.
SCHOLL, M. H., PAUL, H. B., AND SCHEK, H. J. Supporting
flat relations
by a nested
of the 13th VLDB
(Sept. 1987, Brighton,
Great Britain).
relational
kernel. In Proceedings
VLDB Endowment,
1987, pp. 137-146.
SHAW, G., AND ZDONIK, S. A query algebra for object-oriented
databases. Tech. Rep.
CS-89-19, Dept. of Computer Science, Brown Univ., Providence, R. I., 1989.
in Computing
THOMAS, S. J., AND FISCHER, P. C. Nested relational
structure.
In Advances
Research III, The Theory of Databases,
P. C. Kanellakis,
Ed. JAI Press, Greenwich, Corm.,
1986, pp. 269-307.
Principles
of Database Systems. 2nd ed. Computer Science Press, Rockville,
ULLMAN, J. D.
Md., 1982.
Received July 1989; revised February
1991; accepted March
1991
ACM Transactions on Database Systems, Vol
17, No. 1, March 1992.
© Copyright 2026 Paperzz