SQL Query Optimization: Reordering for a General Class of

SQL
Query
Optimization:
Reordering
for
a General
Class
of Queries
Piyush
IBM
Santa
Goel
Teresa
Bala
Laboratory
IBM
Santa
[email protected]
bala@vnet
Abstract
Partial
The strength
from
their
of commercial
ability
equivalent
reordering
are no known
for a SQL
query
containing
lower
groupby
a SQL
aggregations.
to capture
the SQL
However,
outer
joins,
identities,
semantics,
these
joins,
operators
it is during
with
hypergraph
outer
current
few
results
in query
optimizing
joins,
specified
with
predicates
reordering
query
outer
expressions
joins,
that
optimization
and full
generated
rithms
lations
full
binary
outer
joins)
in the
Gupta
query
et.
ex-
al.
in
appear adjacent
unary
predicate
to each other,
operators.
references
by an aggregation
However,
a column
operator,
if
that
known
is
algo-
do not tell us how to reorder the binary reand assign operators
to produce the same
1 in which
either
algorithms
are unable
containing
predicate
c generated
(~3.b
02 VI .c)
by aggregation
oper-
ator count:
joins
reference
to exhaustively
aggregations,
outer join
column
involving
complex
View
more
two relations
or reference columns generated
To the best of our knowledge,
aggregations.
VI: Select rl .C as a, m.d as b, c = count(m)
From
by
the
rl,
r2
rl .b 01 rz.b
Where
Groupby
rl .c, rz .d
reorder
outer
Query 1: Select rs.a, rq.b, VI .b
From (Select * from VI Left OuterJoin
rs. b .92 V1.c), rq
Where rq.b = VI .b
join predicates and outer join predicates that reference
columns generated by aggregations.
Bhargava et. al.
[BHAR94,
BHAR95a]
have proposed
algorithms
to
reorder queries containing
outer join predicates
that
reference two or more relations, but their work can not
reorder
and
other
for re-
that
result.
For example, let T1, r2, r3 and r4 be four
relations such that attributes
rl .b and rl .C are in relation rl; attributes
r2 .b and r2 .d are in relation T2;
has
outer
than
queries
(we know)
require
BHAR95a].
operations
an outer join
Query
BYs,
SQL
joins
to each
intervening
references
GROUP
existing
outer
joins
T3; and atattributes
r3. u and r3 .b are in relation
tribute r4 .b is in relation ~4. Consider the following
state-of-the-art
for
inner
[BHAR94,
without
Motivation
The
adjacent
the binary
and
Introduction
1.1
(joins,
be brought
a method
that we identify a powerful primitive needed for a dbms. We
report our findings of a simple, yet fundamental operator,
generalized selection, and demonstrate its power to solve the
problem of reordering of SQL queries containing joins, outer
joins, and groupby aggregations.
1
operations
joins,
are sufficient
their
Algorithms
and
[GUPT95]
and Bhargava et. al. in [BHAR95d]
presented
algorithms
to “push-up/push-down”
proj ections, aggregations
and selections in such a way that
and groupby
Using
.ibm.com
outer
pression
reordering
we propose
cent aining
While
there
of the reordering
be missed.
all
Laboratory
reordering:
ordering
like DB2 comes
by generating
all equivalent
some
cost may
query
order
operators.
joins,
and a set of novel
to reorder
optimizers
to generate
Consequently,
significantly
model
of binary
methods
aggregations.
query
to select an optimal
Iyer
Teresa
where
the following:
01,02
G
{=, +,~,
~,<,>}.
rs On
Because
of
column
Permission to make digitahhard copy of part or all of this work for personal
or classroom use is granted without fee provided that copies are not made
or distributed for profit or commercial advantage, the mpyright notice, the
title of the publication and its date appear, and notice is given that
copying is by permission of ACM, Inc. To capy otherwise, to republish, to
post on servers, or to redistribute to lists, requires prior specific permission
andlor a fee.
c which is generated by aggregation
operator
count,
and no generally
known definition
of outer
joins based on cartesian product, view V1 cannot be
merged
existing
47
the
are
unable to reorder the outer and inner joins specified
in the query. Specifically,
the join specified in view
VI
SIGMOD ’96 6/96 Montreal, Canada
(D 1996 ACM 0-89791 -794-4/9610006 ...$3.50
with the rest of Query 1. Consequently,
algorithms
in [BHAR95a,
BHAR95d]
cannot
be reordered
with
other
outer
and inner
joins
in Query
specified
1- if predicate
(I4 .SUPKEY
T4 .b = VI .b is
highly filtering
then it may be beneficial to perform
this join first, before performing
the aggregation that
counts the tuples obtained after combining
relations
This
T1 and Tz. A motivating
example
the
Example
three relations
is included
VZ.QTY
next.
query,
Consider
94AGG,
95 DE-
if executed
it
TAIL,
and SUP.DETAIL,
where relation
94AGG
contains aggregated information
about different parts
if
94AGG
This
relation
is an index
has three
and QTY,
PARTKEY
cent ains
unique
unique identifiers
contains
QTY
supplied
by a supplier.
contain
four
DATE
the
date
tributes
the total
of
SUP.DETAIL
SUPKEY,
SUP
attribute
tifier
for
the
SUPDETAIL
supplier,
attribute
relation
to relation
may generate
supplier
volume
of approved
is no longer
with
suppliers,
a reliable
the supplier
at-
attributes
in
SUPRATING
and
attribute
information
about
is relatively
either
supplier
is dropping.
alternative
to the potenial
FROM
Thus,
grouping
if there
on attributes
the reduction
of
can be used as a good
reduction
through
join.
The
motivation,
a class of queries
exam-
called
“Join
by employing
queries
For example,
Tuple
(TIS).
consider
following
co-related
query in which rl, r2, and r~ de-
the
note three relations
such that rl .a, rl .b, rl .C and rl. ~
rs.
because
and/or
rl .a From
(Select
the
rl .b 6’1
Where
From
r2
Where
r2.c = r-l..
and
count(*)
From
rs Where
rz.e = r, .e and
rl. ~ = rs. f ))
the
In order
rl
count(*)
rz. d &
to
where 01, 62 c {=, #, ~, <, <, >}. In tuple iteration
semantics, first, every tuple in T1 is combined with
tuples in r2 by applying predicate r2 .C = ~1.c. Next,
each selected tuple
in relation
r2 (along with
the tu-
ple in rl ) is substituted
in predicate r2 .e = r3 .e and
rl. ~ = r3. ~ in order to select and count the selected
tuples in relation r3. This process is repeated for every tuple in RI and may result in a very inefficient
like processing
strategy.
Ganski [GANS87] and Murlikrishna
[MURA92]
have
proposed an algorithm
to unnest the above query
and to transform
it into the following
two queries
PARTKEY;
VZ.PARTKEY,
Vz LeftOuterJoin
specially
Semantics
(Select
95 DETAIL
VZ.SUPKEY,
V9.95AGGQTY
Select
it
Iteration
that
Query:
plan
queries:
Join-aggregate
in relation
VS: Select SUPKEY,
PARTKEY,
95AGGQTY
= COUNT(*)
SUPKEY,
) then
95 DETAIL,
a. SUPKEY
as SUPKEY,
a.QTY as QTY, a. PARTKEY
as PARTKEY,
b
From 94AGG a, SUP~ETAIL
= b. SUPKEY
and
Where a. SUPKEY
b. SUPRATING
= “ BANKRUPT”;
GROUPBY
from
b. SUPKEY
are attributes
in relation ~1; r2 .C) r2 .d and r2 .e are
attributes
in relation r2; ~3.e and r3. ~ are attributes
Vz: Select
From
selected
=
95 DETAIL
and PARTKEY.
“nested-loops”
View
a better
relation
ecute the join- aggregate
generate this list, the business analyst could define
the following
views and pose the following query on
them:
View
are
This
cardinality.
“BANKRUPT”
through
Select
list
=
join.
the
(a. SUPKEY
cardinality
Suppose a business analyst is interested in identifying
those suppliers
that are to be discontinued
from
the
tuples
done
Aggregate Queries” has been discussed in [GANS87,
MURA92]
and references therein. To the best of our
knowledge, the majority
of commercial
RDBMS ex-
94AGG.
three
94AGG
outer
reduce
few
ples and issues with
and SUPDETAIL,
contains unique iden-
detailed
Typically,
small in comparison
remaining
of a supplier,
contains
a supplier.
contain
RATING,
rating
●
contains
as in relation
SUPKEY
every
contains
95 DETAIL
DATE
the
the
that
be
of a part
PARTKEY,
and
meanings
Let relation
where
of parts, and
Also, let relation
a translation,
have similar
very
RATING
SUPKEY
attribute
quantity
SUPKEY,
attributes
where attribute
QTY,
and
SUPKEY
of suppliers,
in
entail
table
may be beneficial to first combine relations 94AGG
and SUP-DETAIL
with relation 95 DETAIL,
before
performing
the aggregation
on relation
95 DETAIL.
SUPKEY,
where attribute
identifiers
contains
PARTKEY
attribute
attributes
b. SUP
and
would
95 DETAIL
potenial
by predicate
supplied
by different
suppliers
in the year 1994,
relation
95 DETAIL
contains
detailed
information
about every transaction
recorded so far in the year
1995, and relation
SUP_DETAIL
contains
inforLet us assume that
mation
about each supplier.
94AGG
as written
the
participate
may
However,
and
on
can
aggregation
and
= V8.PARTKEY
< 2 * V3.95AGGQTY);
aggregation
before
1.1
= V3.SUPKEY
Vz .PARTKEY
&.QTY,
employ
T1 key,
relations
V3 On
48
outer joins and do not require TIS.
and r3. key denote
key attributes
r2. key
rl,
r2 and
r3, respectively.
Let
of
Query
2: Select into
rl key,
rl .a, rz. key,
From
(Select
* from
LeftOuterJoin
r3
Groupby
rz .d
Query
attributes
and
rl Left OuterJoin
rl .a, rz key,
tb count(rs
3. Select
From
V2 are virtual
ceptualize
rz On rl .C = rz .c),
the
key)
rl .a
the
r-l .b 61 cozmt(rz.key)
each
from
However,
the
cannot
a~gorithms
reorder
2 because
then,
predicate
[BHAR95a,
outer
complex
joins
~Z .e
rl .j = ~s ~ – if the cardinality
large then it may be beneficial
Tz and
r3 first,
before
in Query
=
p.
joins
We
employ
T and
sch(p)
We
assume
reference
that
at least
to
denote
predi-
one relation
operand
relations.
If,
references
exactly
references
to
to as complex
we assume
that
a predicate
as a simple
more
than
two
if
relations
Through
are
re-
predtcate;
two
predzcate.
predicates
) to
the
operator
is referred
of
sch[r
the
relation
We represent
rs .e and
of relation
to combine
accessing
of relation
to con-
extensions
null
then
out
this
m-tolerant2.
BHAR95b]
specified
predicate
is a way
the
a join
it
it is referred
in
the left
of
both
identifiers
E2 denote
respectively.
with
with
paper,
and
of predicate
specified
the
r2,
schema
cates
lations
(row
EI
and
schema
specified
r-l key
Having
rl
denote
rz.b
TEMP
Groupby
it)
relations
rz. e = r~ .e and rl .f = rs, f
I-1 key,
Having
TEMP
rz.b
minology
~1 is very
relations
lier
work),
which
~1.
by
in
statement,
[GUPT95]
a Generalized
accepts
a new
groupby
SQL
employed
also
Ikojectzon
as its argument
relation
using
(which
according
a relation
the
the tercites
ear-
(G P),
7rx,f(y
T and
produces
subscripts
X
and
attributes
referenced
j,
~(Y)
[GUPT95].
In
this
ized
paper,
we introduce
selectzon
ent
(GS),
schedules
example
in
for
queries
in query
2, this
which
if this
substantially
new
join
nary
operations.
that
queries
have
in
[BHAR95b,
sented
contain
any
we assume
are
then
of the
query.
1. Through
been
out
simplified
redundant
queries
(full)
outer
are stmple
the
Further,
this
no aggregate
join
with
we assume
predo
not
edges;
that
is,
~(Y)
rest
of the
1.2 provides
provide
some
the
3,
we
to
“break-up”
paper
basic
definition
provide
is organized
definitions.
of the
association
In
Some
aggregate
max,
rein,
new
operator.
identities
complex
provide
an outline
of the
Section
5 provides
the
In
that
predicates.
In
enumeration
tains
Section
Section
by
Section
process.
4,
Basic
From
now
of SQL,
convey
more
on,
awg(distinct
confusion,
SQL
the
The
algebraic
semantics.
pertinent
in
ing
semantics.
the
SQL
(R2,, Vz, Ez)
where
column
1 We
names
predicates
conjunctive,
that
is,
the
form
Ap2
A...
full/left/right
the
outer
join
in
the
such
that
where
&
to
@ denotes
where
DIs-
TX(~)
has
it.
(if present).
represents
FROM
the
SQL
r GROUPBY
duplicate-insensitive,
If
etc.
DISTINCT
~
E
then
GP
{rnaz,
rein,
)}.
a
eg.
GP
either
statement
function
), sum(distinct
r2
or con-
such
a GP
and
is denoted
count
(distinct),
Whenever
r to denote
T1 U r2, of two
is the
rl
(RI
U R2,
E’
= {tl(%
relation
of
W T2,
both
there
6 and
is
no
~.
RI nR2
= 4,
Note,
that
VI and
operations
predicate
either
p
join
is
rows
2A
schema
or
evaluates
attribute
operator.
49
~ (R2 -
G (RI
-
and
rz
relations
The
is the
outer
relation
where
= t’ A
R,)
U (V2 -
predicate
of
p,
to
in
R2) U (Vl
in rl WT2 are padded
are not present
of
U E2).
T1
c EI)(t[RIVl]
(VA
containing
compatible
V,))(t[A]
= NULL))
V2))(t[A]
= NULL))}.
(3t”e E2)(t[R2V2]
= t“ A
V
rz =
union
(R, V, El
relations
VI U V2, E’),
(YA
and
binary
rz,
are
to the
names),
with
rl
lieu
of captur-
(sets
attribute
expression
A pn,
intent
(r)
(distinct),
we employ
set of definitions
= (.??l, VI, El)
specified
in
attention
schemas
or, equivalently,
are
p=pl
rl
two relations
R2 denote
that
our
with
Let
is used
is constructed
– a complete
[BHAR95a],
denote
R1 and
assume
We restrict
definitions
are provided
syntax
syntax
GP
with
duplzcate-znsensitzve
union,
and
union,
algebraic
FROM
Finally,
conclusion.
formal
as
us
definitions
albeit
this
X, count(Y)
SELECT
JX,f(y),
rl
1.2
X
the aggregation
functions
count
the
we
enable
in
TX(r)
SELECT
that
nx,co~n~(y)
a duplicate-insensitive
is termed
2, we
Section
SELECT
specified
specifies
GP
GP
x.
GAL192b].
as follows.
instance,
M@TLkntly,
Note
SELECT
represents
The
Or,
function
example,
statement
they
[GAL192a,
For
For
X statement.
2. Subscript
bi-
the
statements
x
TINCT
method
so that
SQL
GR.OUPBY
may
of those
specifies
statement.
join
this
paper,
by
BHAR95c]
T
Conse-
predicates
this
X
groupby
represents
schedules
enumeration
conjunctive
the
3. For
first.
selective
1. Subscript
differ-
2 and
joined
complete
only
General-
us to generate
rl
the cost
contain
to Query
allows
facilitates
that
operator,
us to generate
similar
is very
reduce
operator
queries
enables
T4 and
relations
quently,
a new
that
p
is
over
null
FALSE
R
either
[BHAR94].
a set
tuples
with
in relation
of
in-tolerant
for
-
real
in
that
nulls
rl
r2.
sch(p), called
R Q sch(p)
attributes
attributes
have
for attributes
or in relation
NULL
values
in
the
if
any
p
The
antz join,
rl
relation
(Rl,
sion
E’
is a set
tion
T1 (with
jozn,
rl
~
VI, E’),
r2,
T2,
unmatched
to
predicate
relations
of
(RI RZ, VI V2, E’),
where
?’I and
p is a predicate
containing
respect
-%
of relations
where
rl
and
extension
The
rz
E’
in
is the
is the
outeT
union
of re-
relation
tributes
relatzon
and
The
defined
in
T2 is the
&
2
Generalized
this
the
T1 is
null
Tz, . . ..Tn
t~ons
~i,
&
re-
similarly
be
in the
7’2 and
with
nulls.
In
we
(GS’)
define
which
set
T2 &
reordering
of queries
predicates
that
containing
are
For
on
definition
relations
generated
then
expression
(TI
2.1
with
to Query
pl,s
~
(T1.f
=
SiIICe
(Tl
‘$
Tz)
““~p’”
T3
#
T2 and
(TI
‘~
T2)
““~p’”
T3
#
~
(TI
(r2.e
‘~
=
T3)
“’2-%’2’3
outer
joins
executed
follows
with
are
from
‘%
the
fact
that
cannot
join
operator.
similar
queries
up complex
predicates.
tion
of other
plans
and
~~,,,[7’~~Z](7’~
Intuitively,
where
retains
In
~1 that
padding
This
such
(T2 ‘%$
Consider
the
attributes:
a
blclf
those
tuples
addition,
are
it
not
tuples
general,
also
by
may
specify
and
‘$
be
Consider
spending
defined
applied
for this
ation
m
el
of this
2 in
The
‘Y
Section
following
expression
for
p@2,3
T2)
1.1 with
table
the
r3
provides
above
corre-
predicates
as
the
evalu-
T3
which
tables:
blclflcldlel
a
elf
generaT3) ‘~
T2)
those
in
attributes
than
one
=
(Ri,
expression
following
(T1
‘~
T2)
‘~
table:
r and
relation
T2 :
in T – T1.
relation
operator,
U, Ei)l
to the
predicate
p, appropriately
selection
consider
evaluates
selection
satisfy
tuples
predicate
Now,
a; [Tl] (T),
p to relation
T that
more
ri
(T1
k
of the
to
as for-
Notice
null
and
shown
query
next.
{R, V, E),
Query
earlier.
1 <
i <
n,
applying
50
that
values
in table
r =
relations,
‘2:EEEI
lc21f2\
expression
to
us to break-
conventional
nulk+ for
in a generalized
defined
the
retains
in ?’1 with
we
be preserved
Let
following
Em
specified
operator
predicate
in relation
selected
by
T3)).
to
– it, applies
a~(T)
is to
is obtained
1
J
7’3:
This
facilitates
selection
C T, is similar
T1
up
enables
operator
T2)
that
~.
their
full
only
the definition
that
as U~,,3[7’1T2]((T1
generalized
can
is specified.
Reordering
o’,
‘~
note
re-
and
and
predicates
motivates
r3)
If no relation
relation
only
real
‘$
Also,
relation.
resultant
(GS),
are specified
[T1T2]((T1
p to relation
1.1,
Ts.f)
outer
query
it
be broken
(GS),
‘~
this
complex
join
operator
2 in Section
joins,
which
outer
other
mally
in
some other
selectton
In
then
manner
generalized
p.
if only
T,)),
employed
in the
any
with
and
(TZ
where
selection
that
preserved.
el
P1, ay,3
(TI
r3.e).
a;,,,
is
‘l:m
‘$
T3 corresponding
P2,3
rela-
(R, V, E’),
generalized
a base
the
predicate
I
by
~
T2.c),
of
T1T2,
is not
T1T2
be preserved
and
T2) ‘1’3~p”3
=
(GS),
~R,V,(~,(~))}
-
are preserved
relation,
relation
TI,
where
(T1.c
R,
then:
preserves
relation
— — {~R,v.(T)
la21b1
pl,z
~
complete
predicates
columns
consider
that
that
&
generalized
perform
complex
specified
example,
one
applying
operator
us to
is the
i # j,
selectzon
r
LkJl<,<n
above
Example
a novel
n,
- eg. in expression
only
here
enables
<
~p(T)
those
lations
selection
section
gene?’a~tzed
such
R,
by:
=
the
only
outer
extension
‘r~ &
padded
E’
and
full
i
in
at-
relation
TZ, the
TZ,
the
The
relations)
= ~, where
]( T ) , 0 f relation
1 <
is given
p~esemed
supplying
can
supplying
T1 and
TI
for
the
null
T2,
Similarly,
appropriately
aggregations.
nulls
the
T1 ~
relation.
of relations
tuples
selection
Join,
of relations
TZ,
union
with
In
r2 is called
outeT
which
with
T1 is called
Relation
relation
?’tght
preserved
TI
is the
& r2 are padded
sch(7-2 ).
lation.
Join,
rl
in
base
predicate
K (lVj
2.1
o*[Tl,
E’
from
= # and
Definition
relation
necessarily
a null-intolerant
Ri flRj
rela-
lefl
(not
p denote
exten-
tuples
p).
be relations
TZ is the
and
the second
in attributes
T1 contains
null
the generalized
in table
T2 contains
e and f whereas
tuple
the second
values
selection
in these
operator
attributes.
a ~,,,[rlrz]
nontuple
By
at
the top,
the
wecompensate
correct
result
P1.3$P2,3
T3, as given
7’2 )
By
for
appropriately
selectlon
full
operators
outer
join
on the
difference
to
table
in
se~ecting
a generalized
selection
this
corresponding
generate
.............
(~1 ‘$
T’l.
the
preserved
operator,
the
product
relatlons
inner,
can be defined
cartesian
and
expression
m
outer
and
as a generalized
as follows:
H
=
({T1,
T2, T3, T4$T5},
Figure
3
Deferred
application
of
and
of
predicates
moving
top.
up
E
reordering
queries
referencing
aggregations
This
that
push-up
and
contain
columns
associated
predicates
to
require
breaking
of a complex
in
following
predicate,
may
as shown
the
and
Consider
the
following
expression:
(Vl,
r;
~ rl
and
pression
T cannot
In order
to reorder
gation
r;
in predicate
to the
~ 772. Let
pl,3
and
be reordered
and
break-up
A p2,3.
Since,
and
in the base relations,
.
presslon
[BHAR95a,
the
rect
the
reorderable
GAL192a,
modified
result
top
GAL192b].
by applying
– eg.
the
can
in this
the
even if a query
reference
the
aggregated
complex
does not
In order
hypergraph
model
presented
association
trees.
hypergraph
query,
and
which
correspond
full
outer)
join
the
sets of relations
3.1
pair
where
A
in its
2V).
For
we call
VI
V1 and
V2 as simply
specified
we
an outer
we say
outer
of both
join
with
edge
In the
a hyperedge
operation
is
in
the
is bz-diTected
operation
inner
and
an
nodes.
say
a hyperedge
join
VI
e as simply
joins
always
it
if
in the query.
Note,
generate
un-
edges.
Example
3.2
the
f& =
Tl ‘$
where
Consider
the
following
query
Ti =
(7’2‘2’4s’2’5((TA ‘h’7’5)
paths3
this
p1,3 at
(Ri,
Vi, Ei),
is a predicate
Pij
cor-
reordering
.=count(?.,)((n
‘~
that
a complete
this,
we employ
earlier
in
[BHAR95a]
the
set
of nodes
involving
two
predicates
to generate
relations
hypergraph
cardinalities
hyperedge
represents
a full
predicates
Figure
to break-up
represents
operation
Definition
(V, E),
to
-+
VI, V2 6 2V,
hypergraphs,
Further,
directed
in
it is desirable
to show
Intuitively,
a hyperedge
is a mapping
Qq:
T3))
‘h’
no
presented
each
contain
in order
set of reordering.
E
1? : 2V
1 shows
between
nodes
T1
h2, the
we break
?’lhl
(even
T2h2T5,
and
predicate
1 <
though
<
Note
5.
that
there
are 3
?’~hl?’zhz?’4h4T5
Note,
are
and
i, j
query.
and
T5).
hypernodes
complex
for
TJ,
for this
has no cycles
— r1h1r3h3T5,
hyperedge
1 < i < 5, is a relation
T, aid
the hypergraph
hypergraph
—
for
between
for
and
{T2}
directed
{T4,
p2,4 A p2,5 then
If
T5}.
we obtain
section.
columns,
predicate
that
(i.e.,
where
the
to
hypernodes
if it
new
Furtherl
V2),
refer
dmected
ex-
r3)
predicate
m~, , [TlT21(7h/3,3,:.:,,1
712
)‘~ T3)),as shown
can
modified
generate
remaining
such
of V
V2 ) and
of query
aggrega-
pl,3
T2) ‘%
For
we
3.2
count.
by
predicate
by the algorithms
expression,
Example
predicate
c is generated
The
Ex-
the aggre-
complex
be applied
with the outer join.
.
1s flv3T3T:T; ,c=countfT1) ((Tl ‘~
is completely
of
column
the
tion
longer
be ref-
p2,3.
r, we can move
pl,3
not
c only
in predicate
due to aggregation
expression
top
column
not
(Vl,
context
represents
where
hyperedges,
subsets
1, we
query.
erenced
of
e =
If e =
and
3.1
for
V2 hypemodes.
V2 are
example.
Example
set
non-empty
hyperedge
requires
the
is the
on
aggregations
aggregated
of aggregation
1: Hypergraph
complex
predicates
The
h2, h3, h4})
{hi,
referenced
an
(inner,
hypernodes,
H
V is a non-empty
set
of the
in
the
or
between
to
be
of nodes
and
intend
Q;
=
=
(T I ‘~
(T1 ‘%
to show
that
and
T2)
T2 ) ‘%
‘+
((T4
expression
( (T4
‘h’
‘M
T5)
T5)
‘h’
‘M
T3)),
Q4 is equivalent
to
aj,,4[T1T2](Q~).
the
and
as follows.
is defined
T3))
We
Q:
C7J2,4[T1r2](Qj)
outer,
a predicate
expressions
the
and
A
query
hypergraph
senting
predicate
information
ence
regarding
more
than
broken
up into
erators
and
For a given
‘Please
is a useful
connectivity,
refer
two
outer
identities
query,
to
join
relations
smaller
abstraction
as well
[BHAR95a,
we can construct
[BHAR95a]
for
predicates
and,
components
the
for
as the
that
therefore,
using
precise
be
op-
GAL192b].
hypergraph
definition.
refer-
cannot
the known
GAL192a,
its
repre-
semantic
and
then
determine
satisfy
the
lowing.
reordering
the
in
which
(TI .r,).((~~.~s).m))
is
(~~ .((~z.7’z).(~~.~s)))
definition
should
and
tree
in
[BHAR95a]
be combined
ation
trees
for
tree
is an association
tree
requires
only
after
Q;
and
association
for
to
for
Q41
generate
Q~,
and
tree
for
Q?.
process
of
establish
relations
r4
we consider
complex
definiassoci-
association
trees
the
Then,
equivalence
trees
and
the
next
nodes
in
optimal
be
is
to
expression
We
trees.
contain
any
step
order
also
as a mean
at
generalize
containing
of
cent ain
selected.
association
that
then
order
do not
operators
of the
expression
trees
the
can
assigning
predicate,
expression
internal
cost
the
they
trees,
the
trees.
minimal
~2
The following
by permitting
to
the
relation
that
determine
association
expression
the
only
operations;
operators
with
use
Note,
Given
assign
for
trees
different
operators.
–
tree
combining
Q~ as valid
fed-
combined
association
association
an
T54 – eg. (~1 .7’2). ((r4. r5). r3)).
allows more association
trees
tion
are
The
execution
that
in the
of a hypergraph
relations
is an
)))
operations
as defined
an association
order
(7’1
.((T2.T5).(T4.T3
the
of the
y criterion,
Intuitively,
specifies
eg.
those
connectivity
most
our
number
to
First,
one
results
of
to
complex
predicates.
for
Q4.
3.1
Definition
3.2
we define
an
For
hypergraph
a query
assoczatzon
T for
tree
H
H
=
(V, E),
as a binary
Splitting
Initially,
tree
most
complex
assume
one
that
complex
of transforming
in which.
expression
1. leaves(T)
than
=
V,
and
no
relation
appears
in
either
more
the
one leap.
or
2. for
any
subtree
3. for
any
subtree
the
set
T,
T, H
of
\kaves(T,)6
at
the
root.
with
the
of TI
of all
with
leaves
used
TL and
leaves(~)
and
VI
~
(27 .Tr ) of T,
hyperedges
hyperedges
subtrees
=
in
of T,,
E
and
let
in subtree
Tr.
that
E;,
Ts in
Then
connect
Ve = (VI,
and
V2 ~ leaves,
leaves
(T. ); and
all
of the
with
the
complex
child
into
complex
child
predicate
or grand
or
an equivalent
of the
where
of
root
then
the
the complex
1<
conjunctive
child
at the
root,
can be used to break-up
the
appears
grand
appears
child
at
consists
predicate
i <4,
predicate
ri
and
rj,
be four
between
then7:
the
n
P;,2AP; ,2
+
T2
r~
P:,2M,2
++
2
= a~: ~[rl](rl
‘S
following
(1)
r2)
VI ~
or Vz ~ Jeaves(Zl)
either
or
-pi. + denote
relations
to ~ombine
Vz) E ET,,
expression
the
contains
approach
!.
leaves
denote
order
expression
Our
Let I-, = (RA, Vi, E,),
relations,
ET, denotes
let
root
identities
predicate.
T,
given
which
If the
following
is connected.
a given
predicate.
the
in
predicates
2
r2
= uj:
~[rl,
r2](rl
‘S
7-2)
(2)
is
true:
egET,
●
o 3(v;
, vj’ )
PI,2
@ 7-2) “’3S2’3
(rl
;or
E ET,
s~ch
that
either
V1 s
the
above
definition
r3 =
P1,2
@ r2)
rrj1,3[rlr2]((rl
V; and
‘Y
(3)
r3)
V2~V20rVl~V2andVz~V1.
The
key
the
difference
definition
in
that
a hyperedge
only
the
subsets
hyperedge)
are
{r,,
rz
r5.
tion
trees
for
query
for
Q4
in
can
That
Item
be
is,
for
Q;
–
can
above
and
some
((rl.rz).
definition
Q!
of
as
the
rI
‘~
2.3
6 As
and
relations
been
in
SQL
in
uj:
=
relation
774
,[rl](rl
occurring
renamed
to
more
have
than
unique
association
trees
association
trees
((rl.rz).((rA.(7’s.
?’s)),
rl
‘&
a
rl ‘%
query
we
assume
expression
Vz)
G Ev(3(V~,
V,’)
(r2
r2r3](rl
‘N
(r2
‘v
r3))
(6)
P:,3AP:,3
+r3)=
, [rl,
((r2
LTji Jrl,
6 E
rs)=
VI ~ v~Av2
n.](rl
‘M
(r2
‘2
7-s))
(7)
P;,3AP; ,3
rg)p~rq)=
W
have
names.
usc H l~=aveS(T,) h denote the induced sub-bpewaph
where
E’
=
(leave. (T. ),E’),
{(VI, VZ)IVI ~ lea~es(T,) A Vz ~
)A((Vl,
(5)
7-3))
that
6 We
leaves(T.
d,3f%3
M
(r2
UJ: Jrl,
n ‘X
in
(7-2 ‘i’
etc.
literature,
once
‘Y
associa-
valid
optimization
(4)
rs)
two
hz
[BHAR95a].
other
r2)’&
P;, SAP; ,3
M
7-9) =
(r2
u;;
4Definition
‘N
so that
allows
(r~.((rz.r~).(r~.rs))),
rs =
uj1,3[rlr2)r3]((rl
the
valid
((~A.rs).rs)),
(r~.((rz.r~).(rs.rs))),
the
hyperedge
either
“’34?2’3
that
to
be broken-up
with
P1,2
@ r2)
states
way
before
example,
Q4
which
such
(rI
and
(corresponding
combined
the
3,
in
be combinedj
For
query
Q4
are:
is
broken-up
need
combined.
r~})
relation
be
of hypernodes
subsets
or
[BHAR95a]
can
original
({rz},
between
~ vi))}.
52
r4](rl
‘%
((r2
‘k3
rs)
‘2
r4))
(8)
where
are
o
E {M, +,
not
associat
T2) ‘h3
an operator
For
ives
and
r3),
that
predicate,
then
the
the
the
~3))
root.
rl
(rz
grand
the
the
root.
no
By
expression
‘x
complex
?’,)
operator
expression
appears
Thus
either
in
into
‘w
1 A
p~edicate
p can
query
Q
contaznzng
be transformed
complex
or a child
into
pTedicate
or a grand
chzld
u
((b
%
applying
can
break-up
Identity
the
3, the
complex
((?’4
generate
Q;
a~,,. [rlTz](Q2).
pression
‘h’
child
The
of
above
= ~j,,,[rl,
expression
Q~ (Q~)
can
Q4 W applying u;,,
generate
Q;
different
and
Q;
For
nition
be made
[7’17’2]
to expression
Q;
is identical
G&,, (Qi,4)
tity 3.
- (8)),
we
ments,
and
then
be
[T1T2])
their
assigning
plex
‘$
broken-up
(Q~)
and
to ex-
at
assign
ilar
operators
to the
trees
Q4 according
operators
refer
apply
unused
consists
to the internal
method
please
operator,
approach
to
Section
generalized
ing the preserved
[BHAR95a]
with
the
select ion.
relations
for
(or,
This
steps
(for
); and
novel
out
using
the
MGOJ
operator
T2}
((TZ
‘%
the
tree
approach.
top
~
In
of expression
to obtain
7’3)))
~
((r2
~
b
T4)
r3)))
to the
original
this equivalence,
we
GAL192a,
GAL192b]
a~, , , [rl,
r2](Q4)
The
expression
equivalence
Q4 follows
query
query
apply
the
to trans-
where
Q~,4.
tree
and
(h)
presz
requires
in
similar
that
for
of
from Iden-
query
presl(h)
= VI and
=
(Vi,
does
and
=
(h)
hi-directed
into
(~,
E
an as-
sets presl
Let
H
Hz
rj
predi-
is also
preserved
as follows.
pTesz(h)
h, then
(h),
contain
Q‘
hypergraph
E1)
argua com-
edge
not
of query
Q, and
h disconnect
HI
above
a~, [~i c presl
Q’
tree
are determined
ponents
the
Q contains
a hi-directed
to query
query
to
if query
A pz with
the association
sociation
For
and
– a)
those
sim-
to
detail,
b)
expressions
predicate
the
conf?ict
belongs
two
com-
E2)’,
then
= V2.
hl
then
to
concepts
to
conf(ho
cannot
9 From
selection,
be
[BHAR95a]
described
53
1 in
hypergraph
expressions
in
of closest
[BHAR95a].
the
employ
edge
ho, the
[BHAR95a],
can
into
conj?ictzng
operator
be
of
the
inner,
(hi)
connected
reorder
push
by
the
(5)-(8)),
outeT joins
if hyperedge
hl
corresponding
operator
(full)
set of closest
every
two
not
Identities
Intuitively,
that
H
we do
(eg.
a descendant
trees
Lemma
disconnects
which
root
), then
only 10 . For a join
det ermin-
in
the
set [BHAR95a].
in expression
compensation
the generalized
pl
(h)] (Q’ ), where
10 These
8 With
‘~
association
our
h4](r5
original
presz
interior
by a method
in [BHAR95a]
4 in
predicates
6 H}
to Defi-
to the
of two
nodes
presented
for
of
at
TZ]((TI
the
Q is equivalent
we require
our
h
hz is {Tl,
(T1
~4](7’5
expression
be shown
predicate
complex
general,
a)
to
query
hyperedge
the root.
We
for expressions
nodes.
In
r3)))
the arguments
it can
cat e pl,
and
to expression
association
for
by
(rl
equivalent
expressions
trees
example,
u~,,~ [rlrz]
equivalent
(~~,,
generating
association
3.2)
can
T3))
expression
equivalent
by
equivalently,
‘h’
any
‘$/
step
is equivalent
form
the
queTy
in expression
expressions
Thus,
hyperedge
T4)~Go~[~lT2,
order
to show
in [BHAR95a,
either
with
in
Q4 ~ In
identities
Q;
complex
((1)
predicate.
T5)
its preserved
hyperedge:
... ~
expression
ikfGOJ[~lr2,
complex
szngle
identity
predicate
equivalent
for
of the Toot.
appropriate
complex
H,
of the
r =
contains
Formally,
the
an equivalent
p2,.l~p2,5
Tz)
to
the
it,
as
Following
Now,
set for
generate
a~,,g [? ’1, TZ] is applied
b),
(TI
expression
grand
p appears
hyperedge
by
is a path
T2, P4,5](T5
complex
the
a
join
left”
constructIntuitively,
expression:
appears
the
or
step
an
employing
which
child
the
there
We
results
we conclude:
Lemma
in which
with
by
joins.
--$ in hypergraph
preserved
Q4.
(T1 .((~z.rq).(Ts.T’3,)))
((r.
the given
in
outer
preserved
“to
/
‘4)~Go~[Tl
‘2’442’5
by
{T
example,
in query
that
predicate
case,
=
computed
outer
are
relations
predicate,
Q4
containing
this
For
of (full)
that
hyperedge
p?’es(h)
example,
operator
ancestor.
the
and
set
relations
set are the
are
sets of (full)
one complex
complex
(T I
root
root
relations
pTeserved
a directed
the
the
For
only
ancestor
this
we transform
equivalent
root.
the
identity.
of
preserved
the
those
in which
at
of
3.2 contains
In
ing
complex
expression
either
transform
push
The
with
transforming
child
of the
in [BHAR95a],
predicate
a single
of
appears
which
(7’1 ‘3
is specified
containing
is
next.
a preserved
an equivalent
can
#
as described
root.
consists
or
identities
On the other hand,
in expressions
such
P1,3:P1,3
r3), due to the lack of associativity,
to
an
root,
the
there
in
not
predicate
predicate
appropriate
we
at the
Q
?3)
expression
‘~
the
‘!%
application
equivalent
into
(rz
application
[BHAR95a],
results
T1 ‘x
into
and
requires
we can
(eg.
Q4 in Example
predicate,
‘g
four
complex
child
apply
r~)
last
predicate
the
before
the
approach
complex
expression
Note,
expression
our
with
+}.
is not
expression
the
or
the
a general
given
+,
for
outer
ho
joins
conj%cting
directed
hyperedge
components.
employing
MGO
J
outer
]oms,
{hl
denoted
Rk%R,
there
M.~Ri~RJ
can
two
in
the
Definition
outer
a set
H
these
The
hypergraph
by Conf(ho
Next,
=
all
outer
into
join
conj?act
of join
The
edges
as follows:
T.))
. . . if ho is hi-directed
4
Note
to
Rk +$ Rl
{h/R&2j~~
if ho is undirected
{h}
~ ~ if ho is undirected
is a join
addition,
preserved
or a left/right
we
set:
slightly
for
following
and
outer
modify
in H]
ccoj(ho
into
the
following
p?’e$h, (h)
or
preserved
By
identity
/$~H
By
pushing
~
and
definition
directed
Theorem
hl },
p~es(h),
and
●
●
Let
the
pre$cute
Q’
that
main
a single
(V, E).
IV2 I >1,
such
the
for
result
1 Let the hypergrnph
contamzng
complex
T
of
be the
from
for
then
edge
-
If conf(h)
complex
applying
for
(Vl,
assoczatzon
pl
tree
@ Q’.
query
P1 A PZ
coTTespondzng
V2),
whe~e
tree
fOT
for
to
= {hi,
/V1 I >
1 or
hypergraph
assoctatzon
Similar
equivalent
H;
expression
T
ated
tTee
pres2(h)](Q’).
In
Q = u~, ~res(h)](Q’).
then:
— If conf(h)
= {hi,
then
Q = Ujl bres~(h),...
preserved
from
the
sets and
original
the
either
root
equivalent
last
expression
expressions:
in
of its dependent
be
generated
trees
that
from
are
predicate
gener-
PI
or P2
or Q:.
which
one
of some
predicate(s)),
ing
can
complex
Q;
application
independent
example,
‘~
r4)),
of complex
complex
then:
,pTes~(h~)](Q’).
conflict
above
expression
to either
expressions
in which
Q = u~l O(Q’).
. . . . hn},
expressions
Note,
the
(r3
= #,
the
equivalent
the
to
new
complex
other
our approach
predicate
first,
predicate
predicate(s)
consists
followed
by
re(de-
of breaking
the
break-
predicate(s).
p?’es(h)](Q’)
then:
If conf(h)
Q;.
by breaking-up
For
e If o =W
we generate
example,
following
are equivalent
~., pr’es~(hn),
(3),
be
Then:
. . . . h~},
For
the
p4,5 A P4,6
predicate
identity
the
quires
= @, then
If conf(h)
the
which
Q = Q1 “$’
predicate
Q = a~l ~resl(h),
Q = ~&b~es~(h),
once
in
root:
section.
pendent
Note,
expressions
at the
hl
then:
[~ =-+
only
application
(4):
generates
of this
hyperedge
expression
predzcate
If
-
the
predicates
P1; and viceQ5 can be transformed
the
~
complex
be h =
be an
If@ =+
–
two
PZ = p4,5 A p4, G.
hyperedge
by h away
includes
we state
=
contains
‘~
= {h}
expressions.
H
(Tz
) = @
ccoj(ho)
nopathr~...
=
{
Q2
which
?’,))
does not require
appear
broken-up
pl,2&p1,3
(rl
equivalent
predicates
to
= pI, Z A pl,3 and
Pi
man-
predicates.
to be equivalent.
=
‘&f
consists
set:
{rl
Finally,
(T5
PI
predicate
complex
join.
the
a hi-directed
h, the set of relations
is the
and
“’5~4’e
approach
correspond
Q5
our
any
in a top-down
be shown
consider
how
containing
the complex
that
can
explain
U co:j(h)
[
In
is a path
trees
predicates
that
Our
expressions
of predicate
P2, before applying
versa.
Therefore,
expression
~Rk&RlisapathinH)
. . . if ho is directed
{h~R%%l?j~
~
(T4
we
expressions
breaking-up
example,
‘~
to
predicates.
predicates
complex
examples,
given
expressions
For
set for a hyper-
the
recursively
complex
=
where
of complex
ner and
exactly
of
be extended
of traversing
hyperedge.
), is defined
means
can
number
join
join
by
results
Note,
conflicting
then
conflicting
edge,ho,denoted
Ccoj(ho)
H’ disconnects
closest
3.3
set
if removing
hypergraphs,
same
conj(ho)
closest
Also,
, hn from
ho, hl,
have
one
Ccoj(ho).
connected
is the
isapathin~}
be at most
hyperedge
edges
Ccoj(ho),
sets are computed
hypergraph.
.54
consider
where
complex
=
F’l = pl,z
predicate
predicate
predicate
Q6
P2.
PI:
T1 “’2AP”4
Ap~,4
and
P1 requires
In
this
case,
(r2
‘2’3$P2’4
P2 = pz,~ Apz,Q,
the
application
we first
break
and
then
break
complex
predicate
dependencies
P2:
isting
ordering
and
are generated,
enumeration
and
employ
view,
join
gated
to
expression
up either
trees
complex
one of the
above
that
are
predicate
PI
generated
by
or P2 are
equivalent
columns.
may
to
and
benefit
data
reorder
than
operator,
point
queries
two
join
of
outer
that
contain
relations
or aggre-
operation
is used
other
applications,
re-
propose
generalized
hierarchies,
to
applications
“object”
extended
our
we
novel
the
now
the outer
“object”
from
to
more
child
hierarchical
cations
breaking-
Since
parent
this,
powerful
to ex-
possible
an implementation
is similar
We can
all
For
very
From
referencing
traverse
like
Note,
yet
operator
operator.
modifications
generate
operators.
selection.
this
predicates
assign
a simple
generalized
slight
algorithms
data
relational
dbms
applisystems
findings.
six expressions.
References
4
Generation
In
this
ation
section,
process
imum
we
Our
then
the
the root
join
lent
more
expression
hypergraph.
Then,
combined
specified
satisfied.
to
technique
found
cost
with
in
of MGOJ
only
one
in,
presents
Note
that
selection
it
operator
operator
in
the
dy-
[BHAR95d]
[BHAR95a]
G.,
Algorithm
Bhargava,
[GAL192a]
[GAL192b]
the
cost
of
similar
to
ployed
for
taining
tions.
exhaustively
joins,
(full)
We employ
to capture
all
computed
only
reordering
outer
the
the
joins}
query
inter
once
paper
for
that
of SQL
and
or GOJ
op-
hypergraph
operator
dependencies
the
query.
given
queries
P. and
joins,”
Joins
87-99,
1995.
Iyer,
B.,
“No
of
and
“Simpli-
pp. 63-75,
CASCON,
P. and Iyer,
joins
To appear
and
B.,
“Efficient
aggregate
func-
of Data
in proceedings
Engi-
1996.
Galindo-Legarial
C., and Rosenthal,
a conventional
[GUPT95]
C. A.,
queries,”
Ph.D.
Science,
Harvard
of outer
of Applied
Ganski,
55
“Algebraic
optimiza-
dissertation,
University,
SQL
“Outer
joins
pp. 348-358,
R. A., and Wong,
H. K. T.,
Queries
as dis-
1994.
“Optimiza-
Revisited”,
Proc.
pp. 23-33, May 1987.
A.,
proach
1995.
C. A.,
SIGMOD,
of nested
Gupta,
of
1992.
Galindo-Legaria,
that
are
join
“How
to handle
proceedings
tion
“Generalized
these
join,”
Galindo-Legaria,
SIGMOD,
con-
outer
A.,
optimizer
pp. 402-409, 1992.
Engineering,
in order
Once
pp.
with
of outer
tion
aggrega-
model
B.,
Iyer,
Enumeration
G., Goel,
Cambridge,
can be em-
Groupby
and
the
Bhargava,
Dept.
[GANS87]
in this
P.
for
processing
Data
[GAL194]
are presented
queries
1995, pp.
Queries
Goel,
one- and two-sided
Conclusion
results
“Hyper-
join
SIGMOD,
CASCON,
G.,
to extend
in [GAL192a].
New
SQL
of outer
junctions,”
5
Goel,
in
Joins,”
neering,
of such
The
Bhargava,
tions”,
enumeration
is very
B.,
op-
considers
operator.
selection
Iyer,
of outer
1995.
assignments
the
so that
and outer
TR03. 567, July
P. and
predicates,”
Regression
fication
of aggrega-
of operator
be extended
[BHAR95C]
enumeration
the enumeration
Details
are
details
Goel,
reordenngs
complex
Outer
condi-
trees
say,
G.,
Projections
are
RDBMS
programming
[BHAR95b]
leaf
subtrees
of existing
Report
“Reorder-
to
eg.
for as-
the
Techwcal
B.,
joins
304-315.
the given
if all
involving
based
with
trees.
create
two
included
GAL192al
generalized
is to
Bhargava,
graph
equiva-
of association
is easily
joins,
to
possible
up from
step,
tree,
dynamic
generalized
cost
erator
step
approach
[BHAR95b].
has
of the
of the
first
[BHAR95a]
pushed
section,
algorithm
bottom
and also explains
along
technique
the
check
the
all
P. and Iyer,
queries
1994.
columns,
and
IBM
joins,”
pred-
the association
definition
[BHAR95a,
extensions
are
in the
This
timizers.
tions
generate
a larger
programming
relation
in the last
subsequent
to obtain
a column
or the jom
G., Goel,
a)
presented
an enumeration
The
-
Bhargava,
ing of complex
min-
method
is broken-up
by generating
at each
the
steps
generate
base
presented
trees, constructed
trees.
namic
two
b) then
trees
query
tions
by the
[BHAR94]
enumer-
with
of two
predicate(s)
than
It is easy to envision
sociation
root
in join
and
schedule
consists
predicate(s)
3,1;
of the
of aggregations
by the method
Example
schedule
outline
the
to the
is referenced
references
an
selects
if any
icate
optimal
approach
aggregations
in [BHAR95b];
that
the
present
that
cost.
push
of
Harinarayan,
Projections:
to aggregation”,
V.
and
A
Quass,
powerful
To appear
in
D.,
ap-
VLDB,
[MURA92]
Muralilcrishna,
rithms
couver,
British
Pirahesh,
VLDB
H., Hellerstein,
[ROSE90]
Ca., June
graphs,
reorderable
Algo-
Queries”,
Pro-
pp 91-102,
Van-
Canada,
1992.
J. M. and Hasan,
rewrite
SIGMOD,
pp.
W.,
optimiza39-48,
San
1992.
A.
Rosenthal,
“Query
Con.,
based query
in Starburst,”
Diego,
Unnesting
SQL
Columbia,
“Extensible/rule
tion
“Improved
Aggregate
of 18th
ceedings
[PIRA92]
M.,
for Join
and
Galindo-Legaria,
implementing
outer
joins,”
C.,
trees, and freelySIGMOD,
pp.
291-
299, 1990.
[SEL179]
Selinger,
D. D.,
P. G., Astrahan,
Lorie,
path
selection
ment
system,”
M. M., Chamberlain,
R. A. and Price,
in a relational
SIGMOD,
T. G.,
database
pp. 23-34,
“Access
manage-
1979.
56