Database Decomposition into Fourth Normal Form

Gösta Grahne and Kari-Jouko Räihä
Department of Computer Science, University of Helsinki,
Tukholmankatu 2, SF-00250 Helsinki 25, Finland

Abstract. We present an algorithm that decomposes a database scheme when the dependency set contains functional and multivalued dependencies. The schemes in the resulting decomposition are in fourth normal form and have a lossless join. Our algorithm does not impose restrictions on the allowed set of dependencies, and it never requires the computation of the full closure of the dependency set. Furthermore, the algorithm works in polynomial time for classes of dependencies that properly contain conflict-free dependency sets.
1. INTRODUCTION

The structure of a relational database is defined by a universal relation scheme U and by a set of dependencies D. Relations that conform to the universal scheme may contain much redundant information and have undesirable update properties [Dat81]. It is therefore customary to decompose the universal scheme into subschemes $R_1, \ldots, R_k$ such that $U = \bigcup_{i=1}^{k} R_i$. The collection $\{R_1, \ldots, R_k\}$ is called the database scheme. Universal relations can be represented as their projections onto the relation schemes in the database scheme.

A good decomposition should have several properties [BBG78]:

(1) Minimal redundancy: the relation schemes should represent meaningful objects, so that, for instance, the relation can be updated without knowing the values of all the attributes in the universal relation. This is usually taken to mean that the relation schemes should be in one of various normal forms.

(2) Representation: it should be possible to recover the original relation from the component relations, i.e. the decomposition should have a lossless join.

(3) Separation: when a (projected) relation is updated, it should be possible to ensure that the dependencies still hold in the universal relation after the update, without actually constructing the universal relation. This condition is satisfied if the decomposition is dependency preserving, i.e. the dependencies that hold in the projected relations imply all the dependencies in the original set D.

There exists a wide variety of dependency types, each one giving rise to different normal forms. The two most common dependency types are functional dependencies and multivalued dependencies. The normal forms associated with functional dependencies are third normal form (3NF) [Cod72] and Boyce-Codd normal form (BCNF) [Cod74], whereas fourth normal form (4NF) [Fag77] is defined for sets of functional and multivalued dependencies. The requirements for these normal forms increase from 3NF to 4NF, i.e. 4NF implies BCNF, which implies 3NF.

This work was supported by the Academy of Finland.

2. PREVIOUS WORK

There
is
lossless
it
Unfortunately,
find
a decomposition
properties.
In
schemes
(or
or
decomposition
sial
(see
the
rise
well
does
exist
[BBG78]).
universal
relation
is,
each
tended
of
the
goals.
that
satisfy
the
will
be of
sition
In
this
a problem
the
paper
that
problem
of
is
{R
a lossless
to
given
assess
how
a relation
little
use
it
if
the
decompo-
such
that
and each
Ri
is
on the
desired
polynomial
time
Finding
more
is
in
testing
intractable.
functional
dependencies
circumvented
regardless
of
whether
in
not.
The same idea
[TsF80].
Their
ity
of
than
lossless
give
polynomial
than
We use
time
an algorithm
approach
does
not
general-
and
the
complex-
that
for
notation
a broader
polynomial
4NF is
is
more
class
and
before
it
open.
of
would
works
dependency
closure
closure
algorithm
X-Y.
Lien
has
set
may not
if
shown
multivalued
of
in
to
is
a suf-
4NF.)
of
finding
Fagin
a
[Fag77]
closure
scan
D+
Then
through
for
the
some nontrivthe
size
of
[Ull82b],
of
this
[Lie81]
then
that
set
approach.
that
is
smaller
correctly.
depend-
algorithm.
dependencies
any
work
BCNF or
set
that
efficiency
algorithm,
D+:
decomposed
to
Unfortunately,
of
and
in
for
D+ can be exponential
the
a
dependencies,
decomposition
when
ruining
decomposition
the
the
only
is
the
some Ri.
just
the
is
dependency
searching
a precomputed
[Ull82b].
the
set
Moreover,
of
in
compute
and Fischer
dependencies
be sufficient
ial
thereby
depend-
(If
becomes
for
a nontrivial
Ri
condition
then
running
precomputed
algorithms.
and terminology
to
if
actually
nontrivial
dependency
proposed
efficient
and that
of
compu-
a sufficient
multivalued
and necessary
nontrivial
given
into
absence
also
Tsou
can be applied
only
The problem
working
by Tsou
includes
ficient
into
recently
algorithms,
existing
the
decomposition
decomposition
general
the
was presented
dependencies,
an algorithm
existing
encies
but
was
encies
heavily
is
a
[BeB79].
algorithm
then
and 4NF also.
probwhether
by testing
words,
it
of
applica-
NP-complete
not
a
and
efficiency
[TsF80],
Ri,
not
4NF.
testing
problem
in
X is
repeated
their
BCNF condition,
R1,...,Rk,
∅ ≠ X ⊂ Ri,
in
that
In
other
pro-
that
4NF property
this
D’ whose
losslessness
BCNF is
the
tationally
In
3NF a simple
a lossless
time
to multivalued
in
scheme
that
by XY∩Ri
the
fur-
means
(i.e.
several,
known
found
For
algorithm
has
is
is
depends
in
schemes
it
one.
normal
X→→Y
replaced
of
U has
Suppose
This
and the
a set
dependency
problem
form.
difficult,
ize
We shall
the
synthesis
polynomial
Fischer
of
normal
[BDB79].
BCNF is
in
guarantees
dependencies
The complexity
is
algorithm
necessary
a set
form.
in
This
First,
sufficient
solvable:
U into
4NF.
such
Ri
a
schemes
and R1=U).
nontrivial
Then
produces
have
on
is
finding
Suppose
relation
in
[Fag77,ABU79],
Therefore
attention
a scheme
join
relation
into
properties,
to be always
4-J
if
of
our
the
lems.
expensive.
focus
known
Even
scheme
some set
of
This
in-
Ri
4NF.
of
and ∅ ≠ Y∩Ri ≠ Ri−X)
key
tion
its
into
by a set
onto
Y∩X=∅,
Ri.
X∪(Ri−Y).
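
The straightforward decomposition step referred to here (a scheme Ri is replaced by XY∩Ri and X∪(Ri−Y) when X→→Y violates 4NF in Ri) can be illustrated with a minimal sketch. The function name `decompose_step` and the encoding of attribute sets as strings of single-letter attributes are assumptions made only for this illustration; the sketch does not check that X→→Y actually holds.

```python
def decompose_step(scheme, x, y):
    """Split `scheme` (R) on X ->> Y into XY ∩ R and X ∪ (R − Y).

    Attribute sets are given as iterables of attribute names; the result
    is a pair of frozensets. Whether the dependency holds is not checked.
    """
    r, x, y = frozenset(scheme), frozenset(x), frozenset(y)
    if not x <= r:
        raise ValueError("the left-hand side X must be contained in the scheme")
    first = (x | y) & r      # XY ∩ R
    second = x | (r - y)     # X ∪ (R − Y)
    return first, second


if __name__ == "__main__":
    left, right = decompose_step("ABCEGH", "A", "G")
    print(sorted(left), sorted(right))
```

By Fagin's result cited in the text, such a split has a lossless join exactly when the dependency used holds in the scheme being split.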
validity
enough.
prohibitively
1”’
join
jection
a
controver-
meets
not
decomposing
subschemes
have
is
we will
is
of
have
has
decomposition
replaced
way of
k=1
R1,...,Rk (initially
ther
that
some Ri is not
there
exists
a dependency
BCNF
do when
the
properties
decompose
algorithm
is
been
[AtP82,Ull82a]).
this
subschemes
to
assumption
three
to
into
what
important
possible
method
not
(cf.
But
do not
Especially
course,
of
relation
that
of
much debate
It
exist
decomposition
The question
to
desirable
dependencies)
e.g.
to
possible
these
there
preserving
good
always
all
particular,
4NF [BeB79].
of
not
having
rather
a dependency
is
is
a straightforward
set
is
if
used
must
used,
such
in
the
be
the
Therefore
the nontrivial
dependency
must be found differently.
be to avoid
pendencies
deed,
Ri,
computing
there
D+ and to derive
using
of
the decomposition
tree
use an arbitrary
ther.
neither
both
but
tree
this
This
indicates
dependency
dependency
that
ible.
is,
That
may be only
instead
may
the subscheme XY∩Ri
dependencies.
XY∩Ri
nodes.
pendencies,
into
time.
an arbitrary
X' +Y'
they
throw
This
is possible,
X+Y
dependency
test
whether
still
holds
some nontrivial
in XY∩Ri;
out the attributes
since
dependencies
in Ri itself
that
(and not just
way the "smallest"
Ri is finally
in
obtained,
-X'Y'.
ate the normal
also
In this
X+Y
holds
that
3.
holds
dependencies.
If
we find
in XY∩Ri,
it
self;
X'→→Y'
i.e.
multivalued
some nontrivial
does not necessarily
this
is a
dependencies,
open question.
[daS80]
produces
schemes that
however,
dependencies
are in
there
in D+ that
may
viol-
form conditions.
that
hold
plies
in Ri it-
schemes,
dependencies.
are present
multivalued
X-Y),
algorithm
may be embedded in XY∩Ri.
the sake of brevity,
the dependency
also
them into
X' +Y'
for
multivalued
pendencies
does not work for
de-
works
dependencies)
algorithm
into
time,
AN INFORMAL DESCRIPTION OF THE METHOD
the sequel
can be used to
only
approach
de-
conflict-free
[Sci81]
of multivalued
derived
We shall,
in
decompose Ri.
This
properties.
a lossless
of func-
X'+Y'
and it
exist
performing
Although
as D is concerned;
still
in XY∩Ri).
nontrivial
produces
method
decomposition
4NF as far
for
as an intriguing
so,
all
but none of these
so-called
da Silva's
deis
not be added to D.
case (unrestricted
remains
lossless
de-
if
(XY∩Ri)
the properties
imply
Then
for
Sciore's
subclass
Similarly,
it
Therefore
the desired
the same restriction.
still
to
4NF schemes in polynomial
works
sets.
the general
with
X+Y.
into
practical
de-
all
a
in the extended
efficiently,
[Lie82]
to find
is dynamic:
some algorithms
achieves
only
under
tree
start
exist
the
is sufficient
tree.
in D+ will
algorithm
it
it
process
by the decomposition
the two
during
we wish
the dependencies
set D. This
pendency
of D
add to D members of
X-Y,
but
can be found
Tsou and Fischer
nontrivial
repeatedly
a linear
In the case of functional
such a "smallest"
in polynomial
would
we will
dependency
composition
is in 4NF, and the decompo-
would degenerate
O(|U|)
not
This
Instead,
nontrivial
Lien's
if
D+, but we do not
whenever
the decomposition
as poss-
should
computing
process
the dependencies
a
efficient,
is in some sense between
decomposition
There
is essen-
on the basis
D+ in such a way that
only
open by Tsou and
D+.
We avoid
algorithms
find
are tested
to be a dif-
problem
the decomposition
set D, either.
guided
of an arbitrary
one should
is in a sense as "small"
that
tree
tional
tree
appears
this
use the fixed
pendency
nontrivial
was left
solving
computing
examine
the new sub-
the difference
any nontrivial
guarantee
they
fur-
than the decom-
X++Y,
making
extremes.
we
for
attributes
that
nontrivial
pendency
If
and in the case of a full binary
|U|−1 decomposition
steps.
means 2
IUl levels,
tial
for
X++Y
and it
Our solution
in 4NF.
Thus the decomposition
have
sition
step
problem,
However,
that
X-Y
dependency
the dependencies
of the new schemes
In each decomposition
posed scheme Ri,
with
dependency
is necessarily
schemes must have less
contain
the problem
of them must be decomposed
one attribute.
in
the smallest
Fischer.
without
may grow too large.
and X∪(Ri−Y))
Therefore
with
nontrivial
Ri,
dependencies
[HIT79].
But then we are faced
decomposing
ficult
the de-
one of them can be found effi-
a result
efficiently
multivalued
would
set D. In-
are any nontrivial
then at least
(XY∩Ri
approach
on demand from the initial
if
ciently
Another
Finding
X-Y
set D consists
If
functional
we can either
to produce
with
convert
im-
Tsou and Fischer's
a lossless
apply
of
de-
ones (by the axiom X→Y
or start
and then
assume in
set of BCNF
our algorithm
to this
set.
a
Our goal
is to maintain
K, such that
sides
dependency
at least
whenever
in a candidate
is,
given
D = {A-G
whose left-hand
B-HH
I BHCE,
dependency
each left-hand
side
GH*C
1 E 1 AB}.
D by listing
on the right-hand
If we let
may not be in 4NF. For instance,
side
split.
Again,
posed using
schemes
left-hand
in the decompo-
then
It
ABCEGH
sides
>HCE
for
B/
which
is obtained
a,,,
by using
and B+H,
the scheme ABCE is not in 4NF: the de-
AB→→C
of the original
hold
dependencies
in it.
in D contradicts
order,
into
the result
may in fact
4NF schemes.
other
example
no matter
However,
We start
gives
of dependencies
The problem
is a union
is used in
X∩Y=∅).
GH is split.
decomposition
tree
is applied,
the left-hand
side
The corresponding
dependencies
not be directly
though
is that
with
the given
applied
the subschemes
nontrivial
quires
it
when A++G
in the subschemes,
may contain
dependencies,
we will
such a derivation
position
be applied
left-hand
when searching
We will
in Section
some necessary
by DEP(X))
return
5.
can-
X++Y
al-
will
re-
is defined
of some sets
The dependency
basis
if
always
exists
and
[Bee80,Gal82].
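
As an illustration of how a dependency basis DEP(X) can be obtained, the following minimal sketch uses the refinement rule that the text attributes to Beeri's algorithm [Bee80]: start from the single block U−X and repeatedly split a block T by a dependency V→→W whenever V∩T=∅ and both T∩W and T−W are nonempty. The function name, the string encoding of attribute sets, and the restriction to multivalued dependencies are assumptions of this sketch, and no attempt is made to match the efficiency of the cited algorithms.

```python
def dependency_basis(x, universe, mvds):
    """Return DEP(X) as a set of frozensets partitioning U − X.

    `mvds` is an iterable of (lhs, rhs) pairs, each an iterable of
    attribute names, read as the multivalued dependency lhs ->> rhs.
    """
    mvds = [(frozenset(l), frozenset(r)) for l, r in mvds]
    rest = frozenset(universe) - frozenset(x)
    blocks = {rest} if rest else set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in mvds:
            for block in list(blocks):
                inside, outside = block & rhs, block - rhs
                # Refine only if lhs is disjoint from the block and the
                # right-hand side cuts the block into two nonempty parts.
                if not (lhs & block) and inside and outside:
                    blocks.remove(block)
                    blocks.update((inside, outside))
                    changed = True
    return blocks


if __name__ == "__main__":
    # Dependency set of the running example, one pair per right-hand side.
    D = [("A", "G"), ("A", "BCEH"), ("B", "H"), ("B", "ACEG"),
         ("GH", "C"), ("GH", "E"), ("GH", "AB")]
    print(sorted("".join(sorted(b)) for b in dependency_basis("A", "ABCEGH", D)))
```

For the example dependency set used later in the paper, this yields DEP(A) = {G, BCEH} and DEP(GH) = {C, E, AB}, in agreement with the sets that appear in the text.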
X-Y
XrR).
is nontrivial
otherwise
For brevity,
X is nontrivial
in R for
X is said
and only
X ⊊ X(Y∩R) ⊊ R;
in R (if
say that
if
in DEP(X) (provided
efficiently
set R if
is nontrivial
otherwise
Let
as a partition
X+Y
a dependency
is trivial
sometimes
X*Y
definitions.
scheme and D =
D implies
can be computed
one step
with
As explained
of
lation
the decom-
rithm.
although
only
be a relation
We say that
step.
For instance,
effect:
for
we
in R if
some Y in DEP(X);
to be trivial
in R.
in our algorithm
in a sense perform
simultaneously
is sufficient
Xn +Yn)
be a set of multivalued
de-
For any X ⊆ U, the dependency basis of
in an attribute
some derived
whose derivation
the use of GH. A key idea
this
issue
of U-X such that
the decomposition.
is that
A,}
-Y
X (denoted
an-
where 4NF schemes are not reached,
what order
in the decomposition,
dependencies.
with
1'""
1
pendencies.
in some other
[Lie81]
P is a
THE DECOMPOSITION ALGORITHM
the
be a decomposition
Lien
decom-
is added to K is X∪(P∩Y).
the desired
the nontrivial
4.
{x
are applied
and if
in K need to be considered
U={Al,...,
the dependencies
by adding
Yet none
4NF property.
If
using
AH is
A-HG
pendencies
and AB-HE
side
X-WY,
is split
that
to the correctness
the dependencies
use
AB.
a dependency
can be shown that
achieving
By combining
R is the scheme being
that
the
directly
the situation
side
the element
GH,
1 E 1 B in ABCHE.
we can fix
side
A*G.
we-can
ABCHE is decomposed
if
AH. That
using
requires
added left-hand
In general,
tree
AB'
if
the newly
of
K equal
then the resulting
AH-C
B-crH,
side
derived
G. This
steps
to K the left-hand
the
is
of the dependency
two derivation
Similarly,
Here U=ABCEGH and
of D.)
LHS(D) permanently,
sition
these
the set of depend-
1 AGCE,
X-HY
derive
the dependencies
the original
basis
to K the left-hand
a dependency
application
side
by LHS(D).
in [Lie81, p. 61].
(We have augmented
by adding
if
then we must first
is
the set of left-hand
consider
As an example,
encies
is a nontrivial
scheme R, then there
K equals
in D, denoted
entire
effect
one such dependency
is in K. Initially
sides
a set of left-hand
there
is essential
We say that
P if
the following
(i)
∅ ≠ Y∩P ≠ P,
(ii)
for
in Section
3, the splitting
to the decomposition
X-Y
splits
conditions
an attribute
are satisfied:
GH++C 1 E 1 AB cannot
in ABHCE, we can achieve
the same
and
some Q ∈ DEP(P),
∅ ≠ Y∩Q ≠ Y−P.
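
The two splitting conditions can be checked mechanically. The sketch below assumes that DEP(P) is supplied by the caller as a list of attribute sets; the function name `splits` and the string encoding of attribute sets are illustrative choices, not notation from the paper.

```python
def splits(y, p, basis_of_p):
    """Check whether a dependency X ->> Y splits the attribute set P.

    (i)  ∅ ≠ Y ∩ P ≠ P
    (ii) for some Q in DEP(P): ∅ ≠ Y ∩ Q ≠ Y − P
    """
    y, p = frozenset(y), frozenset(p)
    common = y & p
    if not common or common == p:            # condition (i) fails
        return False
    rest = y - p
    for q in basis_of_p:
        part = y & frozenset(q)
        if part and part != rest:            # condition (ii) holds for this Q
            return True
    return False


if __name__ == "__main__":
    dep_gh = ["C", "E", "AB"]                # DEP(GH) in the running example
    print(splits("BCEH", "GH", dep_gh))      # right-hand side of A ->> BCEH
    print(splits("G", "GH", dep_gh))         # right-hand side of A ->> G
```

With the running example, A→→BCEH splits GH, so the set A∪(GH∩BCEH) = AH becomes a candidate for addition to K, whereas A→→G does not split GH because G meets no member of DEP(GH).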
realgoset
Y
Pictorially the splitting situation is the following.

The following figure illustrates how the algorithm could work when it is applied to U = ABCEGH and D = {A→→G | BCEH, B→→H | ACEG, GH→→C | E | AB}.
[Figure: decomposition tree with root ABCEGH. First step: X = A, L = {G, BCEH}; V→→W = A→→BCEH splits GH; K := {A, B, GH} ∪ {AH}.]
A slightly
sometimes
stronger
referred
is discussed
to as the split
in e.g.
are interested
could
form of condition
we require
in the splitting
also
the existence
algorithm
(ii),
key property,
(ii),
of a suitable
which
if
P
B/
The
without
condition
way it
/\
ABC
The underlined
that
is more efficient.
to present
allow
malized
Input:
A relation
join,
particular
scheme.
Xn→→Yn}
choice
such that
A decomposition π = {R1,...,Rm} of U such that $\bigcup_{i=1}^{m} R_i = U$, the decomposition has a lossless join, and each Ri is in 4NF.
5.
rectly
{decomposition
while
there
loop}
exist
that
X+Y
R ∈ π,
X ∈ K
is nontrivial
do begin
let L = {Y | Y ∈ DEP(X)
{prepare
for
there
by using
exist
V ∈ K,
splits
and W ∈
we will
the fact
P and V(P∩W)
do K := K ∪ {V(P∩W)};
π := (π − {R}) ∪ $\bigcup_{Y \in L}$ {X(Y∩R)};
end;
0
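
The overall control structure of the decomposition algorithm can be sketched as follows. This is only an illustrative reading of the method under several assumptions: the dependency set contains multivalued dependencies only (so keys play no role in the 4NF test), dependency bases are recomputed naively on demand, the names `decompose`, `splits`, `dependency_basis` and `ordered` are hypothetical, and candidates are tried in a fixed order (smaller sets first) merely to make the run deterministic.

```python
def dependency_basis(x, universe, mvds):
    """DEP(X): partition of U − X, refined block by block."""
    rest = frozenset(universe) - frozenset(x)
    blocks = {rest} if rest else set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in mvds:
            for t in list(blocks):
                inside, outside = t & rhs, t - rhs
                if not (lhs & t) and inside and outside:
                    blocks.remove(t)
                    blocks.update((inside, outside))
                    changed = True
    return blocks


def splits(w, p, dep):
    """V ->> W splits P: conditions (i) and (ii) of the definition."""
    common = w & p
    if not common or common == p:
        return False
    return any((w & q) and (w & q) != (w - p) for q in dep(p))


def ordered(sets):
    return sorted(sets, key=lambda s: (len(s), sorted(s)))


def decompose(universe, mvds):
    universe = frozenset(universe)
    mvds = [(frozenset(l), frozenset(r)) for l, r in mvds]
    dep = lambda s: dependency_basis(s, universe, mvds)
    schemes = {universe}                      # initialization: {U}
    keys = {lhs for lhs, _ in mvds}           # initialization: K = LHS(D)
    while True:
        # Look for R and X in K such that X is nontrivial in R.
        pick = next(((r, x) for r in ordered(schemes) for x in ordered(keys)
                     if x <= r and any(x < (x | (y & r)) < r for y in dep(x))),
                    None)
        if pick is None:
            return schemes
        r, x = pick
        parts = [y for y in dep(x) if y & r]  # the set L
        grown = True
        while grown:                          # add new left sides to K
            grown = False
            for v in ordered(keys):
                for w in dep(v):
                    if not any(v <= ((x | y) & r) <= (v | w) for y in parts):
                        continue
                    for p in ordered(keys):
                        new = v | (p & w)     # V(P ∩ W)
                        if new not in keys and splits(w, p, dep):
                            keys.add(new)
                            grown = True
        # Decomposition step: replace R by the schemes X(Y ∩ R), Y in L.
        schemes = (schemes - {r}) | {x | (y & r) for y in parts}


if __name__ == "__main__":
    D = [("A", "G"), ("A", "BCEH"), ("B", "H"), ("B", "ACEG"),
         ("GH", "C"), ("GH", "E"), ("GH", "AB")]
    print(sorted("".join(sorted(s)) for s in decompose("ABCEGH", D)))
```

On the example U = ABCEGH this sketch first decomposes with A, then with B, and finally with the added left side AB, which is the sequence of steps shown in the figure.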
proof.
If
sides,
cor-
S-r-rT is
to the final
that
further
that
that
de-
S is necessarily
Ri belongs
S is
the properties
R. could
1
S, contra-
to the final
i.e.
is necessary
of splittings.
The first
In both
S++T
if
added to K it
two essential
proof.
more, we assume that
∉ K
de-
of the algorithm.
R ⊆ U and that
in XY∩R}
claim
works
using
correctness
V ⊆ XY∩R ⊆ VW
is trivial
and V→→W
P ∈ K
the algorithm
an indirect
two lemmas prove
Y ∈ L,
a different
Therefore
to study
sides}
any
of
A different
in some Ri belonging
To prove
X: add
process.
would yield
added to K in the algorithm.
output
using
fix
the elements
have been decomposed
dicting
and Y∩R ≠ ∅};
DEP(V) such that
{V→→W
such
in R
the decomposition
to K new left
while
and Y ∈ DEP(X)
does not
considering
show that
composition,
{initialization}
the members
CORRECTNESS OF THE DECOMPOSITION ALGORITHM
nontrivial
π := {U};
all
tree.
We will
Method:
K:=LHS(D);
the algorithm
for
of the X-sets
composition
facts
in it.
that
order
the essential
to decompose the unnoreven though
K in the decomposition
scheme U and a set of de-
D = {X1→→Y1, ...,
1 ≤ i ≤ n.
Yi ∈ DEP(Xi),
that
of a relation
indicate
ABCE further,
Note also
Decomposition
pendencies
output:
parts
L={C,E}
ABE
the algorithm
of LHS(D) are trivial
the decomposition
algorithm.
Algorithm:
X=AB,
ABCE
\
guarantees
Q in DEP(P).
would work correctly
We are now ready
Here we
of P only
GH
[Figure, second step: X = B, L = {H, ACEG}; V→→W = B→→ACEG splits GH and AH; K := {A, B, GH, AH} ∪ {BG, AB}.]
decompose XY. Therefore
condition
but in this
(i),
[Lie82,BFM81,BeK83].
be used to further
LcEH
facts
used in the
lemmas we assume that
is nontrivial
S is minimal
S' is nontrivial
in R. Furtheramong such left
in R then
|S| ≤ |S'|).
{decomposition}
It
LHS(D).
is useful
to consider
We say that
the pair
how S is
related
to
({S1,...,Sk},{Sk+1,...,Sk+t})
of S if
partition
is a representation
the following
conditions
are satisfied:
(ii)
(iii)
(iv)
Furthermore,
we say that
Intuitively,
k'+t'.
({Sl,...,Sk],{Sk+l,...,
is a minimal
cover
P+Q
representation
of S using
subject
to the condition
LHS(D),
then X is covered
sets
that
if
for
X always
exists,
k
and @#QfI (( fI Ti)-S)
i=l
are com-
but
it
@#Qn((
need
for
not be unique.
({Sl,...,Sk],{Sk+l,...,Sk+t))
and that
From the minimality
of S it
lsibk,
in R; let
is trivial
that
M(S) =
follows
k+ts2.
that
each Si,
TiEDEP(Si)
tion
XiEU,
of attribute
convenience,
sets
we define
for
{Xl,...,Xn}
U Xi=0
and
hand,
M(S) implies
that
ICC1 ,...,k}
iE1
such that
the minimality
U Si is
trivial
cannot
of S and
in R for
I#{l,...,k+t}.
be used to refine
any
Consider
A candidate
basis
Since we just
any collec-
n xi=u.
iE0
iE0
(2)
any Ic{l,...,k}.
of DEP( U Si).
iE1
is { fl T. U-( U Si)-(
fl Ti)}.
1'
iEI
iE1
iE1
have shown that Q would partition
where each
Therefore
# ( fl T;)-S
iE1
the computation
such
RcSiTi.
As a notational
k
# ( fl Ti)-S.
i=l
I-I Ti)-S)
iEI
On the other
In the lemmas we assume that
(1)
= 0
i=l
pletely
contained
in S. If X*Y
is a dependency
+
then a minimal
in D where X is nonredundant,
representation
r-l T;)-S)
p n ((
in LHS(D),
that
for
k
of S
XGS and X E
by sets
rule
in D such that
we have k+t 6
a minimal
refined.
multivalued
dek
pendencies,
we may in this case take I( fl T,)-S,
is1 L
k
k
U-S- n Ti) as the candidate
basis.
Now RE n SiTi
i=l
is1
k
k
k
c n ST. = SU( f-l T;) = SU (( Il Ti)-S);
thus
i=l
i=l
i=l
ki
S++( fl Ti)-S is trivial
in R. Since we assumed
i=l
that S is nontrivial
in R, there must exist
a
is a -minimal representation
of S (denoted
'k+t')
by M(S)) if for any other representation
({Tl,...,Tk,},{Tk,+l,...,Tk,+t,})
be further
By the decomposition
Si ∈ LHS(D), 1 ≤ i ≤ k+t.
k
and XC_S}.
U S. = U {X/XELHS(D)
1
i=l
∅ ≠ Si∩S ≠ Si,
k+1 ≤ i ≤ k+t.
k+t
k
S = ( U Si) U ( U (Sin S)).
i=k+l
i=l
(i)
cannot
fl T..
iEIi
ll Ti,
iE1
Therefore
P-Q
p n ( n TV) + 0.
(3)
iE1
Lemma 1.
There
any Is{l,...
lowing
,k!
exists
a PELHS(D)
such that
and j E{l,...,k]-I,
for
We will
the fol-
claimed
hold:
proper
(i)
(ii)
S. ++Tj splits
( U Si) U (Pll
fl Ti),
3
iE1
iE1
k
k+t
Pn fl Ti E
u (S;llS).
i=k+l
i=l
Proof.
Consider
Beeri's
algorithm
with
tition
used.
lowing
a candidate
that
[Bee80].
basis,
Then the basis
rule:
and T-Y.
{U-S},
is refined
using
X DT =@, then T is
The algorithm
terminates
P has the properties
of il,...,
element
I be an arbitrary
k),
and let
of (1,. ..,kj-I.
( U Si) U (Pn n Ti).
iE1
iE1
splits
V, two conditions
j be an ar-
We will
denote
V the set
To show that
S. ++T.
J
J
isfied:
must be sat-
by
starts
but
any par-
than DEP(S) could
If T is in the partition
subset
bitrary
of DEP(S) using
The algorithm
e.g.
is not finer
is in D such that
TnY
the computation
and
show that
in the lemma. Let
be
the fol-
O#TjnV#V,
(b)
for
We will
and X-++Y
replaced
(a)
time.
by
when the
and
some WEDEP(V),
deal
with
the four
O#Tj
flW#Tj-V.
inequalities
one at a
lo 0#T,
nv.
If
that’0fPn
IU{j)#{l,...,k+t),
k+t},
= il,...,
hE I.
By the minimality
0; and because
result
(P"
then t=O, ka2,
and I#@;
of S and M(S),
S cRcS.T.,
h- 33
Sh"Tj
AeP
(using
We have shown that
let
Part
(ii)
(1)).
part
follows
(i)
of the claim
P"((
: T.)-S)=P"((
j,=l '
holds.
directly
from (1):
: T;)-((
i=l
: S.)U( kit (Sins))))
i=l l
i=k+l
Sh-Sj #
#O. The
Thus Ae
" Ti))=V.
iE1
( U Si)U(P"(
iE1
" Ti),
iE1
the result.
If IU{j)
n
iEIlJ(i1
immediately
gives
which
(
implies
(3) implies
Ti)=Tj"
since
follows.
= P"((
2O Tj "V#V.
Suppose to the contrary
that
" T;E
VGT..
In particular,
P "
3
iE1
k
thus
" Ti = " T..
T . . Let H= {l ,...,k)-(j);
3
iEH1
i=l
k
But (1) implies
that P" ( " Ti) =@, while (3)
i=l
implies
that P" ( " Ti) #a: a contradiction.
iEH
= V, i.e.
3' For some W in DEP(V),
suitable
W, we will
Tj "W#O.
start
To find
P"(
Lemma 2. Let V be an attribute
k
U SiGVsS.
Then there exist
i=l
LHS(D) such that V++W splits
a
Proof.
from the dependency
Consider
Since
((
" Ti)-P)
"P=!?J,
date basis
trivial
" Ti)-P.
iE1
O#Q"
P++Q can be used to
iE1
that
is {W-S,U-W-S).
" Ti)-P)"Q=
( " Ti)-P.
Thus V-H((
iE1
iE1
since Q E
( n Ti)"Q-(P"Q)
=( " Ti)"Q,
iE1
iE1
DEP(P). We will
show that condition
(b) holds
refine
a P*Q
(W-S)#W-S.
lo P"W#O.
If
WEDEP(V)
and PC
P and P"W=S.
Let W be the unique
the computation
V++W splits
such that
set
R=VW.
in R and since
must exist
set
of S, V++W is trivial
any WEDEP(V).
in DEP(V) such that
v= ( u si) u (p n ( n TV)) --w
iE1
iE1
" T;))=(
iE1
By the minimality
in R for
those attributes
u si++
n Ti. By shifting
iE1
iE1
" Ti that belong to P to the left,
we get
of
iE1
( " T$-(P"(
iE1
I! T.)-(
ki'
(S."S)))
=O, we have
i=l l
i=k+l
'
k
k+t
U (S;"S).
0
" Ti+
i=l
i=k+l
Tj "V
of DEP(S).
Since
S+T
REVWGSW=S
in D such that
Thus clearly
A candiis non-
U(W-S),
there
P" (W-S) =d and
P"WzS;
we claim
P.
P"W=O,
the computation
P*Q
could
be used in
of DEP(V) to refine
V, a con-
tradiction.
then it obviously
holds for
( " Ti)"Q;
iEI
some W in DEP(V) such that Ws( " Ti) "Q.
iE1
follows
The first
part,
Tj," ( " Ti) "Q#O,
iE1
immediately
from (2).
for
2O P"W#P.
PES.
implies
A6
U Si,
iE1
and AE(
P"W=P,
we would have PEW and
of M(S), P E
k
U{XlXELHS(D)
and XcS] = U S.cV,
contraill
ldieting
the fact that V"W=O.
4' Tj " ( " Ti) "Q#Tj-V.
From (2) we know that
$5 I
k
which implies
Q" (( " Ti)-S) # ( " Ti)-S,
i=l
i=l
k
k
Let AE ( " Ti)-S-Q.
Since
( n Ti)-S-Q#0.
i=l
i=l
establish
A@Q, AgTj " ( " Ti) "Q. We will
iE1
the claim by showing that AETj-V.
Since A E
k
have AET..
Moreover,
" Ti, we immediately
3
i=l
k
AgS
If
By the definition
3O Q"W#d.
Obvious.
4O Q"W#W-P.
Since
Q" (W-S) #W-S, we have W-S-Q
# 0. Let AEW-S-Q;
AEW,
but A@!P because
AEW-P.
Theorem 1.
beginning
" Ti)-S
i=l
then A@Q"W.
Moreover,
P" (W-S) =O,
implying
0
Consider
of the outer
minimal
set such that
Ri for
some TEDEP(S).
n=(Rl,...,R,)
while-loop.
S*T
and K at the
Let
is nontrivial
Then SEK.
S be a
in some
Proof.
Let M(S) = ({S1,...,Sk},{Sk+1,...,Sk+t}).
By the initialization
If k+t=l,
of K, SiEK
then clearly
k=l
lossless join
for
1 ≤ i ≤ k+t.
and t=O,
sults
and S=S 1
for
Consider
then the case k+t)2.
Lemma 1 are now satisfied.
in LHS(D) whose existence
[Fag77,ABU79].
some Ri belonging
there
The conditions
exists
Ri.
Let P be the set
is implied
loss
I =8 and j =l,
S ++T 1 splits
1
trivial
in Ri.
P. By the minimality
Therefore
ing the decomposition
R using
P, Sl U (PnTl)
was there
gives
that
X-Y.
that
of S, Sl is
S,_CR~=XYDR~S
step
some dependency
splits
Lemma 1 (i)
T dur11
Ri from
produced
Since
Repeating
even though
S.
Besides
rithm
splits
the above argument
cient
that
eral
=
U SiU(Prl
II Ti) is added to K. Straighti=l
i=l
forward inductive
application
of Lemma 1 (i) imk
k
plies
that V= U SiU (Pll n Ti) is added to K
i=l
i=l
during decomposition
using X*Y.
Moreover,
k
k+t
Lemma 1 (ii)
states PI? f3 T:E
u (S, ns); thus
that
such that
there
follows.
exist
V-HW splits
definition
with
the claim
VsS,
P' fiW#8,
of S implies
that
flW) is added to K during
composition
step.
We clearly
VU(P'
flW)gS,
be applied
being
There
exist
pears
difficult
is still
have VSV U(P'llW)rS.
the above argument
added to K.
could
that
By the
RisVW.
sets
But
set
until
If
and Lemma 2 can
o
looking
ates
Proof.
The decomposition
algorithm
be improved.
in K that
Termination
follows
still
There
works
too large:
of U and D and from the fact
dependencies
that
only
are used in the decomposition.
already
condition
(ii)
stricting
the size
exist
purpose.
requirements
that
for
a
the algo-
met one such re-
in the definition
for
the purpose
of K. Another
the proof
whenever
in K-LHS(D),
where either
VELHS(D)
The
us
correctly.
Thus we can immediately
nontrivial
there
any useful
additional
was introduced
that
let
4
the set K computed by
do not serve
place
cases,
are two main approaches
First,
splitting
VU(WnP)
from the finiteness
of K by a
in Section
to be added to K, and prove
element
correctly.
ap-
of the algorithm
algorithm
is still
can observe
termin-
where it
at the difficult
be added by examining
Corollary.
examples
how the basic
quirement:
S
exact-
open.
We have actually
we end up with
to determine
K is when compared to LHS(D+).
to bound the size
Thus we could
the same de-
of
is more effi-
Thus the complexity
the algorithm
together
n our algo-
is a subset
the algorithm
can be used.
rithm
inductively,
be decomposed
decomposition
pathological
Before
Lemma 2
which
then VU(P'
could
case we have been unable
polynomial.
and P' ELHS(D)
P' and P' nWcS.
of splitting,
the minimality
If
WEDEP(V)
con-
to the final
than the straightforward
method where the
+
D is computed. Unfortunately,
in the gen-
examine
V=S,
it
the set K, which
ly how much smaller
v_cs.
If
the actual
computes
entire
implies
in
S be minimal
o
LHS(D+). Therefore
Sl U(PflT1).
yields
let
EFFICIENCY OF THE DECOMPOSITION ALGORITHM
it
already)..
S2*T2
S is nontrivial
Ri belongs
further
6.
decomposition
By Theorem 1, SEK,
that
decomposition
using
that
Sl*Tl
was added to K (unless
By Lemma 1 (i),
to the final
sides.
the fact
the resulting
to the contrary
of generality,
among such left
by Lemma 1.
from the re-
To see that
an SSU such that
Without
tradicting
Taking
is obvious
schemes are in 4NF, suppose
E K.
for
in
property
requirement
can
of Theorem 1. We
the proof
the element
VELHS(D)-or
refers
to an
is of the form
PELHS(D).
add the condition
-or PELHS(D)
of
of re-
to the inner
violating
while-loop
its
Another
Suppose then that
for
way to enhance
algorithm
tion
order
that
the algorithm
that
of the order
order
without
correctness.
decomposition
sidered;
of the algorithm
the efficiency
of the
works correctly
in which
the sets
thus we are free
some j #i.
Recall
exists
independently
We will
to choose a convenient
lowing
that
that
containment
so that
the sets
is compatible
order.
are chosen from
with
Let XEK,
X'EK,
both X and X' are nontrivial
n; then X' is not
hand side
chosen before
X$X',
X as the left-
to be used in the decomposition.
same order
is used by Lien
[Lie82],
The
who calls
actly
containment
posing
the motivation
order
is that
if
schemes.
would be contained
using
in all
be used to decompose several
further.
The following
property
formally.
theorem
case X
to show that
So-called
conflict-free
ciently
this
ginning
that
Consider
the sets
while-loop.
Z is nontrivial
n and K at the beLet ZsREn
such
T En-{RI,
Z&T.
in R. For all
Proof.
By induction
Basis.
n ={U).
side
holds
for
K and IT, which are replaced by K' and n' =
P
(n-(R)) U U {X(YinR>)
during a single
execution
i=l
of the outer while-loop.
We will
show that the
claim
holds
that
for
K' and n'.
Z is nontrivial
Consider
tive
ticular,
Z&R,
1. This proves
En'
Z&T
which
any TEn-(R').
yields
Z&X(YiflR)
ex-
of
IDI
it
is
all
are
algorithms
(e.g.
Lien's
A conflict-free
set is
as a set that
a "key"
(i)
and (ii)
satisfies
one of
such as the or-
or the intersection
requirement
does
means a left-hand
(ii)
property.
is immaterial
from
As long as
(i)
are added to
is satisfied,
no sets
K, and the algorithm
terminates
time.
defines
This
order)
trivially
properly
containing
exist
sets
it
order
that
containing
If
(a refinement
still
cyclic
of dependsets.
cases are,
splittings
of course,
among members
the splitting
is not difficult
guarantees
This
in polynomial
a class
conflict-free
interesting
where there
algorithm.
of dependencies
of view of our algorithm.
position
the claim.
of dependencies
properties,
property
is acyclic,
In parfor
[Gal82].
time
condition
those
By the induc-
for
time
can be decomposed effi-
(where
equivalent
The really
such
that
of LHS(D) (and thus K).
the case R' En.
basis
the point
encies
in R'.
first
hypothesis,
Let ZsR'
linear
in polynomial
in [BFM81]
Clearly,
the claim
re-
of a dependency
sets
previous
keys
thogonality
on Inl.
Suppose that
can be
K does not grow excessively.
of a dependency),
several
Obvious.
step.
of the algorithm
[Lie81,Lie82]).
characterized
of the outer
Inductive
class
using
algorithm
not split
Theorem 2.
We also
the main operation
classes
possible
the largest
of them
establishes
step
works
For restricted
the sons of R and could
possibly
con-
in K can be
when IKI is bounded by a polynomial
decom-
in at most one of
In the opposite
This
of decomposition
and IUI.
the
X is used for
R, then X' is contained
the resulting
for
S is nontrivial
at most once.
can be done in almost
Thus our algorithm
Zc_X(Yj OR) for
each set
is the computation
which
it
Sup-
o
out efficiently:
quired
a "p-ordering".
Intuitively,
order
the claim.
each basic
carried
that
of S, ScZgX.
the containment
know that
in R; by the
any Ten-(R).
SEK such that
used in the decomposition
in some R E
R' =X(YinR)
By Theorem 1, there
Theorem 2 shows that
the fol-
for
to the claim)
a minimal
and proves
require
Z&T
But then ZsX.
tradicts
i.e.
Z is nontrivial
in R. By the minimality
in K are con-
as we please.
K in an order
Clearly
hypothesis,
pose (contrary
is to choose a decomposi-
keeps both n and K small.
some i.
inductive
R' gn,
relation
to find
the efficiency
holds
a decom-
of the containment
true
splittings
of the
for
dependency
in a
regular
restricted,
manner.
a class
would be technical,
tuitive
interpretation,
A definition
with
t-a
of such
no apparent
so we refrain
E
in-
from giving
A
one.
As mentioned
appears
difficult
bound for
simple
IKI.
to derive
It
that
section
can be combined
complicated
"real"
decomposition
example
clear
illustrated
using
why the inner
other
a group of
simultaneously
this
situation.
1 F, C-ABE
with
w
CF,
is replaced
Again,
fore
they
that
are also
within
the inner
ticular,
since
to further
resulting
does not split
either'C
into
two pieces,
side
that
fore
no sets
ever,
could
it
step,
EF breaks
partition
ABCE-AB-EF=C.
are added to K because
and C-ABE
during
step
does split
so that
trates
decompose
case the new set is immediately
in some scheme. The next
a situation
set only
through
example
where we reach
a chain
of trivial
illus-
it
added trivial
of V. By generalizing
they
involve
two,
it
is
these
ex-
many alternatives
is possible
to create
during
a de-
step.
before,
all
the growth
of K, either
ments into
the algorithm
order,
the sets
The problem
is
that
by adding
new require-
or by refining
left
for
are added
of restricting
further
the destudy.
7.
CONCLUDING REMARKS
a nontrivial
sets.
Let U=
BGH--Ac I E I F I I,
BI-uACGH
(see the fol-
lowing
that
the newly
the
may be many
non-
ABCEFGHI and D = {A-HBCEFGH I I,
1 E I F, CF-wBH 1 AEGI}
as regards
there
where K grows excessively
As stated
composition
trivial
so ABC can be used
show that,
to K are not useful.
ACE and BCE.
In this
In par-
the set ABC is added
CF,
of V, and also
of only
composition
EF. Therefore
use CE++A 1 B I F to further
ABCE into
of V
the same decompo-
C U (ABE nEF) =CE is added to K, and we can in the
next
in ABCGH; therein the role
1 H 1 E 1 F 1 I,
to consider
situations
of AB. How-
BI and
the depend-
loop of the algorithm.
in the role
instead
There-
trivial
AB splits
in the role
amples
does not have a right-hand
C becomes trivial
sition
or EF; although
sets
both
decompose ABCGH.
necessary
in the schemes ABCE and ABF. Now AB
A becomes
I E 1 F 1 I and
AB+CGH
loop of the algorithm,
sets
to decompose U,
any of the
also
A splits
considered
These examples
inner
AB is used first
since
in ABCGH and since
by
by (ABCGH, BEGH,
we add AB and AC to K. Computing
to K. Now ABC →→ G
Suppose that
but
AC →→ BEFGH | I are both
diagram.
begins
BGH does not split
sides,
ency bases we find
Here U=ABCEF
decomposition
(ll)
left-hand
trivial
the
| F, EF →→ A | BC}; U is
by the following
BGH, i.e.
BFGH, BGHI).
algorithm,
based on X. The following
illustrates
and D = {AB →→ CE
I
two
and expanded
in the decomposition
decompositions"
Suppose now that
why do we in a sense perform
"trivial
with
splittings.
may not be immediately
loop is required
i.e.
this
upper
are cases where it
a polynomial
We conclude
examples
to produce
there
before,
F
diagram).

We have given an algorithm that decomposes a universal relation scheme into a set of schemes that are in 4NF and have a lossless join. The algorithm works for any set of functional and multivalued dependencies. The key idea was to apply the concept of splittings among the attribute sets in order to avoid the computation of the entire closure of the dependency set. This is in contrast with most of the previous methods, which downright forbid splittings (e.g. [Lie82]). The algorithm works in polynomial time for classes of dependencies that properly contain conflict-free dependency sets. It is an open question whether improvements in the basic algorithm or in the decomposition order can make the algorithm run in polynomial time in the general case.

REFERENCES

ABU79 A.V. Aho, C. Beeri & J.D. Ullman, The theory of joins in relational databases. ACM Transactions on Database Systems 4,3 (Sept. 1979), 297-314.

AtP82 P. Atzeni & D.S. Parker, Assumptions in relational database theory. Proc. of the ACM Symposium on Principles of Database Systems, March 1982, 1-9.

BBG78 C. Beeri, P.A. Bernstein & N. Goodman, A sophisticate's introduction to database normalisation theory. Proc. of the Fourth International Conference on Very Large Data Bases, Sept. 1978, 113-124.

BeB79 C. Beeri & P.A. Bernstein, Computational problems related to the design of normal form relation schemes. ACM Transactions on Database Systems 4,1 (March 1979), 30-59.

Bee80 C. Beeri, On the membership problem for functional and multivalued dependencies in relational databases. ACM Transactions on Database Systems 5,3 (Sept. 1980), 241-259.

BFM81 C. Beeri, R. Fagin, D. Maier & M. Yannakakis, On the desirability of acyclic database schemes. Research Report RJ3131, IBM, San Jose, May 1981.

BeK83 C. Beeri & M. Kifer, Elimination of intersection anomalies from database schemes. Proc. of the Second ACM Symposium on Principles of Database Systems, March 1983, 340-351.

BDB79 J. Biskup, U. Dayal & P.A. Bernstein, Synthesizing independent database schemas. Proc. ACM SIGMOD 1979 Conference on Management of Data, 1979, 143-152.

Cod72 E.F. Codd, Further normalization of the data base relational model. Data Base Systems, R. Rustin (ed.), Prentice-Hall, Englewood Cliffs, N.J., 1972, 33-64.

Cod74 E.F. Codd, Recent investigations in relational database systems. Information Processing 74, Proc. of IFIP Congress 74, J.L. Rosenfeld (ed.), North-Holland Publ. Co., Amsterdam, 1974, 1017-1021.

daS80 A.C. da Silva, The decomposition of relations based on relational dependencies. Ph.D. thesis, Univ. of California, Los Angeles, 1980.

Dat81 C.J. Date, An Introduction to Database Systems. Addison-Wesley, Reading, Mass., 1981.

Fag77 R. Fagin, Multivalued dependencies and a new normal form for relational databases. ACM Transactions on Database Systems 2,3 (Sept. 1977), 534-544.

Gal82 Z. Galil, An almost linear-time algorithm for computing a dependency basis in a relational database. J. ACM 29,1 (Jan. 1982), 96-102.

HIT79 K. Hagihara, M. Ito, K. Taniguchi & T. Kasami, Decision problems for multivalued dependencies in relational databases. SIAM Journal on Computing 8,2 (May 1979), 247-264.

Lie81 Y.E. Lien, Hierarchical schemata for relational databases. ACM Transactions on Database Systems 6,1 (March 1981), 48-69.

Lie82 Y.E. Lien, On the equivalence of database models. J. ACM 29,2 (April 1982), 333-362.

Sci81 E. Sciore, Real-world MVD's. Proc. ACM SIGMOD 1981 Conference on Management of Data, April 1981, 121-132.

TsF80 D.-M. Tsou & P.C. Fischer, Decomposition of a relation scheme into Boyce-Codd normal form. Proc. ACM 1980 Annual Conference, 1980, 411-417. Reprinted in ACM SIGACT News 14,3 (1982), 23-29.

Ull82a J.D. Ullman, The U.R. strikes back. Proc. of the ACM Symposium on Principles of Database Systems, March 1982, 10-22.

Ull82b J.D. Ullman, Principles of Database Systems (Second Edition). Computer Science Press, Potomac, Md., 1982.