Testing Satisfaction of Functional Dependencies

PETER HONEYMAN
Bell Laboratories, Murray Hill, New Jersey
Abstract. Determining whether a single relation satisfies a set of functional dependencies is a straightforward task. However, determining whether a set of relations satisfies a set of functional dependencies is a more difficult problem. Even the meaning of this notion of "satisfaction" needs to be settled. Several definitions for satisfaction are considered, one of which is determined to be most sound. This definition requires that one can construct a single relation that satisfies the dependencies while containing all of the information in the set of relations. A polynomial-time algorithm is then developed to test satisfaction using this definition.
Categories and Subject Descriptors: H.2.1 [Database Management]: Logical Design
General Terms: Algorithms, Theory
Additional Key Words and Phrases: Relational database, computational complexity, congruence closure
1. Introduction
Semantic constraints play an important role in the design and implementation of relational database systems. Important among these constraints is the functional dependency. Informally, a functional dependency is a statement to the effect that the values associated with one set of attributes uniquely determine the values associated with another attribute set.
When the database consists of a single relation, determining whether functional dependencies are satisfied is straightforward. If many relations are present, testing satisfaction presents certain problems, among these deciding what we mean by satisfaction. Even with a suitable definition in hand, an efficient algorithm for testing satisfaction remains to be found.
The rest of this paper is organized as follows. After defining the necessary terms, several definitions for satisfaction are considered. We argue that one of them is intuitively most sound, coinciding with the others in sensible places. A polynomial-time algorithm for testing satisfaction is then developed.
2. Definitions and Background
In what follows, individual attributes are represented by A, B, C, ..., while sets of attributes have names selected from ..., X, Y, Z. In representing a set of attributes, commas and set braces are elided, so that, for example, {A1, A2, ..., An} is written A1A2···An. The union of sets of attributes is also represented by concatenation; for example, X ∪ Y ∪ Z is written XYZ.
This work was supported in part by the National Science Foundation under Grant MCS 79-04528 at Princeton University.
Author's address: Bell Laboratories, Murray Hill, NJ 07974.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
© 1982 ACM 0004-5411/82/0700-0668 $00.75
Journal of the Association for Computing Machinery, Vol. 29, No. 3, July 1982, pp. 668-677.
The cardinality of a set S is denoted |S|. Where necessary to express complexity results, ‖S‖ denotes the length (in bits) of the representation of S. Thus ‖S‖ is (within a constant factor) the number of characters needed to write down S.
2.1 RELATIONS. The basis of the definitions that follow is the notion of an
attribute. An attribute is like a data type; it acts as a classification device for a set of
values. A relation scheme is a finite set of attributes. A relation scheme describes the
format of a particular relation; it can be thought of as the data structure for a relation,
or as a compound data type.
A database is a pair of finite sets (R, r), where r is a set of relations and R is a set of relation schemes describing r. R is called the database scheme. The structure of each relation in r is described by exactly one member of R, and vice versa, mutatis mutandis. Thus R and r are in 1-1 correspondence.
Associated with each attribute A is a domain of values, dom(A). If A1A2···An is the scheme for relation r, then r can be viewed as a finite subset of dom(A1) × dom(A2) × ··· × dom(An). Formally, a relation over scheme R is a finite set of tuples over scheme R, where a tuple over scheme A1A2···An is a set μ of mappings from A_i into dom(A_i), 1 ≤ i ≤ n. The individual mappings are denoted μ[A1], μ[A2], ..., μ[An], where μ[A_i]: A_i → dom(A_i). (In this way, the need to order the elements in a tuple is obviated.) For tuple μ over scheme X, the restriction of μ onto attributes Y ⊆ X is {μ[A] | A ∈ Y}, denoted μ[Y].
A relation can also be viewed as a table with columns labeled by attributes. The
rows of the table consist of the individual tuples of the relation. The order of the
columns is unimportant, so long as the correspondence between column entries and
column headings is maintained.
In this paper we use two operations on relations: projection and (natural) join. Let
r be a relation with scheme R, and let S ⊆ R. The projection of r onto S, denoted
r[S], is {μ[S] | μ ∈ r}. Intuitively, r[S] is the relation consisting of all tuples in r with
those columns removed that do not correspond to attributes in S. However, since
projection must yield a relation, and thus a set, we must remove any duplicate tuples
that might appear.
Let r and s be relations over schemes R and S, respectively. The (natural) join of relations r and s, denoted r ⋈ s, is the set of all tuples over relation scheme RS whose projections onto R and S are members of r and s, respectively. Since join is an associative and commutative operation, no ambiguity can arise from extending the domain to any finite set of relations. For database (R, r), ⋈r denotes the join of all r_i in r, that is, ⋈r is the join of all the relations in the database.
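The two operations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a tuple is modeled as a frozenset of (attribute, value) pairs, a relation as a set of such tuples, and the helper names `row`, `project`, and `join` as well as the sample data are hypothetical, not taken from the paper.

```python
def row(**values):
    """Build a tuple as an immutable set of (attribute, value) pairs."""
    return frozenset(values.items())


def project(relation, attrs):
    """r[S]: restrict each tuple to attrs; the set drops duplicate tuples."""
    attrs = set(attrs)
    return {frozenset((a, v) for a, v in t if a in attrs) for t in relation}


def join(r, s):
    """Natural join: merge each pair of tuples that agree on shared attributes."""
    result = set()
    for t in r:
        for u in s:
            merged = dict(t)
            # A pair is compatible when every attribute of u is either new
            # to t or carries the same value in t.
            if all(merged.get(a, v) == v for a, v in u):
                merged.update(u)
                result.add(frozenset(merged.items()))
    return result


# Hypothetical relations over schemes AB and BC.
r1 = {row(A=0, B=0), row(A=1, B=0)}
r2 = {row(B=0, C=5)}
joined = join(r1, r2)
```

Because relations are Python sets, duplicate removal in `project` comes for free, which mirrors the remark that projection must yield a set.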
Let (R, r) be a database, R = {R1, R2, ..., Rn}, r = {r1, r2, ..., rn}. The universe for R is U = R1 ∪ R2 ∪ ··· ∪ Rn. A relation I over the universe is called a universal instance if I[R_i] = r_i, 1 ≤ i ≤ n. Thus a universal instance for (R, r) is a relation that, when projected onto R, yields exactly the set r. A database might have many universal instances, or one, or none at all.
2.2 DEPENDENCIES. Certain semantic constraints may be known to hold among data, for example, no person should have two social security numbers. This type of relationship between attributes is called a functional dependency. In general, Y is functionally dependent on X (or functionally determined by X), written X → Y, if in any relation r with scheme R containing XY, whenever r contains tuples μ and ν for which μ[X] is equal to ν[X], then μ[Y] must be equal to ν[Y]. Here r satisfies X → Y. We extend this definition to a set of dependencies in the obvious way. In Section 3 we consider what it means for a set of relations to satisfy a set of dependencies.
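For a single relation, the definition translates directly into a check. The sketch below assumes a relation is given as a list of dicts keyed by attribute name; the function name `satisfies` and the sample data are hypothetical.

```python
def satisfies(relation, lhs, rhs):
    """Return True if the relation satisfies the dependency lhs -> rhs."""
    seen = {}  # maps each X-value to the Y-value first observed with it
    for t in relation:
        x_value = tuple(t[a] for a in lhs)
        y_value = tuple(t[a] for a in rhs)
        # setdefault records the first Y-value and returns it on later visits.
        if seen.setdefault(x_value, y_value) != y_value:
            return False
    return True


# Hypothetical data: "ann" appears with two different social security numbers.
people = [
    {"Person": "ann", "SSN": 1, "City": "x"},
    {"Person": "bob", "SSN": 2, "City": "x"},
    {"Person": "ann", "SSN": 3, "City": "y"},
]
ssn_determines_person = satisfies(people, ["SSN"], ["Person"])
person_determines_ssn = satisfies(people, ["Person"], ["SSN"])
```

The check makes one pass over the relation, which is why the single-relation case is described as straightforward.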
If XY is contained in R, then X → Y is embedded in R. Extending this definition, a set of dependencies F is embedded in a set of relation schemes R if each dependency in F is embedded in some scheme R_i in R.
For the most part, this paper deals solely with functional dependencies. Therefore,
unless otherwise specified, the terms "dependency" and "determine" are read
"functional dependency" and "functionally determine," respectively.
A set of dependencies F logically implies another dependency f if every relation that satisfies F also satisfies f. Given a set F of dependencies, there may be other dependencies logically implied by F. For example, A1A2 → A1 is implied by the empty set of dependencies, while X → Y and Y → Z imply X → Z. The set of all dependencies logically implied by F is called the closure of F, denoted F⁺.
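Membership in F⁺ can be tested without enumerating F⁺. The sketch below uses the standard attribute-closure technique, which is not described in this paper: X → Y is implied by F exactly when Y is contained in the set of attributes reachable from X. The function names are hypothetical, and attributes are assumed to be single characters or list elements.

```python
def attribute_closure(attrs, fds):
    """Return all attributes functionally determined by attrs under fds."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire a dependency once its whole left side has been derived.
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def implies(fds, lhs, rhs):
    """Return True if fds logically implies lhs -> rhs."""
    return set(rhs) <= attribute_closure(lhs, fds)


F = [("X", "Y"), ("Y", "Z")]
transitive = implies(F, "X", "Z")
trivial = implies([], ["A1", "A2"], ["A1"])
reverse = implies(F, "Z", "X")
```

The two positive cases correspond to the examples in the paragraph above: a trivial dependency implied by the empty set, and transitivity.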
If F and G are sets of dependencies with identical closures, then F covers G (and vice versa). A database scheme R preserves F if some cover of F is embedded in R. Equivalently, R preserves F if F is covered by the dependencies of F⁺ that are embedded in R.
Observe that the dependency X → A1A2···An covers the set {X → A1, X → A2, ..., X → An}. Thus there is no loss of generality in assuming that the right side of a dependency is a single attribute.
2.3 DECOMPOSITIONS. Representing data as a universal instance can lead to certain anomalous behaviors when updates occur [3, 4], so the universe is decomposed into a set of relation schemes. An important property of a decomposition is that information in a universal instance be reconstructible from the resulting relations without loss of information.
Given a database scheme R and a relation I over its universe, the mapping defined by projecting I onto the schemes of R and joining the resulting relations is called the project-join map for R, denoted m_R.
Let SAT(F) be the set of all relations over universe U that satisfy the dependencies in F. Then R is a lossless decomposition of U if each I ∈ SAT(F) is equal to m_R(I). Observe that if F is empty, every relation over U is in SAT(F) and no nontrivial decomposition is lossless [1]. However, a nonempty set of dependencies can imply losslessness. We now review the theory of lossless decompositions.
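The project-join map can be illustrated on a small case. The sketch below is an assumption-laden example, not the paper's notation: tuples are frozensets of (attribute, value) pairs, the helper names are hypothetical, and the two instances over ABC are made up to contrast an instance that violates B → C with one that obeys it.

```python
from functools import reduce


def row(**values):
    """Build a tuple as an immutable set of (attribute, value) pairs."""
    return frozenset(values.items())


def project(relation, attrs):
    attrs = set(attrs)
    return {frozenset((a, v) for a, v in t if a in attrs) for t in relation}


def join(r, s):
    result = set()
    for t in r:
        for u in s:
            merged = dict(t)
            if all(merged.get(a, v) == v for a, v in u):
                merged.update(u)
                result.add(frozenset(merged.items()))
    return result


def project_join(instance, schemes):
    """m_R(I): project I onto every scheme, then join the projections."""
    return reduce(join, (project(instance, s) for s in schemes))


schemes = ["AB", "BC"]
violates = {row(A=0, B=0, C=0), row(A=1, B=0, C=1)}  # B does not determine C
obeys = {row(A=0, B=0, C=0), row(A=1, B=0, C=0)}     # B -> C holds

spurious = project_join(violates, schemes) - violates
```

For the instance that obeys B → C the map returns the instance unchanged, while the other instance gains spurious tuples, which is the loss of information the definition rules out.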
Let R be a database scheme over universe U. A tableau for R is a table with one column for each attribute, and one row for each R_i in R. T_R is a tableau for R with entry t_ij in row i and column j as follows:
\[
t_{ij} =
\begin{cases}
a_j & \text{if } A_j \in R_i, \\
b_{ij} & \text{otherwise.}
\end{cases}
\]
T_R is called the tableau for the database scheme.
The b_ij are called nondistinguished variables, and a_j is called a distinguished variable. We apply the following transformations to T_R.
For each dependency X → Y in F there is a corresponding transformation rule: If T_R contains two rows with identical X values, then for all columns labeled by elements of Y in these two rows, set their entries to be equal according to the following rule. If one value is a distinguished variable, then replace the other entry by this one. Otherwise, arbitrarily replace one value by the other.
The transformation rules are applied to T_R until no more variables can be equated. The resulting tableau is called CHASE_F(T_R).¹
¹ The term CHASE, from [8], reflects the process of "chasing down" the implications of dependencies.
[FIGURE 1. A database with relations r1 (scheme AB) and r2 (scheme BC).]
[FIGURE 2. The join r1 ⋈ r2 of the relations of Figure 1.]
[FIG. 3. A database with F = {B → C}.]
[FIG. 4. A database with F = {A → B}.]
PROPOSITION 1 [1]. R is a lossless decomposition if and only if CHASE_F(T_R) contains a row of distinguished variables.
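Proposition 1 yields a direct test for losslessness. The following is a minimal sketch rather than an optimized implementation, under these assumptions: attributes are single characters, schemes are strings, the distinguished variable of column A is encoded as ("a", A), and the nondistinguished variable of row i and column A as ("b", i, A). The function name `is_lossless` is hypothetical.

```python
def is_lossless(universe, schemes, fds):
    """Chase the scheme tableau T_R and look for an all-distinguished row."""
    rows = [{A: ("a", A) if A in R else ("b", i, A) for A in universe}
            for i, R in enumerate(schemes)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for t in rows:
                for u in rows:
                    if any(t[A] != u[A] for A in lhs):
                        continue  # the rows disagree on X; rule not applicable
                    for B in rhs:
                        if t[B] == u[B]:
                            continue
                        # Keep the distinguished variable when one is present.
                        keep, drop = ((t[B], u[B]) if t[B][0] == "a"
                                      else (u[B], t[B]))
                        for w in rows:
                            if w[B] == drop:
                                w[B] = keep
                        changed = True
    return any(all(t[A][0] == "a" for A in universe) for t in rows)


with_dependency = is_lossless("ABC", ["AB", "BC"], [("B", "C")])
without_dependency = is_lossless("ABC", ["AB", "BC"], [])
```

Each replacement removes one symbol from the tableau, so the loop terminates. The second call reflects the earlier observation that with an empty F no nontrivial decomposition is lossless.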
3. Defining and Testing Satisfaction
In Section 2 we defined satisfaction in the context of a single relation and a set of dependencies. Extending the definition to a set of relations is problematical; to illustrate, consider the database of Figure 1 with F = {B → C}. Because embedded dependency B → C is violated by r2, any reasonable definition of satisfaction must exclude this case. This leads us to the following definition.
Definition of SAT1. (R, r) satisfies F if for each r_i, the dependencies of F⁺ embedded in R_i are satisfied.
With this definition, the database of Figure 1 will indeed be excluded. Consider, though, this database when F = {A → C}. By the definition of satisfaction just proposed, the database satisfies F. However, F is violated by r1 ⋈ r2 (see Figure 2); perhaps satisfaction should be defined to exclude this case as well.
Definition of SAT2. (R, r) satisfies F if ⋈r satisfies F.
The two definitions are incomparable; the database of Figure 1 with F = {A → C} satisfies SAT1 but not SAT2, while the database and dependency set of Figure 3 satisfy SAT2 but not SAT1. However, if dependencies are preserved, then SAT1 implies SAT2.²
Consider, now, the database and dependency set of Figure 4. Here both definitions
are satisfied. Yet, we insist that any definition of satisfaction that admits this case is
unsound; the value for B is not uniquely identified when the value for A is 0.
For Figures 3 and 4, SAT2 is satisfied because the join is empty. This points out the weakness of SAT2: in Figure 3, B → C is satisfied even though it is directly violated by r2. Furthermore, unless the decomposition is lossless, ⋈r may exhibit associations among the data that correspond to no real-world facts.
The next proposal is an attempt to deal properly with such aberrant cases. Any database with an empty join cannot be the projection of a universal instance, suggesting a third definition of satisfaction.
Definition of SAT3. (R, r) satisfies F if r is the projection onto R of a universal instance in SAT(F).
² As an aside, it is precisely this implication that shows the value of dependency preservation.
SAT3 implies SAT1 but not vice versa. SAT2 is implied by SAT3 when dependencies are preserved and not otherwise, in general. However, the requirement that the database have a universal instance is an artifact having little to do with dependency satisfaction. Nevertheless, a suitable definition depends on relations over the universe. By weakening the requirement that the database be the projection of a universal instance, we capture the essence of the candidates above.
Definition. A relation I over U is a containing instance for (R, r) if the projection of I onto R_i contains r_i, for each i.
Definition of Satisfaction. (R, r) satisfies F if there exists a containing instance
for (R, r) in SAT(F).
Henceforth, when we speak of satisfaction, we use this definition. In words, a set
of relations satisfies a set of dependencies if all the tuples in the database can coexist
in a single relation that satisfies the dependencies. Equivalently, a database satisfies
a set of dependencies if by adding tuples to the relations, it is possible to construct a
database that meets the criterion of SAT3.
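The definition can be checked directly once a candidate instance is in hand. The sketch below is illustrative only: the database, the instance, and the padding values "c1" and "a2" are hypothetical, relations are stored as sets of value tuples in scheme order, and single-character attribute names are assumed.

```python
def project(instance, attrs):
    """Project a list of dict tuples onto attrs, as a set of value tuples."""
    return {tuple(t[a] for a in attrs) for t in instance}


def satisfies(instance, lhs, rhs):
    """Single-relation test of the dependency lhs -> rhs."""
    seen = {}
    for t in instance:
        x_value = tuple(t[a] for a in lhs)
        y_value = tuple(t[a] for a in rhs)
        if seen.setdefault(x_value, y_value) != y_value:
            return False
    return True


def is_containing_instance(instance, database):
    """Each relation must be a subset of the matching projection."""
    return all(rel <= project(instance, scheme)
               for scheme, rel in database.items())


# Hypothetical database: the join of r1 and r2 is empty, so it has no
# universal instance, yet a containing instance exists.
database = {"AB": {(0, 1)}, "BC": {(2, 3)}}
F = [("B", "C")]
I = [{"A": 0, "B": 1, "C": "c1"}, {"A": "a2", "B": 2, "C": 3}]

witnesses_satisfaction = (is_containing_instance(I, database)
                          and all(satisfies(I, lhs, rhs) for lhs, rhs in F))
```

The instance pads each database tuple with fresh values, which is the idea behind adding tuples to the relations until the criterion of SAT3 is met.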
Observe that multivalued dependencies [6, 11] and join dependencies [9] are always
satisfied under this definition.
The relationship of SATISFACTION to SAT1, SAT2, and SAT3 is as follows.
(1) SATISFACTION always implies SAT1. The converse is not true, even if dependencies are preserved, as illustrated by Figure 4. This problem is considered in [10].
(2) SATISFACTION and SAT2 are incomparable, in general. If R is a lossless decomposition, then SATISFACTION implies SAT2; ⋈r must be a subset of any containing instance. If, in addition, the database has a universal instance, then the definitions are equivalent. From the observation relating SAT1 and SAT2, we see that if R is dependency preserving, then SATISFACTION implies SAT2.
(3) SATISFACTION is implied by SAT3. If the database has a universal instance, then the definitions are identical.
In each case, the definitions coincide at the sensible places; the definition that relies on embedded dependencies is always implied; when dependencies are independently preserved, in the sense of [10], the definitions are equivalent. The definition that relies on ⋈r is implied when the decomposition maintains the dependency information or when the join makes sense, that is, when dependencies are preserved or when the decomposition is lossless. Finally, the definition that relies on a satisfying universal instance is equivalent if a universal instance happens to exist.
Although complexity issues do not play a role in deciding what we mean when we say that a database satisfies a set of dependencies, each of the tentative definitions above has serious complexity problems. Since SAT1 relies on testing satisfaction of the embedded dependencies of F⁺, we need a way to construct this set; this can be done in polynomial time iff P = NP [2]. To test SAT2, we must construct ⋈r, which can have exponential size. Finally, testing SAT3 is probably intractable, since testing for the existence of a universal instance is NP-complete [7]. Fortunately, SATISFACTION does not suffer these problems.
3.1 AN ALGORITHM FOR TESTING SATISFACTION. According to the definition of satisfaction, a relation I over U must be found that yields a superset of the relations, when projected onto the database scheme. Without the requirement that this relation satisfies the dependencies, we can construct I as follows.
Let μ be a tuple in relation r_i. Let tuple ν in I have ν[R_i] = μ[R_i] and arbitrary values for ν[U − R_i]. If the arbitrary values are chosen so that they are all distinct, the resulting instance bears a striking resemblance to T_R, the tableau for R, except here each relation scheme is represented once for each tuple in the corresponding relation. This motivates the following definition.
Definition. Let (R, r) be a database, R = {R1, R2, ..., Rn}, r = {r1, r2, ..., rn}. Let k_i be the number of tuples in r_i, and let k = k_1 + k_2 + ··· + k_n. Number each tuple in r from 1 to k. A tableau for r is a relation T_r with k tuples over U. For each tuple μ_i in r there is a tuple ν_i in T_r for which
\[
\nu_i[A_j] =
\begin{cases}
\mu_i[A_j] & \text{if this mapping is defined,} \\
b_{ij} & \text{otherwise,}
\end{cases}
\]
where b_ij does not appear in any of the domains of U.
As in the definition of T_R, the b_ij are called nondistinguished variables and the other entries distinguished variables. We call T_r the tableau for the database, to distinguish it from T_R, the tableau for the database scheme.
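Constructing T_r is mechanical. In the sketch below, a database is assumed to be a list of (scheme, tuples) pairs with tuples as dicts, attributes are single characters, and the nondistinguished variable b_ij is encoded as the Python tuple ("b", i, j) so that it cannot collide with a domain value. The function name `build_tableau` and the sample data are hypothetical.

```python
def build_tableau(universe, database):
    """Return T_r as a list of rows, one per database tuple."""
    rows = []
    for scheme, tuples in database:
        for t in tuples:
            i = len(rows) + 1  # tuples are numbered 1..k across the database
            rows.append({A: t[A] if A in scheme else ("b", i, j)
                         for j, A in enumerate(universe, start=1)})
    return rows


# Hypothetical database over universe ABC.
database = [("AB", [{"A": 0, "B": 3}]),
            ("BC", [{"B": 3, "C": 7}, {"B": 4, "C": 8}])]
T_r = build_tableau("ABC", database)
```

Every database tuple contributes exactly one row, so T_r has k rows and its size is linear in the size of the database.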
Let 𝐈 be the set of all containing instances for (R, r). T_r can be viewed as a representation of 𝐈 in the following sense.
Let D be the union of the domains of U, and let I be a relation in 𝐈. Then there is an assignment g_I: {b_ij ∈ T_r} → D such that for each tuple μ in T_r, there is a tuple ν in I for which g_I applied to the nondistinguished variables in μ yields ν. Extending this notion of assignment from the nondistinguished variables to tuples, and then to T_r, say that g_I(T_r) ⊆ I. Each I in 𝐈 can be represented by such a mapping. Observe that the mapping is not unique; we let g_I stand for one fixed mapping.
Inferences can be made about the possible values of the nondistinguished variables
by applying the dependencies to Tr, as in [1], as follows.
Assume the given dependencies have a single attribute on the right side. For each X → A in F, if there are tuples μ and ν in T_r for which μ[X] = ν[X] and μ[A] is a nondistinguished variable, then replace μ[A] by ν[A]. The transformations are applied to T_r until no more transformations can be made. Observe that no transformation permits the equating of distinguished variables. The resulting table is called CHASE_F(T_r).
We claim that (R, r) satisfies F exactly when CHASE_F(T_r) does.
3.2 CORRECTNESS
THEOREM 1. CHASE_F(T_r) satisfies F if and only if (R, r) satisfies F.
PROOF. For the only-if part, if CHASE_F(T_r) satisfies F, then it is a containing instance in SAT(F).
For the if part, suppose (R, r) satisfies F. Then there is an instance I that demonstrates this fact and a mapping g_I for which g_I(T_r) ⊆ I. If we can show that g_I(CHASE_F(T_r)) ⊆ I, then the proof is complete. Let T_r = T^0, T^1, ..., T^n = CHASE_F(T_r) be the sequence T of tableaux produced along the way to producing CHASE_F(T_r). The proof is completed by showing that for each T^k in T, g_I(T^k) ⊆ I.
Basis. By definition, g_I(T^0) = g_I(T_r) ⊆ I.
Induction. Suppose that g_I(T^k) ⊆ I. Let the next transformation be the application of dependency X → A to tuples μ and ν in T^k. There are two cases to consider:
(1) μ[A] and ν[A] are both nondistinguished variables. Assume that the result of the transformation is to replace all occurrences of b_ij by b_i'j in T^k. Since I satisfies F, g_I must map both b_ij and b_i'j in T^k to the same value in I. Thus g_I(T^{k+1}) ⊆ I.
(2) μ[A] is an item a from dom(A) while ν[A] is b_ij. Since I satisfies F, g_I must map b_ij to a. Thus g_I(T^{k+1}) ⊆ I.
In either case, g_I(T^{k+1}) ⊆ I. Thus, for each T^k in the sequence T, g_I(T^k) ⊆ I, which was to be proved. □
3.3. AN EXAMPLE
Example 1. To see that equating nondistinguished variables is necessary, consider the database and dependencies of Figure 5. The tableau for the database is shown in Figure 6. No transformations can equate a nondistinguished variable with a distinguished one in T_r. Applying the rule for dependency A → C, we replace b33 by b13, and b43 by b23. Similarly, for B → C, we replace b63 by b13, and b73 by b23. (See Figure 7.) Now, the dependency CD → E results in the replacement of b65 by 0, and b75 by 0. Similarly, CD → F replaces b16 by 0 and b26 by 0, resulting in the tableau of Figure 8. At this point the first and second rows violate the dependency EF → A. Hence, no instance that satisfies F can yield a superset of relations r1-r4, that is, the database violates the dependencies. □
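Example 1 can be replayed with a naive quadratic chase. The sketch below is not the efficient algorithm analyzed in Section 3.4. It makes several assumptions: the relation contents are read off the distinguished entries of the tableau in Figure 6; b_ij is encoded as ("b", i, j); attributes are single characters; and the names `build_tableau`, `chase`, and `satisfies` are hypothetical. The order in which variables are replaced may differ from the narrative above, but the final verdict is the same.

```python
def build_tableau(universe, database):
    rows = []
    for scheme, tuples in database:
        for t in tuples:
            i = len(rows) + 1
            rows.append({A: t[A] if A in scheme else ("b", i, j)
                         for j, A in enumerate(universe, start=1)})
    return rows


def chase(rows, fds):
    """Apply the rules X -> A until no nondistinguished variable can change."""
    rows = [dict(r) for r in rows]
    changed = True
    while changed:
        changed = False
        for lhs, a in fds:
            for t in rows:
                for u in rows:
                    if (t[a] != u[a] and isinstance(t[a], tuple)
                            and all(t[x] == u[x] for x in lhs)):
                        old, new = t[a], u[a]
                        for w in rows:  # replace every occurrence of old
                            if w[a] == old:
                                w[a] = new
                        changed = True
    return rows


def satisfies(rows, fds):
    """Treat the tableau as a relation and test every dependency."""
    return all(t[a] == u[a] for lhs, a in fds for t in rows for u in rows
               if all(t[x] == u[x] for x in lhs))


database = [
    ("ADE", [{"A": 0, "D": 0, "E": 0}, {"A": 1, "D": 2, "E": 0}]),
    ("AB", [{"A": 0, "B": 3}, {"A": 1, "B": 4}]),
    ("CEF", [{"C": 0, "E": 0, "F": 0}]),
    ("BDF", [{"B": 3, "D": 0, "F": 0}, {"B": 4, "D": 2, "F": 0}]),
]
F = [("A", "C"), ("B", "C"), ("CD", "E"), ("CD", "F"), ("EF", "A")]

T_r = build_tableau("ABCDEF", database)
database_satisfies_F = satisfies(chase(T_r, F), F)
```

Only nondistinguished variables are ever replaced, so a conflict between two distinguished values survives the chase and is reported by `satisfies`, as with the first two rows and EF → A in the example.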
3.4. COMPLEXITY ANALYSIS. Computing CHASE_F(T_r) can be treated as a case of computing the congruence closure of a directed graph. To do so, the transformation rules are redefined to allow equating of distinguished variables. Now (R, r) satisfies F if distinguished variables are never equated in the computation of CHASE_F(T_r).
Let G be a directed graph, and let C be any equivalence relation on the vertices of G. Let the successors of each vertex be ordered. The congruence closure C* of C is the finest equivalence relation containing C such that any two vertices having corresponding successors equivalent under C* are themselves equivalent under C*.
Assume, without loss of generality, that the right side of each dependency in F consists of a single attribute. Assume as well that dom(A_i) ∩ dom(A_j) = ∅ whenever i ≠ j.³
Construct graph G = (V, E) as follows. Let t_ij be the entry in T_r in the row for μ_i and the column for A_j. Let V be the symbols of T_r plus one additional symbol c_ip for each scheme R_i and each dependency f_p: X_p → B_p, and one symbol d_p for each dependency f_p. We call the vertices derived from T_r T-vertices.
Let X_p be {A_p1, A_p2, ..., A_pn_p}. For each c_ip, E contains the edges
{(c_ip, t_iA_p1), (c_ip, t_iA_p2), ..., (c_ip, t_iA_pn_p), (c_ip, d_p)},
ordered in increasing order as shown.
The equivalence relation C is defined by c_ip ≡ t_iB_p for 1 ≤ i ≤ |R|, 1 ≤ p ≤ |F|.
The purpose of the d_p vertices is to ensure that variables that are equivalent in C* appear in the same column in T_r.
Example 2. Let (R, r) and F be the database and dependencies shown in Figure 9. T_r is shown in Figure 10. The graph for T_r is shown in Figure 11. The equivalence relation C is c11 ≡ 0_A, c21 ≡ b21, c31 ≡ b31, c12 ≡ b13, c22 ≡ 1_C, c32 ≡ 0_C. □
C* can be computed by iteratively coarsening C, with equivalence classes merged whenever the corresponding successors of two vertices are found to be equivalent. This produces a sequence of equivalence relations 𝐂 = {C = C^0, C^1, ..., C^n = C*}. The correspondence between C* and CHASE_F(T_r) is proved in the next three lemmas.
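The iterative coarsening can be sketched with a union-find structure. This is a naive fixpoint computation, not the O(m log m) algorithm of [5], and it rests on assumptions: only vertices with outgoing edges are listed in `successors`, vertex names are plain strings, and the tiny graph is a hypothetical instance of the construction above for one dependency f_1: A → B and three rows.

```python
def congruence_closure(successors, initial_pairs):
    """Return a find function describing C* for the given graph and C."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    def union(u, v):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
        return ru != rv

    for u, v in initial_pairs:
        union(u, v)
    changed = True
    while changed:
        changed = False
        first_with = {}  # successor signature -> first vertex seen with it
        for v, succ in successors.items():
            signature = tuple(find(s) for s in succ)
            if union(v, first_with.setdefault(signature, v)):
                changed = True
    return find


# Rows 1 and 2 agree on A (symbol "0_A"); row 3 has a different A-value.
successors = {
    "c11": ["0_A", "d1"],
    "c21": ["0_A", "d1"],
    "c31": ["1_A", "d1"],
}
C = [("c11", "b12"), ("c21", "5_B"), ("c31", "7_B")]
find = congruence_closure(successors, C)
```

Here the variable b12 lands in the same class as the value 5_B, mirroring a chase step for A → B on rows 1 and 2, while 7_B stays in its own class. A database violation would show up as two distinguished symbols sharing a class.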
³ This assumption ensures that all elements in T_r are initially distinct. We can arrange this by subscripting each member of dom(A_i) with i. The subscripts can later be removed.
FIG. 5. r1 over ADE, r2 over AB, r3 over CEF, r4 over BDF. F = {A → C, B → C, CD → E, CD → F, EF → A}.

r1:  A  D  E      r2:  A  B      r3:  C  E  F      r4:  B  D  F
     0  0  0           0  3           0  0  0           3  0  0
     1  2  0           1  4                             4  2  0

FIGURE 6

A    B    C    D    E    F
0    b12  b13  0    0    b16
1    b22  b23  2    0    b26
0    3    b33  b34  b35  b36
1    4    b43  b44  b45  b46
b51  b52  0    b54  0    0
b61  3    b63  0    b65  0
b71  4    b73  2    b75  0

FIGURE 7

A    B    C    D    E    F
0    b12  b13  0    0    b16
1    b22  b23  2    0    b26
0    3    b13  b34  b35  b36
1    4    b23  b44  b45  b46
b51  b52  0    b54  0    0
b61  3    b13  0    b65  0
b71  4    b23  2    b75  0

FIGURE 8

A    B    C    D    E    F
0    b12  b13  0    0    0
1    b22  b23  2    0    0
0    3    b13  b34  b35  b36
1    4    b23  b44  b45  b46
b51  b52  0    b54  0    0
b61  3    b13  0    0    0
b71  4    b23  2    0    0

[FIG. 9. The database of Example 2, with F = {B → A, B → C}.]
[FIGURE 10. The tableau T_r for Example 2.]
[FIGURE 11. The graph for T_r.]
LEMMA 1. Let E be an equivalence class of C*. Then all T-vertices in E appear in the same column of T_r.
PROOF. The proof is by induction on the sequence of equivalence relations 𝐂.
Basis. Let u and v be T-vertices in the same equivalence class of C^0 = C. By the construction of C, u and v must be equal in T_r.
Induction. Suppose any T-vertices equivalent in C^k appear in the same column of T_r. Let u and v be T-vertices made equivalent in C^{k+1}. Let the successors of u (and thus v) in C^k be equivalence classes e_1, e_2, ..., e_m. By the construction of C, e_m = {d_p}, the vertex corresponding to dependency f_p. Thus u and v are in the same column in T_r; u and v are in the column corresponding to the right-hand side of dependency f_p. By the induction hypothesis, all T-vertices equivalent to u in C^k come from the same column of T_r; the same holds for v. Therefore, merging the equivalence classes containing u and v preserves the induction hypothesis. □
LEMMA 2. T-vertices equivalent in C* are equated in CHASE_F(T_r).
PROOF. Suppose u and v are equivalent in C*. The proof that u and v are equated in CHASE_F(T_r) is by induction on the sequence of equivalence relations 𝐂.
Basis. u and v are equivalent in C^0 = C, so u and v are the same symbol.
Induction. Suppose that any variables equivalent in C^k are equated in CHASE_F(T_r) and that variables u and v are made equivalent in C^{k+1}. By Lemma 1, the T-vertices of each equivalence class of C^k appear in the same column of T_r. Since u and v are made equivalent in C^{k+1}, the T-vertices in the union of their equivalence classes also appear in the same column of T_r. Let the successors of u (and thus v) be equivalence classes e_1, e_2, ..., e_m, d_p. (Again, the last successor d_p is always a singleton set corresponding to dependency f_p.) Since equivalence classes are never refined, each e_i must contain a vertex from the row in T_r in which u appears and a vertex from the row in which v appears. By the induction hypothesis, these vertices are equated in CHASE_F(T_r). Thus the rows in CHASE_F(T_r) corresponding to u and v have identical values in the columns for the attributes on the left side of f_p. Consequently, the entries for u and v must be equal in CHASE_F(T_r). This completes the induction step and the proof. □
LEMMA 3. Entries equated in CHASE_F(T_r) are equivalent in C*.
PROOF. Suppose u and v are equated in CHASE_F(T_r). The proof that u and v are equivalent in C* is by induction on the sequence of tableaux T = {T_r = T^0, T^1, ..., T^n = CHASE_F(T_r)} described in the proof of Theorem 1.
Basis. u and v are equal in T^0 = T_r, so they are equivalent in C*.
Induction. Suppose any variables equal in T^k are equivalent in C*. Let u and v be equated in T^{k+1} because of a transformation using dependency A1A2···Am → B. Each A_i corresponds to a subscripted c vertex. The successors of u (and v) are equivalence classes containing these vertices. The entries in T^k for the A_i columns in the rows for u and v must be equal. By the induction hypothesis, the equivalence classes containing u and v have equivalent successors in C*. Thus u and v must be equivalent in C*. This completes the induction step and the proof. □
THEOREM 2. (R, r) satisfies F if and only if no equivalence class of C* contains more than one distinguished variable.
PROOF. Follows from Lemmas 2 and 3. □
The computation of CHASE_F(T_r) is formulated as an instance of congruence closure so that we may take advantage of the algorithms of [5]; the complexity result given below uses their algorithm with the most optimistic time complexity at the expense of some additional space usage.
PROPOSITION 2 [5]. C* can be computed in time O(m log m), where m = |E|.
THEOREM 3. Let k be the number of rows in T_r, and let q = ‖F‖. Dependency satisfaction can be determined in time O(kq log kq).
PROOF. To estimate the number of edges in G, observe that for each dependency f_p, k subscripted c vertices are constructed. Each such vertex has ‖f_p‖ edges associated with it. Thus the total number of edges is kq and the stated time bound follows. □
Since q will usually be insignificant in comparison to k, the complexity can be
considered to be O(k log k).
4. Summary
The definition of satisfaction proposed appears to be an intuitively sound one. In
particular, if F is preserved by R, satisfaction according to the definition given is
equivalent to asserting that each relation satisfies its embedded dependencies in the
same way. Furthermore, if R is a lossless decomposition, the definition implies that
⋈r ∈ SAT(F). Not incidentally, the definition allows the existence of a rapid test for
satisfaction.
ACKNOWLEDGMENTS. I thank J. D. Ullman, A. V. Aho, M. Yannakakis, and an
anonymous referee for their suggestions.
REFERENCES
1. AHO, A.V., BEERI, C., AND ULLMAN, J.D. The theory of joins in relational databases. Trans. Database Syst. 4, 3 (Sept. 1979), 297-314.
2. BEERI, C., AND HONEYMAN, P. Preserving functional dependencies. SIAM J. Comput. 10 (Aug. 1981), 647-656.
3. CODD, E.F. A relational model of data for large shared data banks. Commun. ACM 13, 6 (June 1970), 377-387.
4. CODD, E.F. Further normalization of the data base relational model. In Data Base Systems, R. Rustin, Ed., Prentice-Hall, Englewood Cliffs, N.J., 1972, pp. 33-64.
5. DOWNEY, P.J., SETHI, R., AND TARJAN, R.E. Variations on the common subexpression problem. J. ACM 27, 4 (Oct. 1980), 758-771.
6. FAGIN, R. Multivalued dependencies and a new normal form for relational databases. Trans. Database Syst. 2, 3 (Sept. 1977), 262-278.
7. HONEYMAN, P., LADNER, R.E., AND YANNAKAKIS, M. Testing the universal instance assumption. Inf. Proc. Lett. 10, 1 (1980), 14-19.
8. MAIER, D., MENDELZON, A.O., AND SAGIV, Y. Testing implications of data dependencies. Trans. Database Syst. 4, 4 (Dec. 1979), 455-469.
9. RISSANEN, J. Theory of relations for databases-A tutorial survey. Proc. 7th Symp. on Mathematical Foundations of Computer Science, Lecture Notes in Computer Science 64, G. Goos and J. Hartmanis, Eds., Springer-Verlag, Berlin, 1978, pp. 536-551.
10. SAGIV, Y. Can we use the universal instance assumption without null values? Proc. ACM-SIGMOD Int. Conf. on Management of Data, Ann Arbor, Mich., 1981, pp. 108-120.
11. ZANIOLO, C. Analysis and design of relational schemata for database systems. Tech. Rep. UCLA-ENG-7669, Dep. of Computer Science, Univ. of California, Los Angeles, Calif., 1976.
RECEIVED MAY 1980; REVISED MARCH 1981, ACCEPTED APRIL 1981