Chap 4. Design Theory for Relational Databases
Contents
Functional Dependencies
Rules About Functional Dependencies
Design of Relational Database Schemas
Decomposition: The Good, Bad, and Ugly
Third Normal Form
Note: Dependency between Attributes (Not in the text)
– a movie: title, year, length, genre, studioName, starName, . . .
– a movie star: name, address, birthdate, . . .
– a studio: name, address, president, . . .
There are some relationships between attributes, and they affect the design of tables.

title | year | length | genre
Star Wars | 1977 | 124 | sciFi
Wayne’s World | 1992 | 95 | comedy

A given value of (title, year) uniquely determines each value of length, genre, and studioName:

title | year | length | genre | studioName
Star Wars | 1977 | 124 | sciFi | Fox
Wayne’s World | 1992 | 95 | comedy | Paramount

A given value of (title, year) does not uniquely determine the value of starName:

title | year | length | genre | starName
Star Wars | 1977 | 124 | sciFi | Carrie Fisher
Star Wars | 1977 | 124 | sciFi | Mark Hamill
Star Wars | 1977 | 124 | sciFi | Harrison Ford
Wayne’s World | 1992 | 95 | comedy | Dana Carvey
Wayne’s World | 1992 | 95 | comedy | Mike Meyers
Functional Dependencies
Functional dependency (FD) on a relation R
– A1A2…An → B
  » “A1, A2, …, An functionally determines B”
  » If two tuples of R agree on attributes A1, A2, …, An, then they must also agree on another attribute B
  (ex) title year → genre
– A1A2…An → B1B2…Bm
  » If two tuples of R agree on attributes A1, A2, …, An, then they must also agree on another list of attributes B1, B2, …, Bm
  » shorthand for the set of FD’s A1A2…An → B1, A1A2…An → B2, . . ., A1A2…An → Bm
  (ex) title year → length genre is shorthand for title year → length and title year → genre
Functional Dependencies (cont’d)
< Effect of a functional dependency on two tuples >
If tuples t and u agree on the A’s, then they must agree on the B’s.
(ex) title year → length genre
  tuple t: (t1, y1, l1, g1)
  tuple u: (t1, y1, l2, g2)
  t and u agree on title and year, so l1 = l2 and g1 = g2 must hold.
For a given value in the LHS, there is a unique value in the RHS.
Relation R satisfies the FD f
  » a given FD f is true in every instance of relation R
  » an FD says something about all possible instances of relation R, not about a particular instance
Functional Dependencies (cont’d)
(Ex) Movies1 (title, year, length, genre, studioName, starName)

title | year | length | genre | studioName | starName
Star Wars | 1977 | 124 | SciFi | Fox | Carrie Fisher
Star Wars | 1977 | 124 | SciFi | Fox | Mark Hamill
Star Wars | 1977 | 124 | SciFi | Fox | Harrison Ford
Gone With the Wind | 1939 | 231 | Drama | MGM | Vivien Leigh
Wayne’s World | 1992 | 95 | Comedy | Paramount | Dana Carvey
Wayne’s World | 1992 | 95 | Comedy | Paramount | Mike Meyers

» title year → length
» title year → length genre studioName
» title year → starName (X)
– An FD between attributes depends on the semantics of a particular application.
– An FD can be regarded as a constraint.
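The FD definition can be tested mechanically against one instance. The Python sketch below (the helper name `satisfies` and the dict-per-tuple representation are assumptions for illustration) reports whether any two rows agree on the left side but differ on the right side. A single instance can only refute an FD; as stated above, an FD is a claim about all possible instances of R.

```python
# Movies1 instance from the slide, one dict per tuple (representation assumed).
COLUMNS = ("title", "year", "length", "genre", "studioName", "starName")

MOVIES1 = [dict(zip(COLUMNS, row)) for row in [
    ("Star Wars", 1977, 124, "SciFi", "Fox", "Carrie Fisher"),
    ("Star Wars", 1977, 124, "SciFi", "Fox", "Mark Hamill"),
    ("Star Wars", 1977, 124, "SciFi", "Fox", "Harrison Ford"),
    ("Gone With the Wind", 1939, 231, "Drama", "MGM", "Vivien Leigh"),
    ("Wayne's World", 1992, 95, "Comedy", "Paramount", "Dana Carvey"),
    ("Wayne's World", 1992, 95, "Comedy", "Paramount", "Mike Meyers"),
]]


def satisfies(rows, lhs, rhs):
    """True if every two rows that agree on lhs also agree on rhs."""
    rhs_for = {}  # maps each LHS value to the first RHS value seen with it
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[b] for b in rhs)
        if rhs_for.setdefault(left, right) != right:
            return False  # two tuples agree on lhs but differ on rhs
    return True


print(satisfies(MOVIES1, ["title", "year"], ["length", "genre", "studioName"]))  # True
print(satisfies(MOVIES1, ["title", "year"], ["starName"]))  # False
```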
Functional Dependencies (cont’d)
Keys of relations
– A set of attributes {A1, A2, …, An} is a key for a relation R if:
  (1) those attributes functionally determine all other attributes of R,
      i.e., it is impossible for two distinct tuples of R to agree on all of A1, A2, …, An;
  (2) no proper subset of {A1, A2, …, An} functionally determines all other attributes of R,
      i.e., a key must be minimal.
Functional Dependencies (cont’d)
(Ex) Consider the Movies1 relation.
Movies1 (title, year, length, genre, studioName, starName)
{title, year, starName} is a key
» It functionally determines all the other attributes
» No proper subset functionally determines all the other attributes
If there is more than one key,
» designate one of the keys as the primary key
Functional Dependencies (cont’d)
Superkey
» short for “superset of a key”
– a set of attributes that contains a key
every key is a superkey
some superkeys are not minimal, i.e., not keys
(ex) Consider Movies1
» {title, year, starName} is a key and is a superkey
» {title, year, starName, length} is a superkey, but not a key
Note: Why called a Functional Dependency?
What is “Functional” about Functional Dependency?
– A1A2…An → B
  left-hand side: determinant
  right-hand side: determinee
  » when a value for each attribute A1, A2, …, An is given, there is a unique value for B
(ex) We can imagine the following function in Movies1:
  title year → length
  » it takes a string like “Star Wars” and an integer like 1977 and produces the unique value of length, namely 124.
Rules about Functional Dependencies
Reasoning about Functional Dependencies
» When a relation R satisfies a set of FD’s, we can deduce that R must satisfy certain other FD’s.
(ex) If a relation R(A, B, C) satisfies A → B and B → C, we can deduce that R also satisfies A → C:
  take two tuples that agree on A: (a, b1, c1), (a, b2, c2)
  by A → B: b1 = b2, so the tuples are (a, b, c1), (a, b, c2)
  by B → C: c1 = c2
  Thus, two tuples agreeing on A also agree on C.
– The splitting/combining rule
– Trivial functional dependency
– Closure of attributes
– Transitive rule
– Armstrong’s axioms
Rules about FD (cont’d)
■ Two sets of FD’s S and T are equivalent
  » if the set of relation instances satisfying S is exactly the same as the set of relation instances satisfying T
■ A set of FD’s S follows from a set of FD’s T
  » if every relation instance that satisfies all the FD’s in T also satisfies all the FD’s in S
Two sets of FD’s S and T are equivalent iff S follows from T, and T follows from S.
To test whether one FD follows from one or more other FD’s, compute the closure.
Rules about FD (cont’d)
The splitting/combining rule
» Let A = A1A2…An for simplicity (A1, A2, …, An are attributes in some relation R)
– Splitting rule
  » We can replace a functional dependency A → B1B2…Bm by a set of FD’s A → Bi for i = 1, 2, …, m
  The splitting rule does not work for left sides (e.g., title year → length does not imply title → length).
– Combining rule
  » We can replace a set of FD’s A → Bi for i = 1, 2, …, m by the single functional dependency A → B1B2…Bm
(ex) title year → length genre is equivalent to {title year → length, title year → genre}
Rules about FD (cont’d)
Trivial functional dependencies
– An FD A1A2…An → B1B2…Bm is trivial if the right side is a subset of the left side,
  i.e., two tuples that agree in all of A1, A2, …, An also agree in a subset of them
  (ex) title year → title is a trivial FD (because {title} ⊆ {title, year})
– We can remove attributes in the RHS that are also in the LHS:
  » An FD A1A2…An → B1B2…Bm is equivalent to A1A2…An → C1C2…Ck, where the C’s are all those B’s that are not also A’s.
– An FD is (completely) nontrivial if none of the B’s is one of the A’s.
Note: Many FD’s can be deduced
(Ex) Consider relation R(A, B, C, D, E, F)
FD’s: {AB → C, BC → AD, D → E, CF → B}
Some FD’s that follow:
  AB → BC, AB → AC, AB → D (∵ AB → BC, BC → D), AB → E (∵ AB → D, D → E),
  BC → A, BC → D, BCD → A, BCD → AD, BCD → D,
  AD → E, BD → E, CD → E, ED → E, FD → E, ACD → E, ABD → E, AED → E,
  ABC → A, ABC → C, ABC → D, ABC → AD, ABCD → C, ABCE → C,
  ABD → C, ABE → C, ABF → C, . . .
Rules about FD (cont’d)
Closure of attributes
– Let {A1, A2, …, An} be a set of attributes and S a set of FD’s.
– The closure of {A1, A2, …, An} under the FD’s in S:
  » the set of attributes B such that every relation that satisfies all the FD’s in S also satisfies A1A2…An → B,
    i.e., the set of all attributes functionally determined by {A1, A2, …, An}
    (the set of all attributes B such that A1A2…An → B holds)
– {A1, A2, …, An}+ denotes the closure of {A1, A2, …, An}.
Rules about FD (cont’d)
(Ex) Consider Movies-R (title, year, length, studioName)
FD’s: {title year → length, title year → studioName}
Compute {title, year}+:
  {title, year}, {title, year, length}, {title, year, length, studioName}
{title, year}+ = {title, year, length, studioName}
Rules about FD (cont’d)
Algorithm: Closure of a set of attributes
S: a set of FD’s where all the FD’s have singleton right sides
1. Let X be a set of attributes that eventually will become the closure.
   Initially, X = {A1, A2, …, An}.
2. Search for some FD B1B2…Bm → C in S such that {B1, B2, …, Bm} ⊆ X but C is not in X.
   (Note that X → B1B2…Bm is a trivial FD.)
   » Add C to the set X.
3. Repeat Step 2 until no more attributes can be added to X.
– The set X, after no more attributes can be added to it, is the correct value of {A1, A2, …, An}+.
  Since X can only grow and there are finitely many attributes, eventually nothing more can be added to X and the algorithm ends.
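A minimal Python sketch of the closure algorithm above. The function name `closure` and the representation of an FD as a pair (left side, right side) of attribute collections are assumptions; right sides may hold several attributes, which by the splitting rule is the same as one FD per attribute.

```python
def closure(attrs, fds):
    """Return {attrs}+ under fds, a list of (lhs, rhs) pairs of attribute collections."""
    x = set(attrs)                       # step 1: X starts as {A1, ..., An}
    changed = True
    while changed:                       # step 3: repeat until X stops growing
        changed = False
        for lhs, rhs in fds:             # step 2: an FD whose left side is inside X
            if set(lhs) <= x and not set(rhs) <= x:
                x |= set(rhs)            # add its right side to X
                changed = True
    return x


# Single-letter attributes, so the string "AB" stands for {A, B}.
F = [("AB", "C"), ("BC", "AD"), ("D", "E"), ("CF", "B")]
print(sorted(closure("AB", F)))  # ['A', 'B', 'C', 'D', 'E']
```

Multi-letter attribute names work as lists, e.g. `closure(["title", "year"], [(["title", "year"], ["length"])])`.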
Rules about FD (cont’d)
(Ex) Consider relation R(A, B, C, D, E, F)
FD’s: {AB → C, BC → AD, D → E, CF → B}
{A, B}+: {A, B}, {A, B, C}, {A, B, C, D}, {A, B, C, D, E}
Thus, {A, B}+ = {A, B, C, D, E}
Rules about FD (cont’d)
Test whether A1A2…An → B follows from (is a consequence of) a set of FD’s S
– Compute {A1, A2, …, An}+ under the FD’s in S.
– If B is in {A1, A2, …, An}+, then A1A2…An → B follows from S.
(ex) FD’s: {AB → C, BC → AD, D → E, CF → B}
  » Test whether AB → D follows from the FD’s.
    {A, B}+ grows as {A, B, C}, {A, B, C, D}, {A, B, C, D, E}.
    Since {A, B}+ = {A, B, C, D, E}, AB → D does follow.
  » Test whether D → A follows.
    Since {D}+ = {D, E}, D → A does not follow.
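Under the same assumed FD representation, the test above reduces to a subset check on the closure. A sketch with a hypothetical helper `follows`:

```python
def closure(attrs, fds):
    """{attrs}+ under fds, a list of (lhs, rhs) pairs of attribute collections."""
    x = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= x and not set(rhs) <= x:
                x |= set(rhs)
                changed = True
    return x


def follows(lhs, rhs, fds):
    """True if lhs -> rhs follows from fds, i.e. every attribute of rhs is in lhs+."""
    return set(rhs) <= closure(lhs, fds)


F = [("AB", "C"), ("BC", "AD"), ("D", "E"), ("CF", "B")]
print(follows("AB", "D", F))  # True:  {A, B}+ = {A, B, C, D, E}
print(follows("D", "A", F))   # False: {D}+ = {D, E}
```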
Rules about FD (cont’d)
Test whether {A1, A2, …, An} is a key for relation R:
– {A1, A2, …, An}+ is all the attributes of R, and
– for every proper subset Y of {A1, A2, …, An}, Y+ is not all the attributes of R.
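The key test can be sketched the same way. The hypothetical `is_key` below only checks the proper subsets that drop a single attribute: closure is monotone, so if any smaller subset were a superkey, one of those would be too.

```python
def closure(attrs, fds):
    """{attrs}+ under fds, a list of (lhs, rhs) pairs of attribute collections."""
    x = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= x and not set(rhs) <= x:
                x |= set(rhs)
                changed = True
    return x


def is_superkey(attrs, schema, fds):
    return set(schema) <= closure(attrs, fds)


def is_key(attrs, schema, fds):
    """attrs+ is all of schema, and no proper subset of attrs has that property."""
    attrs = set(attrs)
    # Dropping one attribute at a time is enough because closure is monotone.
    return is_superkey(attrs, schema, fds) and not any(
        is_superkey(attrs - {a}, schema, fds) for a in attrs)


MOVIES1 = ["title", "year", "length", "genre", "studioName", "starName"]
FD = [(["title", "year"], ["length", "genre", "studioName"])]
print(is_key(["title", "year", "starName"], MOVIES1, FD))            # True
print(is_key(["title", "year", "starName", "length"], MOVIES1, FD))  # False: superkey only
```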
Rules about FD (cont’d)
Transitive rule
– If A1A2…An → B1B2…Bm and B1B2…Bm → C1C2…Ck hold,
  then A1A2…An → C1C2…Ck also holds.
  <Proof> Computing {A1, A2, …, An}+ adds all the B’s (by the first FD) and then all the C’s (by the second), so all the C’s are in {A1, A2, …, An}+.
(ex) Consider another version of Movies.
  Movies’(title, year, length, genre, studioName, studioAddr)
  Suppose we found the following FD’s:
    title year → studioName
    studioName → studioAddr
  Then, we can infer
    title year → studioAddr
Note: Armstrong’s axioms
Armstrong’s axioms
» a sound and complete set of inference rules about FD’s
– Reflexivity
  » If {B1, B2, …, Bm} ⊆ {A1, A2, …, An}, then A1A2…An → B1B2…Bm.
– Augmentation
  » If A1A2…An → B1B2…Bm, then for any set of attributes C1C2…Ck,
    A1A2…AnC1C2…Ck → B1B2…BmC1C2…Ck.
– Transitivity
  » If A1A2…An → B1B2…Bm and B1B2…Bm → C1C2…Ck, then A1A2…An → C1C2…Ck.
Note: Armstrong’s axioms (cont’d) (Not in the text)
(Ex) Show that the following inference rules follow from Armstrong’s axioms.
– If X → AB, then X → A
  » AB → A   /* by reflexivity */
  » X → AB   /* given */
  » thus, X → A   /* by transitivity */
– If X → A and X → B, then X → AB
  » X → XB   /* by augmenting X → B with X */
  » XB → AB   /* by augmenting X → A with B */
  » thus, X → AB   /* by transitivity */
Note that the closure computation can also prove these rules.
Rules about FD (cont’d)
Closing sets of FD’s
– basis for a set of FD’s F for a relation
  » any set of FD’s from which we can infer all the FD’s for the relation
– minimal basis B
  » all the FD’s in B have singleton right sides
  » no proper subset of B is a basis (no redundant FD)
  » if we remove an attribute from the left side X of any X → A in B, the result is no longer a basis (no redundant attribute in the left side)
(ex) Consider a relation R(A, B, C) and the following set of FD’s for R:
  {A → B, A → C, B → A, B → C, C → A, C → B, AB → C, AC → B, BC → A}
  minimal bases: {A → B, B → A, B → C, C → B}, {A → B, B → C, C → A}, etc.
Rules about FD (cont’d)
Projecting functional dependencies
» R1 = πL(R)
» S: a set of FD’s for R
What FD’s hold in R1?
– the projection of FD’s S onto relation R1: the FD’s that follow from S and involve only attributes of R1
– Find a basis for relation R1
Algorithm: Find-Projected-Basis
» Find all FD’s having singleton right sides that hold in R1:
– For each subset X of the attributes in R1,
  » find all X → A such that
    A is an attribute of R1, and
    A is in X+ under S
– The number of X’s is exponential in the number of attributes in R1.
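A brute-force Python sketch of Find-Projected-Basis, assuming single-letter attribute names so that a string such as "ACD" stands for {A, C, D}. `project_fds` is a hypothetical name; it also skips supersets of any X whose closure already covers all of R1, the shortcut used in (Ex1).

```python
from itertools import combinations


def closure(attrs, fds):
    """{attrs}+ under fds, a list of (lhs, rhs) pairs of attribute collections."""
    x = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= x and not set(rhs) <= x:
                x |= set(rhs)
                changed = True
    return x


def project_fds(sub_schema, fds):
    """All X -> A with A a single attribute of R1, A not in X, and A in X+ under fds."""
    attrs = sorted(set(sub_schema))
    result, covering = [], []
    for size in range(1, len(attrs) + 1):          # every nonempty subset X of R1
        for x in combinations(attrs, size):
            if any(c <= set(x) for c in covering):
                continue                           # superset of an X that covers R1
            x_plus = closure(x, fds)               # closure under S, the FDs of R
            result += [(frozenset(x), a) for a in attrs if a in x_plus and a not in x]
            if set(attrs) <= x_plus:
                covering.append(set(x))
    return result


S = [("A", "B"), ("B", "C"), ("C", "D")]
for lhs, a in project_fds("ACD", S):
    print("".join(sorted(lhs)), "->", a)
# A -> C
# A -> D
# C -> D
```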
Note: Minimal basis of Projected FD’s
Find a minimal basis for R1
Algorithm: Projecting a Set of Functional Dependencies
– Let T be the output of “Find-Projected-Basis” (a basis for R1).
– Construct a minimal basis by modifying T as follows:
  » Remove an FD F from T if F follows from the other FD’s in T.   /* remove redundant FD */
  » Replace Y → B by Z → B if Z → B follows from T and Z ⊂ Y.   /* remove redundant attributes in the left side */
  » Repeat the above steps until there are no more changes to T.
Rules about FD (cont’d)
(Ex1) Consider R(A, B, C, D) and FD’s: {A → B, B → C, C → D}
» R1(A, C, D) = πA, C, D(R)
– Find FD’s that hold in R1 (a basis for R1): consider the closure of each of the eight subsets of {A, C, D}
  {A}+ = {A, B, C, D}: A → C, A → D
  {C}+ = {C, D}: C → D
  {D}+ = {D}
  {A, C}+ = {A, D}+ = {A}+   (if the closure of X is all attributes, no new FD’s can be discovered by closing supersets of X)
  {C, D}+ = {C, D}
  » thus, the FD’s for R1 are {A → C, A → D, C → D}
– Find a minimal basis for R1
  {A → C, C → D}, because A → D follows from {A → C, C → D}
Note: Find a minimal basis (Not in the text, but refer to 3.5.2)
Find a minimal basis for relation R
» Let F be a set of FD’s for relation R, where all the FD’s in F have singleton right sides.
– Remove redundant FD’s: for each X → A in F,
  » if X → A is implied by the other FD’s, remove X → A from F
    (X → A is implied by the other FD’s if X+ under F − {X → A} contains A)
– Remove redundant attributes in the left side: for each X → A in F,
  » if B in X is redundant, remove B from X
    (B in X is redundant if (X − {B})+ under F contains A)
    (ex) F = {CB → A, C → B}: F implies C → A, so B is redundant in CB → A
– Repeat both steps until nothing changes (shrinking a left side can make another FD redundant); the resulting set is a minimal basis for R.
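A Python sketch of the two steps on this slide, repeated until nothing changes (the function `minimal_basis` and the FD representation are assumptions). With a single pass, F = {AB → C, C → B, A → B} would keep A → B: AB → C is shortened to A → C only after the redundancy check has already run, and A → B becomes redundant only then.

```python
def closure(attrs, fds):
    """{attrs}+ under fds, a list of (lhs, rhs) pairs of attribute collections."""
    x = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= x and not set(rhs) <= x:
                x |= set(rhs)
                changed = True
    return x


def minimal_basis(fds):
    """Minimal basis of fds, given as (lhs, rhs) pairs of attribute collections."""
    basis = []
    for lhs, rhs in fds:                              # splitting rule: singleton right sides
        for a in rhs:
            fd = (frozenset(lhs), frozenset([a]))
            if fd not in basis:
                basis.append(fd)
    changed = True
    while changed:                                    # repeat both steps until stable
        changed = False
        for fd in list(basis):                        # step 1: drop redundant FDs
            others = [g for g in basis if g != fd]
            if fd[1] <= closure(fd[0], others):
                basis, changed = others, True
        for i, (lhs, rhs) in enumerate(basis):        # step 2: drop redundant LHS attributes
            for b in sorted(lhs):
                if len(lhs) > 1 and rhs <= closure(lhs - {b}, basis):
                    lhs, changed = lhs - {b}, True
            basis[i] = (lhs, rhs)
        basis = list(dict.fromkeys(basis))            # a shortened FD may duplicate another
    return basis


for lhs, rhs in minimal_basis([("AB", "C"), ("C", "B"), ("A", "B")]):
    print("".join(sorted(lhs)), "->", "".join(rhs))
# A -> C
# C -> B
```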
Note: Find minimal basis (cont’d) (Not in the text)
We do not need to consider a superset Z of X such that Z → Y:
» if X → Y is in a minimal basis, Z → Y with Z a proper superset of X is not in the same minimal basis.
X → Y must be in a minimal basis
» if none of the attributes of X appears in the right side of any FD.
Relational Database Design
Anomaly
» problems such as redundancy that occur when we try to cram too many attributes into a single relation
(ex) Consider the relation
  Movies1 (title, year, length, genre, studioName, starName)
  key = {title, year, starName}
  title year → length genre studioName
– Redundancy
  » Information may be repeated unnecessarily in several tuples.
  (ex) For each movie, length and genre are repeated in several tuples.
Why does redundancy occur in Movies1?
- FD viewpoint: there is an FD whose determinant is a proper subset of a key (i.e., there is a partial dependency)
- ER-model viewpoint: there is a multi-valued attribute in an entity set
Relational Database Design (cont’d)
(ex) Consider the relation Movies1 below.
Movies1 (title, year, length, genre, studioName, starName)

title | year | length | genre | studioName | starName
Star Wars | 1977 | 124 | SciFi | Fox | Carrie Fisher
Star Wars | 1977 | 124 | SciFi | Fox | Mark Hamill
Star Wars | 1977 | 124 | SciFi | Fox | Harrison Ford
Gone With the Wind | 1939 | 231 | Drama | MGM | Vivien Leigh
Wayne’s World | 1992 | 95 | Comedy | Paramount | Dana Carvey
Wayne’s World | 1992 | 95 | Comedy | Paramount | Mike Meyers
Relational Database Design (cont’d)
– Update anomaly
  » We may change information in one tuple but leave the same information unchanged in another.
  (ex) We change the length of “Star Wars” in the first tuple, but not in the second or third tuples.
– Deletion anomaly
  » If a set of values satisfying a condition on some attribute(s) becomes empty (due to a deletion based on that condition), we may lose other information as a side effect.
  (ex) If we delete the tuple for “Vivien Leigh”, we lose all the other information about the movie “Gone With the Wind”.
Relational Database Design (cont’d)
Multi-valued attribute in an entity set Movies (Not in the text)
– the main cause of redundancy, update anomaly, and deletion anomaly
– Consider attribute starName in the Movies entity set (attributes title, year, length, genre, studioName, starName)
  » {title, year} would be a key for movie objects
  » starName has multiple values for each value of (title, year)
– Movies1 (key: {title, year, starName}), with title year → length genre studioName

title | year | length | genre | studioName | starName
Star Wars | 1977 | 124 | SciFi | Fox | Carrie Fisher
Star Wars | 1977 | 124 | SciFi | Fox | Mark Hamill
Star Wars | 1977 | 124 | SciFi | Fox | Harrison Ford

Note that there is an FD whose determinant is a proper subset of the key: Movies1 is not even in 2NF.
Relational Database Design (cont’d)
Decomposing Relations
» decompose relations to eliminate anomalies
– Given a relation R(A1, A2, …, An), we may decompose R into two relations S(B1, B2, …, Bm) and T(C1, C2, …, Ck) such that
  » {A1, A2, …, An} = {B1, B2, …, Bm} ∪ {C1, C2, …, Ck}
  » S = πB1, B2, …, Bm(R)
  » T = πC1, C2, …, Ck(R)
Relational Database Design (cont’d)
(Ex) Decompose Movies1 into Movies2 and Movies3

Movies2:
title | year | length | genre | studioName
Star Wars | 1977 | 124 | SciFi | Fox
Gone With the Wind | 1939 | 231 | Drama | MGM
Wayne’s World | 1992 | 95 | Comedy | Paramount

Movies3:
title | year | starName
Star Wars | 1977 | Carrie Fisher
Star Wars | 1977 | Mark Hamill
Star Wars | 1977 | Harrison Ford
Gone With the Wind | 1939 | Vivien Leigh
Wayne’s World | 1992 | Dana Carvey
Wayne’s World | 1992 | Mike Meyers

Movies3 is not redundant: (title, year) form a key for a movie, i.e., there is no more succinct way to represent a movie.
This decomposition eliminates the anomalies in Movies1:
- redundancy
- update anomaly
- deletion anomaly
Relational Database Design (cont’d)
Boyce-Codd Normal Form (BCNF)
– A relation R is in BCNF iff:
  whenever there is a nontrivial FD A1A2…An → B for R, {A1, A2, …, An} is a superkey for R,
  i.e., the left side of every nontrivial FD is a superkey (every determinant is a superkey).
(ex) Any two-attribute relation R(A, B) is in BCNF.
  » when there is no nontrivial FD: surely BCNF
  » when A → B holds but B → A does not: A is the only key. Thus, BCNF (symmetrically when only B → A holds)
  » when both A → B and B → A hold: both A and B are keys. Thus, BCNF
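A sketch of a BCNF check with a hypothetical `bcnf_violations`. It examines only the FD’s it is given; that suffices when they are FD’s of this very schema, because if none of them violates BCNF, no FD derived from them does. For a decomposed relation, its projected FD’s would have to be passed instead.

```python
def closure(attrs, fds):
    """{attrs}+ under fds, a list of (lhs, rhs) pairs of attribute collections."""
    x = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= x and not set(rhs) <= x:
                x |= set(rhs)
                changed = True
    return x


def bcnf_violations(schema, fds):
    """Nontrivial FDs in fds whose left side is not a superkey of schema."""
    schema = set(schema)
    return [(lhs, rhs) for lhs, rhs in fds
            if not set(rhs) <= set(lhs)              # nontrivial
            and not schema <= closure(lhs, fds)]     # left side is not a superkey


MOVIES1 = ["title", "year", "length", "genre", "studioName", "starName"]
MOVIES2 = ["title", "year", "length", "genre", "studioName"]
FD = [(["title", "year"], ["length", "genre", "studioName"])]
print(bcnf_violations(MOVIES1, FD))  # [(['title', 'year'], ['length', 'genre', 'studioName'])]
print(bcnf_violations(MOVIES2, FD))  # []
```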
Relational Database Design (cont’d)
(Ex) Consider relation Movies1 (the instance shown earlier).
  » Key: {title, year, starName}
  » FD: title year → length genre studioName
– Movies1 is not in BCNF: {title, year} is not a superkey.
X → A is a partial dependency if X is a proper subset of a key.
1NF if all attributes have atomic values
2NF if 1NF and no partial dependency for any nonkey attribute
Relational Database Design (cont’d)
(Ex) Consider relation Movies2.

title | year | length | genre | studioName
Star Wars | 1977 | 124 | SciFi | Fox
Gone With the Wind | 1939 | 231 | Drama | MGM
Wayne’s World | 1992 | 95 | Comedy | Paramount

» Key: {title, year}
  neither title nor year by itself functionally determines any of the other attributes.
» FD: title year → length genre studioName
– Movies2 is in BCNF
Relational Database Design (cont’d)
Decomposition into BCNF
– We can break any relation schema into a collection of subsets of its attributes with the following properties:
  » each relation is in BCNF, and
  » the original relation can be reconstructed from the decomposed relations.
Relational Database Design (cont’d)
Algorithm: BCNF decomposition
(These steps are applied recursively to relation R and its set of FD’s.)
1. Find a nontrivial FD X → A where X is not a superkey for R.
   Let the right side A be all the attributes that are functionally determined by X, i.e., A = X+ – X.
   Decompose R into R1 and R2 such that
     R1 = (X, A)    /* R1 = X+ */
     R2 = (R – A)   /* R2 = R – (X+ – X) */
2. Recursively decompose R1 and R2 until there is no such FD.
   /* Compute the FD’s (a basis) for R1 and R2 before applying this recursive step:
      S1 = the set of FD’s for R1
      S2 = the set of FD’s for R2
      This step takes O(2^n) time. */
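A compact Python sketch of the recursive algorithm above, with a hypothetical `bcnf_decompose`. Instead of building S1 and S2 explicitly, it relies on the fact that X → A holds in Ri exactly when A is in X+ computed under the original FD’s, so it closes every subset X of Ri under those FD’s; this is the same exponential work as projecting.

```python
from itertools import combinations


def closure(attrs, fds):
    """{attrs}+ under fds, a list of (lhs, rhs) pairs of attribute collections."""
    x = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= x and not set(rhs) <= x:
                x |= set(rhs)
                changed = True
    return x


def bcnf_decompose(schema, fds):
    """Recursively split schema on the first BCNF-violating X -> X+ - X found."""
    schema = set(schema)
    for size in range(1, len(schema)):
        for x in combinations(sorted(schema), size):
            x_plus = closure(x, fds) & schema        # X+ restricted to this relation
            if x_plus != set(x) and x_plus != schema:  # nontrivial, X not a superkey
                r1 = x_plus                          # R1 = X+
                r2 = schema - (x_plus - set(x))      # R2 = R - (X+ - X)
                return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)
    return [schema]


F = [(["title", "year"], ["studioName"]),
     (["studioName"], ["president"]),
     (["president"], ["presAddr"])]
for r in bcnf_decompose(["title", "year", "studioName", "president", "presAddr"], F):
    print(sorted(r))
# ['presAddr', 'president']
# ['president', 'studioName']
# ['studioName', 'title', 'year']
```

This sketch takes the first violation in alphabetical order of attribute names, so here it splits on president → presAddr before studioName → president; the leaves still coincide with R2, R3, and R4 of the worked example on the following slides, but other choices of violating FD’s can give other BCNF results.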
Relational Database Design (cont’d)
BCNF decomposition into two overlapping relation schemas
  R = [X | A | others], where X → A and X is not a superkey for R (assume that A = X+ – X)
  R1 = [X | A]
  R2 = [X | others]
Relational Database Design (cont’d)
(Ex) Movies1(title, year, length, genre, studioName, starName)
  » Key: {title, year, starName}
– FD: title year → length genre studioName
  » {title, year} is not a superkey. Decompose Movies1 into
    Movies2 (title, year, length, genre, studioName)
    Movies3 (title, year, starName)
In this example, the problem is due to the existence of a partial dependency (or the existence of a multi-valued attribute): Movies1 is not in 2NF.
Relational Database Design (cont’d)
(Ex) Consider relation R(title, year, studioName, president, presAddr)
  » FD’s: title year → studioName,
          studioName → president,
          president → presAddr
  » key: {title, year}; R is in 2NF, but there are BCNF violations
– Decompose R
  » Start with studioName → president presAddr (i.e., {studioName}+ – {studioName})
    R1 = {studioName, president, presAddr}, R2 = {title, year, studioName}
    FD’s for R1: {studioName → president, president → presAddr}, key: {studioName}
  » R1 is not in BCNF because of president → presAddr
    R3 = {president, presAddr}, R4 = {studioName, president}
  » Final result = R2, R3, R4
Note: Problem due to a transitive dependency (Not in the text)
Redundancy due to a transitive dependency: Key → X → A
  X is not a subset of any key, and A is not a member of any key.
  Not 3NF if there is a transitive dependency for any nonkey attribute
  3NF if 2NF and no transitive dependency for any nonkey attribute
(Ex) MoviesStudio (title, year, length, genre, studioName, studioAddr)
  » Key: {title, year}
  » title year → studioName,
    studioName → studioAddr
  2NF, but not 3NF

title | year | length | genre | studioName | studioAddr
Star Wars | 1977 | 124 | SciFi | Fox | Hollywood
Gone With the Wind | 1939 | 231 | Drama | MGM | Buena Vista
Wayne’s World | 1992 | 95 | Comedy | Paramount | Hollywood
Adams Family | 1991 | 102 | Comedy | Paramount | Hollywood
Note: Problem due to transitive dependency (cont’d) (Not in the text)
(Ex) Decompose MoviesStudio into MovieStudio1 and MovieStudio2

MovieStudio1:
title | year | length | genre | studioName
Star Wars | 1977 | 124 | SciFi | Fox
Gone With the Wind | 1939 | 231 | Drama | MGM
Wayne’s World | 1992 | 95 | Comedy | Paramount
Adams Family | 1991 | 102 | Comedy | Paramount

MovieStudio2:
studioName | studioAddr
Fox | Hollywood
MGM | Buena Vista
Paramount | Hollywood
Note: BCNF decomposition algorithm (Not in the text)
Complexity of the BCNF decomposition algorithm: O(2^n)
– R(A, B, C, D, E), F = {A → B, BC → D, E → B}, key: ACE
  » R1 = {A, B}, R’ = {A, C, D, E}
  » Is R’ in BCNF? No
    Check whether there is an FD whose determinant is not a (super)key of R’ (key of R’: ACE)
    {A, C}+ = {A, C, B, D}, so AC → D holds in R’, but AC is not a key in R’
    To find such FD’s, we need to compute the projected FD’s for R’: O(2^n)
The result of the BCNF algorithm may not be unique
  » there can be more than one sequence of choosing BCNF-violating functional dependencies
Note: Computing BCNF Decomposition (Not in the text)
(ex) R = {B, O, I, S, Q, D}
FD’s: {S → D, I → B, IS → Q, B → O}, key: {I, S}
BOISQD   (key: {I, S}; consider S → D)
  → SD
  → BOISQ   (key: {I, S}; consider I → BO, i.e., all the attributes determined by I: I+ – I)
      → IBO   (key: {I}; consider B → O)
          → BO
          → IB
      → ISQ
BCNF: {S, D}, {B, O}, {I, B}, {I, S, Q}
The set of leaf nodes is a BCNF decomposition.
If we are not sure that there is no BCNF-violating FD, we need to compute the projection of FD’s onto each decomposed relation.
Decomposition: The Good, Bad, and Ugly
Properties that decompositions need to have
1. Elimination of Anomalies
2. Recoverability of Information
   Can we recover the original relation from the tuples in its decomposition?
3. Preservation of Dependencies
   Are all the FD’s in the original relation preserved in the relations of the decomposition?
Various Normal Forms and the above properties
» there is no way to guarantee all three at once
– BCNF: gives 1 and 2, but not necessarily 3
– 3NF: gives 2 and 3, but not necessarily 1
Decomposition: The Good, Bad, and Ugly
Recovering information from a decomposition
– After decomposition, the original relation can only be recovered by joining the decomposed relations.
– A careless decomposition may fail to reconstruct the original relation.
(ex) Decompose R(A, B, C) into R1(A, B) and R2(B, C).
  R (A, B, C): (1, 2, 3), (4, 2, 5)
  R1 (A, B): (1, 2), (4, 2)
  R2 (B, C): (2, 3), (2, 5)
  R’ (A, B, C) = R1 ⋈ R2: (1, 2, 3), (1, 2, 5), (4, 2, 3), (4, 2, 5)
< loss of information >: R’ contains tuples that are not in R
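The lossy example can be replayed with plain Python sets. The helpers `project` and `natural_join` are illustrative, with each relation stored as a set of tuples plus a list of column names.

```python
def project(rows, cols, keep):
    """Projection onto the columns in keep; rows is a set of tuples named by cols."""
    idx = [cols.index(c) for c in keep]
    return {tuple(row[i] for i in idx) for row in rows}


def natural_join(r1, cols1, r2, cols2):
    """Natural join of two relations given as sets of tuples; returns (rows, cols)."""
    common = [c for c in cols1 if c in cols2]
    cols = cols1 + [c for c in cols2 if c not in cols1]
    rows = set()
    for t1 in r1:
        for t2 in r2:
            if all(t1[cols1.index(c)] == t2[cols2.index(c)] for c in common):
                merged = {**dict(zip(cols1, t1)), **dict(zip(cols2, t2))}
                rows.add(tuple(merged[c] for c in cols))
    return rows, cols


R = {(1, 2, 3), (4, 2, 5)}
R1 = project(R, ["A", "B", "C"], ["A", "B"])   # {(1, 2), (4, 2)}
R2 = project(R, ["A", "B", "C"], ["B", "C"])   # {(2, 3), (2, 5)}
joined, cols = natural_join(R1, ["A", "B"], R2, ["B", "C"])
print(cols, sorted(joined))
# ['A', 'B', 'C'] [(1, 2, 3), (1, 2, 5), (4, 2, 3), (4, 2, 5)]
```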
Decomposition: The Good, Bad, and Ugly (cont’d)
Lossless join decomposition
– Let R1, R2, …, Rn be a decomposition of relation R
  (e.g., R1(A, B) and R2(B, C) is a decomposition of R(A, B, C)):
  » the union of all the attributes in R1, …, Rn is equal to the attributes in R
  » each Ri has the projection of R onto the attributes of Ri
  For convenience, we use Ri to denote both the relation scheme and its instance.
– If the join of R1, R2, …, Rn is equal to R,
  » this decomposition is called a lossless join decomposition.
Equivalent phrasings: the decomposition is lossless; the decomposition is a lossless join decomposition; the decomposition has a lossless join.
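Note: Lossless Join Decomposition (Not in the text)
S (A, B, C), FD’s = {A → B, A → C}, decomposed into S1(A, B) and S2(B, C)
  S: (a1, b1, c1), (a2, b2, c2), (a3, b1, c3)
  S1: (a1, b1), (a2, b2), (a3, b1)
  S2: (b1, c1), (b2, c2), (b1, c3)
  S ⊂ S1 ⋈ S2 (the join also contains (a1, b1, c3) and (a3, b1, c1))
  Since B ↛ C, there can be more than one c-value for a single b-value.
R (A, B, C), FD’s = {B → C}, decomposed into R1(A, B) and R2(B, C)
  R: (a1, b1, c1), (a2, b2, c2), (a3, b1, c1)
  R1: (a1, b1), (a2, b2), (a3, b1)
  R2: (b1, c1), (b2, c2)
  R = R1 ⋈ R2
  Since B → C, there is only one c-value for each b-value.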
Decomposition: The Good, Bad, and Ugly (cont’d)
Does a decomposition of R have a lossless join?
  » F: a set of FD’s that hold in R
  » R is decomposed into k relations R1, R2, . . . , Rk
  » Lossless join iff R = R1 ⋈ R2 ⋈ . . . ⋈ Rk
– Any tuple t in R is also in R1 ⋈ R2 ⋈ . . . ⋈ Rk (we can easily see this)
  » each Ri has the projection of t onto the attributes of Ri
  » so none of these projections is dangling in R1 ⋈ R2 ⋈ . . . ⋈ Rk
  » Thus, t must be in R1 ⋈ R2 ⋈ . . . ⋈ Rk
  (If R1 and R2 have no common attributes, R1 ⋈ R2 = R1 ⨉ R2; if they have a common attribute, t is still clearly in R1 ⋈ R2. In either case there is no dangling tuple in R1 and R2.)
– For any tuple t in R1 ⋈ R2 ⋈ . . . ⋈ Rk, is t also in R? (we will prove it by the chase test)
  » the chase test for a lossless join: an organized way to see if this is true, by using the FD’s in F
Decomposition: The Good, Bad, and Ugly (cont’d)
The Chase test for lossless join
  » Is any tuple t in R1 ⋈ R2 ⋈ . . . ⋈ Rk also in R?
1) Draw a tableau (a decomposition into k relations gives a tableau with k tuples, one tuple for each decomposed relation)
  » Suppose tuple t = (a, b, . . . , k) is in R1 ⋈ R2 ⋈ . . . ⋈ Rk.
    Let R1 consist of the first two attributes of R:
    - R1 must have a tuple (a, b)
    - Then, we know there must be a tuple (a, b, *, . . . , *) in R
– Draw a tableau with k tuples t1, t2, . . . , tk as follows:
  » create tuple ti for relation Ri
    for attributes of Ri: values of t (the same letter as t)
    for other attributes: arbitrary variables (the same letter as t, with subscript i)
Note: Supplementary
If a tuple t is in R1 ⋈ R2 ⋈ . . . ⋈ Rk, there must be tuples in R, say t1, t2, . . . , tk, such that t is the join of the projections of each ti onto the attributes of Ri (ti and tj need not be different).
(Ex) Let R(A, B, C) be decomposed into R1(A, B) and R2(B, C).
Suppose t = (a, b, c) is in R1 ⋈ R2.
Then, relation R must have the following two tuples (not necessarily different):
      A | B | C
t1:   a | b | *
t2:   * | b | c
“*” denotes an unknown value.
Decomposition: The Good, Bad, and Ugly (cont’d)
(Ex) Consider R(A, B, C, D).
F = {A → B, B → C, CD → A}
R is decomposed into R1(A, D), R2(A, C), R3(B, C, D).
Suppose a tuple t = (a, b, c, d) is in R1 ⋈ R2 ⋈ R3.
Then we know that the following three tuples t1, t2, and t3 must exist in R.
Tableau for the decomposition of R into R1, R2, R3:
      A  | B  | C  | D
t1:   a  | b1 | c1 | d
t2:   a  | b2 | c  | d2
t3:   a3 | b  | c  | d
Subscripted variables denote unknown values.
We have to show that t is also in R.
Decomposition: The Good, Bad, and Ugly (cont’d)
2) Chase the tableau
Goal: prove that t is really in R, by using the FD’s in F
– Apply the FD’s in F to equate symbols in the tableau
  » for each X → Y:
    If two rows agree on X, equate their symbols in Y as follows:
    if one of them is unsubscripted, make the other be the same;
    if both have their own subscripts, change either one to be the other.
    (When two symbols are equated, every occurrence of the replaced symbol in that column is changed.)
– If one of the rows becomes the same as t, i.e., the row becomes all unsubscripted symbols,
  » we have proved that any tuple t in R1 ⋈ R2 ⋈ . . . ⋈ Rk is also in R.
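A Python sketch of the chase test as described on this slide, assuming single-letter attribute names. Each symbol is a pair (attribute, subscript), with subscript 0 for the unsubscripted letters of t; `chase` is a hypothetical name, and when both symbols are subscripted it keeps the smaller subscript (any choice works).

```python
def chase(schema, decomposition, fds):
    """True if the chase produces a row equal to t, i.e. the join is lossless.

    schema and each Ri are strings of single-letter attributes; a symbol is
    (attribute, subscript), where subscript 0 is the unsubscripted letter of t
    and subscript i + 1 is the variable of row i.
    """
    rows = [{a: (a, 0 if a in ri else i + 1) for a in schema}
            for i, ri in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r in rows:
                for s in rows:
                    if r is s or any(r[a] != s[a] for a in lhs):
                        continue                      # rows must agree on the left side
                    for b in rhs:
                        if r[b] != s[b]:
                            keep, drop = min(r[b], s[b]), max(r[b], s[b])  # 0 wins
                            for row in rows:          # rename every occurrence in column b
                                if row[b] == drop:
                                    row[b] = keep
                            changed = True
    return any(all(sub == 0 for _, sub in row.values()) for row in rows)


F = [("A", "B"), ("B", "C"), ("CD", "A")]
print(chase("ABCD", ["AD", "AC", "BCD"], F))             # True
print(chase("ABCD", ["AB", "BC", "CD"], [("B", "AD")]))  # False
```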
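Decomposition: The Good, Bad, and Ugly (cont’d)
(Ex) Example continued
R(A, B, C, D): R1(A, D), R2(A, C), R3(B, C, D)
F = {A → B, B → C, CD → A}
Initial tableau:
  A  | B  | C  | D
  a  | b1 | c1 | d
  a  | b2 | c  | d2
  a3 | b  | c  | d
After A → B (rows 1 and 2 agree on A, so b2 becomes b1):
  a  | b1 | c1 | d
  a  | b1 | c  | d2
  a3 | b  | c  | d
After B → C (rows 1 and 2 agree on B, so c1 becomes c):
  a  | b1 | c  | d
  a  | b1 | c  | d2
  a3 | b  | c  | d
After CD → A (rows 1 and 3 agree on C and D, so a3 becomes a):
  a  | b1 | c  | d
  a  | b1 | c  | d2
  a  | b  | c  | d
The third row is now (a, b, c, d): we have proved that tuple t3 must be t if R satisfies the FD’s.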
Decomposition: The Good, Bad, and Ugly (cont’d)
Why the chase works
The result of the chase test is either
  - a row with all unsubscripted variables, or
  - no row with all unsubscripted variables
– The chase results in a row with all unsubscripted variables: why is the join lossless?
  » The chase process itself is a proof that tuple t in the join must be a tuple in R.
  » We also know that every tuple in R is surely in R1 ⋈ R2 ⋈ . . . ⋈ Rk.
  Thus, the chase has proved that the result of projection and join is exactly R.
Decomposition: The Good, Bad, and Ugly (cont’d)
– There is no row with all unsubscripted variables after applying all FD’s (i.e., there is no other way to equate the remaining subscripted symbols): why is the join not lossless?
  » Think of the resulting tableau (i.e., the tableau after applying all FD’s) as an instance of the relation R.
    1) It satisfies all FD’s but does not have the tuple (a, b, . . ., k): this instance of R does not have tuple t.
    2) We know the i-th row has unsubscripted symbols in the attributes of Ri.
       Thus, when we project this relation onto the attributes of each Ri and take the natural join, we get the tuple with all unsubscripted symbols,
       i.e., tuple (a, b, . . ., k) is in R1 ⋈ R2 ⋈ . . . ⋈ Rk.
  This tuple is not in R, so the join is not lossless.
Decomposition: The Good, Bad, and Ugly (cont’d)
(Ex) Consider R(A, B, C, D).
F = {B → AD}
R is decomposed into R1(A, B), R2(B, C), R3(C, D).
Let a tuple (a, b, c, d) be in R1 ⋈ R2 ⋈ R3. Then, R must have these three tuples:
  A  | B  | C  | D
  a  | b  | c1 | d1
  a2 | b  | c  | d2
  a3 | b3 | c  | d
After B → AD (rows 1 and 2 agree on B, so a2 becomes a and d2 becomes d1):
  a  | b  | c1 | d1
  a  | b  | c  | d1
  a3 | b3 | c  | d
No row becomes (a, b, c, d). Consider the resulting tableau as an instance of R:
  - it satisfies all the FD’s in F, but does not have tuple (a, b, c, d)
  - every unsubscripted symbol in the initial tableau remains unchanged
  - thus, R1 ⋈ R2 ⋈ R3 (after projection of the tableau onto the attributes of R1, R2, and R3) has a tuple (a, b, c, d)
We just showed that R1 ⋈ R2 ⋈ R3 can have a tuple that is not in R: the decomposition is not lossless.
Note: Simple test for Lossless Join (Not in the text)
Testing a lossless join decomposition
– Let R1 and R2 be a decomposition of R, and let X = R1 ∩ R2 (the set of common attributes in R1 and R2).
  This decomposition has a lossless join iff X → R1 or X → R2,
  i.e., the set of common attributes is a (super)key for at least one relation.
(ex) R = {A, B, C}, FD’s: {A → B}
  » R1 = {A, B}, R2 = {A, C}: lossless
  » R1 = {A, B}, R2 = {B, C}: not lossless
  » R1 = {A, C}, R2 = {B, C}: not lossless
Tableau for R:
        R1 – X | X | R2 – X
  t1:   a      | b | c1
  t2:   a2     | b | c
  (if X → R2 – X, the chase turns t1 into (a, b, c); if X → R1 – X, it turns t2 into (a, b, c))
This test may not be easily applied if R is decomposed into more than two relations.
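The binary test above amounts to one closure computation. A sketch with a hypothetical `binary_lossless`, where X → R1 is checked as R1 ⊆ X+:

```python
def closure(attrs, fds):
    """{attrs}+ under fds, a list of (lhs, rhs) pairs of attribute collections."""
    x = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= x and not set(rhs) <= x:
                x |= set(rhs)
                changed = True
    return x


def binary_lossless(r1, r2, fds):
    """Lossless iff (R1 ∩ R2)+ contains all of R1 or all of R2."""
    common_plus = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_plus or set(r2) <= common_plus


F = [("A", "B")]
print(binary_lossless("AB", "AC", F))  # True
print(binary_lossless("AB", "BC", F))  # False
print(binary_lossless("AC", "BC", F))  # False
```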
Decomposition: The Good, Bad, and Ugly (cont’d)
Dependency preservation
  Lossless join: Is the join of the decomposed relations equal to the original relation?
– Dependency preservation: Does the join of the decomposed relations satisfy the original FD’s?
  » Dependencies may not be preserved in a BCNF decomposition.
(Ex) Bookings (theater, city, title)
  /* A movie is currently being shown at some theater in some city */
  » theater → city   /* a theater is located in one city; BCNF violation */
  » title city → theater   /* a movie is not shown in two theaters in the same city */
  » Keys: {title, city}, {theater, title}
Decomposition: The Good, Bad, and Ugly (cont’d)
– BCNF decomposition: {theater, city}, {theater, title}
  » Only “theater → city” is preserved: the dependencies are not fully preserved.
Relations satisfying theater → city:
R1:
theater | city
Guild | Menlo Park
Park | Menlo Park
R2:
theater | title
Guild | Antz
Park | Antz
Join:
theater | city | title
Guild | Menlo Park | Antz
Park | Menlo Park | Antz
“title city → theater” is violated in the join.
Even if we want to obey the FD “title city → theater” in our database, there is no way we can enforce it with the tables R1(theater, city) and R2(theater, title).
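The slides argue about the Bookings instance directly. As an alternative check, not taken from the slides, a commonly used polynomial-time test grows Z from the left side of each original FD using only what each decomposed relation can infer on its own attributes, Z := Z ∪ ((Z ∩ Ri)+ ∩ Ri); the FD is preserved iff its right side ends up in Z. A sketch with a hypothetical `is_preserved`:

```python
def closure(attrs, fds):
    """{attrs}+ under fds, a list of (lhs, rhs) pairs of attribute collections."""
    x = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= x and not set(rhs) <= x:
                x |= set(rhs)
                changed = True
    return x


def is_preserved(fd, decomposition, fds):
    """True if fd follows from the FDs that hold in the decomposed relations."""
    lhs, rhs = fd
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            gained = closure(z & set(ri), fds) & set(ri)  # what Ri alone can infer
            if not gained <= z:
                z |= gained
                changed = True
    return set(rhs) <= z


F = [(["theater"], ["city"]), (["title", "city"], ["theater"])]
D = [["theater", "city"], ["theater", "title"]]
for fd in F:
    print(fd, is_preserved(fd, D, F))
# (['theater'], ['city']) True
# (['title', 'city'], ['theater']) False
```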
Third Normal Form
– A relation R is in third normal form (3NF) if:
  whenever A1A2…An → B is a nontrivial FD, either {A1, A2, …, An} is a superkey, or B is prime
  » prime (attribute): an attribute that is a member of some key
  » this relaxes the BCNF requirement: X → A with X not a superkey is allowed when A is prime
We can always obtain a lossless join and dependency preserving 3NF decomposition.
Third Normal Form (cont’d)
Algorithm: 3NF Decomposition (lossless join and dependency preservation)
» F: a set of FD’s that hold for relation R.
» G: a minimal basis for F.
– For each X → A in G, create a relation scheme XA.
(Ex) R(A, B, C, D); G = {AB → C, C → B, A → D}
» may instead create a single relation scheme (X, A1, …, Am)
if X → A1, …, X → Am are in G, and there is no Ai → Aj where both Ai and Aj are in {A1, …, Am}
» If there are relations R1 and R2 such that R2 ⊆ R1, R2 may be removed
– If none of the relation schemes contains a key for R, add a relation
scheme consisting of a key for R. /* handles attributes not involved in any FD */
» After decomposition, there is a relation whose set of attributes is a superkey
Since there can be more than one minimal basis, the result may not be unique
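The synthesis steps above translate almost line by line into code. The following Python sketch is one possible rendering, assuming the input is already a minimal basis given as (left set, right set) pairs; the helper names are hypothetical, and it skips the optional step of merging FDs that share a left side. When a key must be added it simply takes the first key found, which reflects the non-uniqueness noted above.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return attrs+ under fds, where each FD is a (left set, right set) pair."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            if not any(k <= attrs for k in keys) and closure(attrs, fds) == schema:
                keys.append(attrs)
    return keys

def synthesize_3nf(schema, basis):
    """3NF synthesis from a minimal basis: one scheme per FD, drop subsets, add a key if needed."""
    relations = [frozenset(lhs | rhs) for lhs, rhs in basis]
    result = []
    for rel in relations:
        # skip duplicates and schemes that are proper subsets of another scheme
        if rel in result or any(rel < other for other in relations):
            continue
        result.append(rel)
    if not any(closure(rel, basis) == schema for rel in result):
        # no scheme contains a key: add one (the first key found)
        result.append(frozenset(candidate_keys(schema, basis)[0]))
    return result

schema = set("ABCDE")
basis = [(set("AB"), set("C")), (set("C"), set("B")), (set("A"), set("D"))]
print(["".join(sorted(rel)) for rel in synthesize_3nf(schema, basis)])  # ['ABC', 'AD', 'ABE']
```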
Third Normal Form (cont’d)
(Ex) R(A, B, C, D, E); F = {AB → C, C → B, A → D}
» keys: {A, B, E} and {A, C, E}
E does not appear in any FD
» 3NF violation: A → D (A is not a superkey, and D is not prime)
– Make sure F is a minimal basis (see next pages)
– Take the attributes of each FD as a relation schema
i.e., R1(A, B, C), R2(B, C) and R3(A, D)
– Remove a relation that is a subset of another relation
i.e., remove R2(B, C)
– Since none of the relation schemes contains a key of R, add a relation whose
schema is a key
i.e., add one of the keys, {A, B, E} or {A, C, E}
Result: R1(A, B, C), R3(A, D), R4(A, B, E).
Third Normal Form (cont’d)
(Ex. cont’d) Check if F is a minimal basis.
R(A, B, C, D, E)
F = {AB → C, C → B, A → D}
» need to verify two things:
1) Remove redundant FD’s.
For each FD, check whether the other FD’s imply it.
» AB → C
Compute {A, B}+ using F − {AB → C}
{A, B}+ = {A, B, D}, which does not contain C,
thus AB → C is not implied by F − {AB → C}
(If this closure contained C, AB → C would be implied by the other FD’s.)
» C → B: {C}+ using F − {C → B} = {C}
» A → D: {A}+ using F − {A → D} = {A}
No FD can be eliminated, i.e., no FD is implied by the other FD’s
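Step 1 is a closure computation per FD. The Python sketch below is our own illustration of that check (the name `redundant_fds` and the pair-of-sets encoding are assumptions). It only reports FDs that are individually implied by the others; since removing one FD can change whether another is redundant, a full minimization would remove them one at a time and re-check.

```python
def closure(attrs, fds):
    """Return attrs+ under fds, where each FD is a (left set, right set) pair."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def redundant_fds(fds):
    """Indices of FDs X -> Y implied by the other FDs, i.e. Y is inside X+ computed without X -> Y."""
    redundant = []
    for i, (lhs, rhs) in enumerate(fds):
        others = fds[:i] + fds[i + 1:]
        if rhs <= closure(lhs, others):
            redundant.append(i)
    return redundant

F = [(set("AB"), set("C")), (set("C"), set("B")), (set("A"), set("D"))]
print(redundant_fds(F))  # []: no FD can be eliminated
# A hypothetical extra FD AB -> D is redundant, because it follows from A -> D
print(redundant_fds(F + [(set("AB"), set("D"))]))  # [3]
```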
Third Normal Form (cont’d)
(Ex. cont’d)
R(A, B, C, D, E)
F = {AB → C, C → B, A → D}
2) Remove redundant attributes from the left side of the FD’s.
For each FD, check whether its left side contains a redundant attribute.
(If F implies B → C, then A in AB → C is redundant.)
» AB → C
Eliminate A, and check if B → C is implied by F:
{B}+ using F = {B}, so B → C does not follow from F
Eliminate B, and check if A → C is implied by F:
{A}+ using F = {A, D}, so A → C does not follow from F
No attribute can be eliminated from the left side of any FD
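Step 2 can be sketched the same way: drop one attribute from a multi-attribute left side and test whether the full F still yields the right side. This is a minimal Python illustration with hypothetical names; like the slide, it only tries to shrink left sides with two or more attributes.

```python
def closure(attrs, fds):
    """Return attrs+ under fds, where each FD is a (left set, right set) pair."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def redundant_lhs_attributes(fds):
    """(left side, attribute) pairs where dropping the attribute still implies the right side under fds."""
    found = []
    for lhs, rhs in fds:
        if len(lhs) < 2:
            continue  # only left sides with two or more attributes are checked
        for b in sorted(lhs):
            if rhs <= closure(lhs - {b}, fds):
                found.append(("".join(sorted(lhs)), b))
    return found

F = [(set("AB"), set("C")), (set("C"), set("B")), (set("A"), set("D"))]
print(redundant_lhs_attributes(F))  # []
# With a hypothetical extra FD B -> C, A becomes redundant in AB -> C
print(redundant_lhs_attributes(F + [(set("B"), set("C"))]))  # [('AB', 'A')]
```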
Third Normal Form (cont’d)
Why the 3NF algorithm works
– Lossless join
Start with a relation whose set of attributes K is a superkey.
» Compute K+, using a sequence of FD’s
since K is a superkey, K+ is all the attributes
» Consider the chase test, using the same sequence of FD’s
the subscripted symbols in the row for K are equated to unsubscripted
symbols, in the same order as attributes were added to the closure
» The row for K thus becomes all unsubscripted, so the chase test
concludes that the decomposition is lossless
(ex) Consider a relation R(A, B, C) with F = {A → C}. Key K = {A, B}
R1(A, C) and Rk(A, B) are created by the 3NF algorithm
» In computing K+: C is added by A → C.
» In the chase test: ck in the row for Rk is equated to c in the row for R1
Third Normal Form (cont’d)
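(Ex) R(A, B, C, D, E); F = {AB → C, C → B, A → D}
keys: {A, B, E} and {A, C, E}
» 3NF decomposition: R1(A, B, C), R2(A, D), R3(A, B, E)
– {A, B, E}+: {A, B, E, C}, {A, B, E, C, D}
Chase tableaux (R3 is the relation for the key K = {A, B, E}):

Initial tableau
      A   B    C    D    E
R1    a   b    c    d1   e1
R2    a   b2   c2   d    e2
R3    a   b    c3   d3   e

After AB → C (C is added to the closure): c3 is equated to c
      A   B    C    D    E
R1    a   b    c    d1   e1
R2    a   b2   c2   d    e2
R3    a   b    c    d3   e

After A → D (D is added to the closure): d3 is equated to d
      A   B    C    D    E
R1    a   b    c    d1   e1
R2    a   b2   c2   d    e2
R3    a   b    c    d    e

The row for R3 becomes (a, b, c, d, e), so the decomposition is lossless.
(Only the row for R3 is tracked here; in a full chase, A → D also equates d1 in the row for R1 to d.)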
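The chase itself is easy to automate. The Python sketch below is our own rendering of the chase test, not code from the text: symbols are encoded as integers per column (0 for the unsubscripted symbol, i + 1 for the subscript of row i), FDs are assumed to be (left set, right set) pairs, and equating symbols renames them throughout the column, preferring the unsubscripted one.

```python
def chase_is_lossless(schema, relations, fds):
    """Chase test: True if decomposing into `relations` gives a lossless join under fds."""
    attrs = sorted(schema)
    # one row per relation; 0 = unsubscripted symbol, i + 1 = subscripted symbol of row i
    rows = [{a: (0 if a in rel else i + 1) for a in attrs} for i, rel in enumerate(relations)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            groups = {}
            for row in rows:  # group rows that agree on the left side
                groups.setdefault(tuple(row[a] for a in sorted(lhs)), []).append(row)
            for group in groups.values():
                for a in sorted(rhs):
                    symbols = {row[a] for row in group}
                    if len(symbols) > 1:
                        target = min(symbols)  # the unsubscripted symbol (0) wins if present
                        for row in rows:  # rename the equated symbols everywhere in column a
                            if row[a] in symbols:
                                row[a] = target
                        changed = True
    # lossless iff some row ends up entirely unsubscripted
    return any(all(v == 0 for v in row.values()) for row in rows)

F = [(set("AB"), set("C")), (set("C"), set("B")), (set("A"), set("D"))]
print(chase_is_lossless(set("ABCDE"), [set("ABC"), set("AD"), set("ABE")], F))  # True
# R(A, B, C) with F = {A -> C}, decomposed into R1(A, C) and Rk(A, B)
print(chase_is_lossless(set("ABC"), [set("AC"), set("AB")], [(set("A"), set("C"))]))  # True
# A lossy split of the same relation into (A, B) and (B, C)
print(chase_is_lossless(set("ABC"), [set("AB"), set("BC")], [(set("A"), set("C"))]))  # False
```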
Third Normal Form (cont’d)
– Dependency preservation
» Each FD of the minimal basis has all its attributes in some relation of the
decomposition.
» Thus, each FD can be checked in the decomposed relations.
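Containment in a single relation is enough for the 3NF result, but it is not the only way an FD can be preserved. A standard test found in other database textbooks (not this slide) grows a set Z from the left side using closures restricted to one relation at a time; X → Y is preserved iff Y ends up inside Z. The Python sketch below assumes FDs are (left set, right set) pairs and uses hypothetical helper names. It confirms that the BCNF decomposition of Bookings loses title city → theater while the 3NF decomposition of R(A, B, C, D, E) loses nothing.

```python
def closure(attrs, fds):
    """Return attrs+ under fds, where each FD is a (left set, right set) pair."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def unpreserved_fds(relations, fds):
    """FDs that cannot be enforced through the projections onto `relations`."""
    lost = []
    for lhs, rhs in fds:
        z = set(lhs)
        changed = True
        while changed:
            changed = False
            for rel in relations:
                # what can be derived inside this one relation from the part of z it contains
                gained = closure(z & rel, fds) & rel
                if not gained <= z:
                    z |= gained
                    changed = True
        if not rhs <= z:
            lost.append((lhs, rhs))
    return lost

bookings = [({"theater"}, {"city"}), ({"title", "city"}, {"theater"})]
bcnf = [{"theater", "city"}, {"theater", "title"}]
print([(sorted(l), sorted(r)) for l, r in unpreserved_fds(bcnf, bookings)])
# [(['city', 'title'], ['theater'])]

F = [(set("AB"), set("C")), (set("C"), set("B")), (set("A"), set("D"))]
print(unpreserved_fds([set("ABC"), set("AD"), set("ABE")], F))  # []
```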
– Third normal form
/* all the relations of the decomposition are in 3NF */
» Suppose we add a relation whose schema is a key:
this relation is surely in 3NF, since all its attributes are prime
» Consider the relations built from the FD’s of a minimal basis:
these relations are intuitively in 3NF
(an informal statement; a rigorous proof is needed)
Note: 1NF, 2NF and 3NF
Not in the text
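Consider a functional dependency X → A
– 1NF: every attribute has atomic values
– 2NF: every nonprime attribute is fully dependent on a key
/* 2NF is of historical interest only */
nonprime attribute = nonkey attribute
» partial dependency (X is a proper subset of a key, X → A, A nonprime): not allowed
» transitive dependency (Key → X → A, X neither a superkey nor part of a key, A nonprime): allowed
– 3NF: every determinant is a superkey, or the right side is prime
» X → A is allowed if X is a superkey or A is part of some key
(so a transitive dependency Key → X → A with a nonprime A is no longer allowed)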
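A partial dependency can be detected by taking closures of proper subsets of each key and looking for nonprime attributes. This Python sketch is our own illustration with hypothetical helper names, assuming FDs are (left set, right set) pairs; it reports the nonprime attributes that are not fully dependent on a key, i.e., the 2NF violations.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return attrs+ under fds, where each FD is a (left set, right set) pair."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            if not any(k <= attrs for k in keys) and closure(attrs, fds) == schema:
                keys.append(attrs)
    return keys

def partially_dependent(schema, fds):
    """Nonprime attributes determined by a proper subset of some key (2NF violations)."""
    keys = candidate_keys(schema, fds)
    nonprime = schema - set().union(*keys)
    found = set()
    for key in keys:
        for size in range(1, len(key)):  # proper, nonempty subsets only
            for subset in combinations(sorted(key), size):
                found |= closure(set(subset), fds) & nonprime
    return found

F = [(set("AB"), set("C")), (set("C"), set("B")), (set("A"), set("D"))]
print(partially_dependent(set("ABCDE"), F))  # {'D'}: A -> D, with A part of the key {A, B, E}
bookings = [({"theater"}, {"city"}), ({"title", "city"}, {"theater"})]
print(partially_dependent({"theater", "city", "title"}, bookings))  # set(): every attribute is prime
```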
Not in the text
Note: 3NF and BCNF
A relation that is in 3NF but not in BCNF
– when a relation is not in BCNF, some redundancy can remain in the schema
(Ex) Consider Bookings (theater, city, title) again.
» FD’s: {theater → city, title city → theater}
» Keys: {title, city}, {theater, title}
» Bookings is in 3NF, but is not in BCNF:
theater → city violates BCNF (theater is not a superkey), but city is prime

  theater   city         title
  Guild     Menlo Park   Star Wars
  Guild     Menlo Park   Rocky

repetition of information: the fact that Guild is in Menlo Park is stored twice
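The gap between the two normal forms shows up when both tests run on the same FDs. The Python sketch below is our own illustration (hypothetical helper names; FDs as (left set, right set) pairs, assumed to be the given FDs of the relation itself rather than a projection). It returns "BCNF", "3NF", or "not 3NF".

```python
from itertools import combinations

def closure(attrs, fds):
    """Return attrs+ under fds, where each FD is a (left set, right set) pair."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            if not any(k <= attrs for k in keys) and closure(attrs, fds) == schema:
                keys.append(attrs)
    return keys

def normal_form(schema, fds):
    """Classify a relation as 'BCNF', '3NF', or 'not 3NF' from its FDs."""
    prime = set().union(*candidate_keys(schema, fds))
    result = "BCNF"
    for lhs, rhs in fds:
        nontrivial = rhs - lhs
        if not nontrivial or closure(lhs, fds) == schema:
            continue  # trivial FD, or the left side is a superkey
        if not nontrivial <= prime:
            return "not 3NF"
        result = "3NF"  # violates BCNF, but every right-side attribute is prime
    return result

bookings = [({"theater"}, {"city"}), ({"title", "city"}, {"theater"})]
print(normal_form({"theater", "city", "title"}, bookings))                # 3NF
print(normal_form({"theater", "city"}, [({"theater"}, {"city"})]))        # BCNF
F = [(set("AB"), set("C")), (set("C"), set("B")), (set("A"), set("D"))]
print(normal_form(set("ABCDE"), F))                                       # not 3NF
```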