R - COW :: Ceng

Database Design
(Normalization)
A. Yazici, CEng 553
Spring 2008 1
What it’s all about
• Given a relation, R, and a set of
functional dependencies, F, on R.
• Assume that R is not in a desirable
form for enforcing F.
• Decompose relation R into relations,
R1,..., Rk, with associated functional
dependencies, F1,..., Fk, such that
R1,..., Rk are in a more desirable
form, 3NF or BCNF.
• While decomposing R, make sure to
preserve the dependencies, and
make sure not to lose information.
Primitive Domains
FLT-SCHEDULE
flt#    weekday    airline   dtime   from   atime   to
DL242   MO WE FR   DELTA     10:40   ATL    12:30   BOS
SK912   SA SU      SAS       12:00   CPH    15:30   JFK
AA242   MO FR      AA        08:00   CHI    10:10   ATL
Attributes must be defined over
domains with atomic values
FLT-SCHEDULE
flt#    weekday   airline   dtime   from   atime   to
DL242   MO        DELTA     10:40   ATL    12:30   BOS
DL242   WE        DELTA     10:40   ATL    12:30   BOS
DL242   FR        DELTA     10:40   ATL    12:30   BOS
SK912   SA        SAS       12:00   CPH    15:30   JFK
SK912   SU        SAS       12:00   CPH    15:30   JFK
AA242   MO        AA        08:00   CHI    10:10   ATL
AA242   FR        AA        08:00   CHI    10:10   ATL
Bad Database Design
- redundancy of fact
FLIGHTS
flt#    date       airline    plane#
DL242   10/23/00   Delta      k-yo-33297
DL242   10/24/00   Delta      t-up-73356
DL242   10/25/00   Delta      o-ge-98722
AA121   10/24/00   American   p-rw-84663
AA121   10/25/00   American   q-yg-98237
AA411   10/22/00   American   h-fe-65748
• redundancy: the airline name is repeated
for every instance of the same flight
• inconsistency: when the airline name for a
flight changes, it must be changed in many
places
Bad Database Design
- fact clutter
FLIGHTS
flt#    date       airline    plane#
DL242   10/23/00   Delta      k-yo-33297
DL242   10/24/00   Delta      t-up-73356
DL242   10/25/00   Delta      o-ge-98722
AA121   10/24/00   American   p-rw-84663
AA121   10/25/00   American   q-yg-98237
AA411   10/22/00   American   h-fe-65748
• insertion anomalies: how do we represent
that TK912 is flown by Turkish Airlines
when no date and no plane have been
assigned to it?
• deletion anomalies: cancelling AA411 on
10/22/00 makes us lose that it is flown by
American.
• update anomalies: if DL242 is flown by
Sabena, we must change it everywhere.
Bad Database Design
- information loss
FLIGHTS
flt#    date       airline    plane#
DL242   10/23/00   Delta      k-yo-33297
DL242   10/24/00   Delta      t-up-73356
DL242   10/25/00   Delta      o-ge-98722
AA121   10/24/00   American   p-rw-84663
AA121   10/25/00   American   q-yg-98237
AA411   10/22/00   American   h-fe-65748
FLIGHTS-AIRLINE
flt#    airline
DL242   Delta
AA121   American
AA411   American
DATE-AIRLINE-PLANE
date       airline    plane#
10/23/00   Delta      k-yo-33297
10/24/00   Delta      t-up-73356
10/25/00   Delta      o-ge-98722
10/24/00   American   p-rw-84663
10/25/00   American   q-yg-98237
10/22/00   American   h-fe-65748
Bad Database Design
- information loss
FLIGHTS-AIRLINE
flt#    airline
DL242   Delta
AA121   American
AA411   American
DATE-AIRLINE-PLANE
date       airline    plane#
10/23/00   Delta      k-yo-33297
10/24/00   Delta      t-up-73356
10/25/00   Delta      o-ge-98722
10/24/00   American   p-rw-84663
10/25/00   American   q-yg-98237
10/22/00   American   h-fe-65748
FLIGHTS (FLIGHTS-AIRLINE joined with DATE-AIRLINE-PLANE on airline)
flt#    date       airline    plane#
DL242   10/23/00   Delta      k-yo-33297
DL242   10/24/00   Delta      t-up-73356
DL242   10/25/00   Delta      o-ge-98722
AA121   10/24/00   American   p-rw-84663
AA121   10/25/00   American   q-yg-98237
AA121   10/22/00   American   h-fe-65748
AA411   10/24/00   American   p-rw-84663
AA411   10/25/00   American   q-yg-98237
AA411   10/22/00   American   h-fe-65748
• information loss: we polluted the database
with false facts; we can’t find the true facts.
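The information loss can be reproduced with a short sketch. The list-of-dicts representation and the helper names (as_set, project, natural_join) are assumptions made for illustration; only the FLIGHTS data comes from the slides.

```python
# Illustrative sketch: relations are lists of dicts; helper names are hypothetical.
FLIGHTS = [
    {"flt": "DL242", "date": "10/23/00", "airline": "Delta", "plane": "k-yo-33297"},
    {"flt": "DL242", "date": "10/24/00", "airline": "Delta", "plane": "t-up-73356"},
    {"flt": "DL242", "date": "10/25/00", "airline": "Delta", "plane": "o-ge-98722"},
    {"flt": "AA121", "date": "10/24/00", "airline": "American", "plane": "p-rw-84663"},
    {"flt": "AA121", "date": "10/25/00", "airline": "American", "plane": "q-yg-98237"},
    {"flt": "AA411", "date": "10/22/00", "airline": "American", "plane": "h-fe-65748"},
]

def as_set(rows):
    """Turn a list of dict tuples into a set so relations can be compared."""
    return {frozenset(r.items()) for r in rows}

def project(rows, attrs):
    """Projection onto attrs, with duplicates removed."""
    return [dict(t) for t in {frozenset((a, r[a]) for a in attrs) for r in rows}]

def natural_join(left, right):
    """Natural join: combine tuples that agree on all shared attributes."""
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[a] == r[a] for a in l.keys() & r.keys())
    ]

# Bad split: the only shared attribute is airline, which determines nothing.
bad = natural_join(project(FLIGHTS, ["flt", "airline"]),
                   project(FLIGHTS, ["date", "airline", "plane"]))
spurious = as_set(bad) - as_set(FLIGHTS)

# Alternative split: the shared attribute flt determines airline.
good = natural_join(project(FLIGHTS, ["flt", "airline"]),
                    project(FLIGHTS, ["flt", "date", "plane"]))

print(len(bad), len(spurious))          # size of the bad rejoin, number of false tuples
print(as_set(good) == as_set(FLIGHTS))  # whether the alternative split loses anything
```

The rejoin never drops tuples; the "loss" is that the false tuples can no longer be told apart from the true ones.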
Bad Database Design
- dependency loss
FLIGHTS-AIRLINE
flt#    airline
DL242   Delta
AA121   American
AA411   American
DATE-AIRLINE-PLANE
date       airline    plane#
10/23/00   Delta      k-yo-33297
10/24/00   Delta      t-up-73356
10/25/00   Delta      o-ge-98722
10/24/00   American   p-rw-84663
10/25/00   American   q-yg-98237
10/22/00   American   h-fe-65748
• dependency loss: we lost the fact that
(flt#, date) → plane#
A Lossy Decomposition
Good Database Design
FLIGHTS-AIRLINE
flt#    airline
DL242   Delta
AA121   American
AA411   American
FLIGHTS-DATE-PLANE
flt#    date       plane#
DL242   10/23/00   k-yo-33297
DL242   10/24/00   t-up-73356
DL242   10/25/00   o-ge-98722
AA121   10/24/00   p-rw-84663
AA121   10/25/00   q-yg-98237
AA411   10/22/00   h-fe-65748
• no redundancy of FACT (!)
• no inconsistency
• no insertion, deletion or
update anomalies
• no information loss
• no dependency loss
Informal Design Guidelines
for Relation Schemas
• Informal measures of quality
for relation schema design.
1. Semantics of the attributes: it
should be easy to explain the
meaning of the schema. If a
schema corresponds to one entity
type or one relationship type, its
meaning tends to be clear.
2. Reducing the redundant values in
tuples: no anomalies.
3. Reducing the null values in tuples:
nulls in exceptional cases only.
4. Disallowing the possibility of
generating spurious tuples.
Functional Dependencies
and Keys
Let X and Y be sets of attributes in R
• Y is functionally dependent on X
in R iff for each x ∈ R.X there is
precisely one y∈ R.Y
• Y is fully functionally dependent
on X in R if Y is functionally
dependent on X and Y is not
functionally dependent on any
proper subset of X
• We use keys to enforce functional
dependencies in relations:
X→Y
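As a minimal sketch of the definition, the check below tests whether each X-value in a relation instance is paired with exactly one Y-value. The function name fd_holds and the list-of-dicts representation are assumptions for illustration. An instance can only refute a dependency; whether an FD holds on the schema is a statement about all legal instances.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in this instance (rows is a list of dicts)."""
    seen = {}  # maps each X-value to the single Y-value seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # same X-value, two different Y-values
    return True

# Sample rows taken from the FLIGHTS table of the earlier slides.
FLIGHTS = [
    {"flt": "DL242", "date": "10/23/00", "airline": "Delta", "plane": "k-yo-33297"},
    {"flt": "DL242", "date": "10/24/00", "airline": "Delta", "plane": "t-up-73356"},
    {"flt": "AA121", "date": "10/24/00", "airline": "American", "plane": "p-rw-84663"},
    {"flt": "AA411", "date": "10/22/00", "airline": "American", "plane": "h-fe-65748"},
]

print(fd_holds(FLIGHTS, ["flt"], ["airline"]))        # flt# -> airline
print(fd_holds(FLIGHTS, ["airline"], ["flt"]))        # airline -> flt#
print(fd_holds(FLIGHTS, ["flt", "date"], ["plane"]))  # (flt#, date) -> plane#
```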
Functional Dependencies
and Keys
[Figure: the real world (name, address) versus the database (cust#, name, address). Consider the meaning of the relation when the attributes are kept separate versus combined.]
Functional Dependencies in
the ER-Diagram
[Figure: ER diagram with entities Airport, Flt Schedule, Flt Instance, Airplane, and Customer, and relationships From, To, Instance Of, Assigned, and Reservation]
Keys:
AIRPORT ↔ Airportcode
FLT-SCHEDULE ↔ Flt#
FLT-INSTANCE ↔ (Flt#, Date)
AIRPLANE ↔ Plane#
CUSTOMER ↔ Cust#
RESERVATION ↔ (Cust#, Flt#, Date)
RESERVATION ↔ Ticket#
Functional dependencies:
Airportcode → Name, City, State
Flt# → Airline, Dtime, Atime, Miles, Price, (from) Airportcode, (to) Airportcode
(Flt#, Date) → Flt#, Date, Plane#
(Cust#, Flt#, Date) → Cust#, Flt#, Date, Ticket#, Seat#, CheckInStatus
Ticket# → Cust#, Flt#, Date
Cust# → CustomerName, CustomerAddress, Phone#
Relation schemas:
AIRPORT (airportcode, name, city, state)
FLT-SCHEDULE (flt#, airline, dtime, from-airportcode, atime, to-airportcode, miles, price)
FLT-WEEKDAY (flt#, weekday)
FLT-INSTANCE (flt#, date, plane#, #avail-seats)
AIRPLANE (plane#, plane-type, total-#seats)
CUSTOMER (cust#, first, middle, last, phone#, street, city, state, zip)
RESERVATION (flt#, date, cust#, seat#, check-in-status, ticket#)
How to Compute Meaning
- Armstrong’s inference rules
Rules of the computation:
– Reflexivity: if Y⊆ X, then X→Y
– Augmentation: if X→Y, then WX→WY
– Transitivity: if X→Y and Y→Z, then X→Z
Derived rules:
– Union: if X→Y and X→Z, then X→YZ
– Decomposition: if X→YZ, then X→Y and
X→Z
– Pseudotransitivity: if X→Y and WY→Z,
then WX → Z
Armstrong’s Axioms:
– sound (generate only functional
dependencies that actually hold) and
– complete (generate all functional
dependencies that hold).
How to Compute Meaning
- Armstrong’s inference rules
• Proof of reflexivity: if Y⊆ X, then
X→Y
Suppose Y⊆ X and two tuples t1 and
t2 exist in some relation instance r of
R such that t1[X] = t2[X]. Then t1[Y] =
t2[Y] because Y⊆X; hence, X→Y
must hold in r.
How to Compute Meaning
- Armstrong’s inference rules
• Proof of Augmentation:
{X→Y} ⇒ WX→WY
Assume X→Y holds in a relation
instance r of R but that WX→WY
does not hold. Then there must exist
two tuples t1 and t2 in r such that
(1) t1[X] = t2[X],
(2) t1[Y] = t2[Y],
(3) t1[WX] = t2[WX], and
(4) t1[WY] ≠ t2[WY].
This is not possible because from (1)
and (3) we deduce (5) t1[W] = t2[W],
and from (2) and (5) we deduce (6)
t1[WY] = t2[WY]. Contradicting (4).
How to Compute Meaning
- Armstrong’s inference rules
• Proof of transitive rule:
{X→Y, Y→Z} ⇒ X→Z.
Assume (1) X→Y and (2) Y→Z both
hold in a relation instance r of R.
Then for any two tuples t1 and t2 in r
such that t1[X] = t2[X], we must have
(3) t1[Y] = t2[Y] from assumption (1);
hence we must also have (4) t1[Z] =
t2[Z], from (3) and assumption (2);
hence X→Z must hold in r.
• Proof of decomposition (or
projection) rule: {X→YZ} ⇒ X→Y
and X→Z
1. X→YZ (given)
2. YZ→Y (using the reflexivity rule and Y⊆YZ)
3. X→Y (using transitivity on 1 and 2)
X→Z follows in the same way from YZ→Z.
How to Compute Meaning
- Armstrong’s inference rules
• Proof of pseudotransitive rule:
{X→Y, WY→Z} ⇒ WX → Z
1. X→Y (given)
2. WY→Z (given)
3. WX→WY (using 1 and
augmenting with W)
4. WX→Z (transitivity on 3 and 2)
How to Compute Meaning
- Armstrong’s inference rules
• Proof of union rule: if X→Y and
X→Z, then X→YZ
1. X→Y (given)
2. X→Z (given)
3. X→XY (using 1 and augmentation
with X; notice that XX=X)
4. YX→YZ (using 2 and
augmentation with Y)
5. X →YZ (using 3 and 4 and the
transitivity rule)
Finding a Key for a Relation
Alg: Finding a Key K for R, given a set of
FDs
1. Set K = R.
2. For each attribute A in K
{Compute (K-A)+ wrt F;
If (K-A)+ contains all the attributes in R,
then set K = K – {A}};
Example: R = {Ssn, Pnumber, Ename,
Pname, Plocation, Hours}
F = {Ssn → Ename, Pnumber → {Pname,
Plocation}, {Ssn,Pnumber} → Hours}
The key is {Ssn,Pnumber}, since
{Ssn,Pnumber}+ = {Ssn, Pnumber, Ename,
Pname, Plocation, Hours}
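The algorithm can be sketched as follows. The function names (closure, find_key) and the representation of FDs as (left, right) pairs of attribute tuples are assumptions for illustration. The sketch returns one key only, and which key it returns depends on the order in which attributes are tried.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def find_key(R, fds):
    """Start from K = R and drop every attribute whose removal keeps K a superkey."""
    key = list(R)
    for a in list(R):
        candidate = [b for b in key if b != a]
        if closure(candidate, fds) >= set(R):
            key = candidate
    return key

R = ["Ssn", "Pnumber", "Ename", "Pname", "Plocation", "Hours"]
F = [
    (("Ssn",), ("Ename",)),
    (("Pnumber",), ("Pname", "Plocation")),
    (("Ssn", "Pnumber"), ("Hours",)),
]
print(find_key(R, F))
```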
How to Compute Meaning
-the meaning of a set of FDs, F+
• The set of all FDs implied by a
given set F of FDs is called the
closure of F, F+.
• Given the ribs of an umbrella,
the FDs, what does the whole
umbrella, F+, look like?
• Determine each set of attributes,
X, that appears on a left-hand
side of a FD. Determine the set,
X+, the closure of X under F.
Closure, F+
Example: Contracts (contractid,
supplierid, projid, deptid, partid,
qty, value).
We denote the schema for
Contracts as CSJDPQV. The
meaning of a tuple is that the
contract with contractid C is an
agreement that supplier S
(supplierid) will supply Q items of
part P (partid) to project J
(projid) associated with
department D (deptid); the value
V of this contract is equal to
value.
The following ICs are known to
hold:
F = {C → CSJDPQV, JP → C,
SD → P}
Closure, F+
Example: Contracts (contractid, supplierid,
projid, deptid, partid, qty, value).
F = {C → CSJDPQV, JP → C, SD → P}
• Several additional FDs hold in the closure
of the set of given FDs:
• From JP → C, C → CSJDPQV, and
transitivity, we infer JP → CSJDPQV.
• From SD → P and augmentation, we infer
SDJ → JP and then, by transitivity, SDJ → CSJDPQV.
• We can infer several additional FDs that
are in the closure by using augmentation
or decomposition. For example, from
C → CSJDPQV, using decomposition, we
can infer C → C, C → S, C → J, and so
forth.
• Finally we have a number of trivial FDs
from the reflexivity rule.
How to Compute Meaning
when do sets of FDs mean the same?
• F covers E if every FD in E is also
in F+
• F and E are equivalent (F ≡ E) if F covers
E and E covers F.
• We can determine whether F covers
E by calculating X+ with respect to F
for each FD X→Y in E, and then
checking whether this X+ includes
the attributes in Y. If this is the
case for every FD in E, then F
covers E.
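A minimal sketch of this test, assuming single-letter attribute names so that an attribute set can be written as a string; the names closure, covers, and equivalent are illustrative, not from the slides.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def covers(F, E):
    """F covers E if, for every X -> Y in E, Y is contained in X+ computed under F."""
    return all(set(rhs) <= closure(lhs, F) for lhs, rhs in E)

def equivalent(F, E):
    return covers(F, E) and covers(E, F)

F = [("A", "B"), ("B", "C")]
E = [("A", "BC"), ("B", "C")]   # same meaning as F, written differently
G = [("A", "C")]                # strictly weaker than F

print(equivalent(F, E))
print(covers(F, G), covers(G, F))
```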
Example
• R = (A, B, C, G, H, I)
F={ A→B
A→C
CG → H
CG → I
B → H}
• some members of F+
– A→H
• by transitivity from A → B and B → H
– AG → I
• by augmenting A → C with G, to get AG →
CG and then transitivity with CG → I
– CG → HI
• by augmenting CG → I to infer CG → CGI,
and augmenting of CG → H to infer
CGI → HI, and then transitivity
Procedure for Computing F+
• To compute the closure of a set of
FDs F:
F+ = F
repeat
    for each FD f in F+
        apply reflexivity and augmentation rules on f
        add the resulting FDs to F+
    for each pair of FDs f1 and f2 in F+
        if f1 and f2 can be combined using transitivity
            then add the resulting FD to F+
until F+ does not change any further
How to Compute Meaning
when do sets of FDs mean the same?
Alg: Determining X+, the closure of X under F
X+ = X;
Repeat
    Old X+ = X+;
    For each FD Y → Z in F do
        If X+ ⊇ Y Then X+ = X+ ∪ Z;
Until (X+ = Old X+);
Example: F = {Ssn → Ename,
Pnumber → {Pname, Plocation},
{Ssn,Pnumber} → Hours}
{Ssn}+ = {Ssn, Ename}
{Pnumber}+ = {Pnumber, Pname, Plocation}
{Ssn,Pnumber}+ = {Ssn, Pnumber, Ename,
Pname, Plocation, Hours}
Example of Attribute Set Closure
• R = (A, B, C, G, H, I)
• F = {A → B
A→C
CG → H
CG → I
B → H}
• (AG)+
1. result = AG
2. result = ABCG (A → C and A → B)
3. result = ABCGH (CG → H and CG⊆
AGBC)
4. result = ABCGHI (CG → I and CG ⊆
AGBCH)
• Is AG a candidate key?
1. Is AG a super key?
1. Does AG → R? == Is (AG)+ ⊇ R
2. Is any subset of AG a superkey?
1. Does A → R? == Is (A)+ ⊇ R
2. Does G → R? == Is (G)+ ⊇ R
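The closure computation and the candidate-key questions above can be sketched as follows, assuming single-letter attribute names written as strings. The function names are illustrative. Since every superset of a superkey is a superkey, it is enough to test the subsets obtained by dropping one attribute.

```python
def closure(attrs, fds):
    """X+ : repeatedly add Z for every Y -> Z whose left side is already covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_superkey(attrs, R, fds):
    return closure(attrs, fds) >= set(R)

def is_candidate_key(attrs, R, fds):
    """A candidate key is a superkey with no superkey among its proper subsets."""
    attrs = set(attrs)
    if not is_superkey(attrs, R, fds):
        return False
    return not any(is_superkey(attrs - {a}, R, fds) for a in attrs)

R = "ABCGHI"
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

print(sorted(closure("AG", F)))       # (AG)+
print(sorted(closure("A", F)))        # (A)+
print(sorted(closure("G", F)))        # (G)+
print(is_candidate_key("AG", R, F))
```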
Uses of Attribute Closure
There are several uses of the
attribute closure algorithm:
• Testing for superkey:
– To test if α is a superkey, we
compute α+, and check if α+ contains
all attributes of R.
• Testing functional dependencies
– To check if a functional dependency
α → β holds (or, in other words, is in
F+), just check if β ⊆ α+.
– That is, we compute α+ by using
attribute closure, and then check if it
contains β.
– This is a simple and cheap test, and very
useful
• Computing closure of F
– For each γ ⊆ R, we find the closure
γ+, and for each S ⊆ γ+, we output an
FD γ → S.
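The last use, computing F+ from attribute closures, can be sketched as below. Attribute names are assumed to be single letters, the helper names are illustrative, and empty sets are skipped. The enumeration is exponential in the number of attributes, so this is only practical for small schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def nonempty_subsets(attrs):
    attrs = sorted(attrs)
    for k in range(1, len(attrs) + 1):
        for combo in combinations(attrs, k):
            yield frozenset(combo)

def fd_closure(R, fds):
    """F+ as a set of (gamma, S) pairs: one FD gamma -> S for every S inside gamma+."""
    return {
        (gamma, s)
        for gamma in nonempty_subsets(R)
        for s in nonempty_subsets(closure(gamma, fds))
    }

F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
F_plus = fd_closure("ABCGHI", F)

print((frozenset("AG"), frozenset("I")) in F_plus)   # is AG -> I in F+?
print((frozenset("G"), frozenset("A")) in F_plus)    # is G -> A in F+?
```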
How to Compute Meaning
- minimal cover of a set of FDs
• Is there a minimal set of ribs that will hold
the umbrella open?
F is minimal if:
• every dependency in F has a single
attribute as its right-hand side.
• we can’t replace any dependency X→A in
F with a dependency Y→A, where Y⊂X,
and still have a set of dependencies
equivalent to F. This ensures that no
dependency has redundant attributes on
its left-hand side.
• we can’t remove any dependency from F
and still have a set of dependencies
equivalent to F. This ensures that no
dependency in F can be inferred from the
remaining FDs in F.
How to Compute Meaning
- minimal cover of a set of FDs
Alg: Finding a minimal cover F for a set of FDs E.
1. Put the FDs in a standard form: set F to a collection
of FDs equivalent to E with a single attribute on the
right side (using the decomposition rule).
That is, replace each FD X → {A1,..., An} by
the n FDs X → A1, X → A2,..., X → An.
2. Minimize the left side of each FD: for each FD in F,
check each attribute on the left side to see if it can
be deleted while preserving equivalence to F.
For each FD X → A in F
For each attribute B ∈ X
If ((F - {X → A}) ∪ {(X - {B}) → A}) ≡ F
Then replace X → A with (X - {B}) → A in F.
3. Delete redundant FDs: for each remaining
FD X → A in F, if (F - {X → A}) ≡ F, then remove X → A
from F.
Example: E = {Ssn → Ename, Pnumber → {Pname,
Plocation}, {Ssn,Pnumber} → Hours}
F = {Ssn → Ename, Pnumber → Pname, Pnumber → Plocation, {Ssn,Pnumber} → Hours}
How to Compute Meaning
- minimal cover of a set of FDs
Example: Finding a minimal cover for a set
of FDs.
F = {A → B, ABCD → E, EF → G, EF → H, and
ACDF → EG}.
• Let us rewrite ACDF → EG so that every
right side is a single attribute: ACDF → E
and ACDF → G.
• Next consider ACDF → G. This
dependency is implied by the following
FDs: A → B, ABCD → E, and EF → G.
Therefore, we can delete ACDF → G.
• Similarly, we can delete ACDF → E.
• Next consider ABCD → E. Since A → B
holds, we can replace it with ACD → E
and delete ABCD → E.
• At this point one can verify that each
remaining FD is minimal and required.
Thus, a minimal cover for F is the set:
{A → B, ACD → E, EF → G, EF → H}.
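The three steps of the minimal-cover algorithm can be sketched as below and checked against this example. The sketch assumes single-letter attribute names, and the name minimal_cover is illustrative. Instead of testing equivalence of whole FD sets, it uses the usual closure shortcut: an attribute B can be dropped from X → A when A is in (X - {B})+, and an FD is redundant when its right side is in the closure of its left side under the remaining FDs.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def minimal_cover(fds):
    # Step 1: one attribute on every right side.
    g = [(frozenset(lhs), a) for lhs, rhs in fds for a in rhs]
    # Step 2: drop left-side attributes that are not needed.
    for i, (lhs, a) in enumerate(g):
        for b in sorted(lhs):
            reduced = lhs - {b}
            if a in closure(reduced, g):
                lhs = reduced
                g[i] = (lhs, a)
    g = list(dict.fromkeys(g))  # remove duplicates, keep order
    # Step 3: drop FDs implied by the remaining ones.
    for fd in list(g):
        rest = [other for other in g if other != fd]
        if fd[1] in closure(fd[0], rest):
            g = rest
    return g

F = [("A", "B"), ("ABCD", "E"), ("EF", "G"), ("EF", "H"), ("ACDF", "EG")]
for lhs, a in minimal_cover(F):
    print("".join(sorted(lhs)), "->", a)
```

A set of FDs can have more than one minimal cover; the one found depends on the order in which attributes and FDs are examined.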
How to guarantee
lossless joins
R1 ⋈ R2 = R
• Decompose relation, R, with
functional dependencies, F, into
relations, R1 and R2, with attributes,
A1 and A2, and associated FDs, F1
and F2.
• The decomposition is lossless iff:
• R1∩R2 → R1 - R2 is in F+, or
• R1∩R2 → R2 - R1 is in F+
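A minimal sketch of this two-relation test, assuming single-letter attribute names; the function names are illustrative. Rather than materializing F+, it checks whether the closure of R1∩R2 contains R1 - R2 or R2 - R1.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_lossless_binary(r1, r2, fds):
    """True if the common attributes determine all of R1 - R2 or all of R2 - R1."""
    r1, r2 = set(r1), set(r2)
    common_closure = closure(r1 & r2, fds)
    return (r1 - r2) <= common_closure or (r2 - r1) <= common_closure

F = [("A", "B"), ("B", "C")]
print(is_lossless_binary("AB", "BC", F))   # common attribute B, and B -> C
print(is_lossless_binary("AB", "AC", F))   # common attribute A, and A -> B
print(is_lossless_binary("AC", "BC", F))   # common attribute C determines nothing
```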
Example
• R = (A, B, C)
F = {A → B, B → C}
– Can be decomposed in two different
ways
• R1 = (A, B), R2 = (B, C)
– Lossless-join decomposition:
R1 ∩ R2 = {B} and R2 - R1 = {C},
and B → C is in F+
– Dependency preserving
• R1 = (A, B), R2 = (A, C)
– Lossless-join decomposition:
R1 ∩ R2 = {A} and R1 - R2 = {B},
and A → B is in F+
– Not dependency preserving
(cannot check B → C without
computing R1 ⋈ R2)
Lossless-Join
(a) Applying the algorithm to test the decomposition
of EMP_PROJ into EMP_PROJ1 and
EMP_LOCS.
R={SSN, ENAME, PNUMBER, PNAME, PLOCATION,
HOURS}
R1=EMP_LOCS={ENAME, PLOCATION}
R2=EMP_PROJ1={SSN, PNUMBER, HOURS,
PNAME, PLOCATION}
F={SSN→ENAME; PNUMBER→{PNAME,
PLOCATION}; {SSN,PNUMBER} →HOURS}
      SSN   ENAME   PNUMBER   PNAME   PLOCATION   HOURS
R1    b11   a2      b13       b14     a5          b16
R2    a1    b22     a3        a4      a5          a6
(no changes to the matrix after applying the functional
dependencies)
Lossless-Join
(b) Another decomposition of EMP_PROJ.
EMP (SSN, ENAME)
PROJECT (PNUMBER, PNAME, PLOCATION)
WORKS_ON (SSN, PNUMBER, HOURS)
Lossless-Join
(c) Applying the algorithm to the decomposition in
figure (b).
R={SSN, ENAME, PNUMBER, PNAME, PLOCATION,
HOURS}
R1=EMP={SSN, ENAME}
R2=PROJ={PNUMBER, PNAME, PLOCATION}
R3=WORKS_ON={SSN, PNUMBER, HOURS}
F={SSN→ENAME; PNUMBER→{PNAME,
PLOCATION}; {SSN,PNUMBER} →HOURS}
      SSN   ENAME   PNUMBER   PNAME   PLOCATION   HOURS
R1    a1    a2      b13       b14     b15         b16
R2    b21   b22     a3        a4      a5          b26
R3    a1    b32     a3        b34     b35         a6
(original matrix S at start of algorithm)
Lossless-Join
(c, cont.) Applying the algorithm to the decomposition
in figure (b).
R={SSN, ENAME, PNUMBER, PNAME, PLOCATION,
HOURS}
R1=EMP={SSN, ENAME}
R2=PROJ={PNUMBER, PNAME, PLOCATION}
R3=WORKS_ON={SSN, PNUMBER, HOURS}
F={SSN→ENAME; PNUMBER→{PNAME,
PLOCATION}; {SSN,PNUMBER} →HOURS}
      SSN   ENAME      PNUMBER   PNAME      PLOCATION   HOURS
R1    a1    a2         b13       b14        b15         b16
R2    b21   b22        a3        a4         a5          b26
R3    a1    b32→a2     a3        b34→a4     b35→a5      a6
(matrix S after applying the first two functional dependencies;
the last row is all “a” symbols, so we stop.)
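The matrix test used in (a) and (c) can be sketched as follows. The function name chase_lossless and the symbol encoding are assumptions for illustration: "a" stands for the a-symbol of the column it appears in, and the b-symbols are named b<row><column>. When two rows agree on the left side of an FD, their right-side symbols are made equal, preferring "a".

```python
from itertools import combinations

def chase_lossless(R, decomposition, fds):
    """Matrix test for a lossless (nonadditive) join under functional dependencies."""
    # One row per Ri: "a" where the attribute is in Ri, a distinct b-symbol elsewhere.
    rows = [
        {attr: "a" if attr in Ri else f"b{i}{j}" for j, attr in enumerate(R, start=1)}
        for i, Ri in enumerate(decomposition, start=1)
    ]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1, r2 in combinations(rows, 2):
                if all(r1[x] == r2[x] for x in lhs):
                    for y in rhs:
                        s1, s2 = r1[y], r2[y]
                        if s1 != s2:
                            keep = "a" if "a" in (s1, s2) else min(s1, s2)
                            # Rename both symbols everywhere in this column.
                            for row in rows:
                                if row[y] in (s1, s2):
                                    row[y] = keep
                            changed = True
    # Lossless iff some row consists of "a" symbols only.
    return any(all(sym == "a" for sym in row.values()) for row in rows)

R = ["SSN", "ENAME", "PNUMBER", "PNAME", "PLOCATION", "HOURS"]
F = [
    (("SSN",), ("ENAME",)),
    (("PNUMBER",), ("PNAME", "PLOCATION")),
    (("SSN", "PNUMBER"), ("HOURS",)),
]
case_a = [{"ENAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"}]
case_c = [{"SSN", "ENAME"}, {"PNUMBER", "PNAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS"}]

print(chase_lossless(R, case_a, F))
print(chase_lossless(R, case_c, F))
```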
How to guarantee
preservation of FDs
• Decompose relation, R, with
functional dependencies, F, into
relations, R1,..., Rk, with associated
functional dependencies, F1,..., Fk.
• The decomposition is dependency
preserving iff:
F+ = (F1 ∪ ... ∪ Fk)+
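Computing F+ and every Fi+ directly is expensive. The sketch below uses a different, well-known test that is not taken from the slides: for each X → Y in F, grow a set starting from X by repeatedly taking closures restricted to each Ri, and check that Y ends up inside it. Single-letter attribute names and the function names are assumptions for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def fd_preserved(fd, decomposition, fds):
    """Can X -> Y be derived using only dependencies that live inside single Ri?"""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for Ri in decomposition:
            Ri = set(Ri)
            gained = closure(result & Ri, fds) & Ri
            if not gained <= result:
                result |= gained
                changed = True
    return set(rhs) <= result

def is_dependency_preserving(decomposition, fds):
    return all(fd_preserved(fd, decomposition, fds) for fd in fds)

F = [("A", "B"), ("B", "C")]
print(is_dependency_preserving(["AB", "BC"], F))
print(is_dependency_preserving(["AB", "AC"], F))   # B -> C spans both relations
```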
Overview of Normal Forms
NF2
1NF
2NF
3NF
BCNF
4NF
5NF
Normal Forms
- definitions
• NF2: non-first normal form
• 1NF: R is in 1NF iff all domain
values are atomic
• 2NF: R is in 2NF iff R is in 1NF
and every nonkey attribute is fully
dependent on the key
• 3NF: R is in 3NF iff R is in 2NF and
every nonkey attribute is nontransitively dependent on the key
• BCNF: R is in BCNF iff every
determinant is a candidate key
• Determinant: an attribute (or set of
attributes) on which some other
attribute(s) is fully functionally dependent.
Example of Normalization
FLT-INSTANCE
(flt#, date, plane#, airline, from, to, miles)
[Figure: dependency diagram over flt#, date, plane#, airline, from, to, miles]
Example of Normalization
1NF: (flt#, date, plane#, airline, from, to, miles)
2NF: (flt#, date, plane#) and (flt#, airline, from, to, miles)
[Figure: dependency diagrams for the 1NF and 2NF schemas]
Example of Normalization
3NF & BCNF: (flt#, date, plane#), (flt#, airline, from, to), and (from, to, miles)
[Figure: dependency diagrams for the 3NF and BCNF schemas]
3NF that is not BCNF
R (A, B, C)
[Figure: dependency diagram over A, B, C]
Candidate keys: {A,B} and {A,C}
Determinants: {A,B} and {C}
A decomposition:
R1 (C, B)
R2 (A, C)
Lossless, but not dependency
preserving!
Normalization Theory
Alg: Relational decomposition into BCNF with
nonadditive (lossless) join property
1. Decompose a universal relation schema
R = {A1,...,An} into a decomposition D =
{R1,...,Rm}.
2. Set D = {R}.
3. While there is a relation schema Q in D that
is not in BCNF do {
Choose a relation schema Q in D that is
not in BCNF;
Find an FD X → Y in Q that violates BCNF;
Replace Q in D by two relation schemas
(Q - Y) and (X ∪ Y); };
Example:
Q = {S,B,D}: sailor S has reserved
boat B on date D. F = {SB → D, D → B}.
(SB → D: a sailor can reserve a given boat
for at most one day; D → B: on any given
day at most one boat can be reserved.)
D → B violates BCNF because D is not a
key of Q.
Q is decomposed into two relations,
Q1 = {S,D} and Q2 = {D,B}, and
SB → D cannot be preserved.
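A minimal sketch of the decomposition loop, assuming single-letter attribute names; the function names are illustrative. To find a violating FD on a subschema Q it enumerates subsets X of Q and compares X+ ∩ Q with X and with Q, which is exponential and only meant for small examples. Which violation is picked first, and therefore the final decomposition, depends on the enumeration order.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def find_violation(Q, fds):
    """Return (X, Y) for a nontrivial X -> Y on Q where X is not a superkey of Q."""
    Q = set(Q)
    attrs = sorted(Q)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            X = set(combo)
            determined = closure(X, fds) & Q
            Y = determined - X
            if Y and determined != Q:
                return X, Y
    return None

def bcnf_decompose(R, fds):
    todo, done = [set(R)], []
    while todo:
        Q = todo.pop()
        violation = find_violation(Q, fds)
        if violation is None:
            done.append(Q)
        else:
            X, Y = violation
            todo.append(Q - Y)   # (Q - Y)
            todo.append(X | Y)   # (X ∪ Y)
    return done

# Sailors example: F = {SB -> D, D -> B}
result = bcnf_decompose("SBD", [("SB", "D"), ("D", "B")])
print(sorted("".join(sorted(q)) for q in result))
```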
Normalization Theory
Alg: Relational decomposition into BCNF with
nonadditive (lossless) join property
Example:Contracts (contractid, supplierid,
projid, deptid, partid, qty, value). That
is, R = CSJDPQV, and
F = {C → CSJDPQV, JP → C, SD → P}.
1. We get two schemas: SDP and
CSJDQV. SDP is in BCNF.
2. Suppose that we also have the constraint
that each project deals with a single
supplier: J → S. Then CSJDQV is not in
BCNF.
3. Decompose it: SDP, JS, CJDQV
4. This decomposition is not dependency
preserving. FD JP → C cannot be
enforced without a join.
5. One way to deal with this is to add CJP,
which amounts to storing some
information redundantly to make the
dependency enforcement cheaper.
6. This example shows that redundancy
can still occur across relations, even
though there is no redundancy within a
relation.
Normalization Theory
Alg: Relational synthesis into 3NF with dependency
preservation and nonadditive (lossless) join property
1. Find a minimal cover G for F.
2. For each left-hand side X of an FD that appears in G,
create a relation schema in D with attributes {X ∪
{A1} ∪ ... ∪ {Ak}}, where X → A1, ..., X → Ak are the
only dependencies in G with X as left-hand side (X
is the key of this relation).
3. If none of the relation schemas in D contains a key
of R, then create one more relation schema in D that
contains attributes that form a key of R.
Example: Contracts (contractid, supplierid, projid,
deptid, partid, qty, value). That is, R = CSJDPQV,
and F = {C → CSJDPQV, JP → C, SD → P, J → S}
1. Minimal cover = {C → J, C → D, C → Q, C → V, SD → P,
J → S, JP → C}.
(C → P is implied by C → S, C → D, and SD → P, so we
can delete C → P. Similarly, C → S is implied by C → J
and J → S, so we can delete C → S.)
2. We obtain the following relations: CJ, CD, CQ, CV,
CJP, SDP, and JS.
We can improve this schema by combining the relations for
which C is the key into CDJQV, in addition to SDP and
JS.
Note: Compare this schema with the one we found earlier:
SDP, JS, CJDQV, and CJP. There could be significant
differences.
Normalization Theory
Example: R = ABC, and F = {A → B, C → B}.
• Key is AC.
• When we use the standard process of
repeated decomposition, we obtain the
following: R1 = AB, R2 = BC.
Since AB ∩ BC = B and neither B → A nor
B → C holds, the decomposition is not lossless.
• However, AB, BC, AC is a dependency-preserving
and lossless-join decomposition of R.
• We obtain this result through a process
of synthesis rather than through a
process of repeated decomposition.
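Steps 2 and 3 of the synthesis algorithm can be sketched as follows. The sketch assumes that its input G is already a minimal cover (step 1 is not repeated here), that attribute names are single letters, and that the function names are illustrative. Relations with the same left-hand side are grouped into one schema.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def find_key(R, fds):
    """Drop attributes from R one at a time while the rest is still a superkey."""
    key = list(R)
    for a in list(R):
        candidate = [b for b in key if b != a]
        if closure(candidate, fds) >= set(R):
            key = candidate
    return key

def synthesize_3nf(R, G):
    """G is assumed to be a minimal cover of the FDs on R."""
    groups = {}
    for lhs, rhs in G:
        # Step 2: one schema per left-hand side X, holding X and everything it determines in G.
        groups.setdefault(frozenset(lhs), set(lhs)).update(rhs)
    D = list(groups.values())
    # Step 3: add a key schema if no schema already contains a key of R.
    if not any(closure(Ri, G) >= set(R) for Ri in D):
        D.append(set(find_key(R, G)))
    return D

D = synthesize_3nf("ABC", [("A", "B"), ("C", "B")])
print(sorted("".join(sorted(Ri)) for Ri in D))
```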
Multivalued Dependencies:
A New Form of Redundancy
• Multivalued dependencies
(MVD’s) express a condition
among tuples of a relation that
exists when the relation is trying
to represent more than one
many-many relationship.
• Then certain attributes become
independent of one another, and
their values must appear in all
combinations.
A New Form of
Redundancy
EMP
Ename   Proj-name   Dep-name
Smith   {X,Y,Z}     {Anna,John}
Suzan   {X,Z}       {Ali, Barış}
• This relation represents two
independent 1:N relationships, one
between employees and projects and
the other between employees and
their dependents
Multivalued
Dependencies (MVDs)
• Let R be a relation schema and
let α ⊆ R and β ⊆ R. The
multivalued dependency
α →→ β
holds on R if in any legal relation
r(R), for all pairs of tuples t1 and
t2 in r such that t1[α] = t2[α], there
exist tuples t3 and t4 in r such
that:
t1[α] = t2 [α] = t3 [α] = t4 [α]
t3[β] = t1 [β]
t3[R – α–β] = t2[R – α–β]
t4 [β] = t2[β]
t4[R – α–β] = t1[R – α–β]
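As a minimal sketch of this definition, the check below tests an MVD on a relation instance. The function name mvd_holds and the course/teacher/book sample rows (with placeholder values T1, T2, B1, B2) are assumptions for illustration. Only t3 is tested explicitly; t4 is covered because the loop visits every ordered pair (t1, t2).

```python
def mvd_holds(rows, alpha, beta):
    """Check alpha ->-> beta on an instance given as a non-empty list of dicts."""
    rest = [a for a in rows[0] if a not in alpha and a not in beta]

    def pick(row, attrs):
        return tuple(row[a] for a in attrs)

    triples = {(pick(r, alpha), pick(r, beta), pick(r, rest)) for r in rows}
    # For t1, t2 agreeing on alpha there must be a t3 with
    # t3[beta] = t1[beta] and t3[rest] = t2[rest].
    return all(
        (x1, y1, z2) in triples
        for (x1, y1, _z1) in triples
        for (x2, _y2, z2) in triples
        if x1 == x2
    )

rows = [
    {"course": "DB", "teacher": "T1", "book": "B1"},
    {"course": "DB", "teacher": "T1", "book": "B2"},
    {"course": "DB", "teacher": "T2", "book": "B1"},
    {"course": "DB", "teacher": "T2", "book": "B2"},
]
print(mvd_holds(rows, ["course"], ["teacher"]))       # all combinations present
print(mvd_holds(rows[:3], ["course"], ["teacher"]))   # one combination missing
```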
MVD (Cont.)
• Tabular representation of α →→ β
Note that since the behavior of Z
and W are identical it follows that
α →→ β if α →→ R- α- β
MVD
Example: course →→ teacher,
course →→ book
• The above formal definition is
supposed to formalize the notion
that given a particular value of Y
(course) it has associated with it a
set of values of Z (teacher) and a
set of values of W (book), and these
two sets are in some sense
independent of each other.
• Note:
– If Y → Z then Y →→ Z
– Indeed we have (in above
notation) Z1 = Z2. The claim
follows.
Armstrong’s inference rules for
MVDs
Rules of the computation:
– Reflexivity for FDs: if Y ⊆ X, then X → Y
– Augmentation for FDs: if X → Y, then
WX → WY
– Transitivity for FDs: if X → Y and Y → Z,
then X → Z
– Complementation rule for MVDs: if
X →→ Y, then X →→ (R - (X ∪ Y))
– Augmentation rule for MVDs: if X →→ Y
and W ⊇ Z, then WX →→ YZ.
– Transitive rule for MVDs: if X →→ Y and
Y →→ Z, then X →→ (Z - Y).
– Replication rule for FDs and MVDs: if
X → Y, then X →→ Y.
– sound (generate only dependencies
that actually hold) and
– complete (generate all dependencies
that hold).
Use of Multivalued
Dependencies
• We use multivalued dependencies
in two ways:
1. To test relations to determine whether
they are legal under a given set of
functional and multivalued
dependencies.
2. To specify constraints on the set of
legal relations. We shall thus concern
ourselves only with relations that
satisfy a given set of functional and
multivalued dependencies.
• If a relation r fails to satisfy a given
MVD, we can construct a relation r′
that does satisfy the MVD by
adding tuples to r.
4NF- Fourth Normal Form
• A relation schema R is in 4NF with
respect to a set D of functional and
multivalued dependencies if for all
multivalued dependencies in D+ of
the form α →→ β, where α ⊆ R and
β ⊆ R, at least one of the following
hold:
– α →→ β is trivial (i.e., β ⊆ α or
α ∪ β = R)
– α is a superkey for schema R
• If a relation is in 4NF it is in BCNF
4NF Definition
(alternative defn)
• A relation R is in 4NF if
whenever X →→ Y is a nontrivial
MVD, then X is a superkey.
– “Nontrivial” means that:
1. Y is not a subset of X, and
2. X and Y are not, together,
all the attributes.
– Note that the definition of
“superkey” still depends on
FD’s only.
4NF
• Example: R =(A, B, C, G, H, I)
F ={ A →→ B
B →→ HI
CG →→ H }
• R is not in 4NF since A →→ B and A is
not a superkey for R
• Decomposition
a) R1 = (A, B) (R1 is in 4NF)
b) R2 = (A, C, G, H, I) (R2 is not in 4NF)
c) R3 = (C, G, H) (R3 is in 4NF)
d) R4 = (A, C, G, I) (R4 is not in 4NF,
since A →→ B and B →→ HI imply A →→ HI
and hence A →→ I)
e) R5 = (A, I) (R5 is in 4NF)
f) R6 = (A, C, G) (R6 is in 4NF)
BCNF Versus 4NF
• Remember that every FD X → Y
is also an MVD, X →→ Y.
• Thus, if R is in 4NF, it is certainly
in BCNF.
– Because any BCNF violation is a
4NF violation.
• But R could be in BCNF and not
in 4NF, because MVD’s are
“invisible” to BCNF.
Note: If a relation schema is in
BCNF and at least one of its keys
consists of a single attribute, it is
also in 4NF.
4NF
Example: Drinkers(name, addr,
phones, beersLiked)
FD: name → addr
MVD’s: name →→ phones
name →→ beersLiked
• Key is {name, phones,
beersLiked}.
• Therefore, all dependencies
violate 4NF.
Join Dependency
• Join dependencies generalize
multivalued dependencies
– lead to project-join normal
form (PJNF) (also called fifth
normal form (5NF))
• Problem with these generalized
constraints: they are hard to reason
with, and no sound and complete
set of inference rules exists.
• Hence they are rarely used
Join Dependency
• Consider the SUPPLY all-key
relation.
• Suppose that the following
additional constraint always holds:
Whenever a supplier s supplies
part p, and a project j uses part p,
and the supplier s supplies at least
one part to project j, then supplier s
will also be supplying part p to
project j.
• This constraint can be restated in
other ways and specifies a join
dependency JD(R1,R2,R3) among
the three projections
• R1(sname,partname),
R2(sname,projname), and
R3(partname,projname) of
SUPPLY.
Join Dependency
SUPPLY
sname    partname   projname
Smith    bolt       projX
Smith    nut        projY
Adam     bolt       projY
Walton   nut        projZ
Adam     nail       projX
- - - - - - - - - - - - - - - -
Adam     bolt       projX
Smith    bolt       projY
Join Dependency
• If JD constraint holds, the
tuples below the dotted
line in the figure, must
exist in any legal state of
the SUPPLY relation that
also contains the tuples
above the dotted line.
Join Dependency
• A join dependency (JD), denoted by
JD(R1, R2, ..., Rn), specified on
relation schema R, specifies a
constraint on the states r of R.
• The constraint states that every
legal state r of R should have a
lossless join decomposition into R1,
R2, ..., Rn; that is, for every such r
we have
r = ⋈ (ΠR1(r), ΠR2(r), ..., ΠRn(r))
• Notice that an MVD is a special
case of a JD where n = 2.
• A JD(R1, R2, ..., Rn), specified on
relation schema R, is a trivial JD if
one of the relation schemas Ri in
JD(R1, R2, ..., Rn) is equal to R.
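On a single relation state, the JD condition can be checked by projecting onto each Ri and joining the projections back together. The sketch below does this for the SUPPLY example; the function names and the representation of tuples as frozensets of (attribute, value) pairs are assumptions for illustration.

```python
def natural_join(left, right):
    """Natural join of two sets of tuples, each a frozenset of (attr, value) pairs."""
    out = set()
    for l in left:
        for r in right:
            dl, dr = dict(l), dict(r)
            if all(dl[a] == dr[a] for a in dl.keys() & dr.keys()):
                out.add(frozenset({**dl, **dr}.items()))
    return out

def jd_holds(rows, components):
    """True if this state equals the join of its projections onto the components."""
    state = {frozenset(t.items()) for t in rows}
    joined = None
    for attrs in components:
        proj = {frozenset((a, t[a]) for a in attrs) for t in rows}
        joined = proj if joined is None else natural_join(joined, proj)
    return joined == state

def supply(s, p, j):
    return {"sname": s, "partname": p, "projname": j}

SUPPLY = [
    supply("Smith", "bolt", "projX"), supply("Smith", "nut", "projY"),
    supply("Adam", "bolt", "projY"), supply("Walton", "nut", "projZ"),
    supply("Adam", "nail", "projX"),
    supply("Adam", "bolt", "projX"), supply("Smith", "bolt", "projY"),
]
JD = [("sname", "partname"), ("sname", "projname"), ("partname", "projname")]

print(jd_holds(SUPPLY, JD))       # all seven tuples
print(jd_holds(SUPPLY[:5], JD))   # without the last two tuples
```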
5NF
• A relation schema R is said
to be in fifth normal form
(5NF) if, for every JD
{R1, ...,Rn} that holds over
R, one of the following
statements is true:
• Ri = R for some i, or (trivial
JD)
• The JD is implied by the set
of those FDs over R in
which the left side is a key
for R.
5NF
SUPPLY = (sname, partname,
projname)
Decomposed into R1,R2,R3:
R1 (sname, partname),
R2 (sname, projname),
R3 (partname, projname)
of SUPPLY.
5NF
Note: If a relation schema is
in 3NF and each of its keys
consists of a single attribute,
it is also in 5NF.
This result can be very
useful in practice because it
allows us to conclude that a
relation is in 5NF without
ever identifying the MVDs
and JDs that may hold over
the relation.
Inclusion Dependencies
• They are common but they
have little influence on DB
design.
• Some columns of a relation
are contained in other
columns (usually of a second
relation.)
• A foreign key (FK) constraint
is an example of an inclusion
dependency.
Example: R.A ⊆ S.B
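A minimal sketch of checking an inclusion dependency on two relation instances. The function name inclusion_holds is an assumption for illustration, and the sample rows reuse flight numbers from the earlier FLIGHTS example.

```python
def inclusion_holds(r_rows, r_attrs, s_rows, s_attrs):
    """Check R[r_attrs] ⊆ S[s_attrs] on two instances given as lists of dicts."""
    s_values = {tuple(t[a] for a in s_attrs) for t in s_rows}
    return all(tuple(t[a] for a in r_attrs) in s_values for t in r_rows)

flights_airline = [
    {"flt": "DL242", "airline": "Delta"},
    {"flt": "AA121", "airline": "American"},
]
flights_date_plane = [
    {"flt": "DL242", "date": "10/23/00", "plane": "k-yo-33297"},
    {"flt": "AA121", "date": "10/24/00", "plane": "p-rw-84663"},
]

# Foreign-key style check: every flt in FLIGHTS-DATE-PLANE appears in FLIGHTS-AIRLINE.
print(inclusion_holds(flights_date_plane, ["flt"], flights_airline, ["flt"]))

# A row that references an unknown flight breaks the dependency.
dangling = flights_date_plane + [{"flt": "TK912", "date": "10/26/00", "plane": "unknown"}]
print(inclusion_holds(dangling, ["flt"], flights_airline, ["flt"]))
```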
Inclusion Dependencies
• We should not split groups of
attributes that participate in an
inclusion dependency (this can
happen, e.g., when going from 3NF to
BCNF).
• Ex.: given AB ⊆ CD, while decomposing
the relation schema containing AB,
we should ensure that at least one
of the schemas obtained in the
decomposition contains both A and
B. (Otherwise we cannot check the
inclusion dependency AB ⊆ CD without
reconstructing the relation.)
• Most inclusion dependencies are
key-based (i.e., FK)
• An EER diagram that involves ISA
hierarchies also leads to key-based
inclusion dependencies.