Normalization

Functional Dependencies
and Normalization
Instructor: Mohamed Eltabakh
[email protected]
1
What to Cover

Functional Dependencies (FDs)

Closure of Functional Dependencies

Lossy & Lossless Decomposition

Normalization
2
Decomposing Relations
sNumber
StudentProf
sName
pNumber
pName
s1
Dave
p1
MM
s2
Greg
p2
MM
FDs: pNumber  pName
Student
sNumber
s1
s2
sName
Dave
Greg
Lossless
pNumber
p1
p2
Student
sNumber
S1
S2
sName
Dave
Greg
pNumber
p1
p2
Lossy
pName
MM
MM
Professor
pName
MM
MM
Professor
pNumber
p1
p2
pName
MM
MM
3
Lossless vs. Lossy Decomposition

Assume R is divided into R1 and R2

Lossless Decomposition


R1 natural join R2 should create exactly R
Lossy Decomposition

R1 natural join R2 adds more records (or deletes
records) from R
4
Lossless Decomposition
sNumber
StudentProf
sName
pNumber
pName
s1
Dave
p1
MM
s2
Greg
p2
MM
FDs: pNumber  pName
Student
sNumber
s1
s2
sName
Dave
Greg
Lossless
pNumber
p1
p2
Professor
pNumber
p1
p2
pName
MM
MM
Student & Professor are lossless decomposition of StudentProf
(Student ⋈ Professor = StudentProf)
5
Lossy Decomposition
sNumber
StudentProf
sName
pNumber
pName
s1
Dave
p1
MM
s2
Greg
p2
MM
FDs: pNumber  pName
Student
sNumber
S1
S2
sName
Dave
Greg
Lossy
pName
MM
MM
Professor
pNumber
p1
p2
pName
MM
MM
Student & Professor are lossy decomposition of StudentProf
(Student ⋈ Professor != StudentProf)
6
Goal: Ensure Lossless
Decomposition

How to ensure lossless decomposition?

Answer:

The common columns must be candidate key in
one of the two relations
7
Back to our example
sNumber
StudentProf
sName
pNumber
pName
s1
Dave
p1
MM
s2
Greg
p2
MM
pNumber is
candidate key
FDs: pNumber  pName
Student
sNumber
s1
s2
sName
Dave
Greg
Lossless
pNumber
p1
p2
Student
sNumber
S1
S2
sName
Dave
Greg
pNumber
p1
p2
Lossy
pName
MM
MM
Professor
pName
MM
MM
pName is not
candidate key
Professor
pNumber
p1
p2
pName
MM
MM
8
What to Cover

Functional Dependencies (FDs)

Closure of Functional Dependencies

Lossy & Lossless Decomposition

Normalization
9
Normalization
10
Normalization

First Normal Form (1NF)

Boyce-Codd Normal Form (BCNF)

Third Normal Form (3NF)

Canonical Cover of FDs
11
Normalization

Set of rules to avoid “bad” schema design

Decide whether a particular relation R is in “good” form


Several levels of normalization





If not, decompose R to be in a “good” form
First Normal Form (1NF)
BCNF
Third Normal Form (3NF)
Fourth Normal Form (4NF)
If a relation is in a certain normal form, then it is known that
certain kinds of problems are avoided or minimized
12
First Normal Form (1NF)

Attribute domain is atomic if its elements are considered to
be indivisible units (primitive attributes)

Examples of non-atomic domains are multi-valued and
composite attributes

A relational schema R is in first normal form (1NF) if the
domains of all attributes of R are atomic
We assume all relations are in 1NF
13
First Normal Form (1NF):
Example
Since all attributes are primitive  It is in 1NF
14
Boyce-Codd Normal Form
(BCNF): Definition
A relation schema R is in BCNF with respect to a
set F of functional dependencies if for all functional
dependencies in F+ of the form
α→β
where α ⊆ R and β ⊆ R, then at least one of the
following holds:


α → β is trivial (i.e.,β⊆α)
α is a superkey for R
Remember:
Candidate keys are also
superkeys
15
BCNF: Example
Student
sNumber
sName
pNumber
pName
s1
Dave
p1
MM
s2
Greg
p2
ER
s3
Mike
p1
MM
Student Info
Professor Info
Is relation Student in BCNF given pNumber  pName


It is not trivial FD
pNumber is not a key in Student relation
NO
How to fix it and make it in BCNF???
16
Decomposing a Schema into
BCNF

If R is not in BCNF because of non-trivial
dependency α → β, then decompose R

R is decomposed into two relations


R1 = (α U β ) -- α is super key in R1
R2 = (R- (β - α)) -- R2.α is foreign keys to R1.α
17
Example of BCNF Decomposition
StudentProf
sNumber
sName
pNumber
pName
s1
Dave
p1
MM
s2
Greg
p2
MM
FDs: pNumber  pName
Student
Professor
sNumber
sName
pNumber
pNumber
pName
s1
Dave
p1
p1
MM
s2
Greg
p2
p2
MM
FOREIGN KEY: Student (PNum) references Professor (PNum)
18
What is Nice about this
Decomposing ???

R is decomposed into two relations


R1 = (α U β ) -- α is super key in R1
R2 = (R- (β - α)) -- R2.α is foreign keys to R1.α
This decomposition is lossless
(Because R1 and R2 can be joined based on α, and α is
unique in R1)

When you join R1 and R2 on α, you get R back
without lose of information
19
StudentProf = Student ⋈
Professor
StudentProf
sNumber
sName
pNumber
pName
s1
Dave
p1
MM
s2
Greg
p2
MM
FDs: pNumber  pName
Student
Professor
sNumber
sName
pNumber
pNumber
pName
s1
Dave
p1
p1
MM
s2
Greg
p2
p2
MM
BCNF decomposition rule create lossless decomposition
20
Multi-Step Decomposition

Relation R and functional dependency F


R = (customer_name, loan_number, branch_name, branch_city, assets, amount )
F = {branch_name  assets branch_city,
loan_number  amount branch_name}

Is R in BCNF ?? NO

Based on branch_name  assets branch_city


R1 = (branch_name, assets, branch_city)
R2 = (customer_name, loan_number, branch_name, amount)

Are R1 and R2 in BCNF ?

Divide R2 based on loan_number  amount branch_name


R2 is not
R3 = (loan_number, amount, branch_name)
R4 = (customer_name, loan_number)
Final Schema has R1, R3, R4
21
What is NOT Nice about BCNF
Before decomposition, we had set of functional
dependencies FDs (Say F)
After decomposition, do we still have the same set of
FDs or we lost something ??
22
What is NOT Nice about BCNF

Dependency Preservation

After the decomposition, all FDs in F+ should be preserved

BCNF does not guarantee dependency preservation

Can we always find a decomposition that is both
BCNF and preserving dependencies?


No…This decomposition may not exist
That is why we study a weaker normal form called (third
normal form –3NF)
23
Dependency Preserving
Assume R is decomposed to R1 and R2
Dependencies of R1 and R2 include:
 Local dependencies α → β


All columns of α and β must be in a single relation
Global Dependencies

Use transitivity property to form more FDs across R1 and R2
relations
Yes  Dependency preserving
Does these dependencies match
the ones in R ?
No  Not dependency preserving
24
Example of Lost FD

Assume relation R(C, S, J, D, T, Q, V)

C is key, JT  C and SD  T




(C is key) -- Good for BCNF
(JT is key) -- Good for BCNF
(SD is not a key) –Bad for BCNF
Decomposition:


C  CSJDTQV
JT  CSJDTQV
SD  T
R1(C, S, J, D, Q, V) and R2(S, D, T)
Lossless & in BCNF
Does C CSJDTQV still exist?

Yes: C CSJDQV (local), SDT (local), C CSJDQVT (global)
25
Example of Lost FD (Cont’d)

Assume relation R(C, S, J, D, T, Q, V)

C is key, JT  C and SD  T




(C is key) -- Good for BCNF
(JT is key) -- Good for BCNF
(SD is not a key) –Bad for BCNF
Decomposition:


C  CSJDTQV
JT  CSJDTQV
SD  T
R1(C, S, J, D, Q, V) and R2(S, D, T)
Lossless & in BCNF
Does SD T still exist?

Yes: SDT (local)
26
Example of Lost FD (Cont’d)

Assume relation R(C, S, J, D, T, Q, V)

C is key, JT  C and SD  T




(C is key) -- Good for BCNF
(JT is key) -- Good for BCNF
(SD is not a key) –Bad for BCNF
Decomposition:


C  CSJDTQV
JT  CSJDTQV
SD  T
R1(C, S, J, D, Q, V) and R2(S, D, T)
Lossless & in BCNF
Does JT CSJDTQV still exist?

No this one is lost (no way from the local FDs to get this one)
27
Dependency Preservation Test

Assume R is decomposed into R1 and R2
local dependencies in R1

The closure of FDs in R is F+

The FDs in R1 and R2 are FR1 and FR2, respectively

Then dependencies are preserved if:

F+ = (FR1 union FR2)+
local dependencies in R2
28
Back to Our Example

Assume relation R(C, S, J, D, T, Q, V)

C is key, JT  C and SD  T








(C is key) -- Good for BCNF
(JT is key) -- Good for BCNF
(SD is not a key) –Bad for BCNF
Decomposition:


C  CSJDTQV
JT  CSJDTQV
SD  T
R1(C, S, J, D, Q, V) and R2(S, D, T)
F+ = {C  CSJDTQV, JT CSJDTQV, SD T}
FR1 = {C  CSJDQV}  local for R1
JT  C is still missing
FR2 = {SD  T}
 local for R2
FR1 U FR2 = {C  CSJDQV, SD  T}
(FR1 U FR2)+ = {C  CSJDQV, SD  T, C T}
29
Dependency Preservation
BCNF does not necessarily preserve FDs.
But 3NF is guaranteed to be able to preserve FDs.
30
Normalization

First Normal Form (1NF)

Boyce-Codd Normal Form (BCNF)

Third Normal Form (3NF)

Canonical Cover of FDs
31
Third Normal Form: Motivation

There are some situations where


BCNF is not dependency preserving
Solution: Define a weaker normal form, called Third
Normal Form (3NF)


Allows some redundancy (we will see examples later)
But all FDs are preserved
There is always a lossless, dependencypreserving decomposition in 3NF
32
Normal Form : 3NF
Relation R is in 3NF if, for every FD in F+
α  β,
where α ⊆ R and β ⊆ R, at least one of the following holds:

α → β is trivial (i.e.,β⊆α)

α is a superkey for R

Each attribute in β-α is part of a candidate key (prime attribute)
L.H.S is superkey OR
R.H.S consists of prime attributes
33
Testing for 3NF

Use attribute closure to check for each dependency α → β,
if α is a superkey

If α is not a superkey, we have to verify if each attribute in
(β- α) is contained in a candidate key of R
34
3NF: Example
Lot (ID, county, lotNum, area, price, taxRate)
Primary key: ID
Candidate key: <county, lotNum>
FDs:

county  taxRate
area  price
Is relation Lot in 3NF ?
NO
Decomposition based on county  taxRate
Lot (ID, county, lotNum, area, price)
County (county, taxRate)

Are relations Lot and County in 3NF ?
Lot is not
35
3NF: Example (Cont’d)
Lot (ID, county, lotNum, area, price)
County (county, taxRate)
Candidate key for Lot: <county, lotNum>
FDs:
county  taxRate
area  price
Decompose Lot based on area  price
Lot (ID, county, lotNum, area)
County (county, taxRate)
Area (area, price)

Is every relation in 3NF ?
YES
36
Comparison between 3NF & BCNF ?

If R is in BCNF, obviously R is in 3NF

If R is in 3NF, R may not be in BCNF

3NF allows some redundancy and is weaker than BCNF

3NF is a compromise to use when BCNF with good constraint
enforcement is not achievable

Important: Lossless, dependency-preserving decomposition of
R into a collection of 3NF relations always possible !
37
Normalization

First Normal Form (1NF)

Boyce-Codd Normal Form (BCNF)

Third Normal Form (3NF)

Canonical Cover of FDs
38
Canonical Cover of FDs
39
Canonical Cover of FDs

Canonical Cover (Minimal Cover) = G



Is the smallest set of FDs that produce the same F+
There are no extra attributes in the L.H.S or R.H.S of and
dependency in G
Given set of FDs (F) with functional closure F+

Canonical cover of F is the minimal subset of FDs (G),
where
G + = F+
Every FD in the canonical cover is needed,
otherwise some dependencies are lost
40
Example : Canonical Cover

Given F:


A  B, ABCD  E, EF  GH, ACDF  EG
Then the canonical cover G:

A  B, ACD  E, EF  GH
The smallest set (minimal) of FDs that
can generate F+
41
Computing the Canonical
Cover

Given a set of functional dependencies F,
how to compute the canonical cover G
42
Example : Canonical Cover
(Lets Check L.H.S)



Given F = {A  B, ABCD  E, EF  G, EF  H, ACDF  EG}
Union Step: {A  B, ABCD  E, EF  GH, ACDF  EG}
Test ABCD  E

Check A:


 A cannot be deleted
{BCD}+ = {BCD}
Check B:

{ACD}+ = {A B C D E}
 Then B can be deleted

Now the set is: {A  B, ACD  E, EF  GH, ACDF  EG}

Test ACD  E

Check C:


{AD}+ = {ABD}
 C cannot be deleted
Check D:

{AC}+ = {ABC}
 D cannot be deleted
43
Example: Canonical Cover
(Lets Check L.H.S-Cont’d)

Now the set is: {A  B, ACD  E, EF  GH, ACDF  EG}

Test EF  GH
 Check E:


 E cannot be deleted
Check F:


{F}+ = {F}
{E}+ = {E}
 F cannot be deleted
Test ACDF  EG

None of the H.L.S can be deleted
44
Example: Canonical Cover
(Lets Check R.H.S)

Now the set is: {A  B, ACD  E, EF  GH, ACDF  EG}

Test EF  GH
 Check G:


Check H:


{EF}+ = {E F G}
 H cannot be deleted
Test ACDF  EG

Check E:


{EF}+ = {E F H}  G cannot be deleted
{ACDF}+ = {A B C D F E G}  E can be deleted
Now the set is: {A  B, ACD  E, EF  GH, ACDF  G}
45
Example: Canonical Cover
(Lets Check R.H.S-Cont’d)

Now the set is: {A  B, ACD  E, EF  GH,
ACDF  G}

Test ACDF  G

Check G:

{ACDF}+ = {A B C D F E G}  G can be deleted
Now the set is: {A  B, ACD  E, EF  GH}
The canonical cover is:
{A  B, ACD  E, EF  GH}
46
Canonical Cover

Used to find the smallest (minimal) set of FDs
that have the same closure as the original set.

Used in the decomposition of relations to be in
3NF

The resulting decomposition is lossless and
dependency preserving
47
Done with Normalization

First Normal Form (1NF)

Boyce-Codd Normal Form (BCNF)

Third Normal Form (3NF)

Canonical Cover of FDs
48
Questions ?
49
What You Learned

Data Models


Entity-Relationship Model & ERD
Relational Model

Conversion between the data models

Relational Algebra & Operators

Structured Query Language SQL


DML: Data Manipulation Language
DDL: Data Definition Language
50
What You Learned (Cont’d)

Advanced SQL


Triggers, Views, Cursors, Stored Procedures and Functions
PL/SQL

Functional Dependencies

Normalization Rules
51
In Advanced Courses
Things get more interesting
Indexing
Techniques
Transaction
Query
Optimization
Handling
And
Management
of Big Data
many more …
52