Normalization

Chapter 3
Normalization (Database Design)
1
Chapter 3 : Objectives
 What normalization is and what role it plays in




database design
About the normal forms 1NF, 2NF, 3NF, BCNF, and
4NF
How normal forms can be transformed from lower
normal forms to higher normal forms
That normalization and E-R modeling are used
concurrently to produce a good database design
That some situations require denormalization to
generate information efficiently
2
What is Normalization ?
 A technique for producing a set of relations
with desirable properties (minimum data
redundancy), given the data requirements of
an enterprise.
 First developed by E.F. Codd (1972)
3
Why Normalization ?
 Main objective in developing a logical data
model for relational database systems is to
create an accurate representation of the
data, its relationships, and constraints.
 To achieve this objective, must identify a
suitable set of relations.
 A relation can be normalized to a specific
form to prevent possible occurrence of
update anomalies.
4
Why Normalization ?
As normalization proceeds, relations become
progressively more restricted (stronger) in
format and also less vulnerable to update
anomalies.
5
Why Normalize ?
 An ER Model is not always available as a
starting point for design
 To reduce redundant data in existing design
 To increase integrity of data and stability of
design
 To identify missing tables, columns and
constraints
6
Why Normalization ?
SID
S1
S1
S1
S2
S2
S3
S3
S3




Name
Joseph
Joseph
Joseph
Alice
Alice
Tom
Tom
Tom
Grade
A
B
A
A
A
B
B
A
Course#
CIS800
CIS820
CIS872
CIS800
CIS872
CIS800
CIS872
CIS860
Text
b1
b2
b5
b1
b5
b1
b5
b1
Major
CIS
CIS
CIS
CS
CS
Acct
Acct
Acct
Dept
CIS
CIS
CIS
MCS
MCS
Acct
Acct
Acct
Is there any redundant data?
Can we insert a new Course# with a new textbook?
What should be done if 'CIS' is changed to 'MIS'?
What would happen if we remove all CIS800 students?
7
Update Anomalies
 A poorly designed relation contains redundant data.
 Data redundancy causes update anomalies :
 Insertion Anomalies -An independent piece of information
cannot be recorded in a relation unless an irrelevant information
must be inserted together at the same time.
 Modification Anomalies - The update of a piece of information
must occur at multiple locations (if not, it leads to inconsistency).
 Deletion Anomalies - The deletion of a piece of information
unintentionally removes other information.
 Normalization is used to remove update anomalies by
decomposing the relations into smaller relations based on the
constraints.
8
First Normal Form
 All key attributes defined
 No repeating groups in table
 All attributes dependent on
primary key
9
Example : 1NF
Example : FIRST(S#,STATUS,CITY,P#,QTY)
- Suppliers supply certain quantities of parts and are located in some city.
 - Status of the supplier is determined by the location (city) of that supplier.
Fig : Relation FIRST
S#
S1
S1
S1
S1
S1
S1
S2
S2
S3
S4
S4
S4
STATUS
20
20
20
20
20
20
10
10
10
20
20
20
CITY
London
London
London
London
London
London
Paris
Paris
Paris
London
London
London
P#
P1
P2
P3
P4
P5
P6
P1
P2
P2
P2
P4
P5
QTY
300
200
400
200
100
100
300
400
200
200
300
400
10
Example : 1NF
S#
S1
S1
S1
S1
S1
S1
S2
S2
S3
S4
S4
S4
STATUS
20
20
20
20
20
20
10
10
10
20
20
20
CITY
London
London
London
London
London
London
Paris
Paris
Paris
London
London
London
P#
P1
P2
P3
P4
P5
P6
P1
P2
P2
P2
P4
P5
QTY
300
200
400
200
100
100
300
400
200
200
300
400



What is the Primary key of FIRST ?
Is FIRST in 1NF ?
Does FIRST have update anomalies ?
11
Example : 1NF
To overcome the above anomalies in the
relation FIRST , FIRST has to be
decomposed into smaller relations
(projections).
=> Decomposition
12
Non-loss Decomposition
 To overcome the update anomalies in a relation R, R
is decomposed into smaller relations (projections).
 A bad decomposition loses information.
 In a good decomposition
 The join of decomposed relations restores the original
relation.
 Decomposed relations can be maintained
independently.
13
Non-loss Decomposition
Rissanen's Rules:
 Two projections R1 and R2 of a relation R are
independent iff
Every FD in R can be logically derived from
those in R1 and R2 and
 The common attributes of R1 and R2 form a
candidate key for at least one of the pair.

14
Functional Dependency
Given two attributes X and Y of a relation R, Y
is functionally dependent on X iff each Xvalue in R has associated with it exactly one
Y-value in R(if two tuples agree on their Xvalue, they also agree on their Y-value).
X Y
15
Functional Dependency
Example: List all FDs in FIRST
FD Diagram of FIRST:
S#
CITY
QTY
P#
STATUS
16
Properties of FDs
 X is called the determinant of Y
 X may or may not be the key attribute of R
 X and Y may be composite
 X and Y could be mutually dependent on
each other

Husband  Wife
Wife  Husband
 The same Y-value may occur in multiple
tuples
17
Properties of FDs
An FD could be trivial or non-trivial
An FD is trivial
 if it is impossible for it not to be satisfied
 if the RHS is a subset of the LHS
In Normalization, the FDs of interest are :
 non-trivial
 hold for all times
18
Full Functional Dependency :
Given attributes X and Y of a relation R, Y is
fully functionally dependent on X if Y is
functionally dependent on X but not on any
proper subset of X.
19
Transitive Functional Dependency
Given attributes X, Y and Z of a relation R, Z is
transitively dependent on X iff
X Y,
Y  Z and X  Z
20
UNNORMALIZED FORM (UNF)
 A table that contains one or more repeating groups.
Examples :
 Is FIRST Unnormalized ?
 Is relation SCH given below Unnormalized ?
Student
Courses
Hobbies
-----------------------------------------------------Jin
DBMS
Tennis
OS
Cooking
Habib
C++
Networking
CA
Reading
Surfing
21
UNF to 1NF
Remove repeating group by entering appropriate data into the empty columns of
rows containing repeating data (‘flattening’ the table).
Student
Courses
Hobbies
-----------------------------------------------------------------------Jin
DBMS
Tennis
Jin
DBMS
Cooking
Jin
OS
Tennis
Jin
OS
Cooking
Habib
C++
Reading
Habib
C++
Surfing
Habib
Networking
Reading
Habib
Networking
Surfing
Habib
CA
Reading
Habib
CA
Surfing
22
FIRST NORMAL FORM (1NF)revisited
A relation R is in 1NF iff all attribute domains
contain atomic values only.
Example :
FIRST is in 1NF
 FIRST has update anomalies

23
FIRST NORMAL FORM (1NF)
FD diagram of FIRST:
QTY
S#
CITY
P#
STATUS
24
SECOND NORMAL FORM (2NF)
A relation R is in 2NF iff it is in 1NF and every nonprime-key attribute is fully functionally dependent on
the primary key.

A non-prime-key attribute does not belong to any
candidate key.
 Is FIRST in 2NF ?
 How to normalize FIRST into 2NF ?
 Remove all partial (or non-full) dependencies from
FIRST by decomposing FIRST into two smaller
relations SP and SECOND
25
SECOND NORMAL FORM (2NF)
 Is the decomposition of First into SP and
SECOND good ?
SP(S#,P#,QTY)
SECOND(S#,CITY,STATUS)
26
SECOND NORMAL FORM (2NF)
 FD Diagram of SP and SECOND
S#
CITY
S#
QTY
P#
STATUS
 SECOND is in 2NF
 SECOND has update anomalies
27
THIRD NORMAL FORM(3NF)
 A relation is in 3NF iff it is in 2NF and in which
every non-prime-key attribute is nontransitively dependent on the primary key.
 Is SECOND(S#,CITY,STATUS) in 3NF ?
28
THIRD NORMAL FORM(3NF)
 How to normalize SECOND into 3NF ?

Remove all transitive dependencies from
SECOND by decomposing SECOND into two
smaller relations SC and CS.
SC(S#,CITY)
CS(CITY,STATUS)

Make sure that the decomposition is good !
29
30
THIRD NORMAL FORM(3NF)
 FD Diagram of SC and CS
S#
CITY
CITY
STATUS
31
Help me Codd !
The key(1NF), the whole key(2NF), and
nothing but the key (3NF) - so help me
Codd !!
32
THIRD NORMAL FORM(3NF)
 Relation R(A,B,C,D) given below is in 3NF
 Still has Update Anomalies !!
33
Example :
Example: TEACH (Student, Course, Instructor)



A student can take several courses and can
be taught by several instructors.
For each course, each student of that course
is taught by only one teacher.
An instructor teaches only one course.
 Two Candidate keys : (Student, Course)
(Student, Instructor)
34
Example :
Choose (Student, Course) as the Primary key
FD Diagram for TEACHES
STUDENT
INSTRUCTOR
COURSE
 Is TEACHES in 3NF ?
 Does TEACHES have update anomalies ?
 What happens if (Student, Instructor) is chosen as
the Primary key!
35
BOYCE/CODD NORMAL FORM
 A relation is in BCNF iff every determinant is a
Candidate key.
 Update Anomalies occur in a 3NF relation R if
 R has multiple candidate keys
 Those candidate keys are composite and
 The candidate keys are overlapped
 How to Normalize TEACHES in BCNF ?
 Decompose TEACHES into
SI (Student, Instructor) and IC (Instructor, Course)
36
Denormalization
 Normalization is one of many database
design goals
 But Is it always the best possible design ?
37
Why Denormalize ?
 Normalization is the process of putting one fact in
one appropriate place. This optimizes updates at the
expense of retrievals.
 When a fact is stored in only one place, retrieving
many different but related facts usually requires going
to many different places. This tends to slow the
retrieval process.
 Updating is quicker, however, because the fact you're
updating exists in only one place.
38
Why Denormalize ?
 A relational normalized database imposes a heavy
access load over physical storage of data even if it is
well tuned for high performance.
 A normalized design will often store different but
related pieces of information in separate logical
tables (called relations).

If these relations are stored physically as separate disk
files, completing a database query that draws
information from several relations (a join operation) can
be slow.
39
Denormalization
 Denormalization

is the process of attempting to optimize the
performance of a database by adding redundant
data or by grouping data.

technique to move from higher to lower normal
forms of database modeling in order to speed up
database access.
40
An Example of Denormalization
Consider a relation :
Contact (Name, Street, Zip, City, Province)
Name
Street
Zip
City
Province
John
401 Sunset Av.
N91 Q23
Windsor
ON
Harry
402 Sunset Av.
N4T 3R5
Windsor
ON
Bill
123 First St.
N4Y 7Y8
London
ON
PK of Contact ?
Is Contact in 3NF ?
41
Example of Denormalization(contd)
Was it worth the decomposition ?
42