Normal Forms

Normalization
Asif Sohail
University of the Punjab
Punjab University College of Information Technology (PUCIT)
© 2009 Punjab University College of Information Technology (PUCIT)
September 8, 2009
Slide 1
Introduction
•
•
•
•
•
a)
b)
c)
It is a process of converting a complex, large and unstable
relation into a set of simple, small and stable relations.
It is a process of efficiently organizing data in a database.
Normalization results in a well structure relation – a relation
that contains min. redundancy and allows insert, update and
delete without errors/inconsistencies.
Errors or inconsistencies caused by redundant data are also
called anomalies.
There are three types of anomalies:
Insertion Anomaly.
Deletion Anomaly.
Updation Anomaly.
© 2009 Punjab University College of Information Technology (PUCIT)
September 8, 2009
Slide 2
Introduction
a) Insertion Anomaly.
• It occurs when extra data beyond the desired data
must be added to the database.
b) Deletion Anomaly.
• It occurs when it is necessary to change multiple
rows to modify only a single fact.
c) Updation Anomaly.
• It occurs when deleting a row causes some
unwanted deletions.
© 2009 Punjab University College of Information Technology (PUCIT)
September 8, 2009
Slide 3
Introduction
•
Normalization is based on the analysis of Functional Dependency (FD).
•
A FD is a relationship between two attributes A & B of a relation R,
such that attribute B is said to be functionally dependent on attribute
A, if A uniquely determines the values of B.(A->B)
•
The attribute on the left-hand side of the arrow in a functional
dependency is called Determinant and on the right are called
Dependents.
•
An attribute may be functionally dependent on more than one
attributes.
•
Functional Dependencies are helpful in identifying the keys for a given
relation.
© 2009 Punjab University College of Information Technology (PUCIT)
September 8, 2009
Slide 4
Continued…
EMPLOYEE1
EMP_ID
Name
Dept_Name
Salary
EMPLOYEE2
EMP_ID
Course_Title
Name
Dept_Name
Salary
Date_Completed
Date a Course is completed is completely determined by
the EMP_ID and Course_Title
© 2009 Punjab University College of Information Technology (PUCIT)
September 8, 2009
Slide 5
Inference Rules
• Rules of Inference for functional dependencies, called inference
axioms or Armstrong axioms, after their developer, can be used
to find all the FDs logically implied by a set of FDs.
• There are the following inference rules:
a) Reflexivity.
b) Augmentation.
c) Transitivity.
d) Additivity or union.
e) Projectivity or Decomposition
f)
Pseudo transitivity:
© 2009 Punjab University College of Information Technology (PUCIT)
September 8, 2009
Slide 6
Inference Rules
a) Reflexivity.
•
{Y 1,...,Y n} ⊆ {X1,...,Xm} implies {X1,...,Xm} → {Y 1,...,Y n}
• OR If B is a subset of A, then A → B.
• For Example: StName,stAdr → stName
b) Augmentation.
• If we have A → B then AC → BC. For Example
• If stId → stName then
• StId,stAdr → stName,stadr
c) Transitivity.
• If A → B and B → C, then A → C
• If stId → prName and prName → credits then
• stId → credits
© 2009 Punjab University College of Information Technology (PUCIT)
September 8, 2009
Slide 7
Inference Rules
d) Additivity or Union:
• If A → B and A → C, then A → BC
• If empId → eName and empId → qual Then we can write it as
empId → eName, qual
e) Projectivity or Decomposition
• If A → BC then A → B and A → C
• If empId → eName,qual Then we can write it as
• empId → eName and empID → qual
f) Pseudo transitivity:
• If A → B and CB → D, then AC → D
• If stID → stName and stName,fName → stAdr Then we can
write it as StId,fName → stAdr
© 2009 Punjab University College of Information Technology (PUCIT)
September 8, 2009
Slide 8
Normal Forms
•
Normalization process is built around the concept of Normal
Forms.
•
A Normal Form is a state of a relation that can be determined
by applying simple rules regarding functional Dependencies.
–
First Normal Form
–
Second Normal Form
–
Third Normal Form
–
Boyce Codd Normal Form
–
Fourth Normal Form
–
Fifth Normal Form
© 2009 Punjab University College of Information Technology (PUCIT)
September 8, 2009
Slide 9
Table with
Multivalued attribute
Remove multivalued attributes
First Normal Form
Remove partial dependencies
Second Normal Form
Remove transitive dependencies
Third Normal Form
Removing remaining anomalies
Boyce-Codd
Normal Form
Remove multivalued dependencies
Fourth Normal Form
Remove remaining anomalies
Fifth Normal Form
First Normal Form (1NF)
•
A relation is said to be in 1NF, if it contains no repeating
group.
•
The value at the intersection of a row and column must be
atomic(having one value)
•
If you developed a logical design by transforming ER diagram
into relations, there should not be any multivalued attributes
remaining
•
Consider the following relation:
Student (RegNo,Name,Program, C-Code, C-Title, C-Grade)
•
This relation has a repeating group and therefore it has the
insert, delete and update anomalies.
•
Multiple values create problems in performing operations like
select or join.
© 2009 Punjab University College of Information Technology (PUCIT)
September 8, 2009
Slide 11
First Normal Form (1NF)
•
The relation Student can be converted into 1NF using either
of the following methods:
a) Change the PK of the relation and define a composite key RegNo
& C_Code.
b) Split the relation into 2 relations:
Student (RegNo, Name, Program)
Course (RegNo, C-Code, C-Title, C_Grade)
Example 2: STD(stId, stName, stAdr, prName, bkId)
© 2009 Punjab University College of Information Technology (PUCIT)
September 8, 2009
Slide 12
Second Normal Form (2NF)
•
•
•
•
A relation is in 2NF if:
– It is in 1NF
– Every nonkey attribute is fully functionally dependent on
the primary key
A situation of Partial Functional Dependency arises when PK
of a relation is composite and a non key attribute is
functionally dependent on part (but not all) of the PK.
Referring to the Course relation:
Course (RegNo, C-Code,C-Title, C_Grade)
The functional dependencies are:
C-Code -> C_Title (Partial FD)
RegNo,C_Code -> C_Grade (Full FD)
© 2009 Punjab University College of Information Technology (PUCIT)
September 8, 2009
Slide 13
Second Normal Form (2NF)
•
Since all the non key attributes are not fully functionally
dependent on the PK or there is partial functional
dependency in the relation, therefore it is not in 2NF.
•
The Anomalies associated with the course relation are:
a) Insert Anomaly:
•
A course instance cant be inserted without a student (RegNo)
b) Delete Anomaly.
•
Deleting a student will unnecessarily delete course data.
c) Update Anomaly.
•
A course cant be updated independently.
© 2009 Punjab University College of Information Technology (PUCIT)
September 8, 2009
Slide 14
Second Normal Form (2NF)
• The process for transforming a 1NF table to 2NF is:
– Identify any determinants other than the composite key, and
the columns they determine.
– Create and name a new table for each determinant and the
unique columns it determines.
– Move the determined columns from the original table to the
new table. The determinate becomes the primary key of the
new table.
– Delete the columns you just moved from the original table
except for the determinate which will serve as a foreign key.
– The original table may be renamed to maintain semantic
meaning.
© 2009 Punjab University College of Information Technology (PUCIT)
September 8, 2009
Slide 15
Second Normal Form (2NF)
•
•
The relation Course can be converted into 2NF by
decomposing it into the following relations:
Course (C-Code,C-Title)
Result (RegNo, C-Code, C_Grade)
A relation in 1NF will be in 2NF if:
– The PK consists of only one attribute OR
– No nonkey attributes exist in the relation
OR
– Every nonkey attribute is functionally dependent on
the full set of primary key attributes
© 2009 Punjab University College of Information Technology (PUCIT)
September 8, 2009
Slide 16
Third Normal Form (3NF)
•
•
•
•
•
A relation is said to be in 3NF, if it is in 2NF and there is no
Transitive Dependency.
A Transitive Dependency is a functional dependency between
two or more non key attributes of a relation.
Consider the following relation:
Emp (EmpNo, EName, Job, Sal, Proj-No,Proj-Details)
In the above relation, there is a following transitive
dependency:
Proj-No -> Proj-Details
Due to this, project information cant be maintained
independent of a employee record and hence there are
anomalies in the relation.
© 2009 Punjab University College of Information Technology (PUCIT)
September 8, 2009
Slide 17
Third Normal Form (3NF)
•
•
You can remove transitive dependency from a relation in the
following way:
Create a new relation against transitively dependent
attributes and leave the PK of new relation in the old relation
to serve as a FK.
Emp (EmpNo, EName, Job, Sal, Proj-No)
Project (Proj-No, Proj-Details)
More Examples:
Ex1: PNo, PName, PBudget, EmpNo, EName, Job, ChgHour, Hours
Ex2: STD (stId, stName, stAdr, prName, prCrdts)
– stId -> stName, stAdr, prName, prCrdts
– prName -> prCrdts
© 2009 Punjab University College of Information Technology (PUCIT)
September 8, 2009
Slide 18
Third Normal Form (3NF)
• The process of transforming a table into 3NF is:
– Identify any determinants, other the primary key, and the
columns they determine.
– Create and name a new table for each determinant and the
unique columns it determines.
– Move the determined columns from the original table to the
new table. The determinate becomes the primary key of the
new table.
– Delete the columns you just moved from the original table
except for the determinate which will serve as a foreign key.
– The original table may be renamed to maintain semantic
meaning.
© 2009 Punjab University College of Information Technology (PUCIT)
September 8, 2009
Slide 19
Normalization - Exercise
•
Order Relation:
OrderNo, OrderDetails, OrderDate, CustNo, CustName,
ProductNo, ProdName, Price, QtyOrdered
•
Student Relation:
RegNo, Name, Address, Program, C-Code, C-Title,
C-Grade, T-Code, T-Name
•
Patient Relation:
VisitNo, VisitDate, PatNo, PatName, PatAge, DNo, DName,
DSpeciality, Diagnosis
© 2009 Punjab University College of Information Technology (PUCIT)
September 8, 2009
Slide 20
Boyce-Codd Normal Form (BCNF)
•
•
•
•
•
A relation is said to be in BCNF, if it is in 3NF and every determinant is a
candidate key or there is no overlapping of candidate keys.
If a table contains atomic candidate keys, the 3NF and BCNF are
equivalent. The BCNF differs from the 3NF only when there are more than
one candidate keys and the keys are composite and overlapping.
A 3NF relation is not in BCNF and this happens only if
– the candidate keys in the relation are composite keys (that is, they are
not single attributes),
– there is more than one candidate key in the relation, and
– the keys are not disjoint, that is, some attributes in the keys are
common.
Consider a relation R(A, B, C, D) such that
A,B -> C, D and C->B.
The relation R has no partial dependency nor it contains transitive
dependency. Thus the relation R is in 3NF.
© 2009 Punjab University College of Information Technology (PUCIT)
September 8, 2009
Slide 21
Boyce-Codd Normal Form (BCNF)
•
Consider the following relation:
PROJECT (RegNo, PTool, Supervisor)
Constraints:
1. For each Project Tool (PTool), a student has only one
supervisor.
2. A project may be in more than one tools.
3. Each supervisor can supervise only one tool.
•
In the above relation, no single attribute is a PK.
•
Possible candidate keys are RegNo, PTool and RegNo,
Supervisor and Ptool, Supervisor.
•
The candidate keys overlap as they share RegNo.
•
The Anomalies associated with the course relation are:
© 2009 Punjab University College of Information Technology (PUCIT)
September 8, 2009
Slide 22
Boyce-Codd Normal Form (BCNF)
a)
•
Insert Anomaly:
Supervisor and PTool cant be defined unless a student takes a
project.
b) Delete Anomaly.
•
Deleting a student will unnecessarily delete project data.
c) Update Anomaly.
•
Updating a PTool may result in unwanted changes.
•
The relation Project can be converted into BCNF by
decomposing it into the following relations:
PROJECT1 (RegNo, PTool)
PROJECT1 (Supervisor, PTool)
© 2009 Punjab University College of Information Technology (PUCIT)
September 8, 2009
Slide 23
Fourth Normal Form (4NF)
•
•
A relation is said to be in 4NF, if it is in BCNF and there
is no multivalued dependency.
A Multivalued Dependency is a type of dependency that
exists in a relation R having at least 3 attributes R(A, B,
C) such that, for each value of A, there is a well defined
set of values of B and a well defined set of values of C,
but the set of values of B and the set of values of C is
independent or have no impact on each other.
© 2009 Punjab University College of Information Technology (PUCIT)
September 8, 2009
Slide 24
Fourth Normal Form (4NF)
•
•
•
•
•
•
•
a)
•
Consider a relation
COURSE (C_Code, C_Instructor, Book_Titles)
The constraints on the above relation are:
Each course may have several instructors.
Each course uses several book titles.
Set of Book titles used for a course is independent of the
set of instructors.
Q: What will be the PK of the relation?
The Anomalies associated with the course relation are:
Insert Anomaly:
Adding a new book title will require adding as many rows
as many instructors are there.
© 2009 Punjab University College of Information Technology (PUCIT)
September 8, 2009
Slide 25
Fourth Normal Form (4NF)
b) Delete Anomaly.
•
Deleting an instructor will result in unwanted deletions.
c) Update Anomaly.
•
Updating a book title requires updating several rows.
•
The relation Course can be converted into 4NF by
decomposing it into the following relations:
•
COURSE (C_Code, C_Instructor)
•
BOOKS (C_Code, Book_Titles)
© 2009 Punjab University College of Information Technology (PUCIT)
September 8, 2009
Slide 26
Fifth Normal Form (5NF)
•
•
•
•
A relation is said to be in 5NF, if it is in 4NF and there is
no Join Dependency.
A Join Dependency means that a relation cant be
decomposed into two or more relations such that the
resulting relations can be recombined to form original
data. Thus if R(A, B, C) is decomposed into R1(A,B) and
R2(B,C) – a join dependency exists if we can get back to
R by taking natural join of R1 and R2.
In every Normal Form, we decompose a given relation.
The decomposition can be lossless or lossy
decomposition. Join Dependency ensures that the
decomposition is lossless.
© 2009 Punjab University College of Information Technology (PUCIT)
September 8, 2009
Slide 27
Fifth Normal Form (5NF)
•
•
•
•
•
Consider the following relation:
STUDENT (RegNo, Name, Program, C_Code, C_Grade)
We decompose the relation into the following relations in
order to convert it into 1NF.
STUDENT (RegNo, Name, Program)
RESULT(RegNo, C_Code, C_Grade)
Some tuples of the above relations are….
Natural Join of the above two relations will give us the
tuples of the original relation STUDENT. Thus the
decomposition is lossless.
Now we decompose RESULT relation as:
© 2009 Punjab University College of Information Technology (PUCIT)
September 8, 2009
Slide 28
Fifth Normal Form (5NF)
•
•
•
•
RESULT1(RegNo, C_Code)
RESULT2(C_Code, C_Grade)
Some tuples of the above relations are….
Natural Join of the above two relations will give us some
extra tuples which were not in the original relation.
Even though we are getting more tuples, but still the
decomposition is lossy, because without the original
relation, we have no way to identify which tuples are extra
or spurious. Thus we actually loose information.
We can guarantee that the decomposition if lossless by
making sure that for each pair of relations that will be
joined, the set of common attributes is a determinant in
one of the relations.
© 2009 Punjab University College of Information Technology (PUCIT)
September 8, 2009
Slide 29
Fifth Normal Form (5NF)
•
•
•
•
5NF is not easily verified as lower normal forms.
In fact, no systematic method exists for obtaining 5NF or
for ensuring that a set of relations is indeed in 5NF.
It is tempting to suggest that such relations are
pathological cases and are likely to be rare in practice.
5NF is also called Project-Join Normal Form.
© 2009 Punjab University College of Information Technology (PUCIT)
September 8, 2009
Slide 30
Thank you for your attention.
Asif Sohail
Assistant Professor
University of the Punjab
Punjab University College of Information Technology (PUCIT)
Allama Iqbal (Old) Campus, Anarkali
Lahore, Pakistan
Tel:
+92-(0)42-111-923-923 Ext. 154
E-mail:
[email protected]
© 2009 Punjab University College of Information Technology (PUCIT)
September 8, 2009
Slide 31