Schema Refinement and Normal Forms
2013
CS3754 Class Notes #7, John Shieh
Normalization
• It is a process that we can use to remove
design flaws from a database.
• There are a number of normal forms, which are sets
of rules describing what we should and
should not do in our table structure.
• 3NF is usually sufficient in practice to avoid the data
redundancy problems of a relational
database design.
CS4753/2006F
Problems caused by redundancy
• Redundant Storage
– Some information is stored repeatedly.
• Update Anomalies
– If one copy of such repeated data is updated, an
inconsistency is created, unless all copies are similarly
updated.
• Insertion Anomalies
– It may not be possible to store certain information
unless some other, unrelated, information is stored.
• Deletion Anomalies
– It may not be possible to delete certain information
without losing some other, unrelated, information.
Id           name       lot  rating  Hourly_wages  Hours_worked
123-22-3666  Attishoo   48   8       10            40
231-31-5368  Smiley     22   8       10            30
131-24-3650  Smethurst  35   5       7             30
434-26-3751  Guldu      35   5       7             32
612-67-4134  Madayan    35   8       10            40
• Redundant Storage
– The hourly wages depend on rating levels. So, for
example, hourly wage 10 for rating level 8 is repeated
three times.
• Update Anomalies
– The hourly_wages in the first tuple could be updated
without making a similar change in the second tuple.
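The redundancy and the update anomaly can be checked mechanically. The following is a minimal Python sketch using the table's rows; the helper names (wages_by_rating, copies_of, is_consistent) and the new wage value 12 are made up for illustration and are not part of the notes.

```python
# Rows of the example table: (Id, name, lot, rating, Hourly_wages, Hours_worked)
rows = [
    ("123-22-3666", "Attishoo", 48, 8, 10, 40),
    ("231-31-5368", "Smiley", 22, 8, 10, 30),
    ("131-24-3650", "Smethurst", 35, 5, 7, 30),
    ("434-26-3751", "Guldu", 35, 5, 7, 32),
    ("612-67-4134", "Madayan", 35, 8, 10, 40),
]

RATING, WAGE = 3, 4  # column positions of rating and Hourly_wages


def wages_by_rating(table):
    """Map each rating to the set of hourly wages stored with it."""
    wages = {}
    for row in table:
        wages.setdefault(row[RATING], set()).add(row[WAGE])
    return wages


def copies_of(table, rating):
    """Count how many tuples store the wage for the given rating."""
    return sum(1 for row in table if row[RATING] == rating)


def is_consistent(table):
    """True if every rating is stored with exactly one hourly wage."""
    return all(len(w) == 1 for w in wages_by_rating(table).values())


# Update anomaly: change the wage in the first tuple only (hypothetical
# new value 12) and leave the other rating-8 tuples untouched.
updated = [rows[0][:WAGE] + (12,) + rows[0][WAGE + 1:]] + rows[1:]

print(copies_of(rows, 8))      # how many times the rating-8 wage is stored
print(is_consistent(rows))     # the original instance
print(is_consistent(updated))  # the instance after the partial update
```

After the partial update, rating 8 is stored with two different wages, which is exactly the inconsistency described above.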
• Insertion Anomalies
– We cannot insert a tuple for an employee unless we
know the hourly wage for the employee’s rating value.
• Deletion Anomalies
– If we delete all tuples with a given rating value (e.g.
tuples of Smethurst and Guldu) we lose the association
between the rating value and its hourly_wage value.
Decompositions
• Intuitively, redundancy arises when a relational
schema forces an association between attributes
that is not natural.
• Functional dependencies can be used to identify
such situations and suggest refinements to the
schema.
• The essential idea is that many problems arising
from redundancy can be addressed by replacing a
relation with a collection of ‘smaller’ relations.
The relation above is replaced by two smaller relations:

Id           name       lot  rating  Hours_worked
123-22-3666  Attishoo   48   8       40
231-31-5368  Smiley     22   8       30
131-24-3650  Smethurst  35   5       30
434-26-3751  Guldu      35   5       32
612-67-4134  Madayan    35   8       40

rating  Hourly_wages
8       10
5       7
Functional dependency:
- rating determines Hourly_wages
A decomposition of a relation schema R consists of replacing
the relation schema by two (or more) relation schemas, each of
which contains a subset of the attributes of R and which together include all
attributes in R.
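One way to see that this decomposition loses no information is to project the original relation onto the two smaller schemas and join them back on rating. The sketch below does this in plain Python with sets of tuples; the function names decompose and natural_join are illustrative assumptions, not names used in the notes.

```python
# Original relation: (Id, name, lot, rating, Hourly_wages, Hours_worked)
original = {
    ("123-22-3666", "Attishoo", 48, 8, 10, 40),
    ("231-31-5368", "Smiley", 22, 8, 10, 30),
    ("131-24-3650", "Smethurst", 35, 5, 7, 30),
    ("434-26-3751", "Guldu", 35, 5, 7, 32),
    ("612-67-4134", "Madayan", 35, 8, 10, 40),
}


def decompose(relation):
    """Project onto (Id, name, lot, rating, Hours_worked) and (rating, Hourly_wages)."""
    emps = {(i, n, l, r, h) for (i, n, l, r, w, h) in relation}
    wages = {(r, w) for (_i, _n, _l, r, w, _h) in relation}
    return emps, wages


def natural_join(emps, wages):
    """Join the two projections on their shared attribute, rating."""
    return {
        (i, n, l, r, w, h)
        for (i, n, l, r, h) in emps
        for (r2, w) in wages
        if r == r2
    }


emps, wages = decompose(original)
print(sorted(wages))                          # each rating's wage is stored once
print(natural_join(emps, wages) == original)  # does the join rebuild the original?
```

Because rating determines Hourly_wages, each employee tuple matches exactly one wage tuple, so the join neither drops nor invents rows for this instance.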
Functional Dependencies
• A functional dependency (FD) is a kind of integrity constraint (IC) that generalizes the
concept of a key.
• Let R be a relation schema, and let X and Y be nonempty sets
of attributes in R.
– An FD X → Y exists if, in every relation instance for R, any two tuples that
agree on the value of X also agree on the value of Y.
– More formally:
• An FD X → Y exists in R if every instance of R
preserves the FD X → Y.
• We say that an instance r of R preserves the FD X → Y if the
following holds for every pair of tuples t1 and t2 in r:
If t1.X = t2.X, then t1.Y = t2.Y
The notation t1.X refers to the subset of fields of tuple t1 for the attributes in X.
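The definition of "an instance preserves an FD" translates directly into a check over the tuples. Below is a minimal Python sketch, applied to the employee table from earlier in these notes; the function name preserves_fd and the dictionary-based row representation are assumptions made for illustration.

```python
COLUMNS = ("Id", "name", "lot", "rating", "Hourly_wages", "Hours_worked")
DATA = [
    ("123-22-3666", "Attishoo", 48, 8, 10, 40),
    ("231-31-5368", "Smiley", 22, 8, 10, 30),
    ("131-24-3650", "Smethurst", 35, 5, 7, 30),
    ("434-26-3751", "Guldu", 35, 5, 7, 32),
    ("612-67-4134", "Madayan", 35, 8, 10, 40),
]
rows = [dict(zip(COLUMNS, values)) for values in DATA]


def preserves_fd(instance, lhs, rhs):
    """Return True if the instance preserves the FD lhs -> rhs.

    For every pair of tuples t1, t2: if t1.X == t2.X then t1.Y == t2.Y.
    Instead of comparing all pairs, remember the Y value first seen
    for each X value and compare later tuples against it.
    """
    seen = {}
    for t in instance:
        x = tuple(t[a] for a in lhs)
        y = tuple(t[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


print(preserves_fd(rows, ["rating"], ["Hourly_wages"]))  # rating -> Hourly_wages
print(preserves_fd(rows, ["lot"], ["rating"]))           # lot -> rating
```

Note that a True result only says this one instance preserves the FD; it does not establish that the FD holds over the schema.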
Take
[Table: an instance of the Take relation, with attributes student_ID, student_name, course_ID, course_name]
Examples:
course_ID → course_name is preserved? Yes.
{student_ID, course_ID} → course_name is preserved? Yes.
If no two rows agree on the value of X, then X → Y is trivially
preserved.
The table instance also preserves the following:
student_ID → student_name
{student_ID, course_ID} → {student_name, course_name}
{student_ID, course_ID} → {student_ID, student_name, course_ID, course_name}
student_name → student_name (a trivial dependency)
{student_name, course_name} → student_name (also trivial)
and many more.
How do we know if an FD exists in R?
• Can we check all instances of R to see if the FD is preserved?
– No, this is not possible.
– Whether or not a functional dependency exists must be determined by
assumptions given in advance, or common sense, not by individual
relation instances.
• Given an instance r of R, we can check if r preserves some
functional dependency f, but we cannot tell if f holds over R.
course_ID → student_name? No.
Although it is preserved by this table, it does not fit the assumptions.
• The assumptions given in advance, or common sense,
impose some constraints, and are called the semantics
of a database
• Assumptions given in advance impose explicit
constraints; common sense imposes implicit
constraints
Example:
• Application is to keep track of information about
employees in a company.
• Information to be kept track of includes:
eid: employee’s id number
ename: employee name
address: address of the employee
sex: employee’s sex
dname: name of the department that the employee works for
dhname: department head’s name
dhsex: department head’s sex
Let’s construct a relation schema as follows:
Employee(eid, ename, address, sex, dname, dhname, dhsex)
Which of the following dependencies are true?
1. eid → ename
2. ename → eid
3. eid → address
4. eid → sex
5. sex → address
6. dhname → dname
7. dhname → eid
8. dhsex → sex
Assumptions:
a: Employee’s id number is unique
b: Each employee has a unique address
c: Each employee works for only one dept.
d: A person can be the head of at most one department
e: All department heads have different names
Implicit: common sense
• X is a superkey for relation schema R iff X → attri(R),
where attri(R) denotes the set of all the attributes in schema R
• X is a candidate key (or simply, key) for R iff
X → attri(R), and
X is minimal, i.e., for any proper subset Y of X, Y → attri(R) does not hold
• In other words, a candidate key is a minimal superkey
For the Take relation:
(student_ID, course_ID) is a candidate key (and the only one)
(student_ID, course_ID, course_name) is a superkey, but not a candidate key
(student_ID, course_ID, student_name) is another non-candidate superkey
(student_ID, course_ID, course_name, student_name) is also a non-candidate
superkey
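Superkeys and candidate keys can be tested with the standard attribute-closure computation: X is a superkey when the closure of X under the FDs covers every attribute. The sketch below assumes that the only FDs on Take are student_ID → student_name and course_ID → course_name, which matches the examples above but is an assumption rather than something the notes state in full; all function names are illustrative.

```python
from itertools import combinations

ATTRS = frozenset({"student_ID", "student_name", "course_ID", "course_name"})

# Assumed FDs for the Take relation, as (left side, right side) pairs.
FDS = [
    (frozenset({"student_ID"}), frozenset({"student_name"})),
    (frozenset({"course_ID"}), frozenset({"course_name"})),
]


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, all_attrs=ATTRS, fds=FDS):
    """X is a superkey iff X determines all attributes of the schema."""
    return closure(attrs, fds) >= all_attrs


def is_candidate_key(attrs, all_attrs=ATTRS, fds=FDS):
    """A candidate key is a superkey with no removable attribute.

    Checking single-attribute removals is enough, because any superset
    of a superkey is itself a superkey.
    """
    attrs = set(attrs)
    if not is_superkey(attrs, all_attrs, fds):
        return False
    return not any(is_superkey(attrs - {a}, all_attrs, fds) for a in attrs)


def candidate_keys(all_attrs=ATTRS, fds=FDS):
    """Enumerate candidate keys by brute force over attribute subsets."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            if is_candidate_key(combo, all_attrs, fds):
                keys.append(set(combo))
    return keys


print(candidate_keys())
print(is_superkey({"student_ID", "course_ID", "course_name"}))
print(is_candidate_key({"student_ID", "course_ID", "course_name"}))
```

The brute-force enumeration is exponential in the number of attributes, which is fine for a four-attribute schema but is only meant as a demonstration.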
Normal Forms
1st Normal Form           No repeating data groups
2nd Normal Form           No partial key dependency
3rd Normal Form           No transitive dependency
Boyce-Codd Normal Form    Every determinant is a candidate key
4th Normal Form           No multi-valued dependency
5th Normal Form           No join dependency
1NF, 2NF, 3NF, BCNF, 4NF, 5NF: each form is stricter than the one before it.
Normal Form (NF)
• 1NF: each attribute or column value must be
atomic
• 2NF: the schema is in 1NF, and all attributes
that are not part of the primary key are fully
functionally dependent on the primary key
• 3NF: the schema is in 2NF, and all transitive
dependencies have been removed
Ex: employeeDept(employeeID, name, job, deptID,
deptName) must be converted to
employee(employeeID, name, job, deptID)
Dept(deptID, deptName)
2NF
• 2NF means that each non-key attribute must be
functionally dependent on all parts of the primary key
(i.e., the combination of the attributes that make up the composite
key).
• Example: not 2NF
Employee(employeeID, name, job, departmentID, skill)
{employeeID, skill} → name, job, departmentID
employeeID → name, job, departmentID
(Note: → means “determines”)
• Break the table into two tables to reach 2NF
Employee(employeeID, name, job, departmentID)
employeeSkills(employeeID, skill)
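A partial key dependency can be detected by asking whether any proper subset of the primary key already determines a non-key attribute. The Python sketch below applies this to the example, taking {employeeID, skill} as the primary key of the original table; the function names are illustrative, and "non-key" is read here as "not part of the primary key", following the wording of these notes.

```python
from itertools import combinations

# The FD from the example: employeeID -> name, job, departmentID
FDS = [
    (frozenset({"employeeID"}), frozenset({"name", "job", "departmentID"})),
]


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, all_attrs, fds):
    """Map each proper subset of the key to the non-key attributes it determines."""
    key = frozenset(key)
    non_key = set(all_attrs) - key
    found = {}
    for size in range(1, len(key)):  # proper, nonempty subsets only
        for subset in combinations(sorted(key), size):
            determined = closure(subset, fds) & non_key
            if determined:
                found[subset] = determined
    return found


before = partial_dependencies(
    {"employeeID", "skill"},
    {"employeeID", "name", "job", "departmentID", "skill"},
    FDS,
)
after_employee = partial_dependencies(
    {"employeeID"}, {"employeeID", "name", "job", "departmentID"}, FDS
)
after_skills = partial_dependencies(
    {"employeeID", "skill"}, {"employeeID", "skill"}, FDS
)

print(before)          # partial dependencies in the original table
print(after_employee)  # none expected once the key is a single attribute
print(after_skills)    # none expected: the table has no non-key attributes
```

An empty result for both decomposed tables is consistent with them being in 2NF under the stated FD.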
3NF
• Example: 2NF but not 3NF
Employee(employeeID, name, job, departmentID, departmentName)
Here
employeeID → departmentID
employeeID → departmentName
Also
departmentID → departmentName, and departmentID is not a
key
Therefore, employeeID → departmentName is a transitive dependency
• Convert the schema to 3NF by breaking it into two tables:
Employee(employeeID, name, job, departmentID)
Department(departmentID, departmentName)
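The transitive dependency can also be found mechanically: look for an FD whose left side is not a superkey of the table but whose right side contains a non-key attribute. The sketch below is a simplified check under stated assumptions: the FDs are exactly those listed in the example, the helper restrict projects FDs onto a table by simple filtering (a full FD projection would need closures), and all names are illustrative.

```python
FDS = [
    (frozenset({"employeeID"}), frozenset({"name", "job", "departmentID"})),
    (frozenset({"departmentID"}), frozenset({"departmentName"})),
]


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def restrict(fds, attrs):
    """Keep FDs whose left side lies in attrs, trimming right sides to attrs.

    This is a simplification that is adequate for the example.
    """
    return [(lhs, rhs & attrs) for lhs, rhs in fds if lhs <= attrs and rhs & attrs]


def third_nf_violations(key, attrs, fds):
    """List (determinant, dependent attributes) pairs that break 3NF."""
    attrs = frozenset(attrs)
    key = frozenset(key)
    local = restrict(fds, attrs)
    bad = []
    for lhs, rhs in local:
        targets = rhs - lhs - key  # non-key attributes on the right side
        if targets and not closure(lhs, local) >= attrs:
            bad.append((set(lhs), set(targets)))
    return bad


before = third_nf_violations(
    {"employeeID"},
    {"employeeID", "name", "job", "departmentID", "departmentName"},
    FDS,
)
employee = third_nf_violations(
    {"employeeID"}, {"employeeID", "name", "job", "departmentID"}, FDS
)
department = third_nf_violations(
    {"departmentID"}, {"departmentID", "departmentName"}, FDS
)

print(before)      # the determinant that is not a key, and what it determines
print(employee)    # no violations expected after the split
print(department)  # no violations expected after the split
```

In the original table the check flags departmentID → departmentName, the link that makes employeeID → departmentName transitive; after the split, neither table is flagged.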
Normal Forms Defined
Informally
• 1st normal form
– All attributes depend on the key
• 2nd normal form
– All attributes depend on the whole key
• 3rd normal form
– All attributes depend on nothing but the key
SUMMARY OF NORMAL FORMS based on
Primary Keys