Note7

Schema Refinement and
Normal Forms
2013
CS3754 Class Notes #7, John Shieh
Normalization
• Normalization is a process that we can use to remove
design flaws from a database
• There are a number of normal forms, which are sets
of rules describing what we should and
should not do in our table structure
• 3NF is usually sufficient to avoid most data
redundancy problems in a relational
database design
CS4753/2006F
Problems caused by redundancy
• Redundant Storage
– Some information is stored repeatedly.
• Update Anomalies
– If one copy of such repeated data is updated, an
inconsistency is created, unless all copies are similarly
updated.
• Insertion anomalies
– It may not be possible to store certain information
unless some other, unrelated, information is stored.
• Deletion Anomalies
– It may not be possible to delete certain information
without losing some other, unrelated, information.
Id           name       lot  rating  Hourly_wages  Hours_worked
123-22-3666  Attishoo   48   8       10            40
231-31-5368  Smiley     22   8       10            30
131-24-3650  Smethurst  35   5       7             30
434-26-3751  Guldu      35   5       7             32
612-67-4134  Madayan    35   8       10            40
• Redundant Storage
– The hourly wages depend on rating levels. So, for
example, hourly wage 10 for rating level 8 is repeated
three times.
• Update Anomalies
– The hourly_wages in the first tuple could be updated
without making a similar change in the second tuple.
• Insertion Anomalies
– We cannot insert a tuple for an employee unless we
know the hourly wage for the employee’s rating value.
• Deletion Anomalies
– If we delete all tuples with a given rating value (e.g.
tuples of Smethurst and Guldu) we lose the association
between the rating value and its hourly_wage value.
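The anomalies above can be reproduced on the example rows. The following is a minimal sketch, assuming the table is held as a list of Python tuples; the names hourly_emps and wage_table, and the new wage value 12, are hypothetical and used only for illustration.

```python
# Rows of the example relation:
# (id, name, lot, rating, hourly_wages, hours_worked)
hourly_emps = [
    ("123-22-3666", "Attishoo", 48, 8, 10, 40),
    ("231-31-5368", "Smiley", 22, 8, 10, 30),
    ("131-24-3650", "Smethurst", 35, 5, 7, 30),
    ("434-26-3751", "Guldu", 35, 5, 7, 32),
    ("612-67-4134", "Madayan", 35, 8, 10, 40),
]


def wage_table(rows):
    """Collect the set of hourly wages recorded for each rating."""
    wages = {}
    for _id, _name, _lot, rating, wage, _hours in rows:
        wages.setdefault(rating, set()).add(wage)
    return wages


# Redundant storage: count how often the pair (rating 8, wage 10) is stored.
copies = sum(1 for row in hourly_emps if (row[3], row[4]) == (8, 10))

# Update anomaly: change the wage in the first tuple only (12 is hypothetical).
first = hourly_emps[0]
updated = [first[:4] + (12,) + first[5:]] + hourly_emps[1:]
inconsistent = {r for r, w in wage_table(updated).items() if len(w) > 1}

# Deletion anomaly: delete every employee with rating 5.
remaining = [row for row in hourly_emps if row[3] != 5]
lost_ratings = set(wage_table(hourly_emps)) - set(wage_table(remaining))

print(copies, inconsistent, lost_ratings)
```

After the partial update, rating 8 is associated with two different wages; after the deletion, the wage for rating 5 is no longer recorded anywhere.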
Decompositions
• Intuitively, redundancy arises when a relational
schema forces an association between attributes
that is not natural.
• Functional dependencies can be used to identify
such situations and suggest refinements to the
schema.
• The essential idea is that many problems arising
from redundancy can be addressed by replacing a
relation with a collection of ‘smaller’ relations.
The relation above is decomposed into two smaller relations:

Id           name       lot  rating  Hours_worked
123-22-3666  Attishoo   48   8       40
231-31-5368  Smiley     22   8       30
131-24-3650  Smethurst  35   5       30
434-26-3751  Guldu      35   5       32
612-67-4134  Madayan    35   8       40

rating  Hourly_wages
8       10
5       7
Functional dependency:
- rating determines Hourly_wages
A decomposition of a relation schema R consists of replacing
the relation schema by two (or more) relation schemas, each of
which contains a subset of the attributes of R and which together
include all attributes in R.
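As a rough illustration of this decomposition, the sketch below projects the example rows onto the two smaller schemas and joins them back on rating. The variable names (hourly_emps, emps, wages, rejoined) are assumptions made for this sketch, not names from the notes.

```python
# Original relation: (id, name, lot, rating, hourly_wages, hours_worked)
hourly_emps = [
    ("123-22-3666", "Attishoo", 48, 8, 10, 40),
    ("231-31-5368", "Smiley", 22, 8, 10, 30),
    ("131-24-3650", "Smethurst", 35, 5, 7, 30),
    ("434-26-3751", "Guldu", 35, 5, 7, 32),
    ("612-67-4134", "Madayan", 35, 8, 10, 40),
]

# Projection onto (id, name, lot, rating, hours_worked).
emps = sorted(
    {(eid, name, lot, rating, hours)
     for eid, name, lot, rating, wage, hours in hourly_emps}
)

# Projection onto (rating, hourly_wages); duplicates collapse in the set.
wages = sorted({(rating, wage) for _, _, _, rating, wage, _ in hourly_emps})

# Natural join of the two projections on the shared attribute rating.
rejoined = sorted(
    (eid, name, lot, rating, wage, hours)
    for eid, name, lot, rating, hours in emps
    for rating2, wage in wages
    if rating == rating2
)

print(wages)
print(rejoined == sorted(hourly_emps))
```

Each (rating, wage) pair is now stored once, and for this instance the join reproduces the original rows.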
Functional Dependencies
• A functional dependency (FD) is a kind of integrity constraint (IC) that
generalizes the concept of a key.
• Let R be a relation schema, and let X and Y be nonempty sets
of attributes in R.
– An FD X → Y exists if, in every relation instance for R, any two tuples that
agree on the value of X also agree on the value of Y.
– More formally
• Let R be a relation schema and let X and Y be nonempty sets of
attributes in R. An FD X → Y exists in R if every instance of R
preserves the FD X → Y.
• We say that an instance r of R preserves the FD X → Y if the
following holds for every pair of tuples t1 and t2 in r:
If t1.X = t2.X, then t1.Y = t2.Y
The notation t1.X refers to the subset of fields of tuple t1 for the attributes in X.
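The definition of "preserves" can be checked mechanically on a single instance. Below is a minimal sketch, assuming an instance is a list of dicts keyed by attribute name; the function name preserves is hypothetical, and the rows are taken from the hourly wages table shown earlier (restricted to four attributes).

```python
def preserves(rows, lhs, rhs):
    """Return True if the instance `rows` preserves the FD lhs -> rhs.

    Any two rows that agree on every attribute in `lhs` must also
    agree on every attribute in `rhs`.
    """
    seen = {}  # maps each X value to the Y value first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


rows = [
    {"name": "Attishoo", "lot": 48, "rating": 8, "hourly_wages": 10},
    {"name": "Smiley", "lot": 22, "rating": 8, "hourly_wages": 10},
    {"name": "Smethurst", "lot": 35, "rating": 5, "hourly_wages": 7},
    {"name": "Guldu", "lot": 35, "rating": 5, "hourly_wages": 7},
    {"name": "Madayan", "lot": 35, "rating": 8, "hourly_wages": 10},
]

print(preserves(rows, ["rating"], ["hourly_wages"]))  # rating -> hourly_wages
print(preserves(rows, ["lot"], ["name"]))             # lot -> name
```

The first check succeeds because every rating value is paired with a single wage; the second fails because lot 35 appears with three different names.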
Examples on the Take relation instance:
course_ID → course_name is preserved? yes
{student_ID, course_ID} → course_name is preserved? yes
If no two rows agree on the value of X, then X → Y is trivially
preserved.
The table instance also preserves the following:
student_ID → student_name
student_ID, course_ID → {student_name, course_name}
student_ID, course_ID →
{student_ID, student_name, course_ID, course_name}
student_name → student_name (a trivial dependency)
student_name, course_name → student_name (also trivial)
and many more ….
How do we know if an FD exists in R?
• Can we check all instances of R to see if the FD is preserved?
– Definitely not possible!
– Whether or not a functional dependency exists must be determined by
assumptions given in advance, or by common sense, not by individual
relation instances.
• Given an instance r of R, we can check if r preserves some
functional dependency f, but we cannot tell if f holds over R.
course_ID → student_name? no
Although it is preserved by this table, it does not fit the assumptions.
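A small sketch of why one instance cannot establish an FD: in the hourly wages instance shown earlier all names happen to be distinct, so name → Id is preserved, yet nothing guarantees that names stay unique. The helper preserves and the extra row (Id 000-00-0000) are hypothetical and exist only for this illustration.

```python
def preserves(rows, lhs, rhs):
    """Return True if the instance `rows` preserves the FD lhs -> rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


instance = [
    {"Id": "123-22-3666", "name": "Attishoo"},
    {"Id": "231-31-5368", "name": "Smiley"},
    {"Id": "131-24-3650", "name": "Smethurst"},
    {"Id": "434-26-3751", "name": "Guldu"},
    {"Id": "612-67-4134", "name": "Madayan"},
]

# Preserved by this particular instance, because the names are distinct.
holds_here = preserves(instance, ["name"], ["Id"])

# Hypothetical later instance: a second employee who is also named Smiley.
later = instance + [{"Id": "000-00-0000", "name": "Smiley"}]
holds_later = preserves(later, ["name"], ["Id"])

print(holds_here, holds_later)
```

Whether name → Id holds over the schema is therefore a question about the semantics of the data, not about any one table.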
• The assumptions given in advance, or common sense,
impose some constraints, and are called the semantics
of a database
• Assumptions given in advance impose explicit
constraints; common sense imposes implicit
constraints
Example:
• Application is to keep track of information about
employees in a company.
• Information to be kept track of includes:
eid: employee’s id number
ename: employee name
address: address of the employee
sex: employee’s sex
dname: name of the department that the employee works for
dhname: department head’s name
dhsex: department head’s sex
Let’s construct a relation schema as follows:
Employee(eid, ename, address, sex, dname, dhname, dhsex)
Which of the following dependencies are true?
1. eid → ename
2. ename → eid
3. eid → address
4. eid → sex
5. sex → address
6. dhname → dname
7. dhname → eid
8. dhsex → sex
Assumptions:
a: Employee’s id number is unique
b: Each employee has a unique address
c: Each employee works for only one dept.
d: A person can be the head of at most one department
e: All department heads have different names
Implicit: common sense
• X is a superkey for relation schema R iff X → attri(R),
where attri(R) denotes the set of all the attributes in schema R
• X is a candidate key (or simply, key) for R iff
– X → attri(R), and
– X is minimal, i.e., for any Y ⊂ X, Y ↛ attri(R)
• In other words, a candidate key is a minimal superkey
(student_ID, course_ID) is a candidate key (and the only one)
(student_ID, course_ID, course_name) is a superkey, but not a candidate key
(student_ID, course_ID, student_name) is another non-candidate superkey
(student_ID, course_ID, course_name, student_name) is also a non-candidate
superkey
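One common way to test these definitions is to compute the attribute closure of a set under the given FDs. The sketch below assumes the two FDs student_ID → student_name and course_ID → course_name hold for the Take schema (they appear among the examples above); the function names closure, is_superkey and is_candidate_key are hypothetical.

```python
def closure(attrs, fds):
    """Attributes determined by `attrs` under `fds`.

    `fds` is a list of (lhs, rhs) pairs, each a frozenset of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, all_attrs, fds):
    """A superkey determines every attribute of the schema."""
    return closure(attrs, fds) >= set(all_attrs)


def is_candidate_key(attrs, all_attrs, fds):
    """A candidate key is a superkey that stops being one if any attribute is dropped."""
    attrs = set(attrs)
    if not is_superkey(attrs, all_attrs, fds):
        return False
    return all(not is_superkey(attrs - {a}, all_attrs, fds) for a in attrs)


take = {"student_ID", "student_name", "course_ID", "course_name"}
fds = [
    (frozenset({"student_ID"}), frozenset({"student_name"})),
    (frozenset({"course_ID"}), frozenset({"course_name"})),
]

print(is_candidate_key({"student_ID", "course_ID"}, take, fds))
print(is_superkey({"student_ID", "course_ID", "course_name"}, take, fds))
print(is_candidate_key({"student_ID", "course_ID", "course_name"}, take, fds))
```

Dropping one attribute at a time is enough for the minimality test, because any superset of a superkey is itself a superkey.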
Normal Forms
1st Normal Form          No repeating data groups
2nd Normal Form          No partial key dependency
3rd Normal Form          No transitive dependency
Boyce-Codd Normal Form   Reduce key dependencies (every determinant is a candidate key)
4th Normal Form          No multi-valued dependency
5th Normal Form          No join dependency
1NF → 2NF → 3NF → BCNF → 4NF → 5NF
Normal Form (NF)
• 1NF: each attribute or column value must be
atomic
• 2NF: the schema is in 1NF, and all attributes
that are not part of the primary key are fully
functionally dependent on the primary key
• 3NF: the schema is in 2NF, and all transitive
dependencies have been removed
Ex: employeeDept(employeeID, name, job, deptID,
deptName) has to be converted to
employee(employeeID, name, job, deptID)
Dept(deptID, deptName)
2NF
• It means that each non-key attribute must be
functionally dependent on all parts of the primary key
(i.e., on the combination of all attributes of a composite
key).
• Example: not 2NF
Employee(employeeID, name, job, departmentID, skill)
employeeID, skill → name, job, departmentID
employeeID → name, job, departmentID
(Note: → means "determines")
• Break the table into two tables to reach 2NF
Employee(employeeID, name, job, departmentID)
employeeSkills(employeeID, skill)
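The 2NF example can be checked with a small helper that looks for partial dependencies. This is a simplified sketch: it inspects only the FDs listed explicitly (not FDs implied by them) and assumes a single primary key; the function name partial_dependencies is hypothetical.

```python
def partial_dependencies(key, fds):
    """Return the listed FDs in which a proper subset of the key
    determines at least one non-key attribute."""
    key = frozenset(key)
    return [
        (lhs, rhs - key)
        for lhs, rhs in fds
        if lhs < key and rhs - key  # proper subset of the key, non-key target
    ]


others = frozenset({"name", "job", "departmentID"})

# Employee(employeeID, name, job, departmentID, skill), key (employeeID, skill)
before = partial_dependencies(
    {"employeeID", "skill"},
    [
        (frozenset({"employeeID", "skill"}), others),
        (frozenset({"employeeID"}), others),
    ],
)

# After the split: Employee(employeeID, name, job, departmentID), key employeeID
after_employee = partial_dependencies(
    {"employeeID"}, [(frozenset({"employeeID"}), others)]
)

# employeeSkills(employeeID, skill): the whole schema is the key, no listed FDs
after_skills = partial_dependencies({"employeeID", "skill"}, [])

print(before)
print(after_employee, after_skills)
```

Before the split, employeeID alone determines name, job and departmentID, which is the partial dependency; neither of the two resulting tables reports one.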
3NF
• Example: 2NF but not 3NF
Employee(employeeID, name, job, departmentID, departmentName)
Here
employeeID → departmentID
employeeID → departmentName
Also
departmentID → departmentName, and departmentID is not a
key
Therefore, employeeID → departmentName is a transitive dependency
• Convert the schema to 3NF by breaking it into two tables:
Employee(employeeID, name, job, departmentID)
Department(departmentID, departmentName)
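The 3NF conversion can be sketched the same way. The helper below flags listed FDs whose left side does not contain the key and whose right side includes non-key attributes, then splits the schema on such an FD. It assumes a single candidate key and looks only at the FDs given; the names violations_3nf and split are hypothetical.

```python
def violations_3nf(key, fds):
    """Listed FDs X -> A where X does not contain the key and A is non-key."""
    key = frozenset(key)
    found = []
    for lhs, rhs in fds:
        non_key = rhs - key - lhs
        if not key <= lhs and non_key:
            found.append((lhs, non_key))
    return found


def split(schema, lhs, rhs):
    """Decompose `schema` on the FD lhs -> rhs into two smaller schemas."""
    return frozenset(schema) - rhs, lhs | rhs


employee = {"employeeID", "name", "job", "departmentID", "departmentName"}
fds = [
    (frozenset({"employeeID"}),
     frozenset({"name", "job", "departmentID", "departmentName"})),
    (frozenset({"departmentID"}), frozenset({"departmentName"})),
]

bad = violations_3nf({"employeeID"}, fds)
lhs, rhs = bad[0]
employee_3nf, department = split(employee, lhs, rhs)

print(bad)
print(sorted(employee_3nf), sorted(department))
```

Splitting on departmentID → departmentName yields the same two schemas as the slide: Employee(employeeID, name, job, departmentID) and Department(departmentID, departmentName).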
Normal Forms Defined
Informally
• 1st normal form
– All attributes depend on the key
• 2nd normal form
– All attributes depend on the whole key
• 3rd normal form
– All attributes depend on nothing but the key
SUMMARY OF NORMAL FORMS based on
Primary Keys