normal forms

Database Systems
Lecture # 17
Chapter # 10
Normalization
Bad Database Design
• Duplication of data
• Updating
• Deleting
Redundant
Deleting
Update
Normalization
• Normalization is a design technique that is widely used
as a guide in designing relational databases.
• Normalization is essentially a two step process that puts
data into tabular form by removing repeating groups and
then removes duplicated data from the relational tables.
• Normalization theory is based on the concepts of
normal forms.
• A relational table is said to be in a particular normal
form if it satisfied a certain set of constraints.
• There are currently five normal forms that have been
defined.
• In this course, we will cover the first three normal forms
Normalization
• The goal of normalization is to create a set of
relational tables that are free of redundant data and
that can be consistently and correctly modified or
updated.
• This means that all tables in a relational database
should be in the third normal form (3NF).
Normalization
• A relational table is in 3NF if and only if all non-key
columns are:
– Mutually independent and
– Fully dependent upon the primary key
• Mutual independence means that no non-key column is
dependent upon any combination of the other columns
• The first two normal forms are intermediate steps to
achieve the goal of having all tables in 3NF
• In order to better understand the 2NF and higher forms,
it is necessary to understand the concepts of functional
dependencies
Functional Dependencies
• The concept of functional dependencies is the basis for the first
three normal forms.
• A column, Y, of the relational table R is said to be functionally
dependent upon column X of R if and only if each value of X in
R is associated with precisely one value of Y at any given time.
• X and Y may be composite. Saying that column Y is functionally
dependent upon X is the same as saying the values of column X
identify the values of column Y.
• If column X is a primary key, then all columns in the relational
table R must be functionally dependent upon X.
• A short-hand notation for describing a functional dependency
is:
R.x —> R.y
• which can be read as in the relational table named R, column x
functionally determines (identifies) column y in the relational
table R.
Example FD
• Drinkers(name, addr, drinkLiked, manf, favdrink).
• Reasonable FD’s to assert:
1.name -> addr
2.name -> favdrink
3.drinkLiked -> manf
Functional Dependency: The value of one attribute
(the determinant) determines the value of
another attribute
Example DF
name
Janeway
Janeway
Spock
addr
Voyager
Voyager
Enterprise
Because name -> addr
drinkLiked
Bud
WickedAle
Bud
manf
A.B.
Pete’s
A.B.
favDrink
WickedAle
WickedAle
Bud
Because name -> favDrink
Because drinkLiked -> manf
Basic Idea
• To know what FD’s hold in a projection, we start with
given FD’s and find all FD’s that follow from given
ones.
• Then, restrict to those FD’s that involve only
attributes of the projected schema.
Normalization
• What is normalization? Basically, it's the process of
efficiently organizing data in a database.
• There are two goals of the normalization process:
• Eliminate redundant data (for example, storing the
same data in more than one table) and
• Ensure data dependencies make sense (only storing
related data in a table).
• Both of these are worthy goals as they reduce the
amount of space a database consumes and ensure
that data is logically stored.
First Normalization Form 1FN
• Eliminate duplicative columns from the same table.
BY the values in each column of a table are atomic.
By atomic we mean that there are no sets of values
within a column.
• Create separate tables for each group of related data
and identify each row with a unique column or set of
columns (the primary key).
First Normal Form
• No multivalued attributes
• Every attribute value is atomic
• Fig. 5-25 is not in 1st Normal Form (multivalued
attributes)  it is not a relation
• Fig. 5-26 is in 1st Normal form
• All relations are in 1st Normal Form
Table with multivalued attributes, not in 1st normal form
Note: this is NOT a relation
Table with no multivalued attributes and unique rows, in 1st
normal form
Note: this is relation, but not a well-structured one
Second normal form 2NF
• Where the First Normal Form deals with atomicity of
data, the Second Normal Form (or 2NF) deals with
relationships between composite key columns and nonkey columns:
– Meet all the requirements of the first normal form.
– Any non-key columns must depend on the entire
primary key. In the case of a composite primary key,
this means that a non-key column cannot depend on
only part of the composite key.
– Create relationships between these new tables and
their predecessors through the use of foreign keys.
– A relation R is in 2nf if every non-primary attribute A
in R is fully Functionally dependent on the primary
key.
Second Normal Form
• 1NF PLUS every non-key attribute is fully
functionally dependent on the ENTIRE
primary key
– Every non-key attribute must be defined by the
entire key, not by only part of the key
– No partial functional dependencies
Order_ID  Order_Date, Customer_ID, Customer_Name, Customer_Address
Customer_ID  Customer_Name, Customer_Address
Product_ID  Product_Description, Product_Finish, Unit_Price
Order_ID, Product_ID  Order_Quantity
Therefore, NOT in 2nd Normal Form
Getting it into Second Normal Form
Partial Dependencies are removed, but there
are still transitive dependencies
Third normal form 3NF
• Remove columns that are not dependent upon the
primary key.
• Third Normal Form (3NF) requires that all columns
depend directly on the primary key.
– Example:
• Publisher (Publisher_ID, Name, Address, City, State,
Zip)
• Zip (Zip, City, State)
Third Normal Form
• 2NF PLUS no transitive dependencies (functional
dependencies on non-primary-key attributes)
• Note: this is called transitive, because the primary
key is a determinant for another attribute, which in
turn is a determinant for a third
• Solution: non-key determinant with transitive
dependencies go into a new table; non-key
determinant becomes primary key in the new table
and stays as foreign key in the old table
Getting it into Third Normal Form
Transitive dependencies are removed
Figure 5.22 Steps in
normalization