Notes

Normalization and Entity-Relationship (ER) Modeling
Benefits of Normalization and ER Modeling
• Easily maps into a relational model
• Directly maps to relational database tables
• Can be used as a “blueprint” for a database design
• Simple to understand, short learning curve
• Useful to communicate with customers
Basic Concepts of ER Modeling
• Introduced in 1976 by MIT’s Professor Peter Chen
• Separates entities (objects, people, classes) from relationships (associations)
• Localizes data as attributes
• Entities play roles in relationships
Entities are nouns
– People
– Jobs, Accounts
– Objects (Chairs, Cars, Books)
Relationships can be expressed as verbs
– Owns, Rents, Borrows, Checks out, Buys
– has-a
Attributes are data associated with entities and relationships. Typically adjectives.
– Color, Height, Weight
– Name, Title, ID
– Date, Timespan
Roles are entities’ parts or jobs in relationships. Typically nouns.
– Further classify entities
• Generally unnecessary unless entity may play multiple roles
Example: Husband, Wife
[Diagram: the roles husband and wife joined by the relationship “married to”, cardinality 1:1]
Example: Library checkout
[Diagram: entities Customer and Book joined by the relationship Checks Out, cardinality M:N. Attributes shown: SSN, Name, ID, Date, Due Date]
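The checkout example can be sketched as tables to show how an M:N relationship maps to a relational design. This is only an illustration using Python's built-in sqlite3 module. The notes do not say which attribute belongs to which box, so the placement below (SSN and Name on Customer, ID on Book, Date and Due Date on the relationship) is an assumption, as are the table names, column names, and sample values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
conn.executescript("""
CREATE TABLE customer (ssn TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE book (id INTEGER PRIMARY KEY);
-- The M:N relationship becomes its own table, keyed by both entities.
CREATE TABLE checks_out (
    customer_ssn  TEXT    NOT NULL REFERENCES customer(ssn),
    book_id       INTEGER NOT NULL REFERENCES book(id),
    checkout_date TEXT    NOT NULL,
    due_date      TEXT    NOT NULL,
    PRIMARY KEY (customer_ssn, book_id, checkout_date)
);
""")

# Made-up sample data: one customer checks out two books.
conn.execute("INSERT INTO customer VALUES ('000-00-0001', 'Ann')")
conn.executemany("INSERT INTO book VALUES (?)", [(1,), (2,)])
conn.executemany(
    "INSERT INTO checks_out VALUES (?, ?, ?, ?)",
    [("000-00-0001", 1, "2024-01-05", "2024-01-19"),
     ("000-00-0001", 2, "2024-01-05", "2024-01-19")],
)

books_for_ann = [row[0] for row in conn.execute(
    "SELECT book_id FROM checks_out WHERE customer_ssn = ? ORDER BY book_id",
    ("000-00-0001",),
)]
print(books_for_ann)
```

Attributes of the relationship (the dates) live in the relationship table, not in either entity table.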
Example: Student/Class
[Diagram: entities Student and Class joined by the relationship “takes”, cardinality N:M. Attributes shown: SSN, Student ID, Name, Date Enrolled, Section, Capacity, Instructor, CourseCode]
General form for ER modeling
[Diagram: two Entity/Role boxes, each with its own Attributes, joined by a Relationship that may also carry Attributes; cardinality M:N]
Normalization
• A technique used in designing relational databases
• Removes redundancy
• Minimizes dependencies
• Localizes data
• Simplifies modifications
• 5 normal forms defined
– First 3 are most important
First Normal Form
• Remove “repeating groups” (duplicate columns/variable length array)
• Example
– Order (ordernum, customer, (itemName, itemCost, itemQuantity))
– Parentheses (or an overbar) indicate a repeating group
– Create item table: Item (name, cost, …)
– Create order-item relationship table: OrderItem (ordernum, itemname, quantity)
– Typically uses composite keys
Before normalization
Order (ordernum, customer, item1name, item1cost, item1quantity, item2name, item2cost, item2quantity, item3name, item3cost, item3quantity)
After normalization
Order (ordernum, customer)
Item (name, cost)
Order contains Item, cardinality N:M, with relationship attribute quantity
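The Order / Item / OrderItem decomposition can be tried out with Python's built-in sqlite3 module. This is a minimal sketch, not a definitive schema: the column types and sample rows are assumptions, and the Order table is named orders here because ORDER is a reserved word in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
conn.executescript("""
CREATE TABLE orders (ordernum INTEGER PRIMARY KEY, customer TEXT NOT NULL);
CREATE TABLE item (name TEXT PRIMARY KEY, cost REAL NOT NULL);
-- One row per (order, item) pair replaces the item1/item2/item3 columns.
CREATE TABLE order_item (
    ordernum INTEGER NOT NULL REFERENCES orders(ordernum),
    itemname TEXT    NOT NULL REFERENCES item(name),
    quantity INTEGER NOT NULL,
    PRIMARY KEY (ordernum, itemname)  -- composite key
);
""")

# Made-up sample data: one order with four items, more than the
# three item slots the unnormalized table had room for.
conn.execute("INSERT INTO orders VALUES (1, 'Acme')")
conn.executemany(
    "INSERT INTO item VALUES (?, ?)",
    [("bolt", 0.25), ("nut", 0.10), ("washer", 0.05), ("screw", 0.15)],
)
conn.executemany(
    "INSERT INTO order_item VALUES (?, ?, ?)",
    [(1, "bolt", 10), (1, "nut", 10), (1, "washer", 20), (1, "screw", 5)],
)

line_count = conn.execute(
    "SELECT COUNT(*) FROM order_item WHERE ordernum = 1"
).fetchone()[0]
print(line_count)
```

Because each item is a row and not a numbered column, an order can hold any number of items without changing the schema.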
Second Normal Form
• Remove attributes that depend on only part of the key (partial dependencies)
– All attributes in a table must be dependent on the entire primary key
– Cannot be dependent on only part of the primary key
– Reduces redundancy and dependencies
• Applies only to tables with composite keys
• Example
– Part (part_number, supplier_name, price, supplier_address)
– Supplier address is dependent only on supplier
– Create supplier table: Supplier (supplier_name, supplier_address)
Before Normalization
Part (part_number, supplier_name, price, supplier_address)
After Normalization
Part (part_number, supplier_name, price)
Supplier (supplier_name, supplier_address)
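The Part / Supplier split can be sketched the same way to show what it buys. The example below assumes the composite key of Part is (part_number, supplier_name), which the notes imply but do not state outright. The column types and sample values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
conn.executescript("""
CREATE TABLE supplier (
    supplier_name    TEXT PRIMARY KEY,
    supplier_address TEXT NOT NULL
);
CREATE TABLE part (
    part_number   INTEGER NOT NULL,
    supplier_name TEXT    NOT NULL REFERENCES supplier(supplier_name),
    price         REAL    NOT NULL,
    PRIMARY KEY (part_number, supplier_name)  -- price depends on the whole key
);
""")

conn.execute("INSERT INTO supplier VALUES ('Acme', '1 Main St')")
conn.executemany(
    "INSERT INTO part VALUES (?, ?, ?)",
    [(100, "Acme", 2.5), (200, "Acme", 4.0)],
)

# The address is stored once, so a change touches exactly one row.
cur = conn.execute(
    "UPDATE supplier SET supplier_address = '9 Oak Ave' WHERE supplier_name = 'Acme'"
)
rows_changed = cur.rowcount

# Every part supplied by Acme sees the new address through the join.
addresses = {row[0] for row in conn.execute(
    "SELECT s.supplier_address FROM part p JOIN supplier s USING (supplier_name)"
)}
print(rows_changed, addresses)
```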
Third Normal Form
• Non-key attributes must be independent of each other: no transitive dependencies
– i.e. Non-key attributes must depend only on the primary key
– Reduces redundancy and dependencies
– Can be viewed as an extension to second normal form
(i.e. 2NF = no partial dependencies; 3NF = no non-key dependencies)
• Example
– CD (title, artist, publisher, publisher_address)
– Publisher address depends on publisher
– Move to separate table
• Example
– Order (ordernum, customer, unit_price, quantity, total)
– Total can be calculated from unit_price and quantity. It is therefore
not independent.
– Total can simply be removed entirely
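One way to drop the stored total without losing it is to compute it on demand. The sketch below uses a Python dataclass for this. The field name unit_price_cents and the choice of integer cents are assumptions made to keep the arithmetic exact; they are not part of the original example.

```python
from dataclasses import dataclass


@dataclass
class Order:
    ordernum: int
    customer: str
    unit_price_cents: int  # assumed unit: integer cents
    quantity: int

    @property
    def total_cents(self) -> int:
        # Derived from the stored attributes each time it is read,
        # so it can never disagree with unit price and quantity.
        return self.unit_price_cents * self.quantity


order = Order(ordernum=1, customer="Acme", unit_price_cents=250, quantity=4)
first_total = order.total_cents

order.quantity = 5            # no separate total column to keep in sync
second_total = order.total_cents
print(first_total, second_total)
```

In a database, the same idea is usually expressed as a computed expression in a query or a view, not as a stored column.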
Anomalies
A goal of normalization is to remove (or at least reduce) anomalies.
Types of anomalies:
Update
Inconsistent Data
Additions
Deletions
Example from Pratt paper using this relation:
Order(Order#, Date, Part#, Description, Quantity)
o Relation is in 1NF but not 2NF, since Description depends only on Part# and not on
Order# (a partial dependency on the key).
o There is redundancy, since the description is repeated in each record
with the same Part#. The Update anomaly is that a single part
description change requires changing multiple Description entries.
o Because of the duplication there could easily be Inconsistent Data,
where the descriptions for the same part are different.
o Can’t add a new part until it has been ordered, since the Order#
attribute is part of the PK and can’t be null. This is an Addition
anomaly.
o If a record is deleted that is the only entry for a particular Part#, the
Description of that part would be lost. This is a Deletion Anomaly.
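The update, inconsistency, and deletion anomalies above can be reproduced on a few rows of the Order relation. This is a small illustration with made-up sample values; the helper name descriptions_by_part is hypothetical.

```python
# Rows of Order(Order#, Date, Part#, Description, Quantity).
orders = [
    (1, "2024-01-05", "P1", "Bolt", 10),
    (2, "2024-01-06", "P1", "Bolt", 5),
    (2, "2024-01-06", "P2", "Nut", 7),
]


def descriptions_by_part(rows):
    """Map each Part# to the set of descriptions recorded for it."""
    result = {}
    for _order, _date, part, description, _quantity in rows:
        result.setdefault(part, set()).add(description)
    return result


# Update anomaly leading to inconsistent data: only one of the two
# P1 rows gets the new description.
updated = [(1, "2024-01-05", "P1", "Hex bolt", 10)] + orders[1:]
inconsistent = {
    part for part, descs in descriptions_by_part(updated).items() if len(descs) > 1
}

# Deletion anomaly: removing the only row that mentions P2 also
# removes the only record of its description.
after_delete = [row for row in orders if not (row[0] == 2 and row[2] == "P2")]
lost_parts = set(descriptions_by_part(orders)) - set(descriptions_by_part(after_delete))

print(inconsistent, lost_parts)
```

Moving Description into its own table keyed by Part# removes all three problems, and it also lets a part be added before it has been ordered.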
Review of Normal Forms
• Unnormalized
– Has repeating group
• First Normal Form (1NF)
– Has no repeating group/duplicate columns
• Second Normal Form (2NF)
– Has all non-key attributes dependent on “the whole key”
• Third Normal Form (3NF)
– Has all non-key attributes dependent on “nothing but the key”
Unnormalized → (remove repeating groups) → First Normal Form → (remove partial dependencies) → Second Normal Form → (remove transitive dependencies) → Third Normal Form
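The “whole key” test can be explored on sample data with a small functional-dependency check. The function name fd_holds and the rows below are hypothetical, and the check has a limit: a sample can show that a dependency fails, but it cannot prove that one holds in general.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if, in these rows, equal lhs values always give equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        value = tuple(row[col] for col in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Made-up rows of the unnormalized Part table from the 2NF example.
rows = [
    {"part_number": 100, "supplier_name": "Acme", "price": 2.5, "supplier_address": "1 Main St"},
    {"part_number": 200, "supplier_name": "Acme", "price": 4.0, "supplier_address": "1 Main St"},
    {"part_number": 100, "supplier_name": "Bolt Co", "price": 2.4, "supplier_address": "7 Elm Rd"},
]

# supplier_address depends on supplier_name alone: a partial dependency.
partial = fd_holds(rows, ["supplier_name"], ["supplier_address"])

# price is not determined by either half of the key on its own...
price_from_supplier = fd_holds(rows, ["supplier_name"], ["price"])
price_from_part = fd_holds(rows, ["part_number"], ["price"])

# ...but it is consistent with the whole key.
price_from_key = fd_holds(rows, ["part_number", "supplier_name"], ["price"])

print(partial, price_from_supplier, price_from_part, price_from_key)
```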