entity relationships - UTSC

CSCC40
Analysis and Design of Information Systems
University of Toronto at Scarborough
ERDs and normalization pg 1/7
Please note…
These rules apply to structured analysis and design of relational databases.
They are not exactly the same as the class associations in the class diagrams we will cover in later lectures.
Relational databases are a mature and well-supported technology. They were designed to:
 eliminate data redundancy
 provide the simplest possible representation of data
 allow complex queries
They are used for systems that were designed as structured or object-oriented.
The deliverables for structured logical modeling are:
 every data element is accounted for whether it’s raw or derived
 relationships within the data are normalized
 data requirements are structured into relationships
 the physical relational database can be designed
A relational database model is a representation of data in tables or relations. Relations are named two-dimensional tables of data with named columns (attributes) and an arbitrary number of rows (records).
An entity is anything about which you need to carry information. Relationships between entities follow
certain rules.
These are valid relationships between entities A and B:
1. For each occurrence of entity A, there is always exactly one occurrence of entity B, and vice versa.
2. For each occurrence of entity A, there is sometimes one (but never more than one) occurrence of entity B, and vice versa.
3. For each occurrence of entity A, there is always exactly one occurrence of entity B, but for each occurrence of entity B there is sometimes one (but never more than one) occurrence of entity A.
4. For each occurrence of entity A, there is always at least one occurrence of entity B, but for each occurrence of entity B there is sometimes one occurrence of entity A (but never more than one).
5. For each occurrence of entity A, there can be zero, one or many occurrences of entity B, and for each occurrence of entity B, there is sometimes one (but never more) occurrence of entity A.
6. For each occurrence of entity A there is at least one occurrence of entity B, and for each occurrence of entity B there is always exactly one occurrence of entity A.
7. For each occurrence of entity A there can be zero, one or many occurrences of entity B, and for each occurrence of entity B there is always exactly one occurrence of entity A.
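One way to make these rules concrete is to record, for each direction of a relationship, a minimum and a maximum number of occurrences. The sketch below is only an illustration: the names `Cardinality`, `VALID_RELATIONSHIPS` and `is_many_to_many` are made up for this example and are not part of any notation used in the course.

```python
from typing import NamedTuple, Optional


class Cardinality(NamedTuple):
    """How many occurrences of one entity may exist per occurrence of the other."""
    minimum: int             # 0 = "sometimes", 1 = "always at least one"
    maximum: Optional[int]   # 1 = "never more than one", None = "many"


EXACTLY_ONE = Cardinality(1, 1)
ZERO_OR_ONE = Cardinality(0, 1)
ONE_OR_MANY = Cardinality(1, None)
ZERO_OR_MANY = Cardinality(0, None)

# Each pair is (occurrences of B per A, occurrences of A per B),
# listed in the same order as the seven descriptions above.
VALID_RELATIONSHIPS = [
    (EXACTLY_ONE, EXACTLY_ONE),
    (ZERO_OR_ONE, ZERO_OR_ONE),
    (EXACTLY_ONE, ZERO_OR_ONE),
    (ONE_OR_MANY, ZERO_OR_ONE),
    (ZERO_OR_MANY, ZERO_OR_ONE),
    (ONE_OR_MANY, EXACTLY_ONE),
    (ZERO_OR_MANY, EXACTLY_ONE),
]


def is_many_to_many(b_per_a: Cardinality, a_per_b: Cardinality) -> bool:
    """True when both directions allow more than one occurrence."""
    return b_per_a.maximum is None and a_per_b.maximum is None
```

None of the seven valid pairs is many-to-many: at least one side always has a maximum of one.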
A many-to-many relationship must be corrected by creating an associative entity. The associative entity’s
primary key is a composite of the primary keys for the entities it associates. For example, students may take
several courses and a course will usually have several students enrolled in it. The new associative entity
will have a primary key of student id + course id. Note that the new entity is the perfect place to store a
student’s grade for that course.
[Figure: three many-to-many cases. The problem: entities A and B related directly. The solution: A and B each related to a new associative entity AB.]
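The student and course example can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not a prescribed schema: the table name `enrolment`, the column names and the sample rows are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE course  (course_id  TEXT PRIMARY KEY, title TEXT NOT NULL);
-- Associative entity: one row per (student, course) pair.
-- Its primary key is the composite student id + course id.
CREATE TABLE enrolment (
    student_id INTEGER NOT NULL REFERENCES student(student_id),
    course_id  TEXT    NOT NULL REFERENCES course(course_id),
    grade      TEXT,   -- belongs to the pair, not to the student or the course
    PRIMARY KEY (student_id, course_id)
);
""")
# Hypothetical sample data.
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [(1, "Student One"), (2, "Student Two")])
conn.executemany("INSERT INTO course VALUES (?, ?)",
                 [("C1", "Course One"), ("C2", "Course Two")])
conn.executemany("INSERT INTO enrolment VALUES (?, ?, ?)",
                 [(1, "C1", "A"), (1, "C2", "B+"), (2, "C1", "B")])


def courses_for(student_id):
    """Courses (with grades) taken by one student."""
    return conn.execute(
        "SELECT course_id, grade FROM enrolment "
        "WHERE student_id = ? ORDER BY course_id", (student_id,)).fetchall()


def students_in(course_id):
    """Student ids enrolled in one course."""
    rows = conn.execute(
        "SELECT student_id FROM enrolment "
        "WHERE course_id = ? ORDER BY student_id", (course_id,)).fetchall()
    return [student_id for (student_id,) in rows]
```

Because the primary key is the composite, the database rejects a second row for the same student and course, while each student can still have many courses and each course many students.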
Repeating information within an entity has to be removed. For example, an order may be for many
items. We solve this by creating an attributive entity where each occurrence of the entity carries one
instance of the repeating data. The primary key for this attributive entity is also a composite key consisting
of the primary key of the original entity plus one other attribute that makes the composite key of the
attributive entity unique. In the order example, we create a detail entity whose primary key is equal to the order
number (the primary key for the order) plus the product code. Note that now we carry the customer id in the
order entity and the quantity ordered in the attributive entity for each item ordered.
[Figure: entity A and its attributive entity AC.]
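A small Python sketch of the same idea follows. The order number, customer id and product codes are hypothetical values chosen for illustration, and `split_order` is a made-up helper name.

```python
# A hypothetical order that still contains a repeating group of items.
raw_order = {
    "order_number": 1001,
    "customer_id": "C042",
    "items": [("P-NAIL", 300), ("P-BOLT", 2300)],  # (product code, quantity)
}


def split_order(raw):
    """Split one order into an order record and its attributive detail records."""
    # The order entity keeps only what is true of the order as a whole.
    order = {"order_number": raw["order_number"],
             "customer_id": raw["customer_id"]}
    # The detail entity is keyed by order number + product code.
    details = {
        (raw["order_number"], product_code): {"quantity": quantity}
        for product_code, quantity in raw["items"]
    }
    return order, details


order, details = split_order(raw_order)
```

Each detail record carries exactly one instance of the repeating data, and the composite key tells you which order it belongs to.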
Normalization means cleaning up the entities according to some clear rules.
If you consider the data within an entity, it can be presented as a table with the following characteristics:
1. Each entry (row-column intersection) has only one value
2. All entries in a column are instances of the same attribute
3. Each row is unique
4. Sequence of columns is not important
We might get a table such as the one below after looking at some orders for a company that ships nuts,
bolts, etc. This table satisfies the above criteria.
order   customer      ship-to  carrier  ship     item     quantity  packaging
number                address           date     ordered  ordered
13405   Epsilon Ltd.  Sarnia   Fedex    June 15  nails    300       box
13405   Epsilon Ltd.  Sarnia   Fedex    June 15  bolts    2,300     bin
13405   Epsilon Ltd.  Sarnia   Fedex    June 15  screws   450       box
13405   Epsilon Ltd.  Sarnia   Fedex    June 15  nuts     2,300     bin
13406   Tau Corp.     Windsor  Fedex    June 16  bolts    4,000     bin
13406   Tau Corp.     Windsor  Fedex    June 16  nuts     4,000     bin
13407   Delta Inc.    Detroit  UPS      June 15  nails    369       box
13407   Delta Inc.    Detroit  UPS      June 15  screws   566       bin
13408   Epsilon Ltd.  Sarnia   Fedex    June 16  bolts    490       box
13409   Alpha Corp.   Chicago  UPS      June 16  bolts    650       bin
To make this table into first normal form (1NF) we must remove repeating data.
We do this by considering an order and seeing what is being repeated.
order   customer      ship-to  carrier  ship     item     quantity  packaging
number                address           date     ordered  ordered
13405   Epsilon Ltd.  Sarnia   Fedex    June 15  nails    300       box
                                                 bolts    2,300     bin
                                                 screws   450       box
                                                 nuts     2,300     bin
13406   Tau Corp.     Windsor  Fedex    June 16  bolts    4,000     bin
                                                 nuts     4,000     bin
13407   Delta Inc.    Detroit  UPS      June 15  nails    369       box
                                                 screws   566       bin
13408   Epsilon Ltd.  Sarnia   Fedex    June 16  bolts    490       box
13409   Alpha Corp.   Chicago  UPS      June 16  bolts    650       bin
We remove the repeating data and put it into an attributive entity. So now we get the following two tables.
The original table,

order   customer      ship-to  carrier  ship
number                address           date
13405   Epsilon Ltd.  Sarnia   Fedex    June 15
13406   Tau Corp.     Windsor  Fedex    June 16
13407   Delta Inc.    Detroit  UPS      June 15
13408   Epsilon Ltd.  Sarnia   Fedex    June 16
13409   Alpha Corp.   Chicago  UPS      June 16

And a new attributive table for the repeating data. Note the composite key (order number + item ordered).

order   item     quantity  packaging
number  ordered  ordered
13405   nails    300       box
13405   bolts    2,300     bin
13405   screws   450       box
13405   nuts     2,300     bin
13406   bolts    4,000     bin
13406   nuts     4,000     bin
13407   nails    369       box
13407   screws   566       bin
13408   bolts    490       box
13409   bolts    650       bin

To achieve second normal form (2NF) we remove non-key attributes that do not depend on the entire key.
By talking to the client you found out that they always used bins if the quantities were large and
boxes if they were not. You were able to define that business rule using this new table…

item    box max.
nails   600
bolts   800
screws  500
nuts    800

and remove the packaging attribute from the order/item table. Now if your client decided that you could
carry more than 600 nails per box, orders would not need to be changed.

order   item     quantity
number  ordered  ordered
13405   nails    300
13405   bolts    2,300
13405   screws   450
13405   nuts     2,300
13406   bolts    4,000
13406   nuts     4,000
13407   nails    369
13407   screws   566
13408   bolts    490
13409   bolts    650
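The first two normalization steps can be sketched in plain Python using the sample data. Two things here are assumptions rather than part of the notes: the helper names (`split_repeating_group`, `packaging`), and the rule that a quantity exactly equal to the box maximum still fits in a box. Note also that the last sample row (650 bolts packed in a bin) does not fit the 800 box maximum listed for bolts, so the sketch illustrates the rule rather than reproducing the packaging column row for row.

```python
# Flat rows from the sample table:
# (order number, customer, ship-to address, carrier, ship date, item, quantity)
FLAT_ROWS = [
    (13405, "Epsilon Ltd.", "Sarnia", "Fedex", "June 15", "nails", 300),
    (13405, "Epsilon Ltd.", "Sarnia", "Fedex", "June 15", "bolts", 2300),
    (13405, "Epsilon Ltd.", "Sarnia", "Fedex", "June 15", "screws", 450),
    (13405, "Epsilon Ltd.", "Sarnia", "Fedex", "June 15", "nuts", 2300),
    (13406, "Tau Corp.", "Windsor", "Fedex", "June 16", "bolts", 4000),
    (13406, "Tau Corp.", "Windsor", "Fedex", "June 16", "nuts", 4000),
    (13407, "Delta Inc.", "Detroit", "UPS", "June 15", "nails", 369),
    (13407, "Delta Inc.", "Detroit", "UPS", "June 15", "screws", 566),
    (13408, "Epsilon Ltd.", "Sarnia", "Fedex", "June 16", "bolts", 490),
    (13409, "Alpha Corp.", "Chicago", "UPS", "June 16", "bolts", 650),
]


def split_repeating_group(flat_rows):
    """1NF step: one order table plus an attributive order/item table."""
    orders = {}    # key: order number
    details = {}   # key: (order number, item ordered)
    for number, customer, ship_to, carrier, ship_date, item, quantity in flat_rows:
        orders[number] = (customer, ship_to, carrier, ship_date)
        details[(number, item)] = quantity
    return orders, details


# 2NF step: the packaging rule depends on the item, so it gets its own table.
BOX_MAX = {"nails": 600, "bolts": 800, "screws": 500, "nuts": 800}


def packaging(item, quantity, box_max=BOX_MAX):
    """Derive packaging from the business rule instead of storing it per order line.

    Assumption: a quantity equal to the box maximum still goes in a box.
    """
    return "box" if quantity <= box_max[item] else "bin"


orders, details = split_repeating_group(FLAT_ROWS)
```

Changing the rule for nails means editing one entry in `BOX_MAX`; no order or order/item row is touched.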
To achieve third normal form (3NF) we remove non-key attributes that depend on other non-key
attributes.
After checking with the client, you found out that they usually
shipped by Fedex if the customer was in a Canadian city and by UPS
if the customer was in a city in the USA. This means the carrier
(shipper) was dependent on location, not on the order. So you create
a new table defining this business rule.

ship-to  carrier
address
Sarnia   Fedex
Windsor  Fedex
Detroit  UPS
Chicago  UPS

You also found out that customers only have one location
for receiving products. This means that the customer’s ship-to address is dependent on the customer and not the order.
So you need another table to hold this information.

customer      ship-to
              address
Epsilon Ltd.  Sarnia
Tau Corp.     Windsor
Delta Inc.    Detroit
Alpha Corp.   Chicago

And finally, your order table looks like this.

order   customer      ship
number                date
13405   Epsilon Ltd.  June 15
13406   Tau Corp.     June 16
13407   Delta Inc.    June 15
13408   Epsilon Ltd.  June 16
13409   Alpha Corp.   June 16
Now, for the five tables we ended up with, all non-key attributes depend on the whole primary key and
nothing but the whole primary key.
And we have avoided some common problems in database design:
 insertion: you don’t have to supply information about customers and carriers at the same time
 deletion: if you delete certain information you are not losing unrelated information
 modification: if you change a carrier, you don’t have to change all outstanding orders using the former carrier
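The modification case can be illustrated with a few lines of Python. The data comes from the sample orders; the function names are made up for this sketch, and the switch of Sarnia to UPS is a hypothetical change.

```python
# Unnormalized view: the carrier is repeated on every order line.
# (order number, ship-to address, carrier)
FLAT_ROWS = (
    [(13405, "Sarnia", "Fedex")] * 4
    + [(13406, "Windsor", "Fedex")] * 2
    + [(13407, "Detroit", "UPS")] * 2
    + [(13408, "Sarnia", "Fedex"), (13409, "Chicago", "UPS")]
)

# Normalized view: the carrier is stored once per ship-to address.
CARRIER_BY_ADDRESS = {
    "Sarnia": "Fedex",
    "Windsor": "Fedex",
    "Detroit": "UPS",
    "Chicago": "UPS",
}


def rows_naming_address(rows, address):
    """How many flat rows would need editing if this address changed carrier."""
    return sum(1 for _, ship_to, _ in rows if ship_to == address)


def change_carrier(carrier_by_address, address, new_carrier):
    """Return a copy of the carrier table with exactly one entry changed."""
    updated = dict(carrier_by_address)
    updated[address] = new_carrier
    return updated


updated = change_carrier(CARRIER_BY_ADDRESS, "Sarnia", "UPS")
```

In the flat table a change of carrier for Sarnia touches five rows; in the normalized design it touches one.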
Now we can draw the entity relationship diagram for our example. (The box maximum is dependent on the
key for the item entity.)
[Figure: entity relationship diagram showing the entities customer, order, carrier, order detail and item.]

entity        primary key            foreign key            non-key attributes
customer      customer name          ship-to address
order         order number           customer name          ship date
order detail  order number +         order number +         quantity ordered
              item identification    item identification
item          item identification                           box maximum
carrier       ship-to address                               carrier name
A foreign key is an attribute that is a primary key for another table. That’s how you get all the information
for each order in the original table we started with.
Note that the order/detail table is an attributive entity that holds the repeating data for orders, but it is also
an associative entity because it solves the many-to-many problem “an item can appear on many orders and
an order can be for many items”.
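Following the foreign keys back to the original table can be sketched with `sqlite3`. The table and column names below (`orders`, `order_detail`, `item_id` and so on) are assumptions chosen for the example; the rows are the sample data from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE carrier  (ship_to_address TEXT PRIMARY KEY, carrier_name TEXT NOT NULL);
CREATE TABLE customer (customer_name TEXT PRIMARY KEY,
                       ship_to_address TEXT NOT NULL REFERENCES carrier(ship_to_address));
CREATE TABLE item     (item_id TEXT PRIMARY KEY, box_max INTEGER NOT NULL);
CREATE TABLE orders   (order_number INTEGER PRIMARY KEY,
                       customer_name TEXT NOT NULL REFERENCES customer(customer_name),
                       ship_date TEXT NOT NULL);
CREATE TABLE order_detail (
    order_number INTEGER NOT NULL REFERENCES orders(order_number),
    item_id      TEXT    NOT NULL REFERENCES item(item_id),
    quantity     INTEGER NOT NULL,
    PRIMARY KEY (order_number, item_id));
""")
conn.executemany("INSERT INTO carrier VALUES (?, ?)", [
    ("Sarnia", "Fedex"), ("Windsor", "Fedex"), ("Detroit", "UPS"), ("Chicago", "UPS")])
conn.executemany("INSERT INTO customer VALUES (?, ?)", [
    ("Epsilon Ltd.", "Sarnia"), ("Tau Corp.", "Windsor"),
    ("Delta Inc.", "Detroit"), ("Alpha Corp.", "Chicago")])
conn.executemany("INSERT INTO item VALUES (?, ?)", [
    ("nails", 600), ("bolts", 800), ("screws", 500), ("nuts", 800)])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (13405, "Epsilon Ltd.", "June 15"), (13406, "Tau Corp.", "June 16"),
    (13407, "Delta Inc.", "June 15"), (13408, "Epsilon Ltd.", "June 16"),
    (13409, "Alpha Corp.", "June 16")])
conn.executemany("INSERT INTO order_detail VALUES (?, ?, ?)", [
    (13405, "nails", 300), (13405, "bolts", 2300), (13405, "screws", 450),
    (13405, "nuts", 2300), (13406, "bolts", 4000), (13406, "nuts", 4000),
    (13407, "nails", 369), (13407, "screws", 566), (13408, "bolts", 490),
    (13409, "bolts", 650)])

# Each JOIN follows one foreign key back to the table where it is the primary key.
JOIN_SQL = """
SELECT o.order_number, o.customer_name, c.ship_to_address, k.carrier_name,
       o.ship_date, d.item_id, d.quantity
FROM order_detail AS d
JOIN orders   AS o ON o.order_number    = d.order_number
JOIN customer AS c ON c.customer_name   = o.customer_name
JOIN carrier  AS k ON k.ship_to_address = c.ship_to_address
ORDER BY o.order_number, d.item_id
"""


def original_rows():
    """Rebuild the rows of the starting table (without the packaging column)."""
    return conn.execute(JOIN_SQL).fetchall()
```

The query returns ten rows, one per order line, with the customer, ship-to address and carrier filled in from the tables that own them.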
So how do you complete the modeling for an entire system? Let’s combine the above model with the
following information you need for billing.
item    price
nails   $0.15
bolts   $0.20
screws  $0.17
nuts    $0.10

customer      bill-to
              address
Epsilon Ltd.  Toronto
Tau Corp.     Hamilton
Delta Inc.    Chicago
Alpha Corp.   Chicago
This is rather a trivial example because we already have entities with the same primary keys. We simply
add the new attributes to existing tables.
item    box max.  price
nails   600       $0.15
bolts   800       $0.20
screws  500       $0.17
nuts    800       $0.10

customer      ship-to  bill-to
              address  address
Epsilon Ltd.  Sarnia   Toronto
Tau Corp.     Windsor  Hamilton
Delta Inc.    Detroit  Chicago
Alpha Corp.   Chicago  Chicago
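Adding attributes to tables that share a primary key amounts to a merge on that key. A minimal Python sketch, in which `merge_on_key` and the attribute names are assumptions made for the example:

```python
from decimal import Decimal

# Existing tables, keyed by their primary key.
ITEM = {"nails": {"box_max": 600}, "bolts": {"box_max": 800},
        "screws": {"box_max": 500}, "nuts": {"box_max": 800}}
CUSTOMER = {"Epsilon Ltd.": {"ship_to": "Sarnia"}, "Tau Corp.": {"ship_to": "Windsor"},
            "Delta Inc.": {"ship_to": "Detroit"}, "Alpha Corp.": {"ship_to": "Chicago"}}

# New billing information, keyed the same way.
ITEM_PRICE = {"nails": {"price": Decimal("0.15")}, "bolts": {"price": Decimal("0.20")},
              "screws": {"price": Decimal("0.17")}, "nuts": {"price": Decimal("0.10")}}
CUSTOMER_BILLING = {"Epsilon Ltd.": {"bill_to": "Toronto"}, "Tau Corp.": {"bill_to": "Hamilton"},
                    "Delta Inc.": {"bill_to": "Chicago"}, "Alpha Corp.": {"bill_to": "Chicago"}}


def merge_on_key(existing, new):
    """Add the new attributes to each row of an existing table with the same keys."""
    if existing.keys() != new.keys():
        # Different key sets mean the tables do not describe the same entity occurrences.
        raise ValueError("tables do not share the same primary key values")
    return {key: {**existing[key], **new[key]} for key in existing}


items = merge_on_key(ITEM, ITEM_PRICE)
customers = merge_on_key(CUSTOMER, CUSTOMER_BILLING)
```

The merge only works this simply because both sources use the same primary key; otherwise the new data would need its own analysis.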
But if we were presented with entirely new tables, we would have to make sure that the resulting
information is in 3NF and that we could navigate using primary and foreign keys.
There are still a couple of interesting variations to cover.
Unary (self-referential) relationships. For example, in an organization where one employee reports to
another, you might find the following employee table. A foreign key attribute holds the employee id of a
person’s boss. Note that you could figure out the organization chart from this information. (Presumably the
president’s foreign key is blank.)

entity    primary key  foreign key                    non-key attribute
employee  employee id  employee id (of the superior)  address, birthday, etc.
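A self-referential foreign key can be sketched with `sqlite3` as follows. The column name `boss_id`, the three sample employees and the helper `chain_of_command` are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    -- Foreign key into the same table; NULL (blank) for the person at the top.
    boss_id     INTEGER REFERENCES employee(employee_id)
);
INSERT INTO employee VALUES
    (1, 'President', NULL),
    (2, 'Manager',   1),
    (3, 'Analyst',   2);
""")


def chain_of_command(employee_id):
    """Follow the boss foreign key upward until it is blank."""
    chain = []
    current = employee_id
    while current is not None:
        name, boss_id = conn.execute(
            "SELECT name, boss_id FROM employee WHERE employee_id = ?",
            (current,)).fetchone()
        chain.append(name)
        current = boss_id
    return chain
```

Walking the foreign key from any employee up to the blank entry recovers that branch of the organization chart.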
Is-a relationship (subclasses). For example, we might be carrying different information about full-time
and part-time employees. We solve this by creating more tables. It is not a problem that two tables have
the same key. This is simply an instance of a primary key also functioning as a foreign key.

entity              primary key  non-key attribute
employee            employee id  address, birthday, etc.
part-time employee  employee id  hourly rate
full-time employee  employee id  salary
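The subclass tables can be sketched with `sqlite3` like this. The table names, column types and sample rows are assumptions; the point is only that the subclass primary key is also a foreign key to `employee`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    address     TEXT,
    birthday    TEXT
);
-- In each subclass table the primary key is also a foreign key to employee.
CREATE TABLE part_time_employee (
    employee_id INTEGER PRIMARY KEY REFERENCES employee(employee_id),
    hourly_rate REAL NOT NULL
);
CREATE TABLE full_time_employee (
    employee_id INTEGER PRIMARY KEY REFERENCES employee(employee_id),
    salary      REAL NOT NULL
);
INSERT INTO employee VALUES (1, 'Address One', '1990-01-01'),
                            (2, 'Address Two', '1985-06-15');
INSERT INTO part_time_employee VALUES (1, 20.0);
INSERT INTO full_time_employee VALUES (2, 60000.0);
""")


def employee_kinds():
    """Classify each employee by which subclass table holds a matching row."""
    sql = """
    SELECT e.employee_id,
           CASE WHEN p.employee_id IS NOT NULL THEN 'part-time'
                WHEN f.employee_id IS NOT NULL THEN 'full-time'
                ELSE 'unclassified' END
    FROM employee AS e
    LEFT JOIN part_time_employee AS p ON p.employee_id = e.employee_id
    LEFT JOIN full_time_employee AS f ON f.employee_id = e.employee_id
    ORDER BY e.employee_id
    """
    return conn.execute(sql).fetchall()
```

With foreign keys enforced, a subclass row cannot exist without its matching employee row.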