The Entity-Relationship Model

Database Systems I
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester
Overview of Database Development
Requirements Analysis
• What data are to be stored in the enterprise?
• What are the required applications?
• What are the most important operations?
High-level database design
• What are the entities and relationships in the enterprise?
• What information about these entities and relationships should we store in the database?
• What are the integrity constraints or business rules that hold?
ER model or UML to represent high-level design
Overview of Database Development
Conceptual database design
• What data model should be used to implement the DBS? E.g., the relational data model.
• Map the high-level design (e.g., the ER diagram) to a (conceptual) database schema of the chosen data model.
Physical database design
• What DBMS should be used?
• What are the typical workloads of the DBS?
• Build indexes to support efficient query processing.
• What redesign of the conceptual database schema is necessary from the point of view of efficient implementation?
Overview of Database Development
Requirements Analysis / Ideas
High-Level Database Design
Conceptual Database Design / Relational Database Schema
Physical Database Design / Relational DBMS
This process is similar to software development.
Entity-Relationship Model
Abbreviated: ER model.
It has many similarities with other modeling languages such as UML.
Concepts
• Entities / entity sets,
• Attributes,
• Relationships / relationship sets, and
• Constraints.
Offers more modeling concepts than the
relational data model (which only offers
relations).
Closer to the way in which people think.
Entity-Relationship Diagrams
An Entity-Relationship diagram (ER diagram) is a
graph with nodes representing entity sets,
attributes and relationship sets.
Entity sets denoted by rectangles.
Attributes denoted by ovals.
Relationship sets denoted by diamonds.
Edges (lines) connect entity sets to their
attributes and relationship sets to their entity
sets.
Entities and Entity Sets
Entity: Real-world object distinguishable from
other objects, e.g. employee Miller.
An entity can be a physical or an abstract object.
An entity is associated with attributes describing its properties.
Attribute values are atomic, e.g., strings, integers or real numbers.
Some variations of the ER model support
structured attributes.
Entity set: A collection of similar entities.
E.g., all employees.
Entities and Entity Sets
[Figure: entity set Employees with attributes ssn (key), name and age.]
All entities in an entity set have the same set of
attributes. (At least, for the moment!)
Each entity set has a key, i.e. a minimal set of
attributes to uniquely identify an entity of this
set. Key attributes are underlined.
Each attribute has a domain, i.e. a set of all
possible attribute values.
Entities and Entity Sets
[Figure: entity set Employees with attributes firstname, lastname, birthdate and salary.]
A key must be unique across all possible (not just
the current) entities of its set.
A key can consist of more than one attribute.
There can be more than one key for a given
entity set, but we choose one (primary key) for
the ER diagram.
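The key definition can be tried out on a concrete instance. The following Python sketch, using invented employee data (hypothetical, for illustration only), checks whether an attribute set is unique on the instance and minimal. Because a key must hold for all possible entities, such a check can only refute a key, never prove one: in the sample data, salary happens to be unique without being a sensible key.

```python
from itertools import combinations

def is_unique(entities, attrs):
    """True if no two entities agree on all attributes in attrs."""
    seen = set()
    for e in entities:
        value = tuple(e[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def is_key_candidate(entities, attrs):
    """True if attrs is unique on this instance and no proper subset is.

    A True result only means this instance does not refute the key;
    a key must hold for all possible entities, which no instance proves.
    """
    if not is_unique(entities, attrs):
        return False
    return not any(
        is_unique(entities, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )

# Hypothetical Employees instance (values invented for illustration).
employees = [
    {"firstname": "John", "lastname": "Miller", "birthdate": "1978-05-01", "salary": 50000},
    {"firstname": "John", "lastname": "Miller", "birthdate": "1983-02-11", "salary": 52000},
    {"firstname": "John", "lastname": "Li", "birthdate": "1978-05-01", "salary": 48000},
    {"firstname": "Paul", "lastname": "Miller", "birthdate": "1978-05-01", "salary": 61000},
]

print(is_key_candidate(employees, ("firstname", "lastname", "birthdate")))  # True
print(is_key_candidate(employees, ("salary",)))  # True, but only by accident
```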
Relationships and Relationship Sets
Relationship: Association among two or more
entities. E.g., Miller works in Pharmacy
department.
Relationship set: Collection of similar
relationships among two or more entity sets.
[Figure: relationship set Works_In connecting Employees (ssn, name, age) and Departments (did, dname, budget).]
Relationships and Relationship Sets
An n-ary relationship set R relates n entity sets $E_1, \ldots, E_n$.
Each relationship in R involves entities $e_1 \in E_1, \ldots, e_n \in E_n$.
Binary relationship sets are the most common.
The same entity set can participate in different relationship sets, or in different “roles” in the same set.
[Figure: relationship set Reports_To connecting Employees (ssn, name, age) with itself, in the roles supervisor and subordinate.]
Relationships and Relationship Sets
Relationship sets can also have attributes.
Useful for properties that cannot reasonably
be associated with one of the participating
entity sets.
[Figure: Works_In between Employees (ssn, name, age) and Departments (did, dname, budget), with the descriptive attribute since.]
Instances of an ER Diagram
An entity set contains a set of entities. Each entity has one value for each of its attributes.
No duplicate instances.
Employees
ssn        name            age
12345678   “John Miller”   30
14789632   “Paul Li”       25
...        ...             ...
Instances of an ER Diagram
Relationship set contains a set (no duplicates!)
of relationships, each relating a set of entities,
one from each of the participating entity sets.
Components are entities, not attribute values.
Works_In
Employee (ssn)   Department (did)
12345678         1
14789632         1
56756322         2
...              ...
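Here is a minimal Python sketch of these instances. It assumes that entities are identified by their keys (ssn, did). The relationship set is a Python set of key pairs, so adding a duplicate relationship has no effect, and a relationship may only refer to existing entities. The helper add_works_in is hypothetical.

```python
# Keys of the existing entities (instance taken from the tables above).
employee_ssns = {"12345678", "14789632", "56756322"}
department_dids = {1, 2}

# Relationship set Works_In: a set of (employee, department) pairs whose
# components identify entities by their keys, not arbitrary attribute values.
works_in = set()

def add_works_in(ssn, did):
    """Add a relationship; both components must be existing entities."""
    if ssn not in employee_ssns or did not in department_dids:
        raise ValueError(f"unknown entity in relationship ({ssn}, {did})")
    works_in.add((ssn, did))

add_works_in("12345678", 1)
add_works_in("14789632", 1)
add_works_in("56756322", 2)
add_works_in("12345678", 1)  # duplicate: the set is unchanged

print(len(works_in))  # 3
```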
Relationships and Relationship Sets
Multiway relationship sets (n > 2) are used
whenever binary relationships cannot
capture the application semantics.
[Figure: ternary relationship set Works_For connecting Employees (ssn, name, age), Tasks (tid, description) and Projects (pid, pbudget).]
Relationships and Relationship Sets
[Figure: the ternary relationship set Works_For connecting Employees, Tasks and Projects, with the following instance.]
Works_For
Employee (ssn)   Tasks (tid)   Project (pid)
12345678         1000          101
12345678         1500          106
56756322         1500          106
...              ...           ...
Multiplicity of Relationships
An employee can work in many departments; a dept can have many employees.
[Figure: Works_In between Employees (ssn, name, age) and Departments (did, dname, budget), with attribute since.]
Each dept has at most one manager, who may manage several (many) departments.
[Figure: Manages between Employees (ssn, name, age) and Departments (did, dname, budget), with attribute since.]
Multiplicity of Relationships
The different types of (binary) relationships from a multiplicity point of view:
• One-to-one
• One-to-many
• Many-to-one
• Many-to-many
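To make the four types concrete, here is a small Python sketch. For an instance of a binary relationship set R(E1, E2), it reports the tightest type that the instance does not violate. The names are hypothetical, and the result describes only the instance. Multiplicity is a property of the design, so an instance can show that a constraint is violated, but never that it holds in general.

```python
from collections import Counter

def max_degree(pairs, side):
    """Largest number of pairs that any single entity on the given side occurs in."""
    counts = Counter(pair[side] for pair in pairs)
    return max(counts.values(), default=0)

def observed_multiplicity(pairs):
    """Tightest multiplicity type an instance of R(E1, E2) does not violate,
    read from E1 to E2 ("one-to-many": at most one E1 per E2, many E2 per E1)."""
    left = "many" if max_degree(pairs, 1) > 1 else "one"   # some E2 has several E1
    right = "many" if max_degree(pairs, 0) > 1 else "one"  # some E1 has several E2
    return f"{left}-to-{right}"

# Hypothetical instances as (ssn, did) pairs.
works_in = {("12345678", 1), ("14789632", 1), ("12345678", 2)}
manages = {("12345678", 1), ("12345678", 2)}

print(observed_multiplicity(works_in))  # many-to-many
print(observed_multiplicity(manages))   # one-to-many
```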
Key Constraints
A key constraint on a relationship set specifies
that the marked entity set participates in at most
one relationship of this relationship set.
Entity set is marked with an arrow.
[Figure: Manages between Employees (ssn, name, age) and Departments (did, dname, budget), with attribute since; the key constraint is marked by the arrow on Departments.]
Participation Constraints
A participation constraint on a relationship set
specifies that the marked entity set participates
in at least one relationship of this relationship set.
Entity set is marked with a bold line.
[Figure: Manages and Works_In, each with attribute since, between Employees (ssn, name, age) and Departments (did, dname, budget); bold lines mark the participation constraints.]
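Both kinds of constraints can be checked on an instance. The following Python sketch uses hypothetical function names and data. It represents Manages as a set of (ssn, did) pairs and checks the key constraint and the participation constraint for Departments. Together, these require every department to have at most one and at least one manager.

```python
def key_constraint_violations(relationships, position):
    """Entities at `position` that occur in more than one relationship."""
    seen, violators = set(), set()
    for rel in relationships:
        entity = rel[position]
        if entity in seen:
            violators.add(entity)
        seen.add(entity)
    return violators

def participation_violations(relationships, position, entity_set):
    """Entities of entity_set that occur in no relationship at `position`."""
    return entity_set - {rel[position] for rel in relationships}

# Hypothetical instance of Manages as (ssn, did) pairs; Departments is position 1.
departments = {1, 2, 3}
manages = {("12345678", 1), ("14789632", 1), ("14789632", 2)}

print(key_constraint_violations(manages, 1))              # {1}: two managers
print(participation_violations(manages, 1, departments))  # {3}: no manager
```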
Weak Entities
A weak entity exists only in the context of another
(owner) entity.
The weak entity can be identified uniquely only by
considering the primary key of the owner and its own
partial key.
• The owner entity set and the weak entity set must participate in a one-to-many relationship set (one owner, many weak entities).
• The weak entity set must have total participation in this supporting relationship set.
[Figure: weak entity set Dependents (name, age) owned by Employees (ssn, name, age) through the supporting relationship set Policy, which has attribute cost.]
Subclasses
Sometimes, an entity set contains some entities that share many, but not all, properties with the other entities of the set. In this case, we want to define class (entity set) hierarchies.
A ISA B: every A entity is also considered to be a B entity. A specializes B, B generalizes A.
A is called the subclass, B the superclass.
A subclass inherits the attributes of its superclass and may define additional attributes.
Subclasses
[Figure: Employees (ssn, name, age) with ISA subclasses Hourly_Emps (hourly_wages, hours_worked) and Contract_Emps (contractid).]
Hourly_Emps and Contract_Emps inherit the ssn
(key!), name and age attributes from Employees.
Hourly_Emps defines the additional attributes hourly_wages and hours_worked; Contract_Emps defines contractid.
Subclasses
Overlap constraints:
Can Joe be an Hourly_Emps as well as a
Contract_Emps entity?
(Hourly_Emps OVERLAPS Contract_Emps)
Covering constraints:
Does every Employees entity have to be either
an Hourly_Emps or a Contract_Emps entity?
(Hourly_Emps AND Contract_Emps COVER Employees)
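Here is a minimal Python sketch for checking an instance against these constraints. It assumes each entity set is given as a set of ssn values; the data and the function name are hypothetical. The function verifies that subclass entities belong to the superclass. It also reports overlapping entities when overlap is disallowed, and uncovered Employees entities when covering is required.

```python
def check_isa(superclass, subclasses, overlaps_allowed, covers):
    """Return the constraint violations of one ISA hierarchy instance.

    superclass: set of keys (ssn) of the superclass entities
    subclasses: dict mapping subclass name -> set of keys
    """
    problems = []
    for name, members in subclasses.items():
        # A ISA B: every A entity must also be a B entity.
        if not members <= superclass:
            problems.append(f"{name} has entities missing from the superclass")
    if not overlaps_allowed:
        names = sorted(subclasses)
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                shared = subclasses[a] & subclasses[b]
                if shared:
                    problems.append(f"{a} and {b} overlap on {sorted(shared)}")
    if covers:
        # Every superclass entity must belong to at least one subclass.
        missing = superclass - set().union(*subclasses.values())
        if missing:
            problems.append(f"not covered: {sorted(missing)}")
    return problems

# Hypothetical instance: Joe (ssn "222") is both an hourly and a contract employee.
employees = {"111", "222", "333", "444"}
subclasses = {"Hourly_Emps": {"111", "222"}, "Contract_Emps": {"222", "333"}}

print(check_isa(employees, subclasses, overlaps_allowed=False, covers=True))
# ["Contract_Emps and Hourly_Emps overlap on ['222']", "not covered: ['444']"]
```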
Subclasses
There are several good reasons for using ISA
relationships and subclasses:
• Do not have to redefine all the attributes.
• Can add descriptive attributes specific to a
subclass.
• To identify entity sets that participate in a relationship set as precisely as possible.
ISA relationships form a tree structure
(taxonomy) with one entity set serving as root.
Design Principles
Faithfulness
• Design must be faithful to the specification / reality.
• Relevant aspects of reality must be represented in the model.
Avoiding redundancy
• Redundant representation blows up the ER diagram and makes it harder to understand.
• Redundant representation wastes storage.
• Redundancy may lead to inconsistencies in the database.
Design Principles
Keep it simple
• The simpler the design, the easier it is for an (external) reader to understand the ER diagrams.
• Avoid introducing more elements than necessary.
• If possible, prefer attributes over entity sets and relationship sets.
Formulate constraints as far as possible
• A lot of data semantics can (and should) be captured.
• But some constraints cannot be captured in ER diagrams.
High-Level Design With ER Model
Major design choices
• Should a concept be modeled as an entity or
an attribute?
• Should a concept be modeled as an entity or
a relationship?
• What relationships to use: binary or ternary?
Entity vs. Attribute
Should address be an attribute of Employees or
an entity (connected to Employees by a
relationship)?
Depends upon the use we want to make of
address information, and the semantics of the
data:
If we have several addresses per employee,
address must be an entity (since attributes cannot
be set-valued).
If the structure (city, street, etc.) is important,
e.g., we want to retrieve employees in a given
city, address must be modeled as an entity (since
attribute values are atomic).
Entity vs. Attribute
Works_In2 does not allow an employee to work in the same department for two or more periods (why?).
[Figure: Works_In2 between Employees (ssn, name, lot) and Departments (did, dname, budget), with descriptive attributes from and to.]
We want to record several values of the descriptive attributes for each instance of this relationship.
[Figure: Works_In3 connecting Employees (ssn, name, lot), Departments (did, dname, budget) and the entity set Duration (from, to).]
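One way to see the answer to the question above is to model both designs with Python collections, using hypothetical data. Works_In2 has exactly one relationship per (employee, department) pair, so a second period replaces the first. In Works_In3, Duration is an entity set, so each period yields a separate relationship.

```python
# Works_In2: one relationship per (ssn, did) pair; its descriptive
# attributes (from, to) hold a single value each.
works_in2 = {}
# Works_In3: Duration is an entity set, so every (ssn, did, duration)
# combination is a relationship of its own.
works_in3 = set()

# Hypothetical data: employee 12345678 worked in department 1 twice.
periods = [("2001-01-01", "2003-12-31"), ("2006-01-01", "2008-06-30")]
for start, end in periods:
    works_in2[("12345678", 1)] = (start, end)     # overwrites the earlier period
    works_in3.add(("12345678", 1, (start, end)))  # keeps both periods

print(len(works_in2), len(works_in3))  # 1 2
```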
Entity vs. Relationship
[Figure: Manages2 between Employees (ssn, name, lot) and Departments (did, dname, budget), with attributes since and dbudget.]
This ER diagram is o.k. if a manager gets a separate discretionary budget for each dept.
But what if a manager gets a discretionary budget that covers all managed depts?
• Redundancy: dbudget is stored for each dept managed by the manager.
• Misleading: suggests that dbudget is tied to the managed dept.
Entity vs. Relationship
What about this diagram?
[Figure: Manages2 between Employees (ssn, name, lot) and Departments (did, dname, budget), with attributes since and dbudget.]
The following ER diagram is more appropriate and avoids the above problems!
[Figure: ternary relationship set Manages3 with attribute since, connecting Employees (ssn, name, lot), Departments (did, dname, budget) and Mgr_Appts (apptnum, dbudget).]
Binary vs. Ternary Relationships
[Figure: ternary relationship set Covers connecting Employees (ssn, name, lot), Dependents (pname, age) and Policies (policyid, cost).]
If each policy is owned by just one employee:
• A key constraint on Policies would mean that a policy can only cover one dependent!
• Bad design!
Binary vs. Ternary Relationships
This diagram is a better design.
[Figure: binary relationship sets Purchaser between Employees (ssn, name, lot) and Policies (policyid, cost), and Beneficiary between Dependents (pname, age) and Policies.]
What are the additional constraints in this diagram?
Binary vs. Ternary Relationships
The previous example illustrated a case where two binary relationships were better than one ternary relationship.
An example in the other direction: a ternary relationship set Contracts relates the entity sets Parts, Departments and Suppliers, and has the descriptive attribute qty. No combination of binary relationships is an adequate substitute:
• S “can-supply” P, D “needs” P, and D “deals-with” S does not imply that D has agreed to buy P from S.
• How do we record qty?
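The first bullet can be illustrated with a small Python sketch that uses hypothetical supplier, part and department identifiers. It projects a ternary Contracts instance onto the three binary relationships and joins them back. The join contains a triple that is not in Contracts, so the binary relationships lose information. The sketch leaves out qty, since qty has no natural home in any of the binary relationships.

```python
# Hypothetical Contracts instance: (supplier, part, department) triples.
contracts = {("s1", "p1", "d2"), ("s1", "p2", "d1"), ("s2", "p1", "d1")}

# The three binary relationship sets obtained by projection.
can_supply = {(s, p) for s, p, d in contracts}   # S "can-supply" P
needs = {(d, p) for s, p, d in contracts}        # D "needs" P
deals_with = {(d, s) for s, p, d in contracts}   # D "deals-with" S

# All triples consistent with the three binary relationships.
reconstructed = {
    (s, p, d)
    for s, p in can_supply
    for d, p2 in needs
    if p2 == p and (d, s) in deals_with
}

print(reconstructed - contracts)  # {('s1', 'p1', 'd1')}: d1 never agreed to buy p1 from s1
```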
Conceptual Design: ER to Relational
How to represent
• Entity sets,
• Relationship sets,
• Attributes,
• Key and participation constraints,
• Subclasses,
• Weak entity sets
...?
Entity Sets
Entity sets are translated to tables.
[Figure: entity set Employees with attributes ssn, name and lot.]
CREATE TABLE Employees
(ssn CHAR(11),
name CHAR(20),
lot INTEGER,
PRIMARY KEY (ssn));
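The table above can be tried out in SQLite through Python's standard sqlite3 module. This is an assumption for illustration: any SQL DBMS would do, and the data are hypothetical. The primary key rejects a second entity with the same ssn. Note that SQLite, unlike the SQL standard, allows NULL in a non-INTEGER PRIMARY KEY column of an ordinary (rowid) table unless NOT NULL is declared explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("""
CREATE TABLE Employees
(ssn CHAR(11),
 name CHAR(20),
 lot INTEGER,
 PRIMARY KEY (ssn))""")

conn.execute("INSERT INTO Employees VALUES ('12345678', 'John Miller', 48)")
try:
    # A second entity with the same key value violates the primary key.
    conn.execute("INSERT INTO Employees VALUES ('12345678', 'Paul Li', 22)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```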
Relationship Sets
Relationship sets are also translated to tables, with the following attributes:
• The keys of each participating entity set (as foreign keys). The combination of these keys forms a superkey for the table.
• All descriptive attributes of the relationship set.
CREATE TABLE Works_In(
ssn CHAR(11),
did INTEGER,
since DATE,
PRIMARY KEY (ssn, did),
FOREIGN KEY (ssn)
REFERENCES Employees,
FOREIGN KEY (did)
REFERENCES Departments);
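Here is a sketch of the Works_In table in SQLite via Python's sqlite3 module. It adds minimal Employees and Departments tables with hypothetical data, so that the foreign keys have something to reference. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON. The helper accepted is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE Employees (ssn CHAR(11), name CHAR(20), lot INTEGER, PRIMARY KEY (ssn));
CREATE TABLE Departments (did INTEGER, dname CHAR(20), budget REAL, PRIMARY KEY (did));
CREATE TABLE Works_In(
  ssn CHAR(11),
  did INTEGER,
  since DATE,
  PRIMARY KEY (ssn, did),
  FOREIGN KEY (ssn) REFERENCES Employees,
  FOREIGN KEY (did) REFERENCES Departments);
INSERT INTO Employees VALUES ('12345678', 'John Miller', 1);
INSERT INTO Departments VALUES (1, 'Pharmacy', 100000.0);
INSERT INTO Works_In VALUES ('12345678', 1, '2005-09-01');
""")

def accepted(row):
    """Try to insert one Works_In row; report whether the DBMS accepted it."""
    try:
        conn.execute("INSERT INTO Works_In VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

print(accepted(("99999999", 1, "2008-01-01")))  # False: no such employee
print(accepted(("12345678", 1, "2008-01-01")))  # False: pair already present
```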
Key Constraints
Each dept has at most one manager, according to the key constraint on Manages.
[Figure: Manages between Employees (ssn, name, lot) and Departments (did, dname, budget), with attribute since and a key constraint on Departments.]
Translation to relational model?
Key Constraints
Map the relationship set to a table:
• Separate tables for Employees and Departments.
• Note that did is the key now!
Since each department has a unique manager, we could instead combine Manages and Departments.
CREATE TABLE Manages(
ssn CHAR(11),
did INTEGER,
since DATE,
PRIMARY KEY (did),
FOREIGN KEY (ssn) REFERENCES Employees,
FOREIGN KEY (did) REFERENCES Departments);

CREATE TABLE Dept_Mgr(
did INTEGER,
dname CHAR(20),
budget REAL,
manager CHAR(11),
since DATE,
PRIMARY KEY (did),
FOREIGN KEY (manager) REFERENCES Employees);
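Running the Manages table in SQLite (via Python's sqlite3 module, with hypothetical data) shows the effect of PRIMARY KEY (did). An employee may manage several departments, but a second manager for the same department is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Employees (ssn CHAR(11) PRIMARY KEY, name CHAR(20), lot INTEGER);
CREATE TABLE Departments (did INTEGER PRIMARY KEY, dname CHAR(20), budget REAL);
CREATE TABLE Manages(
  ssn CHAR(11),
  did INTEGER,
  since DATE,
  PRIMARY KEY (did),
  FOREIGN KEY (ssn) REFERENCES Employees,
  FOREIGN KEY (did) REFERENCES Departments);
INSERT INTO Employees VALUES ('12345678', 'John Miller', 1), ('14789632', 'Paul Li', 2);
INSERT INTO Departments VALUES (1, 'Pharmacy', 100000.0), (2, 'Toys', 50000.0);
""")

accepted = []
for ssn, did in [("12345678", 1), ("12345678", 2), ("14789632", 1)]:
    try:
        conn.execute("INSERT INTO Manages VALUES (?, ?, '2008-01-01')", (ssn, did))
        accepted.append((ssn, did))
    except sqlite3.IntegrityError:
        pass  # did is the key: department 1 already has a manager

print(accepted)  # [('12345678', 1), ('12345678', 2)]
```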
Participation Constraints
We can capture participation constraints involving
one entity set in a binary relationship, using NOT
NULL.
In other cases, we need CHECK constraints.
CREATE TABLE Dept_Mgr(
did INTEGER,
dname CHAR(20),
budget REAL,
manager CHAR(11) NOT NULL,
since DATE,
PRIMARY KEY (did),
FOREIGN KEY (manager) REFERENCES Employees
ON DELETE NO ACTION);
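Here is a sketch in SQLite (Python's sqlite3 module, hypothetical data) of the two constraints in Dept_Mgr. NOT NULL rejects a department without a manager. ON DELETE NO ACTION rejects deleting an employee who still manages a department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Employees (ssn CHAR(11) PRIMARY KEY, name CHAR(20), lot INTEGER);
CREATE TABLE Dept_Mgr(
  did INTEGER,
  dname CHAR(20),
  budget REAL,
  manager CHAR(11) NOT NULL,
  since DATE,
  PRIMARY KEY (did),
  FOREIGN KEY (manager) REFERENCES Employees ON DELETE NO ACTION);
INSERT INTO Employees VALUES ('12345678', 'John Miller', 1);
INSERT INTO Dept_Mgr VALUES (1, 'Pharmacy', 100000.0, '12345678', '2008-01-01');
""")

def rejected(sql):
    """Run one statement; report whether an integrity constraint stopped it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# Total participation: every department needs a manager.
no_manager = rejected("INSERT INTO Dept_Mgr VALUES (2, 'Toys', 50000.0, NULL, NULL)")
# NO ACTION: a referenced manager cannot simply be deleted.
delete_manager = rejected("DELETE FROM Employees WHERE ssn = '12345678'")

print(no_manager, delete_manager)  # True True
```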
Weak Entity Sets
A weak entity set can be identified uniquely only
by considering the primary key of another
(owner) entity set.
• The owner entity set and the weak entity set must participate in a one-to-many relationship set (one owner, many weak entities).
• The weak entity set must have total participation in this identifying relationship set.
[Figure: weak entity set Dependents (pname, age) owned by Employees (ssn, name, lot) through the identifying relationship set Policy, which has attribute cost.]
Weak Entity Sets
Weak entity set and identifying relationship
set are translated into a single table.
• When the owner entity is deleted, all owned weak entities must also be deleted.
CREATE TABLE Dep_Policy (
pname CHAR(20),
age INTEGER,
cost REAL,
ssn CHAR(11) NOT NULL,
PRIMARY KEY (pname, ssn),
FOREIGN KEY (ssn) REFERENCES Employees
ON DELETE CASCADE);
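Here is a sketch of Dep_Policy in SQLite via Python's sqlite3, with hypothetical data. The composite key lets two employees each have a dependent with the same pname. ON DELETE CASCADE removes an employee's dependents together with the employee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Employees (ssn CHAR(11) PRIMARY KEY, name CHAR(20), lot INTEGER);
CREATE TABLE Dep_Policy (
  pname CHAR(20),
  age INTEGER,
  cost REAL,
  ssn CHAR(11) NOT NULL,
  PRIMARY KEY (pname, ssn),
  FOREIGN KEY (ssn) REFERENCES Employees ON DELETE CASCADE);
INSERT INTO Employees VALUES ('12345678', 'John Miller', 1), ('14789632', 'Paul Li', 2);
-- The partial key pname is unique only per owner: both employees have an 'Anna'.
INSERT INTO Dep_Policy VALUES ('Anna', 5, 120.0, '12345678'),
                              ('Ben', 8, 120.0, '12345678'),
                              ('Anna', 3, 90.0, '14789632');
""")

# Deleting the owner also deletes its weak entities.
conn.execute("DELETE FROM Employees WHERE ssn = '12345678'")
remaining = conn.execute("SELECT pname, ssn FROM Dep_Policy ORDER BY pname").fetchall()

print(remaining)  # [('Anna', '14789632')]
```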
Subclasses
If we declare A ISA B, every A entity is also considered to be a B entity.
Attributes of B are inherited by A.
[Figure: Employees (ssn, name, lot) with ISA subclasses Hourly_Emps (hourly_wages, hours_worked) and Contract_Emps (contractid).]
Overlap constraints: Can Joe be an Hourly_Emps as well as a Contract_Emps entity? (Allowed/disallowed)
Covering constraints: Does every Employees entity have to be either an Hourly_Emps or a Contract_Emps entity? (Yes/no)
Subclasses
ER style translation
• One table for each of the entity sets (superclass and subclasses).
• The ISA relationship does not require an additional table.
• All tables have the same key, i.e., the key of the superclass.
• E.g.: one table each for Employees, Hourly_Emps and Contract_Emps.
General employee attributes are recorded in Employees. For hourly emps and contract emps, the extra information is recorded in the respective relations.
Subclasses
CREATE TABLE Employees(
ssn CHAR(11),
name CHAR(20),
lot INTEGER,
PRIMARY KEY (ssn));

CREATE TABLE Hourly_Emps(
ssn CHAR(11),
hourly_wages REAL,
hours_worked INTEGER,
PRIMARY KEY (ssn),
FOREIGN KEY (ssn)
REFERENCES Employees
ON DELETE CASCADE);
Queries involving all employees are easy; queries involving just Hourly_Emps require a join to get their special attributes.
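Here is a sketch of the ER-style translation in SQLite via Python's sqlite3, with hypothetical data; Contract_Emps is omitted for brevity. Listing all employees reads one table, while the full attributes of hourly employees need a join on ssn.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Employees(
  ssn CHAR(11), name CHAR(20), lot INTEGER,
  PRIMARY KEY (ssn));
CREATE TABLE Hourly_Emps(
  ssn CHAR(11), hourly_wages REAL, hours_worked INTEGER,
  PRIMARY KEY (ssn),
  FOREIGN KEY (ssn) REFERENCES Employees ON DELETE CASCADE);
INSERT INTO Employees VALUES ('12345678', 'John Miller', 1), ('14789632', 'Paul Li', 2);
INSERT INTO Hourly_Emps VALUES ('14789632', 20.0, 35);
""")

# All employees: a single-table query.
all_names = [row[0] for row in conn.execute("SELECT name FROM Employees ORDER BY name")]

# Hourly employees with their inherited attributes: join on the shared key.
hourly = conn.execute("""
    SELECT E.ssn, E.name, H.hourly_wages, H.hours_worked
    FROM Employees E JOIN Hourly_Emps H ON E.ssn = H.ssn""").fetchall()

print(all_names)  # ['John Miller', 'Paul Li']
print(hourly)     # [('14789632', 'Paul Li', 20.0, 35)]
```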
Subclasses
Alternative translation
• Create tables for the subclasses only. These tables
have all attributes of the superclass(es) and the
subclass.
• This approach is applicable only if the subclasses
cover the superclass.
• E.g.:
Hourly_Emps: ssn, name, lot, hourly_wages, hours_worked.
Contract_Emps: ssn, name, lot, contractid.
Queries involving all employees are difficult; those on Hourly_Emps or Contract_Emps alone are easy.
Only applicable if Hourly_Emps AND Contract_Emps COVER Employees.
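Here is a sketch of the alternative translation in SQLite via Python's sqlite3, with hypothetical data. A query over all employees has to combine the subclass tables with UNION. An employee who is in neither subclass could not be stored at all, which is why covering is required. If overlap were allowed, an employee in both tables would have name and lot stored twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Hourly_Emps (ssn CHAR(11) PRIMARY KEY, name CHAR(20), lot INTEGER,
                          hourly_wages REAL, hours_worked INTEGER);
CREATE TABLE Contract_Emps (ssn CHAR(11) PRIMARY KEY, name CHAR(20), lot INTEGER,
                            contractid INTEGER);
INSERT INTO Hourly_Emps VALUES ('14789632', 'Paul Li', 2, 20.0, 35);
INSERT INTO Contract_Emps VALUES ('12345678', 'John Miller', 1, 501);
""")

# "All employees" must be assembled from every subclass table.
all_emps = conn.execute("""
    SELECT ssn, name, lot FROM Hourly_Emps
    UNION
    SELECT ssn, name, lot FROM Contract_Emps
    ORDER BY ssn""").fetchall()

print(all_emps)  # [('12345678', 'John Miller', 1), ('14789632', 'Paul Li', 2)]
```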
Binary vs. Ternary Relationships
If each policy is owned by just one employee:
• A key constraint on Policies would mean that a policy can only cover one dependent!
Bad design:
[Figure: ternary relationship set Covers connecting Employees (ssn, name, lot), Dependents (pname, age) and Policies (policyid, cost).]
Better design:
[Figure: binary relationship sets Purchaser between Employees and Policies, and Beneficiary between Dependents and Policies.]
Binary vs. Ternary Relationships
The key constraints allow us to combine Purchaser with Policies and Beneficiary with Dependents. Participation constraints lead to NOT NULL constraints.

CREATE TABLE Policies (
policyid INTEGER,
cost REAL,
ssn CHAR(11) NOT NULL,
PRIMARY KEY (policyid),
FOREIGN KEY (ssn) REFERENCES Employees
ON DELETE CASCADE);

CREATE TABLE Dependents (
pname CHAR(20),
age INTEGER,
policyid INTEGER NOT NULL,
PRIMARY KEY (pname, policyid),
FOREIGN KEY (policyid) REFERENCES Policies
ON DELETE CASCADE);
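Here is a sketch of these two tables in SQLite via Python's sqlite3, with a minimal Employees table and hypothetical data. One policy can now cover several dependents. Deleting the purchasing employee cascades first to the policy and then to its beneficiaries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Employees (ssn CHAR(11) PRIMARY KEY, name CHAR(20), lot INTEGER);
CREATE TABLE Policies (
  policyid INTEGER,
  cost REAL,
  ssn CHAR(11) NOT NULL,
  PRIMARY KEY (policyid),
  FOREIGN KEY (ssn) REFERENCES Employees ON DELETE CASCADE);
CREATE TABLE Dependents (
  pname CHAR(20),
  age INTEGER,
  policyid INTEGER NOT NULL,
  PRIMARY KEY (pname, policyid),
  FOREIGN KEY (policyid) REFERENCES Policies ON DELETE CASCADE);
INSERT INTO Employees VALUES ('12345678', 'John Miller', 1);
INSERT INTO Policies VALUES (101, 250.0, '12345678');
-- One policy now covers several dependents.
INSERT INTO Dependents VALUES ('Anna', 5, 101), ('Ben', 8, 101);
""")

covered = conn.execute("SELECT COUNT(*) FROM Dependents WHERE policyid = 101").fetchone()[0]

# Deleting the purchaser cascades to the policy, and from the policy to its beneficiaries.
conn.execute("DELETE FROM Employees WHERE ssn = '12345678'")
left = [conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        for table in ("Policies", "Dependents")]

print(covered, left)  # 2 [0, 0]
```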
Summary
High-level design follows requirements analysis and
yields a high-level description of data to be stored.
The ER model is popular for high-level design.
• Its constructs are expressive and close to the way people think about their applications.
Basic constructs: entities, relationships, and attributes
(of entities and relationships).
Some additional constructs: weak entities, subclasses,
and constraints.
ER design is subjective. There are often many ways
to model a given scenario! Analyzing alternatives
can be tricky, especially for a large enterprise.
Summary
There are guidelines to translate ER diagrams
to a relational database schema.
However, there are often alternatives that need
to be carefully considered.
Entity sets and relationship sets are all
represented by relations.
Some constructs of the ER model cannot be
easily translated, e.g. multiple participation
constraints.