E/R model basics - Department of Computer Science

Entity-relationship model
Introduction
So far in the course, we have studied a lot about extracting useful information from an already-existing
database. However, we have only barely brushed up against getting data into a database. The next
segment of this course will focus on this latter problem: how to design a useful database schema, and
how to load (and alter) data in a database.
One of the most important steps in creating a database happens before ever touching a database
engine: how to model the slice of the real world that the database will eventually represent. A good
database design balances detail and simplicity, providing sufficient detail to answer the kinds of
questions that will be asked while providing a simplified view of the world that makes those questions
reasonably easy to ask and the answers reasonably easy to interpret.
To give a concrete example, suppose we had been tasked to create a database back-end for a CAD
(computer-aided design) tool that specializes in the automotive industry. Clearly, the kinds of “car
database” examples we’ve used in class would be woefully inadequate: a CAD engineer will not care in
the slightest about VIN numbers and license plates (other than where they will be found in the final
product); the engineer will care very much about the engine and wheels, but will demand far more
detail than anything you would find in a car lot advertisement. At the other extreme, a molecule-accurate model of the car is also not useful to a car designer: nobody wants to manipulate individual
molecules and atoms when designing something you can see with the naked eye! Instead, a useful
database would probably store a collection of polygons in 3-D space, with adequate support to find and
group polygons in various ways (by the part they correspond to, the location relative to a virtual
camera’s field of view, etc.).
Since very few database designers are CAD experts, any real database design for our imaginary
application would have to start by exploring with tool developers what, precisely, they need the
database to provide for them; similarly, since the CAD tool developers are not database experts, they
would also need some guidance in expressing what they want in terms of what a database engine can
possibly do (we really can’t do 3-D rendering, for example). In practice, this early design process is
iterative, and proposed designs can change rapidly as people explore different options. Some form of
visual representation, or model, of the problem is extremely useful to facilitate the design process and
allow participants to express their thoughts clearly; we will focus on a particular model known as the
Entity-Relationship Model (ERM).
Entity-relationship model
The ER model consists of three basic concepts, illustrated in two different styles below:
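[Figure: ER diagram in which the entity sets Actors and Movies are linked by the relationship set
StarsIn; the attributes shown are name, address, title, year, and role.]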
• Entity set [rectangle] – represents a type of object we wish to model, such as “customers,”
  “retailers,” or “landmarks.” Each entity set can contain zero or more entities; “Canadian Tire”
  and “Tim Hortons” might be entities from the entity set “Retailers.” Entities almost always
  correspond to nouns in plain English descriptions of a design.[1]
• Relationship set [diamond] – represents a type of relationship or link between two or more
  entity sets. Relationship sets often correspond to verbs (e.g., entities doing something to other
  entities), but sometimes to nouns as well: many relationships are common enough, and
  important enough, that they have their own name. Examples include “friendship,” “rivalry,”
  and “marriage.” Note that such relationship nouns can usually be converted into verbs, such as
  “friends with” or “married to.”
• Attribute [oval] – represents a piece of information or detail about an entity or relationship. For
  example, color is an attribute of many objects, while starting and ending dates are common
  attributes of relationships. Attributes are often associated with adjectives and quantities in plain
  English: colors (red, green, blue, etc.) or salaries ($10,000, $50,000, etc.) are good examples.
As you can see from the above descriptions, there is some overlap between the concepts: some
relationships are almost important enough to be entities in their own right; the same is true for some
types of attributes, such as addresses. This flexibility is good because it allows different designs to
model the real world in different ways and at different levels of detail. We will come back to this issue
later on.
Entity vs. entity set
One point to be careful of: an entity set is a class of things, while an entity is one object of the class. For
example, “Batman Returns” would be an entity from the entity set “movies.” Similarly, CSCC43 would be
an entity from the entity set “courses.” A similar distinction exists for relationships vs. relationship sets:
Romeo and Juliet, as well as Antony and Cleopatra, would both be relationships from the relationship
set “ill-fated romances.”
In relational algebra these would correspond to relations and tuples, while in object-oriented
programming these would correspond to classes and objects (or instances).
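The object-oriented analogy can be made concrete with a minimal Python sketch. The `Movie` class and
its `title` and `year` fields are illustrative assumptions rather than a schema defined in these notes:
the class plays the role of the entity set, and each instance is one entity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Movie:
    """Plays the role of the entity set "movies" (attribute names are assumed)."""
    title: str
    year: int


# Each instance is a single entity drawn from the entity set.
batman_returns = Movie(title="Batman Returns", year=1992)

# The collection of entities currently stored is the population of the entity set.
movies = {batman_returns}

print(isinstance(batman_returns, Movie))  # the entity belongs to the entity set
print(len(movies))                        # one entity stored so far
```

The same split applies to relationships: a relationship set would be a collection, and each
relationship one element of it.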
[1] The converse is not always true: not all nouns should be represented as entities!
Recursive relationships
Many relationship sets join two or more entity sets, but some join the same entity set twice. These are
known as recursive relationships. In some relationships the two entities involved are symmetric, such as
friends; in other cases the relationship is asymmetric, such as employee and boss. In the last example,
both employee and boss are people, but there's a big difference between them as far as the relationship
is concerned. Another example is shown on the right: two employees can be colleagues, and we don't
usually distinguish between the two. However, in the case of sovereign succession, it matters very much
who is the predecessor and who is the successor.
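One way to picture the difference between symmetric and asymmetric recursive relationships is the
following Python sketch. The employee names and the helper functions are hypothetical; the only point
is that a symmetric relationship can be stored as an unordered pair, while an asymmetric one needs an
ordered pair whose positions carry distinct roles.

```python
# Symmetric recursive relationship: the pair is unordered, so store a frozenset.
colleagues = set()


def add_colleagues(a, b):
    colleagues.add(frozenset((a, b)))


# Asymmetric recursive relationship: each position has a role
# (predecessor, successor), so store an ordered tuple.
succession = set()


def add_succession(predecessor, successor):
    succession.add((predecessor, successor))


add_colleagues("Alice", "Bob")
add_colleagues("Bob", "Alice")               # same relationship, recorded once
add_succession("George VI", "Elizabeth II")

print(len(colleagues))                                # 1
print(("Elizabeth II", "George VI") in succession)    # False: order matters
```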
N-way relationships
Many relationships involve more than two entity sets. For example, we may need to model actors in
movies using a ternary relationship if we allow actors to play multiple roles. A binary relationship
between actors and movies may not suffice, especially if the roles being played span multiple movies
and have been promoted to proper entities. As another example, each line item from the TPC-H schema
involves an order, a part, and a supplier. In practice, it's rare to see anything more complicated than
three-way relationships, but in theory any number of entity sets can be involved in a relationship set.
[Figure: ternary relationship set StarsIn connecting the entity sets Actors, Movies, and Roles.]
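To see why a binary relationship may not suffice, here is a small Python sketch that stores a ternary
StarsIn relationship set as triples. The `StarsIn` tuple type and the sample data are assumptions chosen
for illustration: one actor plays several roles in one movie, and one role spans several movies.

```python
from typing import NamedTuple


class StarsIn(NamedTuple):
    """One relationship from a ternary relationship set (illustrative)."""
    actor: str
    movie: str
    role: str


stars_in = {
    # One actor, one movie, three roles.
    StarsIn("Peter Sellers", "Dr. Strangelove", "Lionel Mandrake"),
    StarsIn("Peter Sellers", "Dr. Strangelove", "Merkin Muffley"),
    StarsIn("Peter Sellers", "Dr. Strangelove", "Dr. Strangelove"),
    # One role spanning several movies and actors.
    StarsIn("Sean Connery", "Dr. No", "James Bond"),
    StarsIn("Roger Moore", "Live and Let Die", "James Bond"),
}

# Projecting onto (actor, movie) collapses the three Sellers relationships
# into one pair, so a binary relationship would lose the role information.
actor_movie_pairs = {(s.actor, s.movie) for s in stars_in}

print(len(stars_in))           # 5 ternary relationships
print(len(actor_movie_pairs))  # 3 binary pairs
```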
Attributes
Attributes describe elementary properties of entities or relationships (e.g., Surname, Salary and Age
are attributes of Employee, as shown on the right). Attributes are used to describe details, aspects of
the relationship or entity set that are not important enough to be entities in their own right. For
example, most of the time we don't care to track age 10 vs. age 12 separately. In the student example
shown here, both entities and relationships have attributes. Note that the set of attributes we choose
to add or model depends strongly on the application. Under some circumstances it might be very
relevant to store the student's name along with their student number and enrolment date. Similarly, if
all courses always had exactly one exam, it might not be very important to track the date the exam is
given.
Sometimes, it's attractive to model groups of related attributes as composite values. The example on
the right groups the different parts of an address into a single composite attribute. It might also make
sense to group surname and given name into a single composite name field.
You should always be careful about using composite attributes, however, because they can also be a
warning sign that you're trying to represent an entity set with attributes. Many databases track
addresses as entities in their own right, for example. In the end, it's up to the database designer to
decide whether the composite attribute is important enough to promote to an entity set.
Putting it together
The figure above gives a more complete example of using entities, relationships, and attributes to model
a company hierarchy. Observe that employees and departments can be involved in different kinds of
relationships: all employees are presumably members of a department, and some employees also
manage departments.
Constraints
One important purpose of modeling a database is to identify the constraints that the database should
impose on the data stored in it. The most common constraints involve cardinality: how many times a
given entity may, or must, participate in a given relationship. In the previous example of a company
hierarchy, it would be reasonable to enforce that every employee must belong to one department, and
no more than one department. It might also be reasonable to enforce that no employee may manage
more than one department, while allowing most employees not to manage any departments. We
usually represent cardinality constraints using a pair of numbers in parentheses: (m,n). In this case, m
represents the minimum cardinality, and n represents the maximum cardinality. If m is zero, the
relationship is optional, in the sense that an entity need not participate. If m is one or greater, the
entity must participate at least that many times. The maximum n prevents an entity from participating
too many times. In this notation, each employee would participate in department membership with
cardinality (1,1) and in department management with cardinality (0,1).
In the example to the right, an order can participate in at most one sale, and every invoice is tied to
a sale. Cardinality restrictions apply to the entities, not the relationship itself: by definition every
sale involves exactly one order and one invoice. The only question is whether orders and invoices
participate in sales. It's up to the database designer to decide what cardinalities make sense. It
probably doesn't make sense to have a city with no inhabitants residing in it, for example, but a
database engine would be happy to enforce the constraint if we asked it to.
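The (m,n) notation can be read as a check on how often each entity appears in a relationship set. The
sketch below is one possible way to express that check in Python. The function name `violations`, the
identifiers such as `o1` and `i1`, and the choice of (0,1) for orders and (1,1) for invoices are
assumptions made for illustration.

```python
from collections import Counter


def violations(entities, relationships, position, minimum, maximum=None):
    """Return the entities whose participation count is outside (minimum, maximum).

    `relationships` is a list of tuples and `position` selects the slot of each
    tuple that holds the entity set being checked. `maximum=None` stands for N,
    i.e. no upper bound.
    """
    counts = Counter(rel[position] for rel in relationships)
    bad = []
    for entity in entities:
        n = counts[entity]  # Counter yields 0 for entities that never participate
        if n < minimum or (maximum is not None and n > maximum):
            bad.append(entity)
    return bad


orders = ["o1", "o2", "o3"]
invoices = ["i1", "i2"]
sales = [("o1", "i1")]  # each sale links exactly one order and one invoice

print(violations(orders, sales, 0, 0, 1))    # []     : orders are optional, (0,1)
print(violations(invoices, sales, 1, 1, 1))  # ['i2'] : invoice i2 has no sale
```

Note that the check runs over the entities, not over the relationships, which mirrors the point that
cardinality restrictions apply to the participating entity sets.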
Technically, it is possible to have multivalued attributes, or attributes which can occur more than once
in an entity or relationship. It's entirely possible for students to have more than one e-mail address, for
example. In general, you should avoid using multivalued attributes, and convert them into entities
instead, because the relational model can't express multivalued attributes.
Classes
The ER model provides a powerful feature similar to object-oriented languages, where a given entity set
can be refined or extended with subclasses. Subclasses in the ER model are not exactly the same as in
the object-oriented model, however. In object-oriented languages, a given object is always a member of
exactly one class, while the ER model permits a given entity to be a member of multiple classes
simultaneously. An "actor" entity is also a "people" entity, and is allowed to be a "director" entity as
well. In fact, there is no need for the different classes to even form a hierarchy. If a movie database
needs to track animal actors, for example, it would make sense not to have the actors class descend
from people. That way, you can have an actor who is a dog as well as an actor who is a person.
[Figure: entity set People (attributes name, address) with subclasses Actors and Directors, each
attached by an isa link.]
In the general case, it's best to think of classes in ER as auxiliary information that can be added to other
entities. Someone can be an actor, a student, and a piano player, all at the same time, and the
attributes of all those classes will simply be unioned together. Try to express that in the
object-oriented model!
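The idea of classes as auxiliary information whose attributes are unioned together can be sketched
with plain sets in Python. The class names follow the text, but the attribute names other than `name`
and `address` are invented for illustration, as is the helper `attributes_of`.

```python
# Attributes contributed by each class (mostly hypothetical attribute names).
class_attributes = {
    "People": {"name", "address"},
    "Actors": {"agent"},
    "Students": {"student_number"},
    "PianoPlayers": {"repertoire"},
}


def attributes_of(classes):
    """Union the attributes of every class an entity belongs to."""
    result = set()
    for c in classes:
        result |= class_attributes[c]
    return result


# A person who is an actor, a student, and a piano player at the same time.
someone = ["People", "Actors", "Students", "PianoPlayers"]
print(sorted(attributes_of(someone)))

# A dog actor belongs to Actors without belonging to People.
print(sorted(attributes_of(["Actors"])))
```

Because membership is just a list of class names, nothing forces the classes into a hierarchy.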
Keys and weak entity sets
Just like in relational algebra, the ER model has a concept of keys. The key of an entity set is a set
of attributes that uniquely identifies each possible entity in the set. The key of a relationship set is
the union of keys from entities participating in the relationship.
[Figure: Players (attributes name, number) related to Teams (attribute name) through PlaysFor; the
cardinality labels shown are (1, 1) and (0, N).]
Sometimes, the attributes of an entity are not enough to distinguish it from other entities. For
example, a player on a sports team is normally identified by jersey number, but nothing prevents two
players on different teams from having the same jersey number. It's actually the combination of jersey
number and team name that uniquely identifies a player. In this case, we would call the players entity
set a "weak entity set" because the jersey number only weakly identifies the entity. Weak entity sets
can be chained: a weak entity can be related to yet another weak entity, and so forth. In the example to
the left, knowing the host name of a machine on a network does very little to identify that machine
globally, unless we include information about subdomain and domain. In general, we can chain
together any number of weak entities, as long as we do not form any cycles among them. As we will see
later, there's a strong relationship between weak entity sets in ER and foreign keys in relational algebra.
[Figure: chain of weak entity sets. Hosts (attribute name) is related through In to Subdomains
(attribute name), which is related through In to Domains (attribute name); the cardinality labels shown
are (1, 1) and (0, N).]
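As a rough illustration of weak entity sets, the Python sketch below identifies a player by the
combination of team name and jersey number rather than by jersey number alone. The team names, player
names, host names, and the helper `add_player` are all hypothetical.

```python
# Players are keyed by (team name, jersey number): the owning team's key
# plus the weak entity's own partial key.
players = {}


def add_player(team, number, name):
    key = (team, number)
    if key in players:
        raise ValueError(f"duplicate player key {key}")
    players[key] = name


add_player("Lions", 9, "Ann")
add_player("Tigers", 9, "Raj")  # same jersey number, different team: allowed

# Chained weak entity sets: a host is identified globally only by
# (domain, subdomain, host name).
hosts = {
    ("example.org", "cs", "mail"),
    ("example.org", "math", "mail"),
}

print(len(players))  # 2
print(len(hosts))    # 2: the host name "mail" alone would be ambiguous
```

Trying to add a second player with the key `("Lions", 9)` raises a `ValueError`, which is the kind of
uniqueness a key is meant to guarantee.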