Chapter 3 - University of Manitoba

MIS 2000 * Data Analysis & Diagramming * Bob Travica
Chapter 3
Data Analysis and Diagramming
Introduction
This chapter introduces data analysis and data diagramming, which are among the core skills
taught in this course. A big part of any skill is practical or procedural (how-to-do) knowledge, so
you will study how to run data analysis and how to create data diagrams.
Data analysis means looking at the world so that you recognize important objects that can be
represented in information systems (IS). More precisely, business objects are represented with
appropriate data in databases, which are the heart of any IS. A result of data analysis can be
graphically presented in diagrams. Data analysis and diagramming are essential for the study of IS,
but they are also useful in other areas; for example, in supply chain, production, marketing, and
human resources.
Data analysis and data diagramming are necessary for designing information
systems. In addition, understanding data diagramming complements the analysis of business
processes, another core skill taught in this course. In every process, data are used as well as
created, stored, transformed, deleted, or transferred. Data diagrams represent these data. You will
study one kind of data diagram, called a schema or tables diagram.
Creating Data Diagrams
Components of a Data Diagram
A data diagram typically consists of four components:
1. Entity. An entity is a representation of some business object. For example, an object in
business reality is a customer (a person), and it is represented by the entity customer in a
database. When talking about entities, we use nouns.
Entities can be:
• persons (e.g., customer, student, employee)
• physical things and services (e.g., tangible products like car, services supplied like courses)
• organizational units (e.g., a university department)
• events (e.g., purchase, sale, course registration)
• concepts (e.g., student performance, employee profile, purchase order, customer order).
Specific examples of an entity are called instances (e.g., the customer John Jones is an instance
of the entity Customer).
2. Attribute. An attribute is a piece of data, an aspect of an entity (e.g., customer number,
name, address, and telephone number are all attributes of the entity customer). (Recall
knowledge of relational databases you may have.) When talking about attributes, we use
nouns. This can be somewhat confusing since entities are indicated in the same way.
An attribute of particular importance is the primary key (key). It is important
because it identifies each individual instance of an entity. Note that a key can be a
combination of attributes and not just a single attribute. When the primary key of one table
appears in another table, it is a foreign key there.
3. Relationship. A relationship represents a connection between entities (e.g., a customer
places an order). A relationship in a data diagram usually reflects a relationship between
things in reality. When talking about relationships, we use verbs since something performs
some action on something else.
4. Multiplicity. Multiplicity defines how many instances of related entities can participate in
the relationship (e.g., a customer may place many orders, and each order is placed by one
particular customer). In a data diagram, multiplicity is specified by numbers (e.g., 1, 2…) and
letters (e.g., M for “many”, that is, some quantity that varies).
Note that multiplicity is included for the sake of having a complete data diagram. You
will not be tested on multiplicity.
The figure below is a data diagram containing all four components discussed above. Note
how the keys are marked by underlining and how they are named (using the entity name and the
word “ID” or “Number”). The data types behind the keys named with “ID” may be numbers or some
combination of alphanumeric and special characters (e.g., the vehicle registration number).
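[Figure 1 shows two entities joined by the relationship "places", with multiplicity 1 on the
CUSTOMER side and M on the ORDER side.
CUSTOMER: CustomerID (primary key), LastName, MiddleName, FirstName, Address, City, Province,
PostalCode, Phone
ORDER: OrderNumber (primary key), CustomerID (foreign key), OrderDate
Callouts in the figure label the entities, attributes, primary keys, foreign key, relationship,
and multiplicity.]
Figure 1. Data Diagram for a Customer Order System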
Reading of the diagram above: A customer places many orders (M = many). And the other way,
each (one) order is placed by one customer. (The latter part is less apparent, but do not worry
about it.)
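To make the reading of Figure 1 concrete, here is a minimal sketch in Python using the standard sqlite3 module. It is an illustration under stated assumptions, not the course's own implementation: only a few Customer attributes are included, the order numbers and dates are invented sample values, and SQLite stands in for whatever database you actually use.

```python
import sqlite3

# In-memory database, so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")

conn.executescript("""
CREATE TABLE Customer (
    CustomerID INTEGER PRIMARY KEY,  -- primary key (underlined in Figure 1)
    LastName   TEXT,
    FirstName  TEXT
);
-- ORDER is a reserved word in SQL, so the table name is quoted.
CREATE TABLE "Order" (
    OrderNumber INTEGER PRIMARY KEY,                      -- primary key
    CustomerID  INTEGER REFERENCES Customer(CustomerID),  -- foreign key
    OrderDate   TEXT
);
""")

# One instance of Customer and two instances of Order (invented sample values).
conn.execute("INSERT INTO Customer VALUES (1, 'Jones', 'John')")
conn.executemany(
    'INSERT INTO "Order" VALUES (?, ?, ?)',
    [(101, 1, "2024-01-15"), (102, 1, "2024-02-03")],
)

# Multiplicity read from the Customer side: one customer places many orders.
orders_per_customer = conn.execute(
    'SELECT CustomerID, COUNT(*) FROM "Order" GROUP BY CustomerID'
).fetchall()

# Read from the Order side: each order is placed by one customer.
customer_of_order = conn.execute(
    'SELECT c.FirstName, c.LastName FROM "Order" o '
    "JOIN Customer c ON c.CustomerID = o.CustomerID "
    "WHERE o.OrderNumber = 101"
).fetchall()

print(orders_per_customer)
print(customer_of_order)
```

The first query counts the "many" side for customer 1, and the second follows the foreign key from a single order back to its one customer.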
A Procedure for Developing a Data Diagram
Typically, you will start with a case study or perhaps some business documents belonging to the
company in the case. The following procedure shows how to convert your findings into a
data diagram.
1. Identify Entities. Identify the persons, organizations, things, events, and concepts that
you want to present as entities in your data diagram.
2. Identify Relationships. Figure out relationships between pairs of entities. It helps to
name relationships to see more clearly the part of reality the diagram captures.
3. Draw a Rough Diagram. Draw rectangles for entities and lines for relationships
connecting the entities.
4. Define Primary Keys. Identify the data attribute(s) that can be used for identifying
uniquely each instance of an entity. Write the keys into the entity boxes and underline
each.
5. Identify Attributes. List other data attribute(s) for your entities.
6. Map Attributes into Entities. For each attribute, match it with that specific entity it
belongs to and write the attribute name in its entity symbol.
7. Add Foreign Keys. A foreign key is an attribute that is the primary key in another table;
it serves to connect entities.
8. Draw a Fully Attributed Diagram. Add any remaining attributes to their entities, and any
other detail needed to complete the diagram.
9. Check Results. Ask yourself, does the final data diagram accurately depict the case I am
analyzing?
Example
The above procedure will be illustrated by working out the following case. A customer of the
store BigBuy places orders with the store. An order contains products. The important data are the
customer's first name, middle name, last name, street address, city, province, postal code, and
phone numbers, as well as the product name, unit price, order date, and the quantity of the
product ordered. Each of the following sections corresponds to a specific step above.
1. Identify Entities
Mark those words that you think may correspond to entities in the data diagram you want to
draw. For example, the candidate entities in the sentences below are customer, order, and product:
A customer places orders with the store. An order contains products.
Notice that the store is not thought of as an entity. The store is the location of sales
operations, or the subject that records these operations. As such, the store is not a data entity
to be represented in an information system.
2. Identify Relationships
In this step, the aim is to identify relationships, that is, the connections between pairs of
entities. There are minimally two relationships in our example. Focus on the verbs (places,
contains):
• A customer places an order.
• An order contains products.
It is a custom to read relationships from left to right and from top to bottom. Sometimes you
may need to arrange entities so that this order is supported. If this is not possible, an
arrowhead can be used to guide the reader (e.g., Customer <applies to PurchasingHistory).
3. Draw a Rough Data Diagram
Create a rough diagram based on the description from step 2, as shown in Figure 2.
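[Figure 2 shows three entity boxes in a row: Customer is joined to Order by the relationship
"places", and Order is joined to Product by the relationship "contains".]
Figure 2. A Partial Data Diagram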
4. Define Primary Keys
A primary key is an attribute, or a group of attributes, that can be used to uniquely identify a
specific instance of an entity. The name "Bob Smith" is not a primary key, as there are many
people with that name. Whole numbers are better to use for primary keys because each instance
can be assigned a number that no other instance shares. In this example, the keys are:
CustomerID, ProductID, and OrderNumber.
[Figure 3 repeats the diagram of Figure 2 with a primary key written in each entity box:
CustomerID in Customer, OrderNumber in Order, and ProductID in Product.]
Figure 3. A Partial Data Diagram with Primary Keys
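The point about names versus numbers can be tried out directly. The sketch below uses Python's standard sqlite3 module as a stand-in database; the second customer name ("Ann Lee") is an invented sample value. It shows that two customers may share the name Bob Smith, while the database refuses a second row with an already used CustomerID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer ("
    " CustomerID INTEGER PRIMARY KEY,"  # the primary key must be unique
    " FirstName TEXT,"
    " LastName TEXT)"
)

# Two different people with the same name: allowed, because the name is not the key.
conn.execute("INSERT INTO Customer VALUES (1, 'Bob', 'Smith')")
conn.execute("INSERT INTO Customer VALUES (2, 'Bob', 'Smith')")

# Reusing CustomerID 2 for another person is rejected by the database.
try:
    conn.execute("INSERT INTO Customer VALUES (2, 'Ann', 'Lee')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

bob_smiths = conn.execute(
    "SELECT COUNT(*) FROM Customer"
    " WHERE FirstName = 'Bob' AND LastName = 'Smith'"
).fetchone()[0]

print(duplicate_rejected, bob_smiths)
```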
5. Identify Attributes
A data attribute is an aspect or characteristic common to all or most instances of a particular
entity. In this step, you try to identify and name all the attributes essential to the business you
are studying without trying to match them to particular entities. The best way to do this is by
studying forms, files, and reports currently available and taking a note of each potential
attribute. Cross out extraneous items such as signatures and data that repeats (e.g., the
company name and address). If so indicated, cross out any attributes that are no longer used or
will not be used in the future. The remaining items should represent the attributes you need.
The attributes indicated in our case are the customer first name, customer middle name,
customer last name, street address, city, province, postal code, phone numbers, product name,
unit price, order date, and quantity of product ordered.
6. Map Attributes
For each attribute, you need to match it with exactly one entity. Often it seems like an attribute
should go with more than one entity (e.g., name). In this case, you need to add a modifier to
the attribute name to make it unique (e.g., customer name vs. product name). When an
attribute may belong to different entities, the rule of thumb is to determine which entity the
attribute describes “best.” For example, the attribute unit price logically belongs to the entity
product rather than order.
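The mapping step can also be written down as a simple lookup, which makes the "exactly one entity" rule easy to check. The following Python sketch is only an illustration: the variable names are hypothetical, the attribute names follow the figures in this chapter, and QuantityOrdered is deliberately left unmapped because it does not fit Customer, Order, or Product on its own.

```python
from collections import defaultdict

# Attributes found in the case, named as in the diagrams.
all_attributes = [
    "FirstName", "MiddleName", "LastName", "Address", "City", "Province",
    "PostalCode", "Phone", "ProductName", "UnitPrice", "OrderDate",
    "QuantityOrdered",
]

# A dictionary can hold only one value per key,
# so each attribute is matched with exactly one entity.
attribute_to_entity = {
    "FirstName": "Customer", "MiddleName": "Customer", "LastName": "Customer",
    "Address": "Customer", "City": "Customer", "Province": "Customer",
    "PostalCode": "Customer", "Phone": "Customer",
    "ProductName": "Product", "UnitPrice": "Product",
    "OrderDate": "Order",
}

attributes_by_entity = defaultdict(list)
leftovers = []  # attributes that no entity describes "best"
for attribute in all_attributes:
    entity = attribute_to_entity.get(attribute)
    if entity is None:
        leftovers.append(attribute)
    else:
        attributes_by_entity[entity].append(attribute)

for entity, attributes in attributes_by_entity.items():
    print(entity, attributes)
print("Leftover:", leftovers)
```

Any attribute that ends up in the leftover list is a hint that an entity may still be missing from the diagram.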
7. Add Foreign Keys
Foreign keys are not “genuine” attributes but are added to support relationships in a relational
database. The term “foreign key” is usually shortened to FK, as opposed to PK for “primary
key.” An FK is created by reading a relationship from a source (e.g., Customer) to a destination
(e.g., Order), and then “exporting” the PK from the source to the destination. There, this PK
becomes an FK.
In this example, the PK named CustomerID in the entity Customer becomes the FK named
CustomerID in the entity Order. A PK is always underlined (or marked in some other way, such as
boldfacing), while an FK is not so marked. We recognize that an attribute is an FK by following
the association line that starts from a PK and then looking at the attribute in the entity where
the line ends.
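The "exporting" of a PK can be sketched in code. The example below again uses Python's standard sqlite3 module as an assumed stand-in database, with invented order numbers and dates. Note one SQLite-specific detail: foreign keys are checked only after `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled

conn.executescript("""
CREATE TABLE Customer (
    CustomerID INTEGER PRIMARY KEY,   -- PK in the source entity
    LastName   TEXT
);
CREATE TABLE "Order" (
    OrderNumber INTEGER PRIMARY KEY,
    -- The PK of Customer is "exported" here and becomes an FK.
    CustomerID  INTEGER REFERENCES Customer(CustomerID),
    OrderDate   TEXT
);
""")

conn.execute("INSERT INTO Customer VALUES (1, 'Jones')")

# An order for an existing customer is accepted.
conn.execute('INSERT INTO "Order" VALUES (?, ?, ?)', (101, 1, "2024-01-15"))

# An order that points to a customer who does not exist is rejected.
try:
    conn.execute('INSERT INTO "Order" VALUES (?, ?, ?)', (102, 99, "2024-01-16"))
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

order_count = conn.execute('SELECT COUNT(*) FROM "Order"').fetchone()[0]
print(orphan_rejected, order_count)
```

The rejected insert shows what the association line in the diagram means in practice: every CustomerID stored in Order must match a CustomerID that exists in Customer.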
8. Draw a Fully Attributed Data Diagram (Schema)
If you have leftover attributes without corresponding entities, you may have missed an entity
and its corresponding relationships. Identify these, and add them to your list. In our example,
there is just one such “odd” attribute, the quantity of products being ordered: QuantityOrdered.
This attribute does not really belong to either Product or Order alone but to both, that is,
to a product that is on order. It calls for a “bridge” entity, technically called an association
entity, because the attribute belongs to the association between two entities rather than to
either of them. An association entity is represented in a separate table.
In our example, the association entity is OrderDetail, which contains the attribute
QuantityOrdered (see Figure 4). For instance, one order can contain 1 kg of apples and 1 kg of
oranges, which are represented by their ProductIDs. Another order contains 2 kg of apples, and so
on. Notice that an order can have one or more products. Also, the same product (apples) appears
on different orders. This possibility shows another purpose of an association entity: to interface
entities that both have the multiplicity of many (like Product and Order).
Entity Name (Type) | Primary Key | Foreign Key | Other Attributes
Customer (Person) | CustomerID | none | LastName, MiddleName, FirstName, Address, City, Province, PostalCode, Phone1
Order (Concept) | OrderNumber | CustomerID | OrderDate
Product (Thing) | ProductID | none | ProductName, UnitPrice
OrderDetail (Association Entity) | OrderNumber and ProductID (combines keys of associated entities) | OrderNumber and ProductID | QuantityOrdered
Figure 4. Attribute-Entity Mapping
An association entity is really a special case of entity with some unusual characteristics (call it
weird, if you wish). Because it does not stand on its own, it does not have a key of its own.
Rather, it takes keys from the associated entities. That is one weird thing. Another is that this
key takes two attributes, so it is sometimes called a combined key. In this example, it is
the combination of the primary keys OrderNumber and ProductID. And another weird thing:
the PK is created out of FKs! Check Figure 4.
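A combined key built from two FKs can be declared directly in a table definition. The sketch below, in Python with the standard sqlite3 module, replays the apples and oranges example. The order numbers and product IDs are invented sample values, only the columns needed for the illustration are included, and SQLite is an assumed stand-in for your database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE "Order" (OrderNumber INTEGER PRIMARY KEY);
CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, ProductName TEXT);
CREATE TABLE OrderDetail (
    OrderNumber     INTEGER REFERENCES "Order"(OrderNumber),  -- FK
    ProductID       INTEGER REFERENCES Product(ProductID),    -- FK
    QuantityOrdered REAL,
    PRIMARY KEY (OrderNumber, ProductID)  -- the PK is created out of the two FKs
);
""")

conn.executemany('INSERT INTO "Order" VALUES (?)', [(101,), (102,)])
conn.executemany(
    "INSERT INTO Product VALUES (?, ?)", [(1, "Apples"), (2, "Oranges")]
)
conn.executemany(
    "INSERT INTO OrderDetail VALUES (?, ?, ?)",
    [
        (101, 1, 1),  # order 101: 1 kg of apples
        (101, 2, 1),  # order 101: 1 kg of oranges
        (102, 1, 2),  # order 102: 2 kg of apples
    ],
)

# The same product cannot be listed twice on the same order.
try:
    conn.execute("INSERT INTO OrderDetail VALUES (101, 1, 5)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

products_on_101 = [
    name
    for (name,) in conn.execute(
        "SELECT p.ProductName FROM OrderDetail od "
        "JOIN Product p ON p.ProductID = od.ProductID "
        "WHERE od.OrderNumber = 101 ORDER BY p.ProductID"
    )
]
orders_with_apples = conn.execute(
    "SELECT COUNT(*) FROM OrderDetail WHERE ProductID = 1"
).fetchone()[0]

print(duplicate_rejected, products_on_101, orders_with_apples)
```

Neither OrderNumber nor ProductID is unique in OrderDetail by itself; only the pair is, which is why the two attributes together form the key.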
Using the steps covered above and Figure 4, we can complete the data diagram as in Figure 5. It
has all the entities, associations, and attributes; it is “fully attributed.”
The diagram in Figure 5 is called a schema. It shows entities that, technically speaking, are
implemented as tables in a relational database. If you implement the design discussed in this
case in MS Access, you will get the same tables we have arrived at. In MS Access, you can see a
schema with the function called Relationships. Note that MS Access will automatically write a
correct multiplicity provided that you draw relationships between tables by using a proper
procedure: when you get tables via the function Relationships, click a PK and, without releasing
the mouse button, draw a line to the desired FK in another table (e.g., draw a line between
CustomerID in table Customer and CustomerID in table Order).
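[Figure 5 shows four tables connected by association lines marked with multiplicities.
CUSTOMER: CustomerID (primary key), LastName, MiddleName, FirstName, Address, City, Province,
PostalCode, Phone1, Phone2, Email
ORDER: OrderNumber (primary key), OrderDate, CustomerID (foreign key)
ORDER_DETAIL: OrderNumber and ProductID (combined primary key), QuantityOrdered
PRODUCT: ProductID (primary key), ProductName, UnitPrice
Multiplicities: CUSTOMER 1 to M ORDER; ORDER 1 to M ORDER_DETAIL; PRODUCT 1 to M ORDER_DETAIL.]
Figure 5. Complete Diagram (Schema) of Customer Ordering System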
Note: Associations are not named in a schema. Names are part of the analysis process and help
you understand what is going on. If you include association names in a schema, that will not be
treated as a mistake. For example: Customer places Order; Order has Order_Detail; Product
is specified in Order_Detail.
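For readers who want to see the whole schema outside MS Access, the sketch below builds the four tables of Figure 5 with Python's standard sqlite3 module and then computes order totals by combining QuantityOrdered with UnitPrice. The data types, the unit prices, and the other sample values are assumptions made for the illustration; they are not given in the case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Customer (
    CustomerID INTEGER PRIMARY KEY,
    LastName TEXT, MiddleName TEXT, FirstName TEXT,
    Address TEXT, City TEXT, Province TEXT, PostalCode TEXT,
    Phone1 TEXT, Phone2 TEXT, Email TEXT
);
CREATE TABLE "Order" (
    OrderNumber INTEGER PRIMARY KEY,
    OrderDate   TEXT,
    CustomerID  INTEGER REFERENCES Customer(CustomerID)
);
CREATE TABLE Product (
    ProductID   INTEGER PRIMARY KEY,
    ProductName TEXT,
    UnitPrice   REAL
);
CREATE TABLE Order_Detail (
    OrderNumber     INTEGER REFERENCES "Order"(OrderNumber),
    ProductID       INTEGER REFERENCES Product(ProductID),
    QuantityOrdered REAL,
    PRIMARY KEY (OrderNumber, ProductID)
);
""")

# Invented sample data; parent rows go in before the rows that reference them.
conn.execute(
    "INSERT INTO Customer (CustomerID, LastName, FirstName) "
    "VALUES (1, 'Jones', 'John')"
)
conn.executemany(
    "INSERT INTO Product VALUES (?, ?, ?)",
    [(1, "Apples", 3.0), (2, "Oranges", 4.0)],
)
conn.executemany(
    'INSERT INTO "Order" VALUES (?, ?, ?)',
    [(101, "2024-01-15", 1), (102, "2024-02-03", 1)],
)
conn.executemany(
    "INSERT INTO Order_Detail VALUES (?, ?, ?)",
    [(101, 1, 1), (101, 2, 1), (102, 1, 2)],
)

# Order and Product are associated through Order_Detail.
order_totals = conn.execute(
    "SELECT od.OrderNumber, SUM(od.QuantityOrdered * p.UnitPrice) "
    "FROM Order_Detail od JOIN Product p ON p.ProductID = od.ProductID "
    "GROUP BY od.OrderNumber ORDER BY od.OrderNumber"
).fetchall()
print(order_totals)
```

The total for each order needs data from two places, the quantity stored in Order_Detail and the price stored in Product, which illustrates why each attribute is kept with the entity it describes best.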
9. Check Your Results
Look at your diagram from the point of view of a person who is familiar with the situation,
form, or process being modeled. Is everything clear? Also, look over the list of attributes
associated with each entity to see if anything has been omitted.
The diagram we have arrived at, the schema, shows entities as tables. You see a schema when you
get the Relationships report in MS Access. The association entity Order_Detail becomes the table
Order_Detail, and the tables Order and Product are associated through it.
Notice how associations connect the primary and foreign keys. The underlined attribute is the
primary key. The foreign key is not underlined even though it plays a role in establishing
associations between tables.
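If the tables live in a database rather than on paper, the keys can be listed by the database itself, which is a handy way to check a schema. The sketch below is one way to do this in SQLite from Python; the helper names `primary_key` and `foreign_keys` are hypothetical, while `PRAGMA table_info` and `PRAGMA foreign_key_list` are SQLite's own commands. Other database systems offer different tools for the same purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, LastName TEXT);
CREATE TABLE "Order" (
    OrderNumber INTEGER PRIMARY KEY,
    OrderDate   TEXT,
    CustomerID  INTEGER REFERENCES Customer(CustomerID)
);
CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, ProductName TEXT);
CREATE TABLE Order_Detail (
    OrderNumber     INTEGER REFERENCES "Order"(OrderNumber),
    ProductID       INTEGER REFERENCES Product(ProductID),
    QuantityOrdered REAL,
    PRIMARY KEY (OrderNumber, ProductID)
);
""")


def primary_key(connection, table):
    """Return the PK column names of a table, in key order."""
    rows = connection.execute(f'PRAGMA table_info("{table}")').fetchall()
    # Each row is (cid, name, type, notnull, default, pk); pk > 0 marks key columns.
    key_rows = sorted((row for row in rows if row[5] > 0), key=lambda row: row[5])
    return [row[1] for row in key_rows]


def foreign_keys(connection, table):
    """Return (FK column, referenced table, referenced column) triples."""
    rows = connection.execute(f'PRAGMA foreign_key_list("{table}")').fetchall()
    # Each row is (id, seq, table, from, to, on_update, on_delete, match).
    return sorted((row[3], row[2], row[4]) for row in rows)


order_pk = primary_key(conn, "Order")
order_fks = foreign_keys(conn, "Order")
detail_pk = primary_key(conn, "Order_Detail")
detail_fks = foreign_keys(conn, "Order_Detail")

print(order_pk, order_fks)
print(detail_pk, detail_fks)
```

For Order_Detail, the same two columns show up both in the primary key and in the list of foreign keys, which matches the description of the association entity above.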
Summary
Data analysis represents a part of a business, such as a business process, using the concepts of
data entities, attributes belonging to an entity, and associations between entities.
We use nouns to talk about entities and verbs to indicate associations. Although attributes are
also indicated by nouns, an attribute is a part of an entity.
Special attributes are the primary key (PK) and the foreign key (FK). A PK uniquely identifies
each instance of an entity. An FK is an attribute that is the PK in another table, and it is used
to establish associations between entities implemented in a relational database. Two tables get
associated by linking a PK in one table with an FK in another table.
A special sort of entity is the association entity. It results from a relationship between two
entities, and so it does not have a PK of its own. Rather, it combines FKs that reference the
associated tables to create its PK. The role of an association entity is to store attributes that
do not belong to either of the associated entities but to their relationship. These attributes may
vary whenever those entities get associated.
In a relational database, entities are implemented as tables. A diagram of tables, along with
their attributes and associations, is called a schema. Everything that applies to entities applies
to tables as well.
Questions for Review
1. What are the differences between entity, attribute, and association? Give examples.
2. When you read a description of some business situation, how can you recognize an entity as
opposed to an attribute? And how can you recognize an association? Give examples.
3. What is special about an association entity? Provide an example.
4. How do PK and FK help tables to get associated?
5. What is a schema? What are the main items in a schema?
Exercise
Completing the following exercise will help you to:
• Practice identifying entities, attributes, and relationships
• Practice creating a schema (tables diagram).
The Speed Skating Canada Performance Tracking System
Speed Skating Canada (SSC) maintains performance records for Canadian speed skaters. Each
performance record is identified by a number, and it must clearly identify a skater, the
competition the skater was in, and the skater’s finishing position and time scored.
SSC also maintains skater details, such as name, gender, and age. Each skater has a number of
performance records.
In addition, SSC records competitions by name, date, weather conditions, and the number of
competitors. Each competition generates a number of performance records.
Create a schema for SSC’s speed skater performance tracking. Do not forget to specify primary
and foreign keys!