Entity-relationship model Introduction So far in the course, we have studied a lot about extracting useful information from an already-existing database. However, we have only barely brushed up against getting data into a database. The next segment of this course will focus on this latter problem: how to design a useful database schema, and how to load (and alter) data in a database. One of the most important steps in creating a database happens before ever touching a database engine: how to model the slice of the real world that the database will eventually represent. A good database design balances detail and simplicity, providing sufficient detail to answer the kinds of questions that will be asked while providing a simplified view of the world that makes those questions reasonably easy to ask and the answers reasonably easy to interpret. To give a concrete example, suppose we had been tasked to create a database back-end for a CAD (computer-aided design) tool that specializes in the automotive industry. Clearly, the kinds of “car database” examples we’ve used in class would be woefully inadequate: a CAD engineer will not care in the slightest about VIN numbers and license plates (other than where they will be found in the final product); the engineer will care very much about the engine and wheels, but will demand far more detail than anything you would find in a car lot advertisement. At the other extreme, a moleculeaccurate model of the car is also not useful to a car designer: nobody wants to manipulate individual molecules and atoms when designing something you can see with the naked eye! Instead, a useful database would probably store a collection of polygons in 3-D space, with adequate support to find and group polygons in various ways (by the part they correspond to, the location relative to a virtual camera’s field of view, etc.). Since very few database designers are CAD experts, any real database design for our imaginary application would have to start by exploring with tool developers what, precisely, they need the database to provide for them; similarly, since the CAD tool developers are not database experts, they would also need some guidance in expressing what they want in terms of what a database engine can possibly do (we really can’t do 3-D rendering, for example). In practice, this early design process is iterative, and proposed designs can change rapidly as people explore different options. Some form of visual representation, or model, of the problem is extremely useful to facilitate the design process and allow participants to express their thoughts clearly; we will focus on a particular model known as the Entity-Relationship Model (ERM). Entity-relationship model The ER model consists of three basic concepts, illustrated in two different styles below: name title Actors address • • • StarsIn Movies role year Entity set [rectangle] – represents a type of object we wish to model, such as a “customers,” “retailers” or “landmarks.” Each entity set can contain zero or more entities; “Canadian Tire” and “Tim Hortons” might be entities from the entity set “Retailers.” Entities almost always correspond to nouns in plain English descriptions of a design. 1 Relationship set [diamond] – represents a type of relationship or link between two or more entity sets. Relationship sets often correspond to verbs (e.g. entities doing something to other entities), but sometimes as nouns as well; many relationships are common enough—and important enough—that they have their own name. Examples include “friendship,” “rivalry,” and “marriage.” Note that such relationship nouns can usually be converted into verbs, such as “friends with” or “married to.” Attribute [oval] – represents a piece of information or detail about an entity or relationship. For example, color is an attribute of many objects, while starting and ending dates are common attributes of relationships. Attributes are often associated with adjectives and quantities in plain English: colors (red, green, blue, etc.) or salary ($10,000, $50,000, etc.) are good examples. As you can see from the above descriptions, there is some overlap between the concepts: some relationships are almost important enough to be entities in their own right; the same is true for some types of attributes, such as addresses. This flexibility is good because it allows different designs to model the real world in different ways and at different levels of detail. We will come back to this issue later on. Entity vs. entity set One point to be careful of: an entity set is a class of things, while an entity is one object of the class. For example, “Batman Returns" would be an entity from the entity set “movies.” Similarly, CSCC43 would be an entity from the entity set "courses." A similar distinction exists for relationships vs. relationship sets; Romeo and Juliet as well as Anthony and Cleopatra would both be relationships from the relationship set "ill-fated romances." In relational algebra these would correspond to relations and tuples, while in object oriented programming these would correspond to classes and objects or instances. Recursive relationships 1 The converse is not always true: not all nouns should be represented as Entities! Many relationship sets join two or more entity sets, but some join the same entity set twice. These are known as recursive relationships. In some relationships the two entities involved are symmetric, such as friends; in other cases the relationship is asymmetric, such as employee and boss. In the last example, both employee and boss are people, but there's a big difference between them as far as the relationship is concerned. Another example is shown on the right: two employees can be colleagues, and we don't usually distinguish between the two. However in the case of sovereign succession, it matters very much was the predecessor and two is the successor. N-way relationships Many relationships involve more than two entity sets. Roles For example, we may need to model actors in movies using a ternary relationship if we allow actors to play multiple roles. A binary relationship between actors and movies may not suffice, especially if the roles being Actors StarsIn Movies played span multiple movies and have been promoted to proper entities. As another example, each line item from the TPC-H schema involves an order, a part, and a supplier. In practice, it's rare to see anything more complicated than three-way relationships, but in theory any number of entity sets can be involved in a relationship set. Attributes Attributes describe elementary properties of entities or relationships (e.g., Surname, Salary and Age are attributes of Employee, as shown on the right). Attributes are used to describe details, aspects of the relationship or entity set that are not important enough to be entities their own right. For example, most of the time we don't care to track age 10 vs. age 12 separately. In the student example shown here, both entities and relationships have attributes. Note that the set of attributes we choose to add or model depends strongly on the application. Under some circumstances it might be very relevant to store the student's name along with their student number and enrolment date. Similarly, if all courses always had exactly one exam, it might not be very important to track the date the exam is given. Sometimes, it's attractive to model groups of related attributes as composite values. The example of the right groups the different parts of an address into a single composite attribute. It might also make sense to group surname and given name into a single composite name field. You should always be careful about using composite attributes, however, because they can also be a warning sign that you're trying to represent an entity set with attributes. Many databases track addresses as entities in their own right, for example. In the end, it's up to the database designer to decide whether the composite attribute is important enough to promote to an entity set. Putting it together The figure above gives a more complete example of using entities, relationships, and attributes to model a company hierarchy. Observe that employees and departments can be involved in different kinds of relationships: all employees are presumably members of a department, and some employees also manage departments. Constraints What important purpose of modeling a database is to identify the constraints that the database should impose on the data stored in it. The most common constraints involve cardinality: how many times a given entity may, or must, participate in a given a relationship. In the previous example of a company hierarchy, it would be reasonable to enforce that every employee must belong to one department, and no more than one department. It might also be reasonable to enforce that no employee may manage more than one department, while allowing most employees not to manage any departments. We usually represent cardinality constraints using a pair of numbers in parentheses: (m,n). In this case, m represents the minimum cardinality, and n represents the maximum cardinality. If M is zero, the relationship is optional, in the sense that an entity need not participate. If M is one or greater, the entity must participate at least that many times. N gives a maximum cardinality, and prevents an entity from participating too many times. In the example to the right, an order can participate in at most one sale, and every invoice is tied to a sale. Cardinality restrictions apply to the entities, not the relationship itself: by definition every sale involves exactly one order and one invoice. The only question is whether orders and invoices participate in sales. It's up to the database designer to decide what cardinalities make sense. It probably doesn't make sense to have a city with no inhabitants residing in it, for example, but a database engine would be happy to enforce the constraint if we asked it to. Technically, it is possible to have multi valued attributes, or attributes which can occur more than once in an entity or relationship. It's entirely possible for students to have more than one e-mail address, for example. In general, you should avoid using multi valued attributes, and convert them into entities instead, because the relational can't express multi valued attributes. Classes address The ER model provides a powerful feature similar to object oriented People name languages, where a given entity set can be refined or extended with subclasses. Subclasses in the ER model are not exactly the same as in the isa isa object oriented model, however. In object oriented languages, a given object is always a member of exactly one class, while the ER model Actors Directors permits a given entity to be a member of multiple classes simultaneously. An "actor" entity is also a "people" entity, and is allowed to be a "director" entity as well. In fact, there is no need for the different classes to even form a hierarchy. If a movie database needs to track animal actors, for example, it would make sense not to have the actors class descend from people. That way, you can have an actor who is a dog as well as an actor who is a person. In the general case, it's best to think of classes in ER as auxiliary information that can be added to other entities. Someone can be an actor, a student, and a piano player, all at the same time, and the attributes of all those classes will simply be unioned together. Try to express that in the object oriented model! Keys and a weak entity sets Just like in relational algebra, the ER model has a concept of keys. The key of an entity set is a set of attributes that uniquely identifies each possible entity in the set. The key of a relationship set is the union of keys from entities participating in the relationship. name number Players name PlaysFor (1, 1) (0, N) Teams Sometimes, the attributes of an entity are not enough to distinguish it from other name entities. For example, a player on a sports (1, 1) team is normally identified by jersey In (0, N) number, but nothing prevents two players Subdomain Subdomains on different teams from having the same (1, 1) name jersey numbers. It's actually the In combination of jersey number and team Domain Domains name that uniquely identifies a player. In (0, N) this case, we would call the players entity set a "weak entity set" because the jersey number only weakly identifies the entity. Weak entity sets can be chained: a weak entity to be related to yet another weak entity and so forth. In the example to the left, knowing the host name of a machine on a network does very little to identify that machine globally, unless we include information about subdomain and domain. In general, we can chain together any number of weak entities, as long as we do not form any cycles among them. As we will see later, there's a strong relationship between weak entity sets in ER and foreign keys in relational algebra. Hosts name
© Copyright 2026 Paperzz