Data Management: Databases and Organizations
Richard Watson
Summary of Chapters 3-6 prepared by Kirk Scott

Data Modeling and SQL
• Chapter 3. The Single Entity
• Chapter 4. The One-to-Many Relationship
• Chapter 5. The Many-to-Many Relationship
• Chapter 6. One-to-One and Recursive Relationships

Introduction
• Large parts of these overheads will be somewhat repetitive
• They cover in general terms some of the things that were specifically illustrated by concrete SQL examples
• However, the repetition shouldn’t be harmful
• It should put the examples into a broader context, and add new examples to flesh the ideas out
• The ultimate goal is for the basic concepts and diagramming to be clear so that there will be no trouble considering design questions in unit 14

Chapter 3. The Single Entity
• The author starts with the entity relationship diagramming conventions and the concept of a single entity
• The author represents an entity with a box containing its name in capital letters at the top
• Full field names are given below the name in lowercase letters
• The primary key field is marked with an asterisk

• Different diagramming conventions are perfectly acceptable, as long as you are consistent
• The name of the entity may be given above the box representing it
• You may choose to capitalize just the first letter of the name

• In theory, you could qualify field names, although this would be redundant, given the entity name at the top
• You could also use short names for fields if space is at a premium
• Primary keys could be marked with pk or underlined

Chapter 4. The One-to-Many Relationship
• The author uses the crow’s foot to mark a one-to-many relationship in an ER diagram
• In a simple ER diagram fields may not be listed, just entity names and crow’s feet
• In a more complete diagram, fields can be listed

• The author does not include the embedded pk/fk in the list of fields in the fk/many table because it is redundant
• I do not follow this convention
• I believe that in the interests of clarity it is worthwhile to include the fk in the list of fields

The Book Doesn’t Show the Foreign Key Field; It’s Implicit

An ER Diagram Plus a New Form of Schema Diagram (with Explicit FK)

Chapter 5. The Many-to-Many Relationship
• As is known, the many-to-many relationship is the most “complicated” of the relationships
• The book presents some interesting examples that arise in real situations
• They illustrate ideas that are not immediately apparent from the examples that have gone before
• The first example is based on a bill of sale, shown on the next overhead

The Bill of Sale Example: An Interesting Case of a pk/fk Relationship

• The book analyzes this situation as consisting of two base entities: a sale and the items which are sold
• There is a many-to-many relationship between these base entities because each sale can consist of many items
• Also, each item can be present in many sales
• The book’s ER diagram for this analysis is shown on the next overhead

+ Means saleno is part of the primary key of LINEITEM

• LINEITEM contains every line of every (bill of) sale

itemno identifies a kind of item, not an individual item

• A given kind of item may appear on many different sales

itemno is not part of the primary key of LINEITEM

• A given item may appear on many different lines of a given sale
• Practically speaking, more of a given item can be added to a sale by adding a new line to LINEITEM rather than modifying an existing LINEITEM

• LINEITEM is not a pk/fk-pk/fk table in the middle
• This is a practical solution to a real-world data modeling problem
• It is not a theoretically minimalist representation of relationships

• When first introducing many-to-many relationships, I referred to the table in the middle
• More formally, the book refers to an associative entity
• The associative entity is the table in the middle that captures the relationship between two base entities

• In the ER notation for this example the + sign is used
• This has not been seen before
• For the purposes of understanding the book’s example, it is important to know what this means

• The + sign is shown over a crow’s foot
• It symbolizes the fact that the embedded fk is part of the pk of the table it’s embedded in
• You have seen an example of a table in the middle where the pk is the concatenation of the two embedded fk’s
• This example is not the same as that

• In this example the saleno is the pk of the Sale table
• It is embedded as a fk in the Lineitem table
• A saleno value will appear in the Lineitem table as many times as there are separate lines belonging to the sale
• These separate lines are identified by lineno’s
• The lineno’s are not embedded fk’s; they are not based on the itemno’s, the unique identifiers of entries in the Item table

• An alternative way of representing the relationship would be to list the fields of the table in the middle this way:
  – saleno pk, fk
  – lineno pk
  – itemno fk
  – lineqty
  – lineprice
• Note again that the saleno is both a pk and a fk, while the lineno is purely pk

• At first glance it may seem a little strange, but the table in the middle contains every line of every sale, listed separately
• It is the saleno and the lineno together which uniquely identify the entries in the Lineitem table
• This model actually reflects reality well
• It differs, in particular, from the car sale example

• In the car sale example, there were individual cars that were sold
• In the example database they were only shown as being sold once
• In reality, the same car might be sold more than once
• This could be modeled by making the salesdate part of the pk of the Carsale table

• In the Sale, Lineitem, Item example, the items are not actually individual items
• An item is a kind of item, like a screw or a shovel or a microwave oven
• The seller may have many of each kind of item in stock and doesn’t distinguish between individual items

• Multiple instances of the same (kind of) item may be sold to the same customer
• Also, the same (kind of) item can be sold to more than one customer
• It’s not incredibly difficult, but it’s worth emphasizing that the itemno does appear in the table in the middle as a fk
• This tells which item that line of a sale refers to
• However, the itemno is not part of the pk of the table in the middle

• In a perfect world, you might argue that each item should appear on only one line of a sale
• If so, then you could dispense with individual line numbers and use the itemno as part of the pk instead
• However, reality makes the given solution better

You Want to Support Customer Decisions in Mid-Stream
• When creating a data model, you should make it flexible enough to accommodate all possibilities
• Could a customer, in the middle of making a purchase, decide that more instances of a certain item were desired?
• If so, do you allow this, and how do you support it?
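
A Sketch of the LINEITEM Key in Code (Illustrative Only)
• The key structure described above can be tried out directly; what follows is a minimal sketch under stated assumptions, not the book’s own code
• It assumes Python’s built-in sqlite3 module and an in-memory SQLite database
• The fields saleno, lineno, itemno, lineqty and lineprice come from the field list above; saledate, itemname, the table lineitem_by_item and all of the sample rows are hypothetical, added only for illustration

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces fk's when asked to

conn.executescript("""
CREATE TABLE item (
    itemno   INTEGER PRIMARY KEY,
    itemname TEXT NOT NULL              -- hypothetical descriptive field
);
CREATE TABLE sale (
    saleno   INTEGER PRIMARY KEY,
    saledate TEXT NOT NULL              -- hypothetical descriptive field
);
-- The design from the overheads: saleno is pk and fk, lineno is purely pk,
-- itemno is purely fk.
CREATE TABLE lineitem (
    saleno    INTEGER NOT NULL REFERENCES sale(saleno),
    lineno    INTEGER NOT NULL,
    itemno    INTEGER NOT NULL REFERENCES item(itemno),
    lineqty   INTEGER NOT NULL,
    lineprice REAL    NOT NULL,
    PRIMARY KEY (saleno, lineno)
);
-- The "perfect world" alternative: itemno is part of the pk, no line numbers.
CREATE TABLE lineitem_by_item (
    saleno  INTEGER NOT NULL REFERENCES sale(saleno),
    itemno  INTEGER NOT NULL REFERENCES item(itemno),
    lineqty INTEGER NOT NULL,
    PRIMARY KEY (saleno, itemno)
);
""")

# Made-up sample data: one kind of item and one sale.
conn.execute("INSERT INTO item VALUES (1, 'shovel')")
conn.execute("INSERT INTO sale VALUES (100, '2024-01-15')")

# The customer first buys 10 shovels, then decides on 10 more.
# With (saleno, lineno) as the key this is simply a second line.
conn.execute("INSERT INTO lineitem VALUES (100, 1, 1, 10, 12.50)")
conn.execute("INSERT INTO lineitem VALUES (100, 2, 1, 10, 12.50)")

total_qty = conn.execute(
    "SELECT SUM(lineqty) FROM lineitem WHERE saleno = 100 AND itemno = 1"
).fetchone()[0]

# With (saleno, itemno) as the key, a second row for the same item is refused.
conn.execute("INSERT INTO lineitem_by_item VALUES (100, 1, 10)")
try:
    conn.execute("INSERT INTO lineitem_by_item VALUES (100, 1, 10)")
    second_row_rejected = False
except sqlite3.IntegrityError:
    second_row_rejected = True

print(total_qty, second_row_rejected)
```

• In the first design the extra shovels are just another row; in the alternative design the seller would have to go back and modify the earlier row for that item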
• From a business point of view, few things are more destructive than a computer system whose model imposes artificial constraints on the user (seller and customer)
• Of course, if a customer decides that more instances are desired, you want to sell them

• Have you ever heard things like these:
• “I’d like to let you buy more, but the computer won’t allow it.”
• “I’d like to let you buy more, but it will be necessary to start a completely new bill of sale.”
• “I’d like to let you buy more, but it will be necessary to go back and modify the earlier line of the sale for that item.”

• In any of the previous scenarios, both the customer and the salesperson want to scream
• The best scenario would go like this:
• “Oh, you want 20 instead of 10? We’ll just add another line here at the bottom for another 10.”
• Now everybody sighs with satisfaction…

The Set and Logical Operators in SQL Form an Algebra
• SQL has logical operators like AND, OR, NOT
• Similarly, there are set operators like UNION
• Although Microsoft Access SQL doesn’t support INTERSECT, some implementations do
• Taken together, these elements form the basis for an algebra

The Cartesian Product is an Algebraic Product, Which has an Inverse
• The Cartesian product represents a form of multiplication for relations
• The results of a join operation are a subset of the results of a product
• In an algebraic system, the existence of a multiplication operation suggests the existence of a corresponding division operation

For All/Double Not Exists Accomplish Relational Division (Invert the Product)
• As pointed out when doing the concrete SQL examples, there is no FOR ALL operator
• However, double NOT EXISTS can accomplish the same thing
• FOR ALL/double NOT EXISTS is roughly analogous to division in a relational system
• Before we’re finished with SQL we will see queries which are actually stated in terms of division

• This is the point where Watson takes up the case of double NOT EXISTS
• The book shows an ER diagram of 3 tables capturing a many-to-many relationship
• This diagram is labeled generically, but it is of the same structure as the Lineitem example

• It then outlines the double NOT EXISTS query that could be written for it
• The fact that this models the Lineitem example is not important
• The table in the middle could have a completely concatenated primary key
• It could also have its own, separate primary key

In a Three-Way Relationship, the Tables are: Target, Target-Source, and Source
• The important point is that the base tables are at the ends of the ER diagram
• The book refers to these as target and source, respectively
• The table in the middle, the associative entity, is labeled Target-Source by the book

A Diagram and Query for a LINEITEM-Like Design

The Order of the Tables in the Query
• If you want to find those rows of the target which are related to all of the rows of the source,
• Then in the double NOT EXISTS query:
  – The target appears first, in the outermost query
  – The source appears second, in the middle, in the first nested subquery
  – And the table in the middle appears last, in the second nested subquery

Translation of the Query: Find the Sales that Included Every Item
• If the table in the middle were a Cartesian product, it would match every sale with every item
• The table in the middle isn’t necessarily the Cartesian product
• The query will find only those sales which were matched with every item

Remember, this Example has been a Review of a (Simple) Many-to-Many Relationship
• The next example will illustrate the inclusion of more relationships in a design

A Design with a Cycle
• The next diagram illustrates a design containing a cycle
• Such designs will become especially important when considering normalization, the theory of correctness in designs
• For the time being simply note that there is nothing preventing designs with cycles

Two Many-to-Many Relationships with Others…

A Concatenated Key with Date
• The next example design is one where both of the embedded foreign keys are part of the primary key of a table in the middle
• However, it is more complicated than that because a date field is also included in the primary key
• This allows the same pair of base values to be paired with each other more than once

Each Customer and Magazine Can Be Paired with Each Other More Than Once

A Simple Concatenated Key
• The next design is actually somewhat simpler
• It also has two embedded pk/fk’s in the table in the middle
• The table in the middle isn’t pure key though
• There is also a non-key attribute field for the table in the middle

This is Actually Simpler: One Gift per Donor per Year

The Music CD Library: A Larger Database Design Example
• Some of the design examples given so far from chapters 3 and 4 could be parts of a simple database for a collection of music CD’s
• At the end of chapter 5, with the capability to model many-to-many relationships, the author expands this example

• On the next overhead an 8-entity design is shown
• Note that 4 of the 8 entities can be classified as associative entities
• These are the entities: CD, Composition, Label, Person, Person-CD, Person-Composition, Person-Track, Track

• You can say you understand the model if:
• You can define what each entity means
• You can define what each relationship means

Extending the Model to Match Reality Better (This Could Make You Dizzy)
• The next overhead shows the music CD design blossoming further
• The Person-Track table has been removed
• Recording and Person-Recording tables have been added
• In the book, the new relationships are analyzed
• I will not list the analysis here
• The new design reflects additional assumptions and capabilities
• The new design should be a better model of reality, with fewer exceptions and more flexibility

Chapter 6. One-to-One and Recursive Relationships
• It should be clear what one-to-one relationships are
• The book uses the term recursive relationship for those cases where a table is in a relationship with itself

One-to-One Relationships
• You may recall some of the different options for capturing one-to-one relationships
• If the relationship is truly one-to-one in all cases at all times, then the two entities can be stored as a single relation
• Otherwise, you end up embedding the pk of one entity as a fk in another

Model 1-1 as pk Embedded as fk and Monitor Data Integrity
• Maintaining this as a one-to-one relationship then becomes a question of data integrity
• When choosing which pk to embed as a fk, you should take into consideration any possible exceptions or changes in the relationship in the future
• The book has a number of examples which illustrate details of this concept

A Design Starting Point: A Diagram Which is Not ER
• The book’s examples start with a company with a two-level management hierarchy
• There are bosses of departments and there is an overall managing director
• The (non-ER) diagram on the following overhead illustrates this

Departments Have Managers; Managers Have a Boss

• Next the book shows an ER diagram illustrating that departments have employees and that departments have bosses
• A garden variety crow’s foot doesn’t have to be labeled
• A one-to-one relationship should be labeled

One-to-One Arcs Have to be Labeled; Which Way Does the Embedding Go?
• The foregoing diagram doesn’t explicitly show whether the pk of Dept is embedded as a fk in Emp or vice-versa
• In this case it is likely that the pk of Emp is embedded as a fk in Dept
• This is because, all else being equal, every department will have a boss
• However, few employees will be bosses
• There would be lots of nulls if there were a “department which you’re the boss of” field in Emp

A One-to-Many Recursive Relationship: A Table in a Relationship with Itself
• Next, the book considers recording which employee is which other employee’s boss
• This leads to what the book calls a recursive relationship
• This is when there is a one-to-many relationship between a table and itself
• Such a one-to-many relationship should be labeled because the meaning of the embedding would not necessarily be clear
• An ER diagram illustrating this follows

Many Employees Have One Other Employee Who is Their Boss (The Relationship is Labeled)

Question 1: Is an Employee’s Boss the Boss of the Employee’s Department?
• The previous design may not be ideal
• If every employee is assigned to a department, it would seem that the employee’s boss would be the boss of that department
• At first glance, at the very least, this appears to be redundant
• Redundancy means that information is repeated, and it opens up the possibility of inconsistencies between the repeated representations of the same data

Question 2: Are Bosses Members of the Departments They’re Bosses of?
• However, this is another problem that arises from real life
• Ask yourself, what departments are the bosses of departments assigned to?
• For example, if “Bob” is the head of Marketing and his department is listed as Marketing, is he his own boss?
• It should be apparent that his boss is the managing director

The Design Contains Apparent Redundancies, but is Flexible
• Another detail that might be considered is split assignments or temporary assignments
• If an employee is split 50-50 between departments, who is their boss?
• If an employee is only temporarily assigned to a department, who is their boss?
• The apparently redundant design allows such cases to be handled with full flexibility

A One-to-One Recursive Relationship that Forms a Linked List
• The next example the book pursues is a little artificial
• However, something like it might arise in real life, and this provides an introduction to the idea
• It is possible for there to be a one-to-one relationship between a table and itself

The Monarchs of England
• The following overhead illustrates the idea with the succession of monarchs
• The idea is that the pk of the monarch table is embedded as a fk in the same table
• Every monarch except the first has the previous monarch recorded
• The problem could also be solved by simply recording a numbering for the monarchs

Just Assigning a Reign Number Might Be Clearer and Easier

A Many-to-Many Recursive Relationship
• The next example considers a table in a many-to-many relationship with itself
• This is another example drawn from real life which is very instructive about how relational databases work
• It is helpful because it brings out one of the limitations of relational databases
• It provides insight into the subject of object-oriented databases

• Whenever a table is in a relationship with itself, the book refers to this as a recursive relationship
• As far as I’m concerned, the use of the term recursive is optional, although descriptive
• I am just as happy in this context with saying “in a relationship with itself”
• In any case, consider the ER diagram on the next overhead and the explanatory remarks that follow

This is Something Practical, Not Artificial

• The idea is that the Product table contains entries for stand-alone products (possible sub-products) and for products (super-products) that consist of collections of other products
• Potentially the Product table might also contain things (sub-products) which themselves aren’t even individual products, but which only exist as components of finished products

• The Assembly table is the table which shows the relationship between products and sub-products (whether those sub-products have an independent existence or not)
• Notice that both of the crow’s feet in the diagram have + signs on them
• This means that the pk of an assembly is the concatenation of the embedded fk’s of a (super) product and a (sub) product

• In addition, the Assembly table has a quantity field, telling how many of the sub-product there are in the super-product
• If you assume that this is just a two-level hierarchy with super-products and sub-products, things seem relatively clear
• However, both from a database point of view and a real life point of view, there is no need for this restriction to apply

• There is no reason why a given product might not consist of several other (sub) products
• Each of these (sub) products, in turn, might be a super-product consisting of other sub-products, and so on
• Now the descriptiveness of the term recursion becomes apparent

• There is no theoretical limit on how deeply things might be related in this kind of “has-a” relationship
• Practically speaking, the only limit is how many rows there are in the Product table
• This last claim leads to one more observation

• Data integrity would require that no product be a super-product or sub-product of itself, directly or indirectly
• Otherwise you would have a containment cycle
• It seems apparent that in real life this shouldn’t occur

This is the Relational DB Way of Capturing a Tree-Like Set of Relationships
• The product-assembly relationship crops up reasonably frequently in real life
• If you think about it, what’s really being captured is a tree-like containment structure
• Manufacturing is a problem domain where this is relevant

• Working from the top down, a car has various components, including doors
• Doors may be made of a variety of panels, among other things
• The panels may consist of various items, including screws
• And so on, down the line

• The given relational design works, to a certain extent, but it has shortcomings
• For example, it is not necessarily an easy way to understand or a natural way to envision tree-like relationships
• In particular, consider what you know about SQL and what kind of query you might like to execute against products and assemblies

An SQL Query Can Go One Level Down the Tree; It Can’t Go Arbitrary Levels Down
• SQL is non-procedural
• For a given product you could ask for all of its immediate sub-products or sub-assemblies
• However, with the SQL covered so far, it would not be possible to form a single query that would retrieve all of the constituent parts of a given product to an arbitrary depth
• Each additional level requires another join of Assembly with itself, so SQL as presented here won’t allow you to travel all the way “down the tree”

Object-Oriented Databases Are Inherently Tree-Like
• It is these problems that led, at least in part, to the development of what are known as object-oriented databases
• In essence, O-O databases are constructed around tree-like containment

Object-Oriented Databases Are Valuable, but They are a Niche Only
• Although extremely useful in some problem domains, it is estimated that O-O db’s have about 5% of the commercial market
• The remaining 95% is relational because relational db’s are applicable and convenient in so many other problem domains

The CD Music Library Again
• The chapter concludes with the latest version of the CD music library
• It illustrates several points
• Although the ER diagram is useful for getting the big picture, it’s becoming clear that without written text explaining the problem and the assumptions made, you haven’t completely and clearly documented what’s going on

• This example illustrates another point, which is also relevant to the final project
• You might have thought that a CD music library was a pretty simple, toy application
• Notice that it has grown to 13 tables, twice as many as you’re required to have for your project

• It is likely that before you’re finished with your project, you will be simplifying the problem you tackled so that you meet the minimum requirements without inviting too much trouble for yourself

• The previous version of the design had these tables: CD, Composition, Label, Person, Person-CD, Person-Composition, Person-Recording, Recording, Track
• This latest version has these tables added to it: Group, Group-CD, Group-Recording, Person-Group
• The ER diagram is shown on the following overhead

The End
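
Supplementary Sketch: The Double NOT EXISTS Query for “Sales that Included Every Item”
• The table order described earlier (target outermost, source in the first nested subquery, table in the middle in the second nested subquery) can be written out as follows
• This is a minimal sketch under stated assumptions, not the book’s query: it assumes Python’s built-in sqlite3 module, Sale as the target, Item as the source and Lineitem as the table in the middle
• The field itemname and all of the sample rows are made up for illustration

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sale (saleno INTEGER PRIMARY KEY);                  -- target
CREATE TABLE item (itemno INTEGER PRIMARY KEY, itemname TEXT);   -- source
CREATE TABLE lineitem (                                          -- table in the middle
    saleno INTEGER NOT NULL REFERENCES sale(saleno),
    lineno INTEGER NOT NULL,
    itemno INTEGER NOT NULL REFERENCES item(itemno),
    PRIMARY KEY (saleno, lineno)
);

-- Made-up data: two kinds of item, three sales.
INSERT INTO sale VALUES (1), (2), (3);
INSERT INTO item VALUES (10, 'screw'), (20, 'shovel');
INSERT INTO lineitem VALUES
    (1, 1, 10), (1, 2, 20), (1, 3, 10),   -- sale 1: both items, screws twice
    (2, 1, 10),                           -- sale 2: screws only
    (3, 1, 20), (3, 2, 10);               -- sale 3: both items
""")

# "Find the sales for which there does not exist an item
#  that does not appear on some line of that sale."
query = """
SELECT saleno
FROM sale
WHERE NOT EXISTS (
    SELECT *
    FROM item
    WHERE NOT EXISTS (
        SELECT *
        FROM lineitem
        WHERE lineitem.saleno = sale.saleno
          AND lineitem.itemno = item.itemno
    )
)
ORDER BY saleno
"""

sales_with_every_item = [row[0] for row in conn.execute(query)]
print(sales_with_every_item)  # [1, 3]
```

• Sale 2 is excluded because there is an item (the shovel) with no matching line; a repeated line for the same item, as in sale 1, makes no difference to the result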