Chapter 9: Normalization
• Part 1: A Simple Example
• Part 2: Another Example & The Formal Stuff

A Problem: Keeping Track of Invoices
• Suppose we have some invoices that we may or may not want to refer to later…

A Problem: Keeping Track of Invoices (cont’d)
[Fig. 9.1]
• Could store the invoices in an Excel file but, as seen, this causes problems when there are complex questions relating to the data:
1. How many 4” bolts did Frankenstein Parts order in 2002?
2. What items were sold on a certain date?

Solution: A Normalized Database
• First Normal Form (NF1): No Repeating Elements or Groups of Elements
• In Fig. 9.1, rows 2, 3 and 4 represent invoice 125, which in DB terms is a single tuple.
• In NF1 we want to get rid of repeating elements, which are:
– column H2 to H4, column J2 to J4, column K2 to K4, etc.
– these contain lists of values, and lists are not allowed by NF1
– NF1 wants atomicity: each attribute is simple & indivisible
– the repeating data for invoice 125 is in cells H2-M2, H3-M3, H4-M4
• Can satisfy NF1 simply by separating each item in these lists into its own row (see Fig. 9.2).

Solution: NF1 Cont’d
[Fig. 9.2]
• But we were trying to reduce & simplify, and have now introduced more data! No matter, this will be addressed later (with NF3).

Solution: NF1 Cont’d
• Have only done half of NF1. NF1 addresses:
1. A row of data can’t have repeating groups of similar data (atomicity) ✓
2. Each row of data must have a unique identifier (or Primary Key)
• In order to look at 2., have to convert Fig. 9.2 into an RDBMS table (see the orders table in MS Access, Fig. 9.3).
[Fig. 9.3]
• As can be seen, no one column identifies each row, so have to use two together: order_id & item_id.
• Together they form a concatenated primary key that identifies each row.

Solution: NF1 Cont’d
• The underlying structure of the orders table can be represented as Fig. 9.4.
• Identify the columns that make up the key with the PK notation.
• Fig. 9.4 begins the Entity Relationship Diagram (or ERD).
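The NF1 step described above (one row per item, each row identified by order_id plus item_id) can be sketched in Python. This is only an illustration under stated assumptions: the invoice record, its field names and all of its values are hypothetical and are not the data of Fig. 9.1.

```python
# Hypothetical invoice record (illustrative values, not the data of Fig. 9.1).
invoice = {
    "order_id": 125,
    "order_date": "2002-09-13",
    "customer_name": "Frankenstein Parts",
    # Repeating group: a list of items inside one record, which NF1 forbids.
    "items": [
        {"item_id": 563, "item_description": '4" bolt', "item_qty": 4},
        {"item_id": 851, "item_description": "spanner", "item_qty": 1},
        {"item_id": 652, "item_description": "washer", "item_qty": 20},
    ],
}


def to_nf1(inv):
    """Return one flat row per item; invoice-level fields are copied into each row."""
    header = {k: v for k, v in inv.items() if k != "items"}
    return [{**header, **item} for item in inv["items"]]


nf1_rows = to_nf1(invoice)

# Atomicity: no cell holds a list any more.
atomic = all(not isinstance(v, (list, dict)) for row in nf1_rows for v in row.values())

# Uniqueness: the concatenated key (order_id, item_id) identifies each row.
keys = [(row["order_id"], row["item_id"]) for row in nf1_rows]
unique = len(keys) == len(set(keys))

print(len(nf1_rows), atomic, unique)  # 3 True True
```

Note that the invoice-level fields are now repeated in every row; this is the extra data that the later normal forms remove.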
• The DB schema now satisfies the 2 requirements of NF1: atomicity & uniqueness. Thus it meets the most basic criteria of a relational DB.
Fig. 9.4:
orders(order_id(PK), order_date, customer_id, customer_name, customer_address, customer_city, customer_state, item_id(PK), item_description, item_qty, item_price, item_total_price, order_total_price)

Solution: NF2
• Second Normal Form (NF2): No Partial Dependencies on a Concatenated Key
• Next have to test each table for partial dependencies on a concatenated key.
• This means that, for a table with a concatenated primary key, each column that is not part of the primary key must depend upon the entire concatenated key for its existence.
• If a column depends upon only 1 part of the concatenated key, then the entire table has failed NF2 & must create another table to fix it.
• For each column must ask the question:
– Can this column exist without one or the other part of the concatenated primary key?
– If the answer is “yes”, even once, the table fails NF2.

Solution: NF2 Cont’d
• Refer to Fig. 9.4 again to recall the orders table structure.
• Recall the meaning of the two columns in the primary key:
– order_id identifies the invoice this item comes from.
– item_id is the inventory item's unique identifier. Can think of it as a part number.
• Don't analyze these columns (since they are part of the primary key).
• Instead consider the remaining columns...

Solution: NF2 Cont’d
• order_date is the date on which the order was made.
– relies on order_id; an order date has to have an order, otherwise it is only a date
– can an order date exist without an item_id?
Yes: order_date relies on order_id, not item_id (a specific order doesn’t have to have a specific item)
– so order_date fails NF2 ✗
• customer_id is the ID of the customer who placed the order.
– does it rely on order_id? No: a customer can exist without placing any orders.
– does it rely on item_id? No (same reason).
– customer_id does not rely on either member of the PK
– What to do? NF3 will come to the rescue here, hence a "?" for this column and for all the rest of the customer_* columns

Solution: NF2 Cont’d
• item_description is the next column not itself part of the PK. It is the plain-language description of the inventory item.
– relies on item_id, but can it exist without an order_id?
– Yes! An inventory item (& its "description") could sit on a shelf and never be purchased... It can exist independent of an order.
– item_description fails the test ✗
• item_qty is the no. of items purchased on a particular invoice.
– can it exist without an item_id? No: can't have an "amount of nothing"
– can it exist without an order_id? No: a quantity purchased with an invoice is meaningless without an invoice.
– So this column does not violate NF2: item_qty depends on both parts of our concatenated PK ✓

Solution: NF2 Cont’d
• item_price is similar to item_description. It depends on item_id but not on order_id, so it does violate NF2 ✗
• item_total_price is tricky:
– seems to depend on both order_id & item_id, so passes NF2.
– but it is a derived value: it is item_qty times item_price.
– so, in fact, it doesn’t belong in the DB at all.
– can easily be reconstructed outside of the DB; to include it would be redundant (and could quite possibly introduce corruption).
– therefore can discard it
• order_total_price, the sum of all the item_total_price fields for a particular order, is another derived value.
– can discard this field too, for the same reason as item_total_price

Solution: NF2 Cont’d
Fig. 9.4 (New): the orders table annotated with the NF2 results (✗ = fails NF2, ✓ = passes NF2, ? = left for NF3)
order_id(PK)
order_date ✗
customer_id ?
customer_name ?
customer_address ?
customer_city ?
customer_state ?
item_id(PK)
item_description ✗
item_qty ✓
item_price ✗
item_total_price (derived, discarded)
order_total_price (derived, discarded)

Solution: NF2 Cont’d
• What to do with a table that fails NF2, as this one has?
– First take out the second half of the concatenated PK (item_id) & put it in its own table.
– All columns that depend on item_id, whether in whole or in part, follow it into the new table, order_items (see Fig. 9.5).
– The other fields (those that rely on just the first half of the PK, order_id, and those we aren't sure about) stay where they are.
Fig. 9.5:
orders(order_id(PK), order_date, customer_id, customer_name, customer_address, customer_city, customer_state)
order_items(order_id(PK), item_id(PK), item_description, item_qty, item_price)

Solution: NF2 Cont’d
• Things to notice about Fig. 9.5:
1. Have brought a copy of order_id to the order_items table to allow each order_item to "remember" which order it is a part of.
2. The orders table has fewer rows than before & no longer has a concatenated PK. Its PK consists of a single column, order_id.
3. The order_items table does have a concatenated primary key.
• The crow's feet in Fig. 9.5 mean:
– each order can be associated with any number of order-items, but at least one;
– each order-item is associated with one order, and only one.

Solution: NF2 Phase II
• Remember, NF2 only applies to tables with a concatenated PK. Now that orders has a single-column PK, it has passed NF2.
• order_items, however, still has a concatenated PK.
– have to pass it through the NF2 analysis again to see if it passes.
– ask the same question we did before:
– Can this column exist without one or the other part of the concatenated PK?
• Fig. 9.6 shows the order_items table structure.
• item_description relies on item_id, but not order_id, so this again fails NF2 ✗
• item_qty relies on both parts of the PK, so it does not violate NF2 ✓
• item_price relies on item_id but not on order_id, so it does violate NF2 ✗

Solution: NF2 Phase II Cont’d
Fig. 9.6 (New):
order_items(order_id(PK), item_id(PK), item_description ✗, item_qty ✓, item_price ✗)
• On the first pass through the NF2 test, removed all fields relying on item_id & put them into a new table. This time, only taking the fields that fail the test, i.e. item_qty stays. What's different this time?
• First pass: removed the item_id key from orders altogether because of the 1:M relationship between orders & order-items.
– Therefore the item_qty field had to follow item_id into the new table.
• Second pass: item_id wasn’t taken from the order_items table because of the M:1 relationship between order-items & items.
– Therefore, since item_qty does not violate NF2 this time, it is permitted to stay in the table with the two PK parts that it relies on.

Solution: NF2 Phase II Cont’d
• The crow's feet in Fig. 9.7 mean:
– each item can be associated with any number of lines on any number of invoices, including zero;
– each order-item is associated with one item, and only one.
– These two lines are examples of 1:M relationships.
• This 3-table structure is how we express an M:N relationship:
– Each order can have many items; each item can belong to many orders.
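The three-table M:N structure (orders, order_items, items) can be sketched with Python's built-in sqlite3 module. This is a minimal sketch under assumptions: the column types, the shortened customer columns and every sample row are hypothetical, since the slides give only the column names. It also shows how one of the opening questions (how many 4” bolts did Frankenstein Parts order in 2002?) becomes a simple join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked to
conn.executescript("""
-- The remaining customer_* columns are omitted here for brevity.
CREATE TABLE orders (
    order_id      INTEGER PRIMARY KEY,
    order_date    TEXT,
    customer_id   INTEGER,
    customer_name TEXT
);
CREATE TABLE items (
    item_id          INTEGER PRIMARY KEY,
    item_description TEXT,
    item_price       REAL
);
-- Junction table: concatenated PK whose two parts are also FKs.
CREATE TABLE order_items (
    order_id INTEGER NOT NULL REFERENCES orders(order_id),
    item_id  INTEGER NOT NULL REFERENCES items(item_id),
    item_qty INTEGER NOT NULL,
    PRIMARY KEY (order_id, item_id)
);
""")

# Hypothetical sample rows.
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    (125, "2002-09-13", 56, "Frankenstein Parts"),
    (126, "2002-11-02", 56, "Frankenstein Parts"),
    (127, "2003-01-15", 56, "Frankenstein Parts"),
])
conn.executemany("INSERT INTO items VALUES (?, ?, ?)", [
    (563, '4" bolt', 3.50),
    (851, "spanner", 12.00),
])
conn.executemany("INSERT INTO order_items VALUES (?, ?, ?)", [
    (125, 563, 4), (125, 851, 1), (126, 563, 10), (127, 563, 7),
])

# How many 4" bolts did Frankenstein Parts order in 2002?
bolts_2002 = conn.execute("""
    SELECT SUM(oi.item_qty)
    FROM orders o
    JOIN order_items oi ON oi.order_id = o.order_id
    JOIN items i ON i.item_id = oi.item_id
    WHERE o.customer_name = 'Frankenstein Parts'
      AND i.item_description = '4" bolt'
      AND o.order_date LIKE '2002-%'
""").fetchone()[0]

print(bolts_2002)  # 14 with the sample rows above
```

The derived values discarded earlier (item_total_price, order_total_price) need no columns of their own: they can be computed in a query as item_qty * item_price when required.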
• Notes:
– Didn’t bring a copy of the order_id column into the new table because individual items needn’t know the orders they are part of; order_items remembers this relationship via the order_id & item_id columns. Taken together these columns comprise the PK of order_items, but taken separately they are FKs to rows in other tables.
– The new table does not have a concatenated PK, so it passes NF2.
Fig. 9.7:
orders(order_id(PK), order_date, customer_id, customer_name, customer_address, customer_city, customer_state)
order_items(order_id(PK), item_id(PK), item_qty)
items(item_id(PK), item_description, item_price)

Solution: NF3
• Third Normal Form (NF3): No Dependencies on Non-Key Attributes
• Can now return to the problem of repeating customer info. As the DB stands, if a customer places more than 1 order, have to input the customer's contact info again, because there are columns in orders that rely on "non-key attributes".
• To understand this, consider order_date. Can it exist independent of order_id?
– No: an "order date" is meaningless without an order.
– order_date depends on a key attribute (order_id is a "key attribute" because it is the table’s PK).
• What about customer_name: can it exist on its own, outside of the orders table?
– Yes. It is meaningful to talk about a customer name without referring to an order or invoice.

Solution: NF3 Cont’d
• Same goes for customer_address, customer_city & customer_state. These 4 columns actually rely on customer_id, which is not a key in this table (it is a non-key attribute).
• These fields belong in their own table, customers, with customer_id as PK (see Fig. 9.8).
• However, notice in Fig. 9.8 that the relationship has been severed between the orders table and the customer data that used to inhabit it.
Fig. 9.8:
orders(order_id(PK), customer_id(FK), order_date)
order_items(order_id(PK), item_id(PK), item_qty)
items(item_id(PK), item_description, item_price)
customers(customer_id(PK), customer_name, customer_address, customer_city, customer_state)

Solution: NF3 Cont’d
• Restore the relationship by creating a foreign key (indicated by (FK)) in orders.
– As we know, an FK is a column that points to the PK in another table.
– Fig. 9.9 describes this relationship, and shows our completed ERD.
• The relationship between orders & customers may be expressed in this way:
– each order is made by one, and only one, customer;
– each customer can make any number of orders, including zero.
Fig. 9.9:
orders(order_id(PK), customer_id(FK), order_date)
order_items(order_id(PK), item_id(PK), item_qty)
items(item_id(PK), item_description, item_price)
customers(customer_id(PK), customer_name, customer_address, customer_city, customer_state)

Solution: NF3 Cont’d
• Last point to note:
– The order_id and item_id columns in order_items perform a dual purpose: not only do they function as the (concatenated) PK for order_items, they also individually serve as FKs to the orders table and items table respectively.
– This is shown in Fig. 9.10.
Fig. 9.10:
orders(order_id(PK), customer_id(FK), order_date)
order_items(order_id(PK, FK), item_id(PK, FK), item_qty)
items(item_id(PK), item_description, item_price)
customers(customer_id(PK), customer_name, customer_address, customer_city, customer_state)

Normalisation cont’d

Introduction to Database Design
• As we have seen, an important part of database design is deciding on a suitable logical structure or schema to implement ... this is called database design.
• Considering the supplier-parts example (S, P, SP), there is a feeling of correctness.
• Normalisation theory is a formalism of simple ideas with a practical application in logical database schema design.
• Normalisation theory should allow us to recognise relations with undesirable properties, and tell us what is "wrong" & how to "correct" it.
S
S#  SName  Status  City
S1  Smith  20      Paris
S2  Jones  10      Paris
S3  Blake  30      Rome

P
P#  PName  Colour  Weight  City
P1  Nut    Red     12      London
P2  Bolt   Green   17      Paris
P3  Screw  Blue    27      Rome
P4  Screw  Red     14      London

SP
S#  P#  QTY
S1  P1  300
S1  P2  200
S1  P3  400
S2  P1  300
S2  P2  400
S3  P2  200

Intro to Database Design Cont’d
• Normalisation theory is built around normal forms; each normal form has a set of criteria that a relation must satisfy.
• Normal forms exist in a hierarchy:
– 1NF -> 2NF -> 3NF -> BCNF -> 4NF -> PJ/NF (5NF)
• Codd defined 1NF, 2NF, 3NF in 1972.
• 3NF had inadequacies so was revised in ‘74 by Boyce/Codd (BCNF).
• In 1977 Fagin defined 4NF, and in 1979 defined 5NF.
• 6NF, 7NF? ... dependency theory suggests there may be higher NFs, but they are not practicable in a database environment.
• DB designers should aim for higher NFs, but this is not law, just recommended, as normalisation simply provides guidelines for database design.
• There are often good reasons for not applying normalisation theory fully.

Introduction to Database Design Cont’d
• In order to describe the various normal forms we must first introduce some definitions:
• Functional Dependency
– Given relation R, attribute Y of R is functionally dependent on X of R, written R.X -> R.Y (or: R.X functionally determines R.Y) ...
– ... iff each R.X value has associated with it precisely one R.Y value, where X and/or Y may be composite.
– R.X is called the determinant, R.Y is called the dependent.
• S.SNAME, S.STATUS and S.CITY are each functionally dependent on S.S#
• If R.X is a candidate key (in particular, if R.X is the primary key), then all R.Y must be functionally dependent on R.X
• In SP we have a composite primary key, so SP.(S#,P#) -> SP.QTY

Introduction to Database Design Cont’d
• There is no requirement in the definition of functional dependence that R.X be a candidate key, thus: R.X -> R.Y iff whenever 2 tuples of R have the same X value, they also have the same Y value.
– R.Y is fully functionally dependent on R.X ...
– ... iff it is functionally dependent on R.X & not functionally dependent on any proper subset of R.X
– Example: S.(S#,STATUS) -> S.CITY is true, but it is not a full functional dependence as S.S# -> S.CITY
– If R.X -> R.Y but not fully, then R.X must be composite

Normalisation: Example 2
• Given the report in Fig. 9.11, need to put it in a tidy DB.
• Problems with current form:
– PROJ_NUM is supposed to be the PK or part of the PK but contains nulls. Maybe PROJ_NUM+EMP_NUM will define each row.
– The table entries contain inconsistencies (e.g. JOB_CLASS “Elect. Engineer” could be “EE” or “E. Eng” or others)
[Fig. 9.11]

Normalisation: Example 2 Cont’d
• Further problems with current form:
– The table has data redundancies leading to the following anomalies:
1. Update Anomalies: Modifying (e.g.) JOB_CLASS for employee 105 requires lots of alterations (one for each row with EMP_NUM=105).
2. Insertion Anomalies: To complete a row definition, a new employee must be given a project; if not yet assigned, a project must be assumed in order to complete the employee tuple.
3. Deletion Anomalies: If employee 103 quits, every row with EMP_NUM=103 must be deleted, with the potential loss of other data.
– Inefficiency: If a large number of new employees are hired, a lot of redundant/unassigned data must be assumed and input.
– Integrity: Possible data integrity problems may arise out of the above.

Example 2: Conversion to NF1
• So… Problems with Fig. 9.11:
– Data cannot be stored as shown in Fig. 9.11 because have to be able to identify all tuples with a PK.
– PROJ_NUM cannot be the PK in Fig. 9.11 because of nulls.
– Cannot have the repeating groups shown in Fig. 9.11, so have to alter the table to remove them.
• Step 1. Eliminate the repeating groups
– Eliminate the null values.
– Now have Fig. 9.12
[Fig. 9.12]

Example 2: Conversion to NF1 Cont’d
• Step 2. Identify the Primary Key
– The layout in Fig. 9.12 is only a cosmetic change: still need a PK to uniquely identify all tuples.
– This may be seen to be PROJ_NUM+EMP_NUM
• Step 3. Identify all dependencies
– The identification of the PK means already have the following:
PROJ_NUM, EMP_NUM -> PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS
[Fig. 9.12]

Example 2: Conversion to NF1 Cont’d
• Step 3. Cont’d
– But there are additional dependencies:
1. The project number determines the project name:
PROJ_NUM -> PROJ_NAME
2. If know the employee number, also know their name, job classification and charge per hour:
EMP_NUM -> EMP_NAME, JOB_CLASS, CHG_HOUR
3. Knowing the job classification means also knowing the charge per hour:
JOB_CLASS -> CHG_HOUR
– These dependencies are shown in the Dependency Diagram in Fig. 9.13
– Dependency Diagrams are useful for getting an overall view of the relationships among attributes.
[Fig. 9.13: dependency diagram over PROJ_NUM, PROJ_NAME, EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS, with arrows labelled Normal, Partial and Transitive]

Example 2: Conversion to NF1 Cont’d
• Looking at Fig. 9.13, can see that:
1. PK attributes are bold, underlined and a different colour.
2. Arrows above the diagram (blue) denote desirable FDs (those based on the PK).
3. Arrows below the diagram (red and green) are less desirable:
a) Partial Dependencies: dependencies based on part of a composite PK
– Need only know PROJ_NUM to know PROJ_NAME, so PROJ_NAME is only dependent on part of the PK.
– Need only know EMP_NUM to find the EMP_NAME, JOB_CLASS, CHG_HOUR.
b) Transitive Dependencies: dependency of 1 non-prime attribute on another
– From Fig. 9.13, can see that CHG_HOUR is dependent on JOB_CLASS
– Neither of these is part of the PK (i.e. neither is a Prime Attribute).

Example 2: Conversion to NF1 Cont’d
• Properties of NF1: A table in NF1 must have:
1. All key attributes defined
2. No repeating groups in the table (i.e. each row/column entry must have only one value)
• Problem with Fig. 9.13 is the partial dependencies.
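The dependencies listed for Example 2 can be checked mechanically against sample rows. The sketch below is an illustration only: the helper name fd_holds and all row values are hypothetical (they are not the contents of Fig. 9.12). Sample data can refute a functional dependency but never prove it, so a result of True only means "not contradicted by these rows".

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs while disagreeing on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen, then returns the stored value
        if seen.setdefault(x, y) != y:
            return False
    return True


COLS = ("PROJ_NUM", "PROJ_NAME", "EMP_NUM", "EMP_NAME", "JOB_CLASS", "CHG_HOUR", "HOURS")

# Hypothetical rows in the NF1 layout.
sample = [dict(zip(COLS, r)) for r in [
    (1, "Alpha", 103, "A. Jones", "Elect. Engineer", 80.0, 20.0),
    (1, "Alpha", 105, "B. Lee", "Database Designer", 100.0, 15.5),
    (2, "Beta", 105, "B. Lee", "Database Designer", 100.0, 30.0),
    (2, "Beta", 101, "C. Diaz", "Elect. Engineer", 80.0, 12.0),
]]

pk = ["PROJ_NUM", "EMP_NUM"]
non_key = [c for c in COLS if c not in pk]

checks = {
    "PK -> everything": fd_holds(sample, pk, non_key),
    "partial: PROJ_NUM -> PROJ_NAME": fd_holds(sample, ["PROJ_NUM"], ["PROJ_NAME"]),
    "partial: EMP_NUM -> EMP_NAME, JOB_CLASS, CHG_HOUR":
        fd_holds(sample, ["EMP_NUM"], ["EMP_NAME", "JOB_CLASS", "CHG_HOUR"]),
    "transitive: JOB_CLASS -> CHG_HOUR": fd_holds(sample, ["JOB_CLASS"], ["CHG_HOUR"]),
    "PROJ_NUM -> HOURS": fd_holds(sample, ["PROJ_NUM"], ["HOURS"]),
    "EMP_NUM -> HOURS": fd_holds(sample, ["EMP_NUM"], ["HOURS"]),
}

for name, ok in checks.items():
    print(name, ok)
```

With these rows the first four checks come out True and the last two False: HOURS needs the whole key, which is why it is the only attribute that ends up alongside both PROJ_NUM and EMP_NUM.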
• These can be eliminated with NF2

Example 2: Conversion to NF2
• Step 1. Identify all key components:
PROJ_NUM
EMP_NUM
PROJ_NUM, EMP_NUM
– Each component becomes the key of a new table.
– Three new tables: project, employee, assign
• Step 2. Identify the dependent attributes
– Use Fig. 9.13 to determine which attributes are dependent on which others, using the arrows in the dependency diagram:
project(PROJ_NUM, PROJ_NAME)
employee(EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR)
assign(PROJ_NUM, EMP_NUM, ASSIGN_HOURS)
– Results are shown in Fig. 9.14

Example 2: Conversion to NF2 Cont’d
• At this point, most anomalies discussed above have been eliminated, e.g. if want to add/change/delete a project record, only need to alter 1 row of project.
• So a table is in NF2 iff
1. It is in NF1, and
2. It has no partial dependencies (can still have transitive dependencies)
• Fig. 9.14 still has a transitive dependency which can generate anomalies, e.g. if the charge per hour changes for a job classification held by many employees, that change must be made for all of them (leading to possible update anomalies).
• Resolve transitive dependencies in NF3.
Fig. 9.14:
project(PROJ_NUM, PROJ_NAME)
employee(EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR)
assign(PROJ_NUM, EMP_NUM, ASSIGN_HOURS)

Example 2: Conversion to NF3
• Step 1. Identify each new determinant
– For each transitive dependency, write its determinant as a PK for a new table (recall: a determinant is any attribute whose value determines other values within a row).
– If have 3 transitive dependencies, have 3 different determinants
– Here only have one: JOB_CLASS
• Step 2. Identify the dependent attributes
– Identify the attributes dependent on each determinant identified in Step 1. Here, have JOB_CLASS -> CHG_HOUR
– Name the table to reflect its contents & function; here job is ok
• Step 3. Remove dependent attributes from transitive dependencies
– Remove the dependent attributes of each transitive dependency from every table that has such a dependency
– JOB_CLASS remains in the employee table as an FK

Example 2: Conversion to NF3 Cont’d
• Final dependency diagram is shown in Fig. 9.15
Fig. 9.15:
project(PROJ_NUM, PROJ_NAME)
employee(EMP_NUM, EMP_NAME, JOB_CLASS)
job(JOB_CLASS, CHG_HOUR)
assign(PROJ_NUM, EMP_NUM, ASSIGN_HOURS)
• Or 4 tables:
project(PROJ_NUM, PROJ_NAME)
assign(EMP_NUM, PROJ_NUM, ASSIGN_HOURS)
employee(EMP_NUM, EMP_NAME, JOB_CLASS)
job(JOB_CLASS, CHG_HOUR)
• A table is in NF3 iff
– It is in NF2, and
– It contains no transitive dependencies.
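The four NF3 tables can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the column types and all sample rows are hypothetical (only the table and column names come from the slides). It shows that a change to the charge per hour of a job classification now touches a single row, and that the original report can still be rebuilt with joins.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked to
db.executescript("""
CREATE TABLE project (PROJ_NUM INTEGER PRIMARY KEY, PROJ_NAME TEXT);
CREATE TABLE job (JOB_CLASS TEXT PRIMARY KEY, CHG_HOUR REAL);
CREATE TABLE employee (
    EMP_NUM   INTEGER PRIMARY KEY,
    EMP_NAME  TEXT,
    JOB_CLASS TEXT REFERENCES job(JOB_CLASS)
);
CREATE TABLE assign (
    PROJ_NUM     INTEGER REFERENCES project(PROJ_NUM),
    EMP_NUM      INTEGER REFERENCES employee(EMP_NUM),
    ASSIGN_HOURS REAL,
    PRIMARY KEY (PROJ_NUM, EMP_NUM)
);
""")

# Hypothetical sample rows; parent tables are filled before the tables that reference them.
db.executemany("INSERT INTO project VALUES (?, ?)", [(1, "Alpha"), (2, "Beta")])
db.executemany("INSERT INTO job VALUES (?, ?)", [
    ("Elect. Engineer", 80.0), ("Database Designer", 100.0),
])
db.executemany("INSERT INTO employee VALUES (?, ?, ?)", [
    (101, "C. Diaz", "Elect. Engineer"),
    (103, "A. Jones", "Elect. Engineer"),
    (105, "B. Lee", "Database Designer"),
])
db.executemany("INSERT INTO assign VALUES (?, ?, ?)", [
    (1, 103, 20.0), (1, 105, 15.5), (2, 105, 30.0), (2, 101, 12.0),
])

# The charge per hour lives in exactly one row of job, so one UPDATE is enough.
changed = db.execute(
    "UPDATE job SET CHG_HOUR = 90.0 WHERE JOB_CLASS = 'Elect. Engineer'"
).rowcount

# Rebuild the flat report from the four tables.
report = db.execute("""
    SELECT a.PROJ_NUM, p.PROJ_NAME, e.EMP_NUM, e.EMP_NAME,
           e.JOB_CLASS, j.CHG_HOUR, a.ASSIGN_HOURS
    FROM assign a
    JOIN project p ON p.PROJ_NUM = a.PROJ_NUM
    JOIN employee e ON e.EMP_NUM = a.EMP_NUM
    JOIN job j ON j.JOB_CLASS = e.JOB_CLASS
    ORDER BY a.PROJ_NUM, e.EMP_NUM
""").fetchall()

print(changed, len(report))  # 1 4
```

An employee with no project yet can also be inserted into employee without inventing an assign row, which removes the insertion anomaly noted earlier.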