Data Normalization Normal is not something to aspire to, it's something to get away from. ~ Jodie Foster ~ Activity: Sample User Report FudgeCo Livery Driver Permissions Report Driver Driver Driver Driver ID # Name Chg/Hr Territories 101 Bill Melator $100.00 West, North, Central 102 Willie Work $75.00 South, East 103 Sal Ladd $75.00 Central, East, North 104 Carol Ling $100.00 Central, West Vehicle Lic Plate PPF673 PXK3D7T 445GH2 59DLLK PXK3D7T 663ETMP 667GM8 445GH2 59DLLK 667GM8 PPF673 59DLLK Vehicle Make Cadillac Chevy Lincon Lincon Chevy Chevy Audi Lincon Lincon Audi Cadillac Lincon Vehicle Model Escalde Tahoe Towncar Continental Tahoe Suburban A8 Towncar Continental A8 Escalde Continental Vehicle Size M L S S L L M S S M M S Vehicle Permission Chg/Hr Exp. Date $ 100.00 12/31/2004 $ 120.00 12/31/2004 $ 80.00 1/30/2005 $ 80.00 4/30/2005 $ 120.00 1/15/2005 $ 120.00 4/1/2005 $ 100.00 9/1/2005 $ 80.00 9/1/2005 $ 80.00 12/31/2004 $ 100.00 7/31/2004 $ 100.00 7/31/2004 $ 80.00 10/1/2004 Can you build the underlying data model from this? How many tables? What are the relationships? Understanding Functional Dependence For attributes A and B, B is functionally dependent on A means each value in column A determines one and only one value in column B. SSN Written: A B 123-45-6789 A determines B 123-46-9987 B is the determinant 123-02-0902 123-02-0993 Ex: SSN Name (Name is functionally dependent on SSN) Name George Foreman Georgeina Forman George Foreman George Foreman Normalization Lingo Prime attribute = Any attribute which is a primary key, or in the case of a composite key is part of a PK Non-Prime Attribute = Any attribute which is not part of the PK. Key Attribute = Prime Attribute Non-Key Attribute = Non-Prime Attribute Normalization and FD Technically, normalization is just the analysis of Functional Dependencies of all columns with respect to the primary key. There are three “levels” of analysis: 1. Functional Dependence – any non-prime attributes which as FD on the PK. 2. Partial Functional Dependence – any non-key attributes which are FD on part of the PK. 3. Transitive Functional Dependence – any non-key attributes which are FD on some other non-key attribute(s). Activity: IYCDTYCN! FudgeCo Livery Driver Permissions Report Driver Driver Driver Driver ID # Name Chg/Hr Territories 101 Bill Melator $100.00 West, North, Central 102 Willie Work $75.00 South, East 103 Sal Ladd $75.00 Central, East, North 104 Carol Ling $100.00 Central, West Vehicle Lic Plate PPF673 PXK3D7T 445GH2 59DLLK PXK3D7T 663ETMP 667GM8 445GH2 59DLLK 667GM8 PPF673 59DLLK Vehicle Make Cadillac Chevy Lincon Lincon Chevy Chevy Audi Lincon Lincon Audi Cadillac Lincon Vehicle Model Escalde Tahoe Towncar Continental Tahoe Suburban A8 Towncar Continental A8 Escalde Continental Vehicle Size M L S S L L M S S M M S Vehicle Permission Chg/Hr Exp. Date $ 100.00 12/31/2004 $ 120.00 12/31/2004 $ 80.00 1/30/2005 $ 80.00 4/30/2005 $ 120.00 1/15/2005 $ 120.00 4/1/2005 $ 100.00 9/1/2005 $ 80.00 9/1/2005 $ 80.00 12/31/2004 $ 100.00 7/31/2004 $ 100.00 7/31/2004 $ 80.00 10/1/2004 Identify the: Identify the: Functional Dependencies (WRT the PK) Primary Key? Partial Functional Dependencies (WRT part of the PK) Prime Attributes? Non-Prime Attributes Transitive Functional Dependencies (WRT some non-prime attribute) The Dependency Diagram The Dependency Diagram is a Very Useful Tool. It depicts the dependencies which exist among the attributes. Primary Key Partial Dependency Driver Driver Driver Driver Vehicle Vehicle Vehicle Vehicle Vehicle Permission ID # Name Chg/Hr Territories Lic Plate Make Model Size Chg/Hr Exp. Date Partial Dependency Transitive Dependency Not FD (Multi-Valued) Normal Forms A Normal Form represents the current “state” of the data model. There are 4 basic normal forms: Zero Normal Form (0NF) Non-key attributes exist which are not FD on PK. First Normal Form (1NF) All non-key attributes FD on entire PK. Second Normal Form (2NF) In 1NF and No partial functional dependencies exist. Third Normal Form (3NF) In 2NF and No transitive functional dependencies exist. First Normal Form (1NF) Definition: All non-key attributes must be FD on the entire PK. (There must be PKFD for all attributes.) Rule: Move each non-key FD column into its own new table. How to Apply the Rule: For each non-key FD column: 1. Place non-FD column into a new table 2. Copy the PK (or part of it) from the original table into the new table. This will be a FK in the new table. 3. Assign a PK to the new table (typically a composite key of the original Non-FD column and the FK.) 1NF: Example 1/2 My Friends & Their Hobbies FID 101 102 103 104 105 Email [email protected] [email protected] [email protected] [email protected] [email protected] Name Seymour Ofu Isabelle Gunnering Pete Moss Erin Dutyres Chuck Itupp Hobby1 Hobby2 Hobby3 Hobby4 Basketball Golf Skiing Hiking Golf Skiing Skiing Basketball Basketball Golf What’s wrong with this data model? What should be PK be? Why? Is there an attribute not FD on the PK? Is it in 1NF already? What if Erin takes up bass fishing? I’m planning a ski trip, whom should I contact? (How do I know Hobby3, skiing and not Hobby1)? 1NF: Example 2/2 Friends FID 101 102 103 104 105 Email [email protected] [email protected] [email protected] [email protected] [email protected] Name Seymour Ofu Isabelle Gunnering Pete Moss Erin Dutyres Chuck Itupp What was done: Hobbies FID 101 101 101 101 102 102 103 104 105 105 Hobby Basketball Golf Skiing Hiking Golf Skiing Skiing Basketball Basketball Golf Hobbies table created. Contains the originally non FD column, “hobby” The PK (FID) was copied into the hobbies table. The PK of the Hobbies table is the combination of FID and Hobby. Questions: Is this in 1NF? Can you reproduce the previous data model from this one? Who likes skiing? Basketball? Second Normal Form (2NF) Definition: The data model must be in 1NF AND No partial functional dependencies can exist. Rule: Move each partially FD non-key column into its own new table. How to Apply the Rule: For each partial dependency: 1. Move all partially FD columns into a new table 2. Copy the determinant into the new table. 3. Make the determinant of the partial dependency: 1. The PK for the new table, FK to the existing table. 2NF: Example 1/2 Software Installed on IST Servers Serial Num VNK334 VNK334 VNK334 ASD44P ASD44P 81HLV3 81HLV3 Server Name www.ist www.ist www.ist iststudents iststudents istwebct istwebct Make HP HP HP Dell Dell Dell Dell Model SWID SW Title Date Installed Netserver LH4 101 Windows 2000 Server 8/1/2003 Netserver LH4 201 MS SQL Server 2000 8/1/2003 Netserver LH4 302 Segate BackupExec 7 10/1/2003 Poweredge 2550 101 Windows 2000 Server 4/15/2002 Poweredge 2550 202 MS SQL Server 2000 4/17/2003 Poweredge 4400 111 Red Hat Linux 7.3 12/4/2003 Poweredge 4400 301 Webct 4.1 12/12/2003 What’s wrong with this data model? What should be PK be? Why? Do any partial dependencies exist? Where? What is the determinant for each, if any? Is it in 1NF already? 2NF? I made a mistake, 81HLV3 is a Power edge 5500, not a 4400? 2NF: Example 2/2 Servers Serial Num VNK334 ASD44P 81HLV3 Server Name www.ist iststudents istwebct Make HP Dell Dell Model Netserver LH4 Poweredge 2550 Poweredge 4400 SWInstallation Serial Num SWID Date Installed VNK334 101 8/1/2003 VNK334 201 8/1/2003 VNK334 302 10/1/2003 ASD44P 101 4/15/2002 ASD44P 202 4/17/2003 81HLV3 111 12/4/2003 81HLV3 301 12/12/2003 Software SWID 101 201 302 111 301 SW Title Windows 2000 Server MS SQL Server 2000 Segate BackupExec 7 Red Hat Linux 7.3 Webct 4.1 What was done: Serial Num + SWID is the primary key. Servers, Software tables created from partial dependencies, where Serial Num,SWID are the determinants. Serial Num, is the PK for Servers, SWID is the PK for Software, each are also FK’s for the SWInstallation table Questions: Is this in 2NF? Can you reproduce the previous data model from this one? Third Normal Form (3NF) Definition: The data model must be in 2NF AND No transitive functional dependencies can exist. Rule: Move each transitive FD non-key column into its own new table. How to Apply the Rule: For each transitive dependency: 1. Move all transitive FD columns into a new table. 2. Copy the determinant column into the new table. 3. Make the determinant of the transitive dependency: The the PK for the new table. The FK for the original table. 3NF: Example 1/2 Fudgebar - Beer Inventory /Distribution Report Beer ID 101 105 102 106 107 108 Beer Name Keg Qty Distrib ID Mikealobe 3 501 Dudweiser 0 501 Meisterchau 2 601 Mil's Beast 4 601 Koors 3 701 Koors Lite 1 701 Distributor Anhoser-Busch Anhoser-Busch Millor Millor Koors Koors Dist. Phone 555-4901 555-4901 555-6691 555-6691 555-4904 555-4904 What’s wrong with this data model? What should be PK be? Why? Do any transitive dependencies exist? Where? What is the determinant for each, if any? Is it in 1NF already? 2NF? 3NF? I made a mistake, Koors phone number is 4905? What’s wrong? 3NF: Example 2/2 Distributors Beers Beer ID 101 105 102 106 107 108 Beer Name Keg Qty Distrib ID Mikealobe 3 501 Dudweiser 0 501 Meisterchau 2 601 Mil's Beast 4 601 Koors 3 701 Koors Lite 1 701 Distrib ID Distributor Dist. Phone 501 Anhoser-Busch 555-4901 601 Millor 555-6691 701 Koors 555-4904 What was done: Beer ID is the PK. All transitive dependencies moved into a new table, Distributors. Distrib ID is the determinant. PK of Distributors table, FK in original Beer table. Questions: Is this in 3NF? Can you reproduce the previous data model from this one? Higher Normal Forms Yes, there IS more… … and it will blow your mind. Boyce-Codd Normal Form (BCNF) Rule: Eliminate key-transitive dependencies A table in BCNF Means: The table is in 3NF It includes no Non-Key attribute which determines a key attribute, or part of a key attribute. BCNF: An Example Fourth Normal Form (4NF) RULE: Eliminate multiple sets of multivalued dependencies. A table in 4NF Means: The table is in 3NF It includes no sets of attributes which contain multi-valued dependencies. 4NF: An Example Figure 4.15 Set of Tables in 4NF Figure 4.14 Multivalued Dependencies How “far” should one Normalize? For relational databases: 1NF is required, at minimum for practical RDBMS implementations. The majority of the time data models are normalized to 3NF. Sometimes certain tables are left in 1NF or 2NF, for performance or practical reasons. Higher normal forms BCNF, 4NF are rare. In General, the Higher the NF of your DM: The more complicated the internal DM The more “programming” required to reproduce the external DM. But, the lesser the chance for data anomalies!! It’s a total trade-off: Database complexity vs. data anomalies. Mike’s “Road To 3NF” To normalize correctly, follow this process for each table in the data model: Designate a candidate key Any partial dependencies? PKFD for all attributes? n Apply 1NF Rule y 1NF y Apply 2NF Rule n Party Hard ! 2NF Any transitive dependencies? y Apply 3NF Rule n 3NF Normalization Summary Cheat Sheet 0NF O 1NF O 2NF O 1NF O (Resolve non FD) N 2NF (Resolve Partial FD) N1 O N2 3NF (Resolve Transitive FD) O N Data Normalization Questions?
© Copyright 2025 Paperzz