Chapter 3 Normalization (Database Design) 1 Chapter 3 : Objectives What normalization is and what role it plays in database design About the normal forms 1NF, 2NF, 3NF, BCNF, and 4NF How normal forms can be transformed from lower normal forms to higher normal forms That normalization and E-R modeling are used concurrently to produce a good database design That some situations require denormalization to generate information efficiently 2 What is Normalization ? A technique for producing a set of relations with desirable properties (minimum data redundancy), given the data requirements of an enterprise. First developed by E.F. Codd (1972) 3 Why Normalization ? Main objective in developing a logical data model for relational database systems is to create an accurate representation of the data, its relationships, and constraints. To achieve this objective, must identify a suitable set of relations. A relation can be normalized to a specific form to prevent possible occurrence of update anomalies. 4 Why Normalization ? As normalization proceeds, relations become progressively more restricted (stronger) in format and also less vulnerable to update anomalies. 5 Why Normalize ? An ER Model is not always available as a starting point for design To reduce redundant data in existing design To increase integrity of data and stability of design To identify missing tables, columns and constraints 6 Why Normalization ? SID S1 S1 S1 S2 S2 S3 S3 S3 Name Joseph Joseph Joseph Alice Alice Tom Tom Tom Grade A B A A A B B A Course# CIS800 CIS820 CIS872 CIS800 CIS872 CIS800 CIS872 CIS860 Text b1 b2 b5 b1 b5 b1 b5 b1 Major CIS CIS CIS CS CS Acct Acct Acct Dept CIS CIS CIS MCS MCS Acct Acct Acct Is there any redundant data? Can we insert a new Course# with a new textbook? What should be done if 'CIS' is changed to 'MIS'? What would happen if we remove all CIS800 students? 7 Update Anomalies A poorly designed relation contains redundant data. Data redundancy causes update anomalies : Insertion Anomalies -An independent piece of information cannot be recorded in a relation unless an irrelevant information must be inserted together at the same time. Modification Anomalies - The update of a piece of information must occur at multiple locations (if not, it leads to inconsistency). Deletion Anomalies - The deletion of a piece of information unintentionally removes other information. Normalization is used to remove update anomalies by decomposing the relations into smaller relations based on the constraints. 8 First Normal Form All key attributes defined No repeating groups in table All attributes dependent on primary key 9 Example : 1NF Example : FIRST(S#,STATUS,CITY,P#,QTY) - Suppliers supply certain quantities of parts and are located in some city. - Status of the supplier is determined by the location (city) of that supplier. Fig : Relation FIRST S# S1 S1 S1 S1 S1 S1 S2 S2 S3 S4 S4 S4 STATUS 20 20 20 20 20 20 10 10 10 20 20 20 CITY London London London London London London Paris Paris Paris London London London P# P1 P2 P3 P4 P5 P6 P1 P2 P2 P2 P4 P5 QTY 300 200 400 200 100 100 300 400 200 200 300 400 10 Example : 1NF S# S1 S1 S1 S1 S1 S1 S2 S2 S3 S4 S4 S4 STATUS 20 20 20 20 20 20 10 10 10 20 20 20 CITY London London London London London London Paris Paris Paris London London London P# P1 P2 P3 P4 P5 P6 P1 P2 P2 P2 P4 P5 QTY 300 200 400 200 100 100 300 400 200 200 300 400 What is the Primary key of FIRST ? Is FIRST in 1NF ? Does FIRST have update anomalies ? 11 Example : 1NF To overcome the above anomalies in the relation FIRST , FIRST has to be decomposed into smaller relations (projections). => Decomposition 12 Non-loss Decomposition To overcome the update anomalies in a relation R, R is decomposed into smaller relations (projections). A bad decomposition loses information. In a good decomposition The join of decomposed relations restores the original relation. Decomposed relations can be maintained independently. 13 Non-loss Decomposition Rissanen's Rules: Two projections R1 and R2 of a relation R are independent iff Every FD in R can be logically derived from those in R1 and R2 and The common attributes of R1 and R2 form a candidate key for at least one of the pair. 14 Functional Dependency Given two attributes X and Y of a relation R, Y is functionally dependent on X iff each Xvalue in R has associated with it exactly one Y-value in R(if two tuples agree on their Xvalue, they also agree on their Y-value). X Y 15 Functional Dependency Example: List all FDs in FIRST FD Diagram of FIRST: S# CITY QTY P# STATUS 16 Properties of FDs X is called the determinant of Y X may or may not be the key attribute of R X and Y may be composite X and Y could be mutually dependent on each other Husband Wife Wife Husband The same Y-value may occur in multiple tuples 17 Properties of FDs An FD could be trivial or non-trivial An FD is trivial if it is impossible for it not to be satisfied if the RHS is a subset of the LHS In Normalization, the FDs of interest are : non-trivial hold for all times 18 Full Functional Dependency : Given attributes X and Y of a relation R, Y is fully functionally dependent on X if Y is functionally dependent on X but not on any proper subset of X. 19 Transitive Functional Dependency Given attributes X, Y and Z of a relation R, Z is transitively dependent on X iff X Y, Y Z and X Z 20 UNNORMALIZED FORM (UNF) A table that contains one or more repeating groups. Examples : Is FIRST Unnormalized ? Is relation SCH given below Unnormalized ? Student Courses Hobbies -----------------------------------------------------Jin DBMS Tennis OS Cooking Habib C++ Networking CA Reading Surfing 21 UNF to 1NF Remove repeating group by entering appropriate data into the empty columns of rows containing repeating data (‘flattening’ the table). Student Courses Hobbies -----------------------------------------------------------------------Jin DBMS Tennis Jin DBMS Cooking Jin OS Tennis Jin OS Cooking Habib C++ Reading Habib C++ Surfing Habib Networking Reading Habib Networking Surfing Habib CA Reading Habib CA Surfing 22 FIRST NORMAL FORM (1NF)revisited A relation R is in 1NF iff all attribute domains contain atomic values only. Example : FIRST is in 1NF FIRST has update anomalies 23 FIRST NORMAL FORM (1NF) FD diagram of FIRST: QTY S# CITY P# STATUS 24 SECOND NORMAL FORM (2NF) A relation R is in 2NF iff it is in 1NF and every nonprime-key attribute is fully functionally dependent on the primary key. A non-prime-key attribute does not belong to any candidate key. Is FIRST in 2NF ? How to normalize FIRST into 2NF ? Remove all partial (or non-full) dependencies from FIRST by decomposing FIRST into two smaller relations SP and SECOND 25 SECOND NORMAL FORM (2NF) Is the decomposition of First into SP and SECOND good ? SP(S#,P#,QTY) SECOND(S#,CITY,STATUS) 26 SECOND NORMAL FORM (2NF) FD Diagram of SP and SECOND S# CITY S# QTY P# STATUS SECOND is in 2NF SECOND has update anomalies 27 THIRD NORMAL FORM(3NF) A relation is in 3NF iff it is in 2NF and in which every non-prime-key attribute is nontransitively dependent on the primary key. Is SECOND(S#,CITY,STATUS) in 3NF ? 28 THIRD NORMAL FORM(3NF) How to normalize SECOND into 3NF ? Remove all transitive dependencies from SECOND by decomposing SECOND into two smaller relations SC and CS. SC(S#,CITY) CS(CITY,STATUS) Make sure that the decomposition is good ! 29 30 THIRD NORMAL FORM(3NF) FD Diagram of SC and CS S# CITY CITY STATUS 31 Help me Codd ! The key(1NF), the whole key(2NF), and nothing but the key (3NF) - so help me Codd !! 32 THIRD NORMAL FORM(3NF) Relation R(A,B,C,D) given below is in 3NF Still has Update Anomalies !! 33 Example : Example: TEACH (Student, Course, Instructor) A student can take several courses and can be taught by several instructors. For each course, each student of that course is taught by only one teacher. An instructor teaches only one course. Two Candidate keys : (Student, Course) (Student, Instructor) 34 Example : Choose (Student, Course) as the Primary key FD Diagram for TEACHES STUDENT INSTRUCTOR COURSE Is TEACHES in 3NF ? Does TEACHES have update anomalies ? What happens if (Student, Instructor) is chosen as the Primary key! 35 BOYCE/CODD NORMAL FORM A relation is in BCNF iff every determinant is a Candidate key. Update Anomalies occur in a 3NF relation R if R has multiple candidate keys Those candidate keys are composite and The candidate keys are overlapped How to Normalize TEACHES in BCNF ? Decompose TEACHES into SI (Student, Instructor) and IC (Instructor, Course) 36 Denormalization Normalization is one of many database design goals But Is it always the best possible design ? 37 Why Denormalize ? Normalization is the process of putting one fact in one appropriate place. This optimizes updates at the expense of retrievals. When a fact is stored in only one place, retrieving many different but related facts usually requires going to many different places. This tends to slow the retrieval process. Updating is quicker, however, because the fact you're updating exists in only one place. 38 Why Denormalize ? A relational normalized database imposes a heavy access load over physical storage of data even if it is well tuned for high performance. A normalized design will often store different but related pieces of information in separate logical tables (called relations). If these relations are stored physically as separate disk files, completing a database query that draws information from several relations (a join operation) can be slow. 39 Denormalization Denormalization is the process of attempting to optimize the performance of a database by adding redundant data or by grouping data. technique to move from higher to lower normal forms of database modeling in order to speed up database access. 40 An Example of Denormalization Consider a relation : Contact (Name, Street, Zip, City, Province) Name Street Zip City Province John 401 Sunset Av. N91 Q23 Windsor ON Harry 402 Sunset Av. N4T 3R5 Windsor ON Bill 123 First St. N4Y 7Y8 London ON PK of Contact ? Is Contact in 3NF ? 41 Example of Denormalization(contd) Was it worth the decomposition ? 42
© Copyright 2026 Paperzz