normalization first normal form (1nf)

NORMALIZATION
FIRST NORMAL FORM (1NF):
A relation R is in 1NF if all attributes have
atomic value
= one value for an attribute
= no repeating groups
= no multivalued attributes
= no composite attributes
Example
Non-1NF
EMP (E#, ENAME, SKILL). Here SKILL is
a multi-valued attribute.
EMP( E#, ENAME, SKILL1, SKILL2,
SKILL3, SKILL4, ....). Skill as a repeating
group attribute.
NON-1NF 1NF
There are two methods of converting a NON1NF into a 1NF relation. Method 1 mapps
out the multi-valued (or repeating group)
attribute into another table, while method 2
keeps the multi-valued attribute but simply
uses a composite PK.
Method 1: Conversion to 1NF
1. Create one relation for repeating groups by
adding the key of original relation.
2. Remove the attributes of repeating groups
from the original relation.
Example
SKILL (E#, SKILL)
EMP (E#, ENAME)
Note the composite PK of SKILL relation.
Method 2: We can also flatten the
table as follows:
EMP (E#, Skill, Ename)
- Elmasri's book uses this method.
- This method repeats the repeating group
value in a separate tuple.
- Note the composite PK.
What are advantages and
disadvantages of the two
methods?
DEPT (D#, DNAME, DMGRSSN, DLOC), where
DLOC is a multi-valued attribute
EMP_PROJ (SSN, ENAME, PROJS(PNUMBER,
HOURS)), where PROJS is a composite attribute.
Typically, most relational database systems assume
your DB should be 1NF
SECOND NORMAL FORM
(2NF)
A relation R is in 2NF if
(a) R is in 1NF, and
(b) each attribute is fully functionally
dependent on the whole key of R
Example
INVENTORY (WH, PART, QTY,
WH_ADDR)
WH, PART --> QTY
WH --> WH_ADDR (This is not in 2NF,
since WH is a part of a key)
Key: WH+PART
Problem of non-2NF (update
anomaly):
• Warehouse address is repeated for every
part stored
• If the address is changed, needs multiple
updates
• If no parts in a warehouse, can't keep the
warehouse address
2NF Decomposition:
• Create a separate relation for each PD
• Remove the RHS of the PD from the
original relation.
• The above Non-2NF can be transformed
into the following 2NF relations.
INVENTORY(WH, PART, QTY)
WAREHOUSE(WH, WH_ADDR)
Example
• NOTE that Non-2NF occurs only when we have
a composite key.
• EMP_PROJ (SSN, P#, HOURS, ENAME,
PNAME, PLOC)
SSN, P# --> HOURS SSN --> ENAME
(* Violate 2NF; SSN is a part of a key*)
P# --> PNAME, PLOC (* Violate 2NF; P# is a
part of a key *)
2NF decomposition
R1 (SSN, P#, HOURS)
R2 (SSN,
ENAME)
R3 (P#, PNAME, PLOC)
THIRD NORMAL FORM
(3NF)
A relation R is in 3NF if
a) it is in 2 NF and
b) it has no transitive dependencies.
That is, each nonkey attribute must be functionally
dependent on the key and nothing else. If you have
any FD whose LHS is not a PK (or CK), then R is
not in 3NF.
Example
WORK (EMP#, DEPT, LOC)
2NF
(1) EMP# --> DEPT
Y
(2) DEPT --> LOC
Y
KEY: EMP#
3NF
Y
N
WORK is in 2NF, but not in 3NF because of
FD (2).
Problem of Non-3NF
• Dept. location is repeated for every
employee
• If the location is changed, needs multiple
updates
• If you forget to change all records, can
cause inconsistency
• If a dept. has no employees, can't keep dept
location
3NF DECOMPOSITION
Algorithm for a given minimal cover:
1) Combine the RHS of FDs if they have common
LHS
2) Create a separate table for each FD.
3) Check for Lossless decomposition.
(Check whether a CK of the original realtion
appears in any of the decomposed relation). IF not
lossless, then add a table consisting of a CK.
Example
• R1 (EMP#, DEPT), R2 (DEPT, LOC)
• The original relation WORK is not in 3NF,
but R1 and R2 are in 3NF.
• Note that the LHS of a FD becomes the PK
of each decomposed table.
• Our 3NF definition we used above is an
informal one used by many industry
designers. Some DB text books, including
Elmasri's book use a more rigorous
definition that is shown below.
Formal Def. of 3NF
A relation R is in 3NF if, for all X --> A in R
(1) X is a super key or
(2) A is a prime attribute (where X and A
could be a set of attributes)
In other words, all attributes, except prime
attributes, must be dependent on any
candidate keys.
• The only difference between the informal
definition and the formal definition is the second
condition in the formal definition. That is, the
formal definition allows transitive dependency
whose RHS is a prime attribute, where a prime
attribute is an attribute that belongs to any
candidate key. The difference between these two
definition is very minor and many real-world DB
designers just use the informal definition. For you
reference, we showed the formal definition of
3NF.
SUMMARY OF
NORMALIZATION
- As we go to higher normal forms, we create a more number of
relations.
- Each higher normal form removes a certain type of dependency that
causes redundancy.
- As a relation becomes a higher normal form:
- We have a more number of relations
- That increases more number of joins in query forming
- Which increases more number of join processings
- And also more referential integrity constraints need to be maintained
- And thus schema is complicated and performance is drcreased.
So, many real-world DB designers stop at 3NF, which reasonably
removes typical redundnacy and still maintains performance. So,
strive to achive 3NF in your real-world RDB!