The Practical Need for Fourth Normal Form

Margaret S. Wu
425 Beldon Ave
Iowa City, Iowa 52246
Phone: (319) 335-0846
ABSTRACT
Many practitioners and academicians believe that data
violating fourth normal form is rarely encountered. We
report upon a study of forty organizational databases;
nine of them contained data violating fourth normal form.
Consequently, the need to understand and use fourth
normal form is more important than previously believed.
INTRODUCTION
A paramount
issue in the design of any database is
what data fields should be grouped together into records.
In the relational
model, the data fields are grouped into
logical structures called relations.
The determination
of
which data fields are placed together in a relation is based
upon the concept of normal forms; the process is known
as normalization.
The set of data fields comprising the
database is progressively organized into relations in first
through fifth normal form
placed upon the relations
that
the relation
There
fourth
has no modification
is
some
normal
neglect
form
the topic
Stamper
normal
Price
in
[10]
are
so
rarely
in this
book.”
and 4NF
highly
theoretical
step in the normalization
process, we may find that the
relations no longer require further reorganization,
in that
either
ignore
practitioners
case, we say that final normal form has been achieved.
Frequently,
a set of data may be in final normal form
Thus,
the practical
stops
with
normal
created to replace 3NF.
with
3NF in the normalization
process.
normal form called Domain Key
not applied to normalize data but
relation has been normalized to its
Permission
granted
direct
to
copy
provided
commercial
that
of the publication
that
copying
and/or
specific
© 1992
ACM
fee
the copies
advantage,
title
Machinery.
without
To copy
all or part
the ACM
otherwise,
is placed after
There is an additional
normal form which is
serves to verify that a
final form by showing
of this
are not made
and its date
is by permission
BCNF
copyright
appear,
material
or distributed
notice
and notice
of the Association
or to republish,
is
and the
is given
The
a fee
9..,$1.50
are
by the
the
next
section,
for
4NF
paper
and discuss
state
the
section,
into
this
fourth
section.
Finally,
given
in the last
section.
in
these
the
results
are
In
The
second
of this
study.
of the
empirical
presented
of
need
procedures
the organization
a summary
results
databases.
of 4NF.
investigation
record
objectives.
the practical
world
in
update
process
result
methodology
insights
Additional
during
and
normalization
the
and
defined
is to present
we present
gained
were
will
the advantages
the research
issue.
apparently
normalization
of real
of
in
academicians
investigated
examination
we
theory
in meeting
which
[1,6,9]
view
by Edwards
forms
4NF
ineffective
of this
The
an academic
some
of the
are still
texts
inconsistencies
violating
study
4NF
third
for
are not
“Although
they
4NF.
is merely
the normal
data
purpose
details
4NF
termination
for
research.
permission.
0-89791-468-6192100021001
that
explain
is represented
BCNF
database
of an empirical
the
for Computing
requires
designs
section
for
the
the
or
prevent
anomalies,
5NF,
fifth
business
they
that
database
use of normalization
Because
to
fully
that
than
Other
4NF
and
in
hence,
may
in M [S.
“fourth
[8] states
be less rare
do not
3NF
practitioners.
In theory,
or
courses
that
obscure;
Mittra
view
and thus
encountered
in nature.”
he states
(3NF).
Because the definition of 3NF does not treat
certain instances of data adequately, an additional normal
form called Boyce-Codd normal form (BCNF) was
order
may
regarding
[3] where
form
state
as to be almost
described
academicians
management
and
applications
anomalies.
that
as unimportant
forms
BCNF
only to third
(4NF)
in database
(5NF) according to constraints
in each normal form. At any
when it has been normalized
evidence
these
In
of data
in
results
the
is
Table
nine
databases
1
with
data
violating
4NF
Functional
Total
Organization
Area
Fields
Z!I&S
Financial
Firm
Sales
Management
System
307
24
Physician
Database
Data
on Individual
Physician
68
6
63
9
55
8
47
17
42
7
33
5
18
7
16
4
Publishing
College
Firm
Job
Apartment
Complex
Division
Scheduling
Student
of Nursing
Skills
Maintenance
of Sponsored
Grants
Database
Tracking
Database
*
Record
Research
Animal
Laboratory
Athletic
Tutoring
Automotive
Parts
Inventory
Program
Assignment
Retailer
Inventory
of Animals
of Tutors

NORMALIZATION OF DATA
To normalize a set of data items, we may proceed
from first normal form (1NF) to second normal form
(2NF) to third normal form (3NF) to Boyce-Codd normal
form (BCNF) to fourth normal form (4NF). In practice,
we can proceed directly from 1NF to BCNF. The use of
BCNF rather than 3NF is preferred because normalization
to BCNF is easily understood by students and can be
quickly implemented; in addition, a relation in BCNF is
also in 3NF.

To normalize a record type to 1NF, any repeating fields
or groups of fields are repeated for each record. A relation
is in 1NF if the only attribute values permitted are single
atomic values.

A relation schema R is in Boyce-Codd normal form if
whenever a functional dependency X -> A holds in R,
then X is a superkey of R. X is called a superkey of R if
there do not exist two tuples t1 and t2 in R such that
t1[X] = t2[X] [4].

To define fourth normal form, the concept of multivalued
dependency (MVD) is required. An MVD X ->-> Y holds
for R(X,Y,Z) if, whenever (x,y,z) and (x,y',z') are tuples
of R, then so are (x,y',z) and (x,y,z'). An MVD X ->-> Y
is said to be trivial if Y is a subset of X or XUY = R.
The definition of 4NF is then given by Fagin [5] as
follows: A relation schema R* is in 4NF if whenever a
nontrivial MVD X ->-> Y holds for R*, then so does the
functional dependency X -> A for every column name A
of R*.

If X ->-> Y, we also say that Y is a multivalued fact
about X. It is possible for Y to be called a multivalued
fact about X and yet the relationship between X and Y is
not a multivalued dependency. When a multivalued fact
is not independent, i.e., a multivalued dependency does
not exist, redundancy of data values is necessary. To
obtain 4NF, the relation R is decomposed into two or
more relations which meet the constraints given by the
definition of 4NF.

For the purpose of normalizing real world data, we
may restate the definition of 4NF as follows:
A relation R is in fourth normal form if R is in
BCNF and does not contain any nontrivial multivalued
dependencies.
We note that this simpler statement does not conform
fully to the original definition given by Fagin. However,
the theoretical differences do not affect normalization
of real world databases.
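As an illustration of the multivalued dependency definition used in this section, the following Python sketch tests whether X ->-> Y holds in one relation instance by applying the tuple-swapping condition directly. The helper name mvd_holds, the column names, and the sample rows are assumptions introduced here for illustration; they are not taken from the databases in the study.

```python
from itertools import product


def mvd_holds(rows, x_cols, y_cols):
    """Return True if the MVD X ->-> Y holds in this relation instance.

    rows is a list of dicts sharing the same keys. For every pair of
    rows that agree on X, the row combining X and Y from the first with
    the remaining columns (Z) from the second must also be present.
    """
    if not rows:
        return True
    all_cols = list(rows[0].keys())
    z_cols = [c for c in all_cols if c not in x_cols and c not in y_cols]
    present = {tuple(r[c] for c in all_cols) for r in rows}
    for r1, r2 in product(rows, repeat=2):
        if all(r1[c] == r2[c] for c in x_cols):
            mixed = {c: r1[c] for c in x_cols + y_cols}
            mixed.update({c: r2[c] for c in z_cols})
            if tuple(mixed[c] for c in all_cols) not in present:
                return False
    return True


# Hypothetical instance: one account with two home codes and two
# field requirements, stored as every combination (independent facts).
full = [
    {"account": "A1", "home_code": h, "field_req": f}
    for h in ("01", "02")
    for f in ("F1", "F2")
]

print(mvd_holds(full, ["account"], ["home_code"]))       # True
print(mvd_holds(full[:-1], ["account"], ["home_code"]))  # False
```

Ordered pairs are used so that both swapped tuples required by the definition are checked. A check of this kind only shows whether the dependency holds in the given rows; whether it is a constraint of the schema remains a design decision.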
RESEARCH
The
for
METHODS
databases
this
division
Each
organization
firm,
or a division
database
required
with
the exception
for
within
and
the
The
the
a large
study
500 firm,
a major
financial
for
course.
team
record
carefully
reviewed
to obtain
types
the organizations
large
was made
for
the data
to resolve
and the relationships
as a
were
Additional
then
This
study
has
at least
constituting
study.
over
Twenty
forty
indicates
received by the organization.
with
relations
to
the
Because
of this
in
data
violating
normalized
incidence,
the
4NF.
The
database
of normalized
record
The
each
tabulation
magnitude
of
databases
were
represented
average
4NF.
of over
is not
yields
Both
in
the
40 data
large
required
financial
eight
and
for
databases
the
normal
possible
form by representing
Similarly,
by
its
CODE
we add the field
a Y or N value corresponding
value
of
FIELD
PENDING
We then have two relations as shown
the
(ACCOUNT
ID, HOME
CODE, HOME-PENDING(ACCOUNT
ID, FIELD
REQUIREMENT,
P-REQTS-
Y-N)
sales
These
was found
contain
two
and require
an
Another
fields.
OF FOURTH NORMAL
It is sometimes
if the value of
identified
to contain
relations
no further
FORM
CODE
to avoid the use of fourth
PENDING
the data differently.
codes.
As an
field
do
not
violate
has only
fourth
normal
form
normalization.
representation
the use of buckets.
AVOIDANCE
has not been
For example,
PENDING
PENDING
small
database
firm
of information
document
CODE-Y-N)
Pending-Reqts-YN
found
of
The
PENDING
in the HOME
Home-Code-YN
PENDING
is also
fields
ID, FIELD
each
number
picture
study.
fields
of a major
other
of data
value
to
a
particular
REQUIREMENT.
below
data
for
The
particular
P-REQTS-Y-N
contained
fields
organization
a general
databases.
The
which
tabulated.
each
the
item
field has been received or not.
six
study.
so rare.
has been
for
this
regarding
of data
included
system
represent
in
databases
by the 307 data
management
to violate
these
types
types
a
corresponding
BCNF
record
a particular
whether
the
of
record
of the number
organization
in this
comprising
number
types
4NF
databases
normalization
4NF
organizational
violating
databases
types
information
organizational
at
from
1 provides
found
of the
record
These
4NF.
of
given.
350
resulted
percent
Table
percent
the
data
organizational
PENDING
01 is present for HOME PENDING CODE for a
particular value of ACCOUNT ID, we know that the
document identified as 01 has not been received. To
represent this information without violating fourth normal
form, we will require that all values for HOME
PENDING CODE be given and an additional field called
HOME-PENDING-CODE-Y-N be included. This latter
field will contain either a Y or N value to indicate
in
the data.
that
in nine
twenty
of
databases
determined
once
to form two
ID, HOME
(ACCOUNT
as necessary
RESULTS OF THE STUDY
occurred
CODE)
REQUIREMENT)
It is possible to represent the attributes of HOME
PENDING CODE and FIELD PENDING REQUIREMENT
so that the data does not violate 4NF.
The value of these fields is given by a numeric value that
contact
any ambiguities
between
PENDING
is decomposed
(ACCOUNT
CODE)
Pending-Reqts
were
teams
reports
4NF, this relation
Home-Codes
a
for
and Design
by the author
in 4NF.
HOME
PENDING
a
for
of
student
These
To obtain
remainder
Analysis
ID,
FIELD
relations:
for
databases
The
by
CODE,
firms,
system
to present
3NF.
and revised
record
the data
studied
in
private
database
in the Systems
Codes (ACCOUNT
organizations.
database
4NF.
was required
types
were
three
the
functional
management
violated
were
classes
Each
of
These
that
organizations
sales
to
was studied
were
entire
a
example of this approach, let us consider the database of
the Financial Organization. This database in BCNF
contains the following relation violating fourth normal
form:
organization
which
manufacturing
the
firm.
limited
government
the
and
firm
firm,
university.
of the
twenty
were
the
data
undergraduate
set
university,
included
Fortune
was
area
studied
a small
organizations
two
firm,
examined
was
of one business
remaining
were
of a major
organization
functional
Eighteen
reprographics
the
each
for
entirety.
units
organizations
project.
data
its
forty
of a large
The
in
from
for
In actuality,
19 possible
REQUIREMENT
this
data
is possible
the HOME
codes
field
while
has only
by
PENDING
the FIELD
five possible
Table 2
Violations of Fourth Normal Form
In this research study, nine databases were found to contain data violating fourth normal form.
The relevant data for these nine organizations is shown below in relations normalized to 4NF.
1.
Financial Organization - Sales Management System
Home-Codes (ACCOUNT ID, HOME PENDING CODE)
Pending-Reqts (ACCOUNT ID, FIELD PENDING REQUIREMENT)
2.
Physician Identification
System
Is-Member (PHYSICIAN
Works-At (PHYSICIAN
3.
Publishing
Firm
- Scheduling
Associated-Job
Colors
4.
(JOB
(JOB
College
of Nursing
(COURSE,
Family
Housing
(APT
Division
of Sponsored
(GRANT
7.
Animal
Research
CODE,
Tutoring
DATE)
Database
NUMBER
NUMBER
INVESTIGATOR
ANIMAL
NUMBER)
ID)
Assignment
Service did not wish to classify individual
a student
Tutor-Subjects
(TUTOR ID, SUBJECT
Tutor-Courses
(TUTOR ID, COURSE
Automotive
MAINTENANCE
SPRAYED)
KEYWORD)
(ACCOUNT
areas for which
AREA
of Animals
System - Tutor
subject
CODE)
CODE,
ID, KEYWORD)
Athletic
The Athletic
CODE)
DISCIPLINE)
CODE,
(ACCOUNT
Tutoring
SKILL
System
APT
- Grants
Account-Animals
Note:
9.
DATE,
- Inventory
Account-Investigators
8.
ADJOINING
(UNIVERSITY
Laboratory
Tracking
MAINTENANCE
SPRAY
(GRANT
Faculty-Interests
PREREQUISITE
- Maintenance
CODE,
Grant-Keyword
Database
CODE)
CODE,
(APT
Grant-Area
Skills
CODE,
Spraying
JOB NUMBER)
REQUIRED)
CODE,
SKILL
(APT
Maintenance
6.
- Student
Office
Adjoining-Apt
ASSOCIATED
COLOR
(SKILL
Course
System
NUMBER
NUMBER)
Has-Prerequisite
5.
ID, GROUP ID)
ID)
ID, LOCATION
was deemed
qualified
courses by subject
to serve as a tutor.
AREA)
ID)
Parts Retailer - Inventory
Vehicle-Engine-Part
(VEHICLE,
MODEL,
YEAR
ENGINE
TYPE, ENGINE
ENGINE-PART-NUMBER)
Vehicle-Feature
(VEHICLE,
MODEL,
Note:
FEATURE
consists
The field
YEAR
FEATURE,
~
of several subfields.
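To illustrate the kind of decomposition listed in Table 2, the following Python sketch projects a single relation holding two independent multivalued facts about an account onto two smaller relations and confirms that their natural join returns the original rows. The combined relation name pending_codes, the lower-case column names, and every sample value are hypothetical; only the shape of the two projected relations follows the Home-Codes and Pending-Reqts relations of the Financial Organization.

```python
def project(rows, cols):
    """Project rows onto cols, dropping duplicates (set semantics)."""
    result = []
    for row in rows:
        projected = {c: row[c] for c in cols}
        if projected not in result:
            result.append(projected)
    return result


def natural_join(left, right):
    """Join two relations on the columns they have in common."""
    result = []
    for l_row in left:
        for r_row in right:
            shared = set(l_row) & set(r_row)
            if all(l_row[c] == r_row[c] for c in shared):
                joined = {**l_row, **r_row}
                if joined not in result:
                    result.append(joined)
    return result


# Hypothetical rows: account A1 has two home pending codes and three
# field pending requirements; account A2 has one of each.
pending_codes = [
    {"account_id": a, "home_pending_code": h, "field_pending_requirement": f}
    for a, homes, fields in [
        ("A1", ["01", "02"], ["F1", "F2", "F3"]),
        ("A2", ["03"], ["F1"]),
    ]
    for h in homes
    for f in fields
]

home_codes = project(pending_codes, ["account_id", "home_pending_code"])
pending_reqts = project(pending_codes, ["account_id", "field_pending_requirement"])
rejoined = natural_join(home_codes, pending_reqts)

# Row counts: combined relation, two projections, and their join.
print(len(pending_codes), len(home_codes), len(pending_reqts), len(rejoined))
# prints: 7 3 4 7
```

In this sample the combined relation repeats each home pending code once per field pending requirement, while the two projections store each fact once and the join loses nothing.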
FEATURE,
area(s)
but did wish to track both
courses and
Consequently, we may choose to store this same
information in one relation as shown below:

All-Codes (ACCOUNT ID, HOME-PENDING-CODE-1, ...,
HOME-PENDING-CODE-19,
FIELD-PENDING-REQUIREMENT-1, ...,
FIELD-PENDING-REQUIREMENT-5)

Although the relation All-Codes is technically in 4NF,
the use of the bucket violates the principles of the
relational model. The organization may choose the
bucket representation because these data fields have been
in use for several years and it is unlikely that change
would occur. The bucket representation presents
potential problems due to its inflexibility. We note that
the proper choice for data representation is dependent on
the individual installation's preference for data
representation, the requirements for flexibility, the need
to minimize storage space, and the trade-off of data
structure for ease of programming.

CONCLUSION
The study of forty real world databases showed that
data violating 4NF occurs more often than we previously
believed. Data violating 5NF did not occur in any of the
databases, which may indicate that 5NF is only an
academic issue. Because data violating 4NF was found to
occur in over twenty percent of the organizations in the
study, it is advisable that practitioners understand and
apply the constraints for 4NF. It is also imperative that
the rules for 4NF be emphasized in the teaching of
normalization theory.

ACKNOWLEDGEMENT
The author wishes to thank Jeff Hoffer for suggesting
this research project.

REFERENCES
1. Courtney, James F. and David B. Paradice.
Database Systems for Management. St. Louis:
Times Mirror/Mosby College Publishing Co., 1988.
2. Date, C. J. An Introduction to Data Base
Systems. 5th ed. Reading, MA: Addison-Wesley
Publishing Co., 1990.
3. Edwards, Joe B. "High Performance Without
Compromise," Datamation, 36, 13, (July 1, 1990),
53-58.
4. Elmasri, Ramez and Shamkant B. Navathe.
Fundamentals of Database Systems. Redwood
City, CA: Benjamin/Cummings Publishing Co.,
1989.
5. Fagin, Ronald. "Multivalued Dependencies and a
New Normal Form for Relational Databases," ACM
Transactions on Database Systems, 2, 3,
(Sept. 1977), 262-278.
6. Fife, Dennis, W. Terry Hardgrave, and Donald R.
Deutsch. Database Concepts. Cincinnati:
Southwestern Publishing Co., 1986.
7. Kent, W. "A Simple Guide to Five Normal Forms
in Relational Database Theory," Communications
of the ACM, 26, 2, (Feb. 1983), 120-125.
8. Mittra, Sitansu S. Principles of Relational
Database Systems. Englewood Cliffs, N.J.:
Prentice-Hall, Inc., 1991.
9. Pratt, Philip J. Microcomputer Data Base
Management Using dBASE III Plus. Boston, MA:
Boyd & Fraser Publishing Co., 1988.
10. Stamper, David and Wilson Price. Database
Design and Management: An Applied Approach.
New York: McGraw-Hill, Inc., 1990.