Database Design Theory

Database Design Theory
www.AssignmentPoint.com
www.assignmentpoint.com
Database Design Theory
 Given some body of data to be represented in a
database, as modelled in an E-R diagram, what is
the most suitable logical structure for that data?
 How do we decide on the appropriate tables and
the attributes of the tables?
www.assignmentpoint.com
Requirements
 Accommodate data integrity.

General integrity constraints


E.g., referential integrity
Domain specific integrity constraints

E.g., no user can borrow more than 4 books.
 Robust in the sense that the design should be
application independent.
 We try to achieve this through the elimination of
redundancy.
www.assignmentpoint.com
The Danger of Redundancy
 Consider the example





For students, we want to know student ID, name
and address.
For courses, we need to know course ID, title and
lecturer.
For employees, we need to know the employee ID,
name and department.
For each department, we need to know the
department ID, the name and the location.
For each enrollment, we need to know the grade.
www.assignmentpoint.com
The Danger of Redundancy
Continued
 One solution store everything in one big table
Appl(sid,name,addr,
cid,title,
eid, ename,
deptID, dname, loc,
grade)
 Clearly, this leads to redundancy.

For example, we need to store the student’s
address for every course they have been
registered for.
www.assignmentpoint.com
The Danger of Redundancy.
Conclusion
 If everything in one table, then


Greater space requirements
Insertion anomalies


Deletion anomalies


Cannot store information on student who has not
passed a course yet.
We may want to delete a course but some student
may be registered only for that course.
Update anomalies


If a student changes their address, many tuples need
to be updated.
www.assignmentpoint.com
Danger of inconsistency in database.
Good Database Design
 The basic idea:

A “good” database is one in which each table
consists of a primary key and a set of mutually
independent attributes.
 Strategy for achieving a good database design:

Identify undesirable dependencies in a table and
decompose by projection.
www.assignmentpoint.com
Functional Dependencies (FD’s)
 Attribute (set) Y is functionally dependent on
attribute (set) X if, whenever two tuples have the
same value for X, they also have the same value
for Y.
 Notation: X  Y
 X is called the determinant.
 An FD A B is non-trivial if and only if B  A and
BA.
www.assignmentpoint.com
Functional Dependencies in Our Example.
 If everything is in one table, then these FD’s exist:





sid  name, addr
cid  title
eid  ename
deptid  dname, loc
sid, cid  grade
 Note we also have

sid, cid, eid, deptid 
All other attributes
www.assignmentpoint.com
Keys Again
 A set of attributes X in a relation R is a superkey if
every attribute in R is functionally dependent on
X.
 A candidate key is a minimal superkey.
 Alternate keys are candidate keys that have not
been selected as primary keys.
 A prime attribute is a member of a candidate key.
www.assignmentpoint.com
Armstrong’s Axioms
 Let X,Y and Z be sets of attributes of a relation R


Reflexivity: (X Y)  (X  Y)
Augmentation
(X  Y)  (XZ  YZ)

Transitivity
((X Y) & (Y Z))  (X Z)
 Axioms are sound and complete


Can derive all FDs that follow from a given set of
FDs.
Derive only true FDs
www.assignmentpoint.com
Some Consequences of Armstrong’s
Axioms
 The following are implied by Armstrong’s axioms:



Decomposition
(X  YZ) (X  Y)
Union
((X  Y)&(X  Z))  (X  YZ)
Pseudo transitivity
((X  Y)&(WY  Z))  (WX  Z)
www.assignmentpoint.com
Closure of a Set of Functional
Dependencies
 If F is a set of functional dependencies, the
closure of F, F+, is the set of all functional
dependencies logically implied by those in F.
 Useful since it allows us to determine candidate
keys (there must be functional dependency to all
other attributes), but very expensive to compute.
www.assignmentpoint.com
Closure Under a Set of Functional
Dependencies
 Since F+ is too expensive to compute, we use
closure of X under a set of functional
dependencies, X+.
 (X  Y) in F if and only if
Y  X+.
 Since X+ is relatively easy to compute, we can
now verify whether X is a superkey.
www.assignmentpoint.com
Computing X+.
 To compute X+ under a set of FDs F:
INPUT: X, F
OUTPUT: X+
S := X
WHILE
there is a (ZY) in F
with Z  S and Y  S
DO S := SY
ENDWHILE
X+ = S
www.assignmentpoint.com
Decomposition
 Recall that having identified undesirable FDs, we
now need to decompose.
 Decomposition:


Let U be a relation scheme. A set of {R1,..,Rn} of
relation schemes is a decomposition of U if
R1  …  Rn = U
Every attribute of U occurs in at least one Ri.
www.assignmentpoint.com
Desirable Properties of Decomposition
 Decompositions should be




Lossless
Dependency preserving
No redundancy
Minimal number of tables
 Sometimes, not all properties can be achieved
simultaneously.
www.assignmentpoint.com
Lossless Decomposition
 Let



{R1,..,Rn} a decomposition of U
u relation instance over U
Pi = Ri(u) for i from 1 to n
 Then

{R1,..,Rn} is a lossless decomposition if
u = P1 … P n
 In other words, the original relation can be
reconstructed.
www.assignmentpoint.com
Dependency Preserving Decompositions
 In decomposing a table, ensure that any FDs are
easily enforceable.
 Example:




Relation U(A,B,C)
FDs: A  B, A  C, B  C
If we decompose U into R(A,B) and S(B,C), then A
 B, B  C can be easily enforced when changing
R or S.
Because of transitivity, A  C is automatically
enforced.
www.assignmentpoint.com
Non-Dependency Preserving
Decomposition
 If we decompose U into R’(A,B) and S’(A,C), then
enforcing A  B and A  C is easy.
 However, B  C becomes an interrelational
constraint and can only be enforced through a
join.
 This decomposition is not dependency
preserving.
www.assignmentpoint.com
Normalization
 Normal forms, as defined in relational database
theory, are guidelines for the design of the tables
in the database.
 Normalization reduces redundancy.
 Important to remember why we want to avoid
redundancy


Space requirements
Insertion, deletion and update anomalies.
www.assignmentpoint.com
The Normal Forms
 First normal form
 Second normal form
 Third normal form
 Boyce-Codd normal form
 Fourth normal form
 Fifth normal form
 The normal forms are ordered in that everything
in 2NF is also in 1NF.
 We ignore 5NF, as violations hardly occur in
practice.
www.assignmentpoint.com
First Normal Form
 A relation is in 1NF iff the value of each attribute
in a tuple is atomic.
 A relation which is not in 1NF
sid
123
234
cid
CS51S
CS51T
CS51S
CS52S
CS52T
grade
A
B
C
B
B
www.assignmentpoint.com
Getting Tables into 1NF
 Normalizing a table which is not in 1NF is easy:
Simply repeat the other fields.
 Thus
sid
123
123
234
234
234
cid
CS51S
CS51T
CS51S
CS52S
CS52T
grade
A
B
C
B
B
www.assignmentpoint.com
Second and Third Normal Form
 Second and third normal form concern
relationship between non-key and prime
attributes.
 Recall that a prime attribute is a member of a
candidate key.
 Under 2NF and 3NF, a non-key attribute value
must provide a fact about the key, the whole key
and nothing but the key.
 Every non-prime attribute must be fully
functionally dependent on a candidate key.
www.assignmentpoint.com
Second Normal Form
 2NF is violated when a non-key attribute depends
on a proper subset of a candidate key.
 The following violates 2NF
Result(cid, sid, name, grade)
 as name is functionally dependent on sid alone.
www.assignmentpoint.com
Dangers of Violating 2NF
 Note that name is repeated for every course that
a student has a grade for.
 Problems:


Danger of inconsistency if a student changes their
name, e.g., by getting married.
If a student has not passed any courses yet, then
the student’s name cannot be stored.
www.assignmentpoint.com
Getting Tables into 2NF
 Decompose the table into
Result(cid, sid, grade)
Student(sid, name)
 This decomposition leads to longer retrieval times
for queries which involve joins.
 Normalization is necessary to avoid anomalies
which arise because of changes to attributes.
 If little chance of changes, then sometimes do not
normalize.
www.assignmentpoint.com
Third Normal Form
 Third normal form is violated when a non-prime
attribute depends on another non-prime attribute.
 The following violates 3NF
Empl(eid, dept, loc)
 loc is a fact about dept.
 Danger same as violation of 2NF.
www.assignmentpoint.com
Getting Tables into 3NF
 Again, decompose
Empl(eid, dept)
Department(dept, loc)
 We can always restore 3NF through a lossless
and dependency preserving decomposition.
www.assignmentpoint.com
Boyce-Codd Normal Form (BCNF)
 A relation scheme R is in BCNF if every
determinant of a FD over R is a candidate key.
 In other words, the determinant of every FD is a
superkey.
 Violation of BCNF



R(A,B,C,D,E,F)
{ A  BC, D  AEF }
D+ = ABCDEF


D is a good primary key.
A+ = ABC
www.assignmentpoint.com
Another Violation of BCNF
 Assume that we give each registration for a
course a unique registration number
Reg(rid, sid, cid, sname, grade)
 FDs




rid  sid, cid
sid, cid  rid, grade
sid  sname
rid+ = All attributes
www.assignmentpoint.com
Getting Tables into BCNF
 Decompose according to the FD whose
determinant is not a superkey.
 In our example, sid  sname
 This gives


Reg(rid, sid, cid, grade)
Stud(sid, sname)
 Not always possible to get tables into BCNF while
preserving all functional dependencies.
www.assignmentpoint.com
Example where BCNF is not possible
 Consider


R(A,B,C)
{ AB  C, C  B}
 Not in BCNF because C is not a superkey.
 However, every decomposition of R fails to be
dependency preserving as we have to split up the
attributes in AB  C
 Have to settle for 3NF.
www.assignmentpoint.com
Multivalued Dependencies (MVDs)
 In an FD, X
Y, knowing the value of X means
that you know the unique value for Y.
 In an MVD, X   Y, knowing the value of X
means that you know the set of values from which
Y can come.

www.assignmentpoint.com
Example of MVD
 Assume we have two streams for some course,
taught by different instructors, and that for each
course, we use two textbooks.
 Example:
course
CS51T
CS52S
instructor
Rao
Mansingh
Rao
Stitt
text
Date
Korth
Jackson
Rich
www.assignmentpoint.com
Example of MVD Continued
 Putting table in 1NF gives
course
CS51T
CS51T
CS51T
CS51T
CS52S
CS52S
CS52S
CS52S
instructor
Rao
Rao
Mansingh
Mansingh
Rao
Rao
Stitt
Stitt
text
Date
Korth
Date
Korth
Jackson
Rich
Jackson
Rich
 With primary key

course, instructor, text
www.assignmentpoint.com
 Since no FD, in BCNF.
Redundancy because of MVDs
 However, still redundancy in the table because

if <c,p,x> and <c,p’,x’> in table <c,p’,x> and
<c,p,x’> in table too.
 The table contains two multivalued dependencies:
 course   instructor

course   text
 Danger of insertion and update anomalies
www.assignmentpoint.com
Fourth Normal Form
 Under 4NF, a relation should not contain two or
more independent MVDs.
 In other words, if there is a MVD, X   Y, then
X should be a superkey.
www.assignmentpoint.com
Getting Tables into 4NF
 Again, get a table into 4NF through
decomposition so that each MVD is captured in a
separate table.
 Example:


CP(course, instructor)
CT(course, text)
www.assignmentpoint.com
Normalization Reconsidered
 Normalization helps avoid:



Insertion anomalies
Update anomalies
Deletion anomalies
 Normalization increases retrieval time for some
queries.
www.assignmentpoint.com