4838 Wilkinson Apps pp 182-202 8/9/99 10:01 AM Page 200
CD–200
•
APPENDIX 6.1
addresses on the direct-access storage medium where
the records relating to a particular occurrence are
stored. For instance, the city inverted list would show
Phoenix, Tempe, and Mesa. On the same line with
Tempe might be these three disk addresses—12682,
13256, and 13890—which represent the locations of the
records having listing numbers of 4, 8, and 10 (see Figure A5.1-2). Thus, if a real estate agent requests a list of
all houses for sale in Tempe, the retrieval program simply determines the locations of the applicable records by
reference to the inverted list and displays the requested
data.
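
The lookup just described can be sketched in a few lines of Python. This is a minimal illustration rather than the retrieval program itself: the names `city_index`, `disk`, and `houses_in` are assumptions made for the example, and only the Tempe addresses and listing numbers come from the text.

```python
# Inverted list on the secondary key "city": city value -> disk addresses.
# Only the Tempe entry is given in the text; the Phoenix and Mesa
# addresses are not stated, so they are left empty here.
city_index = {
    "Phoenix": [],
    "Tempe": [12682, 13256, 13890],
    "Mesa": [],
}

# Hypothetical stand-in for the direct-access storage medium:
# disk address -> stored record. Listing numbers 4, 8, and 10 are from the text.
disk = {
    12682: {"listing": 4, "city": "Tempe"},
    13256: {"listing": 8, "city": "Tempe"},
    13890: {"listing": 10, "city": "Tempe"},
}

def houses_in(city):
    """Fetch the records at the addresses the inverted list holds for a city."""
    return [disk[address] for address in city_index.get(city, [])]

tempe_listings = [record["listing"] for record in houses_in("Tempe")]
print(tempe_listings)  # [4, 8, 10]
```

The point of the structure is that no record is read until the inverted list has already identified its address.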
When two or more inverted lists—each focused on a
secondary key—are involved, the request is called a
multiple-key inquiry. For instance, a request involving
both inverted lists in Figure A5.1-4 might be, What are
the addresses of houses for sale in Tempe that are priced
at $100,000 or under? By adding an inverted list focused
on “number of bedrooms,” the request could be further
narrowed to, What are the addresses of houses for sale
in Tempe having three bedrooms that are priced at
$100,000 or under?
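
A multiple-key inquiry amounts to intersecting the address sets drawn from each inverted list. The sketch below assumes that view. Apart from the three Tempe addresses, every price, bedroom count, and address in it is hypothetical, as is the function name `multiple_key_inquiry`.

```python
# Inverted lists held as sets of disk addresses so they can be intersected.
# Only the Tempe addresses come from the text; all other values are hypothetical.
city_index = {"Tempe": {12682, 13256, 13890}}
price_index = {
    85000: {12682},
    98000: {13890, 14100},
    125000: {13256},
}
bedrooms_index = {2: {13890}, 3: {12682, 14100}, 4: {13256}}

def multiple_key_inquiry(city, max_price, bedrooms=None):
    """Return the addresses that satisfy every secondary-key condition."""
    matches = set(city_index.get(city, set()))
    # Collect the addresses of all houses priced at max_price or under.
    priced = set()
    for price, addresses in price_index.items():
        if price <= max_price:
            priced |= addresses
    matches &= priced
    # Optionally narrow the request with a third inverted list.
    if bedrooms is not None:
        matches &= bedrooms_index.get(bedrooms, set())
    return sorted(matches)

print(multiple_key_inquiry("Tempe", 100000))     # [12682, 13890]
print(multiple_key_inquiry("Tempe", 100000, 3))  # [12682]
```

Each additional inverted list can only shrink the result, which is why adding the bedrooms condition narrows the request.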
Another example should clarify the construction of
inverted lists. Figure A5.1-5 shows, in the upper part,
several records of an accounts receivable master file
arranged according to customer numbers. Each record
contains three attributes that can serve as secondary
keys: customer name, credit limit, and current balance.
The lower part of the figure shows an inverted list pertaining to the credit limit attribute. All values of the
credit limit—$2000, $3000, and $4000—are listed in
the left column, expressed as numbers 2, 3, and 4.
Each value has been cross-referenced to the disk addresses at which the records are located. For instance,
the $2000 credit limit is cross-referenced to disk addresses 1000, 1300, and 1600, since those are the locations of the three records having that limit of credit.
Assume that a user, say a clerk in the credit department, makes a request for those customers having a
credit limit of $2000. The data base software will (1)
scan the inverted list, (2) retrieve the records located
at addresses 1000, 1300, and 1600, and (3) display the
names John Waters, Shirley Trimble, and Doris Malcolm.
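
The construction of such an inverted list can also be sketched briefly. In the example below, the three $2000 records and their addresses follow the text, although the pairing of each name with a particular address is assumed. The records at addresses 1100 and 1400 are hypothetical fillers, and `build_inverted_list` is an illustrative name.

```python
from collections import defaultdict

# Accounts receivable master file: disk address -> record.
# Credit limits are expressed as 2, 3, and 4, as in the figure.
master_file = {
    1000: {"name": "John Waters", "credit_limit": 2},
    1100: {"name": "Customer B", "credit_limit": 3},   # hypothetical record
    1300: {"name": "Shirley Trimble", "credit_limit": 2},
    1400: {"name": "Customer D", "credit_limit": 4},   # hypothetical record
    1600: {"name": "Doris Malcolm", "credit_limit": 2},
}

def build_inverted_list(records, attribute):
    """Cross-reference each value of a secondary key to its disk addresses."""
    index = defaultdict(list)
    for address in sorted(records):
        index[records[address][attribute]].append(address)
    return dict(index)

credit_index = build_inverted_list(master_file, "credit_limit")

# The three steps of the inquiry: scan the list, retrieve, display.
addresses = credit_index[2]
names = [master_file[address]["name"] for address in addresses]
print(addresses)  # [1000, 1300, 1600]
print(names)      # ['John Waters', 'Shirley Trimble', 'Doris Malcolm']
```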
APPENDIX 6.1
AN ILLUSTRATION OF THE NORMALIZATION PROCESS IN A RELATIONAL DATA BASE
Student Registration Illustration
Because of its importance, we will trace through an
extended example of normalization based on student registration. Our illustration begins with a simplified student registration form shown in Figure
A6.1-1, with data for a registered student, C.D. White.
Using the data in student registration forms, we prepare the table shown in Figure A6.1-2, a portion of an extremely unnormalized table for
students and the courses (classes) in which they are
enrolled. We begin with this table
and conclude with a set of normalized tables. The table includes repeating groups (number, title,
and room for each enrolled course), redundant data
(title and room for each course enrolled in by each student), and dependency on a nonkey entity (assigned
department depends on the course instructor and
not the student). Instead of a single focus on students (the entity of concern), the table has a blurred
focus that includes data concerning courses and instructors.
Our first step in moving toward normalization is to
eliminate the repeating groups, with the result shown in
Figure A6.1-3.* At this stage, in technical terms, the table
has been cast into first normal form. This table now has
fixed-length rows. However, two difficulties have been introduced. First, the student number is no longer a unique
identifier of each row. It is necessary to add the course
number to the student number in order to gain uniqueness. Thus the concatenated student number-course
number becomes the primary key. Second, redundancy is
increased. Thus we cannot stop with this table form.
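
The move to first normal form can be sketched as flattening each repeating group into its own row. The example below uses the student and course data from the registration figures, trimmed to a few attributes. The structure of `unnormalized` and the function name are assumptions made for illustration.

```python
# Unnormalized rows: one row per student, with a repeating group of courses.
unnormalized = [
    {"student_number": "123456789", "student_name": "CD White",
     "courses": [("ACC200", "Cost accounting"),
                 ("ENG300", "English literature"),
                 ("MGT370", "Organizations")]},
    {"student_number": "234567891", "student_name": "SP Adams",
     "courses": [("BUS300", "Communications"),
                 ("MAT250", "Calculus I")]},
]

def to_first_normal_form(rows):
    """Emit one fixed-length row per student-course pair."""
    flat = []
    for row in rows:
        for course_number, course_title in row["courses"]:
            flat.append({
                "student_number": row["student_number"],
                "student_name": row["student_name"],  # now repeated per course
                "course_number": course_number,
                "course_title": course_title,
            })
    return flat

first_nf = to_first_normal_form(unnormalized)
student_numbers = [r["student_number"] for r in first_nf]
composite_keys = [(r["student_number"], r["course_number"]) for r in first_nf]

print(len(first_nf))                                # 5
print(len(set(student_numbers)) == len(first_nf))   # False: no longer unique
print(len(set(composite_keys)) == len(first_nf))    # True: concatenated key
```

Both difficulties noted above are visible here: the student number alone repeats, and the student name is now stored once per course.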
*When we refer to normalized tables, we mean the third normal
form, as defined by E. F. Codd. See C. J. Date, An Introduction to
Database Systems, 4th ed. (Reading, Mass.: Addison-Wesley,
1986), p. 99. Although some authorities prefer the Boyce–Codd
normal form, which is quite similar but does involve the elimination of one additional redundant situation, the third normal
form is widely employed. We should also note that while normalization is most frequently applied to relational data bases,
its concepts also pertain to tree and network structures.

The next step is to establish separate tables for students and courses, as shown in Figure A6.1-4. As a consequence, the student table contains only data that are
functionally dependent on the student number; thus it is
Allstate University
Student Registration

Student Number: 123456789     Class Standing: Junior
Student Name: CD White        Cumulative GPA: 2.88

Course   Course              Course  Course     Course      Instructor's
number   title               room    time       instructor  department
ACC200   Cost accounting     BA212   MW10:00    Monroe      Accounting
ENG300   English literature  LA162   MW2:00     Hart        English
MGT370   Organizations       BA330   T TH9:00   Engle       Management

FIGURE A6.1-1 A student registration form.

Student        Student    Class      Cumulative
number (key)   name       standing   GPA
123456789      CD White   Junior     2.88
234567891      SP Adams   Sophomore  3.17

(same rows continued, first course group)
Course   Course              Course  Course     Course      Instructor's
number   title               room    time       instructor  department
ACC200   Cost accounting     BA212   MW10:00    Monroe      Accounting
BUS300   Communications      BA350   MW11:00    Pugh        Business

(same rows continued, second course group)
Course   Course              Course  Course     Course      Instructor's
number   title               room    time       instructor  department
ENG300   English literature  LA162   MW2:00     Hart        English
MAT250   Calculus I          LA210   T TH8:00   James       Engineering

FIGURE A6.1-2 An unnormalized table containing data relating to student courses.

Student       Course        Student   Class      Cumulative  Course              Course  Course    Course      Instructor's
number (key)  number (key)  name      standing   GPA         title               room    time      instructor  department
123456789     ACC200        CD White  Junior     2.88        Cost accounting     BA212   MW10:00   Monroe      Accounting
123456789     ENG300        CD White  Junior     2.88        English literature  LA162   MW2:00    Hart        English
123456789     MGT370        CD White  Junior     2.88        Organizations       BA330   TTH9:00   Engle       Management
234567891     BUS300        SP Adams  Sophomore  3.17        Communications      BA350   MW11:00   Pugh        Business
234567891     MAT250        SP Adams  Sophomore  3.17        Calculus I          LA210   TTH8:00   James       Engineering

FIGURE A6.1-3 A student-course table in first normal form.

Course        Course              Course  Course    Course      Instructor's
number (key)  title               room    time      instructor  department
ACC200        Cost accounting     BA212   MW10:00   Monroe      Accounting
ENG300        English literature  LA162   MW2:00    Hart        English
MGT370        Organizations       BA330   TTH9:00   Engle       Management
BUS300        Communications      BA350   MW11:00   Pugh        Business
MAT250        Calculus I          LA210   TTH8:00   James       Engineering

Student    Student   Class      Cumulative
number     name      standing   GPA
123456789  CD White  Junior     2.88
234567891  SP Adams  Sophomore  3.17

Student    Course
number     number
123456789  ACC200
123456789  ENG300
123456789  MGT370
234567891  BUS300
234567891  MAT250

FIGURE A6.1-4 A set of student-course tables in second normal form.
normalized. Also, the degree of redundancy concerning
students and courses has been reduced. Again, however,
two difficulties arise. First, the association between students and their courses has been lost. A student number–course number relationship table must therefore be
added.* Second, the course table still contains a so-called transitive dependency; that is, the attribute “Instructor’s department” depends on the course instructor, a nonkey attribute, rather than directly on the course
number, the primary key of the table. Hence, the course
table is not yet in normalized form.
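
This step can be sketched as a set of projections of the first-normal-form rows, followed by a check for the remaining transitive dependency. The helper names `project` and `determines` are illustrative, and the rows are trimmed to six attributes. Note that a check against sample rows only shows that the data are consistent with a dependency; the dependency itself is a rule of the business, not something the data can prove.

```python
# First-normal-form rows, trimmed to six attributes, in this order:
# (student number, student name, course number, course title,
#  course instructor, instructor's department)
first_nf = [
    ("123456789", "CD White", "ACC200", "Cost accounting", "Monroe", "Accounting"),
    ("123456789", "CD White", "ENG300", "English literature", "Hart", "English"),
    ("123456789", "CD White", "MGT370", "Organizations", "Engle", "Management"),
    ("234567891", "SP Adams", "BUS300", "Communications", "Pugh", "Business"),
    ("234567891", "SP Adams", "MAT250", "Calculus I", "James", "Engineering"),
]

def project(rows, columns):
    """Keep the chosen column positions and drop duplicate rows, in order."""
    result = []
    for row in rows:
        picked = tuple(row[i] for i in columns)
        if picked not in result:
            result.append(picked)
    return result

def determines(rows, lhs, rhs):
    """True if, in these rows, each value in column lhs maps to one rhs value."""
    seen = {}
    for row in rows:
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True

students = project(first_nf, (0, 1))        # depends only on student number
courses = project(first_nf, (2, 3, 4, 5))   # depends only on course number
enrollment = project(first_nf, (0, 2))      # the relationship table

print(len(students), len(courses), len(enrollment))  # 2 5 5
# In the course table, column 2 is the instructor and column 3 the department:
print(determines(courses, 2, 3))  # True: department follows the instructor
```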
The final step is to split the course table into a course
table and an instructor table, as shown in Figure A6.1-5.
To retain the linkage between tables, the instructor name
is kept in the course table as a foreign key. This change
has eliminated all remaining anomalies, so that updates,
additions, and deletions can easily be made and no
needed data are lost. For instance, if an instructor does
not teach any courses during a semester, his or her data
are not deleted from the data base. If a new course is
added to the curriculum but not yet offered, it can be entered into the data base at once. The resulting tables
(with data presented only in the instructor table), shown
in Figure A6.1-5, are now normalized, that is, expressed
in third normal form. They make up a data base that can
flexibly meet the varied needs of users, even though the
number of tables has been increased from one to four.

Course        Course  Course  Course  Course
number (key)  title   room    time    instructor

Student       Student  Class     Cumulative
number (key)  name     standing  GPA

Course            Instructor's
instructor (key)  department
Monroe            Accounting
Hart              English
Engle             Management
Pugh              Business
James             Engineering

Student  Course
number   number

FIGURE A6.1-5 A set of student-course tables in third normal form.

*Note that if grades are to be added to the data base, they must
be inserted into the relationship table rather than either the
student table or the course table.
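
The four tables in third normal form can be tried out with Python's built-in sqlite3 module. The sketch below is one possible rendering, not a prescribed schema: the table and column names are assumptions, and the course and student rows are taken from the earlier figures. It rejoins the tables to answer an inquiry and then shows that dropping a course leaves the instructor's data in place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE instructor (
    course_instructor     TEXT PRIMARY KEY,
    instructor_department TEXT NOT NULL
);
CREATE TABLE course (
    course_number     TEXT PRIMARY KEY,
    course_title      TEXT,
    course_room       TEXT,
    course_time       TEXT,
    course_instructor TEXT REFERENCES instructor(course_instructor)  -- foreign key
);
CREATE TABLE student (
    student_number TEXT PRIMARY KEY,
    student_name   TEXT,
    class_standing TEXT,
    cumulative_gpa REAL
);
CREATE TABLE student_course (  -- the relationship table
    student_number TEXT REFERENCES student(student_number),
    course_number  TEXT REFERENCES course(course_number),
    PRIMARY KEY (student_number, course_number)
);
""")
conn.executemany("INSERT INTO instructor VALUES (?, ?)", [
    ("Monroe", "Accounting"), ("Hart", "English"), ("Engle", "Management"),
    ("Pugh", "Business"), ("James", "Engineering"),
])
conn.executemany("INSERT INTO course VALUES (?, ?, ?, ?, ?)", [
    ("ACC200", "Cost accounting", "BA212", "MW10:00", "Monroe"),
    ("ENG300", "English literature", "LA162", "MW2:00", "Hart"),
    ("MGT370", "Organizations", "BA330", "TTH9:00", "Engle"),
    ("BUS300", "Communications", "BA350", "MW11:00", "Pugh"),
    ("MAT250", "Calculus I", "LA210", "TTH8:00", "James"),
])
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?)", [
    ("123456789", "CD White", "Junior", 2.88),
    ("234567891", "SP Adams", "Sophomore", 3.17),
])
conn.executemany("INSERT INTO student_course VALUES (?, ?)", [
    ("123456789", "ACC200"), ("123456789", "ENG300"), ("123456789", "MGT370"),
    ("234567891", "BUS300"), ("234567891", "MAT250"),
])

# Rejoin the tables: which departments teach C.D. White's courses?
white_departments = conn.execute("""
    SELECT c.course_number, i.instructor_department
    FROM student_course sc
    JOIN course c ON c.course_number = sc.course_number
    JOIN instructor i ON i.course_instructor = c.course_instructor
    WHERE sc.student_number = ?
    ORDER BY c.course_number
""", ("123456789",)).fetchall()
print(white_departments)
# [('ACC200', 'Accounting'), ('ENG300', 'English'), ('MGT370', 'Management')]

# Drop a course: the instructor's data are not lost with it.
conn.execute("DELETE FROM student_course WHERE course_number = 'MAT250'")
conn.execute("DELETE FROM course WHERE course_number = 'MAT250'")
james = conn.execute(
    "SELECT instructor_department FROM instructor WHERE course_instructor = 'James'"
).fetchone()
print(james)  # ('Engineering',)
```

Had the department remained in the course table, deleting MAT250 would also have erased the only record of the department to which James is assigned.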