4838 Wilkinson Apps pp 182-202 8/9/99 10:01 AM Page 200 CD–200 • APPENDIX 6.1

addresses on the direct-access storage medium where the records relating to a particular occurrence are stored. For instance, the city inverted list would show Phoenix, Tempe, and Mesa. On the same line as Tempe might be three disk addresses, 12682, 13256, and 13890, which represent the locations of the records having listing numbers 4, 8, and 10 (see Figure A5.1-2). Thus, if a real estate agent requests a list of all houses for sale in Tempe, the retrieval program simply determines the locations of the applicable records by reference to the inverted list and displays the requested data.

When two or more inverted lists, each focused on a secondary key, are involved, the request is called a multiple-key inquiry. For instance, a request involving both inverted lists in Figure A5.1-4 might be: “What are the addresses of houses for sale in Tempe that are priced at $100,000 or under?” By adding an inverted list focused on “number of bedrooms,” the request could be narrowed further: “What are the addresses of houses for sale in Tempe that have three bedrooms and are priced at $100,000 or under?”

Another example should clarify the construction of inverted lists. The upper part of Figure A5.1-5 shows several records of an accounts receivable master file arranged by customer number. Each record contains three attributes that can serve as secondary keys: customer name, credit limit, and current balance. The lower part of the figure shows an inverted list for the credit limit attribute. All values of the credit limit ($2000, $3000, and $4000) are listed in the left column, expressed as the numbers 2, 3, and 4. Each value is cross-referenced to the disk addresses at which the corresponding records are located. For instance, the $2000 credit limit is cross-referenced to disk addresses 1000, 1300, and 1600, since those are the locations of the three records having that credit limit.
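The construction and use of an inverted list can be sketched in Python. This is a minimal illustration under stated assumptions, not the book's software: the master file is modeled as a dictionary keyed by disk address, and the names `build_inverted_list`, `single_key_inquiry`, and `multiple_key_inquiry` are hypothetical. Only the three $2000 records (addresses 1000, 1300, and 1600 and their customer names) come from the text; the other records and every current balance are made-up placeholders, included so that a multiple-key inquiry can be shown as the intersection of address lists drawn from two inverted lists.

```python
from collections import defaultdict

# Accounts receivable master file, keyed by disk address.
# From the text: addresses 1000, 1300, 1600, their names, and the $2000 limit (coded 2).
# HYPOTHETICAL: every other record and every current balance are placeholders.
master_file = {
    1000: {"name": "John Waters", "credit_limit": 2, "balance": 1850},
    1100: {"name": "Customer B", "credit_limit": 3, "balance": 400},
    1200: {"name": "Customer C", "credit_limit": 4, "balance": 3200},
    1300: {"name": "Shirley Trimble", "credit_limit": 2, "balance": 950},
    1400: {"name": "Customer E", "credit_limit": 3, "balance": 2750},
    1500: {"name": "Customer F", "credit_limit": 4, "balance": 100},
    1600: {"name": "Doris Malcolm", "credit_limit": 2, "balance": 1990},
}


def build_inverted_list(records, attribute):
    """Cross-reference each value of a secondary key to the disk addresses holding it."""
    inverted = defaultdict(list)
    for address in sorted(records):
        inverted[records[address][attribute]].append(address)
    return dict(inverted)


def single_key_inquiry(records, inverted, value):
    """Scan the inverted list, retrieve the records at those addresses, return their names."""
    return [records[address]["name"] for address in inverted.get(value, [])]


def multiple_key_inquiry(records, *address_lists):
    """Keep only addresses present in every list, then retrieve those records' names."""
    common = set(address_lists[0]).intersection(*address_lists[1:])
    return [records[address]["name"] for address in sorted(common)]


by_limit = build_inverted_list(master_file, "credit_limit")
by_balance = build_inverted_list(master_file, "balance")

# Single key: customers with a $2000 credit limit.
print(single_key_inquiry(master_file, by_limit, 2))
# -> ['John Waters', 'Shirley Trimble', 'Doris Malcolm']

# Multiple keys: $2000 limit AND a current balance over $1500 (hypothetical threshold).
over_1500 = [a for balance, addrs in by_balance.items() if balance > 1500 for a in addrs]
print(multiple_key_inquiry(master_file, by_limit[2], over_1500))
# -> ['John Waters', 'Doris Malcolm']
```

Only the addresses that survive the intersection are read from the master file, which mirrors the point that the retrieval program consults the inverted lists before touching the records themselves.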
Assume that a user, say a clerk in the credit department, requests the customers having a credit limit of $2000. The data base software will (1) scan the inverted list, (2) retrieve the records located at addresses 1000, 1300, and 1600, and (3) display the names John Waters, Shirley Trimble, and Doris Malcolm.

APPENDIX 6.1 AN ILLUSTRATION OF THE NORMALIZATION PROCESS IN A RELATIONAL DATA BASE

Student Registration Illustration

Because of its importance, we will trace through an extended example of normalization based on student registration. Our illustration begins with the simplified student registration form shown in Figure A6.1-1, which contains data for a registered student, C.D. White. From the data in the student registration forms, we prepare the table shown in Figure A6.1-2, which is an extremely unnormalized table. We begin with this table and conclude with a set of normalized tables.

Figure A6.1-2 shows a portion of an unnormalized table for students and the courses (classes) in which they are enrolled. It includes repeating groups (number, title, and room for each enrolled course), redundant data (the title and room of a course are repeated for every student enrolled in it), and dependency on a nonkey attribute (the instructor’s department depends on the course instructor, not on the student). Instead of a single focus on students (the entity of concern), the table has a blurred focus that includes data concerning courses and instructors.

Our first step toward normalization is to eliminate the repeating groups, with the result shown in Figure A6.1-3.* At this stage, in technical terms, the table has been cast into first normal form, and it now has fixed-length rows. However, two difficulties have been introduced. First, the student number is no longer a unique identifier of each row; the course number must be added to the student number to gain uniqueness. Thus the concatenation of student number and course number becomes the primary key.
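Eliminating repeating groups amounts to emitting one row per enrolled course and copying the student's attributes into each. The Python sketch below is offered only as an illustration; it uses the student and course data of Figures A6.1-1 through A6.1-3, while the dictionary layout, the field names, and the helpers `to_first_normal_form` and `is_unique_key` are assumptions made for the example. It also checks the claim that the student number alone no longer identifies a row but the concatenated student number and course number does.

```python
# Field names for one repeating course group (hypothetical identifiers).
COURSE_FIELDS = ("course_number", "course_title", "course_room",
                 "course_time", "course_instructor", "instructor_department")

# Unnormalized rows: student attributes plus a repeating group of courses.
unnormalized = [
    {"student_number": "123456789", "student_name": "CD White",
     "class_standing": "Junior", "cumulative_gpa": 2.88,
     "courses": [
         ("ACC200", "Cost accounting", "BA212", "MW10:00", "Monroe", "Accounting"),
         ("ENG300", "English literature", "LA162", "MW2:00", "Hart", "English"),
         ("MGT370", "Organizations", "BA330", "TTH9:00", "Engle", "Management"),
     ]},
    {"student_number": "234567891", "student_name": "SP Adams",
     "class_standing": "Sophomore", "cumulative_gpa": 3.17,
     "courses": [
         ("BUS300", "Communications", "BA350", "MW11:00", "Pugh", "Business"),
         ("MAT250", "Calculus I", "LA210", "TTH8:00", "James", "Engineering"),
     ]},
]


def to_first_normal_form(rows):
    """Emit one fixed-length row per course, repeating the student's attributes."""
    flat = []
    for row in rows:
        student = {k: v for k, v in row.items() if k != "courses"}
        for group in row["courses"]:
            flat.append({**student, **dict(zip(COURSE_FIELDS, group))})
    return flat


def is_unique_key(rows, key_fields):
    """True if no two rows share the same values for key_fields."""
    keys = [tuple(r[f] for f in key_fields) for r in rows]
    return len(keys) == len(set(keys))


first_nf = to_first_normal_form(unnormalized)
print(len(first_nf))                                                 # -> 5
print(is_unique_key(first_nf, ("student_number",)))                  # -> False
print(is_unique_key(first_nf, ("student_number", "course_number")))  # -> True
```

The values "CD White", "Junior", and 2.88 now appear in three rows, which is the added redundancy taken up in the next paragraph.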
Second, redundancy is increased. Thus we cannot stop with this table form. The next step is to establish separate tables for students and courses, as shown in Figure A6.1-4. As a consequence, the student table contains only data that are functionally dependent on the student number; thus it is normalized. Also, the degree of redundancy concerning students and courses has been reduced. Again, however, two difficulties arise. First, the association between students and their courses has been lost. A student number–course number relationship table must therefore be added.* Second, the course table still contains a so-called transitive dependency: the attribute “Instructor’s department” depends on the course instructor, a nonkey attribute, rather than directly on the course number, which is the primary key of the table. Hence, the course table is not yet in normalized form.

*When we refer to normalized tables, we mean the third normal form, as defined by E. F. Codd. See C. J. Date, An Introduction to Database Systems, 4th ed. (Reading, Mass.: Addison-Wesley, 1986), p. 99. Although some authorities prefer the Boyce–Codd normal form, which is quite similar but eliminates one additional kind of redundancy, the third normal form is widely employed. We should also note that while normalization is most frequently applied to relational data bases, its concepts also pertain to tree and network structures.

Allstate University Student Registration

Student Number: 123456789     Class Standing: Junior
Student Name: CD White        Cumulative GPA: 2.88

Course   Course              Course  Course   Instructor’s  Course
number   title               room    time     department    instructor
ACC200   Cost accounting     BA212   MW10:00  Accounting    Monroe
ENG300   English literature  LA162   MW2:00   English       Hart
MGT370   Organizations       BA330   TTH9:00  Management    Engle

FIGURE A6.1-1 A student registration form.

Student    Student    Class       Cumulative  Course   Course              Course  Course   Course      Instructor’s
number     name       standing    GPA         number   title               room    time     instructor  department
(key)
123456789  CD White   Junior      2.88        ACC200   Cost accounting     BA212   MW10:00  Monroe      Accounting
                                              ENG300   English literature  LA162   MW2:00   Hart        English
234567891  SP Adams   Sophomore   3.17        BUS300   Communications      BA350   MW11:00  Pugh        Business
                                              MAT250   Calculus I          LA210   TTH8:00  James       Engineering

FIGURE A6.1-2 An unnormalized table containing data relating to student courses.

Student    Course   Student    Class       Cumulative  Course              Course  Course   Course      Instructor’s
number     number   name       standing    GPA         title               room    time     instructor  department
(key)      (key)
123456789  ACC200   CD White   Junior      2.88        Cost accounting     BA212   MW10:00  Monroe      Accounting
123456789  ENG300   CD White   Junior      2.88        English literature  LA162   MW2:00   Hart        English
123456789  MGT370   CD White   Junior      2.88        Organizations       BA330   TTH9:00  Engle       Management
234567891  BUS300   SP Adams   Sophomore   3.17        Communications      BA350   MW11:00  Pugh        Business
234567891  MAT250   SP Adams   Sophomore   3.17        Calculus I          LA210   TTH8:00  James       Engineering

FIGURE A6.1-3 A student-course table in first normal form.

Course   Course              Course  Course   Course      Instructor’s
number   title               room    time     instructor  department
(key)
ACC200   Cost accounting     BA212   MW10:00  Monroe      Accounting
ENG300   English literature  LA162   MW2:00   Hart        English
MGT370   Organizations       BA330   TTH9:00  Engle       Management
BUS300   Communications      BA350   MW11:00  Pugh        Business
MAT250   Calculus I          LA210   TTH8:00  James       Engineering

Student    Student    Class       Cumulative
number     name       standing    GPA
123456789  CD White   Junior      2.88
234567891  SP Adams   Sophomore   3.17

Student    Course
number     number
123456789  ACC200
123456789  ENG300
123456789  MGT370
234567891  BUS300
234567891  MAT250

FIGURE A6.1-4 A set of student-course tables in second normal form.
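The move to second normal form can be sketched as relational projections of the first-normal-form table, and the remaining transitive dependency can then be located from a list of functional dependencies. Sample rows cannot prove a functional dependency, so this Python sketch declares the dependencies as assumptions taken from the text: student number determines the student attributes, course number determines the course attributes, and course instructor determines the instructor’s department. The names `project` and `transitive_dependencies` are hypothetical helpers, not part of any particular data base product.

```python
HEADER = ("student_number", "course_number", "student_name", "class_standing",
          "cumulative_gpa", "course_title", "course_room", "course_time",
          "course_instructor", "instructor_department")

# First-normal-form rows (Figure A6.1-3).
first_nf = [dict(zip(HEADER, row)) for row in [
    ("123456789", "ACC200", "CD White", "Junior", 2.88, "Cost accounting", "BA212", "MW10:00", "Monroe", "Accounting"),
    ("123456789", "ENG300", "CD White", "Junior", 2.88, "English literature", "LA162", "MW2:00", "Hart", "English"),
    ("123456789", "MGT370", "CD White", "Junior", 2.88, "Organizations", "BA330", "TTH9:00", "Engle", "Management"),
    ("234567891", "BUS300", "SP Adams", "Sophomore", 3.17, "Communications", "BA350", "MW11:00", "Pugh", "Business"),
    ("234567891", "MAT250", "SP Adams", "Sophomore", 3.17, "Calculus I", "LA210", "TTH8:00", "James", "Engineering"),
]]

# Functional dependencies ASSUMED from the text: determinant -> dependent attributes.
FDS = {
    "student_number": ("student_name", "class_standing", "cumulative_gpa"),
    "course_number": ("course_title", "course_room", "course_time", "course_instructor"),
    "course_instructor": ("instructor_department",),
}

STUDENT_FIELDS = ("student_number", "student_name", "class_standing", "cumulative_gpa")
COURSE_FIELDS = ("course_number", "course_title", "course_room", "course_time",
                 "course_instructor", "instructor_department")


def project(rows, fields):
    """Keep only `fields` and drop duplicate rows, preserving first-seen order."""
    seen, result = set(), []
    for row in rows:
        values = tuple(row[f] for f in fields)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(fields, values)))
    return result


def transitive_dependencies(fields, key):
    """Pairs (B, A) where a nonkey attribute B determines another attribute A in the table.

    Assumes the table is already in second normal form, so B itself depends on the key.
    """
    return [(det, dep) for det, deps in FDS.items()
            if det in fields and det not in key
            for dep in deps if dep in fields]


students = project(first_nf, STUDENT_FIELDS)
courses = project(first_nf, COURSE_FIELDS)
enrollments = project(first_nf, ("student_number", "course_number"))  # relationship table

print(len(students), len(courses), len(enrollments))                 # -> 2 5 5
print(transitive_dependencies(STUDENT_FIELDS, ("student_number",)))  # -> []
print(transitive_dependencies(COURSE_FIELDS, ("course_number",)))
# -> [('course_instructor', 'instructor_department')]
```

The student table reports no transitive dependency, while the course table reports exactly the instructor-to-department dependency that the text identifies.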
The final step is to split the course table into a course table and an instructor table, as shown in Figure A6.1-5. To retain the linkage between tables, the instructor name is kept in the course table as a foreign key. This change eliminates all remaining anomalies, so that updates, additions, and deletions can easily be made and no needed data are lost. For instance, if an instructor does not teach any courses during a semester, his or her data are not deleted from the data base. If a new course is added to the curriculum but not yet offered, it can be entered into the data base at once. The resulting tables (with data presented only in the instructor table), shown in Figure A6.1-5, are now normalized, that is, expressed in third normal form. They make up a data base that can flexibly meet the varied needs of users, even though the number of tables has increased from one to four.

Course number (key)   Course title   Course room   Course time   Course instructor

Student number (key)   Student name   Class standing   Cumulative GPA

Course instructor (key)   Instructor’s department
Monroe                    Accounting
Hart                      English
Engle                     Management
Pugh                      Business
James                     Engineering

Student number   Course number

FIGURE A6.1-5 A set of student-course tables in third normal form.

*Note that if grades are to be added to the data base, they must be inserted into the relationship table rather than into either the student table or the course table.
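The final split, and the anomalies it removes, can be illustrated in a few lines of Python. This is a sketch rather than a data base implementation: tables are plain dictionaries, `department_of` is a hypothetical helper that follows the instructor foreign key, and the course "ACC400" titled "Auditing" is invented solely to show that a course not yet offered can be entered.

```python
# Second-normal-form course table (Figure A6.1-4).
COURSE_2NF = [
    ("ACC200", "Cost accounting", "BA212", "MW10:00", "Monroe", "Accounting"),
    ("ENG300", "English literature", "LA162", "MW2:00", "Hart", "English"),
    ("MGT370", "Organizations", "BA330", "TTH9:00", "Engle", "Management"),
    ("BUS300", "Communications", "BA350", "MW11:00", "Pugh", "Business"),
    ("MAT250", "Calculus I", "LA210", "TTH8:00", "James", "Engineering"),
]

# Course table: the instructor name stays behind as a foreign key.
courses = {number: {"title": title, "room": room, "time": slot, "instructor": instructor}
           for number, title, room, slot, instructor, _dept in COURSE_2NF}

# Instructor table: each instructor's department is stored exactly once.
instructors = {instructor: dept
               for _num, _title, _room, _slot, instructor, dept in COURSE_2NF}


def department_of(course_number):
    """Follow the course's instructor foreign key into the instructor table."""
    return instructors[courses[course_number]["instructor"]]


print(department_of("ACC200"))  # -> Accounting

# Deletion: Engle teaches no course this semester, yet Engle's department survives.
del courses["MGT370"]
print(instructors["Engle"])     # -> Management

# Insertion: a new course (HYPOTHETICAL data) can be entered before it is offered.
courses["ACC400"] = {"title": "Auditing", "room": None, "time": None, "instructor": "Monroe"}
print(department_of("ACC400"))  # -> Accounting

# Every foreign key still points at an existing instructor row.
assert all(c["instructor"] in instructors for c in courses.values())
```

Had the department remained in the course table, deleting MGT370 would also have removed the only record of Engle's department, which is the deletion anomaly the final step avoids.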