Establishment and maintenance of a storage hierarchy for an on-line data base under TSS/360

by JAMES P. CONSIDINE and ALLAN H. WEIS
Thomas J. Watson Research Center, IBM Corporation
Yorktown Heights, New York

INTRODUCTION

As on-line interactive systems increase in popularity, several problem areas become more and more apparent. One of these is the management of the on-line accessible data base. It has been the experience of installations throughout the country that such a data base tends, if ungoverned, to increase in size as the system continues in operation, bounded only by the size of the storage available to contain it. It is, therefore, essential for the continuance of a viable system that this data base be examined and methods devised to control its growth.

In the first section of this paper we record some observations we have made on the nature of one particular on-line data base, specifically its growth and usage characteristics. The second section details a system we have designed to control the growth of the data base and insure maximum utilization of the on-line devices available. The third section describes the results of operating with the system. The fourth section details future amplifications and modifications to overcome some foreseeable difficulties in the present version. Finally we summarize our observations and re-state the conclusions we have reached.

TSS/360 data base at T. J. Watson Research Center

Since our system first went on a somewhat regular schedule of four-hour-a-day user sessions in June 1968, it was clear that, even under these conditions of relatively low availability, managing the on-line storage was going to be one of our primary problems. The amount of on-line storage occupied by user data sets at that time was approximately 20,000 pages, or 80,000,000 characters (1 page = 4096 characters or 8192 hexadecimal digits). It was a matter of a few months before the amount rose to what is our working optimum, 30,000 pages or 120,000,000 characters. This optimum is dictated by the maximum number of devices we wish to devote to on-line storage.

The distinction between devices and volumes should be made clear. A volume is a unit on which data are actually recorded. There are in principle large numbers of volumes available. A device is a unit on which a volume is mounted and which carries out the transmission of data to and from the volume. Devices are necessarily limited in number. A tape reel is a volume; the tape drive is a device.

To return to the data base, observations made at the time indicated that perhaps 10-20 percent of this data was non-useful. Examples of this are data sets defined but not used and never erased, output listings of assemblies and compilations done many days previous to the current date, and other such system- and user-generated residues. Measures were devised to periodically and systematically remove such unwanted data from the on-line storage, thereby achieving a small amount of leeway while the problem was being further studied.

In an effort to acquire information on the usage of the data base, we implemented a means of marking a data set with the date on which it was used. Report programs were written to process the data thus recorded, and the results were very informative to management and system programmers alike. Extracts from a typical report are presented in Figure 1.
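The report programs themselves are not reproduced in the paper; the sketch below is only a modern-terms illustration of the kind of tabulation that Figure 1 extracts, with a hypothetical DataSet record standing in for the per-data-set information (owner, size in pages, and the recorded date last used). None of these names come from TSS/360.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class DataSet:
    owner: str       # user identification (hypothetical field)
    name: str        # fully qualified data set name
    pages: int       # size of the data set in 4096-character pages
    last_used: date  # the "date last used" stamp recorded on the data set

def usage_report(data_sets, since):
    """Tabulate, per user, total pages owned and pages used since a given date
    (the shape of the extract shown in Figure 1)."""
    report = {}
    for ds in data_sets:
        owned, used = report.get(ds.owner, (0, 0))
        owned += ds.pages
        if ds.last_used >= since:
            used += ds.pages
        report[ds.owner] = (owned, used)
    return report

# Example: which users have touched their storage since a comparison date?
if __name__ == "__main__":
    sample = [
        DataSet("USER01", "USER01.PROG.ASM", 1588, date(1969, 3, 20)),
        DataSet("USER02", "USER02.OLD.LIST", 18, date(1968, 11, 2)),
    ]
    for user, (owned, used) in usage_report(sample, date(1969, 3, 2)).items():
        print(f"{user}: {used} of {owned} pages used")
```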
Among the facts which can be determined from such reports are the names of the authorized users actually using the system currently, how much storage each user is occupying, how much he is using, and how the amount of storage used by each user varies from observation period to observation period. The total amount of on-line storage that is being currently used by all users is also recorded. In addition, the data recorded can be processed to yield an on-line storage profile, as shown in Figure 2.

For instance, in the reports formulated from data gathered on February 1, 1969, we discovered that of our 160 or so authorized users, some 50 had actually used the system since the beginning of the year. We also found that most of these 50 were not actually using all of the storage they were occupying. In one case, up to 95 percent of the storage of a particular user had not been used during the period. In total we discovered that of some 28,000 pages of storage on the system only 13,000 pages had been used in the last month. These figures were based on information recorded after all the waste space occupied by obviously useless data had been reclaimed.

Figure 1-On-line storage reports (extract)

  USERID    PAGES USED    TOTAL PAGES
  USER01    1588          1751
  USER02    11            18
  USER03    0             7
  USER04    263
  Total pages used since 3/2/69: 14441    Total pages in system on 4/4/69: 27662

Figure 2a-On-line storage ownership profile (number of users plotted against N, the number of pages of on-line storage owned by the user; the 4 users who own more than 1200 pages each own about 40% of the available on-line storage)

Figure 2b-On-line storage usage profile (date data recorded: 4/4/69; date for comparison: 3/2/69; Q = amount of on-line storage used by user between 3/2 and 4/4/69 divided by amount of on-line storage allocated to user on 4/4/69; the number at the top of each column is the fraction of on-line storage owned by users in that category)

Similar data are being recorded periodically to monitor in a limited way the interactions of the users with the system. The amount of available on-line storage is recorded every time the system is loaded into the machine, a process which takes place three or four times in a fourteen-hour day. The usage characteristics are recorded much less frequently, perhaps once or twice a month. Thus far, observations on this somewhat expanded time scale have been more than sufficient to give evidence of imminent difficulties in the matter of on-line storage.

Even though these measurements were made under conditions of limited availability, they gave clear indication of the existing problems involving the management of the on-line data base and the control of its size. We realized at an early stage that unless some steps were taken to reduce the amount of data maintained on-line, it would be impossible to operate the system in our user environment with the on-line storage capacity then available.
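The utilization quotient Q plotted in the usage profiles (Figures 2b and 4b) is simply the ratio of the on-line storage a user actually used during the observation period to the storage allocated to him at the end of it. Purely as an illustration, and with an assumed per-user summary as input, such a profile could be tabulated as follows:

```python
def usage_profile(per_user, buckets=10):
    """Build a usage-profile histogram in the manner of Figures 2b and 4b.

    per_user maps a user id to (pages_used_in_period, pages_allocated_at_end);
    returns the overall Q for all users and a count of users per Q interval.
    """
    histogram = [0] * buckets
    total_used = total_alloc = 0
    for used, allocated in per_user.values():
        if allocated == 0:
            continue  # users with no on-line storage do not appear in the profile
        q = used / allocated
        histogram[min(int(q * buckets), buckets - 1)] += 1
        total_used += used
        total_alloc += allocated
    q_all_users = total_used / total_alloc if total_alloc else 0.0
    return q_all_users, histogram

# e.g. usage_profile({"USER01": (1588, 1751), "USER02": (11, 18), "USER03": (0, 7)})
```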
As indicated earlier, the problem is by no means unique to our installation. Various means have been adopted to handle the problem of controlling the size of on-line data bases. One installation requires that each user validate every file he wishes to retain once in every twenty-four hour period. Unvalidated files are erased. Another approach, similar in some respects to the one which we will describe, is the "Date Deletion" scheme which has been in effect for some time on the Compatible Time Sharing System at Massachusetts Institute of Technology.1

Since we felt at that time that we did not want to place the primary burden of storage management on the user, we looked for some systematic way of restricting the amount of data stored on-line. We wanted to combine ease of operation with convenience for the users. It seemed clear that a potentially vast condensation of the on-line data base could be achieved by systematically moving unused data sets from on-line storage to demountable storage volumes. The underlying assumption would be that the overhead involved in restoring data sets that might be required by the users would be small compared to the advantages to be gained by being able to reduce the amount of on-line storage required at any one time. There were no observations of actual data set usage available to verify such an assumption, or to support any alternative, so we proceeded to implement a simple design to alleviate in part our pressing problem, and also at the same time to provide the experience necessary to evaluate the underlying assumptions. This "data migration" scheme is described in the following section.

Management of the on-line data base

Because of the limited amount of on-line storage available, it appeared necessary to us to establish a hierarchy of storage volumes, ranging from high-speed permanently mounted direct-access volumes to low-speed demountable magnetic tapes. The establishment of such a hierarchy immediately implies a mechanism for distributing data among the various classes of volumes according to some predefined or even dynamically defined criteria.

Initially, in TSS/360 three categories of storage volumes suggest themselves: first, on-line direct-access volumes; second, off-line direct-access volumes, which would require mounting to enable the retrieval of information from them; and third, tape volumes, which would require mounting and, of course, have a lower data transmission rate than the direct-access volumes. The first category comprises what are described in the TSS/360 system literature2 as public storage volumes. Categories two and three are handled by TSS as private storage volumes. In our discussions below, the term "archival" storage will be used to refer to storage volumes of categories two and three which are processed by the migration scheme. As far as the rest of TSS/360 is concerned, these volumes constitute a subset of the general class "private storage volumes".

The criteria to be used to govern the arrangement of data among the categories of volume are obviously the subject of wide differences of opinion. We have been limited in our considerations of this topic by the information that can be collected on our system about the usage of individual data sets. We have chosen to base our criterion on the information mentioned in the first section, i.e., the date on which the data set was last used. Specifically, a data set is useful or not depending only on the length of time since its last use. This is admittedly a very simple basis for judgment, but for the moment it is what is available. Alternatives will be discussed briefly in the fourth section. The scheme has been designed to enable easy inclusion of other migration criteria as they are deemed necessary and the required information becomes available.
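Stated as a rule, the initial criterion is just a comparison of the date last used against a cutoff. The sketch below illustrates both forms that appear in this paper, an idle-time threshold and the fixed cutoff date used for the first migration; the parameter values are assumptions for illustration, not figures from the installation.

```python
from datetime import date, timedelta

def should_migrate(last_used: date, today: date, keep_days: int = 60) -> bool:
    """Idle-time form of the criterion: a data set remains 'useful' only if it
    has been used within some recent interval. keep_days is an assumed value."""
    return (today - last_used) > timedelta(days=keep_days)

def should_migrate_fixed(last_used: date, cutoff: date) -> bool:
    """Fixed-cutoff form, as in the first migration, where data sets not used
    since January 1, 1969 were candidates for archival storage."""
    return last_used < cutoff
```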
The scheme has been implemented in the form of seven commands and an auxiliary data set, which records the status of the data sets moved to archival storage. The commands are RMPS, MPS, EMDS, LMDS, RMDS, SAVE, and CMS. The data set is called SYSMDS. A brief description of each of these commands and the data set follows:

RMPS-Recreate and Migrate Public Storage

This command and the one that follows, MPS, are modifications of the TSS/360 system command, RPS (Recreate Public Storage).3 The RPS command is used to copy the contents of current public storage, one volume at a time, onto a new set of public volumes, leaving behind in the process useless data sets and producing a new system with cleaner public storage. The RMPS command adds the criterion of currency to the criteria of usefulness already in the RPS command. If a data set fails this test of usefulness, instead of being copied onto the new public storage it is copied or "migrated" onto an archival volume and cataloged. In addition, relevant data regarding this "migration" are recorded in a special data set called SYSMDS. The format of this data set will be discussed later.

The fact that the data sets which have been moved to archival storage are cataloged requires some elucidation. Having these data sets cataloged is important so as to prevent the duplication of data set names on public storage and on archival storage. Only one entry for a given data set name may appear in the catalog for each user. Private volume handling, however, is a sensitive area of TSS/360. If a user requests that a private volume be mounted and his request is granted, the device on which the volume is mounted remains assigned to him until he specifically releases it or terminates his session. Thus, if the user were allowed to directly access an archival volume simply by requesting one of his migrated data sets from the catalog, this volume could well remain mounted on the device for several hours. This would render it almost indistinguishable from a public volume, and defeat the purpose of the migration scheme. We have avoided this by specifying the first three characters of the volume identification of all our archival storage volumes as 'SAV'. A minor modification to the system prohibits the user from directly accessing any volume whose identification begins with these three letters. Commands described below perform any service he may require which involves these volumes and always release the volumes, thus freeing the device, as soon as the service has been performed. This assures that devices will be in use as little as possible for purposes dealing with the handling of archival storage.
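The paper describes RMPS only at this level, so the following is a schematic sketch of the flow just outlined rather than the actual TSS/360 code; every object and method name here is an illustrative stand-in. It shows the single pass over each public volume in which current data sets go to the new public storage while out-of-date ones are copied to an archival 'SAV' volume, cataloged, and recorded in SYSMDS (described next). The is_current test is the recency predicate sketched earlier.

```python
def recreate_and_migrate(public_volumes, new_public, archival, catalog, sysmds, is_current):
    """Schematic RMPS pass: one public volume at a time, useful and current
    data sets are copied to the new public storage; useful but out-of-date
    data sets are migrated to archival storage and recorded in SYSMDS."""
    for volume in public_volumes:
        for ds in volume.data_sets():
            if not ds.is_useful():            # RPS already leaves useless data sets behind
                continue
            if is_current(ds.last_used):      # the currency test RMPS adds to RPS
                new_public.copy(ds)
            else:
                target = archival.volume_with_space(ds.pages)
                target.copy(ds)               # "migrated" onto an archival volume
                catalog.point_to(ds.name, target)    # keeps the name unique in the catalog
                sysmds.record_migration(ds, target)  # bookkeeping entry (Figure 3 format)
```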
SYSMDS-Migrated Data Set Record

It is appropriate at this point to discuss in some detail the SYSMDS data set. It is an indexed sequential data set with an entry for each migrated data set, with the data set name as key. These entries have the format illustrated in Figure 3. As well as being a record of the migration, the information stored in SYSMDS is also sufficient to recreate the catalog entries for the data sets moved to archival storage. In addition, there is an entry for each of the archival storage volumes. Included in these entries are the amount of available space on each volume, the number of pages to be erased, the number of erase-pending data sets, and the total number of data sets on the volume. These entries have as key the nine characters 'ZZZZZZZZ.' followed by the six-character volume identification. There is also a record containing the total number of archival pages erased and the total number restored to on-line storage. The information is all in EBCDIC characters so as to make it available by simply printing SYSMDS. An up-to-date copy of the SYSMDS data set is made after each modification and stored on the system residence volume (to insure continuity between successive versions of public storage).

Figure 3-Format of the data set entry in SYSMDS

  LOCATION   CONTENTS
  0-44       Data Set Name
  49-50      Data Set Organization (Sequential, Partitioned, etc.)
  53-56      Number of Pages (Size of the Data Set)
  59-64      Date Created - 'DDD/YY' (010/69 indicates the tenth day of 1969)
  67-72      Date Last Used
  75-80      Date Migrated
  83-88      Archival Volume Identification
  91-94      Archival Volume Type
  97-99      File Sequence Number (for tape volumes only)
  102        'Erase Pending' (has the value 'Y' for Yes and 'N' for No)

  NOTE: For a full discussion of TSS/360 terminology please consult Reference 2.

MPS-Migrate Public Storage

This command differs from RMPS in that when it operates on current public storage it moves only those data sets which fail the test of currency. They are moved to the appropriate archival storage volume and the copies in public storage are erased. Appropriate entries are made in the SYSMDS data set. This command can also be applied to archival direct-access storage volumes, producing an additional level of storage on tape, creating the three-level structure described earlier. Again, the entries in SYSMDS are amended to reflect the changes brought about by the execution of the command.

The two commands RMPS and MPS are the primary means by which out-of-date files are moved from on-line to archival storage. The next group of commands is concerned with enabling the user to examine the contents of archival storage and modify the number and status of his files which are stored there.

LMDS-List Migrated Data Sets

This command enables the user to determine which of his data sets have been moved to archival storage. In addition to the name of each data set, information such as its organization, size, date last used, date migrated, etc., is provided.

EMDS-Erase Migrated Data Set

This command enables the user to specify that a data set of his which is on archival storage is to be erased. This command simply marks the appropriate entry in the SYSMDS data set as "erase pending" for subsequent processing by the CMS command (q.v.). The data set name is not removed from the catalog until the actual erasure on the volume has been carried out by the CMS command. The user may specify that either a specific data set or all his data sets on archival storage are to be erased.

RMDS-Restore Migrated Data Set

This command enables the user to bring about the return of a data set from archival storage to on-line public storage. The process occurs while the user waits. The data set is copied from the appropriate archival storage volume onto on-line public storage and the copy on archival storage is erased. Appropriate entries in the SYSMDS data set are amended to reflect the results of this operation. The archival storage volume is then released, making the device again available for allocation.
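The restore path implied by the RMDS description can be pictured as the short sequence below: look up the SYSMDS entry, mount the archival volume briefly, copy the data set back to public storage, erase the archival copy, amend SYSMDS, and release the device. This is only a sketch; the objects and methods are assumed stand-ins rather than TSS/360 interfaces.

```python
def restore_migrated_data_set(name, sysmds, archival, public):
    """Schematic RMDS: return one data set from archival to on-line storage
    while the user waits, then free the device that held the archival volume."""
    entry = sysmds.lookup(name)                      # entries are keyed by data set name
    volume = archival.mount(entry.archival_volume)   # device is assigned only briefly
    try:
        public.copy(volume.read(name))     # back onto on-line public storage
        volume.erase(name)                 # the copy on archival storage is erased
        sysmds.amend_after_restore(entry)  # entries amended to reflect the restore
    finally:
        archival.release(volume)           # device again available for allocation
```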
SAVE-Put A Copy Onto Secondary Storage

This command enables the user to specify a data set as one to be migrated at the next execution of the RMPS or MPS command.

The maintenance of archival storage is carried out by the use of two commands. The first, MPS, discussed above, can be applied to the demountable direct-access 'SAV' volumes to produce a second level of archival storage consisting of data sets whose last use is more remote in time than those on the first, direct-access level. These would generally be stored on tape. The second maintenance command is the CMS command which will now be described.

CMS-Clean Migrated Storage

This command examines data set entries in the SYSMDS data set for the "erase pending" flag set by EMDS to indicate that the corresponding data set is to be erased. The data sets are erased if they are on direct-access volumes. In any case, the entries for the data sets in SYSMDS are deleted and the appropriate volume entries are amended to reflect the results of these transactions. If the number of valid data sets on a tape volume becomes zero, the tape is released or made available for further use for migration.

Results observed after migration

The first migration was carried out on March 10, 1969 in the process of converting our system from Version 2.0 to Version 4.0 of TSS. The criterion used was that a data set should have been used since January 1, 1969 to remain in on-line public storage. Operating problems prevented the processing of two of our six public volumes at that time. In the ensuing month an additional 3,000 pages were moved to archival storage. It should be pointed out that if the amount of data which was moved to archival storage had been returned completely to the current on-line public storage, we would not have had enough devices available to contain it. Thus the project did not simply justify itself; it proved essential to the continued life of the system. Since that time the process has been carried out at approximately one-month intervals.

The status of on-line storage as of July 1, 1969 is reflected in Figure 4, which is presented for comparison with Figure 2. It can be seen from Figure 4 that the overall characteristics of the on-line data base have not changed a great deal in the intervening three months. There are about twenty more users owning data sets on-line than there were in April, but the ownership profile remains almost exactly the same. Figure 4b reveals a noticeable increase in the degree of utilization of on-line storage. This is indicated on the whole by the increase in the value of Q, the utilization quotient, calculated for all users, and in detail by the shift toward higher values of Q, especially visible between Q=0.7 and Q=1.0.

Figure 5 contains similar information for the total storage on the system, i.e., on-line storage plus archival storage. This total storage is what would have to be stored on-line in the absence of migration, assuming there were enough devices to do so. The total storage, thus defined, is about 51,000 pages, of which some 32,000 are on-line and about 19,000 are archival. One can observe that the shape of the total storage ownership profile (Figure 5a) is very similar to that of the on-line storage profile. Figure 6 gives an idea of how the amount of storage occupied is divided between archival and on-line. Looking at this figure, one should be aware that there are thirty-six users who have no on-line storage, and thus cannot be classified as active.
They are taking advantage, consciously or unconsciously, of the archival property of the migration volumes and leaving all their data stored in this fashion. It should be pointed out that we have made no effort to encourage our users to police themselves in their use of on-line storage. Thus these figures must not be considered as reflecting what storage space the users need, but rather what they will occupy and use if they find it available. An accounting procedure is being instituted which may result in reductions by the users of the amount of on-line storage they occupy. This approach has been used with success in other applications, e.g., at Stanford University.4

Figure 4a-On-line storage ownership profile on July 1, 1969 (number of users plotted against N, the number of pages of on-line storage owned by the user)

Figure 4b-On-line storage usage profile on July 1, 1969 (Q for all users = .595; Q = amount of on-line storage used by user between 6/4 and 7/1/69 divided by amount of on-line storage allocated to user on 7/1/69; the number at the top of each column is the fraction of on-line storage owned by users in that category)

Figure 5a-Total storage ownership profile on July 1, 1969

Figure 5b-Total storage usage profile on July 1, 1969 (S for all users = .372; S = amount of storage used by user between 6/4 and 7/1/69 divided by amount of total storage, on-line plus archival, allocated to user on 7/1/69; the number at the top of each column is the fraction of total storage owned by users in that category)

Figure 6-Archival storage/total storage ratio distribution

Figure 7-Status of storage as of July 1, 1969 (pages on-line: 32,000; pages migrated: approximately 19,000; pages restored: 900; pages erased: 1,100)

The status of migrated storage as of July 1, 1969 is presented in Figure 7. The small fraction of the migrated storage that has been restored to on-line storage is a favorable sign for the continued success of the approach.

Not at all surprisingly, in our experience with the operation of the migration scheme, several drawbacks have become apparent. For instance, one of the more valuable features of TSS/360 is that it allows users to share files with one another. This is made possible by links established in the system catalog between the directories of the individual users sharing the files.
Under the present migration scheme, it is not possible for these links to survive the migration or restoration process. Thus after a migrated data set has been restored to on-line storage, the users sharing it have to re-establish the linkages which make the sharing possible. Another shortcoming from the user's point of view is that he is made aware of the existence of migration whenever he attempts to re-activate a file that has not been used recently. A separate action is required to make his file available to him once more. Also, the criterion for migration is too simple to satisfy either the system manager or the user. For the manager, it is too easily circumvented, while for the user, it does not sufficiently distinguish between the user who occupies a large amount of on-line storage and the user who has a much smaller amount allotted to him. When migration takes place either one may find that his data sets have been migrated, and in fact the smaller user may find that more of his data have been moved to archival storage than the large user's.

Amplifications and extensions-the evolution of migration

There are several areas in which improvements are projected. These might be stated as goals in the implementation of a good migration scheme.

a. Migration should be transparent to the user except for the wait involved while a data set is restored to on-line storage. No action of the user other than his wish to use his data set should be required to activate the restoration process.

b. There should be reasonable criteria for migration, and the information necessary to evaluate them should be available.

c. There should be a migration 'monitor' to determine the extent to which migration is necessary based on the condition of public storage, the amount of storage available, etc. In addition, based on system load, the monitor would schedule the migration process so as to have a minimum impact on system performance.

We are attempting to address a. and c. in a unified way. The first step is, of course, to allow the migration routines to be invoked by other programs as well as by commands from the terminal. Then the transparency problem can be handled by having the routines which supervise the user's access to his on-line data sets recognize that a data set has been migrated and initiate the process of restoration of the data set to on-line storage. The next step will be having the migration to archival storage activated by a routine which from time to time monitors the state of on-line storage and determines when more on-line space is required. Thus the necessity of programmer or operator intervention to initiate the migration process will be eliminated.

In a parallel effort, additional information on the usage of data sets and on-line storage will be accumulated. As a simple example, we intend to add, to the 'date last used' which we now record on the data set, information about the frequency of use of the data set. We hope then to be able to form reasonable judgments about which data sets to select for migration on the basis of this additional information. We also expect to take advantage of accounting routines to acquire information about the users and their use of the system. By accumulating as much information as possible we will be able to formulate more and more reliable criteria for the usefulness and currency of on-line data sets.
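The paper does not say how frequency-of-use information would enter the selection rule; the fragment below is merely one plausible illustration of a criterion that weighs both recency and frequency of use, with assumed field names and thresholds.

```python
from datetime import date

def migration_candidate(last_used: date, uses_last_month: int, today: date,
                        max_idle_days: int = 30, min_monthly_uses: int = 2) -> bool:
    """Hypothetical extended criterion: select a data set for migration only if
    it is both out of date and infrequently used. The thresholds are illustrative."""
    idle = (today - last_used).days
    return idle > max_idle_days and uses_last_month < min_monthly_uses
```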
SUMMARY

In summary, we have seen that the size of the TSS/360 on-line data base increases rapidly with use of the system. Since a limited amount of on-line storage is available, it is necessary to control this growth. Observing that at any time much on-line information is not being used, we have formulated a systematic method of allocating data sets to on-line or archival storage based on some criteria of usefulness.

The elementary scheme put into operation at our installation has proven of great value in containing the on-line data base while giving the users an environment in which to expand their applications and use of the system. We have come to several conclusions regarding the maintenance of our on-line data base which we restate here.

1. Some means of controlling the size of the on-line data base is absolutely essential for the continued operation of the system in our environment with our limited amount of on-line storage.

2. On the basis of our experience thus far, it is sufficient to examine the usage of data sets on a weekly basis or even less frequently to keep our on-line data base of manageable size. We do our cleaning-up operations at approximately two-week intervals, with migration being carried out when necessary to reduce the size of the on-line data base to the desired value.

3. It appears that the amount of space gained by moving less used data sets to archival storage more than repays the effort involved. Most of the data moved to archival storage have stayed there. This is in part an indication that the criterion we have used for migration is a reasonable one, at least for our installation.

We intend to expand this scheme to make it as unobtrusive as possible while still continuing its work of limiting the size of our on-line data base. In addition we will continue accumulating information on the characteristics of our users and their interactions with the system so as to formulate the most significant criteria possible for migration.

ACKNOWLEDGMENTS

The authors would like to gratefully acknowledge stimulating discussions with Ronald M. Rutledge and Albin L. Vareha of Carnegie Mellon University and Lee Varian of Princeton University. We are also indebted to Barry Hazlett of IBM Pittsburgh for the implementation of the CMS command.

REFERENCES

1  MIT Computation Center  The compatible time-sharing system: A programmer's guide  MIT Press 1963 26-29 Cambridge Mass
2  W T COMFORT  A computing system design for user service  Proc FJCC 1965 Spartan Books Wash D C
   C T GIBSON  Time-sharing in the IBM System/360 Model 67  Proc SJCC 1966 Spartan Books Wash D C
   A S LETT  W L KONIGSFORD  TSS/360: A time-shared operating system  Proc FJCC 1968 Thompson Book Co Wash D C
   TSS/360 concepts and facilities  IBM Document C28-2003 IBM Corp 1968
3  TSS/360 system programmer's guide  IBM Document C28-2008 IBM Corp 1968
4  N NIELSEN  Flexible pricing: An approach to the allocation of computer resources  Proc FJCC 1968 Thompson Book Co Wash D C
5  R C DALEY  P G NEUMANN  A general purpose file system for secondary storage  Proc FJCC 1965 Spartan Books Wash D C