CPU utilization and secondary-storage performance - The demand for a new secondary-storage technology

by PETER SCHNEIDER
Siemens AG
Munich, Germany

ABSTRACT

Previous studies investigated the usability of charge-coupled devices (CCDs) in the context of economies to be achieved in main memory capacity. In systems with virtual memories, such economies result from the use of fast paging devices. In spite of the concomitant savings in main memory costs, which will probably be eaten up by the costs of the new-technology paging device, the price/performance ratio must be expected to be less favorable than that of conventional systems, since the share of the operating system software required for page fault handling will increase. More recent research has shown, however, that not only the degree of multiprogramming, i.e., the number of processes required to cover an I/O time interval, is a factor of crucial importance for optimum CPU utilization, but also the number of disk devices available in the secondary-storage system: unless the number of storage devices is large enough to handle, within an I/O time interval, at least as many parallel I/O operations as are needed to ensure that a sufficient number of processes are again ready to busy the CPU, the aim of full CPU utilization cannot be achieved, not even through a higher degree of multiprogramming. With disk devices of ever higher recording densities but otherwise nearly constant performance data becoming available, fewer devices than today will be required in the future to store the on-line data file volume. Since fast, favorably priced central processing units are likewise becoming available, it must be expected that future systems, unlike the systems of today, will for the first time be beset with the problem of input/output bottlenecks arising from an insufficient number of storage devices. Rather than attempting to achieve the required I/O data rate through a sufficient number of devices operating in parallel, use should therefore be made of devices such as CCD storages in secondary-storage hierarchies, which offer themselves as the less costly solution to the problem.

INTRODUCTION

At the present state of the technology, memory system costs remain the dominant factor in the overall costs of a computer system, with the costs of the secondary storage, in addition to those of the main memory, representing an appreciable share of the total storage system costs. Thus, if a new technology, such as the charge-coupled device or the magnetic bubble, is to be implemented in the storage system, it must ensure a more favorable price/performance ratio of the overall system, i.e., either reduce the costs for an unchanged system performance level or substantially enhance the system performance level for a constant or rising cost burden. In the future the performance aspect will become ever more dominant, which may eventually give rise to a situation where it is not only desirable but indispensable to adopt new technologies in the secondary-storage environment.

APPLICATION OF NEW MEMORY TECHNOLOGIES

Some of the existing computer systems built on the virtual-memory concept have their secondary storages split up into two functionally separated sections (Figure 1): one section, the paging device, is used for storing the programs started by the users connected at a given time. At run time, these programs are loaded from the moving-head disks of the other secondary-storage section, the file memory, into the paging device. The tertiary storage, which is also shown, serves as a long-term archival storage medium and will not be considered further in this paper.

Figure 1 - Memory hierarchy of a data processing machine: periphery; primary memory (main memory); secondary memory (paging device: fixed-head disks, drums; file memory: movable-head disks); archival or tertiary memory (tapes)

The main memory contains the current pages of the currently active processes. The set of pages belonging to a process is called its "working set of pages."1 The main memory can thus be said to function more or less as a buffer for the paging device: the individual programs will run without interruption only for as long as their active environment does not change. If a page is missing in main memory (page fault), it has to be fetched from the paging device. As this transfer takes several milliseconds, the requesting process is put in suspense and the CPU turns to another program that can keep it busy.

As prior analyses of buffer systems2,3 have already shown, the page fault rate diminishes with increasing buffer (in this case main memory) capacity, the optimum being reached when the main memory capacity is equal to the paging device capacity: all programs will then be main-memory-resident, so that the paging rate after the initial phase becomes zero. In such systems* there will occur, in addition to the paging I/Os just mentioned, file I/Os whenever a running program requests file references. In these cases too, the requesting program is put in suspense and another one that is ready to busy the CPU is processed instead. Moreover, the main memory is not used as a buffer for file I/Os to the same degree as for paging I/Os, since the question of a file locality corresponding to the working set of pages of programs has not yet been studied thoroughly enough.

* There are also virtual systems in which all I/O is handled by paging.4

Many of the earlier studies5,6 on the use of CCD and bubble devices were focused on the possibilities of increasing the speed of page transfer in paging, it being assumed that there would be no bottleneck for file I/Os or that all I/O activities could be handled by paging I/Os. Taking this as a starting point, it was demonstrated that, granting an equally high utilization rate of the CPU, the use of a fast paging device, e.g., one in CCD technology, would clearly stand comparison with a paging device in conventional technology (magnetic drum or fixed-head disk). A shorter access time (the sum of latency time and actual transfer time) would allow the hit rate required in the main memory for achieving the same CPU load to be smaller, which means that the main memory capacity could be kept smaller than for systems using conventional paging devices. Thus, the costs of a paging device implemented in the new technology could be allowed to exceed those of a conventional paging device by exactly the amount that corresponds to the costs of the main memory portion saved.

A weakness of this line of reasoning is that it neglects two aspects of existing systems:

1. Such a deliberate increase in the page fault rate will be accompanied by increased operating system overhead for storage management.
This implies that, with the CPU loaded to capacity (and this is indeed ensured by the fast page transfer), there will be less time available for processing user programs. From the user's point of view the price/performance ratio might therefore appear degraded in comparison with that offered by the known and tried technologies.

2. The limited channel data rates will not accommodate page transfer times as short as those realized by, say, a CCD paging device.

Bearing these two limitations in mind, and considering further that existing systems exhibit a more or less well-balanced performance level of all system constituents, which makes for a good CPU utilization (over 90 percent), it is unlikely that new-technology memories will be integrated in existing systems until the cost picture becomes more favorable. The compelling conclusion is therefore that an improved price/performance ratio for existing systems can only be attained if the costs of the new paging-device technology are absolutely lower than those of the conventional technology.

The aim of this paper is to demonstrate that development trends already recognizable in the CPU and moving-head disk fields indicate that in the future the secondary storage, now for reasons of performance, will present a sure area of application for the CCD memory concept. For this reason, the first question discussed is that of the variables exerting an influence on CPU utilization.

CPU UTILIZATION AND INFLUENCING VARIABLES

This investigation was performed with the aid of a simulation model (cf. Figure 2) of the system whose schematic is shown in Figure 1. Waiting in front of the CPU is a queue of processes which are successively served by the CPU. The maximum length of this queue is a function of the degree of multiprogramming realized. In the model, too, the secondary storage is split up into a paging device and a file memory. When a paging I/O or file I/O request initiates a process change at the CPU, the relevant request is queued at the device addressed. Upon completion of the I/O operation, the associated process is again lined up in the CPU queue. The number of processes in the system is constant; it corresponds to the degree of multiprogramming realized.

Figure 2 - Simulation model used for the performance evaluation of data processing systems

The model is subject to the following constraints, which represent better conditions than are encountered in actual systems:

(a) Each of the connected devices has its own independent interface to the main memory. Collisions will therefore occur only where two or more requests compete for the same device.

(b) File I/O requests and paging I/O requests are evenly distributed among all file memory devices and paging devices, respectively. In reality it may happen that, depending on the file allocation, certain devices are accessed much more frequently than others.

(c) File I/O and paging I/O requests have an equal share of 50 percent in all causes of process change. Other causes, such as time-slice runout or terminal I/O, are neglected. Real systems are more likely to exhibit a predominance of file I/O operations.
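To make the structure of this closed queueing model concrete, the following is a minimal sketch of such a simulation in Python, not the SIAS model itself. It assumes exponentially distributed run times with a 3000-instruction mean (the actual study used the measured distribution of Figure 3), collapses paging and file I/O into a single class of 34 ms disk operations (matching the configurations without an explicit paging device discussed below), and all names (simulate, mpd, dev_free, and so on) are illustrative.

```python
import heapq
import random

def simulate(mpd, devices, mips, d_io=34.0, mean_instr=3000, steps=200_000, seed=1):
    """Closed queueing model: `mpd` processes alternate between one CPU and
    `devices` identical secondary-storage devices.  Every process change is
    caused by an I/O request (constraint (c)); each device has its own path
    to memory (a); requests are spread evenly at random over the devices (b)."""
    rng = random.Random(seed)
    mean_burst = mean_instr / (mips * 1000.0)      # mean CPU burst in ms
    now, busy = 0.0, 0.0
    ready = list(range(mpd))                       # processes ready for the CPU
    dev_free = [0.0] * devices                     # next idle time of each device
    pending = []                                   # heap of (I/O completion time, pid)
    for _ in range(steps):
        if not ready:                              # CPU idles until an I/O finishes
            t, pid = heapq.heappop(pending)
            now = max(now, t)
            ready.append(pid)
        pid = ready.pop(0)
        burst = rng.expovariate(1.0 / mean_burst)  # assumed exponential run time
        busy += burst
        now += burst
        d = rng.randrange(devices)                 # even random device choice
        done = max(now, dev_free[d]) + d_io        # queue at the chosen device
        dev_free[d] = done
        heapq.heappush(pending, (done, pid))
        while pending and pending[0][0] <= now:    # I/Os finished during the burst
            _, q = heapq.heappop(pending)
            ready.append(q)
    return busy / now                              # CPU utilization

if __name__ == "__main__":
    for g in (4, 10, 1000):
        print(f"2.4 MIPS, {g:4d} devices: U_CPU = {simulate(30, g, 2.4):.2f}")
```

Each of the mpd processes is either ready for the CPU or waiting for exactly one outstanding I/O, which is the closed-system property the model relies on.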
In order to establish realistic paging I/O and file I/O rates within the simulation system, the distribution of process run times between process changes was determined at a time-sharing service computer center during open sessions on different days and at different times. The measurements revealed a mean program run time of about 3000 instructions between I/O operations (Figure 3). This distribution was adopted for the simulation model. The model itself was built with the aid of the SIAS (SIEMENS ABLAUFVERFOLGER) simulation language, which is equivalent to IBM's GPSS language.

Figure 3 - Distribution function of program processing time between I/O operations (paging I/O and file I/O); abscissa: program run time in units of 100 instructions

The investigation was carried out on systems with differing numbers of disks, with and without an explicit paging device, and with differing CPU performance levels. The following sections summarize some results obtained for systems with an unlimited number of devices, with 4 and 10 devices, and with varied degrees of multiprogramming. Except for the results shown in Figure 7, where the duration of I/O operations D_I/O (made up of the latency time and the transfer time) was varied, all results given here are based on an assumed I/O duration of 34 ms.

SYSTEMS WITHOUT EXPLICIT PAGING DEVICE

For the sake of a clear and straightforward discussion, we shall first deal with the results of simulating models without an explicit paging device.

Systems with an unlimited number of devices

If the number of devices is not limited, an input/output bottleneck cannot arise in a computer system. The utilization level of the CPU is, in this case, determined solely by the degree of multiprogramming (Figure 4). The necessary degree of multiprogramming M should be equal to the quotient

    α = (D_I/O + L) / L,

with D_I/O representing the duration of an I/O operation and L the mean run time of processes between I/O operations. At the degree of multiprogramming denoted by this quotient (M = α), the CPU reaches a utilization level of 100 percent; any further increase of M is harmful and will lead to the well-known "thrashing" effect. Looking at the CPU utilization level from this angle, a secondary-storage configuration can be regarded as sufficiently large the instant the number of devices becomes greater than the theoretical degree of multiprogramming; this is so because at this instant there exists a balanced ratio between I/O requests and I/O terminations, which prevents the CPU from idling.

Figure 4 - CPU utilization as a function of the degree of multiprogramming without limitation of the number of devices (D_I/O = 34 ms)

It can further be seen from Figure 4 that a constant distribution of process run times but differing CPU performance levels will produce pronounced differences in the necessary degree of multiprogramming, a fact that is immediately evident from the above analysis.
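As a quick check of this relation, the quotient can be evaluated for the two CPU performance levels used in the paper; the 34 ms I/O duration and the 3000-instruction mean run length are the paper's assumptions, and the short script itself is merely illustrative.

```python
D_IO = 34.0                              # ms, assumed duration of one I/O operation

for mips in (1.0, 2.4):
    L = 3000 / (mips * 1000.0)           # mean run time between I/O operations in ms
    alpha = (D_IO + L) / L               # necessary degree of multiprogramming
    print(f"{mips} MIPS: L = {L:.2f} ms, alpha = {alpha:.1f}")

# 1.0 MIPS: L = 3.00 ms, alpha = 12.3
# 2.4 MIPS: L = 1.25 ms, alpha = 28.2
```

A 2.4 MIPS processor thus already calls for a degree of multiprogramming of nearly 30, which foreshadows the doubts expressed later about the attainability of such values.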
The study of this simulation model also revealed that systems with a high CPU performance level and a CPU-bound load are equivalent to systems with a correspondingly lower CPU performance level but an I/O-bound load. This, too, is immediately evident: as far as the CPU utilization level is concerned, it does not make any difference whether the program run time between I/O operations is taken up by a great number of fast-executing instructions (fast CPU, CPU-bound programs) or by only a few slow-executing instructions (slow CPU, I/O-bound programs). This holds true for all further cases considered as well.

Systems with a limited number of devices

To show the strong influence exerted by the number of devices on the CPU utilization level, Figure 5 plots secondary-storage configurations with 4 and 10 disks and a mean block transfer time of 34 ms. Again, a distinguishing criterion is the different CPU performance level, which, as mentioned above, can also be interpreted as a more or less pronouncedly CPU-bound load.

Figure 5 - CPU utilization as a function of the degree of multiprogramming with a limited number of devices (D_I/O = 34 ms; e.g., 2.4 MIPS with 4 devices)

What becomes apparent here is an effect similar to that observed when the degree of multiprogramming is too low: if the number of devices is too small, a full-capacity utilization of the CPU is impossible. Even very high degrees of multiprogramming are of no avail then, since, after the initial phase, I/O requests from all processes will be competitively bidding for access to the secondary storage and queuing up to be served. During an I/O time interval it is possible to handle, at best, as many I/O requests as there are devices. To ensure a high CPU load, the minimum number of devices to be provided** must be equal to

    G = D_I/O / L = α − 1.

** If use is made of an additional paging device, the paging I/O time D_I/O(P) and the file I/O time D_I/O(F) will differ. The number of devices is then determined analogously, with h_P and h_F, the paging and the file I/O rates respectively, entering the expression.

It can be stated as a general rule that the crucial determinant of the CPU utilization level is the minimum of the degree of multiprogramming and the number of devices. The following greatly simplified model was established:

• The process run time L is considered constant.
• Every I/O request bids for access to the next device capable of servicing the request.

These prerequisites granted, the idle time of the processor is given by

    T_IDL = D_I/O − min(M − 1, G) · L    for M ≤ α and G ≤ α − 1,

and the CPU utilization deduced therefrom by

    U_CPU = min(1, M/α, G/(α − 1)).

A utilization curve calculated from these formulas is plotted in Figure 5 for the case of 10 devices connected to a 1 MIPS processor. The curve obtained through simulation approaches this theoretical curve, reaching the theoretical limit value only at a degree of multiprogramming of 60. This discrepancy can be accounted for as follows:

(a) The assumption that an I/O request will invariably bid for access to the device next capable of servicing it is a far cry from reality. On the contrary: even when assuming evenly distributed bids for access to all secondary-storage devices, there will be short-term rushes for individual devices. This is exactly why it takes a higher degree of multiprogramming than the theoretically calculated one to achieve full CPU utilization: a high degree of multiprogramming makes it easier to keep all existing devices busy.

(b) The distribution of run times may lead to a premature depletion of the processor queue, namely before a CPU-busying process is made available again by the termination of an I/O operation.
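The simplified bound above is straightforward to evaluate. The sketch below (illustrative code, again using the paper's assumed 34 ms I/O duration and 3000-instruction run length) reproduces the theoretical ceilings for the configurations discussed here:

```python
def u_cpu_bound(m, g, mips, d_io=34.0, instr=3000):
    """Upper bound on CPU utilization in the simplified model:
    U_CPU = min(1, M/alpha, G/(alpha - 1))."""
    L = instr / (mips * 1000.0)          # mean run time between I/Os in ms
    alpha = (d_io + L) / L
    return min(1.0, m / alpha, g / (alpha - 1.0))

print(f"{u_cpu_bound(60, 10, 1.0):.2f}")   # ~0.88: 10 devices cap a 1 MIPS CPU
print(f"{u_cpu_bound(30,  4, 2.4):.2f}")   # ~0.15: 4 devices starve a 2.4 MIPS CPU
print(f"{u_cpu_bound(30, 10, 2.4):.2f}")   # ~0.37: still clearly device-limited
```

The bounds of roughly 15 percent and 37 percent for 4 and 10 disks at 2.4 MIPS are close to the simulated values reported in Table I below.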
Up to this point, the discussion has been restricted to systems in which no separate paging device whatsoever was used. All I/O requests, both paging and file, were handled by the same type of device (e.g., disks). In the following, the influence of a separate paging device on system performance will therefore be briefly analyzed.

SYSTEMS WITH SEPARATE PAGING DEVICES

Table I indicates the improvement in CPU utilization which, for a given limitation of the number of devices, is attainable if a separate paging device is used. It has been assumed that, with an explicit paging device employed, the transfer of one page takes 1 ms, including the latency time. Assuming a mean run length of 3000 instructions and 2.4 MIPS, this is approximately equal to one processing interval of a process between two I/O operations. Viewed from the angle of the processor, page faults are then not time-critical, since the mean program run time and the duration of the page fault are of the same order of magnitude. As the simulation model assumes that 50 percent of all I/O requests are paging I/O requests, which for the processor are no longer time-critical, the use of an explicit paging device can be expected to improve the CPU utilization level by a factor of 2. This is borne out by the results.

TABLE I - CPU utilization for different numbers of disk devices with and without explicit paging device

    CPU performance   Degree of          Disks   thereof used   Explicit        CPU utilization
    (MIPS)            multiprogramming           for paging     paging device   in %
    2.4               30                   4          4             no               15.1
    2.4               30                   4          -             yes              30.6
    2.4               30                  10          4             no               22.8
    2.4               30                  10         10             no               34.2
    2.4               30                  10          -             yes              69.6

An interesting result is that, as long as there is no explicit paging device available, both paging and file I/O requests should be honored by all disks. Reserving disk areas for paging on only a few devices will lead to an overload on these and an underload on all others, with immediate consequences for the CPU utilization level. With 10 disk devices connected and all of them used for paging and file I/O, a 2.4 MIPS processor will be utilized to a level of 34.2 percent. If, in contrast to this, disk areas for paging are reserved on only 4 disks, the processor utilization level will be only 22.8 percent.

SUMMARY OF SIMULATION RESULTS

The investigations have revealed that the degree of multiprogramming and the number of devices are factors of similar weight as regards the CPU utilization level, but that the number of devices represents the more crucial bottleneck. In other words, if the secondary-storage system is underrated in terms of the number of devices employed rather than in terms of storage capacity, not even a high degree of multiprogramming will be able to raise the CPU utilization to a satisfactory level. It has further been demonstrated that, on the basis of the measured program run-time distribution, powerful systems of, say, 2.4 MIPS without any device bottlenecks would require a degree of multiprogramming whose attainability appears doubtful.
This brings into view the first of the causes which will in the future make it necessary to use faster secondary-storage technologies than today's: assuming the run-time distribution of Figure 3 throughout, the ratio between mean program run time and I/O processing time will continuously deteriorate, because ever faster processors are becoming available at ever more attractive prices, while the performance data of secondary-storage devices remain comparatively constant.

The only factor in the domain of secondary storage that undergoes pronounced change is the storage density of moving-head disk storages. The ever-increasing storage density of the moving-head disks allows the data files of computer centers with fast systems to be accommodated on ever fewer disk drives. Thus, with the costs of the mechanical section of the disk storage remaining virtually constant, the increased recording density will bring substantial economies in secondary-storage costs. Let us assume that a given set of data files comprises a data volume of 10,000 MB: if use is made of 100-MB disk devices, this data volume requires 100 devices for storage. Using 500-MB disk storages, however, a mere 20 devices will be needed. It goes without saying that, from the point of view of the computer center, the latter configuration is the more cost-effective one.

This gives rise to a situation where even the use of a paging device with short page transfer times may no longer be sufficient to ensure full CPU utilization. Specifically, this applies to cases where the number of secondary-storage devices is no longer sufficient to accommodate all file I/Os. Full utilization of the CPU can then only be achieved through an apparent increase in the speed of the secondary storage. This means, however, that provision has to be made for a separate buffer (in CCD technology, for instance) in front of the secondary-storage devices.

SECONDARY-STORAGE HIERARCHY

Figure 6 shows the schematic diagram of a data processing system with a secondary-storage hierarchy. In contrast to many existing systems, the secondary storage should be connected to the main memory through a special storage processor (SCU) rather than through an I/O processor and the central processor (cf. Figure 1).7 In the case of high-performance processors, this will clearly relieve the load on the CPU/main memory interface. As a consequence, CPU references to the cache will no longer be obstructed by I/O activities.

Figure 6 - Schematic of a data processing system using a secondary-storage hierarchy: periphery, primary storage, storage processor (SCU), CCD paging buffer, disks

The function of the CCD storage within the secondary-storage hierarchy is analogous to that of the cache within the main memory hierarchy. Granting sufficient hit rates, the mean duration of I/O operations will be reduced to nearly that of page transfers. The task falling to the secondary-storage processor is to manage the secondary-storage hierarchy. When the storage processor receives an I/O request directed to it, it ascertains whether the requested page is located in the CCD page buffer; if so, it initiates the page transfer between the paging buffer and the main memory. In the case of a miss, the storage processor assumes responsibility for transferring the requested page not only from the disk devices into the CCD buffer but also on to the main memory. If the organization of the paging buffer permits, the data units (blocks) transferred between disk and CCD buffer may be larger than a page. Owing to the fact that the CCD buffer capacity is larger than that of the main memory, there is, in the case of an I/O request, some probability that pages already used previously will be found again in the CCD buffer (backward hit). Depending on the block size within the CCD buffer, it is also possible for hits in the forward environment of a page request (forward hits) to occur.
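The storage processor's handling of an I/O request thus amounts to a buffer lookup whose miss path fills both the CCD buffer and the main memory. The toy model below is only an illustration of this behavior, not the SCU's actual algorithm: the LRU replacement policy, the block size of four pages, and all names (CCDBuffer, handle_request) are assumptions introduced here.

```python
from collections import OrderedDict

class CCDBuffer:
    """Toy model of the CCD paging buffer managed by the storage processor (SCU).
    Blocks of `block_pages` consecutive pages are staged from disk, so neighbouring
    pages of a request may already be resident ("forward hits")."""

    def __init__(self, capacity_blocks, block_pages=4):
        self.capacity = capacity_blocks
        self.block_pages = block_pages
        self.blocks = OrderedDict()            # block number -> None, in LRU order

    def handle_request(self, page, t_ccd=1.0, d_io=34.0):
        """Return the time (ms) needed to deliver `page` to main memory."""
        block = page // self.block_pages
        if block in self.blocks:               # hit: backward or forward
            self.blocks.move_to_end(block)     # refresh LRU position
            return t_ccd                       # page moves CCD buffer -> main memory
        # miss: stage the whole block from disk into the CCD buffer
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)    # evict the least recently used block
        self.blocks[block] = None
        return d_io                            # disk -> CCD buffer and main memory

# A strictly sequential reference string shows the forward-hit effect:
buf = CCDBuffer(capacity_blocks=256)
times = [buf.handle_request(p) for p in range(1000)]
print(f"hit rate = {times.count(1.0) / len(times):.2f}")   # 0.75 with 4 pages per block
```

Because whole blocks are staged, a strictly sequential reference string already yields a hit rate of 0.75 in this toy setting, which illustrates the forward-hit effect mentioned above.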
Figure 7a plots the utilization of a 2.4 MIPS and a 1 MIPS processor versus the mean page transfer time. It should be mentioned here that, provided a sufficiently high degree of multiprogramming, the transfer times plotted can on the average be attained exactly, either by the parallel operation of slow devices or by the use of a fast device of adequate capacity.

Figure 7 - (a) CPU utilization as a function of the mean page transfer time and (b) corresponding hit ratio necessary for full CPU utilization (abscissa: T_I/O, page transfer time in ms; degree of multiprogramming 30)

We shall now deduce an algorithm for calculating the hit rate in the CCD paging buffer required for full utilization of the central processor. The effective access time of a storage hierarchy is given by the well-known formula7

    T_eff = h1 · t1 + (1 − h1) · t2,

where h1 represents the hit rate in the buffer stage, t1 the access time of the buffer stage, and t2 the access time of the second stage. For the purposes of this analysis, these quantities are interpreted as follows:†

    T_eff = t_I/O corresponds to the mean period of time required for handling an I/O request;
    t1 = t_CCD is the access time of the buffer stage;
    t2 = T_I/O = D_I/O / G corresponds to the mean serving time attained through the parallel utilization of G devices with access time D_I/O;
    h1 = h_CCD represents the buffer-stage hit rate to be found.

† Two means are calculated for this purpose: the first one over the number of devices, the second one over the hierarchical stages of the memory system.

Full utilization of the CPU is ensured if one I/O operation is served per run time L. It follows that t_I/O, i.e., the average time required for handling an I/O request, must be equal to the mean program run time between I/O interrupts. Thus, the hit rate required for full utilization is given by

    h_CCD = (G · L − D_I/O) / (G · t_CCD − D_I/O)    (1)

or, after transformation, by

    h_CCD = (L − T_I/O) / (t_CCD − T_I/O).    (2)

It can be recognized from this formula that, if the number of devices G is sufficiently large, the required hit rate becomes zero and a buffer storage is not necessary. Plotted on the abscissa in Figure 7 are the T_I/O values. Assuming, for example, that t_CCD = 1 ms, the various T_I/O values are associated with corresponding hit rates required for full CPU utilization. These are plotted in Figure 7b for a 1 MIPS and a 2.4 MIPS system. As can be seen quite clearly, relatively high hit rates in the buffer are necessary, depending on the time T_I/O and, hence, on the number of devices G. If use were made of paging devices with transfer times shorter than those assumed here, there would be no need to make such high demands on the hit rate in the buffer.
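Formula (1) can be evaluated directly for the figures used throughout the paper (1 ms CCD access, 34 ms disk I/O, 3000-instruction run length); the function name and the clamping to the range 0..1 are illustrative choices.

```python
def required_hit_rate(g, mips, t_ccd=1.0, d_io=34.0, instr=3000):
    """Hit rate in the CCD paging buffer needed for full CPU utilization,
    formula (1): h_CCD = (G*L - D_I/O) / (G*t_CCD - D_I/O)."""
    L = instr / (mips * 1000.0)              # mean run time between I/Os in ms
    h = (g * L - d_io) / (g * t_ccd - d_io)
    return min(1.0, max(0.0, h))             # enough devices: no buffer needed

for mips in (1.0, 2.4):
    print([round(required_hit_rate(g, mips), 2) for g in (10, 20, 30)])

# 1.0 MIPS: [0.17, 0.0, 0.0]   -> a modest number of disks already suffices
# 2.4 MIPS: [0.9, 0.64, 0.0]   -> 10 disks require a hit rate of about 0.9
```

At 2.4 MIPS, about 28 disks (G ≥ D_I/O / L) would make the buffer unnecessary; with only 10 disks, the CCD buffer must absorb roughly 90 percent of the requests, which is the quantitative core of the argument for the secondary-storage hierarchy.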
CONCLUSION

In view of the ever increasing recording density of disks and the enhanced CPU performance levels, the emergence of a bottleneck in the secondary-storage environment must be anticipated. It would appear that the adoption of a secondary-storage hierarchy, managed by a storage processor, represents a more cost-effective approach than the provision of the number of devices required for parallel operation. For a constant on-line data file volume, an increase in the number of devices, i.e., a distribution of the files among as many devices as are necessary to attain the required I/O handling time, would considerably boost the costs and yet fail to lead to full utilization of the devices. Even in systems with a large on-line data file volume, the cost situation will give such a storage hierarchy, complemented by a cassette storage, a competitive edge over a system with a large number of disks. The performance data of the CCD technology, to be gathered from the report by Bhandaker,5 will in any case accommodate the page transfer times which may possibly be required in the system. For this reason, the design goal for CCD modules should be to achieve maximum storage density with its attendant cost advantages, because even in the case of relatively long access times sufficiently high data rates are attainable through a favorable storage organization.

REFERENCES

1. Denning, P., "The Working Set Model for Program Behaviour," Communications of the ACM, 11, 1968, pp. 323-333.
2. Mattson, R. L., "Evaluation of Multilevel Memories," IEEE Transactions on Magnetics, Vol. MAG-7, Dec. 1971, pp. 814-819.
3. Meade, R. M., "On Memory System Design," AFIPS Conference Proceedings (FJCC), Vol. 37, Nov. 1970, p. 33.
4. Boyse, J. W. and D. R. Warn, "A Straightforward Model for Computer Performance Prediction," Computing Surveys, Vol. 7, No. 2, June 1975.
5. Bhandaker, D. P., "Cost Performance Aspects of CCD Fast Auxiliary Memory," Proceedings of the CCD '75 Charge-Coupled Device Applications Conference, San Diego, 1975, pp. 435-442.
6. Pohm, A. V., "Cost/Performance Perspectives of Paging with Electronic and Electromechanical Backing Stores," Proceedings of the IEEE, Vol. 63, No. 8, Aug. 1975.
7. Schneider, P., "Working Set Restoration - A Method to Increase the Performance of Multilevel Storage Hierarchies," AFIPS Conference Proceedings, NCC '76, pp. 373-380.