Computer performance variability

by THOMAS E. BELL
The Rand Corporation
Santa Monica, California

MOTIVATION FOR PERFORMANCE COMPARISONS

During a period when computing on a network is free, users can be very informal about choosing the computer for running their jobs. Certain issues usually dominate this informal evaluation--the convenience of entering jobs, the availability of attractive services, the reliability of the system, and the individual user's familiarity with the system's conventions. These informal evaluations are usually qualitative, but one additional, quantitative characteristic is often included--response time. Response time, in the context of a computer network, may be defined as the elapsed time for responding to a batch-job run request as well as the more common definition of the elapsed time for responding to interactive requests. On some systems, of course, these two instances are blended together. On other systems the two are distinct, and the values obtained from measuring them are quite different. On two specific computers a user might find that one provides superior batch response time while the other provides superior interactive response time. If everything else is equal and he has no problems in transferring data from one machine to the other, the user would choose the first for batch executions (e.g., running statistical evaluation programs) and the second for doing interactive work (e.g., editing a report).

If installations charge real money for their computing services, another element (money) must be included in an evaluation of alternative computers. Evaluations rapidly lose their informal nature when network users find that their choice of computer determines the amount of money that will remain in budgets for paying their salaries. Personnel time lost due to poor response time, bad conventions, or inadequate services must now be traded off against the costs of avoiding these conditions. Computers that might have been unacceptable prior to charging may become optimal after economics has become a factor in decisionmaking.

Response time is a single metric, but computer charges are computed from a number of different performance metrics. For example, a bill might be computed from the amount of processor time consumed, the number of cards read, the number of lines printed, the number of tape I/Os and disk I/Os performed, and the amount of core occupied.1 When each type of resource is separately charged for, one computer may be much cheaper for one type of job but very expensive for another type. If the charge for I/O is relatively low and the charge for CPU time relatively high, the user would be tempted to run I/O-bound jobs on this computer but to submit his "number-crunchers" to another computer. The user without access to a network may be precluded from distributing his work among the available computers in the most economical manner, but the network user has more options to choose from--and more decisions to make. These economic decisions must usually consider each of a number of performance metrics when the rate for each may be different on each of the available computers.

In addition to potentially different rates for each resource, computer centers may employ different functional forms for their billing equations. One of the reasons for such differences is the variety of objectives that they may adopt. Published objectives usually include cost recovery and equitability (or reasonableness) in addition to repeatability.2,3,4 Other objectives may include limiting load growth to avoid the need for procuring a new machine,5 biasing users to employ resources available in excess rather than those in short supply,6 and being able to separate those users with immediate needs (who can afford to satisfy their need) from those who desire cheaper, slower service.7 The differences in objectives ensure that, as computer centers are increasingly drawn together by networks, users will face more and more different kinds of economic situations to evaluate, and they will find that superficial evaluations will not be adequate for choosing the most appropriate node.

Since computational speeds often differ between the machines, rates themselves cannot be compared; the user must compare the total costs of computing through execution of sample jobs. Either real jobs or synthetic jobs (which use resources in known ways but do no useful computations) can be used in response time and resource charge comparisons. Typically, a user submits a standard job to each of several computers to measure the response time (either interactive or batch) and determine the charges. After finding the values from each candidate node, he picks the one offering him the best ratio of service (response time as well as other services) to cost. This exercise is therefore critical to the user--since it determines the costs he will incur--and to the node--since it determines the amount of load the center will experience.
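The comparison the user faces can be pictured with a small sketch (hypothetical, not taken from the paper; the resource categories, rates, and usage figures below are invented). Each node's bill for the same sample job is simply the sum of its rates multiplied by the job's measured resource usage, and the user weighs those bills against the service each node offers. The sketch is written in modern Fortran; the paper's own synthetic job, described later, was written in the FORTRAN of the period.

  program compare_nodes
    implicit none
    ! Resource usage of one sample job (values assumed): CPU seconds,
    ! I/O requests, and kilobyte-hours of core.
    real :: usage(3)  = [120.0, 5000.0, 400.0]
    ! Assumed rate schedules (dollars per unit) at two candidate nodes;
    ! node A prices CPU time high and I/O low, node B the reverse.
    real :: rate_a(3) = [0.050, 0.0010, 0.010]
    real :: rate_b(3) = [0.020, 0.0040, 0.008]
    real :: cost_a, cost_b

    cost_a = dot_product(rate_a, usage)
    cost_b = dot_product(rate_b, usage)
    print '(a, f8.2)', 'Charge at node A: $', cost_a
    print '(a, f8.2)', 'Charge at node B: $', cost_b
  end program compare_nodes

Under these invented rates the I/O-heavy sample job is cheaper at node A, which illustrates why separately charged resources can pull different kinds of work to different nodes.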
REPEATABILITY AND VARIABILITY

One of the objectives for charging systems is repeatability, the characteristic of producing the same charge from each of a number of runs of the same job. This same objective of zero variability between runs is often implicitly assumed to have been met by users performing comparisons. They run a single job once on each machine and assume that the resulting performance values accurately represent the machine's performance. This is equivalent to the assumption that there exists zero within-sample variability. Therefore, the variation within samples (e.g., several runs on the same computer) can be disregarded in comparison with the variation between samples (e.g., runs on computer A compared with runs on computer B). If the standard deviation (a measure of variability) of run times on computers A and B were always far smaller than the difference in run times on the two machines, the within-sample variability would clearly not be significant.

Good repeatability would aid users in budgeting their funds for computing as well as help them in making comparisons. Gabrielle and John Wiorkowski4 suggest that "a variance of no greater than 1 percent is thought to be acceptable." Probably, they mean that the standard deviation of charges should not exceed 1 percent of the mean. This amount of variability is certainly so small that it would interfere very little in realistic comparisons. Perhaps performance variability is unworthy of consideration; some indication of the problem's actual magnitude is necessary to evaluate its importance.
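The check implied by this argument can be sketched in a few lines (a hypothetical illustration; the run times below are invented): compute each machine's within-sample standard deviation as a percentage of its mean--the form of the Wiorkowskis' criterion--and compare that spread with the between-machine difference in means.

  program within_vs_between
    implicit none
    ! Assumed elapsed times (seconds) for repeated runs of the same job.
    real :: a(5) = [29.1, 29.3, 29.0, 29.4, 29.2]   ! computer A
    real :: b(5) = [35.0, 35.6, 34.8, 35.9, 35.2]   ! computer B
    real :: mean_a, mean_b, sd_a, sd_b

    mean_a = sum(a) / size(a)
    mean_b = sum(b) / size(b)
    sd_a = sqrt(sum((a - mean_a)**2) / (size(a) - 1))
    sd_b = sqrt(sum((b - mean_b)**2) / (size(b) - 1))

    print '(a, f6.2, a)', 'A: std. dev. is ', 100.0*sd_a/mean_a, ' percent of mean'
    print '(a, f6.2, a)', 'B: std. dev. is ', 100.0*sd_b/mean_b, ' percent of mean'
    ! The difference of means is meaningful only if it is large relative
    ! to the within-sample variability on each machine.
    print '(a, f6.2, a)', 'Difference of means is ', abs(mean_b - mean_a), ' seconds'
  end program within_vs_between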
DETERMINING THE MAGNITUDE OF THE PROBLEM

Synthetic benchmarks are being used extensively in performance investigations. For example, Buchholz' synthetic test job8 has been used by Wood and Forman9 for comparative performance investigations on batch systems, and Vote10 has employed his synthetic program in evaluating a time-sharing system. With their increasing use and their documented advantages for certain types of investigations, synthetic jobs are a natural vehicle for determining the magnitude of variability.

Synthetic job

We have used a modification of the Buchholz synthetic test job to determine the magnitude of variability in strictly controlled test situations on IBM, Honeywell, and other manufacturers' equipment. The job (as modified) is written in FORTRAN so that it can be executed on a variety of computers and is structured as follows:

1. Obtain the time that the job was given control and keep the time in memory.
2. Set up for the job's execution.
3. Set up for running a set of identical passes with an I/O-CPU mix as specified on a parameter card.
4. Execute the set of identical passes and record (in memory) the time of each pass's start and finish.
5. Compute some simple statistics from the resultant execution times and print both the times and the values of the statistics.
6. If requested, return to step 3 to repeat the operations for a new I/O-CPU mix.
7. Determine the current time; print out this time and the initiation time.
8. Terminate the job.

(A minimal sketch of this structure appears at the end of this subsection.)

As indicated, the job has embedded data collection in the form of interrogations of the system's hardware clock to record the elapsed time between certain major points in the program's execution. When appropriate, the job also determines the accumulated resource usage at each of the major points. The elapsed time within the job can be compared with the time recorded by the accounting system to determine whether initiation/termination time is large enough to require distinguishing between these two measures of elapsed time. In all cases we have observed, the initiation/termination time has been so large that disregarding it would invalidate many conclusions from performance investigations.

The design of the job enables the user to identify the source of certain kinds of variability. Since repeated passes are individually timed, variability that arises from within the period of job execution can be identified. By running the job in a number of different situations, other sources of variability can be identified. This job can be used directly to evaluate variability in batch systems, and can be submitted remotely to evaluate variability in remote job environments such as the time-sharing system we investigated.
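The sketch below (modern Fortran; the pass count and the placeholder workload are assumed, and the original program was considerably more elaborate) shows the structure just described: read the clock when the job gains control, execute a set of identical timed passes, compute simple statistics over the pass times, and read the clock again before terminating.

  program synthetic_job
    implicit none
    integer, parameter :: npass = 20          ! passes per set (assumed)
    integer :: clock_rate, t_start, t_stop, t0, t1, i
    real :: pass_ms(npass), mean_ms, sd_ms

    call system_clock(count_rate=clock_rate)
    call system_clock(t_start)                ! step 1: time the job was given control

    do i = 1, npass                           ! steps 3-4: identical timed passes
       call system_clock(t0)
       call one_pass()                        ! stand-in for the parameterized I/O-CPU mix
       call system_clock(t1)
       pass_ms(i) = 1000.0 * real(t1 - t0) / real(clock_rate)
    end do

    ! Step 5: simple statistics over the recorded pass times.
    mean_ms = sum(pass_ms) / npass
    sd_ms   = sqrt(sum((pass_ms - mean_ms)**2) / (npass - 1))
    print '(a, f10.1, a, f8.1)', 'pass mean (ms) =', mean_ms, '   std. dev. =', sd_ms

    call system_clock(t_stop)                 ! step 7: current time at the end of the job
    print '(a, f10.1)', 'in-job elapsed (ms) =', 1000.0 * real(t_stop - t_start) / real(clock_rate)

  contains

    subroutine one_pass()
      ! A purely CPU-bound pass is used here; the real job selects an I/O-CPU
      ! mix from a parameter card.
      integer :: k
      real :: x
      x = 0.0
      do k = 1, 1000000
         x = x + sin(real(k))
      end do
      if (x > huge(1.0)) print *, x           ! discourage the compiler from removing the loop
    end subroutine one_pass

  end program synthetic_job

Comparing the in-job elapsed time printed at the end with the interval recorded by the accounting system gives the initiation and termination overheads discussed above.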
Interactive responsiveness

While a synthetic job can be used to evaluate the system's response to user-programmed activity, it is not adequate to investigate highly interactive activity like text editing. The latter type of system is usually investigated by using scripts; the user first makes up a list of commands and then determines the time required for him to complete a series of interactions based on that list. Unfortunately, this approach precludes identifying the source of variability if it arises from only a subset of the commands employed. In addition, the variability of human response is mixed with the variability in computer response. A better approach is to time the response of the computer to each individual command.

The analyst can time these responses with a stop watch, as done by Lockett and White,11 but this technique fails when the computer's response time becomes small. In such situations the human's response time in operating the watch may exceed the computer's response time, and human variability dominates computer variability. In addition, the human often becomes sloppy when large amounts of data are needed because the job becomes tedious. To avoid these problems, we designed and implemented a hardware device to time responses to a resolution of one millisecond.
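The per-command idea can also be sketched in software (a hypothetical outline, not the hardware device just described; send_command and await_response are assumed stand-ins for driving the editor under test): the clock is read immediately before each command is issued and again when its response completes, so every command contributes its own sample and human think time is excluded.

  program per_command_timing
    implicit none
    integer, parameter :: ncmd = 3
    ! Illustrative command strings; a real script would hold the editor's commands.
    character(len=12) :: cmds(ncmd) = [character(len=12) :: 'LIST 1/100', 'CHANGE', 'SAVE']
    integer :: rate, t0, t1, i
    real :: resp_ms(ncmd)

    call system_clock(count_rate=rate)
    do i = 1, ncmd
       call system_clock(t0)          ! read the clock just before the command is issued
       call send_command(cmds(i))     ! assumed stand-in: transmit one command
       call await_response()          ! assumed stand-in: wait for the complete response
       call system_clock(t1)
       resp_ms(i) = 1000.0 * real(t1 - t0) / real(rate)
       print '(a, a, f8.1, a)', trim(cmds(i)), ': ', resp_ms(i), ' ms'
    end do

  contains

    subroutine send_command(cmd)
      character(len=*), intent(in) :: cmd
      ! Placeholder: a real harness would write cmd to the line being measured.
    end subroutine send_command

    subroutine await_response()
      ! Placeholder: a real harness would read output until the system's prompt returns.
    end subroutine await_response

  end program per_command_timing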
Analysis approach

Performance can be made to vary by orders of magnitude if a programmer puts his mind to it. By using a computer with a low-resolution clock and doing careful programming, a programmer could execute a job that was almost never in control at the time the computer's clock advanced. Thus his job would be charged for using almost no resources. On the other hand, the programmer could leave out the special timing controls and be charged for using hundreds of times as much processing resource. An exercise performed in this way would prove little since any relationship to a normal job stream would be tenuous. A more useful indication of variability's magnitude would be the lower limit to be expected in more normal situations. Our experiments were directed to this objective and therefore involve simple situations where low variability is to be expected.

Ultimately, all variability could probably be explained. Variations in processing rate could be caused by fluctuations in power line frequency; differences in elapsed time for processing I/O-bound jobs might be explained as variations in the number of I/O retries; variability in initiation time could sometimes be explained by slight differences in the length of a job control file. However, these circumstances are transitory and usually unknowable to the user. Our tests were designed to represent the best possible situation that a user could realistically expect.

The simplest situation we investigated is one in which no jobs other than the test job are active; the system is initialized at the beginning of the test period to establish that no other jobs will interfere with the experiment. In this simple test the parameterized test job is set to run in either of two modes--as a totally CPU-bound job or as a totally I/O-bound job. The tests then increased in complexity through controlled multiprogramming situations to uncontrolled, normal operations. Results of the tests were used in simple analyses to indicate the degree of variability that should be expected, but in this initial investigation we made no attempt to employ sophisticated statistical models. In all cases we used computers that were isolated from any networks, and we usually allowed only controlled on-line activity so that unexpected loading would not occur. In some cases we found that physically disconnecting transmission lines was necessary to achieve an environment that was strictly controlled. Whenever we relaxed our controls, loading became random and variability increased.

TABLE I-Stand-Alone Runs-Elapsed Times (All Times in Milliseconds)

  Type of Job   Number of Passes   Mean Elapsed Time   Standard Deviation
      CPU              20                29135                  73
      CPU              20                29135                  73
      I/O              20                35165                 135
      I/O              20                35300                 186

EMPIRICAL RESULTS

The most elementary measurement of performance is probably elapsed time, and it is often the one of most interest. Results of elapsed time investigations are therefore presented first, with results involving I/O and CPU metrics following.

Batch elapsed time

The phenomenon of variability in elapsed times under multiprogramming conditions is well-known. When a user's job is run with different mixtures of other jobs, differing amounts of resources are denied it each time it is run; thus the job may execute slowly one time and rapidly the next. One way to decrease this variability (and also decrease the average time) is to ensure that the job of interest is run with the highest priority. The interested network user might therefore occasionally pay to run at a very high priority in order to determine the performance under "best possible" conditions. However, an even better situation is to run stand-alone on the computer. We therefore executed the synthetic job in its simple CPU-bound and I/O-bound versions on an otherwise idle system. The elapsed time to execute the CPU-bound portion of this job on an IBM 360/65 operating under OS/MVT at The Rand Corporation evidenced no variability within the job that was above the measurement's resolution of 32 milliseconds. (See Table I.) The I/O-bound portion's elapsed time, however, typically varied enough to result in standard deviations (for 20 identical executions) of about 0.5 percent of the mean (a mean of about 35 seconds). This small value indicates that, at least within a stand-alone job, variability is not impressive.

The situation becomes less encouraging when the comparisons are between separately initiated jobs. The statistics of interest now include initiation time, termination time, and average execution time of a pass through the timed loop for each separately initiated job. The elapsed time for initiating the jobs (the time between initiation as recorded by the accounting system and the time control has passed to the job's code) averaged 8.25 seconds with a standard deviation of 6.67 percent of the mean. The elapsed time for termination (from the time user code completes executing until the accounting system records termination) averaged 3.775 seconds with a standard deviation of 16.7 percent of the mean. Although our experiments provided multiple samples of initiation and termination, the sample size of identically run executions was too small for computation of meaningful measures of variability. In two specific instances, however, the average elapsed times to execute the totally CPU-bound portion differed by less than the resolution of measurement. The average execution time for the I/O-bound portion, in two instances of samples of two, changed by 0.4 percent and 1.2 percent. (Allocation of files was done identically in each instance.) Although the within-sample variability appears small for internal portions of a job, the initiation and termination times vary significantly. Therefore, comparisons involving elapsed times should be designed recognizing that the variability of initiation and termination may obscure some results.
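The decomposition used for these figures can be sketched as follows (an illustration with invented numbers, not the paper's data): the accounting system's recorded start and stop times bracket the job's own first and last clock readings, and the differences give initiation, in-job execution, and termination components whose variability can then be examined separately.

  program split_elapsed
    implicit none
    ! Assumed example values, in seconds, for one run.
    real :: acct_start = 0.00     ! initiation as recorded by the accounting system
    real :: acct_stop  = 47.50    ! termination as recorded by the accounting system
    real :: job_first  = 8.25     ! first clock reading taken inside the job
    real :: job_last   = 43.70    ! last clock reading taken inside the job
    real :: t_init, t_exec, t_term

    t_init = job_first - acct_start    ! time to pass control to the job's code
    t_exec = job_last  - job_first     ! elapsed time measured within the job
    t_term = acct_stop - job_last      ! time to record termination
    print '(a, 3f8.2)', 'initiation, execution, termination (s):', t_init, t_exec, t_term
  end program split_elapsed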
On-line elapsed time

An on-line system operating with low priority in a computer would be expected to have variable response, but relatively constant response is usually expected when the on-line system is given a priority only a little below the operating system itself. We ran a series of carefully designed tests to provide an indication of this assumption's validity on our WYLBUR12 text editor in Rand's normal environment. A heavily I/O-bound activity (listing a file on a video terminal) experienced response time with a standard deviation 23.2 percent of the mean. A heavily CPU-bound activity (automatically changing selected characters) had reduced variability--15.6 percent of its mean. A variety of other editing functions experienced similar variability under a pair of different configurations. The standard deviations as percentages of means ranged from a low of 5.7 percent, to a typical 12 percent, to a high of 30.8 percent. These values were obtained with different levels of user activity on WYLBUR by real users (as opposed to our artificial load for testing), but did not include the variability often introduced by time-sharing systems since the 360/65 was not time-shared.

We executed our standard synthetic job on a Honeywell 615 computer under its time-sharing system to determine variability in this on-line system. The elapsed time to execute each pass through the CPU-bound portion was recorded in order to provide an indication of the variability of response time during normal operation on that system. Each pass could be executed in about 9 seconds in a stand-alone system. However, during normal operation the average was occasionally as large as 597 seconds with the standard deviation exceeding the average. (It was 598 seconds.) Variability of this magnitude can produce highly anti-social behavior by some network users.

I/O activity

Elapsed time is variable in all but the simplest cases, but perhaps the network comparison shopper can depend on charges from a billing system to be within the 1 percent suggested by the Wiorkowskis. We ran tests on two systems to check this. Repeated runs of the same stand-alone, I/O-bound job resulted in recorded channel times on a Honeywell Information Systems (HIS) 6050 with ranges considerably less than 1 percent of the means. (See Table II.)

TABLE II-Stand-Alone Runs-Channel Times (All Times in Milliseconds)

  Disk
  Run Number   File 1   File 2   File 3   File 4
      1         34432    34274    27351    39809
      2         34456    34283    27298    39823
      3         34429    34257    27334    39816
    Mean        34439    34271    27328    39816

  Tape
  Run Number   File 1   File 2   File 3   File 4
      4         22356    22533    23328    23087
      5         22404    22621    23350    23093
      6         22350    22677    23340    23103
    Mean        22370    22610    23339    23094
When run with one CPU-bound job and only one other I/O-bound job, the channel times changed from the stand-alone values by an average deviation of 2.0 percent for tape files, but 29.2 percent for disk files. (As indicated in Table III, both positive and negative deviations were observed.) In one case, an 82 percent deviation was observed between the charges for the stand-alone run and a three-job multiprogramming case.

TABLE III-Multiprogrammed Runs-Channel Times (All Times in Milliseconds)

         File 1   Increase   File 2   Increase   File 3   Increase   File 4   Increase
  Run             Percent             Percent             Percent             Percent
  (Disk)
   1     28543     -17.1     37227       8.6     49739      82.0     44329      11.3
   1     37678       9.4     48834      42.5     46655      70.7     30638     -23.0
   2     27726     -19.5     37366       9.0     47920      75.4     43163       8.4
   2     38578      12.0     49977      45.8     45346      65.9     28613     -28.1
   3     30896     -10.3     38983      13.7     24687      -9.7     24162     -39.3
   4     30200     -12.3     47346      38.2     24730      -9.5     24195     -39.2
  (Tape)
   3     22698       1.4     22908       1.3     23775       1.9     23570       2.1
   4     22873       2.2     23032       1.9     23849       2.2     23675       2.5

  Notes:
  1. Three jobs were multiprogrammed in each case. The first job was CPU-bound; its
     results are not represented here. The other two jobs were I/O-bound and were
     paired as follows:

       Run   Second Job Type   Third Job Type
        1         Disk              Disk
        2         Disk              Disk
        3         Disk              Tape
        4         Disk              Tape

  2. All increases represent changes from the mean times indicated for individual
     files in Table II.

All these instances involved simple sequential files. Using this type of file several times with identical jobs usually results in nearly constant performance when considering the metric of I/O requests. I/O counts (Execute Channel Program requests, EXCPs) are reported in IBM's System Management Facilities (SMF) accounting system and seldom vary. However, SMF reports only I/O requests rather than actual I/O accesses. The number of actual I/O accesses normally exceeds the number of requests since channels can execute a number of accesses as a result of a single EXCP. The difference usually becomes dramatic in the Indexed Sequential Access Method (ISAM), where a single request may result in extensive examination of overflow areas.

Processor activity

Processor time appears to be a straightforward metric, but even its definition is open to dispute; major parts of a job's CPU activity may not be logically associated with that job alone. For instance, the rate of executing instructions may be reduced as a result of activity on a channel or another processor in the system. In addition to the definitional problem, accounting systems are often implemented in ways that users feel are illogical. These problems are seldom of importance when a job runs alone in the system, but may be critical during multiprogramming or multiprocessing.

Repeated runs of our test job on an IBM 360/65 did not indicate any meaningful variability when the CPU-bound job was run stand-alone, but the I/O-bound job, in two cases, provided CPU times of 29.4 seconds and 28.7 seconds (a difference of 2.4 percent of the mean). The I/O variability, however, does not appear critical because this job ran, stand-alone, for an elapsed time of approximately 780 seconds; the difference of 0.7 seconds is therefore less than 0.1 percent of the elapsed time.

Under multiprogramming, results vary more. Although the CPU charges should not vary when the job is run multiprogrammed rather than stand-alone, our results indicate that the reported charges contained both a biasing element and a random element. The charges for the CPU-bound job went up from 583 seconds (stand-alone) to 612 seconds (when run with the I/O-bound job), to 637 seconds (when run with a job causing timer interrupts every 16.7 milliseconds), to 673 seconds (when multiprogrammed with both the other jobs). The changes in CPU charges are clearly dependent on the number of interrupts the system handles for other jobs on the system. The largest CPU charge observed in this series of tests with the CPU-bound job was 16 percent over the stand-alone charge, but larger biases can be obtained by running more interrupt-causing jobs simultaneously. In one particularly annoying case, the author observed a production job (as opposed to a synthetic one) whose CPU charges differed between two runs by an amount equal to the smaller of the two charges--a 100 percent variation! I/O-bound jobs often experience CPU charge variability of equal relative magnitude, and users with I/O-bound jobs have come to expect 30 percent variations in their charges.

These problems are not unique to IBM equipment. We found the same sorts of variability when running on a Honeywell 6050 processor. With only a single processor active, we observed processor times that (with two other jobs active) increased up to 7.2 percent over the stand-alone average charge. More jobs and more processors increase the variability.
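One mechanism behind charge inflation of this kind, taken up again under INTERPRETATION below, is that the time spent handling I/O interrupts is charged to whichever job happens to be in control when the interrupt occurs. The toy calculation below is entirely hypothetical (the interrupt rates and per-interrupt service time are assumed, and everything else is ignored), but it shows how a CPU-bound job's reported processor time can grow with its neighbors' interrupt traffic even though its own work is unchanged.

  program interrupt_charging
    implicit none
    real, parameter :: true_cpu_s    = 583.0    ! the job's own processing time (s)
    real, parameter :: int_service_s = 0.0005   ! assumed cost of handling one interrupt (s)
    real :: other_int_per_s                     ! interrupt rate caused by other jobs (per s)
    real :: reported
    integer :: k

    ! While the CPU-bound job is in control, interrupts raised by other jobs are
    ! serviced on its clock, so their handling time is added to its reported CPU time.
    do k = 0, 3
       other_int_per_s = real(k) * 30.0         ! assumed load levels: 0, 30, 60, 90 per second
       reported = true_cpu_s * (1.0 + other_int_per_s * int_service_s)
       print '(a, f6.1, a, f8.1, a)', 'interrupt rate ', other_int_per_s, ' /s -> reported CPU ', reported, ' s'
    end do
  end program interrupt_charging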
INTERPRETATION

The results of performance tests indicate conclusively that something is varying. Some of the reported variability is undoubtedly due to the reporting mechanisms themselves. For example, the CPU time reported by IBM's SMF in an MVT system attributes the processing time for I/O interrupts to the job in control at the time of the interrupt. Since that job may be any one running on the system (the job that caused the I/O to be executed or an unrelated one), CPU charges are dependent on which jobs are concurrently executing as well as on the micro-level scheduling of these jobs. This scheme had the advantage of accounting for most of the CPU activity at reasonably low cost, but it often reported a number whose meaning was obscure. (IBM's new virtual system reduces the reported variability by not reporting the I/O interrupt-handling time.)

The entire amount of variability cannot be attributed to reporting mechanisms; the basic processes are clearly subject to variability-causing phenomena. Some of these can be identified subsequent to a test and employed to design better-controlled tests in the future. For example, in some of our tests we could have reduced variability by using only old files. New files are allocated by the operating system in a manner that depends on previous file allocations; using only old files would ensure that physical file positions would be identical for each test run and result in reduced variability. However, employing this strategy would preclude determining how well the operating system performed file allocation. In general, the more realistic the desired test, the more variability the analyst must accept.

Even if a test is run with good performance reporting in a very tightly controlled environment, performance must be considered a random variable. If files are allocated prior to testing, random I/O errors still may occur. Rotating devices do not possess ideally constant, known rotational velocities. In addition, micro-level sequencing in a processor is often dependent on the precise start time of user jobs and the entry time of system support modules. The ability to explain these effects after the test does not imply that they are knowable before the test.

Comparisons of elapsed times or charges between computers on a network cannot depend on variability being within the desired few percent. Even in the simplified situations reported in this paper the variability was often large enough to preclude single-sample evaluations from being dependable. If a user intends to do significant computing after choosing a node, he should ensure that his evaluation reflects this reality. Further, he should occasionally check the environment on the network to see whether charges or response time have changed enough to justify a change in his workload allocation scheme.
REFERENCES

1. Hall, Gayle, "Development of an Adequate Accounting System," Computer Measurement and Evaluation--Selected Papers from the SHARE Project, Vol. 1, New York, Share, Inc., 1973, pp. 301-305.
2. Kreitzberg, C. B. and J. H. Webb, "An Approach to Job Pricing in a Multi-programming Environment," Proceedings Fall Joint Computer Conference, 1972, pp. 115-122.
3. Young, J. W., "Work Using Simulation and Benchmarks," Computer Measurement and Evaluation--Selected Papers from the SHARE Project, Vol. 1, New York, Share, Inc., 1973, pp. 286-292.
4. Wiorkowski, G. K. and J. J. Wiorkowski, "A Cost Allocation Model," Datamation, Vol. 19, No. 8, August 1973, pp. 60-65.
5. Bell, T. E., B. W. Boehm, and R. A. Watson, "Computer Performance Analysis: Framework and Initial Phases for a Performance Improvement Effort," The Rand Corporation, R-549-PR, November 1972. (Also Proceedings Fall Joint Computer Conference, 1972, pp. 1141-1154.)
6. Watson, R. A., Computer Performance Analysis: Applications of Accounting Data, The Rand Corporation, R-573-PR, May 1971.
7. Nielsen, N. R., "Flexible Pricing: An Approach to the Allocation of Computer Resources," Proceedings Fall Joint Computer Conference, 1968, pp. 521-531.
8. Buchholz, W., "A Synthetic Job for Measuring System Performance," IBM Systems Journal, Vol. 8, No. 4, 1969, pp. 309-318.
9. Wood, D. C. and E. H. Forman, "Throughput Measurement Using a Synthetic Job Stream," Proceedings Fall Joint Computer Conference, 1971, pp. 51-55.
10. Vote, F. W., Multiprogramming Systems Evaluated Through Synthetic Programs, Lincoln Laboratories (MIT), ESD-TR-73-338, December 1973.
11. Lockett, J. A. and A. R. White, Controlled Tests for Performance Evaluation, The Rand Corporation, P-5028, June 1973.
12. Fajman, R. and J. Borgelt, "WYLBUR: An Interactive Text Editing and Remote Job Entry System," Communications of the ACM, Vol. 16, No. 5, May 1973, pp. 314-322.