EXPLICIT PARALLEL PROCESSING DESCRIPTION AND CONTROL IN PROGRAMS FOR MULTI- AND UNI-PROCESSOR COMPUTERS

J. A. Gosden
AUERBACH Corporation
Philadelphia, Pennsylvania

This paper discusses the development and current state of the art in parallel processing. The advantages and disadvantages of the various approaches are described from a pragmatic viewpoint. An alternative is then described and discussed that is a natural extension of the evolutionary approaches. This alternative covers the problems of locating, describing, controlling, and scheduling parallel processing. It is a general-purpose approach that is applicable to all types of data processing and computer configuration.

GENERAL APPROACH

The approach is based on the following assumptions.

1. The specification of potential parallel activity in processing will depend largely upon programmer specification.
2. Major benefits will accrue only if a high degree of parallelism exists; i.e., either many parallel paths exist or, if the parallel paths are few, they must be long.
3. Programmers will specify parallelism only if it is easy and straightforward to do so.
4. A simple control scheme is needed to provide straightforward scheduling.

Programmer Specification

It is obvious that an automatic scheme can be devised to recognize independent parallel paths when the paths are at a task or an instruction level. The latter scheme has been implemented in STRETCH [9]* and other computers; the former is a simple matter of comparing the names of the input and output files of various tasks in a job. At intermediate levels, where subroutines, parameters, and many other forms of indirect addressing exist, the problem is at present unsolved from a general-purpose practical viewpoint. Therefore, we must for now look to the programmer to define independent processes.

* In this paper the references are combined with an extensive bibliography and are, therefore, listed in their alphabetical sequence, rather than in consecutive numerical order.

Degree of Parallelism

Previous experience with simultaneous or overlapped I/O and with multiprogramming suggests that full potential gains are never realized, because of the difficult scheduling problems. Therefore, we expect that scattered parallelism of two or three short paths may not lead to significant gains; multiple paths must be sought. If there are two or three long parallel paths, it may be more practical to specify them as separate tasks.

Easy Specification

Experience indicates that any new feature in parallel processing must be easy to specify; otherwise, the feature will not be much used or exploited. Both simultaneous I/O and multiprogramming gained most when they were made programmer independent. However, our first assumption implies that in this case we cannot rely on programmer independence, so we must make the specification of parallel processing easy and simple to encourage its use.

Good Control Scheme

There are two main requisites of a good control scheme. First, it must be simple, so that the overheads of implementing it do not seriously erode the advantages gained. Second, it should be efficient with simple ad hoc scheduling algorithms. Experience has shown that ad hoc schemes are successful, whereas complex schemes are not only difficult to implement but disappointing in general performance [9, 13].
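The task-level check mentioned under Programmer Specification can be stated concretely. The sketch below is a hypothetical Python illustration, not part of the paper's scheme: two tasks are taken to be independent when neither writes a file the other reads or writes; the job-description format and all names are assumptions made for the example.

```python
# Hypothetical task-level independence test: compare the input and output
# file names declared for each task in a job description.
def independent(task_a, task_b):
    """Two tasks may run in parallel if neither writes a file the other
    reads or writes."""
    reads_a, writes_a = set(task_a["inputs"]), set(task_a["outputs"])
    reads_b, writes_b = set(task_b["inputs"]), set(task_b["outputs"])
    return not (writes_a & (reads_b | writes_b) or writes_b & reads_a)

job = [
    {"name": "EDIT",   "inputs": ["RAW"],             "outputs": ["CLEAN"]},
    {"name": "UPDATE", "inputs": ["CLEAN", "MASTER"], "outputs": ["NEWMASTER"]},
    {"name": "AUDIT",  "inputs": ["RAW"],             "outputs": ["AUDITLOG"]},
]

print(independent(job[0], job[1]))   # False: UPDATE reads EDIT's output file
print(independent(job[0], job[2]))   # True: both only read RAW; outputs are disjoint
```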
BACKGROUND

This section discusses some of the existing literature on parallel processing and relates the findings to the assumptions previously made. Parallel processing in various forms has produced interesting developments in the computer field. For those interested in pursuing the subject, the general bibliography attached to this paper indicates the continuing voluminous literature on this general subject and its pervasive influence on computer hardware and computer software architecture. It is possible to divide the development into six general classes:

Class 1 (1953) Simultaneous Input-Output
Class 2 (1957) Fork and Join Statements
Class 3 (1958) Multiprogramming Operating System
Class 4 (1959) Processor Arrays
Class 5 (1961) Planned Scheduling
Class 6 (1962) Re-entrant On-demand System

The dates are approximate origins of development, but there has been no attempt to assign single inventors because most ideas were developed in several places more or less concomitantly. These titles are useful labels for discussion in this paper but are not intended as formal definitions. The author regrets that he has not been able to incorporate in this paper formal definitions for the various labels used.

Class 1, Simultaneous Input-Output (achieved either by the use of hardware or even by programmed time-sharing of the processor), made useful increases of two- or threefold in computer power by parallel use of input-output and computation processes.

Class 2, Fork and Join Statements, provided a simple language mechanism for programs to take advantage of a multiprocessor computer system. The literature does not show great gains in this area. The examples given [12, 17] have not looked rewarding, except one attributed to Poyen [48] by Opler [45] which, however, is somewhat clumsy; it is predicated on a specific number of processors and requires an exertion on the part of the programmer that will not in general be obtained. Opler, however, has recognized what we shall call the "parallel-for" construction; his term is "do-together."

Class 3, the Multiprogramming Operating System, provides a mechanism for a greater degree of simultaneous input-output by mixing programs and balancing the use of the parts of a computer configuration [14, 26, 27, 56], which has resulted in useful gains of some two- to threefold [42]. This subset of parallel processing maintains independent processes, each of which is part of an independent job. These systems have enjoyed extensive development in both large and small computer systems, and in general their schedulers operate on a queue of tasks using ad hoc scheduling algorithms. Good results have been achieved with straightforward general-purpose approaches.

Class 4, Processor Arrays, are computers with replicated arrays of processors which are powerful on a specific range of problems [44]. These processors operate on an array of data obeying a common program. The SOLOMON computer [54] and the structure proposed by Holland [28] are the best known, although they have not enjoyed popularity, partly because they depart radically both in architecture and in solution techniques from the evolutionary mainstream (a tendency not likely to succeed, as Brooks [7] predicts) and partly, in the writer's opinion, because they shun the qualifier "general-purpose," which has been the touchstone of the success of the computer. Associative memories are another special case.
Class 5, Planned Scheduling, is a title which covers schemes that propose to describe parallel-processing structures and to schedule them using a CPM or job-shop scheduling approach. Typically, each part of a process is described separately and is associated with a set of conditions or inputs which cause it to be executed, instead of being linked in a program sequence [21, 37, 41, 52]. Such schemes are the antithesis of Class 3 in that they pre-partition the problem into logical parts rather than apply the somewhat arbitrary paging of general operating systems. Some of these schemes attempt (or attempted [13]) pre-scheduling, relying on expected time and space utilization estimates. Again, this is a major departure from the mainstream of computer development and has not enjoyed much popularity.

Class 6, Re-entrant On-demand Systems, are an interesting mutation of multiprogramming systems where the process is common to all users but the data sets may be independent. The classic example is an airlines reservation application. Since the queries are independent, a multiprogramming operating system approach can be used. The principle of re-entrant routines was developed to allow sharing of access to a common set of programs [3, 5, 49].

It is also relevant to point out a class of activity which has not existed, that is, automatic recognition of parallel processes. This is a subject which is still at an early stage of development. At present there are systems that seek out some overlap at the instruction level in large-scale processors, but no work has been done at the subroutine or program level. Even at the instruction level, as the processor reshuffles accesses and attempts parallelism, as in the look-ahead in STRETCH [9], the problem of interrupts and unwinding is formidable. The problems of finding implicit parallelisms in programs are also formidable, especially with the use of parameters, block structures, and late binding. Even if it were possible, the compiling overhead in analysis may be considerable.

The foregoing summary identifies the author as a mainstream evolutionist rather than a radical innovator. This does not mean that he regards Classes 4 and 5 as useless, but rather as yet unproved, and he agrees that radical innovation must prove itself by large gains to offset its disruption, whereas evolution can be justified by more modest gains.

THE PARALLEL FOR

The simple premise of this paper is that all FOR statements in an ALGOL program, and similar constructs in other languages, are good potential sources of parallel activity. FOR statements divide into two kinds: first, those which are iterative, and second, those which are parallel. The second group is a large fraction of the total. Two examples may suffice, one scientific and one commercial. A typical scientific example is matrix addition, in which the pairwise additions of corresponding elements can be performed independently, and therefore in parallel. A typical commercial example is the extension of an invoice, where the individual multiplications of price per unit times quantity for each item can be performed independently, and therefore in parallel. It is easy to see that by using the simple device of describing each FOR statement as either iterative or parallel, a large group of parallel paths in programs can be identified. It is interesting to note that Opler [45], and later Anderson [2], use a FOR statement to illustrate their notations.
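The matrix-addition case can be rendered in a modern notation. The sketch below is a hypothetical Python illustration, not the ALGOL-style PARALLEL FOR discussed here: a thread pool stands in for the parallel paths, and the function names and pool size are assumptions.

```python
# Hypothetical illustration of a PARALLEL FOR: pairwise matrix addition.
# Each row sum is independent, so the iterations may run in any order or in parallel.
from concurrent.futures import ThreadPoolExecutor

def add_matrices(a, b):
    """Return a + b, computing each row independently (a 'parallel for' over rows)."""
    def add_row(i):
        return [a[i][j] + b[i][j] for j in range(len(a[i]))]

    with ThreadPoolExecutor() as pool:          # pool size chosen by the runtime
        return list(pool.map(add_row, range(len(a))))

if __name__ == "__main__":
    x = [[1, 2], [3, 4]]
    y = [[10, 20], [30, 40]]
    print(add_matrices(x, y))                   # [[11, 22], [33, 44]]
```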
The FOR statement is a special case of a loop; it is, therefore, natural to dichotomize loops into those that are iterative and those that are parallel in nature. One simple example shows that parallel loops not only exist but exist in large numbers. The main loop in a payroll program (that is, the payslip loop) is independent for each man. It is, of course, a FOR loop where the meaning might be "for every record on the master file ...." It may appear unreasonable to distinguish between a PARALLEL FOR and a rewritten loop, but there are substantial pragmatic differences that would distinguish them at a programming language level.

SUMMARY OF PARALLEL STRUCTURES

We have identified three areas of parallel structure in programs:

1. FORK to several dissimilar paths and JOIN
2. PARALLEL FOR statements
3. LOOPS rewritten as PARALLEL FOR statements

All of these structures require some kind of JOIN as a prelude to subsequent processing. Techniques for Item 1 have been described elsewhere [12, 17] and are described later for Items 2 and 3. However, there also exist two other types that should be recorded for completeness:

4. FORKS without a JOIN
5. Completely independent processes

As an example of Item 4, consider a task that updates a file and then produces several reports. After the update there is a FORK to each report routine with no need to JOIN. PL/1 has a facility to do this [49]. As an example of Item 5, consider an airlines reservation system and the individual queries and updates made upon it.

ADVANTAGES OF PARALLELISM

The identification of parallelism in programs is useful only if some advantage is gained thereby. For both uni- and multiprocessor systems there are advantages. In general we can note that, whereas program structures in which FORK statements are used do not tend to show a high order of parallelism, PARALLEL FORs and PARALLEL LOOPs do naturally show a high degree of parallelism; moreover, because they tend to occur in nested sets, they increase the parallelism multiplicatively.

Multiprocessor Advantages

The multiprocessor situation contains four interesting cases.

First, there is the simple case where the number of programs to be run is less than the number of processors. The availability of parallel paths enables the system to use the otherwise idle processors. This situation may not seem to occur frequently, but it can occur often when a system begins to run out of peripheral devices, even if the load on them is not heavy.

Second, there is the case where a mixture of high- and low-priority jobs is being run. If the disparity in processing is large enough, several processors could work on the high-priority program and reduce its turnaround time.

The third and fourth are cases where batch processing response times for large jobs can be improved if the jobs are run serially rather than in parallel. The third case is illustrated by the simple (and over-simplified) example of three jobs that each require one hour's capacity of the system. When run serially, they would be completed in one, two, and three hours, respectively, giving an average turnaround time of two hours. When run in parallel, they would have an average turnaround time of three hours. Of course, the choices are never as simple as this example supposes but, in general, serial operation is desirable (the arithmetic is checked in the sketch following these cases).

The fourth case considers the savings in internal system overheads when parallel activity on one task replaces parallel activity on different tasks. In situations where frequent overlays are made for both data and program segments, the amount of shuffling can be reduced. In particular, if several processors keep approximately in step (not necessarily exactly in step, as required in SOLOMON), they can share access to, and residence of, program segments and some data segments. Even if they do not share exact access, but are accessing a disc, cartridge, or other complex access device, there is more opportunity to batch and optimize accesses for parts of one task than for parts of diverse tasks using different areas of auxiliary storage.
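The third case's arithmetic, under its stated simplifications (three equal one-hour jobs and an idealized equal sharing of the whole system), can be checked directly. The short Python sketch below is purely illustrative.

```python
# Arithmetic check of the three-job example: each job needs one hour of the
# whole system's capacity.
jobs = 3
serial_completion = [k for k in range(1, jobs + 1)]   # run serially: finish at 1, 2, 3 hours
parallel_completion = [jobs] * jobs                    # run in parallel: all finish at hour 3

print(sum(serial_completion) / jobs)     # 2.0 hours average turnaround
print(sum(parallel_completion) / jobs)   # 3.0 hours average turnaround
```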
Single Processor Advantages

The single processor system contains two interesting cases.

First, there is the case of a mix of multiprogrammed tasks where one has an outstanding priority. Suppose it stalls because of some wait for I/O; then, if parallel paths exist, it may be able to proceed on another path instead of passing control to a lower-priority task. Such cases have been "hand-tailored" in the past, relying on a complex look-ahead type of structure or on the batching of accesses to auxiliary storage. If parallel paths were started automatically, the batching could adjust itself dynamically. A typical example would be an operating system that has two kinds of scheduler, one that allocates processors and another that batches and queues access to auxiliary storage. Then parallel paths could utilize processors while more urgent processes are awaiting input.

Second, there is the case of reducing overlays. Particularly in systems where relatively small overlays are used, the problem of fitting internal loops within one overlay is difficult and sometimes impossible. On a loop that needs two segments, the overlays needed for x executions could be reduced from 2x to 2x/n if n parallel paths were used.

Both these cases also apply in the case of multiprocessor systems, and they are illustrated in the example given in the section on Processor Scheduling.

TECHNIQUES

Basic Elements

This section is not a comprehensive treatment of all cases but a range of examples to show how simple techniques can be used to implement and control parallel activity. It will be seen that all the techniques can be handled by a simple organization of five simple functions (PREP, AND, ALSO, JOIN, and IDLE). This provides a uniform mechanism which can easily be incorporated in processor hardware if the economics justify it.

PREP is a function performed before a new set of parallel paths begins. It causes a variable called the PPC (parallel path counter) to be established. If other PPCs exist for this process, they are pushed down, which allows sets of parallel paths to be nested. PREP sets the new PPC to the value one.

AND (L) is a simple two-way fork; it requests the controller to start a parallel sequence at L and to add one to the current PPC for this process. The processor executing the AND continues to the next instruction. The controller queues the request for the AND sequence.

ALSO (L) is a simple two-way fork which is the same as AND except that no PPC counters are involved. It is equivalent to TASK in PL/1 [49] and is used for divergent paths that do not rejoin.

JOIN is a function used to terminate a set of parallel paths.
When executed, it reduces the current PPC for the process by one. If the PPC is then zero, it is popped up and processing continues. If the PPC is not zero, meaning that more paths remain to be completed, it releases the processor executing the JOIN.

IDLE is a function that ends a parallel path and releases the processor executing the IDLE.

It should be noted that by associating the PPC with a process rather than with the JOIN statement, the property of re-entrant code is retained. JOIN is used as a device similar to a close parenthesis.

Example 1. Classic FORK and JOIN

The popular form of the FORK statement is:

FORK (S1, S2, S3 ...)    (1)

At the execution of this statement the program divides into parallel paths that commence at the statements labelled Si, and join at the statement labelled J.

[Flow diagram for Example 1: A1 PREP; A2 AND (A5); A3 S2; A4 GO TO JOIN; A5 S1; A6 JOIN; A7 continue.]

Example 1 shows the implementation of FORK-JOIN for two paths. A1 establishes a new PPC counter and sets the counter to 1. A2 initiates A5 as a parallel path and increments the PPC counter to 2. A3 executes the second path, S2. A4 ends S2 with a transfer to JOIN. A5 executes the first path, S1. A6 is the join; when one path ends, it reduces the PPC to 1 and frees the processor; when the other path ends, the PPC goes to zero, the next PPC is popped up, and the processor continues to A7. The extension to three or more paths is obvious.

Example 2 shows the implementation of a PARALLEL FOR process. Boxes B1, B2, B3, and B4 are the translation of the FOR statement. B1 initializes the generation of the values of the parameter of the FOR statement and establishes the new PPC. B2 is the incrementing or stepping function. B3 is the end test and transfers to JOIN when all paths have been started. B4 starts a new parallel path. B5 is a parallel path. B6 is the JOIN.

[Example 2. A PARALLEL FOR: flow diagram with B1 INITIALIZE and PREP; B2 STEP TO NEXT VALUE; B3 IF END GO TO JOIN; B4 AND (B2); B5 DO FOR THIS VALUE; B6 JOIN; B7 continue.]

In contrast to the examples given by Opler and Anderson, this scheme is independent of the number of processors available, and there is no need to state explicitly the relationship between the FORK and the JOIN. This process allows the number of paths to be data dependent, and even to be zero. Note that the PPC shows how many parallel paths are active.
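A minimal software sketch of how these five functions might be emulated is given below. It is an assumption-laden Python illustration rather than the hardware or controller mechanism the paper proposes: the controller is a thread pool, the PPC push-down is a per-process list used as a stack, and every class, method, and variable name is invented. In this emulation the originating sequence waits on an event and then continues past the join, rather than the last-arriving processor continuing as in the paper.

```python
# Hypothetical software emulation of PREP / AND / ALSO / JOIN / IDLE (names invented).
import threading
from concurrent.futures import ThreadPoolExecutor

class ParallelProcess:
    def __init__(self, workers=4):
        self._controller = ThreadPoolExecutor(max_workers=workers)
        self._lock = threading.Lock()
        self._ppc_stack = []                    # push-down stack of [count, done_event]

    def prep(self):
        """Establish a new PPC set to one; older PPCs are pushed down (nesting)."""
        with self._lock:
            self._ppc_stack.append([1, threading.Event()])

    def and_(self, path):
        """Two-way fork: add one to the current PPC and queue `path` with the controller."""
        with self._lock:
            self._ppc_stack[-1][0] += 1
        self._controller.submit(path)

    def also(self, path):
        """Fork a divergent path that never rejoins; no PPC is involved."""
        self._controller.submit(path)

    def idle(self):
        """End a divergent path; in this emulation the worker simply returns."""
        return

    def join(self):
        """Reduce the current PPC by one.  The path that brings it to zero pops the
        PPC and sets its event; other paths simply release their processor (return)."""
        with self._lock:
            top = self._ppc_stack[-1]
            top[0] -= 1
            if top[0] == 0:
                self._ppc_stack.pop()
                top[1].set()
            return top[1]                       # the originating sequence may wait on this

# A PARALLEL FOR in the style of Example 2, summing squares of 0..9:
if __name__ == "__main__":
    proc = ParallelProcess()
    total, total_lock = 0, threading.Lock()     # a protected "add to x"

    def body(i):
        def path():
            global total
            with total_lock:
                total += i * i
            proc.join()                         # B6: each path ends at the JOIN
        return path

    proc.prep()                                 # B1
    for i in range(10):                         # B2/B3: step and end test
        proc.and_(body(i))                      # B4 starts a new parallel path (B5)
    proc.join().wait()                          # the originating sequence reaches the JOIN
    print(total)                                # 285
```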
This concept has been described and defined in ALGOL by Wirth [58]. He uses the term AND (which he attributes to van Wijngaarden, and which we borrow for consistency), but he does not show how the system controls the JOIN, nor does he provide an explicit PARALLEL FOR; the programmer builds it himself. Wirth is correct in that a programming language does not specifically require the JOIN, but the hardware does need a JOIN delimiter.

Example 3 shows how unjoined forks are implemented, which is the example discussed in Item 4 under the summary of parallel structures. Note that PREP is not used, no PPC counters are involved, and IDLE and ALSO are used instead of JOIN and AND.

[Example 3. Unjoined Forks: flow diagram in which C1 PERFORM UPDATES forks (ALSO) to paths that PRINT REPORT 1 and PRINT REPORT 2, each path ending with IDLE (boxes C1 through C9).]

Example 4 shows how the classic file maintenance, or payroll, application can be handled. The preliminary part is similar to the PARALLEL FOR, but the JOIN structure is different because we take the case where the output is required to be in the same sequence as the input. To do this, each record processed has a link set to point to the following record; this is set when the following record is established in D2. D1 opens files, sets initial links, and sets data areas. D2 gets the next record and makes links. D3 ends the loop that produces the parallel paths; D4 produces the parallel paths; and D5 is the parallel path. The JOIN occurs at D6 and uses a lockout mechanism to ensure that only one processor executes output at a time and in proper sequence (an illustrative sketch of such an ordered join follows the special points below).

[Example 4. Serial File Processing: flow diagram with boxes D1 through D11, including IDLE and UNLOCK steps.]

Where parallel processing is allowed, there are some functions that must be carried out by only one process at a time, for example, updating a record. Various hardware locks have been developed [3], and even some complex software locks [22, 35]. Wirth [58] proposed a term SHARED in ALGOL to denote procedures that are not to be used in parallel.

Special Points to Note

First, we must note that these techniques presuppose that all programs use re-entrant code.

Second, we must provide some special protected functions, such as "add to x," so that totals can be accumulated from each path, or a lock-out feature to temporarily restrict access to a section of data [2, 3, 22, 35, 43, 59].

Third, we must arrange the linkage in Example 4 so that the system can tolerate such situations as the case of an empty file, and one record being completed before the next is initiated. This is necessary because the link from one record to its successor, used to control output, may not be established if a delay occurs such that a record is output before its successor is input. One alternative is to delay output; another is to provide an interim dummy linkage.

Fourth, we must make box D2 include choosing an insertion to the file. The full logic for this box determines whether the next record to be processed is

1. one record from the master file and a matching record from the detail file.
2. one record from the master file and no matching detail record.
3. one record from the detail file and no matching master record.
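The ordered join of Example 4 can be emulated in software. The sketch below is a hypothetical Python rendering, not the paper's D-box logic: each path is tagged with its input sequence number, and a condition variable plus a "next record due out" counter stand in for the lockout mechanism and the record links; all names are invented.

```python
# Hypothetical emulation of Example 4's ordered join: records are processed in
# parallel but written out in input order, one writer at a time.
import threading
from concurrent.futures import ThreadPoolExecutor

class OrderedJoin:
    def __init__(self, writer):
        self._writer = writer                  # the protected output function
        self._cond = threading.Condition()     # lockout mechanism around output
        self._next = 0                         # sequence number of the next record due out

    def emit(self, seq, record):
        """Called at the end of a parallel path (the JOIN, box D6)."""
        with self._cond:
            while seq != self._next:           # an earlier record has not been output yet
                self._cond.wait()              # release the processor (here: the thread)
            self._writer(record)               # only one writer at a time, in sequence
            self._next += 1                    # follow the link to the next record (UNLOCK)
            self._cond.notify_all()

def process(record):
    return record.upper()                      # stands in for the per-record work (D5)

if __name__ == "__main__":
    out = []
    join = OrderedJoin(out.append)
    records = ["pay slip for adams", "pay slip for brown", "pay slip for clark"]

    with ThreadPoolExecutor() as pool:         # the parallel paths (D4/D5)
        for i, rec in enumerate(records):
            pool.submit(lambda i=i, rec=rec: join.emit(i, process(rec)))

    print(out)                                 # output preserved in input order
```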
PROCESSOR SCHEDULING

Apart from the wish to run certain problems faster in a multiprocessor, the most interesting case in scheduling is the possibility of shared access to program segments by processors and the subsequent reduction in overlay overheads. It may seem at first that this would lead to more complex dynamic storage allocation than is desirable. However, it turns out that shared access arises naturally. There are three basic advantages that this scheme enjoys:

- Processes generate parallel paths dynamically, one at a time, on an "as needed" basis, which contrasts with the schemes proposed by Opler [45] and Anderson [2]. It limits scheduling overheads in that records of parallel processes are generated only as they are encountered. For example, the number existing for an N-way PARALLEL FOR will be only one plus the number initiated, however large N may be.
- Distribution of parallel paths among varying numbers of processors is an easy ad hoc process.
- The same basic data can be used by the scheduler to exploit any of the alternative advantages of parallel processing.
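A sketch of the bookkeeping such a scheduler might keep is given below. It is a simplified Python illustration under the assumptions of the worked example that follows (each process needs one segment at a time, four overlay slots, equal priorities); the Process record, the pick_* functions, and the segment names are all invented for the sketch, and queue 'b' (segment on its way into core) is omitted for brevity.

```python
# Simplified sketch of the scheduler's bookkeeping for the worked example below.
from dataclasses import dataclass
from collections import Counter, deque
from typing import Optional

@dataclass
class Process:
    name: str                 # e.g. "A1", "A2": task letter plus fork tag
    segment: str              # the one segment it currently needs
    status: str = "c"         # a processor name, or queue 'a', 'b', or 'c'

class Scheduler:
    def __init__(self, overlay_slots=4):
        self.core = deque(maxlen=overlay_slots)   # segments currently in core
        self.queue_a = deque()                    # segment in core, waiting for a processor
        self.waiting = []                         # processes whose segment is not in core

    def enqueue(self, proc: Process):
        """Place a newly created or stalled process in the right queue."""
        if proc.segment in self.core:
            proc.status = "a"
            self.queue_a.append(proc)
        else:
            proc.status = "c"
            self.waiting.append(proc)

    def pick_next_for_processor(self) -> Optional[Process]:
        """Rules (i) and (iii): take from queue 'a', oldest first."""
        return self.queue_a.popleft() if self.queue_a else None

    def pick_segment_to_fetch(self) -> Optional[str]:
        """Rule (ii): fetch the segment wanted by the most waiting processes."""
        demand = Counter(p.segment for p in self.waiting)
        return demand.most_common(1)[0][0] if demand else None

if __name__ == "__main__":
    sched = Scheduler()
    sched.core.extend(["S1", "T1", "U1", "V1"])
    for p in [Process("A1", "S2"), Process("A2", "S2"), Process("E1", "W1")]:
        sched.enqueue(p)
    print(sched.pick_segment_to_fetch())          # "S2": wanted by two waiting processes
```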
The following examples illustrate the last two points. Consider a system with three processors, I, II, and III, and six tasks in the system. Each task has one or more processes live at a time. Each process is either active in a processor or is in one of three queues. A process in queue 'a' has the segments it needs in core, but no processor. A process in queue 'c' does not have all its segments in core. A process in queue 'b' is awaiting termination of some peripheral I/O, in particular, perhaps the arrival of its needed segment in core.

The state of the system can be represented by Table I. The tasks are labelled A1 through E1. When they fork into separate processes, numeric tags are used. For simplicity we assume that any process needs only one segment at a time and that there is only room for four overlay slots in executable storage (i.e., storage from which processors select and execute instructions). It is obviously desirable to have more overlay slots than processors, to build up work for the processors. The current segment needed by a process is shown in line 2. The status of each process is shown as either the identity of the processor executing it or the queue 'a', 'b', or 'c' where it is located. In the case of equal priority processes we assert that the scheduler (i) prefers to initiate processes in queue 'a', (ii) prefers to get a segment wanted by most members of 'b', and (iii) gives chronological preference.

Table I shows how the system might react with a set of equal priority processes and how a set of parallel paths builds up demand for its segments. Table II shows how the system reacts for priority classes and uses low priorities to fill the gaps.

[Table I. An example without priorities. Each step shows a STATUS (the segment needed by each process and whether it holds processor I, II, or III or sits in queue 'a', 'b', or 'c') and a RESULT. Initially five processes exist (A1 through E1), four have segments in core (A1 through D1), and three are active (A1 through C1); successive rows show A1 forking further processes, segments being requested, and overlays being chosen for the segments most in demand.]

[Table II. An example with priorities. The same style of walkthrough from the same initial state, in which the high-priority processes A1, A2, and A3 preempt processors and segment requests while lower-priority processes fill the gaps.]

ASSETS OF THIS SCHEME

There are nine significant assets of this way of considering parallel processing:

1. It indicates profitable sources of parallelism in conventional program structures.
2. It enables some parallelism in FOR statements to be shown in existing programs with trivial changes.
3. It provides an easy, and hopefully popular, way to express parallelisms in all conventional programming languages.
4. It provides a simple mechanism to control parallel activities which could also be incorporated in hardware.
5. It requires only a small expansion to the bookkeeping required in operating systems.
6. It is applicable to all types of computer structures and applications.
7. It permits the dynamic changing of the number of processors available.
8. It permits the exploitation of re-entrant code by individual programs.
9. It offers opportunities to reduce heavy page turning overheads.

REFERENCES-BIBLIOGRAPHY

1. G. M. Amdahl, "New Concepts in Computing System Design," Proc. IRE, vol. 50 (May 1962), pp. 1073-1077.
2. J. P. Anderson, "Program Structures for Parallel Processing," CACM, vol. 8 (December 1965), pp. 786-788.
3. J. P. Anderson et al., "D825 - A Multiple-Computer System for Command and Control," AFIPS, volume 22, Proc. FJCC, Spartan Books, Washington, D.C., 1962, pp. 86-96.
4. J. R. Ball et al., "On the Use of the SOLOMON Parallel-Processing Computer," AFIPS, volume 22, Proc. FJCC, Spartan Books, Washington, D.C., 1962, p. 137.
5. G. P. Bergin, "Method of Control for Re-entrant Routines," AFIPS, volume 26, Proc. FJCC, Spartan Books, Washington, D.C., 1964, pp. 45-55.
6. S. Boilen et al., "A Time-Sharing Debugging System for a Small Computer," AFIPS, volume 23, Proc. SJCC, Spartan Books, Washington, D.C., 1963, p. 51.
7. F. P. Brooks, Jr., "The Future of Computer Architecture," Proc. IFIP, volume 1, 1965, p. 87.
8. E. Bryan, "The Dynamic Characteristics of Computer Programs," Rand Corporation, 1964 (paper to SHARE XXII).
9. W. Buchholz (ed.), Planning a Computer System, McGraw-Hill, New York, 1962.
10. A. W. Burks, The Logic of Fixed and Growing Automata, University of Michigan, Eng. Res. Inst., Ann Arbor, 1957.
11. Chang and D. J. Wong, "Analysis of Real-Time Programming," Journal ACM, vol. 12 (1965), p. 581.
12. T. E. Cheatham and G. F. Leonard, "An Introduction to the CL-II Programming System," Computer Associates 63-7-SD, November 1963.
13. E. F. Codd, "Multiprogram Scheduling," CACM (June & July 1963), pp. 347 and 413.
14. E. F. Codd, "Multiprogramming STRETCH: Feasibility Considerations," CACM, vol. 2 (November 1959), pp. 13-17.
15. E. G. Coffman, Jr., et al., "A General-Purpose Time-Sharing System," AFIPS, volume 25, Proc. SJCC, Spartan Books, Washington, D.C., 1964.
16. M. Phyllis Cole et al., "Operational Software in a Disc Oriented System," AFIPS, volume 26, Proc. FJCC, Spartan Books, Washington, D.C., 1964, p. 351.
17. M. E. Conway, "A Multiprocessor System Design," AFIPS, volume 24, Proc. FJCC, Spartan Books, Washington, D.C., 1963, pp. 139-146.
18. M. E. Conway, "Design of a Separable Transition-Diagram Compiler," CACM, vol. 6 (July 1963), pp. 396-408.
19. F. J. Corbató et al., "An Experimental Time-Sharing System," Proc. SJCC, vol. 21 (1962), pp. 335-344.
20. J. B. Dennis, "Segmentation and the Design of Multiprogrammed Computer Systems," Journal ACM, vol. 12 (Oct. 1965), p. 589.
21. E. W. Dijkstra, "Solution of a Problem in Concurrent Programming Control," CACM, vol. 8 (Sept. 1965), p. 569.
22. Letter: "A Problem in Concurrent Programming Control," CACM, vol. 8 (Sept. 1965), p. 569.
23. P. Dreyfus, "System Design of the Gamma 60," Proc. WJCC, 1958, p. 130.
24. S. Fernbach, "Computers in the USA - Today and Tomorrow," Proc. IFIP, vol. 1, 1965, p. 77.
25. D. R. Fitzwater and E. J. Schweppe, "Consequent Procedures in Conventional Computers," AFIPS, volume 26, Proc. FJCC, Spartan Books, Washington, D.C., 1964, pp. 465-476.
26. S. Gill, "Parallel Programming," Computer Journal, vol. 1 (1958), p. 2.
27. A. C. D. Haley, "The KDF.9 Computer System," AFIPS, volume 22, Proc. FJCC, Spartan Books, Washington, D.C., 1962, p. 108.
28. J. H. Holland, "The Universal Computer Capable of Executing an Arbitrary Number of Sub-Programs Simultaneously," Proc. EJCC, 1959, pp. 108-113.
29. J. H. Holland, "Iterative Circuit Computers," Proc. WJCC, 1960, p. 259.
30. D. J. Howarth et al., "The Manchester University Atlas Operating System, Part II: User's Description," Computer Journal, vol. 4 (1961), p. 226.
31. D. J. Howarth, "Experience with the Atlas Scheduling System," AFIPS, volume 23, Proc. SJCC, Spartan Books, Washington, D.C., 1963, p. 59.
32. E. T. Irons, "A Rapid Turnaround Multi-Programming System," CACM, vol. 8 (March 1965), pp. 152-157.
33. T. Kilburn et al., "The Manchester University Atlas Operating System, Part I: Internal Organization," Computer Journal, vol. 4 (1961), p. 222.
34. T. Kilburn et al., "The Atlas Supervisor," Proc. EJCC, vol. 20, 1961, pp. 279-294.
35. D. E. Knuth, Letter: "Additional Comments on a Problem in Concurrent Programming Control," CACM, vol. 9 (May 1966), p. 321.
36. N. Landis et al., "Initial Experience with an Operating Multi-Programming System," CACM, vol. 5 (May 1962), pp. 282-286.
37. A. L. Leiner et al., "Concurrently Operating Computer Systems," ICIP, 1959, B.12.17, pp. 353-361.
38. A. L. Leiner et al., "Organizing a Network of Computers to Meet Deadlines," Proc. EJCC, 1957, pp. 115-128.
39. G. F. Leonard and J. R. Goodroe, "An Environment for an Operating System," ACM Proc. 1964, E.2.3-1.
40. F. M. Marcotty et al., "Time-Sharing on the Ferranti-Packard FP6000 Computer System," AFIPS, volume 23, Proc. SJCC, Spartan Books, Washington, D.C., 1963, p. 29.
41. J. L. McKenney, "Simultaneous Processing of Jobs on an Electronic Computer," Management Sci., vol. 8 (April 1962), pp. 344-354.
42. M. R. Mills, "Operational Experience of Time Sharing and Parallel Processing," Computer Journal, vol. 6 (1963), p. 28.
43. M. R. Nekora, "Comment on a Paper on Parallel Processing," CACM, vol. 4 (Feb. 1961), p. 103.
44. J. Nievergelt, "Parallel Methods for Integrating Ordinary Differential Equations," CACM, vol. 7 (Dec. 1964), pp. 731-733.
45. A. Opler, "Procedure-Oriented Language Statements to Facilitate Parallel Processing," CACM, vol. 8 (May 1965), pp. 306-307.
Opler, "PrDcedure Oriented Language Statements to Facilitate Parallel Processing," CACM, vol. 8, (May 1965) pp. 306-7. 46. G. E. Pickering, et aI, "Multi-Computer Programming for a Large Scale Real-Time Data Processing System," AFIPS, volume 25, Proc. SJCC, Spartan Books, WashingtDn, D.C., 1964, p. 445. 47. R. E. Porter, (Title unknown) Parallel Programming Conference, May 1960. 48. Jeanne Poyen, Private discussion with Opler, 1959. 49. G. Radin and H. P. Rogoway, "NPL: Highlights of a New Programming Language," CACM, vol. 8, (Jan. 1965) pp. 9-17. 50. M. R. Rosenberg, "Computer-Usage Accounting for Generalized Time-Sharing Systems," CACM, vol. 7, (May 1964) pp. 304-8. 51. W. F. Schmitt and A. B. Tonik, "Sympathetically Programmed Computers," ICIP, 1959, pp. 344-348. 52. E. S. Schwartz, "An Automatic Sequencing Procedure with Application to Parallel Programming," JACM, vol. 8, (April 1961) p. 513. 53. A. B. Shafritz, et aI, "Multi-Level Programming for a Real-Time System," Proc. EJCC, vol. 20, 1961, pp. 1-16. 54. D. L. Slotnick, et aI, "The SolDmon Computer," AFIPS, volume 22, Proc. FJCC, Spartan Books, WashingtDn, D.C., 1962, p. 97. 55. J. S. Squire and Sandra M. Palaiss, "Programming and Design CDnsideration Df a Highly Parallel CDmputer," AFIPS, volume 23, Proc. SJCC, Spartan Books, Washington, D.C., 1963, p. 395. 56. E. Strachey, "Time-Sharing in Large Fast Computers," ICIP, 1959, p. 336 and Computers & Automation, (Aug. 1959). 57. R. N. Thompson and J. A. Wilkinson, "The D825 Automatic Operating and Scheduling System," AFIPS, volume 23, Proc. SJCC, Spartan BODks, Washington, D.C., 1963, p. 41. 58. N. Wirth, Letter: "A NDte Dn Program Structures fDr Parallel Processing," CACM, vol. 9, (May 1966) p. 320. 59. L. D. Yarborough, "Some Thoughts Dn Parallel Processing," CACM, vol. 3, (Oct. 1960) p. 539. From the collection of the Computer History Museum (www.computerhistory.org)