A Selection Theory and Methodology for Heterogeneous Supercomputing

Song Chen, Mary M. Eshaghian
Dept. of Computer & Information Science
New Jersey Institute of Technology
Newark, NJ 07102

Ashfaq Khokhar, Muhammad E. Shaaban
Dept. of Electrical Engineering
University of Southern California
Los Angeles, CA 90089

Proceedings of the Heterogeneous Processing Workshop, April 1993

Abstract

In this paper, a methodology for mapping algorithms onto a heterogeneous suite of supercomputers is presented. An approach for selecting an optimal suite of computers for solving problems with diverse computational requirements, called Heterogeneous Optimal Selection Theory (HOST), is presented. HOST is an extension to the Augmented Optimal Selection Theory in two ways: it incorporates the heterogeneous parallelism embedded in the tasks, and it reflects the costs associated with using various fine grain mapping strategies at the individual machine level. The proposed mapping methodology is based on the Cluster-M programming paradigm.
For the mapping purpose, the input format assumed in HOST is modeled in terms of a Hierarchical Cluster-M specification and representation. For a given problem, a Hierarchical Cluster-M specification is generated to indicate the execution of concurrent tasks at different stages of the computation. This specification is then mapped onto the Hierarchical Cluster-M representation of the underlying heterogeneous suite of supercomputers.

1 Introduction

The performance of a parallel algorithm depends strongly on the architecture onto which it is mapped. Moreover, many algorithms have diverse computational requirements which may not be well met by any single architecture. Heterogeneous SuperComputing (HSC) [6], also referred to as Superconcurrency [5], deals with the concurrent use of a heterogeneous suite of parallel machines in solving a given problem. This computational model has recently been studied by several researchers [1, 3, 4, 7, 8, 9, 10]. Various issues involved in the design of tools and paradigms to support HSC are outlined in a survey paper by Khokhar et al. [6]. The design of a methodology for mapping algorithms in an HSC environment is the main topic of this paper.

The Optimal Selection Theory is a mathematical programming formulation for choosing the most appropriate suite of heterogeneous parallel and vector supercomputers for a given task [4]. This technique is based on code profiling [4] and analytical benchmarking [4]. Code profiling is a code-specific function that determines the code types present in a given task. Analytical benchmarking provides a measure of how well the available machines perform on a given code type. The goal of the Optimal Selection Theory is to minimize the total execution time spent on all the code segments of a task, subject to a fixed constraint such as cost. This theory was recently augmented [10] by incorporating the performance of code segments on non-optimal choices. In this paper, we introduce the Heterogeneous Optimal Selection Theory (HOST), which incorporates the heterogeneous parallelism embedded in tasks and reflects the costs associated with the different fine grain mapping techniques available for a given architecture. The modeling requirements of the input to HOST, as well as the system modeling requirements, are discussed. Hierarchical Cluster-M (HCM), a programming paradigm based on Cluster-M [3], is introduced; HCM meets the modeling requirements of HSC. A given task is modeled in HCM as a task specification that takes into account the decomposition of the task into subtasks and code segments, the heterogeneous parallelism embedded in the task, and the interdependencies present among the different components of the task. This specification is independent of any architecture, which simplifies the mapping process. Utilizing HCM as a modeling platform, efficient strategies for mapping tasks onto the underlying heterogeneous suite of computers are presented.

The rest of the paper is organized as follows. In section 2, HOST is presented. The Hierarchical Cluster-M modeling of the input to HOST is discussed in section 3. In the following section, a mapping methodology based on the HCM paradigm is proposed. Concluding remarks are included in section 5.
2 Heterogeneous Optimal Selection Theory (HOST)

Freund in [4] proposed an Optimal Selection Theory (OST) to choose an optimal configuration of machines for executing an application task on a heterogeneous suite of computers, under the assumption that the number of available machines is unlimited. An application task is assumed to comprise S non-overlapping code segments. Each segment has homogeneous parallelism embedded in its computations. Also, the code segments are considered to be executed serially. A code segment is decomposable if it can be partitioned into different code blocks (see Figure 1). All code blocks within a code segment have the same type of embedded parallelism and can be executed concurrently. The goal of OST is to assign the code blocks within each homogeneous code segment to the available matching machine types such that the segment can be executed optimally. The execution time of a decomposable code segment is equal to the longest execution time among all the code blocks of that code segment. A machine type is identified based on the underlying machine architecture, for example, SIMD, MIMD, scalar, vector, etc. Similarly, each machine type can have more than one model, e.g., Ncube and Mesh for SIMD. Machine choices in OST are always optimal, and the decomposition of code segments is uniform. According to OST, there exists an assignment such that the total time spent on all code segments is minimized, subject to a fixed constraint such as cost. The assumption here is that there is always a sufficient number of machines of each type available to which a code block can be assigned. Therefore, a code segment can always be assigned to its matching machine type.

Figure 1: Input format for OST and AOST

OST was augmented by Wang et al. [10] to incorporate the performance of code segments on non-optimal machine choices, assuming that the number of available machines of each type is limited. Under this assumption, a code segment which is most suitable for one type of machine may have to be assigned to another type. For example, consider a code segment consisting of 5 code blocks suitable for execution on MIMD machines. If there are 2 MIMD machines and 6 SIMD machines available, it is impossible to assign all the blocks to MIMD machines. Assigning all 5 blocks to 5 SIMD machines may result in the shortest execution time, or decomposing the segment into 2 blocks and assigning them to the 2 MIMD machines may give better performance. In the Augmented Optimal Selection Theory (AOST), non-optimal machine choices and non-uniform decompositions of code segments are incorporated.

In the formulation of OST and AOST, it has been assumed that the execution of all code segments of a given task is totally ordered in time. However, there may exist different execution interdependencies among a set of code segments. Also, parallelism may be present between code segments, resulting in the concurrent execution of several code blocks of different code segments on a suite of heterogeneous machines. Furthermore, the effect of the mapping techniques available on individual machines for different problems has not been considered in the formulation of the selection theory.

Consider the following example. Given a code segment consisting of two SIMD code blocks: one code block sorts N elements and the other adds N elements. Assume there are two SIMD machines available, an N-processor hypercube and an N-processor Mesh Connected Computer (MCC). An efficient assignment of code blocks to the two SIMD machines depends on the mapping techniques used on the machines for the problem under consideration. The assignment problem becomes more interesting if the machines are of different sizes in terms of the number of processors and/or the memory available with each processor.
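To make the role of the mapping technique concrete, the sketch below enumerates the two possible one-to-one assignments of the sort and add blocks to the hypercube and the MCC. The execution-time formulas and constants are hypothetical stand-ins for analytical benchmarking data, chosen only to illustrate how the best assignment depends on how well each block maps onto each machine; they are not figures from the paper.

```python
# Illustrative sketch only: the timing models below are hypothetical stand-ins
# for analytical benchmarking results, not measurements from the paper.
import math
from itertools import permutations

N = 1024
# Assumed per-block execution-time models (arbitrary units) for each
# (code block, machine) pair, e.g. sorting is often modeled as O(log^2 N)
# routing steps on a hypercube versus O(sqrt(N)) steps on a mesh.
est_time = {
    ("sort", "hypercube"): math.log2(N) ** 2,
    ("sort", "mesh"):      2 * math.sqrt(N),
    ("add",  "hypercube"): math.log2(N),
    ("add",  "mesh"):      math.sqrt(N),
}

blocks = ["sort", "add"]
machines = ["hypercube", "mesh"]

best = None
for order in permutations(machines):
    assignment = dict(zip(blocks, order))
    # The two blocks run concurrently, so the segment finishes with its slowest block.
    finish = max(est_time[(b, m)] for b, m in assignment.items())
    if best is None or finish < best[0]:
        best = (finish, assignment)

print("best assignment:", best[1], "estimated segment time:", round(best[0], 1))
```

Under these assumed figures, placing the sort on the mesh and the add on the hypercube finishes sooner than the reverse assignment, even though neither machine is uniformly faster for both blocks.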
HOST, as presented in this section, is an extension to AOST in two ways: it incorporates the effects of the various fine grain mapping techniques available on individual machines, and the task is assumed to have heterogeneous embedded parallelism. The input format is relaxed to allow concurrent execution of mutually independent code segments. An application task is divided into subtasks. Subtasks are executed serially. Each subtask may contain a collection of code segments which can be executed in parallel. A code segment consists of homogeneous parallel instructions. Each code segment is further decomposed into several code blocks which can be executed concurrently. These code blocks are to be assigned to machines of the same type. A general input format is illustrated in Figure 2. Figure 1 is a special case of Figure 2, where each subtask contains only one code segment. In HOST, heterogeneous code blocks of different code segments can be executed concurrently on different types of machines, exploiting the heterogeneous parallel computations embedded in the application.

Let S be the number of code segments of the given task, and M be the number of different machine types to be considered. Each machine type t may have several models; let α[t] be the number of mappings available on machine type t, and β[t, l] be the number of available machines of model l of type t. Assume ν[t, j] is the maximum number of code blocks code segment j can be decomposed into. Define γ[t, j] to be the number of machines of type t that are actually used to execute code segment j. Therefore, γ[t, j] equals the minimum of ν[t, j] and the number of machines of type t available, i.e.,

γ[t, j] = min( Σ_l β[t, l], ν[t, j] ).

A parameter m[t, k] specifies the mapping technique used for code block k on machine type t, with 1 ≤ m[t, k] ≤ α[t]. Assume further that, for a particular mapping m on machine type t, the best matched code segment can obtain the optimal speedup θ[t, m] in comparison to a baseline system. A real number π[t, j] indicates how well code segment j can be matched with machine type t, and λ[t, k] is a utilization factor when running code block k on a machine of type t; we have 0 ≤ π[t, j] ≤ 1 and 0 ≤ λ[t, k] ≤ 1. Let ρ[j] be the percentage of time spent executing code segment j within the overall execution of the given subtask on the baseline machine, with Σ_{j=1..S} ρ[j] = 1. Similarly, let ρ[j, k] be the percentage of time spent executing code block k within the overall execution of code segment j on the baseline machine, with Σ_k ρ[j, k] = 1.

Suppose code segment j is assigned to machine type t. For each code block k within code segment j, there is a mapping m[t, k]. Let µ[t, j] be the mapping vector for code segment j on machine type t:

µ[t, j] = ( m[t, 1], m[t, 2], ..., m[t, γ[t, j]] ), where 1 ≤ m[t, k] ≤ α[t].

With this mapping vector µ on machine type t, the execution time of segment j is

δ[t, j, µ] = max_{1 ≤ k ≤ γ[t, j]} ( ρ[j] · ρ[j, k] ) / ( θ[t, m[t, k]] · π[t, j] · λ[t, k] ).

Therefore, different mappings µ available on machine type t result in different execution times for segment j. Let Λ[t, j] be the minimum execution time of segment j among all the possible mappings on type t:

Λ[t, j] = min_{µ[t, j]} δ[t, j, µ[t, j]].

Let the machine type selection vector τ indicate the selection of machine types for code segments 1 to S, such that τ = ( t[1], t[2], ..., t[S] ). Define χ[τ] to be the execution time of the given subtask with the heterogeneous machine type selection τ applied to all the code segments, such that χ[τ] = max_{1 ≤ j ≤ S} Λ[t[j], j]. HOST is then formulated as follows: for any subtask, there exists a τ achieving

min_τ χ[τ]   subject to   Σ_{t=1..M} ( max_{1 ≤ j ≤ S} γ[t, j] ) · c[t] ≤ C,

where c[t] is the cost of a machine of type t and C is the overall cost constraint.

Based on this formulation, it is evident that, given a decomposition of the task as shown in Figure 2 to be executed on a desired heterogeneous suite of machines, an optimal execution time is achievable.
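The following is a minimal sketch of this selection step for a single subtask, assuming that the minimum execution times Λ[t, j], the machine counts γ[t, j], and the per-machine costs c[t] have already been obtained through code profiling and analytical benchmarking. The machine types, the numbers, and the brute-force search over type selections are illustrative assumptions, not an algorithm prescribed by the formulation above.

```python
# Minimal sketch of the HOST selection step for one subtask.  Lambda[t][j] is the
# best execution time of segment j on machine type t (over all mappings),
# gamma[t][j] is the number of machines of type t used for segment j, and
# cost[t] is the per-machine cost.  All names and figures are illustrative.
from itertools import product

machine_types = ["SIMD", "MIMD", "vector"]
segments = [0, 1, 2]                      # three concurrent code segments

Lambda = {                                # Lambda[t][j]
    "SIMD":   [4.0, 9.0, 6.0],
    "MIMD":   [7.0, 3.0, 5.0],
    "vector": [6.0, 8.0, 2.0],
}
gamma = {                                 # gamma[t][j]
    "SIMD":   [4, 4, 2],
    "MIMD":   [2, 2, 2],
    "vector": [1, 1, 1],
}
cost = {"SIMD": 1.0, "MIMD": 3.0, "vector": 5.0}
COST_LIMIT = 20.0

best = None
for tau in product(machine_types, repeat=len(segments)):   # tau = (t[1], ..., t[S])
    # Cost term: for each type, enough machines for its most demanding assigned
    # segment (a simplified reading of the cost constraint above).
    used = {t: max((gamma[t][j] for j in segments if tau[j] == t), default=0)
            for t in machine_types}
    total_cost = sum(used[t] * cost[t] for t in machine_types)
    if total_cost > COST_LIMIT:
        continue
    # Segments run concurrently, so the subtask finishes with its slowest segment.
    chi = max(Lambda[tau[j]][j] for j in segments)
    if best is None or chi < best[0]:
        best = (chi, tau, total_cost)

print("selection:", best[1], "subtask time:", best[0], "cost:", best[2])
```

The exhaustive search is exponential in the number of segments and is used here only to make the definitions of χ[τ] and the cost constraint concrete; it is not intended as a practical selection procedure.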
In the following section, we present a paradigm suitable for modeling both the input format of the task and the underlying heterogeneous suite of machines. This modeling is then utilized to develop a mapping methodology.

3 Modeling the Input to HOST

HOST, as described in the section above, is an existence proof for an optimal selection of processors for a given task in HSC. In this section, we present a tool for modeling the input to HOST. This modeling paradigm is used in Section 4 as part of the mapping methodology.

3.1 Modeling requirements

The input formulation in HOST assumes that a parallel task T is divided into subtasks t_i, 1 ≤ i ≤ N. Each subtask t_i is further divided into code segments t_ij, 1 ≤ j ≤ S, which can be executed concurrently. Each code segment within a subtask can belong to a different type of parallelism (i.e., SIMD, MIMD, vector, etc.), and thus should ideally be mapped onto a machine with a matching type of parallelism. Each code segment may further be decomposed into several concurrent code blocks with the same type of parallelism. These code blocks t_ijk, 1 ≤ k ≤ B, are suited for parallel execution on machines having the same type of parallelism. This decomposition of the task into subtasks, code segments, and code blocks is shown in Figure 2.

Figure 2: Input format for HOST

A good model of this input format is needed to facilitate the mapping of tasks onto a heterogeneous architecture. In addition to modeling the input format, the architecture being considered for the execution of the task should also be modeled. Several requirements for this model are identified as follows:

- The modeling of the input format should handle the decomposition of the task into subtasks, code segments, and code blocks, while preserving the information regarding the type of parallelism present in each portion of the task. This is essential to match the type of each code block with a suitable machine type in the system.
- The model should handle parallelism at both fine grain and coarse grain levels.
- The modeling of the input code should emphasize the communication requirements of the various code segments.
- The modeling of the input code should be independent of the underlying architecture.
- The modeling of the system should provide the mode of computation of each machine in the system.
- The interconnection topology of the individual architectures should be systematically represented in the model at both the system and machine levels.

The Cluster-M parallel programming paradigm introduced in [3] meets most of the above requirements. Cluster-M models a parallel task as a problem specification, independent of the underlying architecture. However, Cluster-M has no provision to model the heterogeneity present in the task. In the following section, we extend the Cluster-M model to accommodate the requirements of heterogeneous supercomputing. This extended model is called Hierarchical Cluster-M.
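One possible in-memory encoding of this input format is sketched below, preserving the type of parallelism at each level of the decomposition. The class and field names are illustrative assumptions, not constructs defined by the paper.

```python
# Sketch of one possible encoding of the HOST input format
# (task -> subtasks -> code segments -> code blocks).
from dataclasses import dataclass, field
from typing import List

@dataclass
class CodeBlock:
    name: str
    parallelism: str          # e.g. "SIMD", "MIMD", "vector"

@dataclass
class CodeSegment:
    name: str
    parallelism: str          # all blocks in a segment share this type
    blocks: List[CodeBlock] = field(default_factory=list)

@dataclass
class Subtask:
    name: str
    segments: List[CodeSegment] = field(default_factory=list)   # may run concurrently

@dataclass
class Task:
    subtasks: List[Subtask] = field(default_factory=list)       # executed serially

# Example: one subtask with two concurrent segments of different types.
t1 = Subtask("t1", [
    CodeSegment("t11", "SIMD", [CodeBlock("t111", "SIMD"), CodeBlock("t112", "SIMD")]),
    CodeSegment("t12", "MIMD", [CodeBlock("t121", "MIMD")]),
])
task = Task([t1])
print([s.parallelism for s in task.subtasks[0].segments])   # ['SIMD', 'MIMD']
```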
3.2 Hierarchical Cluster-M Model

The two main components of the Cluster-M model are the Cluster-M representation of an architecture and the Cluster-M specification of a parallel task. Cluster-M exploits fine grain parallelism at the individual instruction and processor levels. The problem specification is carried out without considering the underlying architecture, which makes this model valuable for modeling computations in a heterogeneous supercomputing environment comprising several types of architectures.

To exploit multi-level parallelism in a task, the Hierarchical Cluster-M model is proposed as a more restricted form of the Cluster-M model. Hierarchical Cluster-M (HCM) exploits parallelism at the subtask, code segment, code block, and instruction levels. This is accomplished by modifying both the Cluster-M system representation and problem specification processes. The modification to the system representation takes into account the presence of several interconnected machines in the system, providing a spectrum of computational modes. The problem specification takes into account the type of parallelism (SIMD, MIMD, vector, etc.) present in each portion of the task.

3.2.1 Hierarchical Cluster-M system representation

The Hierarchical Cluster-M representation of a system consists of two layers of clustering: a system layer and a machine layer. System layer clustering consists of several levels of nested clusters. At the lowest level of clustering, each machine in the system is assigned a cluster by itself. Completely connected clusters are merged to form the next level of clustering. This process is continued until no more merging is possible. Machine layer clustering is obtained in a similar way, with individual processors replacing system machines in the clustering process. For a heterogeneous suite of interconnected computers, the HCM system representation is obtained as follows:

1. The HCM system representation algorithm is first applied to the system as a whole. At the first level of clustering, each computer in the system is in a cluster by itself. Each clustering level is constructed by merging clusters from the lower level that are completely connected. This is continued until no more clustering is possible. The resulting clustering levels are called system level clusters.

2. Each resulting cluster is labeled according to the type of parallelism present in the cluster (i.e., SIMD, MIMD, vector, etc.).

3. For each computer in the system, apply the Cluster-M system representation algorithm. At the lowest level, each processor in the computer is in a cluster by itself. All completely connected clusters are merged to form the next level of clustering. The highest level of clustering consists of one cluster containing all processors in the computer. This results in the Cluster-M representation of each individual computer in the system. Note that the collection of machine clusters at the highest level is equivalent to the lowest system clustering level.

A heterogeneous parallel computing system is shown in Figure 3, while its HCM representation is shown in Figure 4.

Figure 3: A heterogeneous parallel computing system.

Figure 4: Hierarchical Cluster-M representation of a heterogeneous computing system.
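The level-building rule described above can be sketched as follows for a system given as an undirected connectivity graph over machines. The greedy pairwise merge is one simple reading of "completely connected clusters are merged"; the machine names and links are invented for illustration.

```python
# Sketch of the clustering-level construction: start with one cluster per
# machine and repeatedly merge clusters that are completely connected.
from itertools import combinations

# Undirected connectivity of a hypothetical suite of five machines.
edges = {("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E")}

def connected(m1, m2):
    return (m1, m2) in edges or (m2, m1) in edges

def fully_connected(c1, c2):
    # Two clusters may merge only if every machine in one sees every machine in the other.
    return all(connected(a, b) for a in c1 for b in c2)

levels = [[frozenset([m]) for m in "ABCDE"]]       # lowest level: one cluster per machine
while True:
    current = levels[-1]
    merged, used = [], set()
    for c1, c2 in combinations(current, 2):        # greedy pairwise merging
        if c1 in used or c2 in used:
            continue
        if fully_connected(c1, c2):
            merged.append(c1 | c2)
            used.update([c1, c2])
    nxt = merged + [c for c in current if c not in used]
    if len(nxt) == len(current):                   # no merge possible: stop
        break
    levels.append(nxt)

for i, lvl in enumerate(levels):
    print("level", i, [sorted(c) for c in lvl])
```

The machine-layer clustering of each individual computer would be obtained by running the same procedure over that computer's processor interconnection graph.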
3.2.2 Hierarchical Cluster-M problem specification

The Cluster-M specification of a parallel task is a program that specifies the communication and computation requirements of the task. The Cluster-M specification consists of several levels of clusters, with the input level being the lowest and the final result level being the highest. At the lowest level, each cluster contains one computational operand. All initial clusters involved in a computation are merged into one cluster in the next clustering level. Clusters in intermediate levels are merged, split, and/or their elements manipulated according to the computation and communication requirements. Several essential Cluster-M constructs needed to formulate the Cluster-M specification of a task are discussed in [3]. The Cluster-M specification represents the communication needs of the problem at the instruction level and has no provision to identify parallelism at higher levels (i.e., the subtask and code segment/block levels). This specification can be written for any parallel problem regardless of the communication or computation types present.

The HCM task specification is obtained using Cluster-M constructs. We assume here that the input is a task T in a form similar to the input to HOST, i.e., the following has been done:

- The task is divided into sequential subtasks t_i, 1 ≤ i ≤ N.
- Each subtask is divided into several concurrent code segments t_ij, 1 ≤ j ≤ S. The type of each code segment has been identified.
- Each code segment is further decomposed into several concurrent homogeneous code blocks t_ijk, 1 ≤ k ≤ B.

The Hierarchical Cluster-M specification of task T has several layers of clustering: the subtask, code segment, code block, and instruction clustering layers. The HCM specification of task T is computed as follows:

1. Subtask clustering layer: each subtask t_i is represented by single-cluster level i, with subtask t_1 forming the lowest such level.

2. Code segment clustering layer: for each subtask clustering level i, there is a number of clusters at the code segment clustering layer. Each such cluster contains a code segment t_ij of subtask t_i and is labeled with the parallelism type of its corresponding code segment. Code segment clusters in the same subtask clustering level i are connected if results from those clusters are used by a single cluster of subtask clustering level i + 1.

3. Code block clustering layer: each code segment cluster j in subtask clustering level i contains several clusters at the code block clustering layer. Each cluster in this layer corresponds to a code block t_ijk and is labeled with the type of parallelism present in the block.

4. Instruction clustering layer: for each cluster of the code block clustering levels, find its Cluster-M problem specification. This step yields the lowest layer of HCM clustering, namely the instruction-level clustering layer.

Note that if the input is comprised of only one subtask containing one code segment with one code block, then the resulting HCM specification is identical to the more general Cluster-M specification. This is due to the fact that in such a case no code type restrictions are imposed. The HCM specification corresponding to the HOST input of Figure 2 is shown in Figure 5.

Figure 5: Hierarchical Cluster-M specification of the input to HOST

From the above steps, each layer in the HCM specification corresponds to a decomposition level in the input to HOST, as follows:

- Each subtask t_i in the input to HOST is represented in the HCM specification by level i of the subtask clustering layer, containing one cluster corresponding to subtask i.
- Each code segment t_ij in the input to HOST is represented by a cluster in code segment clustering level j in subtask clustering level i.
- Each code block t_ijk in the input to HOST is represented by a cluster in code block clustering level k in code segment clustering level j and subtask level i.

Thus a one-to-one correspondence exists between the decomposition levels of the input code to HOST and its corresponding HCM specification. The HOST input format assumes that subtask t_i cannot start unless all the code of subtask t_{i-1} is completed. The HCM specification makes no such assumption. All possible inputs to HOST therefore form a subset of the inputs that can be represented by HCM.
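The one-to-one correspondence described above can be sketched as follows. The nested-dictionary task encoding and the layer names are illustrative, and the instruction clustering layer, which would come from the Cluster-M specification of each block, is omitted.

```python
# Sketch of the correspondence: each subtask, code segment, and code block of
# the HOST input yields one labeled cluster in the matching HCM clustering layer.
task = [  # list of subtasks, executed serially
    {"name": "t1", "segments": [
        {"name": "t11", "type": "SIMD", "blocks": ["t111", "t112"]},
        {"name": "t12", "type": "MIMD", "blocks": ["t121"]},
    ]},
    {"name": "t2", "segments": [
        {"name": "t21", "type": "vector", "blocks": ["t211"]},
    ]},
]

spec = {"subtask_layer": [], "segment_layer": [], "block_layer": []}
for i, subtask in enumerate(task, start=1):
    # Subtask clustering layer: one single-cluster level per subtask.
    spec["subtask_layer"].append({"level": i, "cluster": subtask["name"]})
    for seg in subtask["segments"]:
        # Code segment clustering layer: one cluster per segment, labeled
        # with the segment's type of parallelism.
        spec["segment_layer"].append(
            {"level": i, "cluster": seg["name"], "label": seg["type"]})
        for blk in seg["blocks"]:
            # Code block clustering layer: one labeled cluster per block.
            spec["block_layer"].append(
                {"segment": seg["name"], "cluster": blk, "label": seg["type"]})

for layer, clusters in spec.items():
    print(layer, clusters)
```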
4 Mapping Methodology

A good strategy for mapping a parallel computing application onto a system of interconnected processors aims at maximizing the utilization of the available processing and communication resources, leading to faster execution. This is traditionally accomplished by a thorough analysis of the problem graph in terms of the granularity of the computation blocks and the data dependencies between such blocks. The system parameters, namely processor power and interconnection topology, are also carefully analyzed. The mapping process then attempts to match each computation block with a system processor while minimizing system communication overhead (e.g., minimizing the number of system communication hops for each data dependency in the problem [2]).

The Hierarchical Cluster-M (HCM) paradigm simplifies the mapping process by formulating the problem in the form of an HCM problem specification emphasizing its communication requirements independently of the target architecture. Similarly, the HCM representation of the system emphasizes processor interconnection patterns. This results from the fact that the clustering process is based on the topology of the system. Once both the HCM problem specification and the system representation are obtained, the mapping process is carried out at several levels as follows:

1. Code segment cluster mapping (a small sketch of this step is given after this list):

- For each cluster in a code segment clustering level of the task specification, find a system-level cluster that matches the type of parallelism in the segment, and assign the code segment cluster to the appropriate system cluster.
- If a system cluster with a matching type of parallelism is not found, then a cluster with the next best type of parallelism is selected. This selection is based on information collected from analytical benchmarking.
- Code segment clusters that are connected in the task specification are mapped onto connected system-level clusters. If appropriate connected clusters are not found, then each pair of connected segment clusters is mapped onto suitable system clusters with a minimum communication cost.
- The above steps are repeated for all code segment clustering levels.

2. Code block cluster mapping: following the completion of code segment cluster mapping for all levels, the code block clusters contained in each code segment cluster are mapped onto subclusters contained in the corresponding system cluster. If a sufficient number of subclusters is available, then each code block cluster is mapped onto a system subcluster.

3. Instruction level cluster mapping: for each instruction clustering level, find all system clustering levels with the closest matching number of clusters. These levels are possible mapping level candidates. Several cases may arise:

- If the number of clusters C of a specification level L matches the number of processors N (or clusters) in a representation level, then each processor is assigned one such cluster and the problem specification proceeds as written.
- If the number of processors N exceeds the largest number of clusters C, a subset of processors matching C is used, and the system representation of this subset is used.
- If the number of processors in the system is less than C, then each processor is assigned ⌈C/N⌉ clusters. All Cluster-M operations on the clusters assigned to a processor are performed internally in that processor.
- If several system representation levels exist with a matching number of clusters, then the level with the closest matching cluster degree of connectivity is selected.
- If the cluster degree of connectivity in the problem specification is greater than that of the possible levels in the system representation, then the specification clustering level is transformed to reduce the cluster degree. This transformation is accomplished by dividing each cluster into several sub-clusters of the same level with a lower degree of connectivity. The transformed level is then mapped as described above.
- Once a representation clustering level has been selected, adjacent specification clusters are mapped onto adjacent representation clusters, if possible.
- If more processors exist within a representation cluster than the number of elements in the specification cluster, then computation results are not mapped onto an interior processor of the cluster; a processor on the boundary of adjacent clusters is selected instead. This is necessary to minimize the number of communication hops within a cluster needed to deliver its computation result to subsequent levels.

Note that the above mapping procedure is performed for each specification level independently from the other levels. The above procedure provides an efficient mapping in terms of processor and communication link utilization.
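The sketch below illustrates the code segment cluster mapping step: each segment cluster is matched to an unassigned system-level cluster of the same type of parallelism, falling back to a "next best" type when none remains. The preference table stands in for analytical benchmarking data, and all cluster names are illustrative assumptions.

```python
# Sketch of code segment cluster mapping: match each segment cluster to a
# system-level cluster by type of parallelism, with a next-best fallback.
segment_clusters = [("t11", "SIMD"), ("t12", "MIMD"), ("t13", "SIMD")]
system_clusters = {"hypercube": "SIMD", "mesh": "SIMD", "multiprocessor": "MIMD"}

# Fallback order per code type, best match first (hypothetical benchmark data).
preference = {
    "SIMD":   ["SIMD", "vector", "MIMD"],
    "MIMD":   ["MIMD", "SIMD", "vector"],
    "vector": ["vector", "SIMD", "MIMD"],
}

available = dict(system_clusters)          # system clusters not yet assigned
assignment = {}
for name, code_type in segment_clusters:
    for wanted in preference[code_type]:   # try the matching type, then next best
        match = next((s for s, t in available.items() if t == wanted), None)
        if match is not None:
            assignment[name] = match
            del available[match]
            break
    else:
        assignment[name] = None            # no system cluster left for this segment

print(assignment)   # e.g. {'t11': 'hypercube', 't12': 'multiprocessor', 't13': 'mesh'}
```

A fuller treatment would also weigh the communication cost between connected segment clusters, as described in the mapping steps above.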
Several operations are frequently encountered in Hierarchical Cluster-M problem specifications. Macros are defined using basic HCM constructs to represent such common operations. The utilization of macros in problem specifications, instead of low-level constructs, simplifies the mapping of specifications onto representations. The mapping of each defined macro is done only once for each system representation. Whenever a defined macro is encountered in the problem specification, the predetermined mapping for the architecture at hand is looked up from an HCM macro mapping library.

5 Conclusions

In this paper, we have presented one of the very first methodologies for mapping algorithms onto a heterogeneous suite of supercomputers. A Heterogeneous Optimal Selection Theory (HOST) is formulated to select an optimal suite of computers for solving problems with diverse computational requirements. HOST is an extension to the Augmented Optimal Selection Theory (AOST). The input format of application tasks assumed in formulating AOST is modified to incorporate the heterogeneous parallelism embedded in the tasks. Also, new parameters have been introduced to reflect the costs associated with using various fine grain mapping strategies. A Hierarchical Cluster-M (HCM) paradigm has been introduced to model the application tasks and the underlying heterogeneous suite of supercomputers. The HCM paradigm simplifies the mapping process by formulating the problem in the form of a Cluster-M problem specification (task graph) emphasizing its communication and computation requirements independently of the target architecture. For a given problem, a Cluster-M specification is generated to indicate the execution of concurrent tasks at different stages of the computation. This specification (task graph) is then mapped onto the Cluster-M representation (system graph) of the underlying heterogeneous suite of supercomputers. Efficient algorithms for mapping specifications onto representations are presented. Using this methodology, portable code can be generated, which is a highly desired feature in a dynamically changing heterogeneous supercomputing environment.

References

[1] G. Agha and R. Panwar. "An Actor-Based Framework for Heterogeneous Computing Systems". In Proc. Workshop on Heterogeneous Processing, pages 35-42, Mar. 1992.

[2] S. Bokhari. "Partitioning Problems in Parallel, Pipelined, and Distributed Computing". IEEE Trans. on Computers, 37:48-57, January 1988.

[3] M. Eshaghian and R. F. Freund. "Cluster-M Paradigms for High-order Heterogeneous Procedural Specification Computing". In Proc. Workshop on Heterogeneous Processing, pages 47-49, Mar. 1992.

[4] R. F. Freund. "Optimal Selection Theory for Superconcurrency". In Supercomputing '89, pages 699-703, Nov. 1989.

[5] R. F. Freund and D. S. Conwell. "Superconcurrency: A Form of Distributed Heterogeneous Supercomputing". Supercomputing Review, 3:47-50, Oct. 1990.

[6] A. Khokhar, V. K. Prasanna, M. Shaaban, and C. Wang. "Heterogeneous Supercomputing: Problems and Issues". In Proc. Workshop on Heterogeneous Processing, pages 3-12, Mar. 1992.

[7] J. Mahdavi, G. L. Huntoon, and M. B. Mathis. "Deployment of a HIPPI-based Distributed Supercomputing Environment at the Pittsburgh Supercomputing Center". In Proc. Workshop on Heterogeneous Processing, pages 93-96, Mar. 1992.

[8] V. S. Sunderam. "PVM: A Framework for Parallel Distributed Computing". Concurrency: Practice and Experience, 2(4):315-339, December 1990.
[9] R. J. Vetter, D. H. C. Du, and A. E. Klietz. "Network Supercomputing: Experiment with a Cray-2 to CM-2 HIPPI Connection". In Proc. Workshop on Heterogeneous Processing, pages 87-92, Mar. 1992.

[10] M. Wang, S. Kim, M. Nichols, R. Freund, and H. J. Siegel. "Augmenting the Optimal Selection Theory for Superconcurrency". In Proc. Workshop on Heterogeneous Processing, pages 13-21, Mar. 1992.