A Selection Theory and Methodology
for Heterogeneous Supercomputing
Song Chen, Mary M. Eshaghian
Ashfaq Khokhar, Muhammad E. Shaaban
Dept. of Computer & Information Science
New Jersey Institute of Technology
Newark, NJ 07102
Dept. of Electrical Engineering
University of Southern California
Los Angeles, CA 90089
Abstract
In this paper, a methodology for mapping algorithms onto a heterogeneous suite of supercomputers is presented. An approach for selecting an optimal suite of computers for solving problems with diverse computational requirements, called Heterogeneous Optimal Selection Theory (HOST), is presented. HOST is an extension to the Augmented Optimal Selection Theory in two ways: it incorporates the heterogeneous parallelism embedded in the tasks, and it reflects the costs associated with using various fine grain mapping strategies at the individual machine level. The proposed mapping methodology is based on the Cluster-M programming paradigm. For the mapping purpose, the input format assumed in HOST is modeled in terms of the Hierarchical Cluster-M specification and representation. For a given problem, a Hierarchical Cluster-M specification is generated to indicate the execution of concurrent tasks at different stages of the computation. This specification is then mapped onto the Hierarchical Cluster-M representation of the underlying heterogeneous suite of supercomputers.
1 Introduction
The performance of a parallel algorithm is very much dependent on the architecture it is mapped onto. Also, many algorithms have diverse computational requirements which may not be well met when using a single architecture. Heterogeneous SuperComputing (HSC) [6], and/or Superconcurrency [5], deals with the concurrent use of a heterogeneous suite of parallel machines in solving a given problem. This computational model has recently been studied by several scientists [1, 3, 4, 7, 8, 9, 10]. Various issues involved in the design of tools and paradigms to support HSC are outlined in a survey paper by Khokhar et al. [6]. The design of a methodology for mapping algorithms in an HSC environment is the main topic of our paper.

The Optimal Selection Theory is a mathematical programming formulation for choosing the most appropriate suite of heterogeneous parallel and vector supercomputers for a given task [4]. This mathematical technique is based on the methodology of code profiling [4] and analytical benchmarking [4]. Code profiling is a code-specific function to determine the code types present in a given task. Analytical benchmarking provides a measure of how well the available machines perform on a given code type. The goal of the optimal selection theory is to minimize the total execution time spent on all the code segments of a task, subject to a fixed constraint such as cost. This theory was recently augmented [10] by incorporating the performance of code segments on non-optimal choices.

In this paper, we introduce the Heterogeneous Optimal Selection Theory (HOST), which incorporates the heterogeneous parallelism embedded in tasks and reflects the costs associated with the different fine grain mapping techniques available for a given architecture. The modeling requirements of the input to HOST, as well as the system modeling requirements, are discussed. Hierarchical Cluster-M (HCM), a programming paradigm based on Cluster-M [3], is introduced. HCM meets the modeling requirements of HSC. A given task is modeled in HCM as a task specification taking into account the decomposition of the task into subtasks and code segments, the heterogeneous parallelism embedded in the task, and the interdependencies present among different components of the task. This specification is independent of any architecture, simplifying the mapping process. Utilizing HCM as a modeling platform, efficient strategies to map tasks onto the underlying heterogeneous suite of computers are presented.

The rest of the paper is organized as follows. In section 2, HOST is presented. The Hierarchical Cluster-M modeling of the input to HOST is discussed in section 3. In the following section, a mapping methodology based on the HCM paradigm is proposed. The concluding remarks are included in section 5.
2 Heterogeneous Optimal Selection Theory (HOST)
Freund in [4] proposed an Optimal Selection Theory (OST) to choose an optimal configuration of machines for executing an application task on a heterogeneous suite of computers, with the assumption that the number of machines available is unlimited. An application task is assumed to comprise S non-overlapping code segments. Each segment has homogeneous parallelism embedded in its computations. Also, code segments are considered to be executed serially. A code segment is decomposable if it can be partitioned into different code blocks (see Figure 1). All code blocks within a code segment have the same type of embedded parallelism and can be executed concurrently.

Figure 1: Input format for OST and AOST.

The goal of OST is to assign the code blocks within each homogeneous code segment to the available matching machine types such that it can be executed optimally. The execution time for a decomposable code segment is equal to the longest execution time among all the code blocks of that code segment. A machine type is identified based on the underlying machine architecture, for example, SIMD, MIMD, scalar, vector, etc. Similarly, each machine type can have more than one model, e.g., Ncube and Mesh for SIMD. Machine choices in OST are always optimal and decompositions of code segments are uniform. According to OST, there exists an assignment such that the total time spent on all code segments is minimized, subject to a fixed constraint such as cost. The assumption here is that there are always enough machines of each type available to which a code block can be assigned. Therefore, a code segment can always be assigned to its matching machine type.

OST was augmented by Wang et al. [10] to incorporate the performance of code segments on non-optimal machine choices, assuming that the number of available machines of each type is limited. Under this assumption, a code segment which is most suitable for one type of machine may have to be assigned to another type. For example, consider a code segment consisting of 5 code blocks suitable for execution on MIMD machines. Assuming there are 2 MIMD machines and 6 SIMD machines available, it is impossible to assign all the blocks to MIMD machines. On the one hand, assigning all 5 blocks to 5 SIMD machines may result in the shortest execution time; on the other hand, decomposing the segment into 2 blocks and assigning them to 2 MIMD machines may give better performance. In the Augmented Optimal Selection Theory (AOST), non-optimal machine choices and non-uniform decompositions of code segments are incorporated.

In the formulation of OST and AOST, it has been assumed that the execution of all code segments of a given task is totally ordered in time. However, there may exist different execution interdependencies among a set of code segments. Also, parallelism may be present between code segments, resulting in concurrent execution of several code blocks of different code segments on a suite of heterogeneous machines. Furthermore, the effect of the mapping techniques available for different problems on individual machines has not been considered in the formulation of the selection theory.
Consider the following example. Given is a code segment consisting of two SIMD code blocks: one code block sorts N elements and the other adds N elements. Assume there are two SIMD machines available, an N-processor hypercube and an N-processor Mesh Connected Computer (MCC). An efficient assignment of code blocks to the two SIMD machines depends upon the mapping techniques used on the machines for the problem under consideration. The assignment problem becomes more interesting if the machines are of different sizes in terms of the number of processors and/or the memory available with each processor.
HOST, as presented in this section, is an extension to AOST in two ways: it incorporates the effects of the various fine grain mapping techniques available on individual machines, and the task is assumed to have heterogeneous embedded parallelism. The input format is relaxed to allow concurrent execution of mutually independent code segments. An application task is divided into subtasks. Subtasks are executed serially. Each subtask may contain a collection of code segments which can be executed in parallel. A code segment consists of homogeneous parallel instructions. Each code segment is further decomposed into several code blocks which can be executed concurrently. These code blocks are to be assigned to machines of the same type. A general input format is illustrated in Figure 2. Figure 1 is a special case of Figure 2, where each subtask contains only one code segment. In HOST, heterogeneous code blocks of different code segments can be executed concurrently on different types of machines, exploiting the heterogeneous parallel computations embedded in the application.
Let S be the number of code segments of the given task, and M be the number of different machine types to be considered. Furthermore, let π[t] be the number of machine models of type t, α[t] be the number of mappings available on machine type t, and β[t,l] be the number of machines of model l of type t that are available. Assume ν[t,j] is the maximum number of code blocks that code segment j can be decomposed into. Define γ[t,j] to be the number of machines of type t that are actually used to execute code segment j. Therefore, γ[t,j] equals the minimum of ν[t,j] and the number of machines of type t available, i.e.,
γ[t,j] = min( Σ_{l=1}^{π[t]} β[t,l], ν[t,j] ).

A parameter m[t,k] is defined to specify the mapping technique used for code block k on machine type t. Let us further assume that for a particular mapping m on machine type t, the best matched code segment can obtain the optimal speedup O[t,m] in comparison to a baseline system. A real number κ[t,j] indicates how well a code segment j can be matched with machine type t, and λ[t,k] is a utilization factor when running code block k on a machine of type t. We have 0 ≤ κ[t,j] ≤ 1 and 0 ≤ λ[t,k] ≤ 1.

Let ρ[j] be the percentage of time spent executing code segment j within the overall execution of a given subtask on the baseline machine, so that Σ_{j=1}^{S} ρ[j] = 1. Similarly, let ρ[j,k] be the percentage of time spent executing code block k within the overall execution of code segment j on the baseline machine, so that the ρ[j,k] of all code blocks of segment j sum to 1.

Suppose code segment j is assigned to machine type t. For each code block k within code segment j, there is a mapping m[t,k]. Let μ[t,j] be the mapping vector for code segment j on machine type t:

μ[t,j] = ( m[t,1], m[t,2], ..., m[t,γ[t,j]] ),  with 1 ≤ m[t,k] ≤ α[t].

With this mapping vector μ on machine type t, the execution time of segment j is

δ[t,j,μ] = max_{1 ≤ k ≤ γ[t,j]} ( ρ[j] · ρ[j,k] ) / ( O[t, m[t,k]] · κ[t,j] · λ[t,k] ).

Therefore, different mappings μ available on machine type t result in different execution times of segment j. Let Δ[t,j] be the minimum execution time of segment j among all the possible mappings on type t:

Δ[t,j] = min_{μ[t,j]} δ[t,j,μ[t,j]].

Let the machine type selection vector τ indicate the selection of machine types for code segments 1 to S, such that τ = (t[1], t[2], ..., t[S]). Define χ[τ] to be the execution time of the given subtask with heterogeneous machine type selection τ on all the code segments, such that χ[τ] = max_{1 ≤ j ≤ S} Δ[t[j], j]. HOST is then formulated as follows: for any subtask, there exists a τ achieving

min_{τ} χ[τ]    subject to    Σ_{t=1}^{M} ( max_{1 ≤ j ≤ S} γ[t,j] ) · c[t] ≤ C,

where c[t] denotes the cost associated with a machine of type t and C is the given cost constraint.
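To make the formulation above concrete, the following sketch enumerates machine type selections τ for a toy subtask and evaluates χ[τ] under a cost constraint. It is only an illustration of the equations, not code from the original work: all profiling and benchmarking numbers are invented, the search is exhaustive rather than a mathematical programming solution, and cost is charged only for the machine types actually selected (one possible reading of the constraint).

from itertools import product

# Hypothetical data for one subtask with S = 2 code segments and M = 2 machine
# types (0 = SIMD, 1 = MIMD); all numbers are invented for illustration.
S, M    = 2, 2
rho     = [0.6, 0.4]                # rho[j]: fraction of baseline subtask time in segment j
rho_blk = [[0.5, 0.5], [1.0]]       # rho_blk[j][k]: fraction of segment j spent in block k
gamma   = [[2, 1], [2, 1]]          # gamma[t][j]: type-t machines used for segment j (= its block count here)
alpha   = [2, 1]                    # alpha[t]: number of mappings available on type t
speedup = [[8.0, 5.0], [4.0]]       # speedup[t][m]: optimal speedup O[t, m] over the baseline
kappa   = [[1.0, 0.3], [0.4, 1.0]]  # kappa[t][j]: how well segment j matches type t
lam     = [[0.9, 0.8], [0.7, 0.95]] # lam[t][k]: utilization of block k on type t
cost    = [5.0, 8.0]                # cost[t]: cost c[t] of one machine of type t
C_MAX   = 30.0                      # total cost constraint C

def delta(t, j, mapping):
    """Execution time delta[t, j, mu] of segment j on type t for one mapping vector."""
    return max(rho[j] * rho_blk[j][k] /
               (speedup[t][mapping[k]] * kappa[t][j] * lam[t][k])
               for k in range(len(mapping)))

def Delta(t, j):
    """Minimum execution time Delta[t, j] over all mapping vectors on type t."""
    return min(delta(t, j, mapping)
               for mapping in product(range(alpha[t]), repeat=gamma[t][j]))

best = None
for tau in product(range(M), repeat=S):              # tau[j]: machine type chosen for segment j
    # Cost of the machines needed by this selection (simplified cost accounting).
    used_cost = sum(max(gamma[t][j] for j in range(S) if tau[j] == t) * cost[t]
                    for t in set(tau))
    if used_cost > C_MAX:
        continue
    chi = max(Delta(tau[j], j) for j in range(S))    # chi[tau]
    if best is None or chi < best[0]:
        best = (chi, tau, used_cost)

print("best selection (chi, tau, cost):", best)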
Based on this formulation, it is evident that given
a decomposition of the task, as shown in Figure 2, to
be executed on a desired heterogeneous suite of machines, an optimal execution time is achievable. In
the following section, we present a paradigm suitable
for modeling both the input format of the task and
the underlying heterogeneous suite of machines. This
modeling will then be utilized to develop a mapping
methodology.
3 Modeling the Input to HOST
HOST, as described in the section above, is an existence proof for an optimal selection of processors for a given task in HSC. In this section, we present a tool for modeling the input to HOST. This modeling paradigm is to be used in Section 4 as part of the mapping methodology.
3.1 Modeling requirements
The input formulation in HOST assumes that a parallel task T is divided into subtasks t_i, 1 ≤ i ≤ N. Each subtask t_i is further divided into code segments t_ij, 1 ≤ j ≤ S, which can be executed concurrently. Each code segment within a subtask can belong to a different type of parallelism (i.e., SIMD, MIMD, vector, etc.), and thus should ideally be mapped onto a machine with a matching type of parallelism. Each code segment may further be decomposed into several concurrent code blocks with the same type of parallelism. These code blocks t_ijk, 1 ≤ k ≤ B, are suited for parallel execution on machines having the same type of parallelism. This decomposition of the task into subtasks, code segments, and code blocks is shown in Figure 2.

Figure 2: Input format for HOST (subtasks containing concurrent code segments of types SIMD, MIMD, vector, etc.).
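As a concrete illustration of this input format, the following sketch shows one way the decomposition could be represented in code. The class and field names are our own and are not part of HOST or Cluster-M.

from dataclasses import dataclass, field
from typing import List

@dataclass
class CodeBlock:
    name: str
    parallelism: str          # same parallelism type as its enclosing segment

@dataclass
class CodeSegment:
    name: str
    parallelism: str          # homogeneous parallelism type of the segment
    blocks: List[CodeBlock] = field(default_factory=list)      # concurrent blocks t_ijk

@dataclass
class Subtask:
    name: str
    segments: List[CodeSegment] = field(default_factory=list)  # concurrent segments t_ij

@dataclass
class Task:
    name: str
    subtasks: List[Subtask] = field(default_factory=list)      # serially executed subtasks t_i

# A toy task with one subtask containing a SIMD segment (two blocks)
# and a vector segment (one block), mirroring the structure of Figure 2.
task = Task("example", [
    Subtask("t_1", [
        CodeSegment("t_11", "SIMD",
                    [CodeBlock("t_111", "SIMD"), CodeBlock("t_112", "SIMD")]),
        CodeSegment("t_12", "vector", [CodeBlock("t_121", "vector")]),
    ]),
])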
A good model of this input format is needed to facilitate the mapping of tasks onto a heterogeneous architecture. In addition to modeling the input format, the architecture being considered for the execution of the task should also be modeled. Several requirements for this model are identified as follows:

- The modeling of the input format should handle the decomposition of the task into subtasks, code segments, and code blocks, while preserving the information regarding the type of parallelism present in each portion of the task. This is essential to match the type of each code block with a suitable machine type in the system.

- The model should handle parallelism at both fine grain and coarse grain levels.

- The modeling of the input code should emphasize the communication requirements of the various code segments.

- The modeling of the input code should be independent of the underlying architecture.

- The modeling of the system should provide the mode of computation of each machine in the system.

- The interconnection topology of individual architectures should be systematically represented in the model at both the system and machine levels.

The Cluster-M parallel programming paradigm introduced in [3] meets most of the above requirements. Cluster-M models a parallel task as a problem specification, independent of the underlying architecture. However, Cluster-M has no provision to model the heterogeneity present in the task. In the following section, we extend the Cluster-M model to accommodate the requirements of heterogeneous supercomputing. This extended model is called Hierarchical Cluster-M.
3.2 Hierarchical Cluster-M Model

The two main components of the Cluster-M model are the Cluster-M representation of an architecture and the Cluster-M specification of a parallel task. Cluster-M exploits fine grain parallelism at the individual instruction and processor levels. The problem specification is carried out without considering the underlying architecture, making this model valuable for modeling computations in a heterogeneous supercomputing environment comprising several types of architectures. To exploit multi-level parallelism in a task, the Hierarchical Cluster-M model is proposed as a more restricted form of the Cluster-M model.

Hierarchical Cluster-M (HCM) exploits parallelism at the subtask, code segment, code block, and instruction levels. This is accomplished by modifying both the Cluster-M system representation and problem specification processes. The modification to the system representation takes into account the presence of several interconnected machines in the system, providing a spectrum of computational modes. The problem specification takes into account the type of parallelism present in each portion of the task.

3.2.1 Hierarchical Cluster-M system representation

The Hierarchical Cluster-M representation of a system consists of two layers of clustering: the system layer and the machine layer. System layer clustering consists of several levels of nested clusters. At the lowest level of clustering, each machine in the system is assigned a cluster by itself. Completely connected clusters are merged to form the next level of clustering. This process is continued until no more merging is possible. Machine layer clustering is obtained in a similar way, with individual processors replacing system machines in the clustering process.

For a heterogeneous suite of interconnected computers, the HCM system representation is obtained as follows:
1. The HCM system representation algorithm is first applied to the system as a whole. At the first level of clustering, each computer in the system is in a cluster by itself. Each clustering level is constructed by merging clusters from the lower level that are completely connected. This is continued until no more clustering is possible. The resulting clustering levels are called system level clusters.
2. Each resulting cluster is labeled according to the type of parallelism present in the cluster (i.e., SIMD, MIMD, vector, etc.).

3. For each computer in the system, apply the Cluster-M system representation algorithm. At the lowest level, each processor in the computer is in a cluster by itself. All completely connected clusters are merged to form the next level of clustering. The highest level of clustering consists of one cluster containing all processors in the computer. This results in the Cluster-M representation for each individual computer in the system.

Note that the collection of machine clusters at the highest level is equivalent to the lowest system clustering level. A heterogeneous parallel computing system is shown in Figure 3, while its HCM representation is shown in Figure 4.

Figure 3: A heterogeneous parallel computing system.

Figure 4: Hierarchical Cluster-M representation of a heterogeneous computing system.
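The system layer clustering described above can be sketched as follows. The graph representation of the machine interconnections and the greedy merging of completely connected clusters are our own simplifications of the procedure in the text; the same sketch applies at the machine layer with processors in place of machines.

def fully_connected(a, b, adj):
    """True if every machine in cluster a is directly linked to every machine in cluster b."""
    return all(v in adj[u] for u in a for v in b)

def next_level(clusters, adj):
    """Greedily merge groups of clusters that are pairwise completely connected."""
    merged, used = [], set()
    for i, c in enumerate(clusters):
        if i in used:
            continue
        group = set(c)
        for j in range(i + 1, len(clusters)):
            if j not in used and fully_connected(group, clusters[j], adj):
                group |= clusters[j]
                used.add(j)
        used.add(i)
        merged.append(group)
    return merged

def hcm_representation(adj):
    """All clustering levels, from singleton clusters up to the point where no merge applies."""
    levels = [[{m} for m in adj]]
    while True:
        nxt = next_level(levels[-1], adj)
        if len(nxt) == len(levels[-1]):
            return levels
        levels.append(nxt)

# Toy system: machines A, B, C form a fully connected triangle, and D is linked only to C.
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
for lvl, clusters in enumerate(hcm_representation(adj)):
    print("level", lvl, clusters)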
3.2.2 Hierarchical Cluster-M problem specification

The Cluster-M specification of a parallel task is a program that specifies the communication and computation requirements of the task. The Cluster-M specification consists of several levels of clusters, with the input level being the lowest and the final result level being the highest. At the lowest level, each cluster contains one computational operand. All initial clusters involved in a computation are merged into one cluster in the next clustering level. Clusters in intermediate levels are merged, split, and/or their elements manipulated according to the computation and communication requirements. Several essential Cluster-M constructs needed to formulate the Cluster-M specification of a task are discussed in [3].

The Cluster-M specification represents the communication needs of the problem at the instruction level and has no provision to identify parallelism at higher levels (i.e., the subtask, code segment, and code block levels). This specification can be written for any parallel problem regardless of the communication or computation types present.

The HCM task specification is obtained using Cluster-M constructs. We assume here that the input is a task T in a form similar to the input to HOST, i.e., the following has been done:
- The task is divided into sequential subtasks t_i, 1 ≤ i ≤ N.

- Each subtask is divided into several concurrent code segments t_ij, 1 ≤ j ≤ S. The type of each code segment has been identified.

- Each code segment is further decomposed into several concurrent homogeneous code blocks t_ijk, 1 ≤ k ≤ B.
The Hierarchical Cluster-M specification of task T has several layers of clustering: the subtask, code segment, code block, and instruction clustering layers. The HCM specification of task T is computed as follows:
1. Subtask clustering layer:
At the subtask clustering layer, each subtask t_i is represented by a single-cluster level i, with subtask t_1 forming the lowest such level.
2. Code segment clustering layer:
- For all subtask clustering levels, each level i contains a number of clusters at the code segment clustering layer. Each such cluster contains a code segment t_ij of subtask t_i. Each cluster is labeled with the parallelism type of its corresponding code segment.
- Code segment clusters in the same subtask clustering level i are connected if results from the clusters are used by a single cluster of subtask clustering level i + 1.
3. Code block clustering layer:
Each code segment cluster j in subtask clustering level i contains several clusters at the code block clustering layer. Each cluster in this layer corresponds to a code block t_ijk. Each code block cluster is labeled with the type of parallelism present in the block.
4. Instruction clustering layer:
For each cluster of the code block clustering levels, find its Cluster-M problem specification. This step yields the lowest layer of HCM clustering, namely the instruction-level clustering layer.

Figure 5: Hierarchical Cluster-M specification of the input to HOST.
Note that if the input is comprised of only one subtask containing one code segment with one code block, then the resulting HCM specification is identical to the more general Cluster-M specification. This is due to the fact that in such a case no code type restrictions are imposed. The input to HOST is shown in Figure 2, and the corresponding HCM specification is shown in Figure 5. From the above steps, each layer in the HCM specification corresponds to a decomposition level in the input to HOST, as follows:
- Each subtask t_i in the input to HOST is represented in the HCM specification by level i in the subtask clustering layer, containing one cluster corresponding to subtask i.

- Each code segment t_ij in the input to HOST is represented by a cluster in code segment clustering level j in subtask clustering level i.

- Each code block t_ijk in the input to HOST is represented by a cluster in code block clustering level k in code segment clustering level j and subtask level i.

Thus a one-to-one correspondence exists between the decomposition levels of the input code to HOST and its corresponding HCM specification. The HOST input format assumes that subtask t_i cannot start unless all the code of subtask t_{i-1} is completed; the HCM specification makes no such assumption. All possible inputs to HOST form a subset of all possible inputs that can be represented by HCM.
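As an illustration, the following sketch derives the subtask, code segment, and code block clustering layers from the Task structure and example task sketched in Section 3.1 (the instruction clustering layer is omitted, since it requires the Cluster-M specification of the actual code). The dictionary-based cluster format is invented for this example.

def hcm_specification(task):
    """Build the subtask, code segment, and code block clustering layers for a task.
    Cluster labels follow the t_i / t_ij / t_ijk naming used in the text."""
    spec = []
    for i, subtask in enumerate(task.subtasks, start=1):
        level = {"level": i, "subtask": subtask.name, "segments": []}
        for seg in subtask.segments:
            level["segments"].append({
                "name": seg.name,
                "parallelism": seg.parallelism,     # label used later for type matching
                "blocks": [{"name": b.name, "parallelism": b.parallelism}
                           for b in seg.blocks],
            })
        spec.append(level)                          # level i of the subtask clustering layer
    return spec

for level in hcm_specification(task):
    print("level", level["level"], level["subtask"], "->",
          [s["name"] + ":" + s["parallelism"] for s in level["segments"]])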
4 Mapping Methodology
A good strategy for mapping a parallel computing application onto a system of interconnected processors aims at maximizing the utilization of the available processing and communication resources, leading to faster execution time. This is traditionally accomplished by a thorough analysis of the problem graph in terms of the granularity of the computation blocks and the data dependencies between such blocks. The system parameters, namely processor power and interconnection topology, are also carefully analyzed. The mapping process then attempts to match each computation block with a system processor, minimizing system communication overhead (e.g., minimizing the number of system communication hops for each data dependency in the problem [2]).
The Hierarchical Cluster-M (HCM) paradigm simplifies the mapping process by formulating the problem in the form of an HCM problem specification emphasizing its communication requirements independently from the target architecture. Similarly, the HCM representation of the system emphasizes processor interconnection patterns. This results from the fact that the clustering process is based on the topology of the system. Once both the HCM problem specification and the system representation are obtained, the mapping process at several levels is carried out as follows:
1. Code segment cluster mapping:
- For each cluster in a code segment clustering level of the task specification, find a system-level cluster that matches the type of parallelism in the segment. Assign code segment clusters to the appropriate system clusters.
- If a system cluster with a matching type of parallelism is not found, then a cluster with the next best type of parallelism is selected. This selection is based on information collected from analytical benchmarking.
- Code segment clusters that are connected in the task specification are mapped onto connected system-level clusters. If appropriate connected clusters are not found, then map each pair of connected segment clusters onto suitable system clusters with a minimum communication cost.
- The above steps are repeated for all code segment clustering levels.
2. Code block cluster mapping:
Following the completion of code segment cluster mapping for all levels, the code block clusters contained in each code segment cluster are mapped onto several subclusters contained in the corresponding system cluster. If a sufficient number of subclusters is available, then each code block cluster is mapped onto a system subcluster.
3. Instruction level cluster mapping:
For each instruction clustering level, find all system clustering levels with the closest matching number of clusters. These levels are possible mapping level candidates. Several cases may arise:
- If the number of clusters C of level L matches the number of processors N or clusters in a representation level, then each processor will be assigned one such cluster and the problem specification proceeds as written.
- If the number of processors N exceeds the largest number of clusters C, a subset of processors matching C is used. The system representation of this subset is used.
- If the number of processors in the system is less than C, then each processor is assigned ⌈C/N⌉ clusters. All Cluster-M operations on the clusters assigned to a processor are performed internally in that processor.
- If several system representation levels exist with a matching number of clusters, then the level with the closest matching cluster degree of connectivity is selected.
- If the cluster degree of connectivity in the problem specification is greater than that of the possible levels in the system representation, then the specification clustering level is transformed to reduce the cluster degree. This transformation is accomplished by dividing each cluster into several sub-clusters of the same level with a lower degree of connectivity. Then this transformed level is mapped as described above.
- Once a representation clustering level has been selected, adjacent specification clusters are mapped onto adjacent representation clusters, if possible.
- If more processors exist within a representation cluster than the number of elements in the specification cluster, then computation results are not mapped onto an interior processor of the cluster; a processor on the boundary of adjacent clusters is selected instead. This is necessary to minimize the number of communication hops within a cluster to deliver its computation result for subsequent levels.

Note that the above mapping procedure is performed for each specification level independently from the other levels. The above procedure provides an efficient mapping in terms of processor and communication link utilization.
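A minimal sketch of the code segment cluster mapping step is given below. The fallback ranking stands in for information that would come from analytical benchmarking, and the cluster and machine names are invented; connectivity and communication costs are not modeled here.

# Hypothetical "next best type" ranking per parallelism type, standing in for
# information collected from analytical benchmarking.
FALLBACK = {
    "SIMD":   ["SIMD", "MIMD", "vector"],
    "MIMD":   ["MIMD", "SIMD", "vector"],
    "vector": ["vector", "SIMD", "MIMD"],
}

def map_segment_clusters(segment_clusters, system_clusters):
    """Assign each code segment cluster to a system-level cluster of matching
    parallelism type, falling back to the next best type when no exact match
    (or no free cluster of that type) remains."""
    free = list(system_clusters)            # [(cluster_name, machine_type), ...]
    assignment = {}
    for seg_name, seg_type in segment_clusters:
        for wanted in FALLBACK[seg_type]:
            match = next((c for c in free if c[1] == wanted), None)
            if match is not None:
                assignment[seg_name] = match[0]
                free.remove(match)
                break
        else:
            assignment[seg_name] = None     # no suitable system cluster left
    return assignment

segments = [("t_11", "SIMD"), ("t_12", "vector"), ("t_13", "MIMD")]
system   = [("hypercube", "SIMD"), ("mesh", "SIMD"), ("shared-mem", "MIMD")]
print(map_segment_clusters(segments, system))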
Several operations are frequently encountered in Hierarchical Cluster-M problem specifications. Macros are defined using basic HCM constructs to represent such common operations. The utilization of macros in problem specifications, instead of low-level constructs, simplifies the mapping of specifications onto representations. The mapping of each defined macro is done for each system representation only once. Whenever a defined macro is encountered in the problem specification, the predetermined mapping for the architecture at hand is looked up from an HCM macro mapping library.
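The macro library can be thought of as a table computed once per system representation and consulted during mapping. The macro names, architectures, and mapping descriptions below are invented for illustration.

# Predetermined mappings, computed once per system representation and reused
# whenever the macro appears in a problem specification.
MACRO_LIBRARY = {
    ("broadcast", "hypercube"): "recursive doubling over dimensions 0..log2(N)-1",
    ("broadcast", "mesh"):      "row broadcast followed by column broadcast",
    ("reduce",    "hypercube"): "pairwise combine along each dimension",
}

def map_macro(macro, architecture):
    """Look up the predetermined mapping for a macro on a given architecture."""
    try:
        return MACRO_LIBRARY[(macro, architecture)]
    except KeyError:
        raise ValueError(f"no predetermined mapping for {macro!r} on {architecture!r}")

print(map_macro("broadcast", "mesh"))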
5 Conclusions
In this paper, we have presented one of the very first methodologies for mapping algorithms onto a heterogeneous suite of supercomputers. A Heterogeneous Optimal Selection Theory (HOST) is formulated to select an optimal suite of computers for solving problems with diverse computational requirements. HOST is an extension to the Augmented Optimal Selection Theory (AOST). The input format of application tasks assumed in formulating AOST is modified to incorporate the heterogeneous parallelism embedded in the tasks. Also, new parameters have been introduced to reflect the costs associated with using various fine grain mapping strategies.

The Hierarchical Cluster-M (HCM) paradigm has been introduced to model the application tasks and the underlying heterogeneous suite of supercomputers. The HCM paradigm simplifies the mapping process by formulating the problem in the form of a Cluster-M problem specification (task graph) emphasizing its communication and computation requirements independently from the target architecture. For a given problem, a Cluster-M specification is generated to indicate the execution of concurrent tasks at different stages of the computation. This specification (task graph) is then mapped onto the Cluster-M representation (system graph) of the underlying heterogeneous suite of supercomputers.

Efficient algorithms for mapping specifications onto representations have been presented. Using this methodology, portable code can be generated, which is a much desired feature in a dynamically changing heterogeneous supercomputing environment.
References
[1] G. Agha and R. Panwar. “An Actor-Based Framework for Heterogeneous Computing Systems”. In Proc. Workshop on Heterogeneous Processing, pages 35-42, Mar. 1992.

[2] S. Bokhari. “Partitioning Problems in Parallel, Pipelined, and Distributed Computing”. IEEE Trans. on Computers, 37:48-57, Jan. 1988.

[3] M. Eshaghian and R. F. Freund. “Cluster-M Paradigms for High-order Heterogeneous Procedural Specification Computing”. In Proc. Workshop on Heterogeneous Processing, pages 47-49, Mar. 1992.

[4] R. F. Freund. “Optimal Selection Theory for Superconcurrency”. In Supercomputing '89, pages 699-703, Nov. 1989.

[5] R. F. Freund and D. S. Conwell. “Superconcurrency: A Form of Distributed Heterogeneous Supercomputing”. Supercomputing Review, 3:47-50, Oct. 1990.

[6] A. Khokhar, V. K. Prasanna, M. Shaaban, and C. Wang. “Heterogeneous Supercomputing: Problems and Issues”. In Proc. Workshop on Heterogeneous Processing, pages 3-12, Mar. 1992.

[7] J. Mahdavi, G. L. Huntoon, and M. B. Mathis. “Deployment of a HIPPI-based Distributed Supercomputing Environment at the Pittsburgh Supercomputing Center”. In Proc. Workshop on Heterogeneous Processing, pages 93-96, Mar. 1992.

[8] V. S. Sunderam. “PVM: A Framework for Parallel Distributed Computing”. Concurrency: Practice and Experience, 2(4):315-339, Dec. 1990.

[9] R. J. Vetter, D. H. C. Du, and A. E. Klietz. “Network Supercomputing: Experiment with a Cray-2 to CM-2 HIPPI Connection”. In Proc. Workshop on Heterogeneous Processing, pages 87-92, Mar. 1992.

[10] M. Wang, S. Kim, M. Nichols, R. Freund, and H. J. Siegel. “Augmenting the Optimal Selection Theory for Superconcurrency”. In Proc. Workshop on Heterogeneous Processing, pages 13-21, Mar. 1992.