Compressing Cube-Connected Cycles and Butterfly Networks †
Ralf Klasing, Reinhard Lüling, Burkhard Monien
Universität-GH Paderborn, FB 17,
Warburger Str. 100, W-4790 Paderborn, Germany
e-mail : [email protected], [email protected], [email protected]
Abstract
We consider the simulation of large cube-connected cycles (CCC) and large
butterfly networks (BFN) on smaller ones, a problem that arises when algorithms
designed for an architecture of an ideal size are to be executed on an existing architecture of a fixed size. We show that large CCC's and BFN's can be embedded
into smaller networks of the same type with (a) dilation 2 and optimum load, (b)
dilation 1 and optimum load in most cases, (c) dilation 1 and nearly optimum
load in all cases. Our results show that large CCC's and BFN's can be simulated
very efficiently on smaller ones. Additionally, we implemented our algorithm for
compressing CCC's and ran several experiments on a Transputer network, which
showed that our technique also behaves very well from a practical point of view.
A preliminary version of these results appears in: Proc. 2nd IEEE Symposium on Parallel and
Distributed Processing (1990), pp. 858-865.
† This work was supported by grant Mo 285/4-1 from the German Research Association (DFG).
1 Introduction
Over the past few years, a lot of research has been done in the field of interconnection
networks for parallel computer architectures (for a survey, cf. [15], [20]), as most of these
architectures can actually be realized in hardware (e.g. as a network of Transputers).
Much of the work has been focused on the capability of certain networks to simulate
other network or algorithm structures, in order to execute parallel algorithms of a special
structure efficiently on different processor networks (for some outstanding work, see e.g. [2],
[14]). But the problem generally neglected is that most of the existing algorithms are
designed for arbitrarily large networks (see e.g. [18, 21, 22]), whereas, in practice, the
processor network will be fixed and of smaller size. Thus, the larger network must be
simulated in an efficient way (i.e. needing little simulation time) on the smaller target
network.
Solutions to this problem, which is commonly modeled as a graph embedding problem,
have been proposed so far for common network structures like hypercubes, binary trees,
meshes, shuffle-exchange networks, deBruijn networks, etc. in [1, 3, 4, 5, 6, 7, 8, 10,
16, 17]. So far, only partial results are known about two classes of networks which are
very important for practical purposes, namely the cube-connected cycles (CCC), as
introduced in [18], and the butterfly network (BFN).
In [4], [8], and [17], embeddings with optimum dilation and load are presented in the
case of embedding CCC's and BFN's of dimension l into those of dimension k where $k \mid l$. The authors
also restrict themselves to special kinds of embeddings of a very regular structure, like
coverings [4], homogeneous emulations [8], and homomorphisms [17]. Because of the very
restricted nature, Bodlaender [4] and Peine [17] are also able to classify their embeddings
completely.
In [3], a general procedure is described for mapping parallel algorithms into parallel architectures. This procedure is applied to the CCC network achieving dilation 1, but very
high load. Also, only special kinds of embeddings, so-called contractions, are considered.
This paper investigates the embedding problem for CCC and BFN taking into account
general embedding functions and any possible network dimension. The central statement
derived is:
Large CCC's and BFN's can be simulated very efficiently (almost optimally)
on smaller ones.
In more detail, we prove for the cube-connected cycles network that CCC(l) can be embedded into CCC(k), l > k, with
(a) dilation 2 and optimum load, i.e. $\left\lceil \frac{l}{k}\, 2^{l-k} \right\rceil$.
(b) dilation 1 and optimum load, if $\frac{l}{k} \ge 2$.
(c) dilation 1 and optimum load, for certain values of l, k, if $\frac{l}{k} < 2$.
(d) dilation 1 and "nearly" optimum load, for all other values of l, k, if $\frac{l}{k} < 2$.
More precisely, the load in cases (c) and (d) is
$$\left\lceil \tfrac{2i-1}{i}\, 2^{l-k} \right\rceil \quad\text{for}\quad \tfrac{2i-3}{i-1} < \tfrac{l}{k} \le \tfrac{2i-1}{i}, \quad 2 \le i \le k.$$
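For the butterfly network, we show that BFN(l) can be embedded into BFN(k), l > k,
with (a) to (d) as above. Here, the load in cases (c) and (d) is specified by
$$\left\lceil \tfrac{2i-2}{i}\, 2^{l-k} \right\rceil \quad\text{for}\quad \tfrac{2i-4}{i-1} < \tfrac{l}{k} \le \tfrac{2i-2}{i}, \quad 7 \le i \le 2k,$$
$$\left\lceil \tfrac{5}{3}\, 2^{l-k} \right\rceil \quad\text{for}\quad \tfrac{l}{k} \le \tfrac{5}{3}.$$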
The general strategy of the embeddings is to map $2^{l-k}$ cycles in CCC(l)/BFN(l) of length
l onto a cycle in CCC(k)/BFN(k) of length k and to distribute their nodes as evenly as
possible on the new cycle. A specification or a variation of this general idea yields
many of the results above. But in one important case, namely the dilation 1 embedding of
BFN(l) into BFN(k) for $\frac{l}{k} < 2$, this construction is not powerful enough. (It only yields
load $2 \cdot 2^{l-k}$.) Here, we come up with a subtle method that allows local rearrangement of
nodes between different cycles. As an effect, the load is distributed more evenly in this
part of the network.
Our results have a major impact on many fields of parallel processing, as CCC's and
BFN's have been generally accepted as two benchmark architectures for multicomputers
because of their fixed degree and good routing capabilities. To show the practical applicability of our techniques, we built a tool which allows mapping of any CCC of dimension l
to a fixed CCC of dimension k < l. We present results for a distributed branch & bound
algorithm solving the Vertex Cover Problem [13] and for a program which simulates an
arbitrary distributed algorithm. Using our mapping tool, many important algorithms
for large CCC's and BFN's can be implemented very efficiently on a network of realistic
size. E.g. the simulation of a PRAM (parallel random access machine) on large BFN's as
described in [19] can now easily be transferred to a fixed network of processors configured
as a butterfly.
2 Basic Definitions and Proofs
(Most of the terminology is taken from [15], [18], [20].) Let n be a positive integer. Let $\bar a$
denote the binary complement of $a \in \{0,1\}$.
Networks
The cube-connected cycles network of dimension n, denoted by CCC(n), has vertex-set
$V_n = \{0, 1, \ldots, n-1\} \times \{0,1\}^n$. Its edges connect vertex
$(i, \alpha) = (i, a_0 a_1 \ldots a_{n-1})$
with both
$((i+1) \bmod n, \alpha)$ and $(i, \alpha(i))$,
where $\alpha(i) := a_0 a_1 \ldots a_{i-1} \bar a_i a_{i+1} \ldots a_{n-1}$. CCC(n) has $n 2^n$ nodes, $3n 2^{n-1}$ edges and
degree 3.
The butterfly network of dimension n, denoted by BFN(n), has vertex-set $V_n =
\{0, 1, \ldots, n-1\} \times \{0,1\}^n$. Its edges connect vertex
$(i, \alpha) = (i, a_0 a_1 \ldots a_{n-1})$
with both
$((i+1) \bmod n, \alpha)$ and $((i+1) \bmod n, \alpha(i))$.
BFN(n) has $n 2^n$ nodes, $n 2^{n+1}$ edges and degree 4.
An edge of the type $(i, \alpha) - ((i+1) \bmod n, \alpha)$ is called a cycle-edge, one of the type
$(i, \alpha) - (j, \alpha(i))$, $j \in \{i, (i+1) \bmod n\}$, a cross-edge. For each $\alpha \in \{0,1\}^n$, the cycle
$(0, \alpha) - (1, \alpha) - \ldots - (n-1, \alpha) - (0, \alpha)$
of length n will be denoted by $(\ast, \alpha)$.
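The node and edge counts stated above can be checked mechanically for a small dimension. The following Python sketch is illustrative only: the names `flip`, `ccc_edges` and `bfn_edges`, and the encoding of a bit string as an integer whose most significant bit is $a_0$, are assumptions made for this example and are not notation from the paper.

```python
def flip(a, i, n):
    """Complement bit a_i of the n-bit string a (a_0 is the most significant bit)."""
    return a ^ (1 << (n - 1 - i))

def ccc_edges(n):
    """Undirected edge set of CCC(n); a node is a pair (i, a)."""
    edges = set()
    for i in range(n):
        for a in range(2 ** n):
            edges.add(frozenset({(i, a), ((i + 1) % n, a)}))    # cycle-edge
            edges.add(frozenset({(i, a), (i, flip(a, i, n))}))  # cross-edge
    return edges

def bfn_edges(n):
    """Undirected edge set of BFN(n); a node is a pair (i, a)."""
    edges = set()
    for i in range(n):
        for a in range(2 ** n):
            j = (i + 1) % n
            edges.add(frozenset({(i, a), (j, a)}))              # cycle-edge
            edges.add(frozenset({(i, a), (j, flip(a, i, n))}))  # cross-edge
    return edges

n = 3
print(len(ccc_edges(n)), 3 * n * 2 ** (n - 1))  # both numbers should agree
print(len(bfn_edges(n)), n * 2 ** (n + 1))      # both numbers should agree
```

The check uses n = 3 because for n = 2 several of the edges defined above coincide, so the closed-form counts refer to the generic case.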
Lexicographical Ordering
For many of the proofs later on, we will need the notion of lexicographical ordering. For
this purpose, let the lexicographical numbering $\mathrm{Lex} : \{0, 1, \ldots, m-1\} \times \{0,1\}^n \to \mathbb{N}$ be
defined as
$$\mathrm{Lex}(i, a_0 a_1 \ldots a_{n-1}) = i\,2^n + a_0 2^{n-1} + a_1 2^{n-2} + \ldots + a_{n-1} 2^0 .$$
Then, the lexicographical order on $\{0, 1, \ldots, m-1\} \times \{0,1\}^n$ is specified by
$$(i, \alpha) < (j, \beta) \iff \mathrm{Lex}(i, \alpha) < \mathrm{Lex}(j, \beta),$$
and the lexicographical distance between $(i, \alpha)$ and $(j, \beta)$ is defined as
$$|\mathrm{Lex}(i, \alpha) - \mathrm{Lex}(j, \beta)| .$$
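As a small illustration of the numbering, the following Python sketch evaluates Lex and the lexicographical distance. Representing a bit string as a Python `str` of '0'/'1' characters and the function names `lex` and `lex_distance` are assumptions of this example.

```python
def lex(i, bits):
    """Lex(i, a_0 ... a_{n-1}) = i * 2^n + a_0 * 2^(n-1) + ... + a_{n-1} * 2^0."""
    n = len(bits)
    return i * 2 ** n + int(bits, 2)

def lex_distance(u, v):
    """Lexicographical distance between two pairs (i, bits) and (j, bits')."""
    return abs(lex(*u) - lex(*v))

print(lex(1, "010"))                         # 1 * 8 + 2
print(lex_distance((0, "111"), (1, "000")))  # consecutive in lexicographical order
```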
Network Simulations
Let G and H be finite undirected graphs. An embedding of G into H is a mapping f from
the nodes of G to the nodes of H. G is called the guest graph and H is called the host
graph of the embedding f. The dilation of the embedding f is the maximum distance
in the host between the images of adjacent guest nodes. Its load factor is the maximum
number of vertices of the guest graph G that are mapped to the same host graph vertex.
(The optimum load achievable is the ratio $\left\lceil \frac{|G|}{|H|} \right\rceil$ of the number of nodes in G and H.) Its
edge congestion is the maximum number of edges that are routed through a single edge
of H. (A routing is a mapping r of G's edges to paths in H, $r(v_1, v_2)$ = a path from $f(v_1)$
to $f(v_2)$ in H.)
An embedding of G into H is an abstraction of a simulation of G by H as an interconnection
network. The dilation and edge congestion are measures for the communication time, the
load for the maximum work to be done by a processor. In this paper, we focus on dilation
and load. Edge congestion will only play a minor role.
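To make the two cost measures concrete, the following Python sketch computes dilation and load of an embedding by brute force. It is only an illustration under stated assumptions: bit strings are encoded as integers with $a_0$ as the most significant bit, the helper names are hypothetical, and the embedding used is the simple map $f(i, \alpha) = (i \bmod k, a_0 \ldots a_{k-1})$ of CCC(l) into CCC(k) for the special case l = 2k with l = 6, k = 3.

```python
from collections import Counter, deque

def ccc_edges(n):
    """Edges of CCC(n) as ordered pairs of nodes (i, a)."""
    edges = set()
    for i in range(n):
        for a in range(2 ** n):
            edges.add(((i, a), ((i + 1) % n, a)))             # cycle-edge
            edges.add(((i, a), (i, a ^ (1 << (n - 1 - i)))))  # cross-edge
    return edges

def all_distances(edges):
    """Breadth-first search from every node; returns dist[s][t]."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    dist = {}
    for s in adj:
        seen = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        dist[s] = seen
    return dist

def dilation_and_load(guest_edges, f, host_dist):
    """Maximum host distance over guest edges, and maximum number of guest nodes per host node."""
    guest_nodes = {v for e in guest_edges for v in e}
    dilation = max(host_dist[f(u)][f(v)] for u, v in guest_edges)
    load = max(Counter(f(v) for v in guest_nodes).values())
    return dilation, load

l, k = 6, 3

def f(node):
    i, a = node
    return (i % k, a >> (l - k))   # keep the level mod k and the first k bits

result = dilation_and_load(ccc_edges(l), f, all_distances(ccc_edges(k)))
print(result)  # (dilation, load)
```

For this pair the optimum load is $\lceil \frac{l}{k} 2^{l-k} \rceil = 16$, which the brute-force computation can be compared against.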
2.1 The General Embedding Strategy
The basic idea of most of the embeddings presented here is to map $2^{l-k}$ cycles in
CCC(l)/BFN(l) of length l onto a cycle in CCC(k)/BFN(k) of length k and to distribute their nodes as evenly as possible on the new cycle by squeezing the old cycles
together in an appropriate way. Two different kinds of such embeddings are distinguished:
1st Construction:
The cycles of CCC(l)/BFN(l) are mapped together such that the first k nodes of each
cycle in CCC(l)/BFN(l) are mapped onto the k nodes of a cycle in CCC(k)/BFN(k)
and the remaining l − k nodes are distributed among the nodes of that cycle in increasing
order.
Those cycles $(\ast, a_0 a_1 \ldots a_{l-1})$ with the same sequence of bits $a_0 \ldots a_{k-1}$ are identified. The
distribution of the nodes on a cycle is determined by choosing a (distribution) function
$d : \{k, k+1, \ldots, l-1\} \times \{0,1\}^{l-k} \to \{0, 1, \ldots, k-1\}$ which is only applied to the significant
bits $a_k, a_{k+1}, \ldots, a_{l-1}$. (On each cycle, the same distribution function is used.) Formally,
the embedding f is of the form
$$f(i, a_0 a_1 \ldots a_{l-1}) := \begin{cases} (i,\; a_0 \ldots a_{k-1}) & \text{if } 0 \le i \le k-1, \\ (d(i, a_k a_{k+1} \ldots a_{l-1}),\; a_0 \ldots a_{k-1}) & \text{else.} \end{cases}$$
The load of f is determined by the distribution function d. Therefore, d should distribute
the nodes as evenly as possible on each cycle. All the cross-edges
$(i, \alpha) - (i, \alpha(i))$, $0 \le i \le k-1$,
$(i, \alpha) - (i+1, \alpha(i))$, $0 \le i \le k-2$
of CCC(l)/BFN(l) are mapped onto a corresponding cross-edge in CCC(k)/BFN(k).
Likewise, all the cycle-edges
$(i, \alpha) - (i+1, \alpha)$, $0 \le i \le k-2$
of CCC(l)/BFN(l) are mapped onto a corresponding cycle-edge in CCC(k)/BFN(k).
All the other edges of CCC(l)/BFN(l) are mapped onto a path on a single cycle in
CCC(k)/BFN(k). So, in this case the dilation depends directly on the distribution
d of the nodes on the cycle and partly conflicts with the evenness of the distribution
as explained above.
For low dilation, the nodes $(i, a_0 a_1 \ldots a_{l-1})$ and $(j, b_0 b_1 \ldots b_{l-1})$ of CCC(l)/BFN(l) with
a small lexicographical distance between $(i, a_k a_{k+1} \ldots a_{l-1})$ and $(j, b_k b_{k+1} \ldots b_{l-1})$ should
be mapped close together on a cycle in CCC(k)/BFN(k). For $\frac{l}{k} < 2$, the 1st construction
does not work at all, because there is no distribution d with a small load and dilation.
2nd Construction:
The cycles of CCC(l)/BFN(l) are mapped together such that the l nodes of each cycle
in CCC(l)/BFN(l) are distributed among the k nodes of a cycle in CCC(k)/BFN(k)
in increasing order.
The cycles to be mapped together are specified by selecting indices $\varphi(0), \varphi(1), \ldots,
\varphi(k-1) \in \{0, 1, \ldots, l-1\}$, $\varphi(0) < \varphi(1) < \ldots < \varphi(k-1)$, and by identifying
those cycles $(\ast, a_0 a_1 \ldots a_{l-1})$ with the same sequence of bits $a_{\varphi(0)} \ldots a_{\varphi(k-1)}$. The distribution of the nodes on a cycle is determined by choosing a (distribution) function
$d : \{0, 1, \ldots, l-1\} \times \{0,1\}^{l-k} \to \{0, 1, \ldots, k-1\}$ which is only applied to the significant bits $a_i$, $i \notin \{\varphi(0), \varphi(1), \ldots, \varphi(k-1)\}$. (On each cycle, the same distribution function is used.) Formally, let
$$a_0 \ldots a_{l-1} \setminus \varphi = a_0 \ldots a_{\varphi(0)-1}\, a_{\varphi(0)+1} \ldots a_{\varphi(i)-1}\, a_{\varphi(i)+1} \ldots a_{\varphi(k-1)-1}\, a_{\varphi(k-1)+1} \ldots a_{l-1} .$$
Then the embedding f is of the form
$$f(i, a_0 a_1 \ldots a_{l-1}) := (d(i, a_0 \ldots a_{l-1} \setminus \varphi),\; a_{\varphi(0)} \ldots a_{\varphi(k-1)}) .$$
Again, the load of f is determined by the distribution function d. Therefore, d should
distribute the nodes as evenly as possible on each cycle. All the cross-edges
$(i, \alpha) - (i, \alpha(i))$, $i \in \{\varphi(0), \varphi(1), \ldots, \varphi(k-1)\}$,
$(i, \alpha) - ((i+1) \bmod l, \alpha(i))$, $i \in \{\varphi(0), \varphi(1), \ldots, \varphi(k-1)\}$
of CCC(l)/BFN(l) are mapped onto a path consisting of one corresponding cross-edge in
CCC(k)/BFN(k) and two (possibly empty) paths on two different cycles. All the other
edges of CCC(l)/BFN(l) are mapped onto a path on a single cycle in CCC(k)/BFN(k).
In both cases, the dilation depends directly on the distribution d of the nodes on the
cycle and partly conflicts with the evenness of the distribution as explained above.
For low dilation, the values of $\varphi(0), \varphi(1), \ldots, \varphi(k-1)$ should be spread evenly
among $0, 1, \ldots, l-1$ (or in parts of these numbers), and the nodes $(i, a_0 a_1 \ldots a_{l-1})$
and $(j, b_0 b_1 \ldots b_{l-1})$ of CCC(l)/BFN(l) with a small lexicographical distance between
$(i, a_0 a_1 \ldots a_{l-1} \setminus \varphi)$ and $(j, b_0 b_1 \ldots b_{l-1} \setminus \varphi)$ should be mapped close together on a cycle in
CCC(k)/BFN(k).
Note that the 1st construction is a special case of the 2nd one by specifying
$\varphi(i) = i$, $0 \le i \le k-1$,
$$d_2(i, b_0 b_1 \ldots b_{l-k-1}) = \begin{cases} i & \text{if } 0 \le i \le k-1, \\ d_1(i, b_0 b_1 \ldots b_{l-k-1}) & \text{if } k \le i \le l-1, \end{cases}$$
where $d_1$ and $d_2$ denote the distribution d in the 1st and the 2nd construction respectively.
2.2 Dilation 2 Embedding of the CCC and the BFN
Theorem 1:
Let k, l be positive integers, l > k.
1. There is a dilation 2 embedding of BFN(l) into CCC(k) with optimum
load, i.e. $\left\lceil \frac{l}{k}\, 2^{l-k} \right\rceil$.
2. There is a dilation 2 embedding of CCC (l) into CCC (k) with optimum
load.
3. There is a dilation 2 embedding of BFN (l) into BFN (k) with optimum
load.
Proof:
BFN(l) can be embedded into CCC(k) with dilation 2 and optimum load by an obvious
choice of d and $\varphi$ in the 2nd construction of Section 2.1: Let d be the even distribution
in lexicographical order, i.e. $d : \{0, 1, \ldots, l-1\} \times \{0,1\}^{l-k} \to \{0, 1, \ldots, k-1\}$ satisfying
$d(0, 0^{l-k}) = 0$, $d(l-1, 1^{l-k}) = k-1$,
$d(i, \beta) \le d(i', \beta')$ if $(i, \beta) \le (i', \beta')$ according to the lexicographical order on
$\{0, 1, \ldots, l-1\} \times \{0,1\}^{l-k}$,
$\left\lceil \frac{l}{k}\, 2^{l-k} \right\rceil - 1 \le |d^{-1}(j)| \le \left\lceil \frac{l}{k}\, 2^{l-k} \right\rceil$ for all $j = 0, 1, \ldots, k-1$,
and choose $\varphi(i)$ such that $d(\varphi(i), 1^{l-k}) = i$ for all $0 \le i \le k-1$. (This ensures that
$i - 1 \le d(\varphi(i), \beta) \le i$ for all $i, \beta$.)
As the distribution d is even, f obviously has optimum load. As $i - 1 \le d(\varphi(i), \beta) \le i$ for
all $i, \beta$, all the cross-edges
$(i, \alpha) - ((i+1) \bmod l, \alpha(i))$, $i \in \{\varphi(0), \varphi(1), \ldots, \varphi(k-1)\}$
of BFN(l) are mapped onto a path consisting of one cross-edge and at most one cycle-edge in CCC(k). As the distribution d is even and in lexicographical order, all the other
edges of BFN(l) are mapped onto a path on a single cycle in CCC(k) of length at most
2. Therefore, f has dilation 2.
As CCC(n) is a subgraph of BFN(n) [9], there is also a dilation 1 embedding of CCC(n)
into BFN(n). Hence, an embedding of CCC(l) into CCC(k) with dilation 2 and optimum
load is obtained by first embedding CCC(l) into BFN(l) and then BFN(l) into CCC(k).
An embedding of BFN(l) into BFN(k) with dilation 2 and optimum load can be derived
analogously by first embedding BFN(l) into CCC(k) and then CCC(k) into BFN(k).
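The choice of d and of the selected levels in the proof of Theorem 1 can be illustrated as follows. This is a minimal sketch under assumptions: the floor-based formula is just one possible even distribution in lexicographical order, bit strings of length l − k are encoded as integers, and the names `even_distribution` and `choose_levels` are hypothetical.

```python
from collections import Counter

def even_distribution(l, k):
    """Return d(i, b): spreads {0..l-1} x {0..2^(l-k)-1} evenly over 0..k-1
    in lexicographical order (b is the integer value of the bit string)."""
    n_strings = 2 ** (l - k)
    total = l * n_strings
    return lambda i, b: ((i * n_strings + b) * k) // total

def choose_levels(l, k):
    """For each i in 0..k-1, pick the smallest level g with d(g, 1^(l-k)) = i."""
    d = even_distribution(l, k)
    ones = 2 ** (l - k) - 1                      # the all-ones string 1^(l-k)
    return [min(g for g in range(l) if d(g, ones) == i) for i in range(k)]

l, k = 5, 3
d = even_distribution(l, k)
levels = choose_levels(l, k)
sizes = Counter(d(i, b) for i in range(l) for b in range(2 ** (l - k)))
print(levels)                # selected levels, one per host level
print(max(sizes.values()))   # largest preimage of d, to compare with ceil(l/k * 2^(l-k))
```

For l = 5, k = 3 the largest preimage has size 7, which equals $\lceil \frac{5}{3} \cdot 4 \rceil$, and every selected level g satisfies $i - 1 \le d(g, \beta) \le i$ for its index i.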
2.3 Dilation 1 Embedding of the CCC
Theorem 2:
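Let k, l be positive integers, l > k. There is a dilation 1 embedding of CCC(l)
into CCC(k) with load
$$\begin{cases} \left\lceil \frac{l}{k}\, 2^{l-k} \right\rceil & \text{for } \frac{l}{k} \ge 2, \\[4pt] \left\lceil \frac{2p-1}{p}\, 2^{l-k} \right\rceil & \text{for } \frac{2p-3}{p-1} < \frac{l}{k} \le \frac{2p-1}{p}, \quad 2 \le p \le k. \end{cases}$$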
Proof:
(A) $\frac{l}{k} \ge 2$
Each of the two constructions of Section 2.1 yields a straightforward way to embed CCC(l)
into CCC(k) with dilation 1 and optimum load: In each case, let d be the even distribution
in lexicographical order, and for the 2nd construction, choose $\varphi(i)$ such that $d(\varphi(i), \beta) = i$
for all $\beta \in \{0,1\}^{l-k}$, $0 \le i \le k-1$. ($\varphi(0), \varphi(1), \ldots, \varphi(k-1)$ exist because $\frac{l}{k} \ge 2$.)
As the distribution d is even, f has optimum load. All the cross-edges
$(i, \alpha) - (i, \alpha(i))$, $0 \le i \le k-1$ (1st constr.),
$(i, \alpha) - (i, \alpha(i))$, $i \in \{\varphi(0), \varphi(1), \ldots, \varphi(k-1)\}$ (2nd constr.)
of CCC(l) are mapped onto a corresponding cross-edge in CCC(k). All the other edges
are mapped onto a cycle-edge or onto a single node in CCC(k).
Note that the edge congestion of the first construction is at least $2 \cdot 2^{l-k}$ and at most
$\frac{5}{2}\, 2^{l-k}$, that of the second one is at least $2^{l-k}$ and at most $\frac{3}{2}\, 2^{l-k}$. Therefore, the second
embedding should be preferred.
(B) $\frac{2p-3}{p-1} < \frac{l}{k} \le \frac{2p-1}{p}$, $2 \le p \le k$
CCC(l) can be embedded into CCC(k) with dilation 1 and load $\left\lceil \frac{2p-1}{p}\, 2^{l-k} \right\rceil$ by specifying
d and $\varphi$ in the 2nd construction of Section 2.1 as described below.
As already explained in Section 2.1, the 1st construction does not work at all for dilation
1 and small load when $\frac{l}{k} < 2$. The 2nd one still works quite well for dilation 1, but
optimum load cannot be guaranteed any longer. The load can only be balanced in certain
sections of each cycle $(\ast, \beta)$ in CCC(k). The aim is to make these sections as large as
possible and almost equally long. To achieve this, the values of $\varphi(0), \ldots, \varphi(k-1)$ must
be spread evenly among $0, 1, \ldots, l-1$.
Therefore, let $\varphi(i) := \left\lceil \frac{il}{k} \right\rceil$ for $-1 \le i \le k$ (where $\varphi(-1) = -1$ and $\varphi(k) = l$ are
defined for formal reasons). Now, each cycle $(\ast, \beta)$ in CCC(k) is partitioned into sections
$B_0, B_1, \ldots, B_{2k-l-1}$ where
$$B_j = (i_j + 1, \beta) - (i_j + 2, \beta) - \ldots - (i_{j+1} - 1, \beta) - (i_{j+1}, \beta),$$
and $i_0, i_1, \ldots, i_{2k-l-1}$ are iteratively defined by
$i_0 = -1$;
$\forall j \in \{1, 2, \ldots, 2k-l-1\}$: Let $i_j > i_{j-1}$ such that
$\varphi(i+1) - \varphi(i) = 2$ for all $i_{j-1} < i < i_j$,
$\varphi(i_j + 1) - \varphi(i_j) = 1$.
(Note that $i_0, i_1, \ldots, i_{2k-l-1}$ are well-defined because $1 \le \varphi(i+1) - \varphi(i) \le 2$ for all
$-1 \le i \le k-1$ and that $i_{2k-l-1} = k-1$.) It is easy to verify that
$$\varphi(i+p) - \varphi(i) \le 2p - 1 \quad\text{for all } -1 \le i \le k-p .$$
Therefore, $i_{j+1} - i_j \le p$ for all $0 \le j \le 2k-l-2$ (if $i_{j+1} - i_j > p$, then
$\varphi(i_j + 1 + p) - \varphi(i_j + 1) = 2p$). Now, in section $B_j$, d can be chosen as the even distribution
of $\{\varphi(i_j) + 1, \varphi(i_j) + 2, \ldots, \varphi(i_{j+1})\} \times \{0,1\}^{l-k}$ among $i_j + 1, i_j + 2, \ldots, i_{j+1}$ in succession,
according to the lexicographical order on $\{0, 1, \ldots, l-1\} \times \{0,1\}^{l-k}$.
This yields load $\left\lceil \frac{2p-1}{p}\, 2^{l-k} \right\rceil$ for the embedding f. And it is guaranteed that $d(\varphi(i), \beta) = i$
for all $0 \le i \le k-1$, $\beta \in \{0,1\}^{l-k}$. Therefore, all the cross-edges
$(i, \alpha) - (i, \alpha(i))$, $i \in \{\varphi(0), \varphi(1), \ldots, \varphi(k-1)\}$
of CCC(l) are mapped onto a corresponding cross-edge in CCC(k). All the other edges
are mapped onto a cycle-edge or onto a single node in CCC(k).
It is clear that the above construction yields dilation 1 and optimum load when
$$\frac{l}{k} = \frac{2p-1}{p} .$$
It can also be shown that the only other cases with dilation 1 and optimum load are
$l = k + 1$ or
$(l, k) \in \{(7, 5), (8, 6), (9, 7), (10, 7), (13, 9)\}$.
Therefore, the smallest non-optimal pairs (l, k) with this construction are (8,5), (11,7),
(12,7), (10,8), (11,8) and (13,8).
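The load of Theorem 2 and the section structure used in part (B) of its proof can be evaluated numerically. The sketch below is an illustration under assumptions: the function names are hypothetical, loads are computed with integer arithmetic, and `section_load` recomputes the load directly from the levels $\lceil il/k \rceil$ by cutting each cycle at the indices where consecutive levels differ by 1.

```python
def ceil_div(a, b):
    """Ceiling of a / b for integers with b > 0."""
    return -(-a // b)

def optimum_load(l, k):
    return ceil_div(l * 2 ** (l - k), k)

def theorem2_load(l, k):
    """Load stated in Theorem 2 for embedding CCC(l) into CCC(k), l > k."""
    n = 2 ** (l - k)
    if l >= 2 * k:
        return ceil_div(l * n, k)
    # smallest p with l/k <= (2p-1)/p; p = k always qualifies when l <= 2k-1
    p = next(p for p in range(2, k + 1) if l * p <= (2 * p - 1) * k)
    return ceil_div((2 * p - 1) * n, p)

def section_load(l, k):
    """Load obtained section by section for 1 < l/k < 2."""
    n = 2 ** (l - k)
    level = [ceil_div(i * l, k) for i in range(-1, k + 1)]   # level[i + 1] = ceil(i*l/k)
    cuts = [i for i in range(-1, k) if level[i + 2] - level[i + 1] == 1]
    loads = []
    for lo, hi in zip(cuts, cuts[1:]):
        s = hi - lo                                  # host levels in the section
        loads.append(ceil_div((2 * s - 1) * n, s))   # 2s - 1 guest levels are spread over them
    return max(loads)

for pair in [(7, 5), (8, 5), (12, 7)]:
    print(pair, theorem2_load(*pair), section_load(*pair), optimum_load(*pair))
```

For the pairs listed in the text, the closed form reproduces the classification: the load equals the optimum for (7,5), (8,6), (9,7), (10,7), (13,9) and exceeds it for (8,5), (11,7), (12,7), (10,8), (11,8), (13,8).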
2.4 Dilation 1 Embedding of the BFN
Theorem 3:
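Let k, l be positive integers, l > k. There is a dilation 1 embedding of BFN(l)
into BFN(k) with load
$$\begin{cases} \left\lceil \frac{l}{k}\, 2^{l-k} \right\rceil & \text{for } \frac{l}{k} \ge 2, \\[4pt] \left\lceil \frac{2p-2}{p}\, 2^{l-k} \right\rceil & \text{for } \frac{2p-4}{p-1} < \frac{l}{k} \le \frac{2p-2}{p}, \quad 7 \le p \le 2k, \\[4pt] \left\lceil \frac{5}{3}\, 2^{l-k} \right\rceil & \text{for } \frac{l}{k} \le \frac{5}{3}. \end{cases}$$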
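The three cases of Theorem 3 can be evaluated with integer arithmetic. The following Python sketch is only an illustration; the function names are hypothetical, and the search for p relies on the observation that the intervals for consecutive values of p are adjacent, so the smallest admissible p is the right one.

```python
def ceil_div(a, b):
    """Ceiling of a / b for integers with b > 0."""
    return -(-a // b)

def theorem3_load(l, k):
    """Load stated in Theorem 3 for embedding BFN(l) into BFN(k), l > k."""
    n = 2 ** (l - k)
    if l >= 2 * k:                      # l/k >= 2
        return ceil_div(l * n, k)
    if 3 * l <= 5 * k:                  # l/k <= 5/3
        return ceil_div(5 * n, 3)
    # 5/3 < l/k < 2: smallest p in 7..2k with l/k <= (2p-2)/p
    p = next(p for p in range(7, 2 * k + 1) if l * p <= (2 * p - 2) * k)
    return ceil_div((2 * p - 2) * n, p)

for pair in [(6, 3), (8, 5), (7, 4), (9, 5), (12, 7)]:
    print(pair, theorem3_load(*pair), ceil_div(pair[0] * 2 ** (pair[0] - pair[1]), pair[1]))
```

The second printed number is the optimum load $\lceil \frac{l}{k} 2^{l-k} \rceil$, so the two columns show where the stated load is optimal and where it is only nearly optimal.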
Proof:
(A) $\frac{l}{k} \ge 2$
BFN(l) can be embedded into BFN(k) with dilation 1 and optimum load by defining d
in the 1st construction of Section 2.1 as described below.
Note that the same embeddings as in the proof of Theorem 2 for the CCC network do
not work as well for the BFN. The second embedding only yields dilation 2, because the
cross-edges
$(i, \alpha) - ((i+1) \bmod l, \alpha(i))$, $i \in \{\varphi(0), \varphi(1), \ldots, \varphi(k-1)\}$
might be stretched to length 2. The first embedding only achieves dilation 1 and optimum
load for $\frac{l}{k} \ge \frac{9}{4}$. If $2 \le \frac{l}{k} < \frac{9}{4}$, then the cross-edges
$(i, \alpha) - ((i+1) \bmod l, \alpha(i))$, $k+1 \le i \le l-1$
might be stretched to length 2, because the lexicographical distance between
$(i, a_k a_{k+1} \ldots a_{l-1})$ and $((i+1) \bmod l, a_k a_{k+1} \ldots \bar a_i \ldots a_{l-1})$ can be up to $\frac{5}{4}\, 2^{l-k}$ (for
$i = k+1$). So, if we distribute the nodes on each cycle in BFN(k) according to the
lexicographical order, each node of BFN(k) must have capacity more than $\frac{9}{4}\, 2^{l-k}$ in
order to guarantee dilation 1.
The problem for $2 \le \frac{l}{k} < \frac{9}{4}$ can be overcome by using a slightly different distribution d for
the nodes on each cycle. The idea is to distribute the elements in the first and the second
half of each cycle in a different way. Before distributing the nodes evenly in lexicographical
order, the crucial bit $a_i$ of each node $(i, \alpha) = (i, a_0 a_1 \ldots a_{l-1})$, $k+1 \le i \le l-1$, is shifted
towards the end of the string in order to reduce the lexicographical distance between
$(i, a_k a_{k+1} \ldots a_{l-1})$ and $(i+1, a_k a_{k+1} \ldots \bar a_i \ldots a_{l-1})$, $k+1 \le i \le l-2$. This can be done
by reversing the part $a_{k+1} a_{k+2} \ldots a_{l-1}$ in the first half of the cycle in BFN(k) (i.e. if
$k \le i \le \left\lfloor \frac{l+k}{2} \right\rfloor - 1$). In the second half (i.e. if $\left\lfloor \frac{l+k}{2} \right\rfloor \le i \le l-1$), no change is needed. As
we will see later on, for the edges
$(i, \alpha) - (i+1, \alpha)$, $(i, \alpha) - (i+1, \alpha(i))$ for $i = \left\lfloor \frac{l+k}{2} \right\rfloor - 1$
in the middle of the cycle, it is important to leave the highest bit $a_k$ in its original position
and only to reverse the remaining part $a_{k+1} a_{k+2} \ldots a_{l-1}$ in the first half of the cycle.
Formally, let $d_0 : \{k, k+1, \ldots, l-1\} \times \{0,1\}^{l-k} \to \{0, 1, \ldots, k-1\}$ be the even distribution
of the elements of $\{k, k+1, \ldots, l-1\} \times \{0,1\}^{l-k}$ among $0, 1, \ldots, k-1$ in succession,
according to the lexicographical order on $\{k, k+1, \ldots, l-1\} \times \{0,1\}^{l-k}$. Then the
distribution d is defined as
$$d(i, a_k a_{k+1} \ldots a_{l-1}) := \begin{cases} d_0(i, a_k a_{l-1} a_{l-2} \ldots a_{k+1}) & \text{if } k \le i \le \left\lfloor \frac{l+k}{2} \right\rfloor - 1, \\[4pt] d_0(i, a_k a_{k+1} \ldots a_{l-1}) & \text{if } \left\lfloor \frac{l+k}{2} \right\rfloor \le i \le l-1. \end{cases}$$
If $l = 2k$, then $f(i, a_0 a_1 \ldots a_{l-1}) = (i \bmod k, a_0 \ldots a_{k-1})$, and the cross-edges
$(i, \alpha) - ((i+1) \bmod l, \alpha(i))$, $k \le i \le l-1$,
of BFN(l) are mapped onto a corresponding cycle-edge in BFN(k).
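Let $l \ge 2k + 1$, and let $d_0$ denote the even distribution in lexicographical order from which d is obtained by the reversal in the first half of the cycle. Considering the cross-edge
$(i, \alpha) - (i+1, \alpha(i))$, $k+1 \le i \le \left\lfloor \frac{l+k}{2} \right\rfloor - 2$,
$(i, \alpha) = (i, a_0 a_1 \ldots a_{l-1})$ is mapped onto
$(d(i, a_k \ldots a_{l-1}), a_0 \ldots a_{k-1}) = (d_0(i, a_k a_{l-1} \ldots a_{k+1}), a_0 \ldots a_{k-1})$
and $(i+1, \alpha(i))$ onto
$(d(i+1, a_k \ldots a_{i-1} \bar a_i a_{i+1} \ldots a_{l-1}), a_0 \ldots a_{k-1}) = (d_0(i+1, a_k a_{l-1} \ldots a_{i+1} \bar a_i a_{i-1} \ldots a_{k+1}), a_0 \ldots a_{k-1})$.
The lexicographical distance between
$(i, a_k a_{l-1} a_{l-2} \ldots a_{k+1})$ and $(i+1, a_k a_{l-1} \ldots a_{i+1} \bar a_i a_{i-1} \ldots a_{k+1})$
is at most
$$\left(1 + \left(\tfrac{1}{2}\right)^{\lceil \frac{l-k}{2} \rceil + 3}\right) 2^{l-k} \le \left(1 + \left(\tfrac{1}{2}\right)^{\lceil \frac{k+1}{2} \rceil + 3}\right) 2^{l-k} \le \left(1 + \tfrac{1}{k}\right) 2^{l-k} \le \tfrac{l-k}{k}\, 2^{l-k} .$$
Likewise, for the cross-edge
$(i, \alpha) - (i+1, \alpha(i))$, $\left\lfloor \frac{l+k}{2} \right\rfloor \le i \le l-2$,
$(i, \alpha) = (i, a_0 a_1 \ldots a_{l-1})$ is mapped onto
$(d(i, a_k \ldots a_{l-1}), a_0 \ldots a_{k-1}) = (d_0(i, a_k a_{k+1} \ldots a_{l-1}), a_0 \ldots a_{k-1})$
and $(i+1, \alpha(i))$ onto
$(d(i+1, a_k \ldots a_{i-1} \bar a_i a_{i+1} \ldots a_{l-1}), a_0 \ldots a_{k-1}) = (d_0(i+1, a_k \ldots a_{i-1} \bar a_i a_{i+1} \ldots a_{l-1}), a_0 \ldots a_{k-1})$.
The lexicographical distance between
$(i, a_k a_{k+1} \ldots a_{l-1})$ and $(i+1, a_k a_{k+1} \ldots a_{i-1} \bar a_i a_{i+1} \ldots a_{l-1})$
is at most
$$\left(1 + \left(\tfrac{1}{2}\right)^{\lfloor \frac{l-k}{2} \rfloor + 1}\right) 2^{l-k} \le \left(1 + \left(\tfrac{1}{2}\right)^{\lfloor \frac{k+1}{2} \rfloor + 1}\right) 2^{l-k} \le \tfrac{l-k}{k}\, 2^{l-k} .$$
Hence, in both cases the capacity of each node in BFN(k) is sufficient to keep all the
nodes between $(i, \alpha)$ and $(i+1, \alpha(i))$.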
The remaining problem is posed by the edges
$(i, \alpha) - (i+1, \alpha)$, $(i, \alpha) - (i+1, \alpha(i))$ for $i = \left\lfloor \frac{l+k}{2} \right\rfloor - 1$
between the first and the second half of each cycle in BFN(k). In order to guarantee
dilation 1 for these edges, the nodes must be aligned in a proper way in the middle of
each cycle in BFN(k). This can be achieved e.g. by demanding that the underlying even distribution $d_0$
(to which d applies the reversal in the first half) be "symmetric" in the first and the second half of each cycle in BFN(k), i.e. it
satisfies the property
$$|d_0^{-1}(j)| = |d_0^{-1}(k - 1 - j)| \quad\text{for all } j = 0, 1, \ldots, \left\lfloor \tfrac{k}{2} \right\rfloor - 1 . \qquad (\ast)$$
Now, considering the edges
$(i, \alpha) - (i+1, \alpha)$, $(i, \alpha) - (i+1, \alpha(i))$ for $i = \left\lfloor \frac{l+k}{2} \right\rfloor - 1$,
$(i, \alpha) = (i, a_0 a_1 \ldots a_{l-1})$ is mapped onto
$(d(i, a_k \ldots a_{l-1}), a_0 \ldots a_{k-1}) = (d_0(i, a_k a_{l-1} \ldots a_{k+1}), a_0 \ldots a_{k-1})$
and $(i+1, a_0 \ldots a_{i-1}\, b\, a_{i+1} \ldots a_{l-1})$, $b \in \{a_i, \bar a_i\}$, onto
$(d(i+1, a_k \ldots a_{i-1}\, b\, a_{i+1} \ldots a_{l-1}), a_0 \ldots a_{k-1}) = (d_0(i+1, a_k \ldots a_{i-1}\, b\, a_{i+1} \ldots a_{l-1}), a_0 \ldots a_{k-1})$.
By using $\frac{l}{k} \ge 2$ and property $(\ast)$, it can easily be verified (cf. [11]) that
$$\left| d_0\!\left(\left\lfloor \tfrac{l+k}{2} \right\rfloor, a\beta\right) - d_0\!\left(\left\lfloor \tfrac{l+k}{2} \right\rfloor - 1, a\gamma\right) \right| \le 1 \quad\text{for all } a \in \{0,1\},\ \beta, \gamma \in \{0,1\}^{l-k-1} .$$
Hence, the two image nodes of
1. $(i, \alpha)$ and $(i+1, \alpha)$, $i = \left\lfloor \frac{l+k}{2} \right\rfloor - 1$,
2. $(i, \alpha)$ and $(i+1, \alpha(i))$, $i = \left\lfloor \frac{l+k}{2} \right\rfloor - 1$,
have at most distance 1 on the cycle in BFN(k).
(B) $1 < \frac{l}{k} < 2$
The embedding f of BFN(l) into BFN(k) is described in two stages:
1st Stage:
BFN(l) is embedded into BFN(k) with dilation 1 and load $2 \cdot 2^{l-k}$ by specifying d and $\varphi$ in the 2nd construction of Section 2.1 as described below.
Note that the same embedding as in the proof of Theorem 2 for the CCC network does
not work as well for the BFN, because the cross-edges
$(i, \alpha) - ((i+1) \bmod l, \alpha(i))$, $i \in \{\varphi(0), \varphi(1), \ldots, \varphi(k-1)\}$
might be stretched to length 2. For dilation 1, the problem is that in the case of the
BFN, because of these cross-edges, not only the nodes $(i, \alpha)$, $i = \varphi(j)$, $0 \le j \le k-1$,
have to be mapped to level j of a cycle in BFN(k), but also the nodes $((i+1) \bmod l, \alpha)$
have to be mapped to level $(j+1) \bmod k$.
So, once $0 \le \varphi(0) < \varphi(1) < \ldots < \varphi(k-1) \le l-1$ are chosen for the 2nd construction of
Section 2.1, the distribution $d(i, \beta)$ is already determined for all
$i \in \{\varphi(0), \varphi(0)+1, \varphi(1), \varphi(1)+1, \ldots, \varphi(k-1), (\varphi(k-1)+1) \bmod l\}$.
In order to achieve a low load, the values of $\varphi(0), \varphi(1), \ldots, \varphi(k-1)$ must be spread as
evenly as possible among $0, 1, \ldots, l-1$. The best possible load is obtained by demanding
$\varphi(i) - \varphi(i-1) \le 2$ for all $0 \le i \le k-1$,
where $\varphi(-1) = -1$ for formal reasons. ($\varphi(0), \varphi(1), \ldots, \varphi(k-1)$ exist because $\frac{l}{k} < 2$.)
This way, the distribution $d(i, \beta)$ is defined completely as
$d(\varphi(j), \beta) = j$,
$d((\varphi(j)+1) \bmod l, \beta) = (j+1) \bmod k$ for all $0 \le j \le k-1$.
Then, the dilation of the embedding is 1, and the nodes $(i, \beta)$ with $\varphi(i) - \varphi(i-1) = 1$
have load $2^{l-k}$ and those with $\varphi(i) - \varphi(i-1) = 2$ have load $2 \cdot 2^{l-k}$.
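The first stage can be traced on a small instance. The sketch below is an illustration under assumptions: it picks the particular selection $\lceil il/k \rceil$ of levels (the choice used later in case (B2)), works on levels only, and reports loads in units of $2^{l-k}$; the name `first_stage` is hypothetical. The returned string marks each level of BFN(l) as A (selected) or B (not selected), in the terminology introduced in the second stage.

```python
def ceil_div(a, b):
    """Ceiling of a / b for integers with b > 0."""
    return -(-a // b)

def first_stage(l, k):
    """Level assignment of the first stage, assuming 1 < l/k < 2 and the
    selected levels ceil(i*l/k), i = 0..k-1."""
    selected = [ceil_div(i * l, k) for i in range(k)]
    host_level = {}
    for j, g in enumerate(selected):
        host_level[g] = j                        # selected level g goes to host level j
        host_level[(g + 1) % l] = (j + 1) % k    # its successor goes to host level j + 1
    pattern = "".join("A" if g in selected else "B" for g in range(l))
    loads = [sum(1 for g in range(l) if host_level[g] == j) for j in range(k)]
    return pattern, loads

pattern, loads = first_stage(8, 5)
print(pattern)  # A/B pattern along a cycle of BFN(8)
print(loads)    # load of each host level, in units of 2^(l-k)
```

Host levels receiving two guest levels are the overloaded nodes of load $2 \cdot 2^{l-k}$ that the second stage tries to relieve.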
2nd Stage:
The nodes are locally rearranged between different cycles in order to improve the load of
the embedding.
For this purpose, let us call a node $v = (i, \alpha)$ in BFN(l) an A-node if $i \in
\{\varphi(0), \varphi(1), \ldots, \varphi(k-1)\}$, i.e. the cycle-edge
$(i, \alpha) - ((i+1) \bmod l, \alpha)$
is mapped to a corresponding cycle-edge in BFN(k). Otherwise, v is called a B-node,
and the cycle-edge
$(i, \alpha) - ((i+1) \bmod l, \alpha)$
is mapped to a single node in BFN(k).
By choosing $\varphi(0), \varphi(1), \ldots, \varphi(k-1)$ appropriately in the first stage, one tries to partition
the cycles of BFN(k) into certain segments in which the load can be rearranged between
linked cycles.
Segment A
One type of such segment where a rearrangement is possible is a sequence of nodes
BAABA as displayed in Figure 1.
[Figure 1: Segment A. A node sequence BAABA on levels $i_1, \ldots, i_5$ of BFN(l), shown for the cycles $\Gamma_1, \ldots, \Gamma_4$ of BFN(k); the image nodes $I_1$ and $I_3$ have load $2 \cdot 2^{l-k}$, the image node $I_2$ has load $2^{l-k}$.]
Prior nodes of BFN(l) are indicated by black dots and small letters, image
nodes in BFN(k) by capitals. Prior edges of BFN(l) are illustrated by lines,
edges in BFN(k) are equivalent to four parallel lines. Formally, we have
$f(i_1, \gamma_j) = f(i_2, \gamma_j) = (I_1, \Gamma_j)$,
$f(i_3, \gamma_j) = (I_2, \Gamma_j)$,
$f(i_4, \gamma_j) = f(i_5, \gamma_j) = (I_3, \Gamma_j)$,
where f is the mapping after the first stage, and $\gamma_j$ stands for any of the prior cycles
of BFN(l) that are mapped onto $\Gamma_j$.
In the situation of Figure 1, the nodes on 2 groups of four prior cycles each
can be moved as shown in Figure 2. (An
arrow indicates that a prior vertex of BFN(l) is transferred from one node in BFN(k)
to another.)
[Figure 2: Rearrangement A, 1 element is transferred from $I_1$ to $I_2$ and from $I_3$ to $I_2$.]
By applying Rearrangement A once or twice in the situation of Figure 1, the following
results are obtained for each cycle $\Gamma_i$ in BFN(k):
• By rearrangements on 4 (2) prior cycles of BFN(l), 2 (1) elements of the upper and
the lower, overloaded nodes $I_1$, $I_3$ in BFN(k) are transferred to the middle node $I_2$.
• Dilation 1 is kept up if the elements $i_1$ and $i_5$ on the affected prior cycles stay where
they are. This means that on these cycles no movements are allowed directly above
or underneath the shown section. But all the other cycles are not affected.
Segment B
The second type of segment we consider is a sequence of nodes BABA as displayed in
Figure 3. (For an explanation of the figures below, see the descriptions for Segment
A.)
[Figure 3: Segment B. A node sequence BABA on levels $i_1, \ldots, i_4$ of BFN(l), shown for the cycles $\Gamma_1$, $\Gamma_2$ of BFN(k); both image nodes $I_1$ and $I_2$ have load $2 \cdot 2^{l-k}$.]
Four kinds of moves can be constructed similarly to above (see Figures 4, 5, 6, 7), each
affecting 2 prior cycles of BFN(l). Any two of them may also be combined.
[Figure 4: Rearrangement B1, 1 element is transferred from $I_2$ to $I_1$ and from $I_1$ to $I_0$.]
[Figure 5: Rearrangement B2, 1 element is transferred from $I_1$ to $I_2$ and from $I_2$ to $I_3$.]
[Figure 6: Rearrangement B3, 1 element is transferred from $I_2$ to $I_1$.]
[Figure 7: Rearrangement B4, 1 element is transferred from $I_1$ to $I_2$.]
Again, dilation 1 is kept up if the elements $i_1$ and $i_4$ on the affected prior cycles stay where
they are. This means that on these cycles no movements are allowed directly above or
underneath the shown section. But all the other cycles are not affected.
A combination of the rearrangement techniques outlined above and an appropriate choice
of $\varphi(0), \varphi(1), \ldots, \varphi(k-1)$ in the first stage leads to the desired load of the final embedding.
From now on, in the diagrams a node drawn with two dots will always be associated with a node in BFN(k) of load $2 \cdot 2^{l-k}$, which is equivalent to
a node sequence BA in BFN(l). A node drawn with a single dot
will be associated with a node in BFN(k) of load $2^{l-k}$, which is equivalent to an A-node
in BFN(l).
We distinguish different cases:
(B1) $\frac{l}{k} \le \frac{5}{3}$
I. l − k even:
Let $l - k = 2i + 2$, $i \ge 0$. By the right choice of $\varphi(0), \ldots, \varphi(k-1)$ in the first stage, each
cycle can be subdivided into a node sequence $(ABAAB)^{i+1} A$. The load is rearranged as
follows:
[Diagram: rearrangement of the load along the node sequence. Arrows between neighbouring nodes are labelled with the number of nodes moved; the displayed pattern is repeated i times.]
An arrow and a number next to it indicate how many nodes are moved
from
k one place
j
l
?
k
to another. In each section
of the cycle, by using Rearrangement
[A] 2 times (thus
j
k
l
m
l
?
k
l
?
k
l
?
k
aecting at most 2 2 + 2 2 cycles), a load of 2 is achieved.
1
3
1
3
5
3
18
II. l − k odd:
With similar techniques as in Case I, a load of $\left\lceil \frac{5}{3}\, 2^{l-k} \right\rceil$ can be derived. A detailed
discussion can be found in [11]. The only thing which has to be checked in every case is
that at most $2^{l-k}$ cycles of BFN(l) are affected by the rearrangements anywhere in the
cycle in BFN(k).
(B2) $\frac{2p-4}{p-1} < \frac{l}{k} \le \frac{2p-2}{p}$, $7 \le p \le 2k$
By choosing $\varphi(i) := \left\lceil \frac{il}{k} \right\rceil$ for $0 \le i \le k-1$ in the first stage, each cycle is partitioned into
sections of the following two types:
Type 1: $((AB)^{t_1} AAB (AB)^{t_2} AB)^{+}$
Type 2: $((AB)^{t_3} AAB (AB)^{2t_4} AAB (AB)^{t_3} AB)^{+}$
where
$t_1 = \lfloor \frac{p-4}{4} \rfloor$, $t_2 = \lfloor \frac{p-6}{4} \rfloor$, $t_3 = \lfloor \frac{p-1}{4} \rfloor$, $t_4 = \lfloor \frac{p-3}{4} \rfloor$.
Let
$n = t_1 + t_2 + 3 = \lfloor \frac{p}{2} \rfloor$,
$m = 2t_3 + 2t_4 + 5 = p - 1$ if $p$ is even, and $m = p$ if $p$ is odd,
be the lengths of the two sections above. It is shown below that the load can be distributed
optimally in each of these sections, thus yielding a load of
$\lceil \frac{2n-1}{n} 2^{l-k} \rceil$ in sections of Type 1,
$\lceil \frac{2m-2}{m} 2^{l-k} \rceil$ in sections of Type 2.
Therefore, the whole embedding has load at most $\lceil \frac{2p-2}{p} 2^{l-k} \rceil$.
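The bound on the load of the whole embedding can be checked numerically. The following Python sketch is only an illustration: it assumes that the first-stage choice $\lceil il/k \rceil$ marks the first level of BFN(l) assigned to level $i$ of BFN(k), so that consecutive differences give the number of BFN(l) levels merged into each BFN(k) level. This reading and all function names are assumptions made for the example, not definitions taken from the paper.

```python
def ceil_div(a: int, b: int) -> int:
    """Ceiling of a / b for integers a >= 0 and b > 0."""
    return -(-a // b)


def first_stage_blocks(l: int, k: int) -> list[int]:
    # Assumed reading: level i of BFN(k) starts at level ceil(i*l/k) of BFN(l).
    starts = [ceil_div(i * l, k) for i in range(k)] + [l]
    # A block of size 2 corresponds to a node of load 2 * 2^(l-k),
    # a block of size 1 to a node of load 2^(l-k).
    return [starts[i + 1] - starts[i] for i in range(k)]


def section_lengths(p: int) -> tuple[int, int]:
    # n = floor(p/2); m = p - 1 for even p, m = p for odd p.
    n = p // 2
    m = p - 1 if p % 2 == 0 else p
    return n, m


def section_loads(p: int, diff: int) -> tuple[int, int, int]:
    # diff stands for l - k; returns (Type 1 load, Type 2 load, overall bound).
    x = 2 ** diff
    n, m = section_lengths(p)
    type1 = ceil_div((2 * n - 1) * x, n)
    type2 = ceil_div((2 * m - 2) * x, m)
    bound = ceil_div((2 * p - 2) * x, p)
    return type1, type2, bound


if __name__ == "__main__":
    print(first_stage_blocks(7, 4))   # level blocks for l = 7, k = 4
    print(section_loads(8, 3))        # p = 8, l - k = 3
```

Under these assumptions, l = 7 and k = 4 (which falls into case (B2) with p = 8) give three blocks of two levels and one block of a single level, and both section loads stay within the overall bound.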
Load Balancing in Sections of Type 1
In sections of Type 1, the load can be distributed evenly by shifting it from the overloaded
nodes on the outside to the underloaded nodes in the middle. This is done by applying
Rearrangements [B1], ..., [B4] in the parts drawn with two nodes, and by using Rearrangement [A] in the
parts drawn with three nodes. The only thing to make sure is that at most $2^{l-k}$ cycles of BFN(l) are affected by
the rearrangements anywhere in the cycle in BFN(k).
It turns out that the cases $n \in \{4r+3, 4r+4, 4r+5, 4r+6\}$, for $r \in \mathbb{N}_0$, have to be
distinguished. Here, we only state the case $n = 4r+3$. All the other cases work in a
similar way (cf. [11]). In each case, the load derived is $\lceil \frac{2n-1}{n} 2^{l-k} \rceil$.
Let $n = 4r+3$, and let $\lfloor \frac{1}{n} 2^{l-k} \rfloor$ be abbreviated by $L$. Then the load in each section can
be balanced as follows:
[Diagram: balancing of a Type 1 section for $n = 4r+3$. Arrows labeled $1L, 2L, \ldots, rL, (r+1)L$ indicate how many nodes are moved; the two bracketed parts are each repeated $2r$ times.]
If $r = 0$ (i.e. $l = 5$, $k = 3$), at most $2L + 2 = 4 \le 2^{l-k}$ cycles are affected by the
rearrangements. If $r \ge 1$ (i.e. $l - k \ge 6$), at most $(2rL + 2) + (2(r+1)L + 2) \le 2^{l-k}$ cycles
are affected.
The load derived is $2 \cdot 2^{l-k} - L = \lceil \frac{2n-1}{n} 2^{l-k} \rceil$.
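The two situations distinguished above can be evaluated for concrete values. The sketch below is a worked example only: it takes $L = \lfloor 2^{l-k}/n \rfloor$ with $n = 4r+3$, evaluates the stated expressions for the number of affected cycles, and compares the load $2 \cdot 2^{l-k} - L$ with $\lceil \frac{2n-1}{n} 2^{l-k} \rceil$. The function name `type1_case` and the parameter `diff` (standing for $l - k$) are hypothetical.

```python
def type1_case(r: int, diff: int) -> tuple[int, int, int, int]:
    """Return (n, L, affected cycles, load) for a Type 1 section with n = 4r + 3."""
    n = 4 * r + 3
    x = 2 ** diff                 # x = 2^(l-k)
    big_l = x // n                # L = floor(2^(l-k) / n)
    if r == 0:
        affected = 2 * big_l + 2
    else:
        affected = (2 * r * big_l + 2) + (2 * (r + 1) * big_l + 2)
    load = 2 * x - big_l          # load after balancing
    return n, big_l, affected, load


if __name__ == "__main__":
    for r, diff in [(0, 2), (1, 6)]:
        n, big_l, affected, load = type1_case(r, diff)
        expected = -(-(2 * n - 1) * 2 ** diff // n)   # ceil((2n-1)/n * 2^(l-k))
        print(r, diff, n, big_l, affected, load, expected)
```

For r = 0 with l = 5, k = 3 this gives L = 1, four affected cycles and a load of 7. For r = 1 with l - k = 6 it gives L = 9, 58 affected cycles (at most 64) and a load of 119.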
Load Balancing in Sections of Type 2
In sections of Type 2, the load can be distributed evenly by shifting it in the same manner
in the upper and the lower half of each section, namely by shifting it from the overloaded
nodes on the outside of each half to the underloaded nodes in the middle. The way this
is done has already been explained for sections of Type 1.
It turns out that the cases $m \in \{8r+7, 8r+9, 8r+11, 8r+13\}$, for $r \in \mathbb{N}_0$, have to be
distinguished. Here, we only state the case $m = 8r+7$. All the other cases work in a
similar way (cf. [11]). In each case, the load derived is $\lceil \frac{2m-2}{m} 2^{l-k} \rceil$.
Let $m = 8r+7$, and let $\lfloor \frac{2}{m} 2^{l-k} \rfloor$ be abbreviated by $L$. Then the load in each section can
be balanced as follows:
[Diagram: balancing of a Type 2 section for $m = 8r+7$. Arrows labeled $1L, \ldots, rL, \lceil (r+\frac{1}{2}) L \rceil, (r+1)L$ indicate how many nodes are moved; the bracketed parts are repeated $2r$ and $2r+1$ times; the remaining part is the reverse of the part above with $\lfloor \cdot \rfloor$ instead of $\lceil \cdot \rceil$.]
As $(2 \lceil (r+\frac{1}{2}) L \rceil + 2) + (2(r+1)L + 2) \le 2^{l-k}$ (for $l - k \ge 5$), at most $2^{l-k}$ cycles are
affected.
The load derived is $2 \cdot 2^{l-k} - L = \lceil \frac{2m-2}{m} 2^{l-k} \rceil$.
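The Type 2 expressions can be evaluated in the same way. The following sketch assumes $L = \lfloor 2 \cdot 2^{l-k}/m \rfloor$ with $m = 8r+7$ and simply computes the affected-cycle expression and the resulting load; `type2_case` and `diff` (for $l - k$) are illustrative names, not part of the paper.

```python
def type2_case(r: int, diff: int) -> tuple[int, int, int, int]:
    """Return (m, L, affected cycles, load) for a Type 2 section with m = 8r + 7."""
    m = 8 * r + 7
    x = 2 ** diff                          # x = 2^(l-k)
    big_l = (2 * x) // m                   # L = floor(2 * 2^(l-k) / m)
    half = -(-(2 * r + 1) * big_l // 2)    # ceil((r + 1/2) * L)
    affected = (2 * half + 2) + (2 * (r + 1) * big_l + 2)
    load = 2 * x - big_l                   # load after balancing
    return m, big_l, affected, load


if __name__ == "__main__":
    m, big_l, affected, load = type2_case(0, 5)
    expected = -(-(2 * m - 2) * 2 ** 5 // m)   # ceil((2m-2)/m * 2^(l-k))
    print(m, big_l, affected, load, expected)
```

For r = 0 and l - k = 5 this yields L = 9, exactly 32 affected cycles and a load of 55, which agrees with $\lceil \frac{12}{7} \cdot 32 \rceil$.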
3 Experimental Results
To program a distributed system, one has to specify a configuration map of the network
topology. This map describes the process-processor mapping, the communication channels
between the processes, and the mapping of the logical communication channels (between
the processes) to the physical communication channels (the link connections between the
processors).
Our compression tool takes any configuration map for a logical CCC network of dimension
$l$ and builds a configuration map for a physical CCC of any dimension $k \le l$ by applying
the algorithms of chapter 2.
Since the edge congestion and dilation are greater than one in some cases, we have to
integrate multiplexing and routing processes in the program for the target CCC. This
is done automatically by our tool. The user only has to describe the configuration of
the logical CCC and to specify the dimension of the physical CCC. Thus, a number of user
processes are mapped to one processor and run in parallel with the routing and multiplexing
processes. As a result of this compression, many adjacent processes are mapped to the same
processor, which results in faster communication between them.
To measure the overhead which is caused by the additional routing and multiplexing
processes, we first wrote different user programs $P_k$ consisting of $k \cdot 2^k$ processes configured
as a CCC of dimension $k$. Every program performs $x$ iterations, each one consisting of $y$
internal dummy operations and one communication with a process of the network. We
executed program $P_3$ on CCC(3) and the compressed version of $P_3$ on CCC(2). We also
ran $P_2$ on CCC(2) and the compressed version of $P_2$ on CCC(1).
Table 1 shows the execution times in seconds and the resulting overhead for different $x$
and $y$. One can see that the overhead is even smaller than the optimal load factor of
the corresponding embedding (i.e. 4 when embedding CCC(2) into CCC(1) and 3 when
embedding CCC(3) into CCC(2)) if there is a large amount of communication between
the processes. This is due to the fact that communication between processes on the same
processor is approximately 2.5 times as fast as communication between linked processors.
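The two load factors quoted above follow directly from the network sizes, since a CCC of dimension $k$ has $k \cdot 2^k$ nodes. A minimal Python sketch (the function names are chosen for this example only):

```python
def ccc_size(d: int) -> int:
    """Number of nodes of a cube-connected cycles network of dimension d."""
    return d * 2 ** d


def optimal_load(l: int, k: int) -> int:
    """Smallest possible load when CCC(l) is embedded into CCC(k): ceil(|CCC(l)| / |CCC(k)|)."""
    return -(-ccc_size(l) // ccc_size(k))


if __name__ == "__main__":
    print(optimal_load(2, 1))  # CCC(2) into CCC(1)
    print(optimal_load(3, 2))  # CCC(3) into CCC(2)
```

CCC(2) has 8 nodes and CCC(1) has 2, which gives the factor 4; CCC(3) has 24 nodes and CCC(2) has 8, which gives the factor 3.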
Table 2 shows results for a distributed branch & bound algorithm solving the Vertex
Cover Problem (cf. [13]). We tested the algorithm for 10 instances on CCC(2) and, after
the compression, for the same instances on CCC(1). The results show that there is no
overhead at all because of the huge amount of communication for this algorithm.
       y       x     CCC(2)  CCC(2→1)  overhead     CCC(3)  CCC(3→2)  overhead
10000000       1     280.19   1160.27     4.141     855.36   2566.55     3.000
 1000000      10     280.43   1161.22     4.140     857.00   2573.24     3.002
  100000     100     282.83   1172.08     4.144     873.44   2629.53     3.010
   10000    1000     307.00   1276.25     4.157    1030.26   3189.78     3.096
    1000    1000      57.64    233.31     4.047     335.86    900.26     2.680
     100   10000     439.48   1333.64     3.034    3189.14   8341.35     2.615
      10   10000     433.57   1268.10     2.924    3182.23   8339.98     2.620
       1  100000    4328.18  12641.41     2.920   31850.16  83329.14     2.616
Table 1: Compressing an Arbitrary Distributed Algorithm
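The overhead columns of Table 1 are the ratios of the execution time on the compressed network to the execution time on the original one. The sketch below recomputes these ratios from the times in the table; the tuple layout and the function name `overheads` are illustrative assumptions.

```python
# Rows of Table 1:
# (y, x, time CCC(2), time CCC(2->1), reported overhead,
#        time CCC(3), time CCC(3->2), reported overhead)
TABLE_1 = [
    (10_000_000, 1, 280.19, 1160.27, 4.141, 855.36, 2566.55, 3.000),
    (1_000_000, 10, 280.43, 1161.22, 4.140, 857.00, 2573.24, 3.002),
    (100_000, 100, 282.83, 1172.08, 4.144, 873.44, 2629.53, 3.010),
    (10_000, 1000, 307.00, 1276.25, 4.157, 1030.26, 3189.78, 3.096),
    (1000, 1000, 57.64, 233.31, 4.047, 335.86, 900.26, 2.680),
    (100, 10_000, 439.48, 1333.64, 3.034, 3189.14, 8341.35, 2.615),
    (10, 10_000, 433.57, 1268.10, 2.924, 3182.23, 8339.98, 2.620),
    (1, 100_000, 4328.18, 12641.41, 2.920, 31850.16, 83329.14, 2.616),
]


def overheads(rows):
    """Return (y, x, ratio 2->1, reported 2->1, ratio 3->2, reported 3->2) per row."""
    result = []
    for y, x, t2, t21, rep21, t3, t32, rep32 in rows:
        result.append((y, x, t21 / t2, rep21, t32 / t3, rep32))
    return result


if __name__ == "__main__":
    for y, x, o21, rep21, o32, rep32 in overheads(TABLE_1):
        print(f"y={y:>8} x={x:>6}  2->1: {o21:.3f} ({rep21:.3f})  3->2: {o32:.3f} ({rep32:.3f})")
```

The recomputed ratios agree with the reported values up to the last printed digit. The rows with many iterations and few dummy operations are the ones whose overhead falls below the load factors 4 and 3.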
ID   CCC(2)  CCC(2→1)  overhead
 0    65.91    263.85     4.003
 1    46.48    175.14     3.881
 2    59.38    234.04     3.940
 3    56.55    223.29     3.948
 4    38.75    153.34     3.956
 5    73.02    285.20     3.905
 6    47.55    187.30     3.938
 7    46.23    179.91     3.891
 8    66.73    266.04     3.986
 9    57.35    203.50     3.548
Table 2: Compressing a Branch & Bound Algorithm
4 Conclusion
In this paper, new solutions to the problem of simulating large networks on smaller ones
are proposed for the cube-connected cycles (CCC) and the butterfly network (BFN),
two classes of networks which are of major importance for practical applications. It is
demonstrated that the problem can be solved very efficiently for these networks, i.e. compared to the inherent slowdown, the simulation takes only a small amount of additional
running time (due to communication overhead or to an imbalanced distribution of the
workload).
For the analysis, the network simulation problem is modeled as a graph embedding problem, transferring the desired properties of the simulation into properties of the corresponding embedding (dilation, load, edge congestion). A number of embeddings are presented
with small dilation and/or small load, leading to the main simulation result above. In
addition to these properties, we show that our techniques behave very well for real applications.
In summary, many fundamental applications for large
CCCs and BFNs can be implemented very efficiently on a network of realistic size by
using the results of this paper, which underlines that the CCC and the BFN are
very powerful interconnection structures for computer architectures in parallel processing.
An interesting question to investigate is whether the non-optimal dilation 1 embeddings
of large CCCs and BFNs (of dimension $l$) into smaller ones (of dimension $k$) can still be
improved with respect to their load or whether the embeddings presented can be shown
to be optimal. In some special cases, e.g. $(l, k) = (8, 5)$ for the CCC and $(l, k) = (3, 2)$,
$(l, k) = (6, 4)$ for the BFN, it is possible to achieve optimum load by applying
techniques similar to those described in this paper (see [11]). We are also investigating a new idea which
might lead to an improvement in many cases when embedding CCC(l) into CCC(k),
$l/k < 2$, e.g. for $(l, k) = (11, 7)$. But especially for the BFN, even for simple cases such
as the embedding of BFN(4) (with 64 processors) into BFN(3) (with 24 processors), it
is not clear whether the load can be improved any further. Finally, a further study should
also consider the edge congestion of the embeddings and try to minimize it while maintaining
the same dilation and load.
Acknowledgement
The authors would like to thank Juraj Hromkovic for helpful discussions and a careful
reading of the manuscript.
References
[1] M.J. Atallah, S.R. Kosaraju, "Optimal simulations between mesh-connected arrays
of processors", Journal of the ACM, vol. 35 (1988), pp. 635-650.
[2] S.N. Bhatt, F.R.K. Chung, J.-W. Hong, F.T. Leighton, A.L. Rosenberg, "Optimal
Simulations by Butterfly Networks", Proceedings of the 20th ACM Symposium on the
Theory of Computing (1988), pp. 192-204.
[3] F. Berman, L. Snyder, "On mapping parallel algorithms into parallel architectures",
Journal of Parallel and Distributed Computing, vol. 4 (1987), pp. 439-458.
[4] H.L. Bodlaender, "The classification of coverings of processor networks", Journal of
Parallel and Distributed Computing, vol. 6 (1989), pp. 166-182.
[5] H.L. Bodlaender, J. van Leeuwen, "Simulation of large networks on smaller networks", Information and Control, vol. 71 (1986), pp. 143-180.
[6] J. Ellis, Z. Miller, I.H. Sudborough, "Compressing meshes into small hypercubes",
Technical Report, Computer Science Program, University of Texas at Dallas,
Richardson, TX 75083-0688.
[7] M.R. Fellows, "Encoding graphs in graphs", Ph.D. Dissertation (1985), University
of California at San Diego.
[8] J.P. Fishburn, R.A. Finkel, "Quotient networks", IEEE Transactions on Computers,
vol. C-31 (1982), pp. 288-295.
[9] R. Feldmann, W. Unger, "The Cube-Connected-Cycle is a subgraph of the Butterfly
network", Department of Mathematics and Computer Science, University of Paderborn, Germany, 1990, submitted for publication.
[10] A.K. Gupta, S.E. Hambrusch, "Embedding large tree machines into small ones", Proceedings of the 5th MIT Conference on Advanced Research in VLSI (1988), pp. 179-198.
[11] R. Klasing, "Simulating large cube-connected cycles and large butterfly networks on
smaller ones", Master Thesis (1990), Universität-GH Paderborn, Fachbereich 17 -
Mathematik/Informatik, Germany.
[12] R. Klasing, R. Lüling, B. Monien, "Compressing cube-connected cycles and butterfly networks", Proc. 2nd IEEE Symposium on Parallel and Distributed Processing,
pp. 858-865, 1990.
[13] R. Lüling, B. Monien, "Two strategies for solving the Vertex Cover Problem on a
Transputer Network", 3rd International Workshop on Distributed Algorithms 1989,
LNCS 392, pp. 160-170.
[14] B. Monien, "Simulating binary trees on X-trees", Proc. 3rd ACM Symposium on
Parallel Algorithms and Architectures (SPAA 91), pp. 147-158, 1991.
[15] B. Monien, I.H. Sudborough, "Embedding one Interconnection Network in Another",
Computing Suppl. 7 (1990), pp. 257-282.
[16] P.A. Nelson, L. Snyder, "Programming solutions to the algorithm contraction problem", Proceedings of the 1986 International Conference on Parallel Processing,
pp. 258-261.
[17] R. Peine, "Cayley-Graphen und Netzwerke", Master Thesis (1990), Universität-GH
Paderborn, Fachbereich 17 - Mathematik/Informatik, Germany.
[18] F.P. Preparata, J.E. Vuillemin, "The cube-connected cycles: a versatile network for
parallel computation", Communications of the ACM, vol. 24 (1981), pp. 300-309.
[19] A.G. Ranade, "How to emulate shared memory", Proceedings of the 28th IEEE Symposium on Foundations of Computer Science (1987), pp. 185-194.
[20] A.L. Rosenberg, "Graph embeddings 1988: Recent breakthroughs, new directions",
Proceedings of the 3rd Aegean Workshop on Computing (AWOC): VLSI Algorithms
and Architectures (1988), LNCS 319, pp. 160-169.
[21] J.T. Schwartz, "Ultracomputers", ACM Transactions on Programming Languages
and Systems, vol. 2 (1980), pp. 484-521.
[22] H.S. Stone, "Parallel processing with the perfect shuffle", IEEE Transactions on
Computers, vol. C-20 (1971), pp. 153-161.