Compressing Cube-Connected Cycles and Butterfly Networks†

Ralf Klasing, Reinhard Lüling, Burkhard Monien
Universität-GH Paderborn, FB 17, Warburger Str. 100, W-4790 Paderborn, Germany

Abstract

We consider the simulation of large cube-connected cycles (CCC) and large butterfly networks (BFN) on smaller ones. This problem arises when algorithms designed for an architecture of an ideal size are to be executed on an existing architecture of a fixed size. We show that large CCCs and BFNs can be embedded into smaller networks of the same type with (a) dilation 2 and optimum load, (b) dilation 1 and optimum load in most cases, and (c) dilation 1 and nearly optimum load in all cases. Our results show that large CCCs and BFNs can be simulated very efficiently on smaller ones. Additionally, we implemented our algorithm for compressing CCCs and ran several experiments on a Transputer network, which showed that our technique also behaves very well from a practical point of view.

A preliminary version of these results appears in: Proc. 2nd IEEE Symposium on Parallel and Distributed Processing (1990), pp. 858-865.

† This work was supported by grant Mo 285/4-1 from the German Research Association (DFG).

1 Introduction

Over the past few years, a lot of research has been done in the field of interconnection networks for parallel computer architectures (for a survey, cf. [15], [20]), as most of these architectures can actually be realized in hardware (e.g. as a network of Transputers). Much of the work has focused on the capability of certain networks to simulate other network or algorithm structures, in order to execute parallel algorithms of a special structure efficiently on different processor networks (for some outstanding work, see e.g. [2], [14]). The problem generally neglected, however, is that most of the existing algorithms are designed for arbitrarily large networks (see e.g.
[18, 21, 22]), whereas, in practice, the processor network will be fixed and of smaller size. Thus, the larger network must be simulated efficiently (i.e. with little simulation time) on the smaller target network. Solutions to this problem, which is commonly modeled as a graph embedding problem, have been proposed for common network structures such as hypercubes, binary trees, meshes, shuffle-exchange networks and de Bruijn networks in [1, 3, 4, 5, 6, 7, 8, 10, 16, 17].

So far, only partial results are known for two classes of networks that are very important for practical purposes, namely the cube-connected cycles (CCC), as introduced in [18], and the butterfly network (BFN). In [4], [8], and [17], embeddings with optimum dilation and load are presented for embedding CCCs and BFNs of dimension $l$ into those of dimension $k$ where $k \mid l$. These authors also restrict themselves to special kinds of embeddings of a very regular structure, such as coverings [4], homogeneous emulations [8], and homomorphisms [17]. Because of this very restricted nature, Bodlaender [4] and Peine [17] are able to classify their embeddings completely. In [3], a general procedure is described for mapping parallel algorithms onto parallel architectures. This procedure is applied to the CCC network, achieving dilation 1 but a very high load; moreover, only special kinds of embeddings, so-called contractions, are considered.

This paper investigates the embedding problem for CCC and BFN, taking into account general embedding functions and all possible network dimensions. The central statement derived is: large CCCs and BFNs can be simulated very efficiently (almost optimally) on smaller ones. In more detail, we prove for the cube-connected cycles network that $\mathrm{CCC}(l)$ can be embedded into $\mathrm{CCC}(k)$, $l > k$, with

(a) dilation 2 and optimum load, i.e. $\lceil \frac{l}{k} 2^{l-k} \rceil$,
(b) dilation 1 and optimum load, if $\frac{l}{k} \ge 2$,
(c) dilation 1 and optimum load, for certain values of $l, k$, if $\frac{l}{k} < 2$,
(d) dilation 1 and "nearly" optimum load, for all other values of $l, k$, if $\frac{l}{k} < 2$.

More precisely, the load in cases (c) and (d) is
$$\left\lceil \frac{2i-1}{i}\, 2^{l-k} \right\rceil \quad \text{for} \quad \frac{2i-3}{i-1} < \frac{l}{k} \le \frac{2i-1}{i},\ 2 \le i \le k.$$

For the butterfly network, we show that $\mathrm{BFN}(l)$ can be embedded into $\mathrm{BFN}(k)$, $l > k$, with (a) to (d) as above. Here, the load in cases (c) and (d) is
$$\begin{cases} \left\lceil \frac{2i-2}{i}\, 2^{l-k} \right\rceil & \text{for } \frac{2i-4}{i-1} < \frac{l}{k} \le \frac{2i-2}{i},\ 7 \le i \le 2k,\\[4pt] \left\lceil \frac{5}{3}\, 2^{l-k} \right\rceil & \text{for } \frac{l}{k} \le \frac{5}{3}. \end{cases}$$

The general strategy of the embeddings is to map $2^{l-k}$ cycles of length $l$ in $\mathrm{CCC}(l)$/$\mathrm{BFN}(l)$ onto one cycle of length $k$ in $\mathrm{CCC}(k)$/$\mathrm{BFN}(k)$ and to distribute their nodes as evenly as possible on the new cycle. A specialization or variation of this general idea yields most of the results above. In one important case, however, namely the dilation 1 embedding of $\mathrm{BFN}(l)$ into $\mathrm{BFN}(k)$ for $\frac{l}{k} < 2$, this construction is not powerful enough (it only yields load $2 \cdot 2^{l-k}$). Here, we introduce a subtle method that allows a local rearrangement of nodes between different cycles. As a result, the load is distributed more evenly in this part of the network.

Our results have a major impact on many fields of parallel processing, as CCCs and BFNs have been generally accepted as two benchmark architectures for multicomputers because of their fixed degree and good routing capabilities. To show the practical applicability of our techniques, we built a tool that maps any CCC of dimension $l$ onto a fixed CCC of dimension $k < l$. We present results for a distributed branch & bound algorithm solving the Vertex Cover Problem [13] and for a program that simulates an arbitrary distributed algorithm. Using our mapping tool, many important algorithms for large CCCs and BFNs can be implemented very efficiently on a network of realistic size. For example, the simulation of a PRAM (parallel random access machine) on large BFNs as described in [19] can now easily be transferred to a fixed network of processors configured as a butterfly.
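To make the load bounds above concrete, the following Python sketch evaluates them for given $(l, k)$ and compares them with the optimum load $\lceil \frac{l}{k} 2^{l-k} \rceil$. It only encodes the case distinctions as stated here, using exact rational arithmetic so that boundary cases such as $\frac{l}{k} = \frac{2i-1}{i}$ are classified correctly. The function names `optimum_load`, `ccc_load_bound` and `bfn_load_bound` are hypothetical and not part of the authors' tool.

```python
from fractions import Fraction
from math import ceil


def optimum_load(l, k):
    """ceil(|CCC(l)| / |CCC(k)|) = ceil((l/k) * 2^(l-k)); the same holds for BFN."""
    return ceil(Fraction(l * 2 ** (l - k), k))


def ccc_load_bound(l, k):
    """Dilation-1 load for CCC(l) -> CCC(k) as stated in cases (b) to (d)."""
    M = 2 ** (l - k)
    if l >= 2 * k:
        return optimum_load(l, k)
    for i in range(2, k + 1):
        # (2i-3)/(i-1) < l/k <= (2i-1)/i, compared with integers only
        if (2 * i - 3) * k < l * (i - 1) and l * i <= (2 * i - 1) * k:
            return ceil(Fraction((2 * i - 1) * M, i))
    raise ValueError("expected k < l")


def bfn_load_bound(l, k):
    """Dilation-1 load for BFN(l) -> BFN(k) as stated in cases (b) to (d)."""
    M = 2 ** (l - k)
    if l >= 2 * k:
        return optimum_load(l, k)
    if 3 * l <= 5 * k:  # l/k <= 5/3
        return ceil(Fraction(5 * M, 3))
    for i in range(7, 2 * k + 1):
        # (2i-4)/(i-1) < l/k <= (2i-2)/i
        if (2 * i - 4) * k < l * (i - 1) and l * i <= (2 * i - 2) * k:
            return ceil(Fraction((2 * i - 2) * M, i))
    raise ValueError("expected k < l")
```

For example, the pair $(l, k) = (8, 5)$ yields a CCC load of 14 against an optimum of 13, in line with the non-optimal pairs listed after Theorem 2.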
2 Basic Definitions and Proofs

(Most of the terminology is taken from [15], [18], [20].) Let $n$ be a positive integer, and let $\bar{a}$ denote the binary complement of $a \in \{0,1\}$.

Networks

The cube-connected cycles network of dimension $n$, denoted by $\mathrm{CCC}(n)$, has vertex set
$$V_n = \{0, 1, \dots, n-1\} \times \{0,1\}^n.$$
Its edges connect vertex $(i, \alpha) = (i, a_0 a_1 \dots a_{n-1})$ with both $((i+1) \bmod n, \alpha)$ and $(i, \alpha(i))$, where $\alpha(i) := a_0 a_1 \dots a_{i-1} \bar{a}_i a_{i+1} \dots a_{n-1}$. $\mathrm{CCC}(n)$ has $n 2^n$ nodes, $3n 2^{n-1}$ edges and degree 3.

The butterfly network of dimension $n$, denoted by $\mathrm{BFN}(n)$, has vertex set
$$V_n = \{0, 1, \dots, n-1\} \times \{0,1\}^n.$$
Its edges connect vertex $(i, \alpha) = (i, a_0 a_1 \dots a_{n-1})$ with both $((i+1) \bmod n, \alpha)$ and $((i+1) \bmod n, \alpha(i))$. $\mathrm{BFN}(n)$ has $n 2^n$ nodes, $n 2^{n+1}$ edges and degree 4.

An edge of the type $(i, \alpha) - ((i+1) \bmod n, \alpha)$ is called a cycle-edge, one of the type $(i, \alpha) - (j, \alpha(i))$, $j \in \{i, (i+1) \bmod n\}$, a cross-edge. For each $\alpha \in \{0,1\}^n$, the cycle $(0, \alpha) - (1, \alpha) - \dots - (n-1, \alpha) - (0, \alpha)$ of length $n$ will be denoted by $(\ast, \alpha)$.

Lexicographical Ordering

For many of the proofs later on, we will need the notion of lexicographical ordering. For this purpose, let the lexicographical numbering $\mathrm{Lex}: \{0, 1, \dots, m-1\} \times \{0,1\}^n \to \mathbb{N}$ be defined as
$$\mathrm{Lex}(i, a_0 a_1 \dots a_{n-1}) = i 2^n + a_0 2^{n-1} + a_1 2^{n-2} + \dots + a_{n-1} 2^0.$$
Then the lexicographical order on $\{0, 1, \dots, m-1\} \times \{0,1\}^n$ is specified by
$$(i, \alpha) < (j, \beta) \iff \mathrm{Lex}(i, \alpha) < \mathrm{Lex}(j, \beta),$$
and the lexicographical distance between $(i, \alpha)$ and $(j, \beta)$ is defined as $|\mathrm{Lex}(i, \alpha) - \mathrm{Lex}(j, \beta)|$.

Network Simulations

Let $G$ and $H$ be finite undirected graphs. An embedding of $G$ into $H$ is a mapping $f$ from the nodes of $G$ to the nodes of $H$. $G$ is called the guest graph and $H$ the host graph of the embedding $f$. The dilation of $f$ is the maximum distance in the host between the images of adjacent guest nodes. Its load factor is the maximum number of vertices of the guest graph $G$ that are mapped to the same host graph vertex.
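The network definitions and the lexicographical numbering above can be made concrete with a minimal Python sketch. It assumes that nodes are represented as pairs `(i, alpha)` with `alpha` a tuple of bits; `flip`, `ccc_edges`, `bfn_edges`, `lex` and `degrees` are hypothetical helper names. Building both edge sets explicitly allows checking the stated node, edge and degree counts for small $n \ge 3$ (for $n \le 2$ some cycle-edges coincide, so the counts do not apply there).

```python
from collections import Counter
from itertools import product


def flip(alpha, i):
    """alpha(i): complement bit a_i of the bit tuple alpha = (a_0, ..., a_{n-1})."""
    return alpha[:i] + (1 - alpha[i],) + alpha[i + 1:]


def ccc_edges(n):
    """Undirected edge set of CCC(n): cycle-edges and cross-edges (i, a)-(i, a(i))."""
    edges = set()
    for i, a in product(range(n), product((0, 1), repeat=n)):
        edges.add(frozenset({(i, a), ((i + 1) % n, a)}))
        edges.add(frozenset({(i, a), (i, flip(a, i))}))
    return edges


def bfn_edges(n):
    """Undirected edge set of BFN(n): (i, a) to ((i+1) mod n, a) and ((i+1) mod n, a(i))."""
    edges = set()
    for i, a in product(range(n), product((0, 1), repeat=n)):
        edges.add(frozenset({(i, a), ((i + 1) % n, a)}))
        edges.add(frozenset({(i, a), ((i + 1) % n, flip(a, i))}))
    return edges


def lex(i, alpha):
    """Lex(i, a_0...a_{n-1}) = i*2^n + a_0*2^(n-1) + ... + a_{n-1}*2^0 (Horner scheme)."""
    value = i
    for bit in alpha:
        value = 2 * value + bit
    return value


def degrees(edges):
    """Degree of every node that occurs in the edge set."""
    return Counter(v for e in edges for v in e)
```

For instance, `len(ccc_edges(4))` equals $3 \cdot 4 \cdot 2^3 = 96$, and `lex(1, (0, 1, 1))` equals $1 \cdot 2^3 + 3 = 11$.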
The optimum load achievable is $\lceil |G| / |H| \rceil$, i.e. the ratio of the numbers of nodes of $G$ and $H$, rounded up. The edge congestion of $f$ is the maximum number of edges of $G$ that are routed through a single edge of $H$. (A routing is a mapping $r$ of the edges of $G$ to paths in $H$, where $r(v_1, v_2)$ is a path from $f(v_1)$ to $f(v_2)$ in $H$.)

An embedding of $G$ into $H$ is an abstraction of a simulation of $G$ by $H$ as an interconnection network. The dilation and the edge congestion are measures for the communication time, the load is a measure for the maximum work to be done by a processor. In this paper, we focus on dilation and load; edge congestion only plays a minor role.

2.1 The General Embedding Strategy

The basic idea of most of the embeddings presented here is to map $2^{l-k}$ cycles of length $l$ in $\mathrm{CCC}(l)$/$\mathrm{BFN}(l)$ onto one cycle of length $k$ in $\mathrm{CCC}(k)$/$\mathrm{BFN}(k)$ and to distribute their nodes as evenly as possible on the new cycle by squeezing the old cycles together in an appropriate way. Two different kinds of such embeddings are distinguished.

1st Construction: The cycles of $\mathrm{CCC}(l)$/$\mathrm{BFN}(l)$ are mapped together such that the first $k$ nodes of each cycle in $\mathrm{CCC}(l)$/$\mathrm{BFN}(l)$ are mapped onto the $k$ nodes of a cycle in $\mathrm{CCC}(k)$/$\mathrm{BFN}(k)$, and the remaining $l-k$ nodes are distributed among the nodes of that cycle in increasing order. Those cycles $(\ast, a_0 a_1 \dots a_{l-1})$ with the same sequence of bits $a_0 \dots a_{k-1}$ are identified. The distribution of the nodes on a cycle is determined by choosing a (distribution) function
$$d: \{k, k+1, \dots, l-1\} \times \{0,1\}^{l-k} \to \{0, 1, \dots, k-1\},$$
which is only applied to the significant bits $a_k a_{k+1} \dots a_{l-1}$. (The same distribution function is used on each cycle.) Formally, the embedding $f$ is of the form
$$f(i, a_0 a_1 \dots a_{l-1}) := \begin{cases} (i,\ a_0 \dots a_{k-1}) & \text{if } 0 \le i \le k-1,\\ (d(i, a_k a_{k+1} \dots a_{l-1}),\ a_0 \dots a_{k-1}) & \text{else.} \end{cases}$$
The load of $f$ is determined by the distribution function $d$; therefore, $d$ should distribute the nodes as evenly as possible on each cycle. All the cross-edges
$$(i,\alpha) - (i,\alpha(i)),\ 0 \le i \le k-1 \quad (\mathrm{CCC}), \qquad (i,\alpha) - (i+1,\alpha(i)),\ 0 \le i \le k-2 \quad (\mathrm{BFN}),$$
of $\mathrm{CCC}(l)$/$\mathrm{BFN}(l)$ are mapped onto a corresponding cross-edge in $\mathrm{CCC}(k)$/$\mathrm{BFN}(k)$. Likewise, all the cycle-edges $(i,\alpha) - (i+1,\alpha)$, $0 \le i \le k-2$, of $\mathrm{CCC}(l)$/$\mathrm{BFN}(l)$ are mapped onto a corresponding cycle-edge in $\mathrm{CCC}(k)$/$\mathrm{BFN}(k)$. All the other edges of $\mathrm{CCC}(l)$/$\mathrm{BFN}(l)$ are mapped onto a path on a single cycle in $\mathrm{CCC}(k)$/$\mathrm{BFN}(k)$. So, in this case the dilation depends directly on the distribution $d$ of the nodes on the cycle and is partly in conflict with the evenness of the distribution required above. For low dilation, the nodes $(i, a_0 a_1 \dots a_{l-1})$ and $(j, b_0 b_1 \dots b_{l-1})$ of $\mathrm{CCC}(l)$/$\mathrm{BFN}(l)$ with a small lexicographical distance between $(i, a_k a_{k+1} \dots a_{l-1})$ and $(j, b_k b_{k+1} \dots b_{l-1})$ should be mapped close together on a cycle in $\mathrm{CCC}(k)$/$\mathrm{BFN}(k)$. For $\frac{l}{k} < 2$, the 1st construction does not work at all, because there is no distribution $d$ with both a small load and a small dilation.

2nd Construction: The cycles of $\mathrm{CCC}(l)$/$\mathrm{BFN}(l)$ are mapped together such that the $l$ nodes of each cycle in $\mathrm{CCC}(l)$/$\mathrm{BFN}(l)$ are distributed among the $k$ nodes of a cycle in $\mathrm{CCC}(k)$/$\mathrm{BFN}(k)$ in increasing order. The cycles to be mapped together are specified by selecting indices $\sigma(0), \sigma(1), \dots, \sigma(k-1) \in \{0, 1, \dots, l-1\}$ with $\sigma(0) < \sigma(1) < \dots < \sigma(k-1)$, and by identifying those cycles $(\ast, a_0 a_1 \dots a_{l-1})$ with the same sequence of bits $a_{\sigma(0)} \dots a_{\sigma(k-1)}$. The distribution of the nodes on a cycle is determined by choosing a (distribution) function
$$d: \{0, 1, \dots, l-1\} \times \{0,1\}^{l-k} \to \{0, 1, \dots, k-1\},$$
which is only applied to the significant bits $a_i$, $i \notin \{\sigma(0), \sigma(1), \dots, \sigma(k-1)\}$. (The same distribution function is used on each cycle.) Formally, let
$$a_0 \dots a_{l-1} \setminus \sigma := a_0 \dots a_{\sigma(0)-1}\, a_{\sigma(0)+1} \dots a_{\sigma(i)-1}\, a_{\sigma(i)+1} \dots a_{\sigma(k-1)-1}\, a_{\sigma(k-1)+1} \dots a_{l-1}.$$
Then the embedding $f$ is of the form
$$f(i, a_0 a_1 \dots a_{l-1}) := \bigl(d(i, a_0 \dots a_{l-1} \setminus \sigma),\ a_{\sigma(0)} \dots a_{\sigma(k-1)}\bigr).$$
Again, the load of $f$ is determined by the distribution function $d$; therefore, $d$ should distribute the nodes as evenly as possible on each cycle. All the cross-edges
$$(i,\alpha) - (i,\alpha(i)),\ i \in \{\sigma(0), \dots, \sigma(k-1)\} \quad (\mathrm{CCC}), \qquad (i,\alpha) - ((i+1) \bmod l, \alpha(i)),\ i \in \{\sigma(0), \dots, \sigma(k-1)\} \quad (\mathrm{BFN}),$$
of $\mathrm{CCC}(l)$/$\mathrm{BFN}(l)$ are mapped onto a path consisting of one corresponding cross-edge in $\mathrm{CCC}(k)$/$\mathrm{BFN}(k)$ and two (possibly empty) paths on two different cycles. All the other edges of $\mathrm{CCC}(l)$/$\mathrm{BFN}(l)$ are mapped onto a path on a single cycle in $\mathrm{CCC}(k)$/$\mathrm{BFN}(k)$. In both cases, the dilation depends directly on the distribution $d$ of the nodes on the cycle and is partly in conflict with the evenness of the distribution required above. For low dilation, the values $\sigma(0), \sigma(1), \dots, \sigma(k-1)$ should be spread evenly among $0, 1, \dots, l-1$ (or over parts of this range), and the nodes $(i, a_0 a_1 \dots a_{l-1})$ and $(j, b_0 b_1 \dots b_{l-1})$ of $\mathrm{CCC}(l)$/$\mathrm{BFN}(l)$ with a small lexicographical distance between $(i, a_0 \dots a_{l-1} \setminus \sigma)$ and $(j, b_0 \dots b_{l-1} \setminus \sigma)$ should be mapped close together on a cycle in $\mathrm{CCC}(k)$/$\mathrm{BFN}(k)$.

Note that the 1st construction is a special case of the 2nd one, obtained by specifying $\sigma(i) = i$, $0 \le i \le k-1$, and
$$d_2(i, b_0 b_1 \dots b_{l-k-1}) = \begin{cases} i & \text{if } 0 \le i \le k-1,\\ d_1(i, b_0 b_1 \dots b_{l-k-1}) & \text{if } k \le i \le l-1, \end{cases}$$
where $d_1$ and $d_2$ denote the distribution $d$ in the 1st and the 2nd construction, respectively.

2.2 Dilation 2 Embedding of the CCC and the BFN

Theorem 1: Let $k$, $l$ be positive integers, $l > k$.
1. There is a dilation 2 embedding of $\mathrm{BFN}(l)$ into $\mathrm{CCC}(k)$ with optimum load, i.e. $\lceil \frac{l}{k} 2^{l-k} \rceil$.
2. There is a dilation 2 embedding of $\mathrm{CCC}(l)$ into $\mathrm{CCC}(k)$ with optimum load.
3. There is a dilation 2 embedding of $\mathrm{BFN}(l)$ into $\mathrm{BFN}(k)$ with optimum load.

Proof: $\mathrm{BFN}(l)$ can be embedded into $\mathrm{CCC}(k)$ with dilation 2 and optimum load by an obvious choice of $d$ and $\sigma$ in the 2nd construction of Section 2.1. Let $d$ be the even distribution in lexicographical order, i.e. the function $d: \{0, 1, \dots, l-1\} \times \{0,1\}^{l-k} \to \{0, 1, \dots, k-1\}$ satisfying
$$d(0, 0^{l-k}) = 0, \qquad d(l-1, 1^{l-k}) = k-1,$$
$$d(i, \beta) \le d(i', \beta') \quad \text{if } (i, \beta) \le (i', \beta') \text{ in the lexicographical order on } \{0, \dots, l-1\} \times \{0,1\}^{l-k},$$
$$\left\lceil \tfrac{l}{k}\, 2^{l-k} \right\rceil - 1 \le |d^{-1}(j)| \le \left\lceil \tfrac{l}{k}\, 2^{l-k} \right\rceil \quad \text{for all } j = 0, 1, \dots, k-1,$$
and choose $\sigma(i)$ such that $d(\sigma(i), 1^{l-k}) = i$ for all $0 \le i \le k-1$. (This ensures that $i - 1 \le d(\sigma(i), \beta) \le i$ for all $i, \beta$.) As the distribution $d$ is even, $f$ obviously has optimum load. As $i - 1 \le d(\sigma(i), \beta) \le i$ for all $i, \beta$, all the cross-edges $(i,\alpha) - ((i+1) \bmod l, \alpha(i))$, $i \in \{\sigma(0), \dots, \sigma(k-1)\}$, of $\mathrm{BFN}(l)$ are mapped onto a path consisting of one cross-edge and at most one cycle-edge in $\mathrm{CCC}(k)$. As the distribution $d$ is even and in lexicographical order, all the other edges of $\mathrm{BFN}(l)$ are mapped onto a path of length at most 2 on a single cycle in $\mathrm{CCC}(k)$. Therefore, $f$ has dilation 2.

As $\mathrm{CCC}(n)$ is a subgraph of $\mathrm{BFN}(n)$ [9], there is also a dilation 1 embedding of $\mathrm{CCC}(n)$ into $\mathrm{BFN}(n)$. Hence, an embedding of $\mathrm{CCC}(l)$ into $\mathrm{CCC}(k)$ with dilation 2 and optimum load is obtained by first embedding $\mathrm{CCC}(l)$ into $\mathrm{BFN}(l)$ and then $\mathrm{BFN}(l)$ into $\mathrm{CCC}(k)$. An embedding of $\mathrm{BFN}(l)$ into $\mathrm{BFN}(k)$ with dilation 2 and optimum load can be derived analogously by first embedding $\mathrm{BFN}(l)$ into $\mathrm{CCC}(k)$ and then $\mathrm{CCC}(k)$ into $\mathrm{BFN}(k)$. □

2.3 Dilation 1 Embedding of the CCC

Theorem 2: Let $k$, $l$ be positive integers, $l > k$. There is a dilation 1 embedding of $\mathrm{CCC}(l)$ into $\mathrm{CCC}(k)$ with load
$$\begin{cases} \left\lceil \frac{l}{k}\, 2^{l-k} \right\rceil & \text{for } \frac{l}{k} \ge 2,\\[4pt] \left\lceil \frac{2p-1}{p}\, 2^{l-k} \right\rceil & \text{for } \frac{2p-3}{p-1} < \frac{l}{k} \le \frac{2p-1}{p},\ 2 \le p \le k. \end{cases}$$

Proof: (A) $\frac{l}{k} \ge 2$

Each of the two constructions of Section 2.1 yields a straightforward way to embed $\mathrm{CCC}(l)$ into $\mathrm{CCC}(k)$ with dilation 1 and optimum load. In each case, let $d$ be the even distribution in lexicographical order, and for the 2nd construction, choose $\sigma(i)$ such that $d(\sigma(i), \beta) = i$ for all $\beta \in \{0,1\}^{l-k}$, $0 \le i \le k-1$. (Such $\sigma(0), \sigma(1), \dots, \sigma(k-1)$ exist because $\frac{l}{k} \ge 2$.) As the distribution $d$ is even, $f$ has optimum load. All the cross-edges
$$(i,\alpha) - (i,\alpha(i)),\ 0 \le i \le k-1 \quad \text{(1st construction)}, \qquad (i,\alpha) - (i,\alpha(i)),\ i \in \{\sigma(0), \dots, \sigma(k-1)\} \quad \text{(2nd construction)},$$
of $\mathrm{CCC}(l)$ are mapped onto a corresponding cross-edge in $\mathrm{CCC}(k)$. All the other edges are mapped onto a cycle-edge or onto a single node in $\mathrm{CCC}(k)$.

Note that the edge congestion of the first construction is at least $2 \cdot 2^{l-k}$ and at most $\frac{5}{2}\, 2^{l-k}$, while that of the second one is at least $2^{l-k}$ and at most $\frac{3}{2}\, 2^{l-k}$. Therefore, the second embedding should be preferred.

(B) $\frac{2p-3}{p-1} < \frac{l}{k} \le \frac{2p-1}{p}$, $2 \le p \le k$

$\mathrm{CCC}(l)$ can be embedded into $\mathrm{CCC}(k)$ with dilation 1 and load $\lceil \frac{2p-1}{p}\, 2^{l-k} \rceil$ by specifying $d$ and $\sigma$ in the 2nd construction of Section 2.1 as described below. As already explained in Section 2.1, the 1st construction does not work at all for dilation 1 and small load when $\frac{l}{k} < 2$. The 2nd one still works quite well for dilation 1, but optimum load can no longer be guaranteed: the load can only be balanced within certain sections of each cycle $(\ast, \beta)$ in $\mathrm{CCC}(k)$. The aim is to make these sections as large as possible and almost equally long. To achieve this, the values $\sigma(0), \dots, \sigma(k-1)$ must be spread evenly among $0, 1, \dots, l-1$.

Therefore, let $\sigma(i) := \lceil \frac{il}{k} \rceil$ for $0 \le i \le k-1$ (where $\sigma(-1) = -1$ and $\sigma(k) = l$ are defined for formal reasons). Now, each cycle $(\ast, \beta)$ in $\mathrm{CCC}(k)$ is partitioned into sections $B_0, B_1, \dots, B_{2k-l-1}$, where
$$B_j = (i_j + 1, \beta) - (i_j + 2, \beta) - \dots - (i_{j+1} - 1, \beta) - (i_{j+1}, \beta),$$
and $i_0, i_1, \dots, i_{2k-l}$ are defined iteratively by $i_0 = -1$ and, for all $j \in \{1, 2, \dots, 2k-l\}$, by letting $i_j > i_{j-1}$ be such that
$$\sigma(i+1) - \sigma(i) = 2 \ \text{ for all } i_{j-1} < i < i_j, \qquad \sigma(i_j + 1) - \sigma(i_j) = 1.$$
(Note that $i_0, i_1, \dots, i_{2k-l}$ are well-defined because $1 \le \sigma(i+1) - \sigma(i) \le 2$ for all $-1 \le i \le k-1$: these $k+1$ differences sum to $\sigma(k) - \sigma(-1) = l+1$, so exactly $2k-l+1$ of them, including the one for $i = -1$, are equal to 1. Moreover, $i_{2k-l} = k-1$.) It is easy to verify that
$$\sigma(i+p) - \sigma(i) \le 2p - 1 \quad \text{for all } -1 \le i \le k-p.$$
Therefore, $i_{j+1} - i_j \le p$ for all $0 \le j \le 2k-l-1$ (if $i_{j+1} - i_j > p$, then $\sigma(i_j + 1 + p) - \sigma(i_j + 1) = 2p$).
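The section structure of case (B) can be checked by brute force for small parameters. The Python sketch below is a hypothetical illustration, not the authors' implementation. It computes $\sigma(i) = \lceil il/k \rceil$ and the indices $i_j$, gives every section the even distribution in lexicographical order that the proof specifies next, and then measures dilation and load of the resulting embedding of $\mathrm{CCC}(l)$ into $\mathrm{CCC}(k)$ by breadth-first search in the host. "Even distribution in succession" is implemented here by sending the $q$-th of the $N$ elements of a section with $c$ host levels to host level $i_j + 1 + \lfloor qc/N \rfloor$; this is one possible choice among several. The names `compress_ccc` and `dilation_and_load` are assumptions.

```python
from collections import Counter, deque
from functools import lru_cache
from itertools import product


def flip(a, i):
    """a(i): the bit tuple a with bit a_i complemented."""
    return a[:i] + (1 - a[i],) + a[i + 1:]


def ccc_neighbors(n, v):
    """Neighbours of v = (i, a) in CCC(n): two cycle-edges and one cross-edge."""
    i, a = v
    return [((i + 1) % n, a), ((i - 1) % n, a), (i, flip(a, i))]


def compress_ccc(l, k):
    """Hypothetical sketch of the 2nd construction for 1 < l/k < 2."""
    assert k < l < 2 * k
    # sigma(i) = ceil(i*l/k) for 0 <= i <= k (so sigma(k) = l), plus sigma(-1) = -1.
    sig = {-1: -1, **{i: -(-i * l // k) for i in range(k + 1)}}
    S = [sig[i] for i in range(k)]
    S_set = set(S)
    # i_0 = -1 followed by every i >= 0 with sigma(i+1) - sigma(i) = 1.
    cuts = [-1] + [i for i in range(k) if sig[i + 1] - sig[i] == 1]
    rests = list(product((0, 1), repeat=l - k))  # produced in lexicographic order
    d = {}
    for lo, hi in zip(cuts, cuts[1:]):
        # Section: host levels lo+1..hi receive guest levels sig[lo]+1..sig[hi].
        elems = [(g, r) for g in range(sig[lo] + 1, sig[hi] + 1) for r in rests]
        c, N = hi - lo, len(elems)
        for q, e in enumerate(elems):  # consecutive blocks of near-equal size
            d[e] = lo + 1 + (q * c) // N

    def f(v):
        i, a = v
        key = tuple(a[x] for x in S)  # a_sigma(0) ... a_sigma(k-1)
        rest = tuple(a[x] for x in range(l) if x not in S_set)  # a_0...a_{l-1} \ sigma
        return (d[(i, rest)], key)

    return f


def dilation_and_load(l, k, f):
    """Brute-force dilation and load of an embedding f of CCC(l) into CCC(k)."""

    @lru_cache(maxsize=None)
    def dist_from(src):
        dist, queue = {src: 0}, deque([src])
        while queue:
            u = queue.popleft()
            for w in ccc_neighbors(k, u):
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        return dist

    guest = [(i, a) for i in range(l) for a in product((0, 1), repeat=l)]
    dil = max(dist_from(f(u))[f(w)] for u in guest for w in ccc_neighbors(l, u))
    load = max(Counter(f(u) for u in guest).values())
    return dil, load
```

For $(l, k) = (8, 5)$ the sketch reports dilation 1 and load 14, which is $\lceil \frac{5}{3} 2^{3} \rceil$ for $p = 3$.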
Now, in section $B_j$, $d$ can be chosen as the even distribution of $\{\sigma(i_j)+1, \sigma(i_j)+2, \dots, \sigma(i_{j+1})\} \times \{0,1\}^{l-k}$ among $i_j+1, i_j+2, \dots, i_{j+1}$ in succession, according to the lexicographical order on $\{0, 1, \dots, l-1\} \times \{0,1\}^{l-k}$. A section with $c = i_{j+1} - i_j \le p$ host levels receives $\sigma(i_{j+1}) - \sigma(i_j) = 2c - 1$ guest levels, so this yields load $\lceil \frac{2p-1}{p}\, 2^{l-k} \rceil$ for the embedding $f$. Moreover, it is guaranteed that $d(\sigma(i), \beta) = i$ for all $0 \le i \le k-1$, $\beta \in \{0,1\}^{l-k}$. Therefore, all the cross-edges $(i,\alpha) - (i,\alpha(i))$, $i \in \{\sigma(0), \dots, \sigma(k-1)\}$, of $\mathrm{CCC}(l)$ are mapped onto a corresponding cross-edge in $\mathrm{CCC}(k)$. All the other edges are mapped onto a cycle-edge or onto a single node in $\mathrm{CCC}(k)$. □

It is clear that the above construction yields dilation 1 and optimum load when $\frac{l}{k} = \frac{2p-1}{p}$. It can also be shown that the only other cases with dilation 1 and optimum load are $l = k+1$ or $(l, k) \in \{(7,5), (8,6), (9,7), (10,7), (13,9)\}$. Therefore, the smallest non-optimal pairs $(l, k)$ for this construction are (8,5), (11,7), (12,7), (10,8), (11,8) and (13,8).

2.4 Dilation 1 Embedding of the BFN

Theorem 3: Let $k$, $l$ be positive integers, $l > k$. There is a dilation 1 embedding of $\mathrm{BFN}(l)$ into $\mathrm{BFN}(k)$ with load
$$\begin{cases} \left\lceil \frac{l}{k}\, 2^{l-k} \right\rceil & \text{for } \frac{l}{k} \ge 2,\\[4pt] \left\lceil \frac{2p-2}{p}\, 2^{l-k} \right\rceil & \text{for } \frac{2p-4}{p-1} < \frac{l}{k} \le \frac{2p-2}{p},\ 7 \le p \le 2k,\\[4pt] \left\lceil \frac{5}{3}\, 2^{l-k} \right\rceil & \text{for } \frac{l}{k} \le \frac{5}{3}. \end{cases}$$

Proof: (A) $\frac{l}{k} \ge 2$

$\mathrm{BFN}(l)$ can be embedded into $\mathrm{BFN}(k)$ with dilation 1 and optimum load by defining $d$ in the 1st construction of Section 2.1 as described below. Note that the embeddings used in the proof of Theorem 2 for the CCC network do not work as well for the BFN. The second embedding only yields dilation 2, because the cross-edges $(i,\alpha) - ((i+1) \bmod l, \alpha(i))$, $i \in \{\sigma(0), \dots, \sigma(k-1)\}$, might be stretched to length 2. The first embedding achieves dilation 1 and optimum load only for $\frac{l}{k} \ge \frac{9}{4}$. If $2 \le \frac{l}{k} < \frac{9}{4}$, then the cross-edges $(i,\alpha) - ((i+1) \bmod l, \alpha(i))$, $k+1 \le i \le l-1$, might be stretched to length 2, because the lexicographical distance between $(i, a_k a_{k+1} \dots a_{l-1})$ and $((i+1) \bmod l, a_k a_{k+1} \dots \bar{a}_i \dots a_{l-1})$ can be up to $\frac{5}{4}\, 2^{l-k}$ (for $i = k+1$). So, if we distribute the nodes on each cycle in $\mathrm{BFN}(k)$ according to the lexicographical order, each node of $\mathrm{BFN}(k)$ must have a capacity of more than $\frac{5}{4}\, 2^{l-k}$ in order to guarantee dilation 1.

The problem for $2 \le \frac{l}{k} < \frac{9}{4}$ can be overcome by using a slightly different distribution $d$ for the nodes on each cycle. The idea is to distribute the elements in the first and the second half of each cycle in different ways. Before the nodes are distributed evenly in lexicographical order, the crucial bit $a_i$ of each node $(i, \alpha) = (i, a_0 a_1 \dots a_{l-1})$, $k+1 \le i \le l-1$, is shifted towards the end of the string in order to reduce the lexicographical distance between $(i, a_k a_{k+1} \dots a_{l-1})$ and $(i+1, a_k a_{k+1} \dots \bar{a}_i \dots a_{l-1})$, $k+1 \le i \le l-2$. This can be done by reversing the part $a_{k+1} a_{k+2} \dots a_{l-1}$ in the first half of the cycle in $\mathrm{BFN}(k)$ (i.e. if $k \le i \le \lfloor \frac{l+k}{2} \rfloor - 1$). In the second half (i.e. if $\lfloor \frac{l+k}{2} \rfloor \le i \le l-1$), no change is needed. As we will see later on, for the edges $(i,\alpha) - (i+1,\alpha)$ and $(i,\alpha) - (i+1,\alpha(i))$ with $i = \lfloor \frac{l+k}{2} \rfloor - 1$ in the middle of the cycle, it is important to leave the highest bit $a_k$ in its original position and to reverse only the remaining part $a_{k+1} a_{k+2} \dots a_{l-1}$ in the first half of the cycle.

Formally, let $\tilde{d}: \{k, k+1, \dots, l-1\} \times \{0,1\}^{l-k} \to \{0, 1, \dots, k-1\}$ be the even distribution of the elements of $\{k, k+1, \dots, l-1\} \times \{0,1\}^{l-k}$ among $0, 1, \dots, k-1$ in succession, according to the lexicographical order on $\{k, k+1, \dots, l-1\} \times \{0,1\}^{l-k}$. Then the distribution $d$ is defined as
$$d(i, a_k a_{k+1} \dots a_{l-1}) := \begin{cases} \tilde{d}(i, a_k a_{l-1} a_{l-2} \dots a_{k+1}) & \text{if } k \le i \le \lfloor \frac{l+k}{2} \rfloor - 1,\\ \tilde{d}(i, a_k a_{k+1} \dots a_{l-1}) & \text{if } \lfloor \frac{l+k}{2} \rfloor \le i \le l-1. \end{cases}$$

If $l = 2k$, then $f(i, a_0 a_1 \dots a_{l-1}) = (i \bmod k, a_0 \dots a_{k-1})$, and the cross-edges $(i,\alpha) - ((i+1) \bmod l, \alpha(i))$, $k \le i \le l-1$, of $\mathrm{BFN}(l)$ are mapped onto a corresponding cycle-edge in $\mathrm{BFN}(k)$.

Let $l \ge 2k+1$. Considering the cross-edge $(i,\alpha) - (i+1,\alpha(i))$, $k+1 \le i \le \lfloor \frac{l+k}{2} \rfloor - 2$, the node $(i,\alpha) = (i, a_0 a_1 \dots a_{l-1})$ is mapped onto
$$(d(i, a_k \dots a_{l-1}),\ a_0 \dots a_{k-1}) = (\tilde{d}(i, a_k a_{l-1} \dots a_{k+1}),\ a_0 \dots a_{k-1})$$
and $(i+1, \alpha(i))$ onto
$$(d(i+1, a_k \dots a_{i-1} \bar{a}_i a_{i+1} \dots a_{l-1}),\ a_0 \dots a_{k-1}) = (\tilde{d}(i+1, a_k a_{l-1} \dots a_{i+1} \bar{a}_i a_{i-1} \dots a_{k+1}),\ a_0 \dots a_{k-1}).$$
Using $l - k \ge k+1$, the lexicographical distance between $(i, a_k a_{l-1} a_{l-2} \dots a_{k+1})$ and $(i+1, a_k a_{l-1} \dots a_{i+1} \bar{a}_i a_{i-1} \dots a_{k+1})$ is at most
$$2^{l-k}\left(1 + 2^{-(\lceil \frac{l-k}{2} \rceil + 3)}\right) \le 2^{l-k}\left(1 + 2^{-\frac{l-k}{2}}\right) \le \frac{k+1}{k}\, 2^{l-k} \le \frac{l-k}{k}\, 2^{l-k}.$$
Likewise, for the cross-edge $(i,\alpha) - (i+1,\alpha(i))$, $\lfloor \frac{l+k}{2} \rfloor \le i \le l-2$, the node $(i,\alpha) = (i, a_0 a_1 \dots a_{l-1})$ is mapped onto
$$(d(i, a_k \dots a_{l-1}),\ a_0 \dots a_{k-1}) = (\tilde{d}(i, a_k a_{k+1} \dots a_{l-1}),\ a_0 \dots a_{k-1})$$
and $(i+1, \alpha(i))$ onto
$$(d(i+1, a_k \dots a_{i-1} \bar{a}_i a_{i+1} \dots a_{l-1}),\ a_0 \dots a_{k-1}) = (\tilde{d}(i+1, a_k \dots a_{i-1} \bar{a}_i a_{i+1} \dots a_{l-1}),\ a_0 \dots a_{k-1}).$$
The lexicographical distance between $(i, a_k a_{k+1} \dots a_{l-1})$ and $(i+1, a_k a_{k+1} \dots a_{i-1} \bar{a}_i a_{i+1} \dots a_{l-1})$ is at most
$$2^{l-k}\left(1 + 2^{-(\lfloor \frac{l-k}{2} \rfloor + 1)}\right) \le 2^{l-k}\left(1 + 2^{-\frac{l-k}{2}}\right) \le \frac{k+1}{k}\, 2^{l-k} \le \frac{l-k}{k}\, 2^{l-k}.$$
Hence, in both cases the capacity of each node in $\mathrm{BFN}(k)$ is sufficient to hold all the nodes between $(i,\alpha)$ and $(i+1,\alpha(i))$.

The remaining problems are the edges $(i,\alpha) - (i+1,\alpha)$ and $(i,\alpha) - (i+1,\alpha(i))$ for $i = \lfloor \frac{l+k}{2} \rfloor - 1$ between the first and the second half of each cycle in $\mathrm{BFN}(k)$. In order to guarantee dilation 1 for these edges, the nodes must be aligned properly in the middle of each cycle in $\mathrm{BFN}(k)$. This can be achieved, e.g., by demanding that the distribution $\tilde{d}$ above be "symmetric" in the first and the second half of each cycle in $\mathrm{BFN}(k)$, i.e. that it satisfies the property
$$|\tilde{d}^{-1}(j)| = |\tilde{d}^{-1}(k-1-j)| \quad \text{for all } j = 0, 1, \dots, k-1. \qquad (\ast)$$
Now, considering the edges $(i,\alpha) - (i+1,\alpha)$ and $(i,\alpha) - (i+1,\alpha(i))$ for $i = \lfloor \frac{l+k}{2} \rfloor - 1$, the node $(i,\alpha) = (i, a_0 a_1 \dots a_{l-1})$ is mapped onto
$$(d(i, a_k \dots a_{l-1}),\ a_0 \dots a_{k-1}) = (\tilde{d}(i, a_k a_{l-1} \dots a_{k+1}),\ a_0 \dots a_{k-1})$$
and $(i+1, a_0 \dots a_{i-1}\, b\, a_{i+1} \dots a_{l-1})$, $b \in \{a_i, \bar{a}_i\}$, onto
$$(d(i+1, a_k \dots a_{i-1}\, b\, a_{i+1} \dots a_{l-1}),\ a_0 \dots a_{k-1}) = (\tilde{d}(i+1, a_k \dots a_{i-1}\, b\, a_{i+1} \dots a_{l-1}),\ a_0 \dots a_{k-1}).$$
By using $\frac{l}{k} \ge 2$ and property $(\ast)$ of $\tilde{d}$, it can easily be verified (cf. [11]) that
$$\left| \tilde{d}\!\left(\lfloor \tfrac{l+k}{2} \rfloor, a\beta\right) - \tilde{d}\!\left(\lfloor \tfrac{l+k}{2} \rfloor - 1, a\gamma\right) \right| \le 1 \quad \text{for all } a \in \{0,1\},\ \beta, \gamma \in \{0,1\}^{l-k-1}.$$
Hence, the two image nodes of
1. $(i,\alpha)$ and $(i+1,\alpha)$, $i = \lfloor \frac{l+k}{2} \rfloor - 1$,
2. $(i,\alpha)$ and $(i+1,\alpha(i))$, $i = \lfloor \frac{l+k}{2} \rfloor - 1$,
have distance at most 1 on the cycle in $\mathrm{BFN}(k)$.

(B) $1 < \frac{l}{k} < 2$

The embedding $f$ of $\mathrm{BFN}(l)$ into $\mathrm{BFN}(k)$ is described in two stages.

1st Stage: $\mathrm{BFN}(l)$ is embedded into $\mathrm{BFN}(k)$ with dilation 1 and load $2 \cdot 2^{l-k}$ by specifying $d$ and $\sigma$ in the 2nd construction of Section 2.1 as described below. Note that the embedding used in the proof of Theorem 2 for the CCC network does not work as well for the BFN, because the cross-edges $(i,\alpha) - ((i+1) \bmod l, \alpha(i))$, $i \in \{\sigma(0), \dots, \sigma(k-1)\}$, might be stretched to length 2. For dilation 1, the problem in the case of the BFN is that, because of these cross-edges, not only the nodes $(i,\alpha)$, $i = \sigma(j)$, $0 \le j \le k-1$, have to be mapped to level $j$ of a cycle in $\mathrm{BFN}(k)$, but also the nodes $((i+1) \bmod l, \alpha)$ have to be mapped to level $(j+1) \bmod k$. So, once $0 \le \sigma(0) < \sigma(1) < \dots < \sigma(k-1) \le l-1$ are chosen for the 2nd construction of Section 2.1, the distribution $d(i, \beta)$ is already determined for all
$$i \in \{\sigma(0), \sigma(0)+1, \sigma(1), \sigma(1)+1, \dots, \sigma(k-1), (\sigma(k-1)+1) \bmod l\}.$$
In order to achieve a low load, the values $\sigma(0), \sigma(1), \dots, \sigma(k-1)$ must be spread as evenly as possible among $0, 1, \dots, l-1$. The best possible load is obtained by demanding
$$\sigma(i) - \sigma(i-1) \le 2 \quad \text{for all } 0 \le i \le k-1,$$
where $\sigma(-1) = -1$ for formal reasons. (Such $\sigma(0), \sigma(1), \dots, \sigma(k-1)$ exist because $\frac{l}{k} < 2$.) This way, the distribution $d(i, \beta)$ is defined completely as
$$d(\sigma(j), \beta) = j, \qquad d((\sigma(j)+1) \bmod l, \beta) = (j+1) \bmod k \quad \text{for all } 0 \le j \le k-1.$$
Then the dilation of the embedding is 1, and the nodes $(i, \beta)$ of $\mathrm{BFN}(k)$ with $\sigma(i) - \sigma(i-1) = 1$ have load $2^{l-k}$, while those with $\sigma(i) - \sigma(i-1) = 2$ have load $2 \cdot 2^{l-k}$.

2nd Stage: The nodes are locally rearranged between different cycles in order to improve the load of the embedding. For this purpose, let us call a node $v = (i, \alpha)$ in $\mathrm{BFN}(l)$ an A-node if $i \in \{\sigma(0), \sigma(1), \dots, \sigma(k-1)\}$, i.e. if the cycle-edge $(i,\alpha) - ((i+1) \bmod l, \alpha)$ is mapped onto a corresponding cycle-edge in $\mathrm{BFN}(k)$. Otherwise, $v$ is called a B-node, and the cycle-edge $(i,\alpha) - ((i+1) \bmod l, \alpha)$ is mapped onto a single node in $\mathrm{BFN}(k)$. By choosing $\sigma(0), \sigma(1), \dots, \sigma(k-1)$ appropriately in the first stage, one tries to partition the cycles of $\mathrm{BFN}(k)$ into certain segments in which the load can be rearranged between linked cycles.

Segment A

One type of segment where a rearrangement is possible is a sequence of nodes BAABA, as displayed in Figure 1.

[Figure 1: Segment A. Prior cycles of $\mathrm{BFN}(l)$ grouped over the cycles $\Gamma_1, \dots, \Gamma_4$ of $\mathrm{BFN}(k)$; the prior levels $i_1, \dots, i_5$ are mapped onto the image nodes $I_1$ (load $2 \cdot 2^{l-k}$), $I_2$ (load $2^{l-k}$) and $I_3$ (load $2 \cdot 2^{l-k}$).]

Prior nodes of $\mathrm{BFN}(l)$ are indicated by black dots and small letters, image nodes in $\mathrm{BFN}(k)$ by capitals. Prior edges of $\mathrm{BFN}(l)$ are drawn as lines; edges in $\mathrm{BFN}(k)$ are drawn as four parallel lines. Formally, for every prior cycle $\xi_j$ that is mapped onto the cycle $\Gamma_j$, we have
$$f(i_1, \xi_j) = f(i_2, \xi_j) = (I_1, \Gamma_j), \qquad f(i_3, \xi_j) = (I_2, \Gamma_j), \qquad f(i_4, \xi_j) = f(i_5, \xi_j) = (I_3, \Gamma_j),$$
where $f$ is the mapping after the first stage.
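The first stage of case (B) can likewise be checked on small instances. The hypothetical Python sketch below maps guest level $g$ to the smallest host level $j$ with $\sigma(j) \ge g$, which realizes $d(\sigma(j), \beta) = j$ and $d(\sigma(j)+1, \beta) = j+1$ when consecutive values of $\sigma$ differ by at most 2. It also labels every level as an A- or B-level and measures dilation and load by breadth-first search in $\mathrm{BFN}(k)$. It assumes $\sigma(k-1) = l-1$, which holds for $\sigma(i) = \lceil il/k \rceil$ when $l < 2k$; the names `first_stage` and `bfn_dilation_and_load` are illustrative only.

```python
from collections import Counter, deque
from functools import lru_cache
from itertools import product


def flip(a, i):
    """a(i): the bit tuple a with bit a_i complemented."""
    return a[:i] + (1 - a[i],) + a[i + 1:]


def bfn_neighbors(n, v):
    """Neighbours of v = (i, a) in BFN(n): two edges to level i+1, two to level i-1."""
    i, a = v
    j = (i - 1) % n
    return [((i + 1) % n, a), ((i + 1) % n, flip(a, i)), (j, a), (j, flip(a, j))]


def first_stage(l, k, S):
    """Hypothetical first-stage mapping of BFN(l) into BFN(k) for sigma = S.

    Assumes sigma(-1) = -1, consecutive differences of at most 2 and S[-1] = l - 1.
    Guest level g goes to the least host level j with S[j] >= g.
    """
    level = [next(j for j in range(k) if S[j] >= g) for g in range(l)]
    labels = "".join("A" if g in S else "B" for g in range(l))

    def f(v):
        g, a = v
        return (level[g], tuple(a[x] for x in S))  # key a_sigma(0) ... a_sigma(k-1)

    return f, labels


def bfn_dilation_and_load(l, k, f):
    """Brute-force dilation and load of an embedding f of BFN(l) into BFN(k)."""

    @lru_cache(maxsize=None)
    def dist_from(src):
        dist, queue = {src: 0}, deque([src])
        while queue:
            u = queue.popleft()
            for w in bfn_neighbors(k, u):
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        return dist

    guest = [(i, a) for i in range(l) for a in product((0, 1), repeat=l)]
    dil = max(dist_from(f(u))[f(w)] for u in guest for w in bfn_neighbors(l, u))
    load = max(Counter(f(u) for u in guest).values())
    return dil, load
```

For $(l, k) = (5, 3)$ and $\sigma = (0, 2, 4)$ it yields the level pattern ABABA with dilation 1 and load $8 = 2 \cdot 2^{l-k}$, the first-stage load stated above.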
In the situation of Figure 1, the nodes on two groups of four prior cycles each can be moved as shown in Figure 2. (An arrow indicates that a prior vertex of $\mathrm{BFN}(l)$ is transferred from one node in $\mathrm{BFN}(k)$ to another.)

[Figure 2: Rearrangement A; one element is transferred from $I_1$ to $I_2$ and one from $I_3$ to $I_2$.]

By applying Rearrangement A once or twice in the situation of Figure 1, the following results are obtained for each cycle $\Gamma_i$ in $\mathrm{BFN}(k)$: by rearrangements on 4 (2) prior cycles of $\mathrm{BFN}(l)$, 2 (1) elements of the upper and lower overloaded nodes $I_1$, $I_3$ in $\mathrm{BFN}(k)$ are transferred to the middle node $I_2$. Dilation 1 is preserved if the elements $i_1$ and $i_5$ on the affected prior cycles stay where they are. This means that on these cycles no movements are allowed directly above or underneath the section shown. All the other cycles are not affected.

Segment B

The second type of segment we consider is a sequence of nodes BABA, as displayed in Figure 3. (For an explanation of the figures below, see the description for Segment A.)

[Figure 3: Segment B. The prior levels $i_1, \dots, i_4$ over the cycles $\Gamma_1$, $\Gamma_2$ are mapped onto the image nodes $I_1$ and $I_2$, each of load $2 \cdot 2^{l-k}$.]

Four kinds of moves can be constructed similarly to the above (see Figures 4, 5, 6 and 7), each affecting 2 prior cycles of $\mathrm{BFN}(l)$. Any two of them may also be combined.
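The rearrangements only move prior nodes between neighbouring image nodes of a cycle in $\mathrm{BFN}(k)$, so the net amount that has to cross each boundary of a section is fixed by prefix sums of the loads. The Python sketch below computes these amounts exactly, in units of $2^{l-k}$. It illustrates the bookkeeping only: the construction itself realizes such amounts with the discrete moves of Rearrangements A and B1 to B4 and rounds them to integers. `type1_loads` is a hypothetical helper for the loads of a Type 1 section $((AB)^{t_1} AAB (AB)^{t_2} AB)$ considered later in this proof. It assumes that each image node holding a BA pair has load 2 and that the single A-node not preceded by a B-node has load 1.

```python
from fractions import Fraction
from itertools import accumulate


def section_flows(loads):
    """Net transfers across the boundaries of a section of consecutive image nodes.

    loads[t] is the load of the t-th image node (e.g. in units of 2^(l-k)).
    Returns (target, flows): flows[t] > 0 means that amount moves from node t
    to node t+1, and flows[t] < 0 means it moves from node t+1 to node t.
    """
    target = Fraction(sum(loads), len(loads))
    prefix = list(accumulate(loads))
    flows = [p - target * (t + 1) for t, p in enumerate(prefix[:-1])]
    return target, flows


def type1_loads(t1, t2):
    """Loads of a Type 1 section ((AB)^t1 AAB (AB)^t2 AB), in units of 2^(l-k).

    Assumes every image node holding a BA pair has load 2 and the single
    A-node not preceded by a B-node (the second A of AAB) has load 1.
    """
    return [2] * (t1 + 1) + [1] + [2] * (t2 + 1)
```

For $t_1 = 1$, $t_2 = 0$ (so $n = 4$), the flows are $\frac14, \frac12, -\frac14$. Every amount points towards the underloaded node, and the common target is $\frac{2n-1}{n} = \frac74$.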
?1 z}|{ ?2 z}|{ ?1 o I0 6 o u i uQQQ I 6 kQQQQ o Qu I i u+ 2 3 1 2 ?2 z}|{ z}|{ o 6o I 0 QQQQ3u6 I QQQsQu o u i I 1 i2 2 3 u 1 1 2 2 Figure 4: Rearrangement B1, 1 element is transferred from I to I and from I to I . 2 16 1 1 0 ?1 z}|{ ?2 z}|{ ?1 o QQQu I QQk ? +QQQu o I i u o ? I i2 u 1 i2 2 3 3 2 z}|{ o QQQQ3u I QQQsQu?o u i I ?o I 1 3 ?2 z}|{ u 1 2 1 3 2 Figure 5: Rearrangement B2, 1 element is transferred from I to I and from I to I . 1 ?1 z}|{ ?2 z}|{ 3 u 1 2 ?1 o u I Q k Q Q Q 6 QQQ o + Qu I i u i2 2 ?2 z}|{ z}|{ o QQQQ3u6 I QQQsQu o u I i 1 i2 2 3 2 3 u 1 1 2 2 Figure 6: Rearrangement B3, 1 element is transferred from I to I . 2 ?1 z}|{ ?2 z}|{ ?1 o kQQ QQQ ? +QQQu o I i u i2 3 u 1 u 2 1 ?2 z}|{ z}|{ o QQQQ3 I QQQsQu?o u i I I1 i2 2 3 u u 1 1 2 2 Figure 7: Rearrangement B4, 1 element is transferred from I to I . 1 2 Again, dilation 1 is kept up if the elements i and i on the cycles i; i; i; i stay where they are. This means that on these cycles no movements are allowed directly above or underneath the shown section. But all the other cycles are not aected. 1 4 A combination of the rearrangement techniques outlined above and an appropriate choice of (0); (1); : : :; (k ? 1) in the rst stage lead to the desired load of the nal embedding. 17 From now on, ? @ u u will always be associated with a node in BFN (k) of load 2 2l?k , which is equivalent to a node sequence BA in BFN (l). ? @ u will be associated with a node in BFN (k) of load 2l?k , which is equivalent to an A-node in BFN (l). We distinguish dierent cases: (B1) l k 5 3 I. l ? k even: Let l ? k = 2i + 2; i 0. By the right choice of (0); : : :; (k ? 1) in the rst stage, each cycle can be subdivided into a node sequence (ABAAB )i A. The load is rearranged as follows: +1 ... ? 1 l?k 2 @? @ 31 l?k ?? 2 @ 6 3 ?? @ @ ?? @ 1 2l?k ?@ ? ? @ 31 l?k 2 6@ 3 ?? @ ..@ u u u u u u u u u u 9 > > > > > > = > > > > > > ; i times . An arrow and a number next to it indicate how many nodes are moved from k one place j l ? 
k to another. In each section of the cycle, by using Rearrangement [A] 2 times (thus j k l m l ? k l ? k l ? k aecting at most 2 2 + 2 2 cycles), a load of 2 is achieved. 1 3 1 3 5 3 18 II. l ? k odd: m l With similar techniques as in Case I, a load of 2l?k can be derived. A detailed discussion can be found in [11]. The only thing which has to be checked in every case is that at most 2l?k cycles of BFN (l) are aected by the rearrangements anywhere in the cycle in BFN (k). (B2) pp?? < kl pp? ; 7 p 2k 5 3 2 2 4 1 2 l m By choosing (i) := ilk for 0 i k ? 1 in the rst stage, each cycle is partitioned into sections of the following two types: Type 2 Type 1 ) u u u u u u u u u @ ?? @ @ @ ?? @ ?? @ @ ?? @ @ ?? @ ) ) u u u u u u u u u u u u u u @??@ @??@ @??@ @??@ @??@ @??@ @??@ @??@ t1 times t2 times ((AB )t1 AAB (AB )t2 AB )+ ) ) t3 times 2t4 times t3 times ((AB )t3 AAB (AB )2t4 AAB (AB )t3 AB )+ where t = 1 j p?4 k ;t = 2 4 j p?6 4 k ;t = 3 j p?1 4 k ;t = 4 j p?3 4 k : Let n=t +t +3= 1 2 j k p 2 ; ( uneven, m = 2t + 2t + 5 = p ?p 1 ifif pp even 3 4 be the length of the two sections above. It is shown below that the load can be distributed optimally in each of these sections, thus yielding a load of 19 l l n?1 2l?k n 2 m in sections of Type 1, m in sections of Type 2. m?2 2l?k m 2 Therefore, the whole embedding has load at most l p?2 2l?k p 2 m . Load Balancing in Sections of Type 1 In sections of Type 1, the load can be distributed evenly by shifting it from the overloaded nodes on the outside to the underloaded nodes in the middle. This is done by applying Rearrangements [B1]; : : :; [B4] in the u u @ ?? @ parts, and by using Rearrangement [A] in the u u u @ ? ? @ @ ?? @ parts. The only thing to make sure is that at most 2l?k cycles of BFN (l) are aected by the rearrangements anywhere in the cycle in BFN (k). It turns out that the cases n 2 f4r + 3; 4r + 4; 4r + 5; 4r + 6g, for r 2 IN , have to be distinguished. Here, we only state the case n = 4r +l3. 
All the other cases work in a similar way (cf. [11]). In each case, the load derived is $\lceil \frac{2n-1}{n} 2^{l-k} \rceil$.

Let $n = 4r + 3$, and let $\lfloor 2^{l-k}/n \rfloor$ be abbreviated by $L$. Then the load in each section can be balanced as follows:

[Figure: load balancing in a section of Type 1 for $n = 4r + 3$. Arrows labelled with multiples of $L$, up to $(r+1)L$, shift load from the outer nodes towards the middle; braces mark two runs of $2r$ nodes each.]

If $r = 0$ (i.e. $l = 5$, $k = 3$), at most $2L + 2 = 4 \le 2^{l-k}$ cycles are affected by the rearrangements. If $r \ge 1$ (i.e. $l - k \ge 6$), at most $(2rL + 2) + (2(r+1)L + 2) \le 2^{l-k}$ cycles are affected. The load derived is $2 \cdot 2^{l-k} - L = \lceil \frac{2n-1}{n} 2^{l-k} \rceil$.

Load Balancing in Sections of Type 2
In sections of Type 2, the load can be distributed evenly by shifting it in the same manner in the upper and the lower half of each section, namely from the overloaded nodes on the outside of each half to the underloaded nodes in the middle. How this is done has already been explained for sections of Type 1. It turns out that the cases $m \in \{8r+7, 8r+9, 8r+11, 8r+13\}$, $r \in \mathbb{N}_0$, have to be distinguished. Here, we only state the case $m = 8r + 7$. All the other cases work in a similar way (cf. [11]). In each case, the load derived is $\lceil \frac{2(m-1)}{m} 2^{l-k} \rceil$.

Let $m = 8r + 7$, and let $\lfloor 2 \cdot 2^{l-k}/m \rfloor$ be abbreviated by $L$.
Then the load in each section can be balanced as follows:

[Figure: load balancing in a section of Type 2 for $m = 8r + 7$. In the upper half, arrows labelled with multiples of $L$ (from $1L$ up to $(r+1)L$, together with $\lceil (r+\frac{1}{2})L \rceil$ and $\lceil \frac{1}{2}L \rceil$) shift load towards the middle (braces: $2r$ times, $2r+1$ times); the lower half is the reverse, with $\lfloor \cdot \rfloor$ instead of $\lceil \cdot \rceil$.]

As $(2\lceil (r+\frac{1}{2})L \rceil + 2) + (2(r+1)L + 2) \le 2^{l-k}$ (for $l - k \ge 5$), at most $2^{l-k}$ cycles are affected. The load derived is $2 \cdot 2^{l-k} - L = \lceil \frac{2(m-1)}{m} 2^{l-k} \rceil$.

3 Experimental Results
To program a distributed system, one has to specify a configuration map of the network topology. This map describes the process-processor mapping, the communication channels between the processes, and the mapping of the logical communication channels (between the processes) to the physical communication channels (the link connections between the processors). Our compression tool takes any configuration map for a logical CCC network of dimension $l$ and builds a configuration map for a physical CCC of any dimension $k < l$ by applying the algorithms of Section 2. Since the edge congestion and dilation are greater than one in some cases, multiplexing and routing processes have to be integrated into the program for the target CCC. This is done automatically by our tool: the user only has to describe the configuration of the logical CCC and to specify the dimension of the physical CCC. A number of user processes are thus mapped to one processor and run in parallel with the routing and multiplexing processes. Because of this compression, many adjacent processes are mapped to the same processor, which makes communication between them faster.

To measure the overhead caused by the additional routing and multiplexing processes, we first wrote different user programs $P_k$ consisting of $k \cdot 2^k$ processes configured as a CCC of dimension $k$. Every program performs $x$ iterations, each consisting of $y$ internal dummy operations and one communication with a process of the network. We executed program $P_3$ on CCC(3) and the compressed version of $P_3$ on CCC(2).
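The comparison in these experiments rests on two quantities: the optimal load factor $\lceil \frac{l}{k} 2^{l-k} \rceil$, which equals $\lceil |CCC(l)| / |CCC(k)| \rceil$ because CCC(k) has $k \cdot 2^k$ nodes, and the measured slowdown, i.e. the run time after compression divided by the run time on the original network. The Python sketch below computes both; the helper names are hypothetical, the two data rows are copied from Table 1, and it illustrates only the arithmetic, not the authors' compression tool.

```python
def ccc_nodes(k: int) -> int:
    """Number of nodes of CCC(k): 2**k cycles of length k."""
    return k * 2 ** k


def optimal_load(l: int, k: int) -> int:
    """Optimum load ceil((l/k) * 2**(l-k)) for embedding CCC(l) into CCC(k),
    computed with integer arithmetic as ceil(|CCC(l)| / |CCC(k)|)."""
    return -(-ccc_nodes(l) // ccc_nodes(k))


def slowdown(t_original: float, t_compressed: float) -> float:
    """Run time on the smaller CCC divided by the run time on the larger one."""
    return t_compressed / t_original


# Two rows of Table 1: (y, x, CCC(2), CCC(2->1), CCC(3), CCC(3->2)), times in seconds.
TABLE1_ROWS = [
    (10_000_000, 1, 280.19, 1160.27, 855.36, 2566.55),
    (1, 100_000, 4328.18, 12641.41, 31850.16, 83329.14),
]

if __name__ == "__main__":
    for y, x, t2, t21, t3, t32 in TABLE1_ROWS:
        s21, s32 = slowdown(t2, t21), slowdown(t3, t32)
        print(f"y={y:>8} x={x:>6}  2->1: {s21:.3f} (load {optimal_load(2, 1)})"
              f"  3->2: {s32:.3f} (load {optimal_load(3, 2)})")
```

For the computation-bound first row the ratios are close to the load factors 4 and 3, while for the communication-bound last row they are clearly below them, which is the effect discussed for Table 1.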
We also ran $P_2$ on CCC(2) and the compressed version of $P_2$ on CCC(1).

Table 1 shows the execution times in seconds and the resulting slowdown (the ratio of the two times) for different $x$ and $y$. If there is a large amount of communication between the processes, the slowdown is even smaller than the optimal load factor of the corresponding embedding (4 when embedding CCC(2) into CCC(1), and 3 when embedding CCC(3) into CCC(2)). This is because communication between processes on the same processor is approximately 2.5 times as fast as communication between linked processors.

Table 2 shows results for a distributed branch & bound algorithm solving the Vertex Cover Problem (cf. [13]). We ran the algorithm on 10 instances on CCC(2) and, after compression, on the same instances on CCC(1). The slowdown stays at about the load factor 4 or below, i.e. apart from the inherent slowdown the compression causes practically no overhead, because of the large amount of communication in this algorithm.

y          x        CCC(2)    CCC(2→1)   ratio   CCC(3)     CCC(3→2)   ratio
10000000   1        280.19    1160.27    4.141   855.36     2566.55    3.000
1000000    10       280.43    1161.22    4.140   857.00     2573.24    3.002
100000     100      282.83    1172.08    4.144   873.44     2629.53    3.010
10000      1000     307.00    1276.25    4.157   1030.26    3189.78    3.096
1000       1000     57.64     233.31     4.047   335.86     900.26     2.680
100        10000    439.48    1333.64    3.034   3189.14    8341.35    2.615
10         10000    433.57    1268.10    2.924   3182.23    8339.98    2.620
1          100000   4328.18   12641.41   2.920   31850.16   83329.14   2.616

Table 1: Compressing an Arbitrary Distributed Algorithm (execution times in seconds; ratio = time after compression / time before compression)

ID   CCC(2)   CCC(2→1)   ratio
0    65.91    263.85     4.003
1    46.48    175.14     3.881
2    59.38    234.04     3.940
3    56.55    223.29     3.948
4    38.75    153.34     3.956
5    73.02    285.20     3.905
6    47.55    187.30     3.938
7    46.23    179.91     3.891
8    66.73    266.04     3.986
9    57.35    203.50     3.548

Table 2: Compressing a Branch & Bound Algorithm

4 Conclusion
In this paper, new solutions to the problem of simulating large networks on smaller ones are proposed for the cube-connected cycles (CCC) and the butterfly network (BFN), two classes of networks which are of major importance for practical applications. It is demonstrated that the problem can be solved very efficiently for these networks: compared to the inherent slowdown, the simulation takes only a small amount of additional running time (due to communication overhead or to an imbalanced distribution of the workload).

For the analysis, the network simulation problem is modeled as a graph embedding problem, which translates the desired properties of the simulation into properties of the corresponding embedding (dilation, load, edge congestion). A number of embeddings with small dilation and/or small load are presented, leading to the main simulation result above. In addition, our experiments on a Transputer network indicate that our techniques perform very well for real applications. In summary, many fundamental applications for large CCCs and BFNs can be implemented very efficiently on a network of realistic size by using the results of this paper, which underlines that the CCC and the BFN are very powerful interconnection structures for parallel computer architectures.

An interesting open question is whether the non-optimal dilation 1 embeddings of large CCCs and BFNs (of dimension $l$) into smaller ones (of dimension $k$) can still be improved with respect to their load, or whether the embeddings presented can be shown to be optimal. In some special cases, e.g.
$(l, k) = (8, 5)$ for the CCC and $(l, k) = (3, 2)$, $(l, k) = (6, 4)$ for the BFN, it is possible to achieve optimum load by applying techniques similar to those described in this paper (see [11]). We are also investigating a new idea which might lead to an improvement in many cases when embedding CCC(l) into CCC(k) with $l/k < 2$, e.g. for $(l, k) = (11, 7)$. For the BFN, however, even in simple cases such as the embedding of BFN(4) (with 64 processors) into BFN(3) (with 24 processors), it is not clear whether the load can be improved any further. Finally, a further study should also consider the edge congestion of the embeddings and try to minimize it while keeping the same dilation and load.

Acknowledgement
The authors would like to thank Juraj Hromkovič for helpful discussions and a careful reading of the manuscript.

References
[1] M.J. Atallah, S.R. Kosaraju, "Optimal simulations between mesh-connected arrays of processors", Journal of the ACM, vol. 35 (1988), pp. 635-650.
[2] S.N. Bhatt, F.R.K. Chung, J.-W. Hong, F.T. Leighton, A.L. Rosenberg, "Optimal Simulations by Butterfly Networks", Proceedings of the 20th ACM Symposium on the Theory of Computing (1988), pp. 192-204.
[3] F. Berman, L. Snyder, "On mapping parallel algorithms into parallel architectures", Journal of Parallel and Distributed Computing, vol. 4 (1987), pp. 439-458.
[4] H.L. Bodlaender, "The classification of coverings of processor networks", Journal of Parallel and Distributed Computing, vol. 6 (1989), pp. 166-182.
[5] H.L. Bodlaender, J. van Leeuwen, "Simulation of large networks on smaller networks", Information and Control, vol. 71 (1986), pp. 143-180.
[6] J. Ellis, Z. Miller, I.H. Sudborough, "Compressing meshes into small hypercubes", Technical Report, Computer Science Program, University of Texas at Dallas, Richardson, TX 75083-0688.
[7] M.R. Fellows, "Encoding graphs in graphs", Ph.D. Dissertation, University of California at San Diego (1985).
[8] J.P. Fishburn, R.A. Finkel, "Quotient networks", IEEE Transactions on Computers, vol. C-31 (1982), pp. 288-295.
[9] R. Feldmann, W. Unger, "The Cube-Connected-Cycle is a subgraph of the Butterfly network", Department of Mathematics and Computer Science, University of Paderborn, Germany (1990), submitted for publication.
[10] A.K. Gupta, S.E. Hambrusch, "Embedding large tree machines into small ones", Proceedings of the 5th MIT Conference on Advanced Research in VLSI (1988), pp. 179-198.
[11] R. Klasing, "Simulating large cube-connected cycles and large butterfly networks on smaller ones", Master Thesis, Universität-GH Paderborn, Fachbereich 17 (Mathematik/Informatik), Germany (1990).
[12] R. Klasing, R. Lüling, B. Monien, "Compressing cube-connected cycles and butterfly networks", Proceedings of the 2nd IEEE Symposium on Parallel and Distributed Processing (1990), pp. 858-865.
[13] R. Lüling, B. Monien, "Two strategies for solving the Vertex Cover Problem on a Transputer Network", Proceedings of the 3rd International Workshop on Distributed Algorithms (1989), LNCS 392, pp. 160-170.
[14] B. Monien, "Simulating binary trees on X-trees", Proceedings of the 3rd ACM Symposium on Parallel Algorithms and Architectures (SPAA 91) (1991), pp. 147-158.
[15] B. Monien, I.H. Sudborough, "Embedding one Interconnection Network in Another", Computing Suppl. 7 (1990), pp. 257-282.
[16] P.A. Nelson, L. Snyder, "Programming solutions to the algorithm contraction problem", Proceedings of the 1986 International Conference on Parallel Processing (1986), pp. 258-261.
[17] R. Peine, "Cayley-Graphen und Netzwerke", Master Thesis, Universität-GH Paderborn, Fachbereich 17 (Mathematik/Informatik), Germany (1990).
[18] F.P. Preparata, J.E. Vuillemin, "The cube-connected cycles: a versatile network for parallel computation", Communications of the ACM, vol. 24 (1981), pp. 300-309.
[19] A.G. Ranade, "How to emulate shared memory", Proceedings of the 28th IEEE Symposium on Foundations of Computer Science (1987), pp. 185-194.
[20] A.L. Rosenberg, "Graph embeddings 1988: Recent breakthroughs, new directions", Proceedings of the 3rd Aegean Workshop on Computing (AWOC): VLSI Algorithms and Architectures (1988), LNCS 319, pp. 160-169.
[21] J.T. Schwartz, "Ultracomputers", ACM Transactions on Programming Languages and Systems, vol. 2 (1980), pp. 484-521.
[22] H.S. Stone, "Parallel processing with the perfect shuffle", IEEE Transactions on Computers, vol. C-20 (1971), pp. 153-161.