A Class of Redundant Path Multistage Interconnection Networks

IEEE TRANSACTIONS ON COMPUTERS, VOL. C-32, NO. 12, DECEMBER 1983
KRISHNAN PADMANABHAN AND DUNCAN H. LAWRIE, SENIOR MEMBER, IEEE
Abstract-A general class of fault-tolerant multistage interconnection networks is presented, wherein fault-tolerance is achieved by
providing multiple disjoint paths between every input and output. These
networks are derived from the Omega networks and as such retain all
the connection properties of the parent networks in the absence of
faults. An R-path network in this class can tolerate (R-1) arbitrary
faults in the intermediate stages of the network at a cost that is far less
than providing R copies of the original network. Different techniques
for constructing such networks are presented and relevant properties
and control algorithms are investigated.
Index Terms-Array processors, fault tolerance, interconnection
networks, multiprocessor systems, nonblocking networks, Omega
networks, parallel processing, redundant path networks.
I. INTRODUCTION
A PLETHORA of interconnection networks have been
proposed in the last quarter of a century for applications
in telephone switching and more recently, in multiple processor
systems. We shall be focusing our attention in this paper on
a class of networks called multistage networks, particularly
suited to communication between processor and memory units
operating in a tightly coupled fashion. Many such independently developed networks have been shown to be topologically
equivalent, i.e., their graphs are isomorphic [14]. For our
purpose, we shall consider as a representative of this class the
Omega network in [8]. The graph of such a network has the
property that between any input node and any output node,
there is a unique path made up of switching nodes. Breakdown
of any such node or an edge thus makes some outputs inaccessible to certain inputs. We propose, in this paper, a class of
networks derived from the Omega network as a solution to the
fault-tolerance problem. Their close relation to the Omega
topology helps maintain all the connection properties of the
parent network.
Manuscript received October 19, 1982; revised July 14, 1983. This work
was supported in part by the National Science Foundation under Grants
MCS81-00512 and MCS80-01561, the U.S. Department of Energy under
Contract DE-AC02-81ER10822, and by the Department of Computer
Science at the University of Illinois, Urbana-Champaign, IL.
The authors are with the Laboratory for Advanced Supercomputers, Department of Computer Science, University of Illinois at Urbana-Champaign,
Urbana, IL 61801.
There are, in general, three different techniques to provide
tolerance to faults in a network. (We shall use the term network in this paper to refer to a multistage interconnection
network or its graph.) In what we call a purely software approach, error-detecting/correcting codes are used for communication between modules. Such schemes [11], [4] require
little addition to existing network topology and minimal addition to the hardware required. They ordinarily do not tolerate
faults in the control portion of the network, detecting and
correcting errors, if any, only in the data that is transferred.
In a hardware/software fault tolerance approach as in [9],
redundancy is obtained using a combination of encoding and
voting. All control signals are encoded in triplicate (or quadruplicate) and hardware is roughly tripled. A variation, that
involves a network within a network idea, is to detect the
presence of a fault and automatically switch to a spare module
[10]. The third technique for achieving fault tolerance is to
provide more than one path to get from a source to a destination, either by augmenting the existing topology, as in [1] or
[5], or by devising a multistage topology that inherently has
this property, as in [3] or [6] and [13]. The latter is the approach we shall use in this paper. In such a scheme, if a fault
is detected anywhere along one path, an alternate one could
be automatically chosen. Note that we are interested in a direct
route from a source to a destination. If the processor and
memory modules are all connected to the input (and output)
of the network, resulting in a network that is twice as large, it
is possible to avoid a faulty node by routing the data to a different (intermediate) processor and then on to the desired
memory; this is similar to the scheme proposed in [7] for partitioning the modules in a multiprocessor system. Such schemes
require central control of some sort in addition to the larger
network.
One way to augment an Omega network to introduce an
additional path between every input and output is to add an
extra stage of switches to the network, a technique presented
in [1] and [5]. In an Omega network, the paths from a source
$S$ and from the source $\bar{S}$, the bitwise complement of $S$, to a
destination $D$ are nonoverlapping, except for the last switching
node in the paths [8]. The extra stage of switches permits
source $S$ to access both the $S$ and $\bar{S}$ paths in the original network,
thereby providing two paths to get to D. The extra stage approach will be seen to be a specific instance of the class of
networks we present in this paper.
The rearrangeable network in [3] and the data manipulator
network in [6] possess the multiple path property. In the second
network the number of paths is a function of the source-destination pair [13]. In addition, the paths between a source and
destination are not all edge or vertex disjoint. In this paper we
are specifically interested in disjoint paths that we formally
characterize in the next section. Networks with multiple
nondisjoint paths between inputs and outputs are considered
in [12].
Section II of this paper presents some theoretical preliminaries and introduces the modified Omega networks. These
are uniform networks in the sense that all switching elements
in the network are of the same size. In Section III we present
an alternate technique for deriving multiple disjoint paths and
the modified Omega networks will be seen to form a subset of
that class. We consider the connection capabilities of these
networks and show that in the absence of faults they can realize
all the permutations that the Omega networks realize. Section
IV summarizes the results of the paper.
II. MODIFIED OMEGA NETWORKS
Omega networks were introduced in [8] based on an Q-base
representation of numbers. In its simplest form, an N-input
N-output Omega network, where $N = 2^n$, consists of $n$ stages
of 2 X 2 switching elements. Adjacent stages of switches are
connected by means of the perfect shuffle connection. In a
more general version, a $B^n \times B^n$ Omega network is constructed
using B X B crosspoint switches and $B * B^{n-1}$ shuffles interconnecting the stages. A P * Q shuffle is the permutation of
PQ elements defined as follows:
$$\pi(i) = \left( Pi + \left\lfloor \frac{i}{Q} \right\rfloor \right) \bmod PQ, \qquad 0 \le i \le PQ - 1.$$
A 16 X 16 Omega network constructed out of 4 X 4 switching
elements is shown in Fig. 1, where a 4 * 4 shuffle interconnects
stages of switching elements.
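The shuffle definition above can be checked numerically. The following is a minimal sketch, assuming the formula $\pi(i) = (Pi + \lfloor i/Q \rfloor) \bmod PQ$; the function names `pq_shuffle` and `shuffle_table` are illustrative and do not come from the paper.

```python
def pq_shuffle(i: int, P: int, Q: int) -> int:
    """Image of terminal i under the P * Q shuffle of P*Q elements."""
    if not 0 <= i < P * Q:
        raise ValueError("terminal index out of range")
    return (P * i + i // Q) % (P * Q)


def shuffle_table(P: int, Q: int) -> list[int]:
    """The whole permutation as a list: entry i holds pi(i)."""
    return [pq_shuffle(i, P, Q) for i in range(P * Q)]


perfect = shuffle_table(2, 4)        # 2 * 4 shuffle: perfect shuffle of 8 terminals
four_by_four = shuffle_table(4, 4)   # 4 * 4 shuffle of 16 terminals, as in Fig. 1
```

With P = 2 the mapping is the familiar perfect shuffle, and for P and Q powers of 2 it amounts to rotating the binary address left by $\log_2 P$ positions.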
In its most general form an Omega network of N inputs and
N outputs, where N is an arbitrary integer, is constructed out
of a complete set of factors of N and the sizes of the switches
used correspond to the set of factors. (Thus, when N is a power
of 2, 2 X 2 switches suffice.)
In this paper, we shall consider N to be a power of 2 and the
size of any switching element used to construct the network
also to be a power of 2. Such networks can conveniently be
referred to as binary networks. Generalization of what follows
to nonbinary networks is possible [12]. We now introduce the
concept of a modified Omega network.
Definition 1: Let $N = 2^n$ and $B = 2^b$. Given that B divides
N, a modified Omega network consists of $\lceil \log_B N \rceil$ identical
stages of B X B switching elements. Each stage consists of N/B
switches and stages are interconnected by the B * N/B
shuffle.
Thus, a modified Omega network is a uniform network with
all switches of size B. A 32 X 32 modified Omega network
constructed out of 4 X 4 switches is shown in Fig. 2.
Note that the modified Omega network does not require N
to be a power of B; in fact for the purposes of this paper, it is
critical that it not be. When $N = B^k$, the modified Omega
network is the same as the Omega network.
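Definition 1 can be explored with a small simulation. The sketch below is an illustration under stated assumptions, not the authors' construction code: it models each stage as a B * N/B shuffle followed by B X B switches that can connect any input to any output, and counts the distinct routes from one source to one destination. The name `count_paths` is hypothetical. It counts routes only; whether the routes are disjoint is a separate question.

```python
from collections import Counter


def count_paths(N: int, B: int, src: int, dst: int) -> int:
    """Number of distinct src -> dst routes in an N x N modified Omega network
    built from B x B switches with a B * N/B shuffle in front of every stage."""
    stages, reach = 0, 1
    while reach < N:                  # stages = ceil(log_B N), computed exactly
        reach *= B
        stages += 1
    routes = Counter({src: 1})        # terminal -> number of routes reaching it
    for _ in range(stages):
        nxt = Counter()
        for t, ways in routes.items():
            shuffled = (B * t + t // (N // B)) % N   # B * N/B shuffle
            first_port = (shuffled // B) * B         # switch that owns this input
            for port in range(first_port, first_port + B):
                nxt[port] += ways                    # any output of that switch
        routes = nxt
    return routes[dst]
```

For example, `count_paths(16, 4, 3, 9)` models a case where N is a power of B, while `count_paths(32, 4, 8, 12)` models the 32 X 32 network built from 4 X 4 switches.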
When considered as a graph, an input-output path in an
Omega network (standard or modified) consists of $\lceil \log_B N \rceil + 2$ nodes and $\lceil \log_B N \rceil + 1$ (directed) edges. The first node
in any such path is a source, the last a destination, and all the
intermediate nodes are switching elements, one per stage. In
an Omega network such as that in Fig. 1, where $N = B^k$, there
is exactly one path between any source and a destination.
(Input and output nodes are not drawn in the figures; each
switch constitutes a node in the graph.) On the other hand, in
the modified Omega network in Fig. 2, there are two ways to
get from a source to a destination (shown by the solid and
dashed lines in Fig. 2). To characterize such multiple paths
between inputs and outputs, we have the following definition.
Fig. 2. A modified Omega network of 32 inputs and 32 outputs. The
solid and dashed edges indicate the two sets of disjoint paths.
Definition 2: Two (different) paths from an input to an
output in an Omega network are said to be disjoint iff they
share exactly four nodes: the first two and the last two.
Two disjoint paths, e1-e2-e3-e6 and e1-e4-e5-e6, are shown
in Fig. 2 connecting source 8 to destination 12. Two disjoint
paths can be identified in that figure for every source-destination pair. The definition of disjoint paths leads us to the
concept of multipath Omega networks.
Definition 3: A modified Omega network is a multipath
network if there exist multiple disjoint paths between every
source and every destination. In addition, the number of such
disjoint paths is the same for any source-destination pair. This
number is termed the redundancy of the modified Omega
network.
In the rest of this paper, we shall refer to Omega networks
as defined in [8] as 1-path Omega networks. Modified Omega
networks with a redundancy of R will be called R-path Omega
networks. A 4-path 16 X 16 Omega network is shown in Fig.
3. It is constructed using 8 X 8 crosspoint switches and the 8
* 2 shuffle connection.
Multiple disjoint paths between sources and destinations
will be our basis for providing redundancy and fault-tolerance
in Omega networks. Since redundant paths are edge-disjoint
(see Fig. 2), failure of any number of internal links of one type
(solid or dashed) will not prevent a source from accessing a
destination. In general, in an R-path network, the set of all
internal edges in the network can be divided into R disjoint
classes and as long as we have all the edges in any one class operative, any source can access any destination. Thus, in Fig.
3, all straight, dashed, and dotted edges could break down,
leaving only edges of the remaining type operative, and we would
still be able to provide connection between any input and any
output in the network. Note that since faulty links reduce the
effective number of paths that can be supported by the system,
blocking within the network would increase, but there would
still be a path between every source and every destination. The
above discussion can be carried over to node failures. We can
see that at least (R - 1) complete node failures can be tolerated in the intermediate stages of the network; partial switch
failure can be tolerated anywhere in the system. A failure of
a switch in the first stage is partial if it is still possible to get
from any input port in the switch to all output edges of one class
(solid or dashed, in Fig. 2). A failure in the last stage is partial
if it is still possible to get from every input port of one class to
all the output ports in the switch. In large modular switches,
partial failures are much more probable than total failures. The
effect of switch failures is considered in some more detail in
Section II-B.
We envisage switching elements that can detect the inaccessibility of any output port (within the switch) and signal the
presence of the fault back to the switches in the previous stage.
Such a switch that can keep functioning (with reduced competence) in the presence of faults falls under the general class
of systems termed "gracefully degradable" in [2]. If a switch
failure is total, the previous switching element can detect, from
the absence of proper handshaking, this faulty condition.
Implementations of such switches are considered in [12] and
will not be our main concern in this paper. Every source keeps
track of the set of paths it is using to get to the destinations, and
when it receives notice of a fault in one of those paths, it
switches to one of the alternate (R - 1) sets of paths. The
choice of the appropriate set of paths will be specified in the
following sections.
Fig. 3. A 4-path 16 X 16 modified Omega network.
Denote $\lceil x \rceil - x$ by $[x]$, where $x$ is a real number. The following theorem formally characterizes the redundancy of a
modified Omega network.
Theorem 1: Let $N = 2^n$ and $B = 2^b$. The modified Omega
network is an R-path Omega network, where
$$R = B^{[n/b]}.$$
Proof: $b$ bits are needed to set a switch in one stage and
hence a destination tag must consist of $b\lceil \log_B N \rceil$ bits to be
able to set up an entire path. As in the case of unipath Omega
networks [8], an input-output path is specified here by concatenating the $b\lceil \log_B N \rceil$ destination bits to the $n$ source address bits:
$$s_0 s_1 \cdots s_{n-1} \, * \cdots * \, d_0 d_1 \cdots d_{n-1}. \tag{1}$$
It is easily seen from (1) that the number of extra bits available
in the destination tag (denoted by *'s) is $b\lceil \log_B N \rceil - n$, resulting in
$$2^{b\lceil \log_B N \rceil - n} = B^{\lceil \log_B N \rceil - \log_B N} = B^{[n/b]}$$
paths of the form (1).
What is not so obvious is that the paths represented by (1)
are indeed disjoint. To prove that we first note that the output
terminal (of a switch) that a path occupies at stage $i$ ($0 \le i \le \lceil \log_B N \rceil$) is given by the $n$-bit window in (1) starting at bit
position $bi$:
$$s_0 s_1 \cdots \underbrace{s_{bi} s_{bi+1} \cdots s_{n-1} \, * \cdots * \, d_0 \cdots}_{n \text{ bits}} \cdots d_{n-1}.$$
For instance, referring to Fig. 2 and the path e1-e2-e3-e6,
which is
$$\underbrace{01000}_{S}\;\underbrace{0}_{*}\;\underbrace{01100}_{D},$$
the terminals this path occupies at different stages are given
by
stage   terminal
0       01000 (e1 = S)
1       00000 (e2)
2       00011 (e3)
3       01100 (e6 = D)
We now show that the $b\lceil \log_B N \rceil - n$ extra bits are a part
of every window in (1) representing a link, except the input and
output links. Then if these bits do not match in two paths, these
two paths will never share a link within the network and hence,
will be disjoint.
Note that
$$b\lceil \log_B N \rceil - n < b. \tag{2}$$
Therefore, the extra bits are a part of the n-bit window corresponding to stage 1. Window $\lceil \log_B N \rceil - 1$ starts at bit position $b\lceil \log_B N \rceil - b$ and from (2) this is less than $n$ so that this
window also includes the extra bits. By monotonicity, windows
corresponding to all intermediate stages must also do the
same.
Thus, all paths of the form (1) between source $S = s_0 \cdots s_{n-1}$ and destination $D = d_0 \cdots d_{n-1}$ are disjoint and the redundancy is given by
$$R = B^{[n/b]}.$$
Q.E.D.
Figs. 2 and 3 provide instances of this theorem. The following two corollaries give the inverse of Theorem 1; Corollary
1 specifies what value of B (if any) can be chosen for a desired
value of R in an N X N network and Corollary 2 characterizes
the possible values of R in a uniform N X N network.
Corollary 1: To get a redundancy R (> 1) in a uniform N
X N Omega network, B must be of the form $2^i R$, where $(i = n) \bmod b$.
Proof:
$$R = B^{[n/b]}.$$
$R = 1$ when $(n = 0) \bmod b$ ($N$ is a power of $B$).
$R = B/2$ when $(n = 1) \bmod b$ ($N/2$ is a power of $B$).
In general, $n = pb + q$, $0 \le q \le b - 1$.
Hence $n/b = p + q/b$, and
$$\lceil n/b \rceil = p + 1 \quad \text{if } R > 1,$$
$$[n/b] = \lceil n/b \rceil - n/b = 1 - \frac{q}{b},$$
$$R = B^{1 - q/b} = \frac{B}{2^q}.$$
Q.E.D.
Note that since $q \ge 1$ (for $R \ge 2$), $R \le B/2$. This gives an upper
bound on the achievable redundancy in uniform networks. Thus, higher redundancies require larger switches to
be used. While this results in more disjoint paths, larger
switches have a higher failure rate due to their increased
complexity. The effect of switch sizes and redundancy on the
reliability of the system is considered in [12].
Corollary 2: For a given $N (= 2^n)$, the possible values of
redundancy $R = 2^r$ are given by
$$r = \frac{n - (p + 1)q}{p}, \qquad p, q \ge 1.$$
Proof: From Corollary 1, the following two conditions
must be satisfied:
$$n = pb + q, \qquad p, q \ge 1 \tag{3}$$
$$B = 2^q R. \tag{4}$$
From (3), $b = (n - q)/p$, i.e., $B = 2^{(n-q)/p}$ or
$$B^p = \frac{2^n}{2^q}.$$
Substituting (4) into this expression we get
$$(2^q R)^p = (2^q 2^r)^p = \frac{2^n}{2^q}.$$
This gives $r = (n - (p + 1)q)/p$ for some $p, q \ge 1$. Q.E.D.
As an application of these corollaries consider a 1024 X
1024 network and assume we desire a redundancy of 4 ($R = 4$, $r = 2$). Corollary 2 requires
$$2 = \frac{10 - (p + 1)q}{p}$$
so that the possible values of $p$ and $q$ are
p = 1   q = 4   (B = 64)
p = 2   q = 2   (B = 16)
p = 3   q = 1   (B = 8).
On the other hand, if N = 32 and we need a redundancy of
2, we can use either 4 X 4 switching elements (shown in Fig.
2) or 8 X 8 switches. In general the only way to obtain a redundancy of N/4 (the maximum possible) is to use N/2 X
N/2 switches as shown in Fig. 4. (Recall that $R \le B/2$.)
Table I gives the possible values of redundancy that can be
obtained in a uniform N X N network using this scheme and
the sizes of the switches to be used.
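The redundancy formula and the two corollaries lend themselves to a quick numerical check. The sketch below assumes $R = B^{[n/b]} = 2^{b\lceil n/b \rceil - n}$ from Theorem 1 and the condition $r = (n - (p+1)q)/p$ from Corollary 2; the function names are illustrative only.

```python
def redundancy(n: int, b: int) -> int:
    """R = B**[n/b] for N = 2**n, B = 2**b, with [x] = ceil(x) - x."""
    stages = -(-n // b)               # ceil(n / b) without floating point
    return 2 ** (b * stages - n)


def redundancy_options(n: int) -> dict[int, int]:
    """Map each achievable R to the smallest switch size B that gives it."""
    smallest: dict[int, int] = {}
    for b in range(1, n + 1):         # B = 2, 4, ..., N in increasing order
        smallest.setdefault(redundancy(n, b), 2 ** b)
    return smallest


def corollary2_solutions(n: int, r: int) -> list[tuple[int, int, int]]:
    """All (p, q, B) with p, q >= 1, r = (n - (p + 1) q) / p and B = 2**q * R."""
    found = []
    for p in range(1, n + 1):
        for q in range(1, n + 1):
            if p * r == n - (p + 1) * q:
                found.append((p, q, 2 ** (q + r)))
    return found
```

Calling `corollary2_solutions(10, 2)` enumerates the switch sizes for the 1024 X 1024 example with R = 4, and `redundancy_options(10)` gives the same pairs as the N = 1024 row of Table I. For n = 11 the same formula also reports R = 16 with B = 32 (b = 5), a pair that does not appear in the N = 2048 row of the table as extracted here.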
A. Permutation Capability and Control Scheme
In this section we consider the operation of the multipath
network under a no-fault situation. There are two aspects to
the operation of the network under this condition. First, such
a network should be able to pass every permutation that a 1-path network does. Second, the blocking (due to conflicts) in
such a network should be no more than that in the corresponding 1-path network. When a source is forced to take an
alternate path because a "regular" one is down, it could interfere with (or block) another input that would normally have
taken that path. To prevent this from happening under no-fault
conditions, we have to ensure that certain sources follow only
certain paths and not others. For instance in Fig. 2, when the
connections 0-0 and 8-12 are desired, the two connections
cannot simultaneously use the paths e7-e2-e8-e9 and e1-e2-e3-e6. Since a source can take one of multiple paths in
getting to a destination, we need to specify an algorithm for
choosing an appropriate path.
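The conflict in this example can be reproduced from the window rule used in the proof of Theorem 1, in which the terminal occupied at stage i is the n-bit window of the string S * D starting at bit position bi. The sketch below is a minimal illustration under that assumption; `path_windows` is a hypothetical helper name, and the extra tag bits are passed as an integer.

```python
def path_windows(src: int, dst: int, extra: int, n: int, b: int) -> list[str]:
    """Terminals occupied by a path, read as n-bit windows of S * D.

    Window i starts at bit position b*i; window 0 is the source and the
    last window is the destination.
    """
    stages = -(-n // b)                       # ceil(n / b)
    x = b * stages - n                        # number of extra tag bits
    if not 0 <= extra < 2 ** x:
        raise ValueError("extra tag does not fit in the available bits")
    extra_bits = format(extra, f"0{x}b") if x else ""
    path = format(src, f"0{n}b") + extra_bits + format(dst, f"0{n}b")
    return [path[b * i : b * i + n] for i in range(stages + 1)]


conn_8_12 = path_windows(8, 12, 0, n=5, b=2)      # extra tag bit 0
conn_0_0 = path_windows(0, 0, 0, n=5, b=2)        # extra tag bit 0
conn_8_12_alt = path_windows(8, 12, 1, n=5, b=2)  # extra tag bit 1
```

With both extra tag bits set to 0, the two connections occupy the same terminal at stage 1, which is the conflict described above. Setting the extra bit of connection 8-12 to 1 moves it to a different stage-1 terminal.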
Fig. 4. A modified Omega network with redundancy of N/4, the
maximum possible.
TABLE I
POSSIBLE VALUES OF REDUNDANCY IN UNIFORM NETWORKS
N       R                    B (smallest)
2       1                    2
4       1                    2
8       2,1                  4,2
16      4,1                  8,2
32      8,2,1                16,4,2
64      16,4,1               32,16,2
128     32,8,4,2,1           64,32,8,4,2
256     64,16,4,2,1          128,64,32,8,2
512     128,32,8,2,1         256,128,16,4,2
1024    256,64,16,4,1        512,256,128,8,2
2048    512,128,32,8,2,1     1024,512,256,128,4,2
Since B and N are powers of 2 and B is a factor of N, a 1-path network can be constructed using B X B and K X K
switches, where
$$K = \frac{N}{B^{\lfloor \log_B N \rfloor}}. \tag{5}$$
Such a network [8] consists of one stage of K X K switches
followed by $\lfloor \log_B N \rfloor$ stages of B X B switches. (We assume
that N is not a power of B.) The first stage is preceded by the
K * N/K shuffle and the remaining stages by the B * N/B
shuffle connection. An example of a 1-path 32 X 32 network
constructed using 2 X 2 and 4 X 4 switches is shown in Fig. 5.
Comparing this with the corresponding 2-path network in Fig.
2, we see that except for the first stage, the two networks are
identical. This is true in the general case and is a property of
constructing a 1-path network using B X B and K X K switches
where K is defined in (5).
Fig. 5. A (1-path) Omega network of 32 inputs and 32 outputs.
With this observation, one way to ensure that a multipath
Omega network passes every permutation that a 1-path network does is to ensure that an input-output path occupies the
same terminal at the output of stage 1 in the former as it does
in the latter network. In an R-path network, a packet (or path
being created) on leaving the source could fork in one of R
ways at the first stage. Upon leaving the first stage of switching
elements, there is exactly one path the packet has to follow to
reach the destination. If this portion of the path can be made
identical to that in the 1-path network, the permutation will
be passed by the multipath network.
For illustration, consider again the 32 X 32 networks in Figs.
2 and 5. Paths between input S and output D in the two networks are given by
$$s_0 \boxed{s_1 s_2 s_3 s_4 d_0} d_1 d_2 d_3 d_4 \qquad \text{1-path network}$$
$$s_0 s_1 \boxed{s_2 s_3 s_4 * d_0} d_1 d_2 d_3 d_4 \qquad \text{2-path network}$$
Terminals that these paths occupy at the output of stage 1
are indicated by the boxes. From this it can be seen that if we
rotate the source address bits right by one position (in the
second case) and set $* = s_4$, the two windows would match. In
addition, all following windows would also match in the two
paths, meaning the paths would be identical in the two networks
beyond this point. Rotating the source bits is a fixed permutation that can be accomplished by a connection in front of the
network. With this permutation, source $s_0 s_1 s_2 s_3 s_4$ is connected
to terminal $s_4 s_0 s_1 s_2 s_3$ at the (input to the) first stage of the
network. If this source sets the extra tag bit * to $s_4$, the path
from the input of the network to the output is given by
$$s_4 s_0 s_1 s_2 s_3 \, s_4 \, d_0 d_1 d_2 d_3 d_4$$
and this path occupies the same terminals at the output of each
stage as does the path in the 1-path network.
Denote the permutation obtained by a k-bit right rotation
by $\Gamma^k$. The following theorem generalizes the above idea.
Theorem 2: Let $\log R = r$. If a multipath Omega network
is constructed with a $\Gamma^r$ permutation in front and every source
$s_0 s_1 \cdots s_{n-1}$ employs the tag $s_{n-r} \cdots s_{n-1} d_0 d_1 \cdots d_{n-1}$ to
access the destination $d_0 d_1 \cdots d_{n-1}$, then the multipath Omega
network will, in a no-fault condition, pass every permutation
that the corresponding 1-path Omega network does.
Proof: Consider a permutation $\pi = \{(S, D)\}$ that is passed
by the 1-path Omega network, and a path $S \to D$.
The $\Gamma^r$ permutation in the multipath network takes source
$s_0 s_1 \cdots s_{n-r} \cdots s_{n-1}$ to $s_{n-r} \cdots s_{n-1} s_0 \cdots s_{n-r-1}$. Following
this permutation, a path in the network is given by
$$s_{n-r} \cdots s_{n-1} \, s_0 s_1 \cdots s_{n-r-1} \, \underbrace{s_{n-r} \cdots s_{n-1}}_{* \cdots *} \, d_0 \cdots d_{n-1},$$
where the extra tag bits are set as in the statement of the theorem.
As observed in the proof of Theorem 1, the output terminal
such a path will occupy in stage 1 is given by the n-bit
window
$$s_{b-r} \cdots s_{n-1} d_0 \cdots d_{b-r-1}. \tag{6}$$
It can be verified that $b - r = \log K$ where K is defined as in
(5) so that (6) is also the terminal that the path from S to D
would occupy in a 1-path Omega network at the output of
stage 1. This is true of every output terminal (and every
input-output path) in stage 1 so that no two paths would occupy the same terminal in stage 1.
Successive windows corresponding to terminals in the remaining stages are identical in the two networks so that no
conflict can occur in the multipath network if none occurs in
the 1-path network.
Thus, the multipath network will pass any permutation $\pi$
that the 1-path network passes. Q.E.D.
This theorem tells us that an R-path Omega network (with
a bit-rotate permutation in front) in a fault-free situation will
be able to emulate a 1-path network if sources use their r least
significant bits as the extra tag bits. A B * N/B shuffle is
equivalent to rotating the source address bits b positions to the
left while a $\Gamma^r$ permutation rotates it r positions to the right.
Hence, the net result is a b - r bit left rotate in front of the first
stage of switches. If R = B/2 (r = b - 1), this is the perfect
shuffle. An example of a 2-path network realizing the identity
permutation is shown in Fig. 6. It can be seen that when tags
are set as in Theorem 2, the settings of the switches (in all
stages except the first) are identical to those in the corresponding 1-path network.
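The emulation argument can be checked exhaustively for small networks. The sketch below assumes the window rule from the proof of Theorem 1, an r-bit right rotation of the source address in front of the multipath network, and a 1-path network whose first stage uses K X K switches with $\log_2 K = n \bmod b$ (nonzero, since N is assumed not to be a power of B). Addresses are handled as bit strings, and all function names are hypothetical.

```python
from itertools import product


def rotate_right(bits: str, k: int) -> str:
    """k-bit right rotation of a bit string (the permutation in front)."""
    k %= len(bits)
    return bits[-k:] + bits[:-k] if k else bits


def multipath_terminals(s: str, d: str, b: int) -> list[str]:
    """Output terminals per stage in the R-path network, tags set as in Theorem 2."""
    n = len(s)
    stages = -(-n // b)                   # ceil(n / b)
    r = b * stages - n                    # log2 of the redundancy R
    tag = s[n - r:] + d                   # r least significant source bits, then D
    path = rotate_right(s, r) + tag
    return [path[b * i : b * i + n] for i in range(1, stages + 1)]


def unipath_terminals(s: str, d: str, b: int) -> list[str]:
    """Output terminals per stage in the 1-path network: one K x K stage
    (log2 K = n mod b, assumed nonzero) followed by B x B stages."""
    n = len(s)
    path = s + d
    return [path[start : start + n] for start in range(n % b, n + 1, b)]


def emulates(n: int, b: int) -> bool:
    """True if every (S, D) pair occupies identical terminals in both networks."""
    words = ["".join(w) for w in product("01", repeat=n)]
    return all(
        multipath_terminals(s, d, b) == unipath_terminals(s, d, b)
        for s in words
        for d in words
    )
```

For instance, `emulates(5, 2)` covers the 32 X 32 networks of Figs. 2 and 5. Because the terminals coincide at the output of every stage, any set of connections that is conflict-free in the 1-path network is conflict-free in this model of the multipath network as well.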
Fig. 6. A 2-path 16 X 16 Omega network with switches set up to realize
the identity permutation. (Figure labels: $\Gamma^1$ connection, perfect shuffle.)
B. Permutation Capability in Presence of Faults
It is clear that in the presence of any fault at all, the modified
Omega network will not be able to realize a permutation in one
pass through the network. By a pass we mean a set of input-output
connections in the network that do not conflict with
each other. It is possible that more than one pass could complete
all the connections in a permutation. We now characterize
the number of passes that will be needed to complete a
permutation that the fault-free network could satisfy in one
pass.
As mentioned earlier, the set of input-output paths required
by a permutation is given by
$$\{ s_0 s_1 \cdots s_{n-1} \underbrace{* \cdots *}_{r} d_0 d_1 \cdots d_{n-1} \}$$
where the extra tag bits are characterized by Theorem 2. The
set of N paths is thus divided into R subsets with all the paths
in a subset employing the same extra tag bits and not sharing
any link or node except possibly at the first and last stages.
Furthermore, the set of intermediate switching nodes and edges
in the network can be divided into R disjoint classes with a
one-to-one correspondence between them and the R classes
of connections in any permutation. It is now readily seen that
if there are faults (any arbitrary number) in any one of these
classes of nodes or edges, all the connections associated with
that class can be transferred to one of the other R - 1 classes
just by changing the extra tag bits. This will, of course, require
an additional pass through the network. Consider, for example,
the 2-path network in Fig. 2. There are two classes of (intermediate) nodes and edges, indicated by the solid and dashed
lines. If any number of nodes or edges of one class should break
down, we can transfer all the paths of the affected class to the
fault-free class of switches and edges. The following theorem
results from this discussion.
Theorem 3: Let there be i classes of switches and edges that
contain faults in the R-path network. The number of passes
needed to pass a permutation (realized by the 1-path network)
is given by
$$\left\lceil \frac{i}{R - i} \right\rceil + 1$$
and is independent of the number of faults in any class.
In general, permutation capability of the network is important
when the network is used in a system operating in a
single-instruction-multiple-data stream (SIMD) mode. In a
multiple-instruction-multiple-data stream (MIMD) mode of
operation, the sources make their requests at random. In such
a case, the effect of faults is much less severe (on the whole)
than in the previous case. Only the affected path(s) need be
rerouted, rather than all the paths using the affected class of
switches. The number of tries that an input will have to make
to get to the destination in the presence of a fault is now a
random variable, bounded above by the expression in Theorem
3. The mean value of the number of tries will be much less than
this bound.
It now remains to characterize the permissible faults in the
first and last stages of switches. Since these switches are connected to sources or destinations, their complete breakdown
will cut off a source or destination from the network, unless
some extra hardware is added to the network [1]. In large
modular switches, the probability of a complete failure is extremely small and in most cases a gracefully degrading
switching element is easy to implement. By a fault in a switch,
we mean that an output port (within the switch) is not accessible to an input port. Complete and partial failures in the first
and last stage switches are characterized in Section II. It is
possible to divide the B (input or output) terminals in such a
switch into R classes, determined by the edge connected to
each terminal. If these are now included in the statement of
Theorem 3, then it is general enough to include all switches in
the network.
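The pass count discussed in this section can be sketched as a simple counting argument. The sketch assumes that each of the R - i fault-free classes can carry one of the R subsets of connections per pass, so that $\lceil R/(R-i) \rceil$ passes suffice; this equals $\lceil i/(R-i) \rceil + 1$. The expression is a reading of the class-transfer discussion rather than a quotation of the theorem, and `passes_needed` is a hypothetical name.

```python
def passes_needed(R: int, faulty_classes: int) -> int:
    """Passes to complete a permutation when `faulty_classes` of the R
    classes of switches and edges cannot be used (counting sketch)."""
    good = R - faulty_classes
    if faulty_classes < 0 or good <= 0:
        raise ValueError("need 0 <= faulty_classes < R")
    return -(-R // good)                  # ceil(R / good) in integer arithmetic


def passes_alternate_form(R: int, faulty_classes: int) -> int:
    """Same count written as ceil(i / (R - i)) + 1."""
    good = R - faulty_classes
    return -(-faulty_classes // good) + 1
```

For the 2-path network of Fig. 2 with one faulty class, `passes_needed(2, 1)` gives the single additional pass mentioned in the text.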
III. THE FOLDING APPROACH TO DERIVING MULTIPLE
DISJOINT PATHS
We now consider an alternate approach to building networks
with redundant paths between inputs and outputs. We do this
by starting with a larger Omega network than what is needed
and collapsing it to the desired size by employing a suitable
mapping of the terminals at each stage. The term "fold" is used
in a generic sense to denote any such (many-to-one) mapping
that yields a valid network. Folded networks can be nonuniform and will be seen to form a superset of the modified Omega
networks in Section II. In particular, any desired redundancy
(within a range) can be obtained by choosing a suitably larger
Omega network and folding it appropriately.
Definition 4: Let $R = 2^r$ and $M = 2^m$ with $r < m$. Then the
R-reduction of the set $\{0, 1, \cdots, M - 1\}$ is the function
$$\text{R-reduction}(i) = i \bmod \frac{M}{R}.$$
If $i = i_0 i_1 \cdots i_{m-1}$, then
$$\text{R-reduction}(i_0 \cdots i_r \cdots i_{m-1}) = i_r \cdots i_{m-1}.$$
R-reduction thus divides a line segment of length M into R
parts and then overlaps the R parts to yield a segment of length
M/R.
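Definition 5: The R-reduced P * Q shuffle, $R \le \min\{P, Q\}$,
is the permutation $\pi$ of PQ/R elements $0, 1, \cdots, PQ/R - 1$,
defined as follows:
Let $i = \underbrace{0 \cdots 0}_{r}\, i_r \cdots i_{p+q-1}$. Then
$$\pi(i) = \begin{cases} i & \text{if } R = P = Q \\ \underbrace{0 \cdots 0}_{r}\, i_{p+r} \cdots i_{p+q-1}\, i_p \cdots i_{p+r-1}\, i_r \cdots i_{p-1} & \text{otherwise.} \end{cases}$$
$\pi(i)$ is the R-reduction of a slightly altered P * Q shuffle. If
the P * Q shuffle is directly reduced, it would result in the
following mapping, which is not a bijection:
$$\text{R-reduction}\,(P * Q\text{-shuffle}\,(0 \cdots 0\, i_r \cdots i_{p+q-1})) = \text{R-reduction}\,(i_p \cdots i_{p+q-1}\, 0 \cdots 0\, i_r \cdots i_{p-1}) = 0 \cdots 0\, i_{p+r} \cdots i_{p+q-1}\, 0 \cdots 0\, i_r \cdots i_{p-1}.$$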
The loss of the bits $i_p \cdots i_{p+r-1}$ leads to more than one input
being mapped to the same output for any reduction R. In order
to avoid this, we modify the P * Q shuffle slightly before reducing it. The PQ output terminals are divided into R groups
and every terminal in group $i$ ($0 \le i \le R - 1$) is translated
down $\lceil P/R \rceil i$ positions to facilitate overlapping of the R segments. Fig. 7 illustrates how a 4 * 4 shuffle is 2-reduced. Note
from Definition 5 that the P-reduced P * P shuffle is the
identity connection while the 2-reduced 2 * P shuffle (the
perfect shuffle) results in the perfect shuffle (of P/2 elements).
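The construction of the reduced shuffle can be sketched directly from the description above: apply the ordinary P * Q shuffle, shift every output terminal in group g by $\lceil P/R \rceil g$ positions, then overlap the R segments. The sketch reads "translated down" as adding to the terminal index, which is an assumption, and takes P, Q, and R to be powers of 2. The function names are hypothetical.

```python
def reduced_shuffle(i: int, P: int, Q: int, R: int) -> int:
    """Image of i under the R-reduced P * Q shuffle of P*Q/R elements
    (P, Q, R powers of 2 with R <= min(P, Q))."""
    size = P * Q // R                     # terminals left after reduction
    if not 0 <= i < size:
        raise ValueError("terminal index out of range")
    y = (P * i + i // Q) % (P * Q)        # ordinary P * Q shuffle
    group = y // size                     # which of the R output groups y is in
    y += (P // R) * group                 # translate the group (assumed direction)
    return y % size                       # R-reduction: overlap the R segments


def directly_reduced(i: int, P: int, Q: int, R: int) -> int:
    """P * Q shuffle followed by R-reduction with no translation."""
    return ((P * i + i // Q) % (P * Q)) % (P * Q // R)


two_reduced_4x4 = [reduced_shuffle(i, 4, 4, 2) for i in range(8)]
naive_4x4 = [directly_reduced(i, 4, 4, 2) for i in range(8)]
```

In this model the 2-reduced 4 * 4 shuffle is a bijection on 8 terminals, while the direct reduction sends several inputs to the same output. The two special cases noted in the text (P-reduced P * P and 2-reduced 2 * P) can be checked the same way.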
We are now in a position to formally define a reduced
Omega network.
Definition 6: Let $M = 2^m$ and $R = 2^r$. Assume that it is
possible to construct an M X M Omega network using $k_1$
stages of $B_1 \times B_1$ switches, followed by $k_2$ stages of $B_2 \times B_2$
switches, $\cdots$, followed by $k_f$ stages of $B_f \times B_f$ switches with
$B_1 < B_2 < \cdots < B_f$. (The $B_i$'s together constitute a complete
set of factors of M and $M = \prod_i B_i^{k_i}$.) Then an R-reduced M X
M Omega network (with M/R inputs and M/R outputs), with
$R \le \min\{B_1, M/B_f\}$, consists of $k_1$ stages of $B_1 \times B_1$ switches,
followed by $k_2$ stages of $B_2 \times B_2$ switches, etc., with each stage
of $B_i \times B_i$ switches preceded by the R-reduced $B_i * M/B_i$
shuffle.
The reduced network is obtained from the larger network
in a manner similar to the derivation of the reduced shuffle
from the larger shuffle. For instance, to construct a 2-reduced
2N X 2N network, we divide a 2N X 2N Omega network into
two halves; the two parts are overlapped after a $\lceil B_i/R \rceil = B_i/2$
shift of the terminals at the input and output of each $B_i \times B_i$
switch in the lower half (to permit overlapping). Fig. 8
shows a 2-reduced 32 X 32 network.
By an argument similar to that in Theorem 1, it can be
shown that an R-reduced Omega network as in Definition 6
has R disjoint paths between any input and any output. The
restriction on R ($R \le B_1$) is critical, since without it a network
constructed using 2 X 2 switches, for instance, could be reduced an arbitrary number of times. It can be seen that any
value of R > 2 does not yield more than two disjoint paths
between any input and any output. The upper bound on the
redundancy that can be obtained by reduction of a given network is thus the size of the smallest switch used in the network.
An R-reduction of an NR X NR Omega network thus yields
an R-path Omega network. The following section takes up the
permutation capability of such a network.
A. Permutation Capability of the Reduced Networks
The R different tags for a destination are simply the addresses of the R different destinations in the unreduced network that get mapped to a single destination by the reduction
function. Thus any one of the R tags in $\{* * \cdots * \, d_0 d_1 \cdots d_{n-1} \mid * = 0, 1\}$
will establish a path to destination $d_0 d_1 \cdots d_{n-1}$, as
in the case of the modified Omega networks.
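As a small illustration of this tag set, the sketch below enumerates the R = 2^r tags for one destination by prefixing every combination of r extra bits to the destination bits. The helper name `destination_tags` and the representation of addresses as bit strings are assumptions made only for this example.

```python
from itertools import product


def destination_tags(dest_bits, r):
    """Return all 2**r tags '* ... * d_0 ... d_{n-1}' for one destination.

    dest_bits is the destination address written as a string of n bits;
    each tag prefixes one of the 2**r settings of the r extra ("*") bits.
    """
    return ["".join(extra) + dest_bits for extra in product("01", repeat=r)]
```

For example, `destination_tags("101", 2)` yields the four tags `00101`, `01101`, `10101`, and `11101`, any one of which addresses destination `101` in a 4-path network.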
The similarity between the two networks goes further.
Observe that when the reduced network uses switches of just
one size, the connection between stages is identical to that in
the modified network except for a permutation in front of each
Fig. 9. An N X N network with a redundancy of N. (The N-reduced $N^2 \times N^2$
Omega network.)
Fig. 7. Construction of the 2-reduced 4 * 4 shuffle. (Panel labels: "4 * 4 shuffle of the 8 elements"; "modified to permit overlap.")
Fig. 8. The 2-reduced 32 X 32 Omega network. Note that the perfect shuffle
in the first stage will have to be replaced by an identity connection for the
network to realize all Omega permutations.
switch. This permutation is of no consequence since the switch
uses the same b bits to establish the connection and at the
output of the switch the path occupies the same terminal in the
two networks. Consider, for example, the first stage. Source
$s_0 \cdots s_{n-1}$ is mapped by the R-reduced shuffle to
$s_b \cdots s_{n-1}\, s_{b-r} \cdots s_{b-1}\, s_0 \cdots s_{b-r-1}$, and at the output of the first
stage this terminal is connected to
$s_b \cdots s_{n-1}\, * \cdots * \, d_0 \cdots d_{b-r-1}$, the same as in the modified network. This is true of
every stage in the network, so that an input-output path is
specified in the case of the reduced network also by concatenating the n + r destination tag bits to the n source address
bits:
$$s_0 s_1 \cdots s_{n-1}\; * * \cdots * \; d_0 d_1 \cdots d_{n-1}.$$
The reduced network, however, is more general than the modified networks of Section II. It permits switches of different sizes to be used and any power-of-2 redundancy to be realized, by
choosing a larger network and an appropriate fold. Also, the
upper bound on the redundancy is increased to N, and the network with this redundancy is the N-reduced $N^2 \times N^2$ network
shown in Fig. 9. Recall that an N-reduction of the N * N
shuffle is the identity connection. In this context it is worth
mentioning that if the two copies of the network in Fig. 9 were
connected in parallel, the network would have a redundancy
of only 2. (It would have twice the bandwidth, though.)
Since input-output paths are identical in the two classes of
networks (modified and reduced), all the discussion in Sections
II-A and II-B regarding permutation capability of the modified
networks can be carried over to the reduced networks. In
particular, if an r-bit right rotate permutation is added in front
of the network (leading to a net permutation of a $(b_1 - r)$-bit left
rotate in the first stage), and every source employs as the extra
tag bits the r least significant bits of its address, the reduced
network will, in a no-fault condition, pass every permutation
that the 1-path network does.
When a network of twice the required size, constructed using
2 X 2 switches, is 2-reduced, we get $\log_2 N + 1$ stages of 2 X
2 switches, each stage preceded by the perfect shuffle. The
1-bit right rotate in front effectively cancels the first shuffle.
The extra stage approach to building redundant networks in
[1] and [5] is thus a special case of the reduced networks.
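The 2-path special case can be traced with a short simulation. The sketch below is illustrative only and rests on stated assumptions: line addresses are n-bit integers, the perfect shuffle is a 1-bit left rotate, each 2 X 2 switch overwrites the least significant bit of the line address with the current tag bit, destination bits are consumed most significant first, and the shuffle in front of the extra stage has been cancelled by the 1-bit right rotate. The names `rotl` and `trace` are hypothetical.

```python
def rotl(x, n):
    """Perfect shuffle on n-bit line addresses: a 1-bit left rotate."""
    return ((x << 1) | (x >> (n - 1))) & ((1 << n) - 1)


def trace(src, dest, extra, n):
    """Trace one path through log2(N) + 1 stages of 2 x 2 switches.

    extra is the single extra tag bit (0 or 1). Returns (hops, out), where
    hops lists the (stage, switch index) pairs visited and out is the
    output line reached.
    """
    # Stage 0 is the extra stage; its shuffle is cancelled, so the path
    # enters switch src >> 1 and leaves on the port chosen by the extra bit.
    hops = [(0, src >> 1)]
    pos = (src & ~1) | extra
    for j in range(1, n + 1):
        pos = rotl(pos, n)                 # shuffle in front of stage j
        hops.append((j, pos >> 1))         # switch entered at stage j
        bit = (dest >> (n - j)) & 1        # tag bit used at stage j
        pos = (pos & ~1) | bit             # switch output port
    return hops, pos
```

With n = 3, source 5, and destination 2, the two settings of the extra bit give the switch sequences (0,2), (1,0), (2,0), (3,1) and (0,2), (1,1), (2,2), (3,1): both paths end at output 2, share only the first-stage and last-stage switches, and are disjoint in the intermediate stages. Choosing the extra bit equal to the least significant bit of the source leaves the path unchanged by the extra stage, so in this model the network then routes like the 1-path Omega network.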
Fig. 10 shows a 2-reduced 2N X 2N network that can be
seen to be equivalent to the 3-stage rearrangeable networks
discussed in [3]. (The two networks are different by a fixed
permutation in front of each N/2 X N/2 switch, which does
not affect the connections that can be realized.) The relation
between redundancy and nonblocking nature of a network is
an aspect that will have to be investigated. It is clear that
(unless a single N X N crosspoint switch is used) redundancy
(of at least 2) is a necessary but not sufficient condition for a
network to be nonblocking. The second half of the assertion
can be seen from the fact that if, in Fig. 10, we interchange the
last two stages (resulting in a different 2-reduced 2N X 2N
network), the network is no longer nonblocking.
To partially alleviate the bottleneck of the first and the last
stages, it is possible to reduce the RN X RN network fewer
than R times, and let a source node access more than one input
port in the network. For instance, if we fold the network R/2
times, a 2N X 2N network results. If now we connect source
Si to inputs i and i in this network (and destination Di to
outputs i and i), we still have R disjoint paths between every
input and output. But now a link or complete switch failure in
the first or last stage does not destroy the connectivity between
the inputs and outputs. Thus in addition to (R - 1) failures in
the intermediate stages, we can tolerate one failure in the first
or last stage. A similar result is obtained when two copies of
an R/2-path network are provided.
B. Alternate Folds
The larger network (of size RN) can be collapsed in ways
other than that indicated in the last section. For instance, instead of overlapping the R "slices" of the network, we could
fold the R sections over as in Fig. 11(a). Such a fold is similar
to the reduction except that alternate sections are overlapped
in the reverse order. A 2-folded 32 X 32 Omega network is
shown in Fig. 11(b).
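The difference between the reduction and the fold can be made concrete by looking at which terminals of the larger network land on the same terminal of the collapsed one. The sketch below is one possible reading, not taken from the original: it assumes the RN terminals are numbered 0 to RN - 1, that section k holds terminals kN to kN + N - 1, that the reduction stacks the sections directly, and that the fold reverses every odd-numbered section. The function names are hypothetical.

```python
def reduce_terminal(t, N):
    """Reduction: the sections are stacked directly, so offset p stays p."""
    return t % N


def fold_terminal(t, N):
    """Fold: odd-numbered sections are overlapped in the reverse order."""
    k, p = divmod(t, N)
    return p if k % 2 == 0 else N - 1 - p


def preimages(f, dest, N, R):
    """Terminals of the RN-terminal network that f maps onto dest."""
    return [t for t in range(R * N) if f(t, N) == dest]
```

For N = 4 and R = 4, terminal 1 of the reduced network collects terminals 1, 5, 9, and 13, which share their low-order bits, whereas under the fold it collects 1, 6, 9, and 14, where the low-order bits are complemented in every odd-numbered section.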
Another technique would be to "squeeze" the larger network,
so that R adjacent switches overlap. (Note that the shuffle
in the larger network would have to be suitably modified to
permit any of these folds.)
In terms of Omega capability all such folds are equivalent,
since by employing suitable tags all appropriately folded networks (where "fold" is used in a generic sense) can realize any
permutation that the Omega network does. However, the
performance of the networks in the presence of faults could be
different under certain circumstances. (The net probability
of blocking will be the same in all cases.)
Fig. 10. A 2-reduced 2N X 2N Omega network that is rearrangeably
nonblocking.
Fig. 11. (a) The 4-fold. (b) A 2-folded 32 X 32 Omega network.
When a network of size RN is folded R times, R terminals
of the form
$$\underbrace{* \cdots *}_{r-1}\, 0\, D \quad \text{and} \quad \underbrace{* \cdots *}_{r-1}\, 1\, \bar{D}$$
overlap each other. Destination tags to reach the appropriate
output in this case are not obtained by setting the extra tag bits
to any one of the R values. Tag bits are derived as a function
of both source and destination bits, and this is a disadvantage
of such a fold.
IV. CONCLUSION
In conclusion, we have developed a general class of multistage networks that preserve the connection properties of the
Omega network and at the same time provide significant tolerance to faults in the form of multiple disjoint paths between
every input and every output. A network with any desired redundancy R can be constructed at a cost of O(N log(NR))
crosspoints, which is far less than the cost of providing multiple copies of
the network. In the VLSI realm, where switching modules are
the basic units, the increase in cost is at most O(N/R) switches.
The set of intermediate switches and links in such a network
can be divided into R classes, and only one of these classes need
be operative for the network to provide connectivity between
any input and any output. The network can thus tolerate the
breakdown of any (R - 1) intermediate switches or links. In
the absence of faults, these networks function in a manner
identical to the Omega networks, rerouting certain paths only
when a fault is detected. The control algorithm is also as simple
as that for the parent network.
REFERENCES
[1] G. B. Adams and H. J. Siegel, "The extra stage cube: A fault-tolerant
interconnection network for supersystems," IEEE Trans. Comput., vol.
C-31, pp. 443-454, May 1982.
[2] M. D. Beaudry, "Performance related reliability measures for computing
systems," IEEE Trans. Comput., vol. C-27, pp. 540-547, June 1978.
[3] V. E. Benes, Mathematical Theory of Connecting Networks and
Telephone Traffic. New York: Academic, 1965.
[4] Final Report-Numerical Aerodynamic Simulation Feasibility Study,
prepared by Burroughs Corporation for Ames Research Center, National Aeronautics and Space Administration, Mar. 1979.
[5] K. M. Falavarjani and D. K. Pradhan, "A design of fault-tolerant interconnection networks," unpublished memo., 1981.
[6] T. Y. Feng, "Data manipulating functions in parallel processors and
their implementations," IEEE Trans. Comput., vol. C-23, pp. 309-318,
Mar. 1974.
[7] L. R. Goke and G. J. Lipovski, "Banyan networks for partitioning
multiprocessor systems," in Proc. 1st Annu. Symp. Comput. Arch.,
1974, pp. 21-28.
[8] D. H. Lawrie, "Memory-processor connection networks," Univ. Illinois,
Urbana-Champaign, Dep. Comput. Sci. Rep. UIUCDCS-R-73-557,
Feb. 1973.
[9] C. Leung and J. Dennis, "Design of a fault-tolerant packet communication architecture," M.I.T. Lab. Comput. Sci. Computat. Structures,
Group Memo. 196, July 1980.
[10] K. N. Levitt, M. W. Green, and J. Goldberg, "A study of the data
commutation problems in a self-repairable multiprocessor," in Proc.
1968 Spring Joint Comput. Conf., pp. 515-527.
[11] J. E. Lilienkamp, D. H. Lawrie, and P-C. Yew, "A fault tolerant interconnection network using error correcting codes," Univ. Illinois,
Urbana-Champaign, Dep. Comput. Sci. Rep. UIUCDCS-R-82-1094,
June 1982.
[12] K. Padmanabhan, "Fault-tolerance and performance improvement in
shuffle-exchange type networks," Ph.D. dissertation, Univ. Illinois,
Urbana-Champaign, in preparation, 1983.
[13] D. S. Parker and C. S. Raghavendra, "The Gamma network: A multiprocessor interconnection network with redundant paths," in Proc.
9th Annu. Symp. Comput. Arch., 1982, pp. 73-80.
[14] C.-L. Wu and T.-Y. Feng, "On a class of multistage interconnection
networks," IEEE Trans. Comput., vol. C-29, pp. 694-702, Aug.
1980.
Krishnan Padmanabhan is pursuing a doctoral program at the Department of Computer Science,
University of Illinois at Urbana-Champaign. He
received the M.S. degree in 1981 from Washington University,
St. Louis, MO, and was a graduate student
at the Indian Institute of Science prior
to that. He is currently involved in the Cedar Supercomputer Project at the University of Illinois.
His research interests include high speed computer architecture, fault tolerance, and VLSI design.
Mr. Padmanabhan is a member of the American Mathematical Society.
Duncan H. Lawrie (S'66-M'73-SM'81) is currently
an Associate Professor in the Department of
Computer Science and Director of the Laboratory
for Advanced Supercomputers at the University of
Illinois at Urbana-Champaign. He has contributed to the design of several large computers including the Illiac IV where he designed and implemented Glypnir, the first high level language for
that machine, and the Burroughs Scientific Processor where he was a Principal Architect, specializing on the array memory system. He is active in
the areas of high speed algorithm design, communication networks, virtual
memory performance, and the use of mass storage devices. He has been a
consultant to industry and government in the areas of computer organization,
local networking, and application studies. He was chairman of the Symposium
on High Speed Computer and Algorithm Organization, Program Chairman
of the Ninth International Conference on Parallel Processing, on the organizing committee for the Third International Conference on Distributed
Computing Systems, and was the Editor of the Computer Architecture and
Systems Department of the Communications of the Association for Computer
Machinery. He is currently serving as General Chairman of the Fourth International Conference on Distributed Computing Systems.