Systolic Sorting on a Mesh-Connected Network

IEEE TRANSACTIONS ON COMPUTERS, VOL. C-34, NO. 7, JULY 1985, p. 652
Correspondence
HANS-WERNER LANG, MANFRED SCHIMMLER,
HARTMUT SCHMECK, AND HEIKO SCHRÖDER
Abstract - A parallel algorithm for sorting n data items in $O(\sqrt{n})$ steps
is presented. Its simple structure and the fact that it needs local communication only make it suitable for an implementation in VLSI technology.
The algorithm is based on a merge algorithm that merges four subfiles
stored in a mesh-connected processor array. This merge algorithm is
composed of the perfect shuffle and odd-even-transposition sort. For the
VLSI implementation a systolic version of the algorithm is presented. The
area and time complexities for a bit-serial and a bit-parallel version of this
implementation are analyzed.
Index Terms - Odd-even-transposition sort, mesh-connected processor
array, perfect shuffle, sorting, systolic array, VLSI algorithms, VLSI
complexity.
Manuscript received November 11, 1983; revised September 10, 1984.
The authors are with the Institut für Informatik und Praktische Mathematik, Christian-Albrechts-Universität Kiel, D-2300 Kiel, West Germany.
0018-9340/85/0700-0652$01.00 © 1985 IEEE
I. INTRODUCTION
VLSI technology allows the integration of a large number of simple processing elements on a single chip. This creates a need for algorithms exploiting the potentially high degree of parallelism in networks of such processing elements.
In this correspondence we present an algorithm for sorting n data items on a $\sqrt{n} \times \sqrt{n}$ mesh-connected processor array that requires $O(\sqrt{n})$ comparison steps and $O(\sqrt{n})$ unit-distance routing steps (n is assumed to be a power of 4). The algorithm has a very simple structure and needs only local communication between the processors. Therefore, it is well suited for an implementation in VLSI.
$\Omega(\sqrt{n})$ is a lower bound for sorting n elements on a mesh-connected processor array (see Section II). There are other algorithms of time complexity $O(\sqrt{n})$ [6], [10], [12], but they are much more complex in their structure than our algorithm. Simpler algorithms like the odd-even-transposition sort [5], the bitonic merge sort [1], the rebound sorter [3], or the zero time sorter [9] require time $\Omega(n)$.
In Section II some requirements concerning the design of VLSI algorithms are discussed. Our new sorting algorithm and the proof of its validity are presented in Section III. In Section IV the algorithm is slightly modified to improve its time performance. In Sections II-IV we operate with an array of "processors" - logic units having one data register and being able to execute a sequence of different instructions. In Section V a systolic version of the algorithm is presented. In this version the processors are replaced by simple "processing cells," executing always the same operation, and the data items are "pumped" through this array of cells. The structure of these cells, for the bit-serial and the bit-parallel case, is briefly outlined in Section VI.
II. MODEL OF COMPUTATION
There are some properties a "good" VLSI algorithm should have [4].
1) It can be implemented by only a few types of simple processors.
2) Its data and control flow is simple and regular so that the processors can be connected by a network with local and regular interconnections only.
3) It uses extensive pipelining and parallel processing.
As usual, VLSI hardware is modeled by communication graphs. The first such graph we consider for our algorithm is a grid of $\sqrt{n} \times \sqrt{n}$ identical processors, each of which is connected with its four direct neighbors (Fig. 1). During the sorting process, every processor contains one data item in its register. Observe that there are situations where two elements initially loaded at the opposite corners of the array have to be interchanged. Since even for this simple transposition at least $2\sqrt{n} - 2$ local exchange steps are needed, no algorithm on such a mesh-connected processor array can sort n data items in less than $\Omega(\sqrt{n})$ steps.
Standard VLSI complexity measures are the time (T), the period (P), the chip area (A) of an algorithm, and combinations of these like $AT$, $AT^2$, or $ATP$ (see [13]). There are different opinions on how to weight the time for long-distance communication on the chip [2], [13]. The analysis of algorithms on a processor grid is independent of these differences because all interconnections have constant length.
III. THE SORTING ALGORITHM
Our sorting algorithm is composed of the shuffle operation and of odd-even-transposition sort.
The shuffle operation transforms a sequence $z_1, \ldots, z_{2n}$ into its perfect shuffle $z_1, z_{n+1}, z_2, z_{n+2}, \ldots, z_n, z_{2n}$ [11]. This operation can be realized by $n - 1$ parallel local exchange steps (see Section V).
Odd-even-transposition sort can be described as follows.
Let $z_1, \ldots, z_n$ be a sequence of n elements to be sorted. In the odd (respectively, even) step of the algorithm, all elements $z_i$ of the sequence having an odd (respectively, even) subscript are compared with their successors and exchanged if $z_i > z_{i+1}$ ($i \in \{1, \ldots, n - 1\}$). The odd and even steps are executed in alternating order. After at most n steps the sequence is sorted. A simple proof of this can be found in [8].
Example 1: Sorting the sequence 6 5 2 3 4 1 by odd-even-transposition sort is shown in Fig. 2. A "-" indicates a comparison-exchange. After six steps the sequence is sorted.
We now give an algorithm for merging four arrays of size $k/2 \times k/2$ where k is a power of 2 and the elements of each array are in snake-like ordering (Fig. 3).
Algorithm MERGE
A: Shuffle in each row of the $k \times k$ array, i.e., interchange the columns according to the perfect shuffle [Fig. 4(a)].
B: Sort all double columns, i.e., all $k \times 2$ subarrays into snake-like ordering using $2k$ steps of odd-even-transposition sort [Fig. 4(b)].
C: Apply $2k$ steps of odd-even-transposition sort to the whole $k \times k$ array, assuming a snake-like ordering [Fig. 4(c)].
For the time complexity analysis we assume $t_c$ to be the time required by one comparison-exchange step and $t_e \le t_c$ to be the time required by a simple exchange step.
Part A requires $(k/2 - 1)\,t_e$ time units. For part B we need $2k$ comparison-exchange steps, i.e., the time $2k\,t_c$. Part C requires $2k\,t_c$ time units. Thus, the time needed to merge four $k/2 \times k/2$ arrays is
$$T_M(k) = 4k\,t_c + (k/2 - 1)\,t_e \le 4.5\,k\,t_c.$$
Example 2: Consider the $4 \times 4$ array in Fig. 5(a) consisting of four $2 \times 2$ arrays sorted in snake-like ordering. Part A is an interchange of the second and the third columns [Fig. 5(b)]. Eight steps
of odd-even-transposition sort on each 4 x 2 subarray in part B
lead to the situation shown in Fig. 5(c). Now the double columns are
sorted in snake-like ordering. Application of part C yields the completely sorted array [Fig. 5(d)-(h)]. Note that the array is already
sorted after four steps of odd-even-transposition sort [Fig. 5(g)],
but the worst case takes 2k steps.
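Parts A, B, and C of algorithm MERGE can be sketched in a few lines of Python. This is only an illustrative sequential simulation, not the authors' implementation: the function names (`snake_cells`, `oets`, `merge`) are hypothetical, the array is a list of rows, and part A is written as a direct permutation of each row rather than as $k/2 - 1$ local exchange steps. The 4 x 4 input below is an assumed example, since the values of Fig. 5 are not recoverable from the text.

```python
def snake_cells(rows, cols):
    """Cell coordinates of a rows x cols block in snake-like order."""
    cells = []
    for r in range(rows):
        cs = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        cells.extend((r, c) for c in cs)
    return cells


def oets(grid, cells, steps):
    """Run `steps` odd-even-transposition steps along the given cell order."""
    for step in range(steps):
        # The 1st, 3rd, ... step pairs cells (1,2), (3,4), ...;
        # the 2nd, 4th, ... step pairs cells (2,3), (4,5), ...
        for i in range(step % 2, len(cells) - 1, 2):
            (r1, c1), (r2, c2) = cells[i], cells[i + 1]
            if grid[r1][c1] > grid[r2][c2]:
                grid[r1][c1], grid[r2][c2] = grid[r2][c2], grid[r1][c1]


def merge(grid):
    """Merge four snake-sorted k/2 x k/2 subarrays of a k x k grid in place."""
    k = len(grid)
    h = k // 2
    # Part A: perfect shuffle of the columns, applied row by row.
    for r in range(k):
        row = grid[r]
        grid[r] = [x for pair in zip(row[:h], row[h:]) for x in pair]
    # Part B: sort every k x 2 double column into snake-like order (2k steps).
    for c0 in range(0, k, 2):
        cells = [(r, c0 + c) for r, c in snake_cells(k, 2)]
        oets(grid, cells, 2 * k)
    # Part C: 2k steps on the snake through the whole k x k array.
    oets(grid, snake_cells(k, k), 2 * k)
    return grid


# Four 2 x 2 subarrays, each sorted in snake-like order (assumed example data).
grid = [[1, 6, 2, 5],
        [14, 9, 16, 11],
        [3, 8, 4, 7],
        [13, 10, 15, 12]]
merge(grid)
for row in grid:
    print(row)
```

Reading the result along the snake (first row left to right, second row right to left, and so on) should give 1, 2, ..., 16.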
The correctness of this algorithm may be demonstrated by use of
the 0-1 principle [5].
If a network sorts all sequences of 0's and 1's, then it will sort any arbitrary sequence of elements chosen from an ordered set.
Thus, we may assume that the inputs are 0's and 1's in four subarrays, each of them sorted in snake-like ordering [Fig. 6(a)]. The subarrays consist of a (respectively, b, c, d) complete 1-rows and possibly one further row beginning with some 1's. According to the snake direction, these 1's can begin at the left or at the right side of the subarrays. After shuffling in part A there are $a + b + c + d$ 1's in every double column plus at most four more 1's because of the incomplete 1-rows [Fig. 6(b)].
Thus, if $a + b + c + d$ is even, the last $(a + b + c + d)/2$ rows of the whole array consist of 1's after sorting the double columns in part B. There are at most two more rows in which the 1's of the incomplete 1-rows can appear [Fig. 6(c)]. These two rows consist of $2k$ elements. Obviously, they are sorted by $2k$ steps of odd-even-transposition sort in part C of the algorithm.
In the other case, if $a + b + c + d$ is odd, the 1's of the complete 1-rows form a "step" in every double column after execution of part B [Fig. 6(d)]. Now, if there were a double column with four and another double column with zero additional 1's of the incomplete 1-rows, one would have three unsorted rows in the whole array. But this can only happen if all four incomplete 1-rows begin at the same side of the subarrays, which is not possible if $a + b + c + d$ is odd. Thus, also in this case there are at most two unsorted rows in the whole array after part B, and $2k$ steps of odd-even-transposition sort in part C are sufficient to complete the sorting.
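The 0-1 principle used in this proof can be tried out on plain odd-even-transposition sort. The sketch below is an illustration under stated assumptions (the name `oets_sequence` is hypothetical): it checks that n steps sort every 0-1 sequence of length n = 6, which by the principle covers all inputs of that length, and then replays the sequence of Example 1.

```python
from itertools import product


def oets_sequence(seq, steps):
    """Apply `steps` alternating odd/even transposition steps to a sequence."""
    z = list(seq)
    for step in range(steps):
        # Odd steps compare (z1,z2), (z3,z4), ...; even steps (z2,z3), (z4,z5), ...
        for i in range(step % 2, len(z) - 1, 2):
            if z[i] > z[i + 1]:
                z[i], z[i + 1] = z[i + 1], z[i]
    return z


n = 6
# Exhaustive check over all 2**n sequences of 0's and 1's.
all_zero_one_sorted = all(
    oets_sequence(bits, n) == sorted(bits) for bits in product((0, 1), repeat=n)
)
print(all_zero_one_sorted)                   # True
print(oets_sequence([6, 5, 2, 3, 4, 1], n))  # [1, 2, 3, 4, 5, 6]
```

The same exhaustive 0-1 test could be applied to the merge network itself, at the cost of enumerating all admissible 0-1 subarrays.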
Obviously, one can sort a completely unsorted array of $n = 2^{2j}$ elements by iteratively applying the merge algorithm (Fig. 7). Initially, each $1 \times 1$ array is sorted since it consists of only one element.
Algorithm SORT
For $i := 1, 2, 3, \ldots, j$ do
sort all $2^i \times 2^i$ arrays by application of algorithm MERGE to the sorted $2^{i-1} \times 2^{i-1}$ subarrays.
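The iteration of algorithm SORT can be sketched as follows. This is a sequential simulation under assumptions: the names `snake`, `oets`, `merge_block`, `mesh_sort`, and `read_snake` are hypothetical, MERGE is applied to each block in place through row and column offsets, and the shuffle of part A is done as a direct permutation instead of by local exchanges.

```python
import random


def snake(r0, c0, rows, cols):
    """Snake-ordered cell list of the rows x cols block whose corner is (r0, c0)."""
    return [(r0 + r, c0 + (c if r % 2 == 0 else cols - 1 - c))
            for r in range(rows) for c in range(cols)]


def oets(grid, cells, steps):
    """Odd-even-transposition steps along an arbitrary cell order."""
    for step in range(steps):
        for i in range(step % 2, len(cells) - 1, 2):
            (r1, c1), (r2, c2) = cells[i], cells[i + 1]
            if grid[r1][c1] > grid[r2][c2]:
                grid[r1][c1], grid[r2][c2] = grid[r2][c2], grid[r1][c1]


def merge_block(grid, r0, c0, k):
    """Algorithm MERGE on the k x k block at (r0, c0)."""
    h = k // 2
    for r in range(r0, r0 + k):                      # part A: shuffle each row
        row = grid[r][c0:c0 + k]
        grid[r][c0:c0 + k] = [x for p in zip(row[:h], row[h:]) for x in p]
    for c in range(c0, c0 + k, 2):                   # part B: double columns
        oets(grid, snake(r0, c, k, 2), 2 * k)
    oets(grid, snake(r0, c0, k, k), 2 * k)           # part C: whole block


def mesh_sort(grid):
    """Algorithm SORT: merge blocks of side 2, 4, 8, ... up to the full array."""
    side = len(grid)          # assumed to be a power of 2
    k = 2
    while k <= side:
        for r0 in range(0, side, k):
            for c0 in range(0, side, k):
                merge_block(grid, r0, c0, k)
        k *= 2
    return grid


def read_snake(grid):
    """Read the array contents along the snake."""
    side = len(grid)
    return [grid[r][c] for r, c in snake(0, 0, side, side)]


values = list(range(64))
random.shuffle(values)
grid = [values[8 * r:8 * r + 8] for r in range(8)]
mesh_sort(grid)
print(read_snake(grid) == sorted(values))
```

On a real mesh all blocks of one size are merged simultaneously; the nested loops here only serialize that parallel step.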
Let the time required for sorting a k x k array be denoted by
Ts(k). Then we have
$$T_S(1) = 0$$
and for $k > 1$
$$T_S(k) = T_S(k/2) + T_M(k) \le T_S(k/2) + 4.5\,k\,t_c.$$
This implies
$$T_S(\sqrt{n}) < (9\sqrt{n} - 9)\,t_c.$$
Fig. 1. Mesh-connected processor array.
odd:   6-5 2-3 4-1
even:  5 6-2 3-1 4
odd:   5-2 6-1 3-4
even:  2 5-1 6-3 4
odd:   2-1 5-3 6-4
even:  1 2-3 5-4 6
Fig. 2. Example of odd-even-transposition sort.
Fig. 3. Merging four k/2 x k/2 arrays (data items sorted in snake-like ordering).
Fig. 4. (a) Column shuffle in part A of algorithm MERGE. (b) Each double column is sorted into snake-like ordering in part B. (c) The whole array is sorted into snake-like ordering in part C.
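The bound on $T_S$ follows by unrolling the recurrence. As a worked step (writing $\sqrt{n} = 2^j$, which is the only notation added here):

```latex
\begin{align*}
T_S(2^j) &\le \sum_{i=1}^{j} 4.5 \cdot 2^i \, t_c
          = 4.5\,(2^{j+1} - 2)\, t_c
          = (9 \cdot 2^j - 9)\, t_c , \\
T_S(\sqrt{n}) &\le (9\sqrt{n} - 9)\, t_c .
\end{align*}
```

The inequality is strict whenever $t_c > 0$, because each merge costs $4k\,t_c + (k/2 - 1)\,t_e$, which is less than $4.5\,k\,t_c$ for $t_e \le t_c$.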
IV. MODIFICATION OF ALGORITHM MERGE
By a slight modification we can improve the time performance
of the MERGE algorithm. Part B does not take advantage of the fact
that the double columns are already presorted: each double column
consists of four completely sorted k/2 x 1 arrays.
Therefore, we replace part B by the sequence of parts B1,
B2, B3.
B1: In all even rows interchange every two (adjacent) elements of column $2i - 1$ and $2i$ ($i = 1, 2, \ldots, k/2$).
B2: Sort all single columns by odd-even-transposition sort (in vertical direction).
B3: Apply two steps of odd-even-transposition sort to every double column, starting with a horizontal step.
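Parts B1, B2, and B3 can be sketched for a single double column as follows. This is an illustrative simulation only: `modified_part_b` and `read_snake` are hypothetical names, the double column is a list of k rows `[left, right]`, and the input is assumed to be in the state reached after part A, i.e., each half of each single column is already sorted from top to bottom.

```python
def modified_part_b(col):
    """Sort one k x 2 double column into snake-like order via B1, B2, B3."""
    k = len(col)
    # B1: swap the pair in every even row (rows are numbered from 1 in the text).
    for r in range(1, k, 2):
        col[r][0], col[r][1] = col[r][1], col[r][0]
    # B2: k vertical odd-even-transposition steps sort each single column.
    for step in range(k):
        for r in range(step % 2, k - 1, 2):
            for c in (0, 1):
                if col[r][c] > col[r + 1][c]:
                    col[r][c], col[r + 1][c] = col[r + 1][c], col[r][c]
    # B3, horizontal step: order each row along its snake direction.
    for r in range(k):
        lo, hi = (0, 1) if r % 2 == 0 else (1, 0)
        if col[r][lo] > col[r][hi]:
            col[r][lo], col[r][hi] = col[r][hi], col[r][lo]
    # B3, vertical step: compare the end of each row with the start of the next.
    for r in range(k - 1):
        c = 1 if r % 2 == 0 else 0
        if col[r][c] > col[r + 1][c]:
            col[r][c], col[r + 1][c] = col[r + 1][c], col[r][c]
    return col


def read_snake(col):
    """Read a k x 2 double column along the snake."""
    return [x for r, row in enumerate(col) for x in (row if r % 2 == 0 else row[::-1])]


# Left column halves [1, 5] and [2, 7], right column halves [3, 4] and [6, 8].
col = [[1, 3], [5, 4], [2, 6], [7, 8]]
print(read_snake(modified_part_b(col)))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

Compared with the $2k$ comparison-exchange steps of the original part B, this variant uses one exchange step, k vertical steps, and two further steps.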
Fig. 5. Illustration of Example 2.
Fig. 6. (a) The situation before part A. (b) After part A. (c) After part B if a + b + c + d is even. (d) "Steps" in each double column if a + b + c + d is odd.
Fig. 7. Iterative application of algorithm MERGE yields algorithm SORT.
We may see the equivalence of part B and the sequence of parts B1, B2, B3 again by the 0-1 principle: let the array contain only 0's and 1's.
The situation in each double column after part A is shown in Fig. 8(a). The difference of the number of 1's in the upper half of the two columns is $|a - c|$. After part B1 this difference is at most 1. The same holds for the lower half, where the difference of the number of 1's is $|b - d|$ before executing B1. So after B1 the difference between the number of 1's in the two columns of each double column is at most 2 [Fig. 8(b)].
This worst case situation after execution of part B2 is shown again in Fig. 8(c).
Obviously, one horizontal and one vertical step of odd-even-transposition sort completes the sorting of the double columns [Fig. 8(d) or (e)].
Further analysis of the original algorithm reveals that we can save another two steps in part C of the algorithm.
Suppose that the two unsorted rows, consisting of 0's and 1's, are to be sorted as shown in Fig. 9(a). Then the four numbers at the left side can only be 00XX or XX11 since they have been sorted in part B of the algorithm (see Fig. 9(b) and (c); X stands for 0 or 1). Thus, $2k - 2$ steps of odd-even-transposition sort suffice to complete the sorting in part C.
The two modifications lead to the following improved time performance of the merge algorithm:
$$T_M(k) = (k/2 - 1)\,t_e + t_e + k\,t_c + 2\,t_c + (2k - 2)\,t_c \le 3.5\,k\,t_c.$$
For the sorting algorithm we thus get
$$T_S(\sqrt{n}) \le (7\sqrt{n} - 7)\,t_c.$$
Fig. 8. (a) The situation in each double column before part B1. (b) After B1. (c) After B2. (d) and (e) During B3.
Fig. 9. After part B3 the array can contain at most 2k - 2 unsorted 0's and 1's in two rows.
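The improved bounds of Section IV can be checked term by term. As a worked step (again writing $\sqrt{n} = 2^j$ and using $t_e \le t_c$):

```latex
\begin{align*}
T_M(k) &= \underbrace{(k/2 - 1)\,t_e}_{\text{A}}
        + \underbrace{t_e}_{\text{B1}}
        + \underbrace{k\,t_c}_{\text{B2}}
        + \underbrace{2\,t_c}_{\text{B3}}
        + \underbrace{(2k - 2)\,t_c}_{\text{C}}
        = \tfrac{k}{2}\,t_e + 3k\,t_c \le 3.5\,k\,t_c , \\
T_S(2^j) &\le \sum_{i=1}^{j} 3.5 \cdot 2^i \, t_c
          = 3.5\,(2^{j+1} - 2)\,t_c = (7 \cdot 2^j - 7)\,t_c .
\end{align*}
```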
V. SYSTOLIC IMPLEMENTATION OF THE ALGORITHM
A straightforward implementation of our algorithm by a $\sqrt{n} \times \sqrt{n}$ grid of processors will not be suitable for VLSI because each processor would have to be capable of either interpreting a sequence of global control signals or of generating the necessary sequence of operations locally. Thus, only very few of these relatively complex processors could be integrated on a chip.
Therefore, we present a systolic version of the algorithm: data are pumped through an array of cells, each executing only one operation and then passing its data to the next cell, executing the next operation, and so on. Since a single cell has to execute always the same operation (a comparison-exchange, an exchange, or a simple delay), these cells can be implemented by very simple logic (see Section VI).
As is well known, the shuffle operation on a sequence of length n has a systolic implementation by an array of $n/2 - 1$ rows of simple exchange cells as depicted in Fig. 10 for the case n = 8. The squares of the $3 \times 8$ array are storage cells. Every two squares connected by a double-headed arrow form an exchange cell. The input sequence $z_1, \ldots, z_8$ is shifted through the array from the bottom to the top. After three steps, the shuffled input sequence appears on top of the array.
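One exchange pattern that realizes the shuffle in $n/2 - 1$ parallel steps is the triangular scheme sketched below. This is an assumption for illustration: the exact cell layout of Fig. 10 is not recoverable from the text, and `shuffle_by_exchanges` is a hypothetical name. In step s, s disjoint adjacent pairs centered on the middle of the sequence are exchanged.

```python
def shuffle_by_exchanges(z):
    """Perfect shuffle of an even-length sequence using adjacent exchanges only."""
    z = list(z)
    m = len(z) // 2
    for s in range(1, m):            # m - 1 = n/2 - 1 parallel steps
        for j in range(s):           # the s pairs of one step are disjoint
            i = m - s + 2 * j
            z[i], z[i + 1] = z[i + 1], z[i]
    return z


print(shuffle_by_exchanges([1, 2, 3, 4, 5, 6, 7, 8]))
# [1, 5, 2, 6, 3, 7, 4, 8]
```

For n = 8 the three steps exchange positions (4,5), then (3,4) and (5,6), then (2,3), (4,5), and (6,7), which matches the three rows of exchange cells mentioned above.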
A systolic version of odd-even-transposition sort for sorting a $u \times v$ array of data items in snake-like ordering (u even, $u \cdot v = n$) can be obtained in the following way (this process is illustrated in Fig. 11(a)-(d) for the case n = 8, u = 4, v = 2).
Start from the systolic version of odd-even-transposition sort on $1 \times n$ arrays, i.e., on sequences $z_1, \ldots, z_n$ [Fig. 11(a)]. Every two storage cells connected by an arrow form a comparison-exchange cell.
Divide $z_1, \ldots, z_n$ into $u/2$ blocks of $2v$ data items each. As indicated in Fig. 11(b) we insert $u - 1$ diagonals of delay cells into the array of Fig. 11(a), i.e., we obtain an $(n + u - 1) \times n$ array which is still capable of sorting $z_1, \ldots, z_n$, but the comparison-exchange operations on data items of block i ($1 \le i \le u/2$) are delayed by $i - 1$ steps.
Fig. 10. Systolic computation of the perfect shuffle of eight data items.
Fig. 11. (a) Systolic version of odd-even-transposition sort for a 1 x 8 array of data items. (b) The systolic array of Fig. 11(a) with additional delay cells. (c) One subarray of the systolic array of Fig. 11(b) can be used to sort a u/2 x 2v array of data items (u = 4, v = 2 in this figure). (d) Folding the systolic array of Fig. 11(c) along the vertical middle axis yields the systolic array for sorting u x v arrays of data items.
The array of Fig. 11(b) can be decomposed into $u/2$ subarrays of width $2v$ which are identical except for the fact that each comparator of the (i + 1)st subarray acts one step later than the corresponding one in the ith subarray. Thus, if the $u/2$ blocks of the sequence $z_1, \ldots, z_n$ are input row by row, i.e., as a $u/2 \times 2v$ array, then one of these subarrays can be used to sort the sequence [Fig. 11(c)]. The long arrows of Fig. 11(c) correspond to the comparisons between adjacent subarrays, i.e., between different rows of the input sequence.
One difficulty arises when two u/2 x 2v arrays are consecutively shifted through the array. As the two arrays are to be sorted
separately, a comparison-exchange cell should only be activated
if both storage elements contain data items belonging to the same
sequence. Therefore, we augment every data item with an additional
"stop bit," which is set to 1 for the first element of the sequence to
be sorted. The stop bit is not exchanged if an exchange of two data
items occurs.
Folding the array of Fig. 11(c) along the vertical middle axis eliminates the long arrows and yields the desired systolic version of odd-even-transposition sort for sorting a $u \times v$ array into snake-like ordering [Fig. 11(d)]. Because of the folding, now data have to be shifted by two rows at each step.
The correctness of the above method is obvious from the fact that
it is a sequence of correctness preserving transformations.
Example 3: Each step of sorting a 4 x 2 array using the appropriate version of odd-even-transposition sort is shown in Fig. 12.
Systolic versions of parts A, B, and C of the original algorithm MERGE and of parts B1, B2, and B3 of the modified algorithm are now easily obtained. The above-mentioned stop bits are taken care of by introducing a new part S, which sets the stop bits to 1 for each data item in the odd rows of the input, and a new part R, which is inserted between every two iteration steps. Since the structures to be sorted double in size after each iteration step, every second stop bit that is set to 1 is reset in parts R.
The complete systolic array for sorting 64 data items using the modified version of algorithm SORT is shown in Fig. 13. The parts that are denoted by $A_i$, $B1_i$, $B2_i$, etc., correspond to parts A, B1, B2, etc., of the modified algorithm MERGE for merging subarrays of size $2^{i-1} \times 2^{i-1}$.
The initial $2 \times 2$ arrays are sorted by three comparison-exchange steps in part X.
Due to the additional delay cells the time performance of the systolic version is slightly worse compared to the original algorithm.
The time $T_M^{\mathrm{sys}}(k)$ required by the systolic algorithm MERGE on $k/2 \times k/2$ subarrays consists of
$1\,t_s$ to reset the stop bits;
$(k/2 - 1)\,t_s$ for part A;
$1\,t_s$ for part B1;
$(\tfrac{3}{2}k - \tfrac{1}{2})\,t_s$ for part B2;
$\tfrac{5}{2}\,t_s$ for part B3;
$(2k - 1)\,t_s$ for part C
where $t_s$ is the time required by one step. Thus, we get
$$T_M^{\mathrm{sys}}(k) = (4k + 2)\,t_s$$
and $T_S^{\mathrm{sys}}(\sqrt{n})$, the time complexity of the systolic algorithm SORT, is
$$T_S^{\mathrm{sys}}(\sqrt{n}) = (8\sqrt{n} + \log n - 8)\,t_s.$$
Fig. 12. Illustration of Example 3.
VI. REALIZATION OF THE CELLS
Besides the cells that are necessary to handle the stop bits in parts S and R, there are only three different types of cells used in the systolic version of algorithm SORT: delay cells, exchange cells, and comparison-exchange cells. Assuming bit-serial computation on m-bit data items, the cells can be realized as follows.
A delay cell is simply a shift register having the appropriate delay $t_s$ determined by the comparison-exchange cell.
An exchange cell can be realized by two shift registers with a crossover of wires at their outputs and with delay $t_s$ [Fig. 14(a)].
Ignoring the logic necessary for adequately processing the stop bits, a comparison-exchange cell consists of two shift registers and a simple logic circuit C [Fig. 14(b)]. Essentially, this circuit is a 1-bit comparator with two feedback carry signals $r_i$ and $s_i$, indicating whether input x has been greater than, less than, or equal to input y down to the ith bit ($i = m, m - 1, \ldots, 0$) [Fig. 14(c)].
Let $x = x_{m-1}, \ldots, x_0$, $y = y_{m-1}, \ldots, y_0$. Starting from the initial values $r_m = s_m = 0$, $r_i$ and $s_i$ are computed from $r_{i+1}$, $s_{i+1}$, $x_i$, and $y_i$ by the following functions:
$$r_i := r_{i+1} \ \text{or}\ x_i \ne y_i$$
and
$$s_i := s_{i+1} \ \text{or}\ (\text{not}\ r_{i+1} \ \text{and}\ x_i > y_i).$$
The outputs $(x_i', y_i')$ of C are $(x_i, y_i)$ if $s_i = 0$ and $(y_i, x_i)$ if $s_i = 1$.
Clearly, the time $t_s$ of the comparison-exchange cell is proportional to m, the length of the data items. This time cannot be reduced by use of pipelining because some of the comparison-exchange cells get their input from different rows of the systolic array (Fig. 13).
The bit-serial version of the systolic algorithm SORT thus needs time $O(\sqrt{n} \cdot m)$ and area $O(n \cdot m)$ to sort a sequence of n m-bit data items, i.e., its $AT^2$ complexity is in $O(n^2 m^3)$.
If the computation is bit parallel, a comparator can be realized by a chain of 1-bit comparators $C_{m-1}, \ldots, C_0$. The carry signals $r_i$ and $s_i$ are now passed from $C_i$ to $C_{i-1}$. Using a skewed input of x and y (Fig. 15), an m-bit comparison can be done in time $O(m)$ with period $O(1)$, i.e., the comparisons are not performed truly bit parallel, but are pipelined through the "pipe" of m 1-bit comparators. At the same time the ith bit of some x and y is compared in $C_i$, the (i + 1)st bit of the next two data items $\hat{x}$ and $\hat{y}$ is compared in $C_{i+1}$.
Thus, the time complexity of the bit-parallel version of the algorithm is independent of the length of the data items. On the other hand, the interconnections between two m-bit comparators in parts $C_i$ of the algorithm require area proportional to $m^2$ (Fig. 16).
The bit-parallel version of the systolic algorithm SORT thus needs time $O(\sqrt{n})$ and area $O(n \cdot m^2)$ to sort a sequence of n m-bit data items, i.e., its $AT^2$ complexity is in $O(n^2 m^2)$.
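The carry recurrences for $r_i$ and $s_i$ stated for the comparison-exchange cell can be simulated in software. The sketch below is illustrative only: `compare_exchange_bit_serial` is a hypothetical name, the operands are assumed to be non-negative integers that fit in m bits, and the stop-bit logic is ignored, as in the text. Bits are processed from the most significant one downward, and the pair is swapped from the first position at which x exceeds y.

```python
def compare_exchange_bit_serial(x, y, m):
    """Return (min, max) of two m-bit values, handling one bit pair per step."""
    r = s = 0                        # initial values r_m = s_m = 0
    x_out = y_out = 0
    for i in range(m - 1, -1, -1):   # most significant bit first
        xi, yi = (x >> i) & 1, (y >> i) & 1
        # s_i := s_{i+1} or (not r_{i+1} and x_i > y_i); r still holds r_{i+1} here.
        s = int(s or (not r and xi > yi))
        # r_i := r_{i+1} or x_i != y_i
        r = int(r or xi != yi)
        # Output bits: (x_i, y_i) if s_i = 0, (y_i, x_i) if s_i = 1.
        xo, yo = (yi, xi) if s else (xi, yi)
        x_out = (x_out << 1) | xo
        y_out = (y_out << 1) | yo
    return x_out, y_out


print(compare_exchange_bit_serial(5, 3, 3))  # (3, 5)
```

Until the first differing bit the two inputs agree, so it does not matter that the swap decision is only known from that bit onward; this is why the cell can emit output bits while the comparison is still in progress.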
VII. CONCLUSION
We have presented a new algorithm to sort n elements in time $O(\sqrt{n})$. It iteratively uses a simple merge algorithm, which is composed of the shuffle operation and odd-even-transposition sort. We have outlined how it can be implemented on a systolic array of simple comparator, exchange, and delay cells. There are other sorting algorithms designed for mesh-connected processor arrays having the same asymptotic time complexity $O(\sqrt{n})$ [6], [10], [12]. But their control structure is more complex, which is partly due to the fact that they use two levels of recursion. Since no systolic versions of these algorithms have been presented yet, it is difficult to fairly compare their time performance to the time complexity of our algorithm. In a systolic algorithm, routing and comparison steps take the same time. Therefore, a comparison of the number of routing and comparison steps of the nonsystolic algorithms presented in [6], [10], [12] and of our systolic algorithm indicates that it is unlikely that a systolic version of any of these algorithms will be able to outperform our algorithm. Furthermore, due to the more complex structure, the interconnection pattern of a systolic version of the other algorithms would be less regular. Thus, they seem to be less suitable for VLSI.
Naturally, every hardware realization of a sorting algorithm can only handle sequences whose length is bounded by some n. If longer sequences have to be sorted, some external memory and some "sort and merge" technique have to be used. The development of such techniques will increase the practical relevance of fast sorting chips [7].
Fig. 13. The complete systolic version of algorithm SORT for n = 64 data items.
Fig. 14. (a) An exchange cell. (b) A comparison-exchange cell. (c) A 1-bit comparator.
Fig. 15. An m-bit comparator with skewed input.
Fig. 16. Wires between bit-parallel comparators occupy area $O(m^2)$.
REFERENCES
[1] K. E. Batcher, "Sorting networks and their applications," in Proc. AFIPS 1968 SJCC, vol. 32. Montvale, NJ: AFIPS Press, pp. 307-314.
[2] B. M. Chazelle and L. M. Monier, "A model of computation for VLSI with related complexity results," in Proc. 13th Annu. ACM Symp. Theory Comput., May 1981, pp. 318-325.
[3] T. C. Chen, V. Y. Lum, and C. Tung, "The rebound sorter: An efficient sort engine for large files," in Proc. 4th Int. Conf. Very Large Databases, 1978, pp. 312-318.
[4] M. J. Foster and H. T. Kung, "The design of special-purpose VLSI chips," IEEE Comput. Mag., vol. 13, pp. 26-40, Jan. 1980.
[5] D. E. Knuth, The Art of Computer Programming, Vol. 3: Sorting and Searching. Reading, MA: Addison-Wesley, 1973.
[6] M. Kumar and D. S. Hirschberg, "An efficient implementation of Batcher's odd-even merge algorithm and its application in parallel sorting schemes," IEEE Trans. Comput., vol. C-32, pp. 254-264, Mar. 1983.
[7] H.-W. Lang, M. Schimmler, H. Schmeck, and H. Schröder, "A method for realistic comparisons of sorting algorithms for VLSI," to be published.
[8] J. van Leeuwen, "Distributed computing," Dep. Comput. Sci., Univ. Utrecht, Utrecht, The Netherlands, Tech. Rep. RUU-CS-82-8, 1982.
[9] G. S. Miranker, L. Tang, and C. K. Wong, "A 'Zero Time' VLSI sorter," IBM J. Res. Develop., vol. 27, pp. 140-148, Mar. 1983.
[10] D. Nassimi and S. Sahni, "Bitonic sort on a mesh-connected parallel computer," IEEE Trans. Comput., vol. C-28, pp. 2-7, Jan. 1979.
[11] H. S. Stone, "Parallel processing with the perfect shuffle," IEEE Trans. Comput., vol. C-20, pp. 153-161, Feb. 1971.
[12] C. D. Thompson and H. T. Kung, "Sorting on a mesh-connected parallel computer," Commun. Ass. Comput. Mach., vol. 20, pp. 263-271, Apr. 1977.
[13] C. Thompson, "A complexity theory for VLSI," Ph.D. dissertation, Dep. Comput. Sci., Carnegie-Mellon Univ., Pittsburgh, PA, Tech. Rep. CMU-CS-80-140, 1980.
A Practical Approach to Fault Simulation and
Test Generation for Bridging Faults
M. ABRAMOVICI AND P. R. MENON
Abstract - In this correspondence we present a practical approach to fault simulation and test generation for bridging faults in combinational circuits. Unlike previous work, we consider unrestricted bridging faults, including those that introduce feedback. Our approach is based on extending fault simulation and test generation for stuck faults to cover bridging faults as well. We consider combinational testing only, and show that adequate bridging fault coverage can be obtained in most cases without using sequences of vectors.
Index Terms - Bridging faults, fault detection, fault simulation, test generation.
I. INTRODUCTION
Bridging faults (BF's) are caused by shorts between normally unconnected signal lines. Although BF's occur often in practice, no efficient approach exists for dealing with unrestricted BF's in general combinational circuits. Most of the previous work [1]-[8] relies on different simplifying assumptions regarding the BF model (such as BF's between inputs of the same gate, BF's that do not create feedback, BF's between primary inputs and output) and/or regarding the circuit (such as fan-out-free, irredundant, single-output with known function, two-level, etc.). Methods based on explicit fault simulation of BF's are not practical for large circuits.
In this correspondence we present a practical approach to fault simulation and test generation for unrestricted BF's in general combinational circuits. Since adequate single stuck fault (SSF) coverage is usually a requirement, our approach is to efficiently determine the BF's detected by a test set developed for SSF's. We establish simple relations between the detection of BF's and the detection of SSF's and monitor the occurrence of these relations during the simulation of SSF's. This simple technique replaces
Manuscript received August 20, 1984; revised December 17, 1984. This correspondence was published in the Proceedings of the IEEE 1983 International Test Conference, Philadelphia, PA, October 18-20, 1983.
The authors are with AT&T Bell Laboratories, Naperville, IL 60566.
0018-9340/83/0700-0658$01.00 © 1983 IEEE
