Adaptive and Optimal Parallel Algorithms for Enumerating
Permutations and Combinations*
S. G. AKL
Department of Computing and Information Science, Queen's University, Kingston, Ontario, Canada K7L 3N6
* This work was supported by the Natural Sciences and Engineering Research Council of Canada under grant NSERC-A3336 and by the National Science Foundation under grant MCS-8313650.
Three new adaptive and cost-optimal parallel algorithms are described for the problems of enumerating permutations
and combinations of m out of n objects. The algorithms are designed to be executed on a very simple model of parallel
computation which consists of k autonomous processors running synchronously without needing to communicate among
themselves. For the first two algorithms, 1 ≤ k ≤ N, where N is the number of permutations or combinations to be
enumerated. When 1 ≤ k ≤ N/n, both algorithms require O(⌈N/k⌉·m) time for an optimal cost of O(N·m). The third
algorithm is designed for the case where m = n and 1 ≤ k ≤ n. It runs in O(⌈n!/k⌉·n) time for an optimal cost of O(n!·n).
Received January 1986, revised August 1986
1. INTRODUCTION
The enumeration of combinatorial objects occupies an
important place in computer science due to its many
applications in science and engineering (Refs 20-23). Also, in the
last few years there has been an increased interest in
parallel computation and hence in parallel algorithms (Refs 2 and 11).
Recently, two parallel combinatorial enumeration algorithms were proposed. The first one uses k processors
running concurrently to generate all nPm = n!/(n-m)!
distinct permutations of m out of n objects in
O(⌈nPm/k⌉·m log m) time, where 1 ≤ k ≤ nPm (Ref. 10). The
second one generates all nCm = n!/((n-m)!m!) distinct
combinations of m out of n objects (Ref. 5). It uses m processors
and runs in O(nCm) time on a shared-memory
single-instruction stream multiple-data stream (SIMD)
machine, where two or more processors are allowed to
read simultaneously from the same memory location. In
Refs 5 and 10 the following problems were left open.
(1) A parallel algorithm is said to be cost-optimal if the
number of processors it uses multiplied by its running
time matches a lower bound on the number of operations
required to solve the problem at hand. In the case of
permutation generation, an Ω(nPm·m) lower bound is
dictated by the size of the output. It follows that the
algorithm of Gupta (Ref. 10) is not cost-optimal. On the other
hand, all sequential permutation generation algorithms
(such as the ones described in Refs 20 and 23) run in
O(nPm·m) time on a single processor and are therefore
cost-optimal. The question here is: can a parallel
algorithm (i.e. one that runs on possibly more than one
processor) be derived for generating permutations on the
same model of computation used in Ref. 10, which in
addition would be cost-optimal?
(2) A parallel algorithm is said to be adaptive if it is
capable of modifying its behaviour according to the
number of processors actually available on the parallel
computer being used. In that sense, the algorithm of
Chan is not adaptive as it requires the presence of m
processors in order to function properly (Ref. 5). Usually, it is
reasonable to assume that the number of processors on
a parallel computer is not only fixed but also smaller than
the size of the typical problem. Hence, two questions
arise.
(i) Can a parallel combination-generation algorithm
be derived which uses k processors, where 1 ≤ k ≤ nCm?
(ii) In view of the Ω(nCm·m) lower bound on
combination generation dictated by the size of the
output, would this algorithm be cost-optimal?
(3) Although it is often convenient in theory to assume
that an arbitrary number of simultaneous-read operations from the same memory location are possible, in
practice, of course, this is unrealistic. Therefore it is
customary to attempt to remove such simultaneous reads
from a parallel algorithm that requires them. There is an
obvious and costly way to accomplish this: any parallel
algorithm that uses k processors and runs in time T with
simultaneous reads can be made to run in time O(T log k)
without simultaneous reads.7 This approach applied to
the algorithm of Chan leads to a parallel algorithm which
is not cost-optimal (Ref. 5). Can a parallel combination-generation
algorithm be derived which does not require simultaneous reads and is cost-optimal?
In this paper we answer all of the above questions in
the affirmative. Specifically, we propose a parallel
algorithm for generating all permutations of m out of n
objects which runs on an SIMD machine with k
processors, where 1 ≤ k ≤ nPm. Not only are simultaneous reads not needed but, in addition, the shared
memory itself is never used (and is hence excluded from
the model). In fact the k processors once started
independently execute the same algorithm and never need
to communicate among themselves. This is the model
used by Gupta (Ref. 10). When 1 ≤ k ≤ nPm/n, the algorithm
runs in O(⌈nPm/k⌉·m) time, thus achieving an optimal cost
of O(nPm·m). A similar approach is then used to derive
a parallel combination-generation algorithm. The algorithm generates all combinations of m out of n objects.
It runs on the same SIMD model of computation as the
permutation generator with 1 ≤ k ≤ nCm. When
1 ≤ k ≤ nCm/n, the algorithm requires O(⌈nCm/k⌉·m)
time, for an optimal cost of O(nCm·m). This settles all of
the above questions. Finally, we describe a second
adaptive parallel algorithm for generating permutations.
It runs on the same model as the previous two algorithms
but is designed in particular for the special case where
m = n and only a few processors are available, namely
1 ≤ k ≤ n. The algorithm is conceptually very simple and
is also cost-optimal.
For the sake of completeness, we also mention at this
point two existing classes of parallel permutation-generation algorithms and compare them with the
algorithms described in this paper:
(1) the class of permutation networks, as described for
example in Refs 4, 6, 9, 14, 15, 16, 18, 19, 21, 25 and 26;
(2) the class of application-dependent permutation
generators, as described for example in Refs 3, 8, 17 and
22.
Algorithms in classes (1) and (2) above are restricted
in at least one of the following three ways:
(1) they are based on a hard-wired interconnection of
a predefined number of processors that can generate
permutations for a fixed-size array only;
(2) they are capable of generating only a subset of all
possible permutations; and
(3) they typically require O(n) processors and O(log^a n)
steps, where a ≥ 1, to generate one permutation of an
array of length n: all permutations are therefore
generated in O(n! log^a n) steps for a cost of O(n·n! log^a n),
which is not optimal.
By contrast, our algorithms are:
(1) adaptive, i.e. the number of available processors
bears no relation to the size of the array to be permuted;
(2) capable of generating all possible permutations of
a given array; and
(3) cost-optimal.
The remainder of this paper is organised as follows.
Section 2 contains our assumptions about existing
sequential procedures that are invoked by the parallel
algorithms. Our model of parallel computation is defined
in Section 3. In Sections 4 to 6 we present the new parallel
algorithms. Some concluding remarks are offered in
Section 7.
2. ASSUMPTIONS
In our subsequent development of the parallel combinatorial enumeration algorithms we make the following
assumptions.
(1) The set of objects to be operated on is
S = {1, 2, ..., n}.
(2) An algorithm SEQUENTIAL PERMUTATIONS
is available for sequentially generating all nPm permutations of m out of n objects in lexicographic order. The
construction of such an algorithm - whose running time
is O(nPm·m) - is not difficult, and is left as an exercise for
the reader. A survey of permutation-generation algorithms is provided by Sedgewick (Ref. 24).
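For concreteness, the following Python sketch (ours, not part of the original paper) shows one possible lexicographic successor step from which such an algorithm can be built; the name next_m_permutation is purely illustrative. This simple version spends O(m·n) time per step, so further bookkeeping would be needed to reach the O(nPm·m) total quoted above.

def next_m_permutation(x, n):
    # Lexicographic successor of the m-permutation x of {1, ..., n}, or None if
    # x is the last one.  Repeated application, starting from (1, 2, ..., m),
    # enumerates all nPm permutations of m out of n objects in lexicographic order.
    x, m = list(x), len(x)
    for i in range(m - 1, -1, -1):
        used = set(x[:i])
        # smallest unused value strictly larger than x[i], if any
        larger = [v for v in range(x[i] + 1, n + 1) if v not in used]
        if larger:
            x[i] = larger[0]
            # fill the tail with the smallest remaining values, in increasing order
            unused = sorted(v for v in range(1, n + 1) if v not in set(x[:i + 1]))
            x[i + 1:] = unused[:m - i - 1]
            return x
    return None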
(3) An algorithm SEQUENTIAL COMBINATIONS
is available for sequentially generating all nCm combinations of m out of n objects in lexicographic order.
References to a number of such algorithms - whose
running time is O(nCm·m) - can be found in Ref. 1.
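Again purely as an illustration (ours, not one of the algorithms cited in Ref. 1), a successor step for combinations is even simpler; starting from (1, 2, ..., m), each application costs O(m), giving the O(nCm·m) total quoted above.

def next_combination(c, n):
    # Lexicographic successor of the m-combination c (an increasing sequence)
    # of {1, ..., n}, or None if c = (n-m+1, ..., n) is the last one.
    c, m = list(c), len(c)
    for i in range(m - 1, -1, -1):
        if c[i] < n - (m - 1 - i):       # position i can still be increased
            c[i] += 1
            for j in range(i + 1, m):    # reset the tail to the smallest legal values
                c[j] = c[j - 1] + 1
            return c
    return None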
(4) A system for numbering all nPm permutations of
m out of n objects is available. Let X = x1 x2 ... xm be a
permutation of a combination of m integers from S.
RANKP is a function which associates with each such
permutation X a unique integer RANKP(X). The
function RANKP has the following properties: (i) it
preserves lexicographic ordering; (ii) its range is the set
{1, 2, ..., nPm}; and (iii) it is invertible, so that if
d = RANKP(X) then X can be obtained from d as RANKP⁻¹(d).
Such a function and its inverse - both of which have
a running time of O(m·n) - are described in Ref. 13.
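The sketch below (ours; one possible numbering scheme with properties (i)-(iii), not necessarily that of Ref. 13) shows how such an inverse can be realised within the stated O(m·n) time bound.

from math import perm

def unrank_permutation(d, n, m):
    # Return the m-permutation of {1, ..., n} whose lexicographic rank is d,
    # 1 <= d <= nPm; plays the role of RANKP^(-1).
    remaining = list(range(1, n + 1))        # unused values, kept in increasing order
    d0, result = d - 1, []                   # switch to a 0-based rank
    for i in range(m):
        block = perm(n - i - 1, m - i - 1)   # completions once this position is fixed
        q, d0 = divmod(d0, block)
        result.append(remaining.pop(q))      # the q-th smallest unused value
    return result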
(5) A system for numbering all nCm combinations of
m out of n objects is available. A function RANKC and
its inverse RANKC⁻¹ are described in Ref. 12, which
satisfy the same properties as RANKP and RANKP⁻¹,
respectively.
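As before, the following sketch (ours, not the actual scheme of Ref. 12) realises such an inverse with the required properties, again in O(m·n) time.

from math import comb

def unrank_combination(d, n, m):
    # Return the m-combination of {1, ..., n} whose lexicographic rank is d,
    # 1 <= d <= nCm; plays the role of RANKC^(-1).
    result, d0, lo = [], d - 1, 1
    for i in range(m):
        for v in range(lo, n + 1):
            block = comb(n - v, m - i - 1)   # combinations whose next element is v
            if d0 < block:                   # the d-th combination lies in this block
                result.append(v)
                lo = v + 1
                break
            d0 -= block
    return result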
3. MODEL OF COMPUTATION
Our model of parallel computation is a very simple one.
It consists of k processors p1, p2, ..., pk, all of which
perform the same algorithm simultaneously. Each
processor operates on a different data set stored in its
local memory. The program for each processor can reside
in the processor's local memory or can be thought of as
a common sequence of instructions emanating from a
central control unit. The model precludes any communication among processors: there is no shared memory or
interconnection network available. It can be regarded as
a restricted version of the SIMD model of computation.
4. PARALLEL PERMUTATION
GENERATOR
In this section we describe an adaptive parallel algorithm
for generating all permutations of m out of n objects. The
algorithm runs on the model of computation defined in
Section 3, with 1 ≤ k ≤ nPm. It makes use of algorithm
SEQUENTIAL PERMUTATIONS and of function
RANKP⁻¹ described in Section 2.
Algorithm PARALLEL PERMUTATIONS_1
for i = 1 to k do in parallel
  Processor pi performs the following steps
  1) Let y = (i - 1)·⌈nPm/k⌉ + 1
  2) if y ≤ nPm then
    2.1) Obtain the yth permutation from
         RANKP⁻¹(y); call this permutation Xi
    2.2) Starting with Xi, use algorithm
         SEQUENTIAL PERMUTATIONS
         until a maximum of ⌈nPm/k⌉ - 1 of the
         next permutations have been generated
    end if
end for. □
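To make the control flow concrete, the Python sketch below (ours) simulates the work of one processor pi sequentially, with the Section 2 sketches unrank_permutation and next_m_permutation standing in for RANKP⁻¹ and SEQUENTIAL PERMUTATIONS.

from math import perm

def processor_work(i, n, m, k):
    # The steps performed by processor pi in PARALLEL PERMUTATIONS_1.
    nPm = perm(n, m)
    share = -(-nPm // k)                         # ceil(nPm / k)
    y = (i - 1) * share + 1                      # step 1
    block = []
    if y <= nPm:                                 # step 2
        x = unrank_permutation(y, n, m)          # step 2.1: the y-th permutation Xi
        block.append(x)
        for _ in range(min(share - 1, nPm - y)): # step 2.2: at most ceil(nPm/k) - 1 successors
            x = next_m_permutation(x, n)
            block.append(x)
    return block

Concatenating the k blocks, [p for i in range(1, k + 1) for p in processor_work(i, n, m, k)], reproduces the full lexicographic list of nPm permutations.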
Step 1 requires O(n) operations. In step 2.1, O(m·n)
operations are needed to generate Xi. It takes on the order
of (⌈nPm/k⌉ - 1)·m operations to generate the ⌈nPm/k⌉ - 1
subsequent permutations. The overall running time of
PARALLEL PERMUTATIONS_1 is therefore dominated by max {O(m·n), O(⌈nPm/k⌉·m)}. This implies that
the algorithm has an optimal cost of O(nPm·m) when
1 ≤ k ≤ nPm/n.
5. PARALLEL COMBINATION
GENERATOR
We now describe an adaptive parallel algorithm for
generating all combinations of m out of n objects. The
algorithm runs on the model of computation defined in
Section 3, with 1 ≤ k ≤ nCm. It makes use of algorithm
SEQUENTIAL COMBINATIONS and of function
RANKC⁻¹ described in Section 2.
Algorithm PARALLEL COMBINATIONS
for i = 1 to k do in parallel
  Processor pi performs the following steps
  1) Let j = (i - 1)·⌈nCm/k⌉ + 1
  2) if j ≤ nCm then
    2.1) Obtain the jth combination from
         RANKC⁻¹(j); call this combination Qi
    2.2) Starting with Qi, use algorithm SEQUENTIAL COMBINATIONS until
         a maximum of ⌈nCm/k⌉ - 1 of the next
         combinations have been generated
    end if
end for. □
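As with the permutation generator, a sequential Python simulation (ours) of processor pi's work follows directly from the Section 2 sketches unrank_combination and next_combination.

from math import comb

def processor_work_combinations(i, n, m, k):
    # The steps performed by processor pi in PARALLEL COMBINATIONS.
    nCm = comb(n, m)
    share = -(-nCm // k)                         # ceil(nCm / k)
    j = (i - 1) * share + 1                      # step 1
    block = []
    if j <= nCm:                                 # step 2
        q = unrank_combination(j, n, m)          # step 2.1: the j-th combination Qi
        block.append(q)
        for _ in range(min(share - 1, nCm - j)): # step 2.2: at most ceil(nCm/k) - 1 successors
            q = next_combination(q, n)
            block.append(q)
    return block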
Step 1 requires O(m) operations. In step 2.1, O(m·n)
operations are needed to produce Qi. It takes on the order
of (⌈nCm/k⌉ - 1)·m operations to generate the
⌈nCm/k⌉ - 1 subsequent combinations. The overall running time of PARALLEL COMBINATIONS is therefore dominated by max {O(m·n), O(⌈nCm/k⌉·m)}. This
implies that the algorithm has an optimal cost of
O(nCm·m) when 1 ≤ k ≤ nCm/n.
6. PARALLEL PERMUTATION
GENERATOR FOR m = n
Finally, we describe an adaptive parallel algorithm for
generating all n! permutations of n objects. The algorithm
runs on the model of computation defined in Section 3,
with the exception that in this case only a few processors
are assumed to be available, namely 1 ≤ k ≤ n. The
algorithm is a parallelisation of algorithm SEQUENTIAL PERMUTATIONS described in Section 2.
For ease of presentation, let us begin by assuming that
k = n, i.e. that there are as many processors available as
objects to permute. The idea is to let processor pi, for
1 ≤ i ≤ n, generate all (n - 1)! permutations of the
integers in the set Si = S - {i}; each of these permutations
is then appended to the integer i.
In general, for k processors, where 1 ≤ k ≤ n, each
processor generates ⌈n!/k⌉ permutations. In other words,
each processor does the job of ⌈n/k⌉ processors in the
informal algorithm described above. This requires
O(⌈n!/k⌉·n) time, for an optimal cost of O(n!·n). The
general algorithm is given below.
Algorithm PARALLEL PERMUTATIONS_2
for j = 1 to k do in parallel
  Processor pj performs the following steps
  for i = (j - 1)·⌈n/k⌉ + 1 to j·⌈n/k⌉ do
    if i ≤ n then
      1) Form the set Si = S - {i}
      2) Apply algorithm SEQUENTIAL PERMUTATIONS beginning with the permutation x1 x2 ... x(n-1), where all the x's are
         from Si and x1 < x2 < ... < x(n-1)
      3) Each permutation generated in step 2 is
         appended to the integer i
    end if
  end for
end for. □
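The Python sketch below (ours) simulates processor pj sequentially; itertools.permutations of a sorted sequence stands in for SEQUENTIAL PERMUTATIONS here, since it yields the (n - 1)! permutations of Si in lexicographic order.

from itertools import permutations

def processor_work_full(j, n, k):
    # The steps performed by processor pj in PARALLEL PERMUTATIONS_2 (m = n).
    share = -(-n // k)                                    # ceil(n / k)
    block = []
    for i in range((j - 1) * share + 1, j * share + 1):
        if i <= n:
            S_i = [v for v in range(1, n + 1) if v != i]  # step 1: Si = S - {i}, in increasing order
            for p in permutations(S_i):                   # step 2: all (n-1)! permutations of Si
                block.append((i,) + p)                    # step 3: the permutation appended to i
    return block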
7. CONCLUSION
New parallel algorithms were presented in this paper for
the problems of generating permutations and combinations. When comparing these algorithms with previously known algorithms for the same problems, we
observe the following:
(1) The new permutation generators are faster than
the algorithm of Gupta, and unlike the latter they are
cost-optimal. Paradoxically, the new algorithms are
conceptually much simpler than that of Gupta, while
designed to run on the same model of computation (Ref. 10).
(2) Unlike the algorithm of Chan, the new combination generator is adaptive. Again paradoxically, its
level of conceptual difficulty is the same as that of Chan,
while designed to run on a much weaker model of
computation (Ref. 5).
(3) The algorithms in Sections 4 and 5 are designed to
run with k (the number of processors) taking any value
from 1 to N (the number of permutations or combinations
to be generated). However, they are optimal only when
1 ≤ k ≤ N/n. The possibility of achieving optimality
over the full range from 1 to N remains an open question.
(4) The algorithms in Sections 4 and 5 rely on the
availability of numbering systems for permutations and
combinations, respectively. The permutation generator
in Section 6, on the other hand, does not make use of such
a numbering system, but rather is a direct parallelisation
of a sequential algorithm. The existence of a parallel
combination generator which runs on the same model as
the algorithm in Section 6, while sharing all the properties
of the latter, is also left as an open question.
In conclusion we note that there are many other
combinatorial enumeration problems for which adaptive
and cost-optimal parallel algorithms are yet to be
developed. The results in this paper suggest a way for this
to be achieved on a very simple model of computation.
Acknowledgement
The author wishes to thank an anonymous referee for a
number of instructive comments.
REFERENCES
1. S. G. Akl, A comparison of combination generation
methods. ACM Transactions on Mathematical Software 7
(1), 42-45 (1981).
2. S. G. Akl, Parallel Sorting Algorithms. Academic Press,
Orlando, Florida (1985).
3. K. E. Batcher, The flip network in STARAN. Proceedings
of the 1976 International Conference on Parallel Processing.
IEEE Computer Society Press, Silver Spring, Maryland, pp.
65-71 (1976).
4. V. E. Benes, Mathematical Theory of Connecting Networks
and Telephone Traffic. Academic Press, New York (1965).
5. B. Chan and S. G. Akl, Generating combinations in
parallel. BIT 26 (1), 2-6 (1986).
6. C. Clos, A study of non-blocking switching networks. Bell
System Technical Journal 32, 406-424 (1953).
7. D. M. Eckstein, Simultaneous Memory Access. Technical
Report no. 79-6, Department of Computer Science, Iowa
State University, Ames, Iowa (1979).
8. D. Fraser, Array permutation by index digit permutation.
Journal of the ACM 23 (2), 298-308 (1976).
9. S. W. Golomb, Permutations by cutting and shuffling.
SIAM Review 3 (4), 293-297 (1961).
10. P. Gupta and G. P. Bhattacharjee, Parallel generation of
permutations. The Computer Journal 26 (2), 97-105 (1983).
11. R. W. Hockney and C. R. Jesshope, Parallel Computers.
Adam Hilger, Bristol (1981).
12. G. D. Knott, A numbering system for combinations.
Communications of the ACM 17 (1), 45-46 (1974).
13. G. D. Knott, A numbering system for permutations of
combinations. Communications of the ACM 19 (6), 355-356
(1976).
14. D. H. Lawrie, Access and alignment of data in an array
processor. IEEE Transactions on Computers C-24 (12),
1145-1155 (1975).
15. J. Lenfant, Parallel permutations of data: a Benes network
control algorithm for frequently used permutations. IEEE
Transactions on Computers 27 (7), 637-647 (1978).
16. J. Lenfant and S. Tahe, Permuting data with the Omega
network. Acta Informatica 21, 629-641 (1985).
17. D. Nassimi and S. Sahni, Data broadcasting in SIMD
computers. IEEE Transactions on Computers C-30 (2),
282-288 (1981).
18. D. Nassimi and S. Sahni, A self-routing Benes network and
parallel permutation algorithms. IEEE Transactions on
Computers C-30 (5), 332-340 (1981).
19. D. Nassimi and S. Sahni, Parallel permutation and sorting
algorithms and a new generalized connection network.
Journal of the ACM 29, 642-677 (1982).
20. A. Nijenhuis and H. S. Wilf, Combinatorial Algorithms.
Academic Press, New York (1978).
21. S. E. Orcutt, Implementation of permutation functions in
Illiac IV-type computers. IEEE Transactions on Computers
C-25 (9), 929-936 (1976).
22. M. C. Pease, The indirect binary n-cube microprocessor
array. IEEE Transactions on Computers C-26 (5), 458-473
(1977).
23. E. M. Reingold, J. Nievergelt and N. Deo, Combinatorial
Algorithms. Prentice-Hall, Englewood Cliffs, New Jersey
(1977).
24. R. Sedgewick, Permutation generation methods. ACM
Computing Surveys 9 (2), 137-164 (1977).
25. H. J. Siegel, Interconnection Networks for Large-Scale
Parallel Processing. D. C. Heath, Lexington, Massachusetts (1985).
26. C. L. Wu and T. Y. Feng, The universality of the shuffle-exchange network. IEEE Transactions on Computers C-30 (5), 324-332 (1981).