Randomized Shuffling

Dr. Dobb's Journal January 2000
By Timothy Rolfe
Tim is a visiting associate professor in the Computer Science Department of Eastern Washington
University. He can be contacted at [email protected].
Card shuffling is an example of putting a fixed number of items into completely random order. The
method used to shuffle cards applies to any programming circumstance where you need to
randomize the order of a fixed number of items, such as Scrabble tiles, dominoes, standard
playing cards, Tarot cards, or lottery numbers.
You can view this randomization of items as randomizing the permutation of entries in an array. In
this article, I'll examine two randomizing algorithms: one that does not generate all
permutations with equal probability, and another that does. Both algorithms work
through the array, swapping entries in the process.
The first algorithm focuses on proceeding through the array, randomly positioning the value found
in each successive position. With this algorithm, as you proceed through the N positions of the array
(source positions), you choose the target position randomly from the N possible positions in the
array. The values in the source and target positions are then interchanged. H.M. Deitel and P.J.
Deitel refer to this as their "high-performance card shuffling" algorithm in their book C++: How to
Program (Prentice-Hall, 1998). In C++, this process looks like Example 1 (presuming a function or
macro Swap (L,R)). At first glance, this algorithm appears to generate all permutations with
equal probability. On closer examination, however, it generates $N^N$ equally likely rearrangement
sequences, because each of the N iterations of the loop sends a value to one of N possible positions,
even though there are only $N!$ possible permutations of N elements. Each permutation is therefore
produced by some number of these sequences, an average of $N^N/N!$ apiece. For N > 2 that average
is not a whole number: N-1 divides $N!$ but shares no factor with N, so it cannot divide $N^N$. The
sequences cannot be spread evenly, so some permutations must be more likely than others.
for ( Src = 0; Src < N; Src++ )
{
   Dest = rand() % N;          // All N positions equally likely
   Swap (X[Src], X[Dest]);     // Exchange source and target values
}
Example 1: C++ implementation of shuffling algorithm that does not generate all
permutations with equal probability.
The second algorithm focuses on choosing a random value for each position in the array. Once the
value is placed in that position, the position (and the value) no longer participates in the shuffling.
While you could start at either end of the array, the code is simpler if the region for choosing a
source value always begins with subscript 0. To do that, you start with the rightmost position in the
array as the target and randomly select a source position from the front of the array to that target
position. You interchange the values, then treat the array as shortened by one position and do the
same thing over again. There are no choices left to be made when the remaining array has a single
element. Example 2 is C++ code that implements this approach. Examination of the structure of this
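loop shows that it generates N! rearrangements of elements: N choices for the last position, N-1 for
the next, and so on down to 2, exactly one rearrangement per permutation. All permutations are
therefore equally likely, apart from a small deviation from the uniform distribution introduced by
computing the random value between 0 and Dest as (rand()%(Dest+1)): unless RAND_MAX+1 is an
exact multiple of Dest+1, the smaller remainders come up slightly more often.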
for (Dest = N-1; Dest > 0; Dest--)
{
   Src = rand() % (Dest+1);    // Positions from [0] to [Dest]
   Swap (X[Src], X[Dest]);
}
Example 2: C++ implementation of shuffling algorithm that does generate all
permutations with equal probability.
A natural question is "How far from the uniform distribution are the permutations produced by the
$N^N$ algorithm discussed first?" To answer this, I'll develop a method of numbering permutations.
Then all of the permutations generated by the $N^N$ algorithm can be generated and counted. Since,
in a permutation, each position has one fewer choice than the one before it, you can borrow from
programming the notion of a multidimensional array, with dimensions [N] [N-1] [N-2]...[3] [2] [1].
You can then map subscripts for such an N-dimensional array onto a one-dimensional offset. Each
permutation can then be viewed as N successive selections from the elements that remain, each
selection recorded as the subscript ([0], [1], and so on) of the chosen element within the remaining
list. At first there are N elements to choose from, then N-1, then N-2, down to 1. Viewed as
subscripts $i_0, i_1, \ldots, i_{N-1}$ in an N-dimensional array, these choices generate offsets
(permutation indices) between 0 and N!-1, computed just like a row-major array offset:
$\text{index} = \sum_{k=0}^{N-1} i_k \,(N-1-k)!$. A further advantage of this numbering is that, if
the original permutation is in increasing order, it numbers the permutations in lexicographic order;
that is, something like alphabetical or dictionary order. Thus the permutation with all elements in
ascending order (every $i_k = 0$) is numbered 0, while the permutation with all elements in
descending order (every $i_k = N-1-k$) is numbered N!-1, since $\sum_{j=0}^{N-1} j \cdot j! = N! - 1$.
Figure 1 illustrates how, for permutations of eight elements, the $8^8$ (16,777,216) rearrangements
from the first algorithm map onto the $8!$ (40,320) possible permutations. The file mapperms.cpp
(available electronically; see "Resource Center," page 5) implements this process. Table 1 shows
that some individual permutations are markedly more likely than others. While most permutations
receive between 200 and 600 hits (the average is $8^8/8! \approx 416.1$), the pattern is by no means
random. Applying a moving average to these data reveals underlying regularities, again indicating
that the data are far from random. In Figure 2, a moving window of about 0.5 percent of all the data is
applied (201 cells averaged and assigned to the central point). Another way to get a feel for the
distribution is to sort those 40,320 data points; see Figure 3.
Figure 1: Mapping $N^N$ generated permutations onto $N!$ permutation indices.
Table 1: Permutations that are markedly more likely than others.
Figure 2: Mapping $N^N$ generated permutations onto $N!$ permutation indices.
Moving average, 201-cell window.
Figure 3: Number of hits, sorted. Average: $8^8/8! \approx 416.1016$.
Another approach is to investigate, in the mapping of these $N^N$ reorderings onto the $N!$
permutations, the probability (fraction of the total) that the item at each position in the initial string
ends up at each position in the shuffled string. Table 2 shows those probabilities. In it, each column
and each row totals 100 percent. Down the columns, each position contains one of the eight
available characters. Across
the rows, each character shows up on one of the eight positions. The first character in the source
string shows equal probability of ending up in all eight positions of the rearranged string, while all
other characters in the source string show varying probabilities for their positions in the rearranged
string.
Table 2: Probability for each position in the initial string to end up in each
position in the shuffled string.
Table 2 also shows that only the final position in the rearranged string is equally likely to receive
each of the characters of the source string. Figure 4 displays Table 2 with the eight series taken
from the rows.
Figure 4: Distribution of characters. Series labels: character positioned.
Acknowledgment
Thanks to Dr. Ray Hamel of the Eastern Washington University Computer Science Department for
his comments, specifically for the idea of examining the relationships between the initial and final
positions of items in the rearrangements.
DDJ