Technical Disclosure Commons
Defensive Publications Series
March 20, 2017

Sampling Without Replacement: Durstenfeld-Fisher-Yates Permutation for Very Large Permutation Sizes

Martin Harriman
Pure Storage, Inc.

Follow this and additional works at: http://www.tdcommons.org/dpubs_series

Recommended Citation
Harriman, Martin, "Sampling Without Replacement: Durstenfeld-Fisher-Yates Permutation for Very Large Permutation Sizes", Technical Disclosure Commons, (March 20, 2017)
http://www.tdcommons.org/dpubs_series/430

This work is licensed under a Creative Commons Attribution 4.0 License.
This Article is brought to you for free and open access by Technical Disclosure Commons. It has been accepted for inclusion in Defensive Publications Series by an authorized administrator of Technical Disclosure Commons.

PURE STORAGE® DEFENSIVE PUBLICATION
Author: Martin Harriman
Publication Date: March 17, 2017
Published by Technical Disclosure Commons, 2017. Defensive Publications Series, Art. 430 [2017]

Summary

We present a modification of the Durstenfeld-Fisher-Yates random-permutation algorithm for use in sampling without replacement from a large population.

Motivation

We would like to sample without replacement from a potentially large population. The Durstenfeld-Fisher-Yates random permutation algorithm is attractive for small populations, as it generates the desired sample in time and memory proportional to the population size. We modify Durstenfeld to generate a partial permutation (and thus, sample without replacement) in time and memory proportional to the sample size.
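For comparison, the baseline that the Motivation refers to can be sketched as follows. This is a minimal illustration, assuming a forward variant of the shuffle with 0-based indices that stops after k positions; the name dense_sample and its parameters are hypothetical and do not appear in the original publication.

```python
import random


def dense_sample(n, k, rng=random):
    """Draw k distinct values from 0..n-1 with a truncated forward shuffle.

    Illustrative baseline only: the whole vector p is allocated, so
    memory (and setup time) is proportional to the population size n.
    """
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    p = list(range(n))           # p[i] = i initially: O(n) memory
    for i in range(k):           # only the first k positions are finalized
        j = rng.randrange(i, n)  # uniform index with i <= j < n (0-based)
        p[i], p[j] = p[j], p[i]  # swap p[i] and p[j]
    return p[:k]
```

Even when k is small, the call list(range(n)) dominates the cost, which is what makes this form unattractive for very large populations.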
Description

Given a vector p, the Durstenfeld random permutation algorithm repeatedly swaps two elements, call them p[i] and p[j]. In Durstenfeld's algorithm, i iterates sequentially over each element of p, and j is a random index with i <= j <= n, where n is the length of p (indices are 1-based).

Instead of storing the vector p, we use a hash table h to record the entries we modify. An entry of p that has never been modified is taken to hold its own index, p[k] = k, so h records only those elements of p that we have changed. It therefore requires memory proportional to the length of the partial permutation we generate.

When swapping p[i] and p[j], for i != j, look up the index j in h. If j is in h, use the stored value as p[j]; otherwise use j as the value of p[j]. Similarly, if i is in h, use the stored value as p[i]; otherwise use i. Then record the swap: in particular, enter the new value of p[j] (the old value of p[i]) in h under the key j. For Durstenfeld, we will never examine p[i] again, so we can record the new value of p[i] however we please (for example, by emitting it directly as the next element of the sample).

Though h is described as a hash table, any associative structure could be used to implement it. One could equally well implement h as content-addressable memory in hardware, for instance.

References

The following documents were consulted in preparing this document:

Durstenfeld, Richard, “Algorithm 235: Random Permutation,” Communications of the ACM, volume 7, number 7, 1964, p. 420.
Knuth, Donald E., The Art of Computer Programming, Volume 2: Seminumerical Algorithms, third edition, Boston, 1997.

About the Author

Martin Harriman has been happily generating pseudorandom permutations using Durstenfeld’s algorithm since 1973: many thanks to the Stanford Bookstore for displaying Knuth’s The Art of Computer Programming so prominently (and of course, many thanks to Professor Knuth for writing it).
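The hash-table variant set out in the Description can be sketched roughly as follows. This is an illustration under stated assumptions, not the author's implementation: it uses 0-based indices (so j satisfies i <= j < n), a Python dict stands in for h, and the name sparse_sample and its parameters are hypothetical.

```python
import random


def sparse_sample(n, k, rng=random):
    """Draw k distinct values from 0..n-1 without allocating the vector p.

    The dict h holds only the entries of the implicit vector p that
    differ from the default p[x] = x, so it never has more than k entries.
    """
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    h = {}                       # sparse record of modified entries of p
    out = []
    for i in range(k):
        j = rng.randrange(i, n)  # uniform index with i <= j < n (0-based)
        p_i = h.get(i, i)        # current p[i]: stored value, else i
        p_j = h.get(j, j)        # current p[j]: stored value, else j
        if j != i:
            h[j] = p_i           # new p[j] is the old p[i]
        h.pop(i, None)           # p[i] is never examined again
        out.append(p_j)          # new p[i] is the next sample element
    return out
```

A call such as sparse_sample(10**12, 5) does work proportional to the five samples requested rather than to the population of 10**12. Dropping the entry for i after each step is one way of exercising the freedom the text notes for p[i]; it keeps h no larger than the number of samples drawn so far.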
© 2017 Pure Storage, Inc. All rights reserved. Pure Storage and the "P" Logo are registered trademarks of Pure Storage, Inc. in the U.S. and other countries. Any other trademarks are the property of their respective owners.

THE DOCUMENTATION IS PROVIDED "AS IS" AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID. PURE STORAGE SHALL NOT BE LIABLE FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES IN CONNECTION WITH THE FURNISHING, PERFORMANCE, OR USE OF THIS DOCUMENTATION.

Pure Storage, Inc.
650 Castro Street, Suite #260
Mountain View, CA 94041
T: 650-290-6088 F: 650-625-9667
Twitter: @purestorage
http://www.purestorage.com
Sales: [email protected]
Support: [email protected]
Media: [email protected]
General: [email protected]