Presentation

Version 1.1 - September 5, 2013
1
rand() Considered Harmful
Stephan T. Lavavej ("Steh-fin Lah-wah-wade")
Senior Developer - Visual C++ Libraries
[email protected]
2
What's Wrong With This Code?
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main() {
srand(time(NULL));
for (int i = 0; i < 16; ++i) {
printf("%d ", rand() % 100);
}
printf("\n");
}
3
What's Right With This Code?
#include <stdio.h> All required headers are included!
#include <stdlib.h> All included headers are required!
#include <time.h>
Headers are sorted!
int main() {
One True Brace Style!
srand(time(NULL));
for (int i = 0; i < 16; ++i) {
printf("%d ", rand() % 100);
}
%d is correct for int!
printf("\n");
}
Unnecessary argc, argv, return 0; omitted!
4
What's Wrong With This Code?
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main() {
srand(time(NULL));
for (int i = 0; i < 16; ++i) {
printf("%d ", rand() % 100);
}
printf("\n");
}
5
What's Wrong With This Code?
ABOMINATION!
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main() {
srand(time(NULL));
for (int i = 0; i < 16; ++i) {
printf("%d ", rand() % 100);
}
printf("\n");
}
6
What's Wrong With This Code?
ABOMINATION!
#include <stdio.h>
#include <stdlib.h>
Frequency: 1 Hz!
#include <time.h>
int main() {
srand(time(NULL));
for (int i = 0; i < 16; ++i) {
printf("%d ", rand() % 100);
}
printf("\n");
}
7
What's Wrong With This Code?
ABOMINATION!
#include <stdio.h>
#include <stdlib.h>
Frequency: 1 Hz!
#include <time.h>
int main() {
32-bit seed!
srand(time(NULL));
for (int i = 0; i < 16; ++i) {
printf("%d ", rand() % 100);
}
printf("\n");
}
warning C4244:
'argument' :
conversion from
'time_t' to
'unsigned int',
possible loss
of data
8
What's Wrong With This Code?
ABOMINATION!
warning C4244:
#include <stdio.h>
'argument' :
#include <stdlib.h>
Frequency: 1 Hz! conversion from
#include <time.h>
'time_t' to
int main() {
'unsigned int',
32-bit seed!
possible loss
srand(time(NULL));
of data
for (int i = 0; i < 16; ++i) {
printf("%d ", rand() % 100);
}
printf("\n");
}
Range: [0, 32767]
Linear congruential  low quality!
9
What's Wrong With This Code?
ABOMINATION!
warning C4244:
#include <stdio.h>
'argument' :
#include <stdlib.h>
Frequency: 1 Hz! conversion from
#include <time.h>
'time_t' to
int main() {
'unsigned int',
32-bit seed!
possible loss
srand(time(NULL));
of data
for (int i = 0; i < 16; ++i) {
printf("%d ", rand() % 100);
}
printf("\n");
Non-uniform distribution!
}
Range: [0, 32767]
Linear congruential  low quality!
10
Modulo  Non-Uniform Distribution
int src = rand(); // Assume uniform [0, 32767]
int dst = src % 100; // Non-uniform [0, 99]
//
[0, 99] src  [0, 99] dst
//
[100, 199] src  [0, 99] dst
// ...
// [32700, 32767] src  [0, 67] dst
• This is modulo's fault, not rand()'s
• Trigger: input range isn't exact multiple of output range
11
Floating-Point Treachery
int src = rand(); // Assume uniform [0, 32767]
int dst = static_cast<int>(
// As seen on
(src * 1.0 / RAND_MAX) * 99 // StackOverflow
); // Hilariously non-uniform [0, 99]
• Only one input produces the output 99:
static_cast<int>((32765 * 1.0 / 32767) * 99) == 98
static_cast<int>((32766 * 1.0 / 32767) * 99) == 98
static_cast<int>((32767 * 1.0 / 32767) * 99) == 99
12
Floating-Point Double Treachery
int src = rand(); // Assume uniform [0, 32767]
int dst = static_cast<int>(
(src * 1.0 / (RAND_MAX + 1)) * 100
); // Subtly non-uniform [0, 99]
• Less likely outputs (327/32768 vs. 328/32768):
3, 6, 9, 12, 15, 18, 21, 24, 28, 31, 34, 37, 40, 43,
46, 49, 53, 56, 59, 62, 65, 68, 71, 74, 78, 81, 84,
87, 90, 93, 96, 99
• Same problem as src % 100
• Nothing can uniformly map 32768 inputs to 100 outputs
13
Floating-Point Triple Treachery
• What if the input is [0, 232) or [0, 264)?
• Non-uniformity is reduced, but not eliminated, when the
input is much larger than the output
• What if IEEE runs out of bits?
• Example: [0, 264) input  [0, 1018 ≈ 259.8) output
• double has only 53 bits of significand precision
• Say you have a problem, so you use floating-point
• Now you have 2.000001 problems
• DO NOT MESS WITH FLOATING-POINT
14
<random> URNGs
(Uniform Random Number Generators)
• Engine templates:
• Engine (adaptor) typedefs:
• linear_congruential_engine
• mersenne_twister_engine
• subtract_with_carry_engine
• Engine adaptor templates:
• discard_block_engine
• independent_bits_engine
• shuffle_order_engine
• Non-deterministic:
• random_device
•
•
•
•
•
•
•
•
•
•
minstd_rand0
minstd_rand
mt19937
mt19937_64
ranlux24_base
ranlux48_base
ranlux24
ranlux48
knuth_b
default_random_engine
15
<random> Distributions
• Uniform distributions
• uniform_int_distribution
• uniform_real_distribution
• Poisson distributions
• poisson_distribution
• exponential_distribution
• gamma_distribution
• weibull_distribution
• extreme_value_distribution
• Sampling distributions
• discrete_distribution
• piecewise_constant_distribution
• piecewise_linear_distribution
• Bernoulli distributions
• bernoulli_distribution
• binomial_distribution
• geometric_distribution
• negative_binomial_distribution
• Normal distributions
• normal_distribution
• lognormal_distribution
• chi_squared_distribution
• cauchy_distribution
• fisher_f_distribution
• student_t_distribution
16
Hello, "Random" World!
#include <iostream>
#include <random>
int main() {
std::mt19937 mt(1729);
std::uniform_int_distribution<int> dist(0, 99);
for (int i = 0; i < 16; ++i) {
std::cout << dist(mt) << " ";
}
std::cout << std::endl;
}
17
Hello, "Random" World!
#include <iostream>
#include <random>
Deterministic 32-bit seed
int main() {
std::mt19937 mt(1729);
std::uniform_int_distribution<int> dist(0, 99);
for (int i = 0; i < 16; ++i) {
std::cout << dist(mt) << " ";
}
std::cout << std::endl;
}
18
Hello, "Random" World!
#include <iostream>
Engine: [0, 232)
#include <random>
Deterministic 32-bit seed
int main() {
std::mt19937 mt(1729);
std::uniform_int_distribution<int> dist(0, 99);
for (int i = 0; i < 16; ++i) {
std::cout << dist(mt) << " ";
}
std::cout << std::endl;
}
19
Hello, "Random" World!
#include <iostream>
Engine: [0, 232)
#include <random>
Deterministic 32-bit seed
int main() {
std::mt19937 mt(1729);
std::uniform_int_distribution<int> dist(0, 99);
for (int i = 0; i < 16; ++i) {
std::cout << dist(mt) << " ";
}
Distribution: [0, 99]
std::cout << std::endl;
}
20
Hello, "Random" World!
#include <iostream>
Engine: [0, 232)
#include <random>
Deterministic 32-bit seed
int main() {
std::mt19937 mt(1729);
std::uniform_int_distribution<int> dist(0, 99);
for (int i = 0; i < 16; ++i) {
std::cout << dist(mt) << " ";
}
Distribution: [0, 99]
std::cout << std::endl;
}
Note: [inclusive, inclusive]
21
Hello, "Random" World!
#include <iostream>
Engine: [0, 232)
#include <random>
Deterministic 32-bit seed
int main() {
std::mt19937 mt(1729);
std::uniform_int_distribution<int> dist(0, 99);
for (int i = 0; i < 16; ++i) {
std::cout << dist(mt) << " ";
}
Distribution: [0, 99]
std::cout << std::endl;
Run engine,
}
viewed through distribution
Note: [inclusive, inclusive]
22
Hello, Random World!
#include <iostream>
#include <random>
int main() {
std::random_device rd;
std::mt19937 mt(rd()); Non-deterministic 32-bit seed
std::uniform_int_distribution<int> dist(0, 99);
for (int i = 0; i < 16; ++i) {
std::cout << dist(mt) << " ";
}
std::cout << std::endl;
}
23
mt19937 vs. random_device
• mt19937 is:
• Fast (499 MB/s = 6.5 cycles/byte for me)
• Extremely high quality, but not cryptographically secure
• Seedable (with more than 32 bits if you want)
• Reproducible (Standard-mandated algorithm)
• random_device is:
• Possibly slow (1.93 MB/s = 1683 cycles/byte for me)
• Strongly platform-dependent (GCC 4.8 can use IVB RDRAND)
• Possibly crypto-secure (check documentation, true for VC)
• Non-seedable, non-reproducible
24
uniform_int_distribution
• Takes any Uniform Random Number Generator
• Usually [0, 232) or [0, 264) but [1701, 1729] works
• If your URNG does that, you are bad and you should feel bad
• Emits any desired range of integers [low, high]
• signed/unsigned short/int/long/long long
• Why not char/signed char/unsigned char? Standard Says SoTM
• Preserves perfect uniformity
• Requires obsessive implementers
• Uses bitwise/etc. magic, invokes URNG repeatedly (rare)
• Runs fairly quickly (34% raw speed for me)
• Deterministic, but not invariant
• Will vary across platforms, may vary across versions
25
random_shuffle() Considered Harmful
template <typename RanIt>
void random_shuffle(RanIt f, RanIt l);
• May call rand()
• C++ Standard Library, I trusted you!
template <typename RanIt, typename RNG>
void random_shuffle(RanIt f, RanIt l, RNG&& r);
• Not evil, but highly inconvenient
• Knuth shuffle needs r(n) to return [0, n)
26
shuffle() Considered Awesome
template <typename RanIt, typename URNG>
void shuffle(RanIt f, RanIt l, URNG&& g);
• Takes URNGs directly (e.g. mt19937)
• Shuffles perfectly
• All permutations are equally likely
• Invokes the URNG in-place (can't copy)
• Other algorithms can copy functors, like generate()
• Special exception: for_each() moves functors
27
Random <random> Notes
• Running mt19937 is fast, constructing/copying isn't
• Constructing/copying engines often is already undesirable
• URNG/distribution function call ops are non-const
• Multiple threads cannot simultaneously call a single object
• When is it safe to skip uniform_int_distribution?
• mt19937's [0, 232) or mt19937_64's [0, 264)  [0, 2N)
• In this case, masking is safe, simple, and efficient
• In all other cases, use uniform_int_distribution