Random number generators - Studentportalen

Introduction
Random number generators
Drawing from a given distribution
Typical uses
Statistical Methods in Physics: Monte Carlo I
David Boersma (Uppsala Universitet)
December 2013, Uppsala
Outline
• Introduction
  • Overview
  • Classic example
  • Virtual experiment
• Random number generators
  • Randomness
  • Properties of pseudo-random number generators
  • Examples of PRNG and QRNG algorithms
  • Seeding
  • RNG software
• Drawing from a given distribution
  • Acceptance-rejection method
  • Transformation method
  • Combined method
• Typical uses
Monte Carlo techniques in physics
• Simulate a complete experiment
  • Use random number generators to assign values to experimental parameters
  • Calculate the observable (“measured”) values
  • Repeat this many times
  • Study distributions
    • Detector design: compare performances of different designs
    • Develop and test analysis procedure without “burning” the real data
    • Estimate (dominant contributions to) resolution of output variables
    • Compare distributions: simulated versus “real” data (improve understanding of detection process)
    • Hypothesis testing: are measured values compatible with a null hypothesis?
• Simulate parts of an experiment
• “Toy Monte Carlo”
• “Evaluate a multi-dimensional integral”
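As an illustration of the hypothesis-testing use, here is a minimal toy Monte Carlo sketch in Python. The numbers are invented for the example (15 observed counts, an expected background of 8), and the function names `poisson_sample`, `toy_p_value` and `exact_p_value` are hypothetical, not taken from the lecture. It estimates how often a background-only experiment would fluctuate up to the observed count, and cross-checks the result against the exact Poisson tail.

```python
import math
import random

def poisson_sample(rng, mean):
    """Draw one Poisson count (Knuth's multiplication method, fine for small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def toy_p_value(n_observed, background_mean, n_toys, seed):
    """Fraction of background-only toy experiments with at least n_observed counts."""
    rng = random.Random(seed)
    n_extreme = sum(
        poisson_sample(rng, background_mean) >= n_observed for _ in range(n_toys)
    )
    return n_extreme / n_toys

def exact_p_value(n_observed, background_mean):
    """Exact Poisson tail probability P(N >= n_observed), to cross-check the toys."""
    below = sum(
        math.exp(-background_mean) * background_mean**k / math.factorial(k)
        for k in range(n_observed)
    )
    return 1.0 - below

# Assumed example values: 15 observed events, 8 expected from background.
p_toy = toy_p_value(n_observed=15, background_mean=8.0, n_toys=20000, seed=2013)
print(f"toy p-value = {p_toy:.4f}, exact = {exact_p_value(15, 8.0):.4f}")
```

In a realistic analysis the exact tail is usually not available, which is the reason for throwing toys in the first place.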
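Classic example: “measuring the area of a disk”
100 points in d = 2 dimensions: grid or random
[Figure: two panels, each showing a disk of radius r inside a square of side 2r, sampled with 100 points: one on a regular grid, one at random.]
$N_{in}/N_{total} \approx \pi r^2/(2r)^2$, so $\pi \approx 4 N_{in}/N_{total}$
• one panel: $\pi \approx 4 \times 80/100 = 3.2$
• other panel: $\pi \approx 4 \times 83/100 = 3.32$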
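To judge whether the two estimates above are reasonable, it helps to work out the expected statistical spread. This is a sketch that applies to the random sample only, assuming each point falls inside the disk independently with probability p = π/4, so that N_in is binomially distributed. The symbols $\hat{\pi}$ and $\sigma_{\hat{\pi}}$ are notation introduced here, not from the slides.

```latex
\hat{\pi} = \frac{4 N_{in}}{N_{total}}, \qquad
\sigma_{\hat{\pi}} = 4 \sqrt{\frac{p\,(1-p)}{N_{total}}}
\quad \text{with} \quad p = \frac{\pi}{4} \approx 0.785

% for N_{total} = 100:
\sigma_{\hat{\pi}} \approx 4 \sqrt{\frac{0.785 \times 0.215}{100}} \approx 0.16
```

With 100 points, deviations from π of order 0.1 to 0.2 are therefore expected. The grid estimate is deterministic, so this binomial argument does not apply to it.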
4 / 20
Introduction
Random number generators
Drawing from a given distribution
Typical uses
Generalized: “measuring the volume of a d-dimensional ball”
(or: measuring π)
Recipe:
• Generate $N_{total}$ points “uniformly” distributed within the d-dimensional unit cube, $x_i \in [0, 1]^d$, for some d ≥ 2.
• Determine $N_{inside}$, the number of those points that are within the unit ball ($|x_i| \le 1$).
• The estimated volume of the (whole) unit ball is now $V_d = 2^d N_{inside}/N_{total}$.
• On the other hand $V_{2k} = \pi^k/k!$ and $V_{2k+1} = 2(2\pi)^k/(2k+1)!!$, so the estimate for $V_d$ is equivalent to an estimate for π.
Question: What is the best strategy for “uniformly distributing” the points $x_i$? Regular grid, or random?
Answer 1: The books say (or: “It is easy to show”) that “at higher dimensions” the random method is more efficient. Specifically: the estimated error for a d-fold integration using $N_{total}$ points scales as $1/\sqrt{N_{total}}$ and $N_{total}^{-2/d}$ for random and grid methods, respectively.
Answer 2: Well, let’s try!
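One way to try it is sketched below, using only the Python standard library. The function names, the choice of cell midpoints for the grid, and the point counts are assumptions made for this example. Both estimators follow the recipe above: points in the unit cube, count those inside the unit ball, multiply the fraction by $2^d$.

```python
import itertools
import math
import random

def ball_volume_exact(d):
    """Exact volume of the d-dimensional unit ball, pi^(d/2) / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

def ball_volume_random(d, n_total, seed):
    """Estimate with n_total pseudo-random points in the unit cube [0, 1]^d."""
    rng = random.Random(seed)
    n_inside = sum(
        sum(rng.random() ** 2 for _ in range(d)) <= 1.0 for _ in range(n_total)
    )
    return 2**d * n_inside / n_total

def ball_volume_grid(d, points_per_axis):
    """Estimate with a regular grid of cell midpoints (points_per_axis**d points)."""
    axis = [(i + 0.5) / points_per_axis for i in range(points_per_axis)]
    n_inside = sum(
        sum(x * x for x in point) <= 1.0
        for point in itertools.product(axis, repeat=d)
    )
    return 2**d * n_inside / points_per_axis**d

# Same total number of points (4096) in every case: 64^2 = 8^4 = 4^6.
for d, m in [(2, 64), (4, 8), (6, 4)]:
    exact = ball_volume_exact(d)
    grid = ball_volume_grid(d, m)
    rand = ball_volume_random(d, m**d, seed=1)
    print(f"d={d}: exact={exact:.4f}  grid={grid:.4f}  random={rand:.4f}")
```

A single seed gives only one random estimate per dimension. To compare the methods fairly, repeat the random estimate with many seeds and look at the spread.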
Regular grids versus random samplings
Monte Carlo techniques in physics: virtual experiment
• Initial state
  • Accelerator: beam profile, momentum distribution, timing profile, target material
  • Cosmic rays: spectrum, angular distribution, atmospheric conditions
  • Nuclear fuel: chemical/isotopic composition, spatial distribution/density
• Interactions, evolution
  • Accelerator: collision process, particles propagate through detector
  • Cosmic rays: interact with molecules in atmosphere, shower development
  • Nuclear fuel: radiation processes, heat transfer, material response
• Detection
  • Accelerator: particle energy losses, transformation to electrical signals, amplification, digitization, processing
  • Cosmic rays: similar
  • Nuclear fuel: radiation and temperature readouts, flow rates, pressures
• Data Analysis
  • Accelerator: distributions of $E_m$, $p_T$, ...
  • Cosmic rays: similar
  • Nuclear fuel: radiation and temperature readouts, flow rates, pressures
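The chain above (initial state, detection, data analysis) can be sketched as a deliberately tiny virtual experiment. Everything specific here is an assumption for illustration: a fixed true energy of 10 (arbitrary units), a Gaussian detector resolution of 1, a detection efficiency of 90%, and the hypothetical function name `simulate_event`.

```python
import random
import statistics

def simulate_event(rng, true_energy=10.0, resolution=1.0, efficiency=0.9):
    """One pass through the chain; returns the measured energy, or None if undetected."""
    # Initial state / interaction: here simply a fixed true energy.
    energy = true_energy
    # Detection: the particle is only registered with some probability ...
    if rng.random() > efficiency:
        return None
    # ... and the measured energy is smeared by the detector resolution.
    return rng.gauss(energy, resolution)

rng = random.Random(7)
n_events = 20000
measured = [e for e in (simulate_event(rng) for _ in range(n_events)) if e is not None]

# Data analysis: study the distribution of the "measured" values.
print(f"detected fraction  : {len(measured) / n_events:.3f}")
print(f"mean measured E    : {statistics.mean(measured):.3f}")
print(f"spread (resolution): {statistics.stdev(measured):.3f}")
```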
Randomness
• True randomness
  • Non-deterministic: non-repeating and unpredictable
  • Used for cryptography
  • Not reproducible
  • Needs source of entropy
    • e.g. radioactive source, cosmic rays, electronic noise, one-time-pad, http://www.random.org, /dev/urandom
• Pseudo-randomness
  • Deterministic imitation of true randomness
  • Used for simulations
  • Reproducibility: seed value completely determines sequence
• Quasi-randomness
  • No imitation of true randomness
  • A.k.a. “low discrepancy random numbers”
  • Deterministic, efficient coverage of n-dimensional phase space
  • Used for computing integrals
  • Reproducibility: fixed sequence, seed value completely determines sequence
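The reproducibility difference can be seen in a few lines of Python. This is a sketch using the standard library's `random.Random` as the seeded pseudo-random generator and `os.urandom` as the unseeded operating-system source; the two wrapper function names are made up for the example.

```python
import os
import random

def pseudo_random_sequence(seed, n=5):
    """Pseudo-random: the seed completely determines the sequence."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

def os_entropy_sequence(n=5):
    """Bytes from the operating system's random source: no seed, not reproducible."""
    return list(os.urandom(n))

# Same seed: identical sequences. Different seed: a different sequence.
print(pseudo_random_sequence(12345) == pseudo_random_sequence(12345))
print(pseudo_random_sequence(12345) == pseudo_random_sequence(54321))
# Two calls to the OS source give unrelated bytes every time the script runs.
print(os_entropy_sequence(), os_entropy_sequence())
```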
Properties of pseudo-random number generators
• Speed
  • You don’t want a massive simulation slowed down by the RNG
• Reproducible
  • Debugging (e.g. investigating rare crashes)
  • Regeneration (in case of data loss)
• Long repetition period
  • After one repetition period the RNG is worthless
• Entropy (does it look random?)
  • The sequence 0, 1, 2, 3, ... is fast to generate and never repeats, but is not so useful as an RNG.
  • Humans are not very good at judging randomness
  • Correlations between successive values should be minimal
  • n-tuples should be uniform
  • Projections of n-tuples along any axis should result in uniform distributions (e.g.: no hyperplanes)
  • ...
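The point about correlations between successive values can be checked numerically. The sketch below compares the counter sequence 0, 1, 2, 3, ... (scaled to [0, 1)) with Python's built-in generator; the helper `lag1_correlation` is a name introduced for this example. It is one simple diagnostic, not a full test suite for randomness.

```python
import random

def lag1_correlation(values):
    """Pearson correlation between each value and its successor."""
    x, y = values[:-1], values[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

n = 10000
counter = [i / n for i in range(n)]       # the "0, 1, 2, 3, ..." generator, scaled
rng = random.Random(1)
prng = [rng.random() for _ in range(n)]   # Python's built-in pseudo-random generator

print(f"counter: lag-1 correlation = {lag1_correlation(counter):+.4f}")
print(f"PRNG   : lag-1 correlation = {lag1_correlation(prng):+.4f}")
```

The counter is perfectly uniform in one dimension, yet each value predicts the next one exactly, which is why uniformity alone is not enough.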
http://dilbert.com/strips/comic/2001-10-25/
Bo Allen’s analysis of PHP rand() on Windows
[Figure: two pictures, one labelled “GOOD (MAYBE)” and one labelled “BAD”.]
Pictures borrowed from http://www.boallen.com/random-numbers.html, via http://www.random.org/analysis/
Examples of PRNG and QRNG algorithms
• PRNG:
  • Linear congruential algorithm (LCA): $r_{i+1} = (a \times r_i + b) \bmod m$
    • Poor man’s random number generator
    • State vector length: 1
    • a and b must be carefully chosen
    • Repetition period: at most m
    • Sequence of points in n-dimensional space will form hyperplanes
  • Mersenne Twister
    • Better than LCA
    • State vector length: 624
    • Repetition period is a Mersenne prime, e.g. $2^{19937} - 1 \approx 4.3 \times 10^{6001}$ (most common, MT19937)
    • Good uniformity of the sequence of points in n-dimensional space (for n < 624)
    • When using on GPUs or in multithreaded situations: read Saito & Matsumoto (arXiv:1005.4973)
  • RANLUX
    • Better than LCA, comparable to Mersenne Twister
    • State vector length: 24
    • Repetition period $\sim 10^{171}$
    • Martin Lüscher and Fred James, e.g. arXiv:hep-lat/9309020
  • Many others...
• QRNG
  • Sobol sequence
    • I.M. Sobol, 1967
    • Available in MATLAB and GSL; at the time of this lecture (2013) not in ROOT or scipy/numpy, but implementations exist, e.g. http://people.sc.fsu.edu/~jburkardt/py_src/sobol/sobol.html, which might make it into scipy in the future.
    • Works well for n-tuples with n <= 40; for higher dimensions check the extensions/improvements by Stephen Joe and Frances Kuo: http://web.maths.unsw.edu.au/~fkuo/sobol/.
  • Others
    • Niederreiter, Halton, Faure
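The linear congruential recurrence and its hyperplane problem can be demonstrated in a few lines. The sketch below uses the parameters of RANDU (a = 65539, b = 0, m = 2^31), a historically notorious choice that is not mentioned in the slides and is picked here only because its defect can be verified exactly. The generator function `lcg` is a name introduced for this example.

```python
def lcg(seed, a, b, m):
    """Linear congruential generator: yields r_{i+1} = (a * r_i + b) mod m."""
    r = seed
    while True:
        r = (a * r + b) % m
        yield r

# RANDU parameters: a badly chosen LCG, used here purely as an illustration.
A, B, M = 65539, 0, 2**31

gen = lcg(seed=1, a=A, b=B, m=M)
r = [next(gen) for _ in range(1000)]

# Because a = 2**16 + 3, one finds a*a = 6*a - 9 (mod 2**31), so every triple of
# successive values satisfies r[i+2] - 6*r[i+1] + 9*r[i] = 0 (mod m).
# In 3 dimensions the points (r[i], r[i+1], r[i+2]) therefore lie on parallel planes.
on_planes = all(
    (r[i + 2] - 6 * r[i + 1] + 9 * r[i]) % M == 0 for i in range(len(r) - 2)
)
print("all consecutive triples on planes:", on_planes)
```

The same structure exists for every LCG; a careful choice of a, b and m only makes the planes more numerous and more closely spaced.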
Sobol example in 2 dimensions
[Figure: two panels of 256 points each, pseudo-random versus quasi-random (Sobol). Color code: red = 1,..,10, blue = 11,..,100, green = 101,..,256.]
Picture by Jheald [CC-BY-SA-3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons
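A Sobol generator needs tabulated direction numbers, so the sketch below swaps in the simpler Halton sequence (listed under “Others” among the QRNG algorithms) to show the same low-discrepancy idea in pure Python. The function names are introduced for this example. Each coordinate is a van der Corput radical inverse in a different prime base.

```python
def radical_inverse(index, base):
    """Van der Corput radical inverse: mirror the base-`base` digits of index
    around the radix point, e.g. 6 = 110 (base 2) -> 0.011 (base 2) = 0.375."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result

def halton_2d(n_points):
    """First n_points of the 2-dimensional Halton sequence (bases 2 and 3)."""
    return [
        (radical_inverse(i, 2), radical_inverse(i, 3))
        for i in range(1, n_points + 1)
    ]

points = halton_2d(16)
# The first 16 x-coordinates land in 16 different bins of width 1/16:
# the sequence fills the interval evenly instead of clustering.
bins = sorted(int(x * 16) for x, _ in points)
print(bins)
```

A pseudo-random sample of 16 points would typically leave several of those bins empty and put two or more points in others.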
Seeding
Common strategies (not necessarily good): base seed on
• pre-generated set of seed values
• date and time
• dataset number, file number
• process id (of the running instance of the simulation program)
• environmental input (CPU temperature, number of running processes on the machine, number of free bytes on disk or in RAM, weather data)
• combination of several of the above
Common mistakes:
• reusing the same seed for simulation datasets that are intended to be independent (ruins statistical analyses)
• forgetting to store the seed value with the data set (makes the simulation irreproducible)
• generating the seeds for an RNG with that same RNG
SPRNG (Scalable Parallel Random Number Generators Library): random number generation library designed for large productions, available for Fortran, C and C++.
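A minimal sketch of how two of the common mistakes can be avoided: every dataset gets its own seed, and the seed is stored with the data. The scheme shown, hashing a production name and dataset number into a 64-bit seed, is a hypothetical illustration and not a recommendation from the lecture. The names `derive_seed`, `simulate_dataset` and the production label `"toy-2013"` are invented.

```python
import hashlib
import json
import random

def derive_seed(production_id, dataset_number):
    """Hypothetical scheme: hash (production, dataset) into a 64-bit seed."""
    key = f"{production_id}:{dataset_number}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")

def simulate_dataset(production_id, dataset_number, n_events=5):
    """Generate a toy dataset and keep the seed together with the data."""
    seed = derive_seed(production_id, dataset_number)
    rng = random.Random(seed)
    return {"seed": seed, "events": [rng.random() for _ in range(n_events)]}

first = simulate_dataset("toy-2013", 0)
second = simulate_dataset("toy-2013", 1)
print(json.dumps(first))

# Reproduce the first dataset from the stored seed alone.
rng = random.Random(first["seed"])
regenerated = [rng.random() for _ in range(len(first["events"]))]
print(regenerated == first["events"], first["seed"] != second["seed"])
```

Distinct seeds do not by themselves guarantee statistically independent streams. That stronger guarantee is what dedicated libraries such as SPRNG are designed to provide.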
RNG software
Some commonly used implementations of RNGs (incomplete list):
• ROOT:
  • TRandom: repetition period $\sim 10^9$; do not use
  • TRandom2: repetition period $\sim 10^{26}$; do not use
  • TRandom3: “Mersenne Twistor” [sic]. Use the special seed value 0 if the result should not be reproducible.
• CLHEP (used by ATLAS): RANLUX
• GSL: large number of algorithms (many of them legacy), including all PRNG and QRNG listed previously.
• Scipy/numpy: Mersenne Twister
• C standard library (stdlib.h in GLIBC):
  • drand48(), lrand48(), etc. Set seed with srand48(s). Use only when you can afford to be lazy. Linear congruential algorithm with 48-bit integer arithmetic: period is $2^{48}$ ($\approx 2.8 \cdot 10^{14}$), but the randomness is still mediocre.
  • rand() and random(): use a linear additive feedback method, a variation on LCA. Use only when you can afford to be lazy. (The random() man page says that it’s a non-linear additive feedback method; that’s a lie!)
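Python's standard `random` module is not in the list above, but it is also a Mersenne Twister (MT19937), which makes the 624-word state easy to inspect. A short sketch follows; the layout of the state tuple is a CPython implementation detail.

```python
import random

rng = random.Random(2013)
version, internal_state, gauss_next = rng.getstate()

# CPython stores the 624 Mersenne Twister words plus one position index.
print("state words:", len(internal_state) - 1)

# Saving and restoring the state replays the sequence exactly.
saved = rng.getstate()
first_pass = [rng.random() for _ in range(3)]
rng.setstate(saved)
second_pass = [rng.random() for _ in range(3)]
print(first_pass == second_pass)
```

Storing the full state, rather than only the seed, allows a long simulation to be resumed from a checkpoint.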
Drawing from a given distribution: rejection method
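The slide's figure is not available in this text, so here is a minimal sketch of the acceptance-rejection idea under assumptions made for this example: a target density f(x) = sin(x)/2 on [0, π], a rectangular box of height 0.5 around it, and the hypothetical function names `sample_rejection` and `target_pdf`. Points are thrown uniformly in the box and kept only if they fall under the curve.

```python
import math
import random

def sample_rejection(pdf, x_min, x_max, pdf_max, n_samples, rng):
    """Acceptance-rejection: throw points uniformly in a box, keep those under the curve."""
    accepted, n_tries = [], 0
    while len(accepted) < n_samples:
        n_tries += 1
        x = rng.uniform(x_min, x_max)
        y = rng.uniform(0.0, pdf_max)
        if y < pdf(x):
            accepted.append(x)
    return accepted, n_samples / n_tries

def target_pdf(x):
    """Example target: f(x) = sin(x) / 2 on [0, pi], normalized to 1."""
    return 0.5 * math.sin(x)

rng = random.Random(3)
samples, efficiency = sample_rejection(target_pdf, 0.0, math.pi, 0.5, 20000, rng)
print(f"efficiency  = {efficiency:.3f} (area ratio 2/pi = {2 / math.pi:.3f})")
print(f"sample mean = {sum(samples) / len(samples):.3f} (symmetry gives pi/2)")
```

The efficiency equals the area under the curve divided by the area of the box, so a tight box (or a better-fitting envelope) wastes fewer random numbers.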
Drawing from a given distribution: transformation method
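A minimal sketch of the transformation (inverse-CDF) method, assuming an exponential decay-time distribution as the example target; the function name and the value of `tau` are arbitrary choices. For f(t) = exp(-t/τ)/τ the cumulative distribution is F(t) = 1 - exp(-t/τ), and solving u = F(t) for t turns a uniform random number u into a draw from f.

```python
import math
import random

def sample_exponential(tau, rng):
    """Inverse-CDF draw from f(t) = exp(-t/tau) / tau, using F(t) = 1 - exp(-t/tau)."""
    u = rng.random()                 # uniform in [0, 1)
    return -tau * math.log(1.0 - u)  # 1 - u is in (0, 1], so the log is finite

rng = random.Random(4)
tau = 2.2
samples = [sample_exponential(tau, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
print(f"sample mean = {mean:.3f} (expected tau = {tau})")
```

Every random number is used, so the method is fully efficient, but it requires a CDF that can be inverted analytically or numerically.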
Drawing from a given distribution: combined method
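The slide content is not available here, so the sketch below assumes that “combined” means the usual combination: draw from an invertible envelope g(x) with the transformation method, then apply acceptance-rejection with probability f(x)/g(x). The target f(x) ∝ exp(-x) sin²(x) on [0, ∞) and the function name `sample_combined` are choices made for this example.

```python
import math
import random

def sample_combined(n_samples, rng):
    """Draw from f(x) proportional to exp(-x) * sin(x)**2 on [0, inf)."""
    accepted, n_tries = [], 0
    while len(accepted) < n_samples:
        n_tries += 1
        # Transformation step: draw x from the envelope g(x) = exp(-x).
        x = -math.log(1.0 - rng.random())
        # Rejection step: accept with probability f(x) / g(x) = sin(x)**2 <= 1.
        if rng.random() < math.sin(x) ** 2:
            accepted.append(x)
    return accepted, n_samples / n_tries

rng = random.Random(5)
samples, efficiency = sample_combined(10000, rng)
print(f"efficiency  = {efficiency:.3f} (analytic value 2/5)")
print(f"sample mean = {sum(samples) / len(samples):.3f} (analytic value 7/5)")
```

A flat box cannot enclose a distribution on an infinite range, so an envelope that already follows the falling tail is what makes rejection usable here.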
Importance sampling (weights)
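A minimal sketch of importance sampling with weights, under assumptions chosen for this example: the goal is the tail probability P(X > 4) for an exponential with mean 1, and the samples are drawn from a wider exponential with mean 4 so that more of them land in the tail. Each sample carries the weight w(x) = f(x)/g(x), the ratio of target to sampling density. The function name is hypothetical.

```python
import math
import random

def tail_probability_weighted(threshold, proposal_mean, n_samples, rng):
    """Estimate P(X > threshold) for X ~ Exp(1) using samples from a wider exponential."""
    total = 0.0
    for _ in range(n_samples):
        # Draw from the sampling density g(x) = exp(-x / proposal_mean) / proposal_mean.
        x = -proposal_mean * math.log(1.0 - rng.random())
        if x > threshold:
            # Weight = target density / sampling density, w(x) = f(x) / g(x).
            total += math.exp(-x) / (math.exp(-x / proposal_mean) / proposal_mean)
    return total / n_samples

rng = random.Random(6)
estimate = tail_probability_weighted(
    threshold=4.0, proposal_mean=4.0, n_samples=20000, rng=rng
)
print(f"weighted estimate = {estimate:.5f}, exact exp(-4) = {math.exp(-4.0):.5f}")
```

Without weights, only about 2% of unweighted Exp(1) samples would exceed the threshold; with the wider sampling density roughly a third do, which reduces the statistical error for the same number of samples.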
Typical uses of MC simulations
• Design experiments
• Testing data analysis software
• Contamination estimates
• Geometrical correction factors
• Do theory and experiment agree?
• Parameter determination