Making Sense of Randomness: Probability Distributions

Many things that we observe in nature or in human affairs seem
to be random.
How can we understand things that occur randomly?
We look for patterns in how frequently we observe certain values
(or values within a specific interval).
This leads to a probability distribution.
Probability Distributions
Various kinds of random events follow different probability distributions.
For example, the time between eruptions of Old Faithful geyser
in Yellowstone National Park varies from just a few minutes to
a couple of hours.
It is the subject of continuous monitoring by the National Park
Service.
During 2011, the times between eruptions were grouped into
12-second intervals, the number of times falling in each interval
was counted, and the counts were converted to percentages.
Here is a histogram of the percentages:
Data collected by Ralph Taylor.
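The counting step behind such a histogram can be sketched in Python (the lecture itself uses Matlab). This is a minimal sketch, assuming the times between eruptions are available as a list of durations in seconds; the function name percent_by_bin and the sample values are hypothetical and are not the 2011 data.

```python
from collections import Counter

def percent_by_bin(durations_sec, bin_width=12):
    """Group durations into bins of bin_width seconds and return the
    percentage of durations falling in each bin, keyed by the bin's
    lower edge (in seconds)."""
    counts = Counter((d // bin_width) * bin_width for d in durations_sec)
    total = len(durations_sec)
    return {edge: 100 * c / total for edge, c in sorted(counts.items())}

# Hypothetical times between eruptions, in seconds (illustration only).
times = [5400, 5410, 5395, 3600, 5460, 5403, 5390, 3612]
result = percent_by_bin(times)
for edge, pct in result.items():
    print(f"{edge:5d} to {edge + 12:5d} s: {pct:5.1f}%")
```

The percentages always add up to 100, which is what makes histograms built from different amounts of data comparable.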
Other Probability Distributions
Other kinds of random events follow different probability distributions.
Some probability distributions are similar to each other, and we
may even give them a common name, for example the binomial
distribution, which we discussed last time.
We flip a coin 10 times, and count how many heads we get,
somewhere between 0 and 10.
The binomial distribution is actually a family of distributions
that depend on the number of “trials” and on the probability of a
“success” on each trial.
If, instead of 10 trials, we flip the coin 20 times, we would get a
similar pattern of numbers between 0 and 20.
The probability of a success on each trial is still 1/2.
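To see the two family members side by side, here is a small Python sketch of the binomial probability function (the lecture works in Matlab; the name binomial_pmf is only for illustration). It evaluates the probabilities for 10 and for 20 coin flips with success probability 1/2.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Coin flips: p = 1/2, with 10 trials and with 20 trials.
for n in (10, 20):
    probs = [binomial_pmf(k, n, 0.5) for k in range(n + 1)]
    most_likely = max(range(n + 1), key=lambda k: probs[k])
    print(f"n={n}: most likely number of heads = {most_likely}, "
          f"P = {probs[most_likely]:.4f}, total = {sum(probs):.4f}")
```

Both distributions peak in the middle (5 heads out of 10, 10 out of 20), and in each case the probabilities add up to 1.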
The Binomial Probability Distributions
What about if we roll a die several times and count how many
times we get a 3?
The probability of a success on each trial is now 1/6.
If we roll the die 10 times and count the 3’s, then roll it 10 times
again, then again over and over, each time counting the number
of 3’s, we might get a histogram that looks like this:
[Figure: histogram titled “Number of Times a "Three" Appeared in 10 Trials Repeated 1,000 Times”; x-axis: number of times a “three” appeared in ten trials (0 to 6); y-axis: frequency (0 to 300).]
The experiment was run 1,000 times.
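The experiment itself is easy to simulate. The sketch below is a Python stand-in for the Matlab simulation (the function name count_threes and the seed value are hypothetical): it rolls a fair die 10 times, counts the threes, repeats this 1,000 times, and tallies how often each count occurred.

```python
import random
from collections import Counter

def count_threes(rolls_per_experiment=10, repetitions=1000, seed=None):
    """Roll a fair die rolls_per_experiment times and count the threes;
    repeat, and return a Counter mapping number-of-threes -> frequency."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(repetitions):
        threes = sum(1 for _ in range(rolls_per_experiment)
                     if rng.randint(1, 6) == 3)
        counts[threes] += 1
    return counts

freq = count_threes(seed=1)
for k in range(11):
    print(k, freq[k])   # one row per bar of the histogram
```

Running it with different seeds gives slightly different frequencies each time, just as repeating the physical experiment would.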
We see that there were no “threes” about 175 times out of a
total of 1,000 times the die was rolled 10 times.
(That sentence is a little hard to read; note the different things
that the word “times” refers to.)
We also see that a “three” never occurred more than 6 times
out of 10 trials repeated 1,000 times.
According to the binomial probability function, the probability of
getting a “three” 7 times out of 10 trials is
$$\binom{10}{7}\left(\frac{1}{6}\right)^{7}\left(\frac{5}{6}\right)^{3}.$$
This is approximately 0.00024807.
(Check it out in Matlab. Use binopdf(7,10,1/6); see lecture notes
of March 24.)
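As a Python analogue of binopdf(7,10,1/6) (a sketch; the lecture itself uses Matlab), the standard library's math.comb and fractions.Fraction compute this probability exactly. The same arithmetic gives the expected number of the 1,000 experiments with no threes at all, 1000 × (5/6)^10, which is roughly 161.5; the observed count of about 175 differs from it by ordinary sampling variation.

```python
from fractions import Fraction
from math import comb

# Exact binomial probability of 7 threes in 10 rolls: C(10,7) (1/6)^7 (5/6)^3
p_seven = comb(10, 7) * Fraction(1, 6) ** 7 * Fraction(5, 6) ** 3
print(p_seven, float(p_seven))   # exact fraction, then its decimal value

# Expected number of experiments (out of 1,000) with no threes at all
p_zero = Fraction(5, 6) ** 10
print(1000 * float(p_zero))
```

The decimal value printed for p_seven is about 0.00024807, matching the figure above.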
Other Probability Distributions
Other kinds of random events follow “symmetric” probability
distributions that are concentrated near the center and fall off
toward the ends.
For example, if we have a hundred or so female hamsters (or
some other small animal) of about the same age, weigh each one,
and count how many weigh less than 18 grams, how many are
between 18 and 20 grams, and so on, we might get counts that
look something like this:
[Figure: “Histogram of Weights”; x-axis: weights in grams (18 to 26); y-axis: frequency (0 to 20).]
Normal Probability Distribution
The frequencies of a normal probability distribution have a
general shape of a “bell curve”.
[Figure: “Bell-Shaped Curve”; y-axis: frequency (0.0 to 0.4).]
Normal Probability Distribution
Lots of random things follow a frequency distribution that is
similar to a normal probability distribution.
We study random processes in nature by modeling them with a
probability distribution, and then we study the probability model.
There are various parameters that describe a particular probability distribution.
For example, normal probability distributions have two properties
that distinguish one from another:
mean – the “average value”
variance – how spread out the values are.
Theoretical Probability Distributions and Data
We make mathematical models of probability distributions.
These probability models describe idealized frequencies of “populations”.
These probability models have parameters that distinguish one
population from another.
Parameters
The meaning of the parameters depends on the type of probability distribution.
As I mentioned, the normal probability distribution has two parameters that distinguish one normal distribution from another: the mean and the
variance.
A uniform distribution also has two parameters, but these are
the minimum and maximum limits of the distribution.
Other distributions may have a parameter that tells how skewed
they are.
Studying Random Events and the Probability Distributions that Govern Them
In science, we often study random phenomena by use of samples
of data.
The science of statistics provides us with tools for estimating
parameters of distributions.
To estimate the mean of a population, for example, we may simply
use the mean of a sample of data.
This is pretty obvious, but there are many more interesting questions that arise when we use a sample to make inferences about
a population.
Statisticians address these questions.
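A quick simulation shows the idea of estimating a population mean from a sample. This Python sketch (the lecture uses Matlab) assumes a hypothetical normal population of hamster weights with mean 20 g and standard deviation 2 g; sample_mean is an illustrative name, not a library function.

```python
import random
import statistics

def sample_mean(n, mean, sd, seed=None):
    """Mean of a sample of n values drawn from a normal population with
    the given (hypothetical) mean and standard deviation."""
    rng = random.Random(seed)
    return statistics.fmean(rng.gauss(mean, sd) for _ in range(n))

# Hypothetical population: mean 20 g, standard deviation 2 g.
for n in (10, 100, 10_000):
    print(n, round(sample_mean(n, 20.0, 2.0, seed=0), 3))
```

Larger samples tend to give estimates closer to the population mean, which is one of the questions statisticians make precise.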
The Variance and the Standard Deviation
The variance is the average of the squared differences between
each data point and the mean value.
The square root of the variance is called the standard deviation.
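The definition translates directly into code. This Python sketch divides by the number of points n, exactly as the definition above says, and checks the result against the standard library's statistics.pvariance. Note that the sample variance is often computed with n − 1 in the denominator instead; Python's statistics.variance does this, as does Matlab's var by default.

```python
import math
import statistics

def variance(values):
    """Average of the squared differences between each value and the mean
    (divides by n, matching the definition above)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

data = [2, 4, 4, 4, 5, 5, 7, 9]
var = variance(data)
sd = math.sqrt(var)                 # standard deviation = sqrt(variance)
print(var, sd)                      # 4.0 2.0
print(statistics.pvariance(data), statistics.pstdev(data))  # should agree
```

Here the mean is 5, the squared differences add up to 32, and 32/8 = 4, so the standard deviation is 2.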
[Figure: “Normal Probability Distributions”, four curves labeled M=0,V=1; M=5,V=1; M=0,V=9; M=5,V=9 (M = mean, V = variance); x-axis from −10 to 15; y-axis: frequency (0.0 to 0.4).]
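The bell curves above come from the standard formula for the normal density with mean M and variance V: f(x) = exp(−(x − M)²/(2V)) / sqrt(2πV). A short Python sketch (normal_pdf is an illustrative name) evaluates each curve at its peak, which lies at the mean.

```python
import math

def normal_pdf(x, mean=0.0, variance=1.0):
    """Density of the normal distribution with the given mean and variance."""
    return (math.exp(-(x - mean) ** 2 / (2 * variance))
            / math.sqrt(2 * math.pi * variance))

# The four curves in the figure: (mean M, variance V)
for m, v in [(0, 1), (5, 1), (0, 9), (5, 9)]:
    print(f"M={m}, V={v}: height at the mean = {normal_pdf(m, m, v):.4f}")
```

Changing M only shifts a curve left or right; increasing V from 1 to 9 triples the standard deviation, so the curve becomes three times wider and one third as tall (about 0.399 versus 0.133).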
Generating Random Numbers
There is a wealth of statistical theory that guides the scientist
in collecting data and in using that data for making inferences
about the probability distribution of a full population.
In this course, we will consider some simple aspects of this process
– but with a twist.
Instead of collecting “real” data, we will simulate random data
on the computer.
How can we do this?
Generating Random Numbers
The computer performs computations (almost) exactly – and
deterministically.
That is, it computes the same thing every time.
There are, however, ways of generating “random numbers” on
the computer.
Because they are not really random, we call them “pseudorandom” numbers.
These methods have been studied and evaluated by mathematicians
and statisticians for many years.
We will not go into the details of how this is done, but we want
to learn to use some of them in Matlab.
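To get different numbers each time, the generator can be started
from a seed taken from the system clock; in Matlab, rng('shuffle')
does this. (By default, Matlab starts from the same seed in every
session, so it produces the same sequence each time.)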
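The determinism is easy to demonstrate. In this Python sketch (an analogue of seeding in Matlab; the seed value 2011 is arbitrary), two generators started from the same seed produce exactly the same "random" numbers. Creating a generator without a seed makes Python seed it from an operating-system source of randomness, or from the current time if that is unavailable, so that sequence differs from run to run.

```python
import random

# Two generators started from the same seed produce the same numbers.
a = random.Random(2011)
b = random.Random(2011)
seq_a = [a.random() for _ in range(5)]
seq_b = [b.random() for _ in range(5)]
print(seq_a == seq_b)   # True

# No seed given: seeded from the system, so this differs between runs.
c = random.Random()
print([c.random() for _ in range(3)])
```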
Generating Random Numbers in Matlab
There are four simple functions in Matlab to generate random
numbers:
• rand - “pseudorandom” uniform values between 0 and 1
• randi - “pseudorandom” integers
• randn - “pseudorandom” normal values
• randperm - “pseudorandom” permutations
There is another, more general function called random, which can generate random numbers from many specific probability distributions.
It is in the Statistics and Machine Learning Toolbox.
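For readers who want to try the same ideas outside Matlab, Python's standard random module has rough counterparts of the four functions. This is a sketch of a correspondence, not an exact equivalence: for example, Matlab's rand draws from the open interval (0, 1), while Python's random() draws from [0, 1).

```python
import random

rng = random.Random(0)

u = rng.random()                     # like rand: uniform value in [0, 1)
k = rng.randint(1, 6)                # like randi([1 6]): integer 1..6 inclusive
z = rng.gauss(0.0, 1.0)              # like randn: standard normal value
perm = rng.sample(range(1, 11), 10)  # like randperm(10): permutation of 1..10
pick = rng.sample(range(1, 11), 3)   # like randperm(10,3): 3 distinct of 1..10
print(u, k, z, perm, pick)
```

Like Matlab's random, a Python generator also offers methods for other distributions (for example expovariate and betavariate).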
Generating Random Numbers in Matlab
X = rand(n) returns an n-by-n matrix of pseudorandom uniform values.
X = rand(n,1) returns a column vector with n elements.
X = rand(1,n) returns a row vector with n elements.
X = randn(n) returns an n-by-n matrix of pseudorandom normal
values.
X = randn(n,1) returns a column vector with n elements.
X = randn(1,n) returns a row vector with n elements.
X = randi([imin imax],m,n) returns an m-by-n matrix of pseudorandom integers between imin and imax inclusive.
p = randperm(n) returns a row vector containing a random permutation of the integers from 1 to n inclusive.
p = randperm(n,k) returns a row vector containing k unique integers selected randomly from 1 to n inclusive.
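rand and randn give standard values (uniform on 0 to 1, normal with mean 0 and variance 1). The usual way to get other parameters is to rescale: a + (b − a)·U is uniform between a and b, and mean + sd·Z is normal with that mean and standard deviation; in Matlab this is a + (b-a)*rand(m,n) and mean + sd*randn(n,1). The Python sketch below mimics the matrix shapes with lists of lists; the helper names are hypothetical.

```python
import random

def uniform_matrix(m, n, a, b, rng):
    """m-by-n list of lists of uniform values between a and b,
    built as a + (b - a) * U with U uniform on [0, 1)."""
    return [[a + (b - a) * rng.random() for _ in range(n)] for _ in range(m)]

def normal_column(n, mean, sd, rng):
    """n-by-1 column of normal values with the given mean and standard
    deviation, built as mean + sd * Z with Z standard normal."""
    return [[mean + sd * rng.gauss(0.0, 1.0)] for _ in range(n)]

rng = random.Random(7)
U = uniform_matrix(3, 4, 18.0, 26.0, rng)   # 3-by-4, values between 18 and 26
Z = normal_column(5, 20.0, 2.0, rng)        # 5-by-1, mean 20, sd 2
print(U)
print(Z)
```

The uniform case is the two-parameter (minimum, maximum) family described earlier.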
Generating Random Numbers in Matlab
There is also a Matlab function that controls how the random
number generators work:
rng
rng(seed)
rng('shuffle')
rng(seed, generator)
rng('shuffle', generator)
rng('default')
scurr = rng
rng(s)
sprev = rng(...)
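The pattern scurr = rng followed later by rng(scurr) saves and restores the generator's state so that a computation can be repeated with exactly the same random numbers. A Python sketch of the same idea uses getstate and setstate (the correspondence to the Matlab calls, noted in the comments, is approximate).

```python
import random

rng = random.Random(123)        # like rng(123): start from a fixed seed
saved = rng.getstate()          # like scurr = rng: save the current state
first = [rng.random() for _ in range(3)]

rng.setstate(saved)             # like rng(scurr): restore the saved state
again = [rng.random() for _ in range(3)]
print(first == again)           # True: the same numbers are produced again

rng.seed()                      # roughly like rng('shuffle'): reseed from the system
```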
Simulation
In applied mathematics and science and engineering, we often
study random processes by simulating the random events.
For example, we might simulate a population of some type of wild
animal by beginning with a population with some given size, and
then assuming that some random proportion of the population
dies each year and that there is some random number of births
proportional to the number of females in the population each
year.
Obviously, such a simple simulation model would not be very
good.
(In the context of a simulation model,
“very good” means corresponding closely to reality.)
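A simple model of this kind takes only a few lines. The Python sketch below follows the description above: a random proportion dies each year, and births are a random multiple of the number of females. The death range, birth range, and the assumption that half the animals are female are all hypothetical values chosen only for illustration.

```python
import random

def simulate_population(n0, years, seed=None):
    """Toy model: each year a random proportion of the animals dies, and the
    number of births is a random multiple of the number of females.
    Every rate below is a hypothetical value chosen for illustration."""
    rng = random.Random(seed)
    sizes = [n0]
    n = n0
    for _ in range(years):
        deaths = round(n * rng.uniform(0.10, 0.30))         # 10% to 30% die
        females = n // 2                                     # assume half are female
        births = round(females * rng.uniform(0.20, 0.60))   # births per female
        n = max(n - deaths + births, 0)
        sizes.append(n)
    return sizes

print(simulate_population(500, 10, seed=1))
```

Each run with a different seed gives a different population history, which is exactly what lets us study the spread of possible outcomes.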
Simulation Models
We might add components to the model that help to account
for the random differences in weather and/or food supply.
We might add components to the model that represent populations of some other types of wild animals that live in the same
area (predators or prey).
... and so on and on ...
Monte Carlo Simulation
“Monte Carlo” methods are simple techniques that use simulation to answer difficult questions.
For example, what is the area under the curve shown here?
[Figure: a curve plotted for x from 0 to 4, with y between 0.0 and 0.4.]
Monte Carlo Simulation
We can easily figure out the area of the rectangle: 4.0 × 0.4, or
1.6 square units.
Suppose we could simulate a bunch of random points that are
uniformly distributed over the rectangle.
The ratio of the area under the curve to the area of the rectangle
should be about the same as the proportion of the random points
that fall under the curve.
Monte Carlo Simulation
[Figure: 200 random points scattered over the rectangle 0 ≤ x ≤ 4, 0 ≤ y ≤ 0.4, drawn with two symbols (* and +) to distinguish points below the curve from points above it.]
Monte Carlo Simulation
I generated a total of 200 points and 66 of them were below the
curve.
We therefore estimate the area as (4.0 × 0.4) × 66/200 = 0.528 square units.
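The whole procedure fits in a short function. This Python sketch (the lecture uses Matlab) scatters uniform points over the rectangle [0, x_max] × [0, y_max] and scales the rectangle's area by the fraction of points below the curve. Because the lecture's curve is not specified, the example uses a hypothetical curve, f(x) = x·e^(−x), whose maximum on [0, 4] is 1/e ≈ 0.368, so it fits under y = 0.4, and whose exact area, 1 − 5e^(−4), is known for comparison.

```python
import math
import random

def monte_carlo_area(f, x_max, y_max, n_points, seed=None):
    """Estimate the area under f on [0, x_max], assuming 0 <= f(x) <= y_max
    there, by scattering n_points uniform points over the rectangle and
    counting the fraction that fall below the curve."""
    rng = random.Random(seed)
    below = 0
    for _ in range(n_points):
        x = x_max * rng.random()
        y = y_max * rng.random()
        if y < f(x):
            below += 1
    return (x_max * y_max) * below / n_points

def curve(x):
    """Hypothetical curve (not the one in the lecture's figure)."""
    return x * math.exp(-x)

estimate = monte_carlo_area(curve, 4.0, 0.4, 100_000, seed=3)
exact = 1 - 5 * math.exp(-4)    # integral of x e^(-x) from 0 to 4
print(estimate, exact)
```

With only 200 points, as in the example above, the estimate is rough; using many more points usually brings it much closer to the true area.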