Probability distributions

To answer this type of question:
You can use the following
probability distribution(s):
Remarks / assumptions / restrictions
What is the probability of getting
a particular result x times in a series
of n consecutive trials*?
Binomial
-Each trial has only two possible outcomes
-The intrinsic probabilities p and q of each result are known
-Use the Binomial approximation when the probability of
each new result is unaffected by results of previous trials
(p and q do not vary: the total population size N is very large relative to n)
or
Hypergeometric
-Each trial has only two possible outcomes
-The initial probabilities p and q of each result are known
-Use the Hypergeometric distribution when the probability
of each new result is influenced by results of previous trials
(p and q change: the total population size N is limited relative to n)
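As a minimal sketch of the two formulas behind this row, the probabilities can be computed with Python's standard library. The function names and the numbers in the example are illustrative assumptions, not part of the original table.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(exactly x occurrences in n independent trials), constant probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def hypergeometric_pmf(x, n, K, N):
    """P(exactly x occurrences in n draws without replacement from a
    population of N items, K of which give the result of interest)."""
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

# Hypothetical example: 20 objects, 5 of which have a certain quality;
# 4 objects are examined, so the initial probability is p = 5/20 = 0.25.
p_binom = binomial_pmf(2, 4, 0.25)         # treats p as constant
p_hyper = hypergeometric_pmf(2, 4, 5, 20)  # p changes after each draw
```

Here the population is small relative to n, so the two answers differ noticeably; the Hypergeometric is the one that matches the stated situation.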
What is the probability of getting
a particular result for the xth time
on the nth of a series of consecutive trials?
Negative binomial
-The intrinsic probability p of that specific result is known
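A short sketch of the Negative binomial calculation, assuming independent trials with a constant probability p (an assumption not spelled out in the table). The function name and the example values are hypothetical.

```python
from math import comb

def negative_binomial_pmf(n, x, p):
    """P(the x-th occurrence of the result happens exactly on trial n)."""
    if n < x:
        return 0.0  # x occurrences cannot fit in fewer than x trials
    # x - 1 occurrences somewhere in the first n - 1 trials, then one on trial n
    return comb(n - 1, x - 1) * p**x * (1 - p)**(n - x)

# Hypothetical example: probability that the 2nd occurrence comes on trial 5
# when each trial has probability p = 0.3 of giving the result.
prob = negative_binomial_pmf(5, 2, 0.3)
```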
What is the probability that a
certain event (or result) will occur
x times in a certain interval t (of
time, or of space) ?
Poisson distribution
-The average rate of occurrence of that particular result
(or event) is known and is assumed to be constant in time
or space.
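The Poisson probability can be sketched as follows. The names `rate` and `t` and the example values are assumptions chosen for illustration; the rate must be expressed per unit of the same interval as t.

```python
from math import exp, factorial

def poisson_pmf(x, rate, t):
    """P(exactly x events in an interval of length t), constant average rate."""
    mu = rate * t  # expected number of events in the interval
    return mu**x * exp(-mu) / factorial(x)

# Hypothetical example: events occur on average 2 times per year;
# probability of exactly 3 events in one year.
prob = poisson_pmf(3, rate=2.0, t=1.0)
```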
What is the probability that a
certain interval t of time (or space)
will pass between two consecutive
events (or results) ?
Exponential
-The average rate of occurrence of that particular result
(or event) is known and is assumed to be constant in time
or space.
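For the Exponential distribution, a minimal sketch is given below. Because the interval is a continuous quantity, the example computes the probability that the interval exceeds t, or falls between two values; the function names are hypothetical.

```python
from math import exp

def prob_wait_longer_than(t, rate):
    """P(interval between consecutive events exceeds t), constant average rate."""
    return exp(-rate * t)

def prob_wait_between(t1, t2, rate):
    """P(interval between consecutive events lies between t1 and t2), t1 <= t2."""
    return exp(-rate * t1) - exp(-rate * t2)

# Hypothetical example: 2 events per year on average;
# probability that more than one year passes between two events.
prob = prob_wait_longer_than(1.0, rate=2.0)
```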
* A 'trial' can mean a test (example: does this object have a certain quality ?), a measurement, an observation, etc.
What is the probability that a trial
will produce a particular result, if the
result of this trial is subject to random
(stochastic) variations (or errors) that
are equally likely to be positive or
negative ?
Normal (Gaussian)
-Use the Gaussian approximation when the estimate
of the mean result is based on a large number of
trials (sample size is large)
or
Student t
-Use the Student t approximation when the estimate
of the mean result is based on a limited number of
trials (sample size is limited)
or
Lognormal
-Use the Lognormal approximation if it is the log of
the result that is subject to random variations.
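For the Normal case, a result is a continuous quantity, so the probability is computed for a range of values. The sketch below uses `statistics.NormalDist` from Python's standard library; the function name and the numbers are illustrative assumptions.

```python
from statistics import NormalDist

def prob_within(low, high, mean, sd):
    """P(low <= result <= high) for a Normally distributed result."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(high) - dist.cdf(low)

# Hypothetical example: mean result 10.0 with standard deviation 2.0;
# probability that a trial gives a result within one standard deviation.
p_one_sd = prob_within(8.0, 12.0, mean=10.0, sd=2.0)
```

With these values the result is about 0.68, the familiar share of a Normal distribution lying within one standard deviation of the mean.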
Probability distribution(s):
Typical examples of usage in the Earth and environmental sciences
Normal (Gaussian)
-Experimental (instrumental) measurements are often subject to random, independent
errors; therefore, the probability distribution of measured quantities can often
be assumed to be Normal (Gaussian), and the mean of the distribution represents
the most likely result.
-Many meteorological and hydrological phenomena are subject to random variations;
for example, the air temperature on a certain day of the year can vary from year to
year around a certain mean value. The probability distribution of such variations is
very often Normal.
Lognormal
-This is often used to describe the probability distribution of certain physical quantities
that are subject to cumulative (dependent) random variations. Very often with such
quantities, the higher the value, the less frequent it is. A typical example is the particle size
distribution of sediments like colluvium (deposits produced by mass wasting), in which
there are relatively few large blocks and comparatively much more fine material.
-The relative concentration of analytes (ions, elements, etc.) in natural media such as soils or
water is very often log-normally distributed.
-The Normal and Lognormal distributions are closely related to each other:
If the probability distribution of a certain variable x is Lognormal, then the probability
distribution of the log of that variable (i.e., log(x)) is Normal.
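This relationship means a Lognormal probability can be obtained from a Normal one by taking the log first. The sketch below assumes the natural log (the text does not specify a base) and uses hypothetical parameter names for the mean and standard deviation of log(x).

```python
from math import exp, log
from statistics import NormalDist

def lognormal_cdf(x, mu_log, sigma_log):
    """P(X <= x) when log(X) is Normal with mean mu_log and sd sigma_log.
    Assumption: natural logarithm."""
    if x <= 0:
        return 0.0  # a Lognormal variable only takes positive values
    return NormalDist(mu_log, sigma_log).cdf(log(x))

# Hypothetical example: log(x) has mean 1.0 and standard deviation 0.5.
# Half of the values lie below exp(1.0), the median of x.
p_below_median = lognormal_cdf(exp(1.0), mu_log=1.0, sigma_log=0.5)
```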
Binomial
Hypergeometric
-These probability distributions are used to estimate the probability of getting a
specific result (example: finding a gold nugget) in a succession of n trials (example: sieving
a river sediment sample). They have obvious, important applications for geological exploration
and mining, among other fields. The main difference between the Binomial and the
Hypergeometric is that the Binomial is used when the probability of a certain result is
always the same for each separate trial, whereas the Hypergeometric is used when the
probability of a certain result in a trial is influenced by the previous results.
-Whether successive trials are independent or not depends on the sample
size examined in the trials. If the sample size (n) is small relative to the total population size (N),
then the most appropriate probability distribution to use is the Binomial. But if n is large relative
to N, the Hypergeometric is the more appropriate distribution to use*.
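The effect of n relative to N can be checked numerically. The sketch below compares the two distributions for a fixed sample size n = 10 and a fixed initial proportion of 30 %, for three hypothetical population sizes; the function name and all values are assumptions made for illustration.

```python
from math import comb

def max_pmf_gap(n, N, K):
    """Largest absolute difference, over x = 0..n, between the Hypergeometric
    (N items, K with the result, n drawn) and the Binomial with p = K / N."""
    p = K / N
    gap = 0.0
    for x in range(n + 1):
        binom = comb(n, x) * p**x * (1 - p)**(n - x)
        hyper = comb(K, x) * comb(N - K, n - x) / comb(N, n)
        gap = max(gap, abs(binom - hyper))
    return gap

# Same sample size and initial proportion, increasingly large populations.
gaps = {N: max_pmf_gap(10, N, 3 * N // 10) for N in (20, 200, 20000)}
```

The gap shrinks as N grows, which is why the Binomial is an adequate stand-in for the Hypergeometric when n is small relative to N.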
-The Negative Binomial distribution is related to the Binomial, but it is used instead to estimate
the probability that a certain result will be obtained for the xth time on the nth trial. This can be used,
for example, to figure out how many tests are needed to ensure a certain success rate of
getting a certain result (an important thing to know if performing many tests is expensive!)
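One way to sketch the "how many tests are needed" calculation is shown below. It assumes independent tests with constant probability p, and uses the fact that the xth occurrence happens on or before test n exactly when at least x occurrences are seen in n tests. The function name and the target value are hypothetical.

```python
from math import comb

def trials_needed(x, p, target):
    """Smallest n such that P(at least x occurrences in n tests) >= target.
    Assumes 0 < p <= 1 and target < 1."""
    n = x
    while True:
        # Probability of seeing fewer than x occurrences in n tests
        p_fewer = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x))
        if 1 - p_fewer >= target:
            return n
        n += 1

# Hypothetical example: each test has a 50 % chance of giving the result;
# how many tests give at least a 95 % chance of seeing it once?
n_tests = trials_needed(1, 0.5, 0.95)
```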
*For a good clarification, see: Wroughton & Cole (2013), Journal of Statistics Education 21 (1),
pp. 1-16. URL: http://www.amstat.org/publications/jse/v21n1/wroughton.pdf
Student t
Fisher F
-These two distributions are used in many statistical tests. The Student t probability distribution
is often used when the means of groups of observations are compared. The Fisher F probability
distribution is used when it is the variances of the groups of observations that are compared.
-In both cases, there is an assumption that the observations, taken from a limited sample, are
a subset of a larger, Normally distributed population.
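As an illustration of the quantities these tests work with, the sketch below computes a pooled two-sample t statistic (which assumes the two groups have equal variances) and a variance ratio F. The data and function names are hypothetical, and converting the statistics into probabilities requires the Student t and Fisher F distributions themselves (tables or a statistics library), which is not shown here.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_statistic(a, b):
    """Two-sample t statistic comparing the means of groups a and b,
    using a pooled variance (assumes equal group variances)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))

def f_ratio(a, b):
    """Ratio of the sample variances of groups a and b."""
    return variance(a) / variance(b)

# Hypothetical observations from two groups
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat = pooled_t_statistic(group_a, group_b)
f_stat = f_ratio(group_b, group_a)
```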