EDSA: Lecture 2 • Further info on probability distributions

•
We typically use probability distributions because they work (i.e., they fit lots of data in real world)
•
Remember Discrete vs. Continuous random variables
•
Bernoulli Random Variables
–
•
Variable with only two possible outcomes
•
Success (S) or death (D) etc.
•
Failure (F) or alive (A) etc.
Examples
•
Heads or Tails
•
Newborn baby sex (male or female)… OK, not a good example
•
Live or die (although funny story about this…)
Let's define the binomial random variable X to be the # of successful results in n independent
Bernoulli trials (i.e., replicated)
–
Let p = probability of successes
–
Let q = 1-p = probability of failure
–
If n = 1, then the binomial random variable X is equal to a Bernoulli trial
•
What is the probability of obtaining x successes in n trials?
•
Example
–
What is the probability of obtaining 2 tails from a coin that was tossed 5 times?
P(TTHHH) = (1/2)^5 = 1/32
•
But there are more possibilities:
TTHHH THTHH THHTH THHHT
HTTHH HTHTH HTHHT
HHTTH HHTHT
HHHTT
P(2 tails) = 10 × 1/32 = 10/32
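The count of 10 sequences can be checked by brute force; here is a minimal sketch in Python that enumerates all 2^5 equally likely outcomes of 5 tosses:

```python
from itertools import product

# Enumerate all 2^5 = 32 equally likely outcomes of 5 coin tosses.
outcomes = list(product("HT", repeat=5))

# Keep only the sequences containing exactly 2 tails.
two_tails = [seq for seq in outcomes if seq.count("T") == 2]

print(len(two_tails), "of", len(outcomes))    # 10 of 32
print(len(two_tails) / len(outcomes))         # 10/32 = 0.3125
```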
•
In general, if trials result in a series of successes and failures,
FFSFFFFSFSFSSFFFFFSF…
then the probability of x successes in that particular order is
P(x) = q × q × p × q × … = p^x q^(n−x)
•
However, if order is not important, then
P(x) = C(n, x) p^x q^(n−x)
where C(n, x) = n!/(x!(n−x)!) is the number of ways to obtain x
successes in n trials, and i! = i × (i − 1) × (i − 2) × … × 2 × 1
•
Where X ~ Bin(n, p)
–
P(x) = [n! / (x!(n−x)!)] p^x (1−p)^(n−x)
–
where P(x) is the probability of exactly x successes, n is the # of trials, and p is the probability
of success in any one trial
•
The probability of obtaining x successes and (n−x) failures in a given order is p^x (1−p)^(n−x)
•
n!/(x!(n−x)!) is the number of ways to obtain x successes in n trials; it accounts for double
counting, e.g. (1,0) = (0,1)
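The formula translates directly into code; a minimal sketch using only the standard library (`binom_pmf` is a name chosen here for illustration, not from the slides):

```python
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Bin(n, p): C(n, x) * p^x * (1-p)^(n-x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Reproduces the coin example: 2 tails in 5 fair tosses.
print(binom_pmf(2, 5, 0.5))   # 10/32 = 0.3125
```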
•
Example
•
We shall use the tails example
•
We will generate X ~ Bin (11, 0.5)
•
Now let's play in Excel.
–
Muck with the sample size and n.
–
What happens to the distribution when p ≠ 0.5?
–
Let's do this in Excel.
–
(8, 11) (1, 11) (9, 11) (3, 11) (5, 11)
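The same effect can be seen outside Excel by tabulating the Bin(11, p) pmf for a few values of p (a sketch with exact probabilities, not simulation; `binom_pmf` is an illustrative name):

```python
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n = 11
for p in (0.1, 0.5, 0.9):
    pmf = [binom_pmf(x, n, p) for x in range(n + 1)]
    mode = max(range(n + 1), key=lambda x: pmf[x])
    # The mode tracks n*p: low p piles mass on the left (right skew),
    # high p on the right (left skew); p = 0.5 is symmetric (x = 5 and 6 tie).
    print(f"p = {p}: mode at x = {mode}")
```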
•
We have now purposefully skewed the data:
•
Skew: describes distribution asymmetry
•
Bimodal: two peaks
•
Leptokurtic: more observations (scores) in the tails than predicted
•
Platykurtic: fewer observations (scores) in the tails than predicted
•
Biology deals with tail probabilities
•
What is the probability of obtaining 9 or 10 tails out of 19 coin flips?
–
Answer ≈ 35.2% (P(X = 9) and P(X = 10) are each ≈ 17.6%)
–
All P-values for statistical tests are tail probabilities
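This tail probability can be computed directly from the binomial pmf; for X ~ Bin(19, 0.5), P(X = 9) and P(X = 10) are equal by symmetry, each ≈ 0.176:

```python
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    return comb(n, x) * p**x * (1 - p) ** (n - x)

p9 = binom_pmf(9, 19, 0.5)
p10 = binom_pmf(10, 19, 0.5)
print(round(p9, 4), round(p10, 4))   # 0.1762 0.1762 (symmetric around 9.5)
print(round(p9 + p10, 4))            # P(9 or 10 tails) = 0.3524
```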
•
Other types of distribution
•
Poisson -- # of instances of an event recorded in a sample of fixed intervals or areas.
–
Usually distribution of rare events or counts
•
The Poisson distribution applies where events are expected to occur at random in space or time
•
Deviation from Poisson distribution may indicate some degree of non-randomness in the events
under study
•
Investigation of cause may be of interest
•
X ~ Poisson(λ)
–
P(x) = (λ^x e^(−λ)) / x!
–
where λ is the average value of the # of occurrences of the event in each sample
–
e is a constant, the base of the natural logarithm (~2.71828)
If the average # of seedlings found in a quadrat is 0.75, what are the chances a quadrat will have
4 seedlings?
–
P(4 seedlings) = (0.75^4 × e^(−0.75)) / 4! = 0.0062
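The seedling calculation can be reproduced with the Poisson pmf (`poisson_pmf` is a name used here for illustration):

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for X ~ Poisson(lam): lam^x * e^(-lam) / x!"""
    return lam**x * exp(-lam) / factorial(x)

# Average of 0.75 seedlings per quadrat; chance of seeing exactly 4.
print(round(poisson_pmf(4, 0.75), 4))   # 0.0062
```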
•
Poisson Distributions
•
Example
Emission of -particles
•
Observed vs Expected Poisson
•
An unweighted arithmetic mean can be misleading
–
e.g. Binomial variable that can take on values of 0 and 50 with probabilities of 0.1 and
0.9, respectively
•
Arithmetic mean = (0 + 50)/2 = 25
•
But the most probable value is 50
•
The arithmetic mean is not useful for skewed distributions
•
The expected value of a discrete random variable X that can take on values a1
…an, with probabilities p1…pn, respectively, accurately describes the
average or central tendency of the distribution:
E(X) = ∑aipi = a1p1 + a2p2 +...+ anpn
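Applied to the 0/50 example above (values 0 and 50 with probabilities 0.1 and 0.9), the probability-weighted expectation works out as:

```python
values = [0, 50]
probs = [0.1, 0.9]

# E(X) = sum(a_i * p_i): the probability-weighted mean.
ex = sum(a * p for a, p in zip(values, probs))
print(ex)   # 45.0 -- close to the most probable value (50),
            # unlike the unweighted mean (0 + 50)/2 = 25
```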
•
Limits of Central Tendencies
•
Averages, or central tendencies, give no insight into spread, or variation. Need both.
–
e.g. Binomial variable that can take on -10 or 10 vs one that can take on -1000 or +1000,
each with a p=0.5
•
Both have E(X)=0, but in neither distribution will 0 ever be generated
•
The observed values in the first case are closer to E(X) than in the second
•
Variance of a random variable
–
σ²(X) = E[X − E(X)]² = ∑ pi (ai − ∑ aipi)²
–
E(X) is the expected value of X; the ai's are the different possible values of X, each of
which occurs with probability pi
Interpretation
–
We calculate E(X), subtract this from X, and square this difference
–
Because there are many possible values of X, we repeat this process for each value
–
Each squared deviate is weighted by its probability of occurrence, pi, and then they are
all summed
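The weighted-deviate recipe above, applied to the two distributions from the earlier slide (±10 vs. ±1000, each with p = 0.5), is a short sketch (`expected` and `variance` are illustrative names):

```python
def expected(values, probs):
    # E(X) = sum(a_i * p_i)
    return sum(a * p for a, p in zip(values, probs))

def variance(values, probs):
    # sum of p_i * (a_i - E(X))^2: each squared deviate weighted by its probability
    ex = expected(values, probs)
    return sum(p * (a - ex) ** 2 for a, p in zip(values, probs))

# Both have E(X) = 0, but very different spreads.
print(variance([-10, 10], [0.5, 0.5]))       # 100.0
print(variance([-1000, 1000], [0.5, 0.5]))   # 1000000.0
```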
•
Binomial vs Poisson
•
Binomial depends on both n and p and Poisson depends on only λ
•
Binomial is always bounded by 0 and n, whereas the right-hand tail of the Poisson is not bounded
•
E(X) = σ²(X) for a Poisson, whereas E(X) typically does not equal σ²(X) for a binomial
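The mean/variance contrast can be checked numerically from the two pmfs (a sketch; the unbounded Poisson sum is truncated at x = 100, where the remaining tail is negligible, and the parameter choices n = 20, p = 0.25, λ = 5 are illustrative):

```python
from math import comb, exp, factorial

def moments(pmf_pairs):
    """Mean and variance computed from (x, P(x)) pairs."""
    mean = sum(x * p for x, p in pmf_pairs)
    var = sum(p * (x - mean) ** 2 for x, p in pmf_pairs)
    return mean, var

n, p, lam = 20, 0.25, 5.0
binom = [(x, comb(n, x) * p**x * (1 - p) ** (n - x)) for x in range(n + 1)]
poisson = [(x, lam**x * exp(-lam) / factorial(x)) for x in range(101)]  # truncated tail

print(moments(binom))    # mean = np = 5, variance = npq = 3.75: not equal
print(moments(poisson))  # mean = variance = lam = 5: they coincide
```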
•
Normal Distribution
Length of Fish
•
A sample of rock cod in Monterey Bay suggests that the mean length of these fish is μ = 40 in.
and σ² = 4 in.²
•
Assume that the length of rock cod is a normal random variable
•
If we catch one of these fish in Monterey Bay,
–
What is the probability that it will be at least 41 in. long?
–
That it will be no more than 42 in. long?
–
That its length will be between 36 and 39 inches?
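These three probabilities can be computed by standardizing to z = (x − μ)/σ (here μ = 40 and σ = 2, assuming σ² = 4 in.²) and using the standard normal CDF; a sketch with only the standard library, where `norm_cdf` is an illustrative helper built on the error function:

```python
from math import erf, sqrt

def norm_cdf(x: float, mu: float, sigma: float) -> float:
    """P(X <= x) for X ~ N(mu, sigma^2), via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

mu, sigma = 40.0, 2.0  # sigma = sqrt(4 in.^2)

print(round(1 - norm_cdf(41, mu, sigma), 4))                        # P(X >= 41) = 0.3085
print(round(norm_cdf(42, mu, sigma), 4))                            # P(X <= 42) = 0.8413
print(round(norm_cdf(39, mu, sigma) - norm_cdf(36, mu, sigma), 4))  # P(36 < X < 39) = 0.2858
```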