Probability Distributions
Marina Santini
Department of Linguistics and Philology
Uppsala University, Uppsala, Sweden
Spring 2016
Acknowledgements
• Wikipedia and many math and statistics websites.
• Tamhane A. and Dunlop D. (2000). Statistics and Data
Analysis. Prentice Hall.
• Ross S. (2014). A first course in Probability, Pearson, 9th
edition.
• Kracht M. (2005). Introduction to Probability Theory and
Statistics for Linguistics. Dpt of Linguistics, UCLA
• Mollevan J. (2008). Introduction to Probability Theory and
Statistics.
• Introduction to STAT 414/415: PennState Uni
– https://onlinecourses.science.psu.edu/stat414/node/3
Required Reading for this lecture
• Handouts
– (see course website:
http://stp.lingfil.uu.se/~santinim/math_stats/2016/Math4LTechnologists.html)
• Lane (2016). Online Statistics Education: A Multimedia
Course of Study. pp. 203-211.
Outline of the section
• Definition of probability distribution
• Discrete probability distributions
– Bernoulli
– Binomial
– Multinomial
– Hypergeometric
– Poisson
– Zipf’s distribution
• Continuous probability distributions
– Uniform
– Normal
– Standard Normal
Purpose of the section
• The purpose of this section is to start memorizing
the names of some very common probability
distributions.
• Later in the course we will talk more extensively
about some of them and work with statistical
formulas.
• For the time being, it is important to understand
their characteristics, and we will not go into
mathematical details.
What’s a distribution?
• Simply put, a distribution is...
– an arrangement of values of a variable showing
their observed or theoretical frequency of
occurrence
– Distributions can be plotted
Ex: a plotted distribution
of some values
Probability Distribution
• A probability distribution can be
seen as a table or an equation
that links each outcome of a
statistical experiment with its
probability of occurrence.
• Consider a simple experiment in
which we flip a coin twice. An
outcome of the experiment might
be the number of heads that we
see in two coin flips.
This table associates each possible outcome with its probability:

Number of heads    Probability
0                  0.25
1                  0.50
2                  0.25

Suppose the random variable X is defined as the number of
heads that result from two coin flips. Then this table represents
the probability distribution of the random variable X.
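The two-coin-flip distribution can also be derived by enumeration. The following is a minimal Python sketch, assuming a fair coin so that the four outcomes HH, HT, TH, TT are equally likely; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All outcomes of two coin flips: HH, HT, TH, TT (assumed equally likely).
outcomes = list(product("HT", repeat=2))

# X = number of heads; count how many outcomes give each value of X.
head_counts = Counter(outcome.count("H") for outcome in outcomes)

# P(X = x) = (number of outcomes with x heads) / (number of all outcomes).
distribution = {
    x: Fraction(n, len(outcomes)) for x, n in sorted(head_counts.items())
}

for x, p in distribution.items():
    print(f"P(X = {x}) = {float(p):.2f}")
```

The printed values are the probabilities of seeing 0, 1, and 2 heads, and they add up to 1.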
Random Variables and Probability Distributions (i)
Repetition:
• A variable is a symbol (A, B, x, y, etc.) that can take on
any of a specified set of values.
• When the value of a variable is the outcome of
a statistical experiment, that variable is a random
variable.
• X represents the random variable X.
• P(X) represents the probability of X.
• P(X = x) refers to the probability that the random
variable X is equal to a particular value, denoted by x.
– Ex: P(X = 1) refers to the probability that the random
variable X is equal to 1.
Random Variables and Probability
Distributions (ii)
What’s the relationship between random variables and a
probability distribution? See this example:
• Suppose you flip a coin twice. This simple statistical
experiment can have four possible outcomes: HH, HT, TH,
and TT.
• Now, let the variable X represent the number of heads
that result from this experiment.
• The variable X can take on the values 0, 1, or 2. In this
example, X is a random variable because its value is
determined by the outcome of a statistical experiment.
• A probability distribution is a table or an equation (that
can be plotted) that links each outcome of a statistical
experiment with its probability of occurrence.
• The table we saw before associates each outcome with
its probability.
• This table is an example of a probability distribution: it
represents the probability distribution of the random
variable X.
Cumulative Probability Distributions
• A cumulative probability refers
to the probability that the value
of a random variable falls within
a specified range.
Ex: If we flip a coin twice,
we might ask: What is the
probability that the coin
flips would result in one or
fewer heads? The answer
would be a cumulative
probability. It would be the
probability that the coin flip
experiment results in zero
heads plus the probability
that the experiment results
in one head:
P(X ≤ 1) = P(X = 0) + P(X = 1) = 0.25 + 0.50 = 0.75
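The same cumulative probability can be computed by summing the individual probabilities. This is a small Python sketch, assuming the two-coin-flip probabilities 0.25, 0.50, 0.25 for 0, 1, and 2 heads; the function name is made up for illustration.

```python
# Probability distribution of X = number of heads in two fair coin flips.
pmf = {0: 0.25, 1: 0.50, 2: 0.25}


def cumulative_probability(pmf, upper):
    """Return P(X <= upper): the sum of P(X = x) over all values x <= upper."""
    return sum(p for x, p in pmf.items() if x <= upper)


# Probability of one or fewer heads.
print(cumulative_probability(pmf, 1))
```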
Probability Distribution: Definition
• A statistical function that describes all the
possible values and probabilities that a
random variable can take within a given range.
Distributions have names...
• ...and are characterized by different behaviours
• Discrete probability distributions
– Bernoulli
– Binomial
– Multinomial
– Hypergeometric
– Poisson
– Zipf’s distribution
– etc.
• Continuous probability distributions
– Uniform
– Normal
– Standard Normal
– etc.
Repetition: p.m.f. and p.d.f.
• Probability mass function (p.m.f.)
– … is the assignment of a probability to each
possible value of a discrete random variable.
• Probability density function (p.d.f.)
– The density of a continuous random variable is
a function that describes the relative likelihood of the
random variable taking values near a given point.
The probability of any single exact value is zero;
probabilities are areas under the density over an interval.
The Probability Distribution of discrete
random variables
• … is any table, graph, or function that pairs
each possible value with the probability of that
value.
• Note: The total of all probabilities across the
distribution must be 1, and each individual
probability must be between 0 and 1,
inclusive.
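The two conditions in the note can be checked mechanically. The sketch below is only an illustration: the function name `is_valid_pmf` and the second example table are hypothetical, and a small floating-point tolerance is assumed when comparing the total with 1.

```python
import math


def is_valid_pmf(pmf):
    """Check that every probability lies in [0, 1] and that they sum to 1."""
    each_in_range = all(0.0 <= p <= 1.0 for p in pmf.values())
    total_is_one = math.isclose(sum(pmf.values()), 1.0)
    return each_in_range and total_is_one


# The two-coin-flip table satisfies both conditions.
print(is_valid_pmf({0: 0.25, 1: 0.50, 2: 0.25}))

# A made-up table whose probabilities sum to 1.1 does not.
print(is_valid_pmf({0: 0.50, 1: 0.60}))
```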
Bernoulli Distribution
• A random variable (rv) that can take only 2 values, say 0 and 1, is called a Bernoulli rv.
• The Bernoulli distribution is a useful model for dichotomous outcomes
(dichotomous = divided into 2 parts). Ex: head or tail, female or male,
success or failure.
• An experiment with a dichotomous outcome is called a Bernoulli
trial.
• The Bernoulli distribution takes value 1 with probability p and value
0 with probability q = 1 - p.
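A Bernoulli trial can be simulated in a few lines. This is a hedged sketch: the value p = 0.3, the seed, and the function names are arbitrary choices for illustration, not part of the lecture.

```python
import random


def bernoulli_pmf(x, p):
    """P(X = x) for a Bernoulli rv: p for x = 1, q = 1 - p for x = 0."""
    if x == 1:
        return p
    if x == 0:
        return 1 - p
    return 0.0


def bernoulli_trial(p, rng):
    """Simulate one trial: return 1 (success) with probability p, else 0."""
    return 1 if rng.random() < p else 0


rng = random.Random(0)  # fixed seed so the run is repeatable
p = 0.3                 # assumed success probability
samples = [bernoulli_trial(p, rng) for _ in range(10_000)]

print(bernoulli_pmf(1, p), bernoulli_pmf(0, p))
print(sum(samples) / len(samples))  # relative frequency of 1s, close to p
```

With many trials the relative frequency of successes settles near p, which is what the probability p is meant to describe.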
Binomial Distribution
• If there is a fixed number n of trials that are
independent and each trial has the same
probability p of success, then the number of successes
(the sum of these Bernoulli trials) follows a binomial
distribution.
• Ex: the distribution of the number of
successes in a series of independent Yes/No
questions.
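For reference, the standard binomial formula is P(X = k) = C(n, k) p^k (1 - p)^(n - k), where C(n, k) counts the ways to place k successes among n trials. The sketch below applies it; the ten-question example with p = 0.5 is an assumed illustration, not data from the lecture.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success prob. p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Two fair coin flips: reproduces the 0.25 / 0.50 / 0.25 table.
for k in range(3):
    print(k, binomial_pmf(k, 2, 0.5))

# Hypothetical: 10 independent Yes/No questions, P(Yes) = 0.5 each;
# probability of exactly 7 Yes answers.
print(binomial_pmf(7, 10, 0.5))
```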
Multinomial distribution
• … can be used to compute the probabilities in situations in which
there are more than two possible outcomes.
• For example, suppose that two chess players had played numerous
games and it was determined that the probability that Player A
would win is 0.40, the probability that Player B would win is 0.35,
and the probability that the game would end in a draw is 0.25.
• The multinomial distribution can be used to answer questions such
as: "If these two chess players played 12 games, what is the
probability that Player A would win 7 games, Player B would win 2
games, and the remaining 3 games would be drawn?"
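The chess question can be answered with the standard multinomial formula, n! / (n1! n2! n3!) × p1^n1 × p2^n2 × p3^n3. The following Python sketch applies it to the numbers in the example, assuming the 12 games are independent; the function name is illustrative.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """Probability of observing the given counts for each outcome category."""
    n = sum(counts)
    # Multinomial coefficient n! / (c1! * c2! * ...), kept as an exact integer.
    coefficient = factorial(n)
    for c in counts:
        coefficient //= factorial(c)
    return coefficient * prod(p**c for p, c in zip(probs, counts))


# 12 games: A wins 7, B wins 2, 3 draws; P(A) = 0.40, P(B) = 0.35, P(draw) = 0.25.
answer = multinomial_pmf([7, 2, 3], [0.40, 0.35, 0.25])
print(round(answer, 4))
```

The result is roughly 0.025, so this exact split of results is fairly unlikely even though Player A is the favourite.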
Hypergeometric Distribution
• … is used to calculate probabilities when sampling without replacement.
– When the population is finite, sampling without replacement creates
dependence among successive Bernoulli trials, and the probability of success
changes as successive items are drawn.
– In this case we have to derive a new distribution, which is called
hypergeometric.
Ex:
• Suppose you first randomly sample one card from a deck of 52.
• Then, without putting the card back in the deck, you sample a second and
then (again without replacing cards) a third.
• Given this sampling procedure, what is the probability that exactly two of
the sampled cards will be aces (4 of the 52 cards in the deck are aces)?
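The card question can be worked out by counting: choose 2 of the 4 aces and 1 of the 48 other cards, then divide by the number of ways to choose any 3 cards from 52. The sketch below uses this standard hypergeometric formula; the function and parameter names are illustrative.

```python
from math import comb


def hypergeometric_pmf(k, population, successes, draws):
    """P(exactly k successes when drawing `draws` items without replacement)."""
    ways_to_pick_successes = comb(successes, k)
    ways_to_pick_failures = comb(population - successes, draws - k)
    return ways_to_pick_successes * ways_to_pick_failures / comb(population, draws)


# Deck of 52 cards with 4 aces, 3 cards drawn, exactly 2 aces.
answer = hypergeometric_pmf(2, 52, 4, 3)
print(round(answer, 4))
```

The answer is 288 / 22100, a little over 1%.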
Poisson Distribution
• … is a limiting form of the binomial distribution.
• The Poisson distribution can be used to model the
number of occurrences of a rare event (e.g. the number
of defective screws in a car), when the number of
opportunities for the event is very large, but the
probability that the event occurs is very small.
• Ex: The mean number of defective products produced
in a factory in one day is 21. What is the probability
that in a given day there are exactly 12 defective
products?
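The factory example can be computed with the standard Poisson formula P(X = k) = e^(-λ) λ^k / k!, where λ is the mean number of events. The sketch below also compares the result with a binomial that has many trials and a small success probability; the choice n = 100,000 is an arbitrary assumption made only to illustrate the limiting behaviour.

```python
from math import comb, exp, factorial


def poisson_pmf(k, mean):
    """P(exactly k events) when the mean number of events is `mean`."""
    return exp(-mean) * mean**k / factorial(k)


# Mean of 21 defective products per day; probability of exactly 12.
poisson_answer = poisson_pmf(12, 21)
print(round(poisson_answer, 4))

# Limiting form of the binomial: many opportunities (n), small probability
# p = mean / n, so that n * p stays equal to the mean.
n = 100_000
p = 21 / n
binomial_answer = comb(n, 12) * p**12 * (1 - p) ** (n - 12)
print(round(binomial_answer, 4))
```

Both numbers come out at about 0.012, which is the sense in which the Poisson distribution approximates the binomial for rare events.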
Zipf Distribution
• The Zipf distribution (sometimes referred to as
the zeta distribution) is a discrete distribution
commonly used in linguistics.
• A simple description of data that follow a Zipf
distribution is that they have:
– a few elements that score very high
– a medium number of elements with middle-of-the-road scores
– a huge number of elements that score very low
Example
Zipf distributions have been shown to characterize the use of words in a
natural language (like English) and the popularity of library books.
• Typically a language has a few words ("the", "and", etc.) that are used
extremely often, and a library has a few books that everybody wants
to borrow (current bestsellers).
• A language has quite a lot of words ("dog", "house", etc.) that are
used relatively often, and a library has a good number of books that
many people want to borrow (crime novels and such).
• A language has an abundance of words that are almost never used, and
a library has piles and piles of books that are only checked out every
few years (reference manuals for Apple II word processors, etc.).
• Much available data suggests that Web use follows a Zipf
distribution.
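The "few very frequent, many very rare" pattern can be made concrete with the usual Zipf formula, in which the probability of the item at rank k is proportional to 1 / k^s. The sketch below is illustrative only: the exponent s = 1 and the vocabulary of 1000 ranked items are assumptions, not figures from the lecture.

```python
def zipf_pmf(num_ranks, exponent=1.0):
    """Return [P(rank 1), ..., P(rank num_ranks)] with P(k) proportional to 1/k**exponent."""
    weights = [1 / rank**exponent for rank in range(1, num_ranks + 1)]
    total = sum(weights)  # normalizing constant so the probabilities sum to 1
    return [w / total for w in weights]


# Hypothetical vocabulary of 1000 ranked words.
probs = zipf_pmf(1000)

# Probability of the most frequent, the 10th, and the least frequent word.
print(probs[0], probs[9], probs[999])

# Share of all occurrences covered by the 10 most frequent words.
print(sum(probs[:10]))
```

With exponent 1, the word at rank 2 is half as probable as the word at rank 1, the word at rank 10 is one tenth as probable, and so on down the long tail.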
Typical shapes of a Zipf distribution (figure)
The Probability Distribution of
continuous random variables
• A random variable X is continuous if its set of
possible values is an entire interval of
numbers.
• In other words, a rv is continuous if it can take
on any real value in an interval.
Uniform distribution
• A uniform distribution arises in situations
where all continuous values are equally likely
over an interval.
• Ex: a continuous random variable X is restricted to
a finite interval [a, b] and its density f(x) has the constant
value 1/(b - a) over the interval.
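Because the density is constant, the probability of landing in any sub-interval is just its length divided by the length of [a, b]. The following is a minimal sketch under that reading; the interval [0, 10] and the function names are assumptions for illustration.

```python
def uniform_pdf(x, a, b):
    """Density of the uniform distribution on [a, b]: constant 1 / (b - a)."""
    return 1 / (b - a) if a <= x <= b else 0.0


def uniform_interval_probability(lo, hi, a, b):
    """P(lo <= X <= hi) for X uniform on [a, b]: overlap length / (b - a)."""
    lo, hi = max(lo, a), min(hi, b)  # clip the query interval to [a, b]
    return max(hi - lo, 0.0) / (b - a)


a, b = 0.0, 10.0  # hypothetical interval

print(uniform_pdf(3.0, a, b))                         # same height everywhere in [a, b]
print(uniform_interval_probability(2.0, 5.0, a, b))   # a sub-interval of length 3
print(uniform_interval_probability(-5.0, 20.0, a, b)) # covers the whole of [a, b]
```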
Normal (Gaussian) Distribution
• The normal distribution is used to model
many real-life phenomena such as
measurements of blood pressure, weight, etc.
• A large body of statistics is based on the
assumption that data follow a normal
distribution.
Standard normal distribution
• The standard normal distribution is a special
case of the normal distribution. It is the
distribution that occurs when a normal
random variable has a mean of zero and a
standard deviation of one.
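Any normal random variable can be converted to the standard normal by subtracting its mean and dividing by its standard deviation (the z-score). The sketch below uses this to compute probabilities; the example of weights with mean 70 and standard deviation 10 is hypothetical, and the cumulative probability is obtained from the error function in Python's math module.

```python
from math import erf, exp, pi, sqrt


def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of the normal distribution; the defaults give the standard normal."""
    z = (x - mean) / sd
    return exp(-0.5 * z * z) / (sd * sqrt(2 * pi))


def standard_normal_cdf(z):
    """P(Z <= z) for a standard normal rv Z (mean 0, standard deviation 1)."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def normal_cdf(x, mean, sd):
    """P(X <= x) for a normal rv X, computed by standardizing x to a z-score."""
    return standard_normal_cdf((x - mean) / sd)


# Hypothetical: weights with mean 70 and standard deviation 10.
# Probability of a value within one standard deviation of the mean.
p_within_one_sd = normal_cdf(80, 70, 10) - normal_cdf(60, 70, 10)
print(round(p_within_one_sd, 4))

# The same probability expressed directly on the standard normal scale.
print(round(standard_normal_cdf(1) - standard_normal_cdf(-1), 4))
```

Both lines print the same value, about 0.68, because standardizing maps the interval from 60 to 80 onto the interval from -1 to 1.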