Empirical and Discrete Distributions

© 2009 Winton
Empirical Distributions
• An empirical distribution is one for which each possible event is
  assigned a probability derived from experimental observation
  – It is assumed that the events are independent and that the
    probabilities sum to 1
• An empirical distribution may represent either a continuous or a
  discrete distribution
  – If it represents a discrete distribution, then sampling is done
    "on step"
  – If it represents a continuous distribution, then sampling is done
    via interpolation
Discrete vs. Continuous Sampling

• The way the empirical table is described usually determines whether
  an empirical distribution is to be handled discretely or continuously

  discrete description          continuous description
  value    probability          value      probability
  10       .10                  0 – 10     .10
  20       .15                  10 – 20    .15
  35       .40                  20 – 35    .40
  40       .30                  35 – 40    .30
  60       .05                  40 – 60    .05
Empirical Distribution Linear Interpolation
[Graph: the discrete-case and continuous-case tables above, alongside a
plot of rsample (0 to 60) vs. x (0 to 1)]
• To use linear interpolation for continuous sampling, the
discrete points on the end of each step need to be
connected by line segments
– This is represented in the graph by the green line segments
– The steps are represented in blue
Sampling On Step
[Graph: the discrete-case and continuous-case tables, alongside a plot
of rsample (0 to 60) vs. x (0 to 1)]
• In the discrete case, sampling on step is accomplished by
  accumulating probabilities from the original table
  – For x = 0.4, accumulate probabilities until the cumulative
    probability exceeds 0.4
  – rsample is the event value at the point this happens
    • the cumulative probability .1 + .15 + .4 = .65 is the first to
      exceed 0.4, so the rsample value is 35
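The accumulation described above can be sketched in Python; the table
values are from the slide, and the function name is my own:

```python
# On-step sampling from the discrete empirical table on the slide.
values = [10, 20, 35, 40, 60]
probs  = [0.10, 0.15, 0.40, 0.30, 0.05]

def sample_on_step(x):
    """Return the event value whose cumulative probability first exceeds x."""
    cumulative = 0.0
    for value, p in zip(values, probs):
        cumulative += p
        if x < cumulative:          # the running sum now exceeds x
            return value
    return values[-1]               # guard against floating-point round-off

print(sample_on_step(0.4))          # -> 35, as in the slide's example
```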
Sampling By Linear Interpolation
[Graph: the discrete-case and continuous-case tables, alongside a plot
of rsample (0 to 60) vs. x (0 to 1), with the interpolating line
segments drawn between the step endpoints]
• In the continuous case, the end points of the probability
  accumulation are needed
  – For x = 0.4, the bracketing values are x = 0.25 and x = 0.65,
    which represent the points (.25, 20) and (.65, 35) on the graph
  – From basic college algebra, the slope of the line segment is
    (35 - 20)/(.65 - .25) = 15/.4 = 37.5
  – slope = 37.5 = (35 - rsample)/(.65 - .4), so
    rsample = 35 - (37.5)(.25) = 35 - 9.375 = 25.625
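A small Python sketch of the same interpolation; the breakpoints are
the cumulative points implied by the table, and the function name is
mine:

```python
# Cumulative breakpoints (x, value) for the continuous empirical table:
# (0,0), (.1,10), (.25,20), (.65,35), (.95,40), (1,60)
cum_x = [0.0, 0.10, 0.25, 0.65, 0.95, 1.00]
cum_v = [0,   10,   20,   35,   40,   60]

def sample_interpolated(x):
    """Linearly interpolate between the cumulative breakpoints."""
    for i in range(1, len(cum_x)):
        if x <= cum_x[i]:
            slope = (cum_v[i] - cum_v[i-1]) / (cum_x[i] - cum_x[i-1])
            return cum_v[i-1] + slope * (x - cum_x[i-1])
    return cum_v[-1]

print(round(sample_interpolated(0.4), 3))   # -> 25.625, matching the slide
```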
Discrete Distributions

• Historical perspective behind the names used with discrete
  distributions
  – James Bernoulli (1654-1705) was a Swiss mathematician whose book
    Ars Conjectandi (published posthumously in 1713) was the first
    significant book on probability
    • It gathered together the ideas on counting, and among other
      things provided a proof of the binomial theorem
  – Siméon-Denis Poisson (1781-1840) was a professor of mathematics at
    the Faculté des Sciences whose 1837 text Recherches sur la
    probabilité des jugements en matière criminelle et en matière
    civile introduced the discrete distribution now called the Poisson
    distribution
• Keep in mind that scholars such as these evolved their theories with
  the objective of providing sophisticated abstract models of
  real-world phenomena
  – An effort which, among other things, gave birth to the calculus as
    a major modeling tool
I. Bernoulli Distribution
Bernoulli events, trials, and processes

• Bernoulli event
  – One for which the probability the event occurs is p and the
    probability the event does not occur is 1-p
    • The event has two possible outcomes (usually viewed as success
      or failure) occurring with probability p and 1-p, respectively
• Bernoulli trial
  – An instantiation of a Bernoulli event
• Bernoulli process
  – A sequence of Bernoulli trials where the probability p of success
    remains the same from trial to trial
  – This means that for n trials, the probability of n consecutive
    successes is p^n
Bernoulli Distribution

• Bernoulli distribution
  – Given by the pair of probabilities of a Bernoulli event
• Too simple to be interesting in isolation
  – Implicitly used in "yes-no" decision processes where the choice
    occurs with the same probability from trial to trial
    • e.g., the customer chooses to go down aisle 1 with probability p
  – It can be cast in the same kind of mathematical notation used to
    describe more complex distributions
Bernoulli Distribution pdf

          p^z·(1-p)^(1-z)  for z = 0, 1
  p(z) =
          0                otherwise

[Graph of p(z): a bar of height 1-p at z = 0 and a bar of height p at
z = 1]

  Note: this seemingly bizarre description is chosen since it mimics
  the case of the binomial distribution when n = 1

• The expected value of the distribution is given by
  E(X) = (1-p)·0 + p·1 = p
• The standard deviation is given by
  σ = √((1-p)·(0-p)² + p·(1-p)²) = √(p·(1-p))
• Notational overkill, but useful for understanding other distributions
Sampling

• Sampling from a discrete distribution requires a function that
  corresponds to the distribution function of a continuous
  distribution f given by
  F(x) = ∫_{-∞}^{x} f(z) dz
• This is given by the mass function F(x) of the distribution, the
  step function obtained from the cumulative (discrete) distribution
  given by the sequence of partial sums
  F(x) = Σ_{z=0}^{x} p(z)
• For the Bernoulli distribution, F(x) has the construction
          0    for -∞ < x < 0
  F(x) =  1-p  for 0 ≤ x < 1
          1    for x ≥ 1
  – A non-decreasing function (and so can be sampled on step)
Graph of F(x) and Sampling Function

• Distribution function F(x)

[Graph of F(x): a step function on z with value 0 for z < 0, 1-p on
0 ≤ z < 1, and 1 for z ≥ 1]

• Sampling function
  – for random value x drawn from [0,1),
              0  if 0 ≤ x < 1-p
    rsample =
              1  if 1-p ≤ x < 1
  – demonstrates that sampling from a discrete distribution, even one
    as simple as the Bernoulli distribution, can be viewed in the same
    manner as for continuous distributions
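The sampling function translates directly to code; a minimal sketch,
where p = 0.3 is an arbitrary choice and the function name is mine:

```python
import random

def bernoulli_sample(p, x=None):
    """rsample = 0 if 0 <= x < 1-p, and 1 if 1-p <= x < 1."""
    if x is None:
        x = random.random()         # uniform draw from [0, 1)
    return 0 if x < 1 - p else 1

# The long-run mean of the samples should approach E(X) = p
random.seed(1)
p = 0.3
mean = sum(bernoulli_sample(p) for _ in range(100_000)) / 100_000
print(round(mean, 2))               # close to 0.3
```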
II. Binomial Distribution

• The Bernoulli distribution represents the success or failure of a
  single Bernoulli trial
• The binomial distribution represents the number of successes and
  failures in n independent Bernoulli trials for some given value of n
  – For example, if a manufactured item is defective with probability
    p, then the binomial distribution represents the number of
    successes and failures in a lot of n items
    • Unlike a barrel of apples, where one bad apple can cause others
      to be bad, the trials are independent
  – Sampling from this distribution gives a count of the number of
    defective items in a sample lot
  – Another example is the number of heads obtained in tossing a coin
    n times
Binomial Theorem (1)

• The binomial distribution gets its name from the binomial theorem
  (a + b)^n = Σ_{k=0}^{n} C(n,k)·a^k·b^(n-k)
  where C(n,k) = n!/(k!·(n-k)!)
• It is worth pointing out that if a = b = 1, this becomes
  (1 + 1)^n = 2^n = Σ_{k=0}^{n} C(n,k)
• If S is a set of size n, the number of k-element subsets of S is
  given by
  C(n,k) = n!/(k!·(n-k)!)
Binomial Theorem (2)

• This formula is the result of a simple counting analysis: there are
  n·(n-1)· ... ·(n-k+1) = n!/(n-k)!
  ordered ways to select k elements from n
  – n ways to choose the 1st item, (n-1) the 2nd, and so on
  – Any given selection is a permutation of its k elements, so the
    underlying subset is counted k! times
  – Dividing by k! eliminates the duplicates
• Consequence: 2^n counts the total number of subsets of an n-element
  set
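The counting argument can be checked directly with the standard
library's `math.comb` and `math.factorial` (n = 5, k = 2 is an
arbitrary example):

```python
from math import comb, factorial

n, k = 5, 2

# n!/(n-k)! ordered ways to pick k of n items; each subset appears k! times
ordered = factorial(n) // factorial(n - k)
assert ordered == n * (n - 1)                 # 5 * 4 = 20 ordered selections
assert ordered // factorial(k) == comb(n, k)  # 20 / 2! = 10 subsets

# Consequence: the subset counts sum to 2^n
assert sum(comb(n, j) for j in range(n + 1)) == 2 ** n
print(comb(n, k), 2 ** n)                     # -> 10 32
```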
Binomial Distribution pdf (1)

• For n independent Bernoulli trials the pdf of the binomial
  distribution is given by
          C(n,z)·p^z·(1-p)^(n-z)  for z = 0, 1, ..., n
  p(z) =
          0                       otherwise
• By the binomial theorem
  Σ_{z=0}^{n} p(z) = (p + (1-p))^n = 1
  verifying that p(z) is a pdf
Binomial Distribution pdf (2)

• When choosing z items from among n items with probability p for an
  item being defective, the term
  C(n,z)·p^z·(1-p)^(n-z)
  represents the probability that z are defective (and concomitantly
  that n-z are not defective)
Expected Value and Variance

• E(X) = np for a binomial distribution on n items where the
  probability of success is p
• The calculation is accomplished by
  E(X) = Σ_{z=0}^{n} z·p(z)
       = Σ_{z=1}^{n} z · (n!/(z!·(n-z)!)) · p^z·(1-p)^(n-z)
       = np · Σ_{z=1}^{n} C(n-1, z-1) · p^(z-1)·(1-p)^(n-z)
       = np·(p + (1-p))^(n-1) = np
  – the factor np is a term in common, present in every summand; the
    binomial theorem applies to the remaining sum
    (note that n-z = (n-1) - (z-1))
• It can be similarly shown that the standard deviation is √(np·(1-p))
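The formulas E(X) = np and σ² = np(1-p) can be verified numerically
from the pdf, again with the arbitrary choice n = 10, p = 0.7:

```python
from math import comb

n, p = 10, 0.7
pmf = [comb(n, z) * p**z * (1 - p)**(n - z) for z in range(n + 1)]

mean = sum(z * pz for z, pz in enumerate(pmf))
variance = sum((z - mean)**2 * pz for z, pz in enumerate(pmf))

print(round(mean, 6), round(variance, 6))   # -> 7.0 2.1  (np and np(1-p))
```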
Graph of pdf and Mass Function F(x)

• The binomial distribution with n = 10 and p = 0.7 appears as follows
  (mean = np = 7)

[Graph of the pdf p(z) for n = 10, p = 0.7]

• Its corresponding mass function F(z) is given by

[Graph of the step function F(z)]
Sampling Function

[Graph of the sampling function: rsample (0 to 10) on the vertical
axis vs. x (0 to 1) on the horizontal axis]

• A typical sampling tactic is to accumulate the sum
  Σ_{z=0}^{rsample} p(z)
  increasing rsample until the sum's value exceeds the random value
  between 0 and 1 drawn for x
  – The final rsample summation limit is the sample value
  – In contrast to a continuous pdf described by some formula, the
    function for a finite discrete pdf has to be given in its
    relational form by a table of pairs, which in turn mandates the
    kind of "search" algorithm approach used above to obtain rsample
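The accumulation search described above, sketched in Python (the
function name is mine; any n and p work):

```python
import random
from math import comb

def sample_binomial(n, p, x=None):
    """Increase rsample until the accumulated p(z) exceeds x."""
    if x is None:
        x = random.random()             # uniform draw from [0, 1)
    cumulative = 0.0
    for z in range(n + 1):
        cumulative += comb(n, z) * p**z * (1 - p)**(n - z)
        if x < cumulative:
            return z
    return n                            # guard against round-off

# For n = 10, p = 0.7 the sample mean should approach np = 7
random.seed(1)
mean = sum(sample_binomial(10, 0.7) for _ in range(50_000)) / 50_000
print(round(mean, 1))                   # close to 7.0
```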
III. Poisson Distribution
(values z = 0, 1, 2, . . .)

• With a little work it can be shown that the Poisson distribution is
  the limiting case of the binomial distribution using p = λ/n → 0 as
  n → ∞
• The expected value is E(X) = λ
• The standard deviation is √λ
• The pdf is given by
  p(z) = λ^z·e^(-λ) / z!
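A sketch of the pdf together with the limiting-case claim, comparing
the binomial pmf with p = λ/n against the Poisson pmf as n grows
(λ = 2 matches the graph on the next slide; the function name is mine):

```python
from math import comb, exp, factorial

def poisson_pmf(z, lam):
    """p(z) = lam^z * e^(-lam) / z!"""
    return lam**z * exp(-lam) / factorial(z)

lam, z = 2.0, 3
for n in (10, 100, 10_000):
    p = lam / n
    binom = comb(n, z) * p**z * (1 - p)**(n - z)
    print(n, round(binom, 6))           # approaches the Poisson value
print(round(poisson_pmf(z, lam), 6))
```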
III. Poisson Distribution
history and use

• This distribution dates back to Poisson's 1837 text regarding civil
  and criminal matters, in effect scotching the tale that its first
  use was for modeling deaths from the kicks of horses in the Prussian
  army
• In addition to modeling the number of arrivals over some interval of
  time, the distribution has also been used to model the number of
  defects on a manufactured article
  – Recall the relationship to the exponential distribution; a Poisson
    process has exponentially distributed interarrival times
• In general the Poisson distribution is used for situations where the
  probability of an event occurring is very small, but the number of
  trials is very large (so the event is expected to actually occur a
  few times)
  – Less cumbersome than the binomial distribution
Graph of pdf and Sampling Function

• With λ = 2 (the mean):

[Graph of the pdf p(z) for z = 0, 1, 2, ..., with the mean at z = 2,
and the corresponding sampling function rsample (0 to 9) vs. x
(0 to 1)]
IV. Geometric Distribution

• The geometric distribution gets its name from the geometric series;
  for |r| < 1, its various flavors are
  Σ_{n=0}^{∞} r^n = 1/(1-r)
  Σ_{n=0}^{∞} n·r^n = r/(1-r)²
  Σ_{n=0}^{∞} (n+1)·r^n = 1/(1-r)²
• The pdf for the geometric distribution is given by
          (1-p)^(z-1)·p  for z = 1, 2, ...
  p(z) =
          0              otherwise
• The geometric distribution is the discrete analog of the exponential
  distribution
  – Like the exponential distribution, it is "memoryless"; i.e.,
    P(X > a+b | X > a) = P(X > b)
  – The geometric distribution is the only discrete distribution with
    this property, just as the exponential distribution is the only
    continuous one behaving in this manner
Memoryless Property

• Being memoryless is characterized by
  P(X > a+b | X > a) = P(X > b)
• The interpretation is that if we haven't had an arrival in "a"
  seconds, the probability of an arrival in the next "b" seconds is
  the same as if the "a" seconds had not elapsed
  – Only the geometric and exponential distributions have this
    property
• This is NOT a property associated with independent events, since
  among other things there are three events involved: X > a+b, X > a,
  and X > b
  – Recall that A and B are independent if P(A ∩ B) = P(A)·P(B)
  – So long as P(B) is non-zero, P(A | B) = P(A ∩ B)/P(B)
    • In effect, if A and B are independent events and B has already
      occurred, then by this equation P(A) is unchanged; i.e., as one
      would expect, P(A|B) = P(A) when A and B are independent
  – The events X > a+b and X > a are certainly NOT independent, since
    independence would imply P(X > a+b | X > a) = P(X > a+b), which
    does not hold
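For the geometric distribution P(X > k) = (1-p)^k (the first k trials
all fail), so the memoryless property can be checked numerically; the
values of p, a, and b below are arbitrary:

```python
def geometric_tail(k, p):
    """P(X > k) = (1-p)^k: the first k trials all fail."""
    return (1 - p)**k

p, a, b = 0.3, 4, 3
lhs = geometric_tail(a + b, p) / geometric_tail(a, p)   # P(X > a+b | X > a)
rhs = geometric_tail(b, p)                              # P(X > b)
print(abs(lhs - rhs) < 1e-12)                           # -> True
```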
Expected Value and Standard Deviation

• The expected value is given by
  E(X) = Σ_{z=1}^{∞} z·(1-p)^(z-1)·p = p · 1/(1-(1-p))² = 1/p
  – by applying the 3rd form of the geometric series
• The standard deviation is given by
  √(1-p) / p
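The series computation can be sanity-checked with a partial sum
(p = 0.3 is arbitrary; 200 terms make the truncated tail negligible):

```python
p = 0.3
# Partial sum of E(X) = sum of z*(1-p)^(z-1)*p, which converges to 1/p
expected = sum(z * (1 - p)**(z - 1) * p for z in range(1, 200))
print(round(expected, 6))       # -> 3.333333, i.e. 1/p
```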
Graph

• A plot of the geometric distribution with p = 0.3 is given by

[Graph of the pdf p(z) for z = 1 to 10, with the mean marked]
Utilization

• A typical use of the geometric distribution is for modeling the
  number of failures before the first success in a sequence of
  independent Bernoulli trials
• This is the scenario for making sales
  – Suppose that the probability of making a sale is 0.3
  – Then
    • p(1) = 0.3 is the probability of success on the 1st try
    • p(2) = (1-p)·p = 0.7·0.3 = 0.21, which is the probability of
      failing on the 1st try (with probability 1-p) and succeeding on
      the 2nd (with probability p)
    • p(3) = (1-p)(1-p)·p = 0.7·0.7·0.3 ≈ 0.15 is the probability that
      the sale takes 3 tries, and so forth
  – A random sample from the distribution represents the number of
    attempts needed to make the sale
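A simulation of the sales scenario; the seed and sample count are
arbitrary choices, and the function name is mine:

```python
import random

def attempts_until_sale(p, rng):
    """Run Bernoulli trials until the first success; return the count."""
    attempts = 1
    while rng.random() >= p:        # failure with probability 1-p
        attempts += 1
    return attempts

rng = random.Random(1)
p = 0.3
trials = [attempts_until_sale(p, rng) for _ in range(100_000)]
mean = sum(trials) / len(trials)
print(round(mean, 1))               # close to E(X) = 1/p = 3.33...
```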