Pascal Triangle and Bernoulli Trials

1. Sample Space and Probability, Part IV: Pascal Triangle and Bernoulli Trials
ECE 302, Fall 2009, TR 3-4:15pm
Purdue University, School of ECE
Prof. Ilya Pollak
Connection between Pascal triangle and probability theory: number of successes in a sequence of independent Bernoulli trials
•  A Bernoulli trial is any probabilistic experiment with two possible outcomes, e.g.,
   –  Will Citigroup become insolvent during the next 12 months?
   –  Democrats or Republicans in the next election?
   –  Will the Dow Jones go up tomorrow?
   –  Will a new drug cure at least 80% of the patients?
•  Terminology: sometimes the two outcomes are called “success” and “failure.”
•  Suppose the probability of success is p. What is the probability of k successes in n independent trials?
Probability of k successes in n independent Bernoulli trials

•  n independent coin tosses, P(H) = p
•  E.g., P(HTTHHH) = p(1−p)(1−p)p^3 = p^4 (1−p)^2
•  P(specific sequence with k H’s and (n−k) T’s) = p^k (1−p)^{n−k}
•  P(k heads) = (number of k-head sequences) · p^k (1−p)^{n−k} = C(n,k) p^k (1−p)^{n−k}, where C(n,k) is the binomial coefficient, i.e., an entry of the Pascal triangle (computed in the sketch below).
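To make the formula concrete, here is a minimal Python sketch (not from the original slides; the function name binomial_pmf and the value p = 0.6 are our own illustrative choices):

```python
# Sketch: probability of one specific k-head sequence, and of k heads total,
# in n independent tosses of a coin with P(H) = p.
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k heads in n tosses) = C(n, k) * p^k * (1-p)^(n-k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

p = 0.6  # illustrative value, not from the slides
print(p ** 4 * (1 - p) ** 2)   # P(HTTHHH): one specific sequence with 4 H's
print(binomial_pmf(4, 6, p))   # P(4 heads in 6 tosses): C(6,4) = 15 sequences
```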
An interesting property of binomial coefficients

Since P(zero H’s) + P(one H) + P(two H’s) + … + P(n H’s) = 1, it follows that

$\sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} = 1.$

Another way to show the same thing is to realize that

$\sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} = (p + (1-p))^n = 1^n = 1.$
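As a quick numerical sanity check of this identity (a sketch; n = 10 and p = 0.3 are arbitrary illustrative choices):

```python
# The binomial probabilities must sum to 1 for any n and p.
from math import comb

n, p = 10, 0.3
total = sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1))
print(total)  # 1.0, up to floating-point rounding
```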
Binomial probabilities: illustration

[Plots of the binomial distribution for various n, showing the emerging bell shape.]
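The original plots are not reproduced here; a rough text rendering of the same idea (the parameters n = 50, p = 0.5 and the bar scaling are illustrative choices) already shows the bell shape discussed on the next slide:

```python
# Crude text plot of the binomial pmf: bar length ~ probability of k heads.
from math import comb

n, p = 50, 0.5
for k in range(n + 1):
    prob = comb(n, k) * p ** k * (1 - p) ** (n - k)
    print(f"k={k:2d} |{'#' * round(prob * 300)}")
```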
Comments on binomial probabilities and the bell curve

•  Summing many independent random contributions usually leads to the bell-shaped distribution.
•  This is called the central limit theorem (CLT).
•  We have not yet covered the tools to precisely state the CLT, but we will later in the course.
•  The behavior of the binomial distribution for large n shown above is a manifestation of the CLT.
Interestingly, we get the bell curve even for asymmetric binomial probabilities
This tells us how to empirically estimate the probability of an event!

•  To estimate the probability p based on n flips, divide the observed number of H’s by the total number of experiments: k/n (see the simulation sketch after this slide).
•  To see the distribution of k/n for any n, simply rescale the x-axis in the distribution of k.
•  This distribution will tell us
   –  what we should expect our estimate to be, on average, and
   –  what error we should expect to make, on average.
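A minimal simulation sketch of this estimator (p_true = 0.8, the seed, and the sample sizes are illustrative choices, not from the slides):

```python
# Estimate an unknown success probability by the empirical fraction k/n.
import random

def estimate_p(p_true: float, n: int) -> float:
    """Run n Bernoulli(p_true) trials and return the fraction of successes."""
    k = sum(random.random() < p_true for _ in range(n))
    return k / n

random.seed(0)  # for reproducibility of the sketch
for n in (50, 1000, 100_000):
    print(n, estimate_p(0.8, n))  # estimates concentrate near 0.8 as n grows
```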
Note:
o  For 50 flips, the most likely outcome is the correct one, 0.8.
o  It’s also close to the “average” outcome.
o  It’s very unlikely to make a mistake of more than 0.2.
If p = 0.8, when estimating based on 1000 flips, it’s extremely unlikely to make a mistake of more than 0.05 (checked numerically after this slide).
•  Hence, when the goal is to forecast a two-way election, and the actual p is reasonably far from 1/2, polling a few hundred people is very likely to give accurate results.
•  However,
   o  independence is important;
   o  getting a representative sample is important (for a country with a 300M population, this is tricky!);
   o  when the actual p is extremely close to 1/2 (e.g., the 2000 presidential election in Florida or the 2008 senatorial election in Minnesota), pollsters’ forecasts are about as accurate as a random guess.
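The 0.05 claim above is easy to verify exactly with a direct sum over the binomial pmf (a sketch; feasible here because n = 1000 is small):

```python
# Exact probability that the empirical fraction k/n misses p = 0.8
# by more than 0.05 when n = 1000.
from math import comb

n, p, tol = 1000, 0.8, 0.05
prob_big_error = sum(
    comb(n, k) * p ** k * (1 - p) ** (n - k)
    for k in range(n + 1)
    if abs(k / n - p) > tol
)
print(prob_big_error)  # on the order of 1e-4 or less: extremely unlikely
```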
Franken-Coleman election

•  Franken: 1,212,629 votes
•  Coleman: 1,212,317 votes
•  In our analysis, we will disregard the third-party candidate who got 437,505 votes (he actually makes pre-election polling even more complicated).
•  Effectively, p ≈ 0.500064.
Probabilities for fractions of Franken vote in pre-election polling based on n = 2.5M (more than all Franken and Coleman votes combined)

•  Even though we are unlikely to make an error of more than 0.001, this is not enough, because p − 0.5 = 0.000064!
•  Note: 42% of the area under the bell curve is to the left of 1/2 (reproduced in the sketch below).
•  When the election is this close, no poll can accurately predict the outcome.
•  In fact, the noise in the voting process itself (voting machine malfunctions, human errors, etc.) becomes very important in determining the outcome.
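The 42% figure can be reproduced with the normal approximation to the binomial (a sketch; math.erf is used to build the standard normal CDF):

```python
# P(poll fraction < 1/2) when the true Franken share is p = 0.500064
# and the poll samples n = 2.5 million voters.
from math import erf, sqrt

p, n = 0.500064, 2_500_000
sd = sqrt(p * (1 - p) / n)           # std. dev. of the polled fraction k/n
z = (0.5 - p) / sd                   # distance from p down to 1/2, in std. devs.
print(0.5 * (1 + erf(z / sqrt(2))))  # ~0.42: Franken trails in 42% of such polls
```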
Estimating the probability of success in a Bernoulli trial: summary

•  As the number n of independent experiments increases, the empirical fraction of occurrences of success becomes close to the actual probability of success, p.
•  The error goes down proportionately to 1/n^{1/2}; e.g., the error after 400 trials is half the error after 100 trials (see the sketch below).
•  This is called the law of large numbers.
•  This result will be precisely described later in the course.
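A small simulation sketch of this 1/√n behavior (p = 0.8, the seed, and the repetition count are illustrative choices):

```python
# Quadrupling the number of trials n roughly halves the typical error of k/n.
import random
from math import sqrt

def rms_error(p: float, n: int, reps: int = 2000) -> float:
    """Root-mean-square error of the estimate k/n over many repetitions."""
    sq = 0.0
    for _ in range(reps):
        k = sum(random.random() < p for _ in range(n))
        sq += (k / n - p) ** 2
    return sqrt(sq / reps)

random.seed(0)
print(rms_error(0.8, 100))  # ~0.040 (theory: sqrt(0.8 * 0.2 / 100) = 0.040)
print(rms_error(0.8, 400))  # ~0.020: twice as small with four times the data
```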