
Mathematics for Computer Science
MIT 6.042J/18.062J
Binomial Distributions & Sampling
Copyright © Albert R. Meyer, 2004. All rights reserved.
May 10, 2004
L15-1.1
Don’t expect the Expectation!
Toss 101 fair coins.
How many heads do we
“expect” to see?
E[#Heads] = 50.5
Exactly the Mean?
Pr{exactly 50.5 Heads} = 0
Pr{exactly 50 Heads} < 1/12
Pr{50.5 ± 1 Heads} < 1/6
Very near the mean?
Toss 1001 fair coins.
E[#Heads] = 500.5
Pr{#H = 500} = smaller (< 1/39)
Pr{#H = 500.5 ± 1} = still small (< 1/19)
Very near the mean?
Toss 1001 fair coins.
Pr{#H = 500 ± 1% (of # flips)}
= Pr{#H = 500 ± 10} = not bad
(very close to even)
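The probabilities quoted on the coin-toss slides can be checked by exact summation over the binomial distribution. The following is a minimal Python sketch of that check; the helper name `prob_heads_between` is a hypothetical name chosen here, not something from the lecture.

```python
from math import comb

def prob_heads_between(n, lo, hi):
    """Exact Pr{lo <= #Heads <= hi} for n independent fair coin tosses."""
    lo, hi = max(lo, 0), min(hi, n)
    favorable = sum(comb(n, k) for k in range(lo, hi + 1))
    return favorable / 2**n  # each of the 2^n outcomes is equally likely

if __name__ == "__main__":
    print(prob_heads_between(101, 50, 50))     # exactly 50 heads of 101
    print(prob_heads_between(101, 50, 51))     # within 1 of the mean 50.5
    print(prob_heads_between(1001, 500, 500))  # exactly 500 heads of 1001
    print(prob_heads_between(1001, 500, 501))  # within 1 of the mean 500.5
    print(prob_heads_between(1001, 490, 510))  # 500 +/- 10, about 1% of the flips
```

Python integers have arbitrary precision, so the large binomial coefficients are summed exactly and only the final division is rounded.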
Jacob D. Bernoulli (1655 – 1705)
Even the stupidest man---by some instinct of
nature per se and by no previous instruction
(this is truly amazing) -- knows for sure that
the more observations ...that are taken, the
less the danger will be of straying from the
mark.
---Ars Conjectandi (The Art of Guessing), 1713*
*taken from Grinstead & Snell, Introduction to Probability, American Mathematical Society, p. 310.
http://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/book.html
Deviation from the Mean
Pr{observed value far from
expected value}
is SMALL
How small?
Weak Law of Large Numbers
$A_n$ ::= Avg. of n independent trials
$\mu$ ::= E[single trial]
For every $\varepsilon > 0$:
$$\lim_{n \to \infty} \Pr\{\, |A_n - \mu| > \varepsilon \,\} = 0$$
($|A_n - \mu|$ is the distance from the mean, "$> \varepsilon$" means far, and a limit of 0 means the probability is small.)
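For fair coin tosses the probability in the Weak Law can be computed exactly, which shows how quickly it shrinks as n grows. This is a small sketch under that assumption (single trial = one fair coin, so the mean is 1/2); the function name `prob_far_from_mean` and the tolerance 1/10 are illustrative choices, not part of the lecture.

```python
from fractions import Fraction
from math import comb

def prob_far_from_mean(n, eps):
    """Exact Pr{|A_n - 1/2| > eps}, where A_n is the fraction of heads in n fair tosses."""
    mu = Fraction(1, 2)
    # Fractions decide the boundary cases exactly, with no floating-point rounding.
    far = sum(comb(n, k) for k in range(n + 1)
              if abs(Fraction(k, n) - mu) > eps)
    return far / 2**n

if __name__ == "__main__":
    eps = Fraction(1, 10)  # illustrative tolerance
    for n in (10, 100, 1000):
        print(n, prob_far_from_mean(n, eps))
```

The printed probabilities decrease as n increases, as the limit statement predicts for this special case.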
Jacob D. Bernoulli (1655 – 1705)
Therefore, this is the problem which I
now set forth and make known after I
have pondered over it for twenty years.
Both its novelty and its very great
usefulness, coupled with its just as
great difficulty, can exceed in
weight and value all the remaining
chapters of this thesis.
Polling & Sampling
Estimate % contaminated fish in
Charles River?
Procedure: catch n fish, test each, use
% contaminated in catch as estimate
of % contaminated in whole river
Sampling Questions
Catch 100 fish; what is the
probability that our estimate
is within 10% (that is, 10 percentage points) of the actual %?
Model as Coin Tosses
p ::= fraction contaminated in river
Testing a fish: tossing a coin with bias p
Catching n fish: tossing n coins
$A_{n,p}$ ::= fraction contaminated in a sample of n
Bernoulli Trials
$F_i$ ::= indicator for contamination of the i-th fish caught
$B_{n,p} = F_1 + \cdots + F_n$
has binomial density
Use the Binomial PDF
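The formula on this slide did not survive extraction. As a hedged sketch, the standard binomial density for a sum of n independent indicators with bias p is Pr{B = k} = C(n, k) p^k (1 - p)^(n - k), which is presumably what the slide used. The function name `binomial_pmf` and the sample values are assumptions made here for illustration.

```python
from math import comb

def binomial_pmf(n, p, k):
    """Pr{B = k}: probability of exactly k contaminated fish among n,
    when each fish is independently contaminated with probability p."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p)**(n - k)

if __name__ == "__main__":
    n, p = 4, 0.25  # small illustrative values, not from the lecture
    for k in range(n + 1):
        print(k, binomial_pmf(n, p, k))
```

Summing `binomial_pmf(n, p, k)` over k = 0, ..., n should give 1 up to rounding, which is a quick sanity check on the formula.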
Compute the exact probabilities?
For n = 100,
Probability of within 10%
Circularity: estimate p using p?
Theorem: Worst case is at p = 1/2. So
That is, the probability of being
within 10% of the actual % of
contaminated fish using a
sample of 100 is ≥ 96%
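The 96% figure can be checked by summing the binomial density over the sample counts that land within 0.1 of p. The sketch below assumes "within 10%" means an absolute difference of at most 0.1 between the sample fraction and p; the function name `prob_estimate_within` and the list of p values tried are hypothetical choices for illustration.

```python
from fractions import Fraction
from math import comb

def prob_estimate_within(n, p, tol):
    """Exact Pr{|B/n - p| <= tol} for B binomial with parameters n and p.
    p and tol are Fractions so the boundary cases are decided exactly."""
    total = Fraction(0)
    for k in range(n + 1):
        if abs(Fraction(k, n) - p) <= tol:
            total += comb(n, k) * p**k * (1 - p)**(n - k)
    return float(total)

if __name__ == "__main__":
    tol = Fraction(1, 10)
    for p in (Fraction(1, 10), Fraction(3, 10), Fraction(1, 2)):
        print(p, prob_estimate_within(100, p, tol))
```

For p = 1/2 the result is a little over 0.96, matching the slide; the other values of p are shown only for comparison.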
Voter Preference
Poll n voters; let Sn be the number
that prefer Kerry to Bush.
How big should n be so that 95%
of the time, Sn/n is within 0.04 of
the actual fraction of voters who
prefer Kerry?
Voter sample
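Choose n so that $\Pr\{\, |S_n/n - p| \le 0.04 \,\} \ge 0.95$, where p is the actual fraction of voters who prefer Kerry.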
Worst case is p = 1/2, so we want
n such that $\Pr\{\, |S_n/n - 1/2| \le 0.04 \,\} \ge 0.95$ when p = 1/2.
Binomial Approximation
for $\alpha < 1/2$:
$$F_{n,1/2}(\alpha n) \le \frac{a_\alpha}{\sqrt{n}} \, 2^{-b_\alpha n}$$
where
$$a_\alpha ::= \frac{1-\alpha}{1-2\alpha} \cdot \frac{1}{\sqrt{2\pi\alpha(1-\alpha)}}, \qquad b_\alpha ::= 1 - H(\alpha)$$
where $H(\alpha) ::= -\alpha \log_2 \alpha - (1-\alpha) \log_2 (1-\alpha)$ is the Entropy Function.
(Derivation from Stirling's Formula;
in the Notes.)
The Entropy Function
Binomial Approximation
Somewhat messy formulas, but
easy to compute.
Exact answers for n > 100 not easy
to compute (causes overflow).
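To see how such a formula is evaluated, here is a hedged Python sketch. It assumes the bound has the form (a / sqrt(n)) * 2^(-b n) with a = (1 - alpha) / ((1 - 2 alpha) sqrt(2 pi alpha (1 - alpha))) and b = 1 - H(alpha), where H is the binary entropy in bits; that form is a reconstruction, so treat it as an assumption. The function names and the values n = 100, alpha = 0.4 are illustrative only.

```python
from math import comb, floor, log2, pi, sqrt

def entropy(alpha):
    """Binary entropy H(alpha) in bits, for 0 < alpha < 1."""
    return -alpha * log2(alpha) - (1 - alpha) * log2(1 - alpha)

def tail_bound(n, alpha):
    """Assumed upper bound (a / sqrt(n)) * 2^(-b * n) on Pr{#Heads <= alpha * n}
    for n fair coins, valid for 0 < alpha < 1/2."""
    a = (1 - alpha) / (1 - 2 * alpha) / sqrt(2 * pi * alpha * (1 - alpha))
    b = 1 - entropy(alpha)
    return a / sqrt(n) * 2 ** (-b * n)

def exact_tail(n, alpha):
    """Exact Pr{#Heads <= alpha * n} for n fair coins, by integer summation."""
    return sum(comb(n, k) for k in range(floor(alpha * n) + 1)) / 2**n

if __name__ == "__main__":
    n, alpha = 100, 0.4  # illustrative values
    print(exact_tail(n, alpha), tail_bound(n, alpha))
```

The exact sum works here because Python integers do not overflow; in fixed-width arithmetic the binomial coefficients would, which is the difficulty the slide mentions.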
How Big a Sample?
Use formula to find n that ensures
95% confidence: need to poll
662 voters
That's all, no matter how
many voters are registered
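As a sanity check on the sample size, the worst case p = 1/2 can be evaluated exactly for n = 662. The sketch below is an independent check by direct summation, not the formula-based derivation used in the lecture; the function name `prob_poll_within` is a hypothetical name. Note that the number of registered voters never enters the computation.

```python
from fractions import Fraction
from math import comb

def prob_poll_within(n, margin):
    """Exact Pr{|S_n/n - 1/2| <= margin} when each polled voter independently
    prefers either candidate with probability 1/2 (the worst case)."""
    half = Fraction(1, 2)
    good = sum(comb(n, k) for k in range(n + 1)
               if abs(Fraction(k, n) - half) <= margin)
    return good / 2**n

if __name__ == "__main__":
    print(prob_poll_within(662, Fraction(4, 100)))
```

The printed confidence is above 0.95, consistent with the claim that polling 662 voters is enough.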
Team Problem
Problem 1
Binomial Approximation
We have $a_\alpha, b_\alpha > 0$ s.t.
$$F_{n,1/2}(\alpha n) \le \frac{a_\alpha}{\sqrt{n}} \, 2^{-b_\alpha n}$$
for all $\alpha < 1/2$
Binomial Approximation
$$F_{n,1/2}(\alpha n) = o\!\left(2^{-b_\alpha n}\right) \quad \text{for } \alpha < \tfrac{1}{2},$$
Cor: The (Weak) Law of Large Numbers
Distribution of An,p
[Figure: plot of $\Pr\{A_{n,p} = \alpha\}$ against $\alpha$ from 0 to 1, peaked at $\alpha = p$. Labels on the plot: $\frac{1}{\sqrt{2\pi p(1-p)n}}$ and $\sqrt{\frac{p(1-p)}{n}}$.]
Confidence, not Probable Reality
OK, sample 662 voters and discover 300
prefer Kerry: so estimate of p is 300/662
Tempting to say
Pr{|p − 300/662| ≤ 0.04} ≥ 0.95
but that is a misstatement:
can’t talk about the probability
that p has a particular value.
Confidence, not Probable Reality
p is the actual fraction of voters
who prefer Kerry.
p is unknown,
but not a random variable!
Confidence, not Probable Reality
Our random polling method defines
a random variable.
We can talk about the probability
(confidence) that our estimate based
on polling will be correct.
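One way to see that the randomness lives in the polling procedure rather than in p is to fix p and repeat the poll many times. The following is a minimal simulation sketch; the "true" fraction 0.45, the trial count, the seed, and the function name `coverage` are all made-up values for illustration, not data from the lecture.

```python
import random

def coverage(p, n, margin, trials, seed=0):
    """Fraction of simulated polls whose estimate S_n/n lands within `margin` of p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        s_n = sum(rng.random() < p for _ in range(n))  # one poll of n voters
        if abs(s_n / n - p) <= margin:
            hits += 1
    return hits / trials

if __name__ == "__main__":
    # p = 0.45 is a hypothetical true fraction; a real pollster never sees it.
    print(coverage(p=0.45, n=662, margin=0.04, trials=2000))
```

Here p never changes between runs; only the sample does. The reported fraction estimates how often the method succeeds, which is what the confidence level describes.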
Team Problem
Problem 2