Independence

Mathematics for Computer Science
MIT 6.042J/18.062J
Sampling &
Confidence
Albert R Meyer, December 9, 2009
lec 14W.1
Sampling
Estimate % contaminated fish in
Charles River?
Procedure: catch n fish, test each,
use %contaminated in catch as
estimate of %contaminated in
whole river
Sampling Questions
Catch 500 fish; what is
probability that estimate
is within 0.1 of the actual
fraction?
Model as Coin Tosses
p ::= fraction contaminated in river
test a fish ↔ toss a coin with bias p
catch n fish ↔ toss n coins
A_n ::= fraction contaminated
in the sample of n
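The coin-toss model can be simulated directly. The following is a minimal sketch and not part of the lecture: the function name `sample_fraction`, the seed, and the value p = 0.3 are illustrative assumptions (in the real river, p is unknown).

```python
import random

def sample_fraction(p, n, rng):
    """Toss n independent coins with bias p; return the fraction of heads (A_n)."""
    contaminated = sum(1 for _ in range(n) if rng.random() < p)
    return contaminated / n

# Hypothetical river with p = 0.3 (an assumption for illustration only).
rng = random.Random(2009)
estimate = sample_fraction(0.3, 500, rng)
print(f"A_500 = {estimate:.3f}")
```

Each call plays the role of one catch of n fish; repeated calls give different values of A_n even though p stays fixed.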
Pairwise Independent Sampling
$$\Pr\{|A_n - \mu| > \delta\} \le \frac{1}{n}\left(\frac{\sigma}{\delta}\right)^2$$
n = 500, μ = p, δ = 0.1, worst σ = 1/2:
$$\Pr\{|A_{500} - p| > 0.1\} \le \frac{1}{500}\left(\frac{1/2}{0.1}\right)^2 = 0.05$$
$$\Pr\{|A_{500} - p| \le 0.1\} \ge 0.95$$
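The arithmetic behind the 0.95 figure can be checked in a few lines. This is a sketch under the slide's stated values (n = 500, δ = 0.1, worst-case σ = 1/2); the function name `pairwise_sampling_bound` is a hypothetical helper, not something from the lecture.

```python
def pairwise_sampling_bound(n, sigma, delta):
    """Upper bound (1/n) * (sigma/delta)^2 on Pr{|A_n - mu| > delta}."""
    return (sigma / delta) ** 2 / n

# Worst case for a bias-p coin: sigma^2 = p(1 - p) <= 1/4, so sigma <= 1/2.
bound = pairwise_sampling_bound(n=500, sigma=0.5, delta=0.1)
print(f"Pr{{|A_500 - p| > 0.1}} <= {bound:.2f}")
print(f"Pr{{|A_500 - p| <= 0.1}} >= {1 - bound:.2f}")
```

Because the bound only needs the variance, it holds for any p; plugging in σ = 1/2 simply covers the least favorable coin.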
Confidence in our estimate
With probability 0.95 our
estimated fraction will be
within 0.1 of the actual
fraction of contaminated
fish in the whole river.
Sampling using Binomial PDF
Better estimate: $A_n$ is $B_{n,p}/n$, so
$$\Pr\{|A_n - p| \le \delta\} = \Pr\{|B_{n,p} - np| \le \delta n\}$$
Sampling using Binomial PDF
Better estimate: n = 500, δ = 0.06
$$\Pr\{|B_{n,p} - np| \le \delta n\} = \Pr\{|B_{500,p} - 500p| \le 0.06(500)\} = \Pr\{|B_{500,p} - 500p| \le 30\}$$
Sampling using Binomial PDF
How to bound this probability
when we don’t know p?
Lemma: $\Pr\{|B_{n,p} - np| \le \delta n\}$
is min when p = 1/2
Sampling using Binomial PDF
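Setting p = 1/2 (the worst case, by the lemma), so that 500p = 250 and 0.06(500) = 30:
$$\Pr\{|B_{500,p} - 500p| \le 0.06(500)\} \ge \Pr\{|B_{500,1/2} - 250| \le 30\} = \Pr\{220 \le B_{500,1/2} \le 280\}$$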
Sampling using Binomial PDF
$$\Pr\{220 \le B_{500,1/2} \le 280\} = \sum_{i=220}^{280} \binom{500}{i} 2^{-500} \ge 0.99$$
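The binomial sum is small enough to evaluate exactly. The sketch below uses Python's exact integer arithmetic; the function name `binomial_interval_prob` is a hypothetical helper introduced here for illustration.

```python
from fractions import Fraction
from math import comb

def binomial_interval_prob(n, lo, hi):
    """Exact Pr{lo <= B_{n,1/2} <= hi} for a fair-coin binomial."""
    favorable = sum(comb(n, i) for i in range(lo, hi + 1))
    return Fraction(favorable, 2 ** n)

prob = binomial_interval_prob(500, 220, 280)
print(f"Pr{{220 <= B_500,1/2 <= 280}} = {float(prob):.4f}")
```

Working with `Fraction` avoids any rounding in the sum; the value is converted to a float only for printing.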
Confidence in our estimate
We can actually be 99%
confident that our
estimated fraction is within
0.06 of the true fraction
of contaminated fish in the
whole river.
Confidence
not Probable Reality
Now suppose we sample 500 fish and
discover 230 are contaminated.
So we estimate p is 230/500 = 0.46
It’s tempting to say
“the probability that
p = 0.46 ± 0.06
is at least 0.99”
but this is technically wrong!
Confidence
p is the actual fraction of
bad fish in the river.
p is unknown,
but not a random variable!
Confidence
The outcome of our
sampling procedure is a random
variable. We can say that the
“probability that our sampling
process will yield a fraction
that is within 0.06 of the
true fraction is at least 0.99”
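One way to see that the probability belongs to the sampling process, and not to p, is to repeat the experiment many times with p held fixed. This is a simulation sketch only: the function name `coverage`, the seed, the number of trials, and the choice p = 0.46 are assumptions made for illustration.

```python
import random

def coverage(p, n, delta, trials, rng):
    """Fraction of repeated experiments whose estimate lands within delta of p."""
    hits = 0
    for _ in range(trials):
        heads = sum(1 for _ in range(n) if rng.random() < p)
        # Compare counts, with a small tolerance against floating-point
        # rounding exactly at the boundary.
        if abs(heads - n * p) <= delta * n + 1e-9:
            hits += 1
    return hits / trials

# p is fixed (here a hypothetical 0.46); only the sample varies between trials.
rng = random.Random(6042)
observed = coverage(p=0.46, n=500, delta=0.06, trials=2000, rng=rng)
print(f"estimate within 0.06 of p in {observed:.1%} of experiments")
```

The printed frequency describes how often the procedure succeeds across many catches; any single catch either is or is not within 0.06 of p.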
Confidence
For simplicity we say that
p = 0.46 ± 0.06
at the
99% confidence level
Confidence
Moral: when you are told that
some fact holds at a high
confidence level, remember
that a random experiment
lies behind this claim. Ask
yourself “what experiment?”
Team Problems
Problems
1&2