Mathematics for Computer Science
MIT 6.042J/18.062J

Sampling & Confidence

Albert R Meyer, December 9, 2009

Sampling
Estimate % contaminated fish in Charles River?
Procedure: catch n fish, test each, use %contaminated in catch as estimate of %contaminated in whole river.

Sampling Questions
Catch 500 fish; what is the probability that the estimate is within 0.1 of the actual fraction?

Model as Coin Tosses
p ::= fraction contaminated in river
test a fish = toss a coin with bias p
catch n fish = toss n coins
$A_n$ ::= fraction contaminated in the sample of n

Pairwise Independent Sampling
$\Pr\{|A_n - \mu| \ge \delta\} \le \frac{1}{n}\left(\frac{\sigma}{\delta}\right)^2$
n = 500, $\mu = p$, $\delta = 0.1$, worst $\sigma = 1/2$ (since $\sigma^2 = p(1-p) \le 1/4$)
$\Pr\{|A_{500} - p| \ge 0.1\} \le \frac{1}{500}\left(\frac{1/2}{0.1}\right)^2 = 0.05$
so $\Pr\{|A_{500} - p| < 0.1\} \ge 0.95$

Confidence in our estimate
With probability at least 0.95, our estimated fraction will be within 0.1 of the actual fraction of contaminated fish in the whole river.

Sampling using Binomial PDF
Better estimate: $A_n$ is $B_{n,p}/n$, where $B_{n,p}$ is the binomial count of contaminated fish among the n caught, so
$\Pr\{|A_n - p| \le \delta\} = \Pr\{|B_{n,p} - np| \le \delta n\}$
For n = 500, $\delta = 0.06$:
$\Pr\{|B_{500,p} - 500p| \le 0.06(500)\} = \Pr\{|B_{500,p} - 500p| \le 30\}$
How to bound this probability when we don't know p?
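One way to get a feel for this question is to evaluate the probability exactly for a few candidate values of p. The sketch below is an illustration, not part of the lecture: the helper name `prob_within` and the grid p = 0.1, 0.2, ..., 0.9 are assumptions chosen for the example, and the grid is picked so that 500p is always a whole number.

```python
from fractions import Fraction
from math import ceil, comb, floor


def prob_within(n, p, delta):
    """Exact Pr{|B_{n,p} - n*p| <= delta*n} for a binomial count B_{n,p}.

    p and delta are Fractions, so the interval endpoints and the sum are
    computed exactly; only the final result is converted to a float.
    """
    mean = n * p
    radius = delta * n
    lo = max(0, ceil(mean - radius))   # smallest count inside the interval
    hi = min(n, floor(mean + radius))  # largest count inside the interval
    q = 1 - p
    total = sum(comb(n, i) * p**i * q**(n - i) for i in range(lo, hi + 1))
    return float(total)


n = 500
delta = Fraction(6, 100)
# Hypothetical grid of candidate values for the unknown p (an assumption).
grid = [Fraction(k, 10) for k in range(1, 10)]
results = {p: prob_within(n, p, delta) for p in grid}

for p, prob in results.items():
    print(f"p = {float(p):.1f}   Pr = {prob:.6f}")
```

On this grid the probability is smallest at p = 1/2, which is also where the variance p(1 - p) of a single toss is largest.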
Lemma: $\Pr\{|B_{n,p} - np| \le \delta n\}$ is min when p = 1/2.

So for n = 500, $\delta = 0.06$:
$\Pr\{|B_{500,p} - 500p| \le 0.06(500)\} \ge \Pr\{|B_{500,1/2} - 250| \le 30\} = \Pr\{220 \le B_{500,1/2} \le 280\}$

$\Pr\{220 \le B_{500,1/2} \le 280\} = \sum_{i=220}^{280} \binom{500}{i} 2^{-500} \ge 0.99$

Confidence in our estimate
We can actually be 99% confident that our estimated fraction is within 0.06 of the true fraction of contaminated fish in the whole river.

Confidence not Probable Reality
Now suppose we sample 500 fish and discover 230 are contaminated. So we estimate p is 230/500 = 0.46.
It's tempting to say "the probability that p = 0.46 ± 0.06 is at least 0.99". Technically wrong!

Confidence
p is the actual fraction of bad fish in the river. p is unknown, but not a random variable!
The outcome of our sampling procedure is a random variable. We can say that "the probability that our sampling process will yield a fraction that is within ±0.06 of the true fraction is at least 0.99".
For simplicity we say that p = 0.46 ± 0.06 at the 99% confidence level.
Moral: when you are told that some fact holds at a high confidence level, remember that a random experiment lies behind this claim. Ask yourself "what experiment?"

Team Problems
Problems 1 & 2
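To make the "what experiment?" question concrete, the sketch below simulates the sampling procedure many times and counts how often the estimate lands within 0.06 of the true fraction. It is only an illustration under stated assumptions: the value `hypothetical_p = 0.46`, the number of trials, the seed, and the function name `coverage` are all invented for the example. In the real river p is unknown; here it is fixed so the simulation can check each estimate against it.

```python
import random
from fractions import Fraction


def coverage(true_p, n, delta, trials, seed=0):
    """Fraction of simulated catches whose estimate is within delta of true_p.

    true_p and delta are Fractions so the "within delta" check is exact.
    """
    rng = random.Random(seed)
    p_float = float(true_p)
    hits = 0
    for _ in range(trials):
        # Each fish is contaminated independently with probability true_p.
        contaminated = sum(rng.random() < p_float for _ in range(n))
        estimate = Fraction(contaminated, n)
        if abs(estimate - true_p) <= delta:
            hits += 1
    return hits / trials


# Hypothetical true fraction, fixed only for the simulation (an assumption).
hypothetical_p = Fraction(46, 100)
rate = coverage(hypothetical_p, n=500, delta=Fraction(6, 100), trials=2000)
print(f"estimate within 0.06 of p in {rate:.1%} of simulated catches")
```

The randomness is in the repeated catches, not in p: p stays fixed throughout, and the confidence level describes how often the procedure produces an estimate close to it.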