Statistics
NatSci102, Professors Rieke
Due April 7, 2010

Lots of numbers we encounter might come out differently if we could determine them again. For example, Derrick Williams made 158 free throws in 232 tries this basketball season. If he has the same number of tries next season, it is not likely that he will make exactly 158 of them again. From 1997 to 2000 the number of fatal auto accidents in the US was 37324, 37107, 37140, and 37526. We should not believe that this means drivers were more careful in 1998 and 1999 than in the previous and following years; we expect changes in the accident rate due to good and bad luck.

We can estimate these changes accurately if they follow “normal” statistics. This term means: 1.) the events occur completely randomly with no bias; and 2.) each event is independent of every other event. That is, we have to assume that Williams does not get down on himself and miss more free throws if he feels he is having a bad streak; he approaches each shot with a clean mind and equal determination that he will make it.

Although complex mathematics can modify the rules a little, the square root law describes normal statistics pretty well in nearly all situations. It states that if the likely number of successes is N, then the “uncertainty” in N is the square root of N. For example, if I toss a coin 100 times, then the likely number of times it will come up heads is 50 and the uncertainty is 7 (the square root of 50 is very close to 7).

What does “uncertainty” mean? It means that the number of heads will be between 50 - 7 = 43 and 50 + 7 = 57 two thirds of the time. So if I were to toss the coin 100 times in each of sixty experiments (6000 tosses total), then I expect the number of heads to be inside the 43 to 57 range in 40 of them (2/3 X 60) and OUTSIDE that range in 20 (what’s left, 1/3 X 60).

Suppose I want to be more sure. The result is expected to be within two times the square root of N of the likely value 19 times out of 20, or 95% of the time.
So I would expect to get either fewer than 50 - (2 X 7) = 36 heads OR more than 50 + (2 X 7) = 64 heads in three of my 60 experiments.

Now let’s try some examples. The “margin of error” in a political poll is based on the square root logic, just in a slightly more complex form mathematically. Suppose you took a poll of 1000 people (who did not know each other) and 490 answered the way you expected.

1. You could report that 49% were thinking correctly, but what is the margin of error?
   a. 3.2%
   b. 2.2%
   c. 4.5%
   d. Not enough information to tell

2. Your friend, who disagrees with you all the time, took a poll on the same question of 1000 different people and 530 answered the way she wanted.
   a. Can she claim to have more people on her side, for sure?
   b. Or would you say she can’t make any claims at all, because the difference is still totally undecided?
   c. Or can she claim that it is pretty likely that more people agree with her, but there is still a small chance it isn’t true?

3. Suppose that your polls were taken of groups: still a total of 1000 people, but they were organized into groups of five and the view of each group was recorded. Would this change your answers to questions 1 and 2?
   a. Yes
   b. No

4. Defend your answer to question 3:
   a. There would be no change because the poll is still taken with 1000 people
   b. But they agreed based on groups of 5, so there were only 200 independent views!
   c. There were only 200 different opinions that could be recorded anyway
   d. Both b. and c.
   e. There isn’t any way you could learn anything useful from groups

5. To have the same margin of error as for question 1, how many people in groups of five would need to be polled?
   a. 1000
   b. 200
   c. 5000
   d. You could never get the same margin of error

The following table shows the number of home runs Babe Ruth hit in each year (excluding two when he missed a lot of games).

   year   home runs
   1920   54
   1921   59
   1922   reduced time
   1923   41
   1924   46
   1925   reduced time
   1926   47
   1927   60
   1928   54
   1929   46
   1930   49
   1931   46
   1932   41

6. How many home runs did he hit on average?
   a. 45.52
   b. 47.91
   c. 49.36
   d. 51.75
   e. 53.02

Fill in the following table. The “homer difference” column is the number of homers he hit minus the average. The homer uncertainty is the square root of the number of homers he hit. The relative difference is the homer difference divided by the homer uncertainty, so it lets you judge quickly how large the difference is compared with the uncertainty.

   year   homer difference   homer uncertainty   relative difference
   1920
   1921
   1922   reduced time
   1923
   1924
   1925   reduced time
   1926
   1927
   1928
   1929
   1930
   1931
   1932

7. According to the rules about uncertainties, in how many years would Ruth be expected to hit a number of homers MORE THAN ONE TIMES the uncertainty away from the average?
   a. 2/3 x 11 ~ 7
   b. 1/3 x 11 ~ 4
   c. Not at all
   d. All the time

8. How many times did he actually come up more than one times the uncertainty away from the average?
   a. 7
   b. 4
   c. Not at all
   d. All the time

9. Was hitting 60 homers
   a. Just normal luck, given how well he was hitting for the whole period 1920 to 1932?
   b. Something that took an exceptional boost in his hitting ability?

The scores for the games of the Green Bay Packers (sorry, I am a fan of theirs) for the 2008 season are:

   week   GB score   score difference   score uncertainty   relative difference
   1      24
   2      48
   3      16
   4      21
   5      24
   6      27
   7      34
   8      16
   9      27
   10     37
   11     29
   12     31
   13     21
   14     16
   15     17
   16     31

Compute the average score and then the difference from the average for each game, just as you did for Babe Ruth. Then compute the score uncertainty as the square root of the score. Finally, fill in the last column with the score difference divided by the score uncertainty.

10. Why are the relative differences so much larger than for Babe Ruth’s homers?
   a. The Packers were just plain erratic
   b. Some weekends, the Packers played teams with really good defense
   c. It is the effect of Lambeau Field and the Green Bay weather on the scores
   d. Football points are not independent; they come in bunches of 3, 6, or even 7

11. In balancing your checking account, you find that you think it has $420, while the bank says it has only $400 and that you owe them the $20 difference. Since 20 is the square root of 400,
   a. you really owe the bank $20
   b. the difference is within the uncertainties and they should be satisfied
   c. the uncertainty should be computed on the pennies, so square root (400 X 100) = 200 pennies = $2 and you owe them $18

12. Why might the example with the bank be different from the rest?
   a. Banks like to make a profit
   b. Because of the stimulus money, banks can be very liberal with funds
   c. Bank deposits do not qualify as random events
   d. Banks charge for overdrafts

13. Next year, if Derrick Williams by some chance got exactly 232 foul shots again (see the introduction to this assignment) but made 182, raising his percentage from 68% to 78%, would you conclude that he had gotten better?
   a. No, just luckier
   b. Yes, that’s ten percentage points
   c. Probably, but it could still be a big swing of luck

14. Nic Wise hit 137 of 156 tries, or 88%, while Kyle Fogg hit 83 of 110 tries, or 75.5%. Who is the better free throw shooter?
   a. Wise
   b. Fogg
   c. Can’t tell

15. If you had just guessed the answers for the three hourly exams in this course, you would expect 20% correct, or 10 out of 50. You get to replace the lowest score at the time of the final. What is that score likely to be?
   a. 10 right
   b. 7 right
   c. 4 right
   d. there is no way to predict
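
For anyone who would rather let a computer do the arithmetic, here is a minimal Python sketch of the square root law as it is described in the introduction. It is only an illustration, not part of the assignment: the function name `relative_differences` is made up here, and it assumes the simple rule that the uncertainty of a count N is the square root of N. It is applied to the coin example and to the fatal auto accident numbers from the introduction.

```python
import math

def relative_differences(counts):
    """Return (count - average) / sqrt(count) for each count.

    Hypothetical helper for illustration only: it takes the uncertainty
    of a count N to be sqrt(N), as the square root law states.
    """
    average = sum(counts) / len(counts)
    return [(n - average) / math.sqrt(n) for n in counts]

# Coin example from the introduction: 50 likely heads in 100 tosses.
coin_uncertainty = math.sqrt(50)  # very close to 7
low = 50 - round(coin_uncertainty)
high = 50 + round(coin_uncertainty)
print(f"two thirds of the time: {low} to {high} heads")

# Fatal auto accidents in the US, 1997 to 2000 (from the introduction).
accidents = [37324, 37107, 37140, 37526]
rel = relative_differences(accidents)
for year, n, r in zip(range(1997, 2001), accidents, rel):
    print(f"{year}: {n} accidents, relative difference {r:+.2f}")
```

Every one of the four years comes out less than two uncertainties from the average, which is the sense in which the introduction attributes the year-to-year changes to good and bad luck. The same helper can be pointed at any list of counts, provided the events really are random and independent.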