Statistics
NatSci102, Professors Rieke
Due April 7, 2010
Lots of numbers we encounter might come out differently if we could determine them
again. For example, Derrick Williams made 158 free throws in 232 tries this basketball
season. If he has the same number of tries next season, it is not likely that he will make
exactly 158 of them again. From 1997 to 2000 the number of fatal auto accidents in the
US was 37324, 37107, 37140, and 37526. We should not believe that this means drivers
were more careful in 1998 and 1999 than in the previous and following years – we expect
changes in the accident rate due to good and bad luck. We can estimate these changes
accurately if they follow “normal” statistics. This term means: 1.) the events occur
completely randomly with no bias; and 2.) each event is independent of each other event.
That is, we have to assume that Williams does not get down on himself and miss more
free throws if he feels he is having a bad streak – he approaches each shot with a clean
mind and equal determination that he will make it.
Although complex mathematics can modify the rules a little, the square root law
describes normal statistics pretty well in nearly all situations. It states that if I have a
likely number of successes of N, then the “uncertainty” in N is the square root of N. For
example, if I toss a coin 100 times, then the likely number of times it will come up heads
is 50 times and the uncertainty is 7 (the square root of 50 is very close to 7). What does
“uncertainty” mean? It means that the number of heads will be between 50-7 = 43 and
50+7 = 57 two thirds of the time, so if I were to toss the coin 100 times for sixty
experiments (6000 tosses total), then I expect the number of heads to be inside the 43 –
57 range for 40 times (2/3 X 60) and OUTSIDE that range for 20 (what’s left, 1/3 X 60).
Suppose I want to be more sure. The result is expected to be within two times the square
root of N 19 times out of 20 (95% of the time), and outside that range 1 time out of 20.
So I would expect to get either fewer than 50 – (2 X 7) = 36 heads OR more than
50 + (2 X 7) = 64 heads in three of my 60 experiments (1/20 X 60).
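The coin-toss arithmetic above can be checked with a short Python sketch. This is only an illustration: the function name `sqrt_law_range` and its parameter `k` are made up for this example, and the uncertainty is rounded to a whole number because the text rounds the square root of 50 to 7.

```python
import math

def sqrt_law_range(expected, k=1):
    """Return (low, high) = expected -/+ k times the rounded square root of expected."""
    # sqrt(50) is about 7.07; round it to 7, as the text does (an assumption of this sketch)
    uncertainty = round(math.sqrt(expected))
    return expected - k * uncertainty, expected + k * uncertainty

within_one = sqrt_law_range(50)        # range expected about 2/3 of the time
within_two = sqrt_law_range(50, k=2)   # range expected about 19 times out of 20
print(within_one, within_two)          # prints (43, 57) (36, 64)
```

Calling the function with a different expected count, for example `sqrt_law_range(100)`, gives the corresponding range for that case.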
Now let’s try some examples. The “margin of error” in a political poll is based on the
square root logic, just a slightly more complex form mathematically. Suppose you took a
poll of 1000 people (who did not know each other) and 490 answered the way you
expected.
1. You could report that 49% were thinking correctly, but what is the margin of error?
a. 3.2%
b. 2.2%
c. 4.5%
d. Not enough information to tell
2. Your friend, who disagrees with you all the time, took a poll on the same question of
1000 different people and 530 answered the way she wanted.
a. Can she claim to have more people on her side, for sure?
b. Or would you say she can’t make any claims at all – the difference is still totally
undecided
c. Or can she claim that it is pretty likely that more people agree with her, but still there
is a small chance it isn’t true?
3. Suppose that your polls were taken of groups – still a total of 1000 people, but they
were organized into groups of five and the view of each group was recorded. Would this
change your answers to questions 1 and 2?
a. Yes
b. No
4. Defend your answer to question 3:
a. There would be no change because the poll is still taken with 1000 people
b. But they agreed based on groups of 5, so there were only 200 independent views!
c. There were only 200 different opinions that could be recorded anyway
d. Both b. and c.
e. There isn’t any way you could learn anything useful from groups
5. To have the same margin of error as for question 1, how many people in groups of five
would need to be polled?
a. 1000
b. 200
c. 5000
d. You could never get the same margin of error
The following table shows the number of home runs Babe Ruth hit in each year
(excluding two when he missed a lot of games):

Year    Home runs
1920    54
1921    59
1922    reduced time
1923    41
1924    46
1925    reduced time
1926    47
1927    60
1928    54
1929    46
1930    49
1931    46
1932    41
6. How many home runs did he hit on average?
a. 45.52
b. 47.91
c. 49.36
d. 51.75
e. 53.02
Fill in the following table. The “homer difference” column is the number of homers he hit
minus the average. The homer uncertainty is the square root of the number of homers he
hit. The relative difference is the homer difference divided by the homer uncertainty, so it
lets you judge quickly how large the difference is compared with the uncertainty.
Year    Homer difference    Homer uncertainty    Relative difference
1920
1921
1922    reduced time
1923
1924
1925    reduced time
1926
1927
1928
1929
1930
1931
1932
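The three columns described above can be sketched in Python as follows. The counts used here are made-up placeholders, not Ruth's totals, and the function name `relative_differences` is hypothetical. The sketch assumes every count is greater than zero, since the relative difference divides by the square root of the count.

```python
import math

def relative_differences(counts):
    """For each count, return (difference, uncertainty, relative difference)."""
    average = sum(counts) / len(counts)
    rows = []
    for n in counts:
        difference = n - average      # count minus the average
        uncertainty = math.sqrt(n)    # square root of the count itself
        rows.append((difference, uncertainty, difference / uncertainty))
    return rows

# Placeholder counts chosen so the average is exactly 25 (not real data).
for row in relative_differences([16, 25, 34]):
    print(row)
```

A relative difference larger than 1 in size means the count is more than one times its uncertainty away from the average.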
7. According to the rules about uncertainties, in how many years would Ruth be expected to
hit a number of homers MORE THAN ONE TIMES the uncertainty away from the average?
a. 2/3 x 11 ~ 7
b. 1/3 x 11 ~ 4
c. Not at all
d. All the time
8. How many times did he actually come up more than one times the uncertainty off the
average?
a. 7
b. 4
c. Not at all
d. All the time
9. Was hitting 60 homers
a. Just normal luck, given how well he was hitting for the whole period 1920 – 1932?
b. Something that took an exceptional boost in his hitting ability?
The scores for the games of the Green Bay Packers (sorry, I am a fan of theirs) for the
2008 season are:
Week    GB score    Score difference    Score uncertainty    Relative difference
1       24
2       48
3       16
4       21
5       24
6       27
7       34
8       16
9       27
10      37
11      29
12      31
13      21
14      16
15      17
16      31
Compute the average score and then the difference from the average for each game, just
as you did for Babe Ruth. Then compute the score uncertainty as the square root of the
score. Finally, fill in the last column with the score difference divided by the score
uncertainty.
10. Why are the relative differences so much larger than for Babe Ruth’s homers?
a. The Packers were just plain erratic
b. Some weekends, the Packers played teams with really good defense
c. It is the effect of Lambeau Field and the Green Bay weather on the scores
d. Football points are not independent – they come in bunches of 3, 6, or even 7
11. In balancing your checking account, you find that you think it has $420, while the
bank says it has only $400 and that you owe them the $20 difference. Since 20 is the
square root of 400,
a. you really owe the bank $20
b. the difference is within the uncertainties and they should be satisfied
c. the uncertainty should be computed on the pennies, so square root (400 X 100) =
200 pennies = $2 and you owe them $18
12. Why might the example with the bank be different from the rest?
a. Banks like to make a profit
b. Because of the stimulus money, banks can be very liberal with funds
c. Bank deposits do not qualify as random events
d. Banks charge for overdrafts
13. Next year, if Derrick Williams by some chance got exactly 232 foul shots again (see
introduction to this assignment) but he made 182, raising his percentage from 68 to 78%,
would you conclude that he had gotten better?
a. No, just luckier
b. Yes, that’s ten percentage points
c. Probably, but it could still be a big swing of luck
14. Nic Wise hit 137 of 156 tries or 88%, while Kyle Fogg hit 83 of 110 tries for 75.5%.
Who is the better free throw shooter?
a. Wise
b. Fogg
c. Can’t tell
15. If you had just guessed the answers for the three hourly exams in this course, you
would expect 20% correct, or 10 out of 50. You get to replace the lowest score at the time
of the final. What is that score likely to be?
a. 10 right
b. 7 right
c. 4 right
d. there is no way to predict