LAST NAME (Please Print): KEY
FIRST NAME (Please Print):
HONOR PLEDGE (Please Sign):
Statistics 111
Midterm 3
• This is a closed book exam.
• You may use your calculator and a single page of notes.
• The room is crowded. Please be careful to look only at your own exam. Try to sit
one seat apart; the proctors may ask you to randomize your seating a bit.
• Report all numerical answers to at least two correct decimal places or (when appropriate) write them as a fraction.
• All question parts count for 1 point.
1. What are the assumptions needed for linear regression? (6 points)
(1) There is a linear relationship between X and Y. The errors are (2) normal with
(3) mean zero and (4) constant standard deviation (or variance). (5) The errors are
independent. (6) The X values are measured without error.
2. 25 You want to test whether the average breaking strength of threads manufactured
by Dufarge Industries is less than 0.3 pounds. You know (from years of experience)
that the variance in thread strength is 0.01 pounds. Suppose that the true mean
breaking strength is 0.25 pounds. How large a sample do you need in order for the
Type I error to be 0.05 and the Type II error to be 0.2?
$0.8 = \mathbb{P}[ts < -1.645] = \mathbb{P}[(\bar{X} - 0.3)/\sqrt{0.01/n} < -1.645]$
$= \mathbb{P}[(\bar{X} - 0.25 + 0.25 - 0.3)/\sqrt{0.01/n} < -1.645]$
$= \mathbb{P}[Z < -1.645 - (0.25 - 0.3)/\sqrt{0.01/n}] = \mathbb{P}[Z < -1.645 + 0.05/\sqrt{0.01/n}]$.
Since $\mathbb{P}[Z < 0.84] = 0.8$, then $0.84 = -1.645 + 0.05/\sqrt{0.01/n}$ and
thus the smallest sample size is n = 25.
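As a check on the algebra, here is a minimal Python sketch that solves the same equation for n. It is not part of the original key; the function name `required_sample_size` is made up for illustration, and the code assumes a one-sided z-test with known variance, as in the derivation above.

```python
from math import ceil, sqrt
from statistics import NormalDist


def required_sample_size(mu0, mu_true, variance, alpha, beta):
    """Smallest n for a one-sided z-test with known variance.

    Solves z_beta = -z_alpha + |mu0 - mu_true| / sqrt(variance / n) for n
    and rounds up to the next whole observation.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)  # about 1.645 for alpha = 0.05
    z_beta = z.inv_cdf(1 - beta)    # about 0.84 for beta = 0.20
    sigma = sqrt(variance)
    n_exact = ((z_alpha + z_beta) * sigma / abs(mu0 - mu_true)) ** 2
    return ceil(n_exact)


n = required_sample_size(mu0=0.3, mu_true=0.25, variance=0.01, alpha=0.05, beta=0.2)
print(n)
```

The exact solution is a little under 25, so rounding up gives the n = 25 reported in the key.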
3. 0.19 Two urns each contain an infinite number of marbles. Urn A has 50% white
marbles and 50% black marbles. Urn B has 30% white marbles and 70% black marbles.
You are blindfolded, and draw five marbles from one of the urns; your guide tells you
that you drew from Urn A, but you suspect he may be lying. After the blindfold
is removed, you see that 4 of your marbles are black. What is your significance
probability?
The significance probability is the chance of getting something as or more supportive
of the alternative than what you observed, when the null hypothesis is true. In this
case, the null is Urn A, and the alternative is Urn B. Getting data that supports Urn
B as strongly as, or more strongly than, 4 black marbles means seeing 4 or 5 black marbles.
The chance of 4 or 5 black marbles in five draws from Urn A is
$\binom{5}{4}(0.5)^5 + \binom{5}{5}(0.5)^5 = 0.1875$.
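The binomial tail can be checked with a short Python sketch. This is only an illustration, not part of the original key; `upper_tail` is a hypothetical helper name.

```python
from math import comb


def upper_tail(n, k, p):
    """P[X >= k] for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# Null hypothesis: Urn A, so each draw is black with probability 0.5.
p_value = upper_tail(n=5, k=4, p=0.5)
print(p_value)
```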
4. You want to set a 90% lower confidence interval on the median number of collisions
per day in Durham. You sample 50 days, and the sample median is 15.5. Then you
draw 20 bootstrap samples from the data and find their medians to be:
7, 11, 15, 18, 4, 6, 10, 14, 9, 22, 5, 18, 12, 17, 21, 23, 8, 12, 15, 20
9.5 What is the one-sided lower 90% CI bound from the pivot bootstrap?
2 * 15.5 - 21.5 = 9.5, where 21.5 is the midpoint between the second and third largest
numbers, 22 and 21.
0.36 What is the probability that a specific sample day is not used in calculating the
first bootstrap median?
$(1 - 1/50)^{50} = 0.3642$.
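Both bootstrap answers can be reproduced with the sketch below. It is an illustration rather than part of the original key. It assumes the key's convention for the 90th percentile (the midpoint between the 18th and 19th of the 20 sorted bootstrap medians), and `pivot_lower_bound` is a made-up name.

```python
def pivot_lower_bound(sample_stat, boot_stats, level):
    """One-sided lower bound from the pivot bootstrap.

    Assumes level * len(boot_stats) is a whole number k, and takes the
    percentile as the midpoint of the k-th and (k+1)-th sorted values.
    """
    s = sorted(boot_stats)
    k = round(level * len(s))
    upper_percentile = (s[k - 1] + s[k]) / 2
    return 2 * sample_stat - upper_percentile


medians = [7, 11, 15, 18, 4, 6, 10, 14, 9, 22,
           5, 18, 12, 17, 21, 23, 8, 12, 15, 20]
lower = pivot_lower_bound(sample_stat=15.5, boot_stats=medians, level=0.9)

# Chance that one particular day is missed in all 50 draws with replacement.
p_unused = (1 - 1 / 50) ** 50

print(lower, round(p_unused, 4))
```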
5. To make more money at a casino, you attempt to “shave” a die so that the sides marked
2 and 6 each have probability 0.2, and the other faces are equally likely. Shaving this
precisely is difficult; you test your result by rolling the die 600 times, getting 85 ones,
130 twos, 85 threes, 95 fours, 90 fives, and 115 sixes.
In practical words, what is the alternative hypothesis?
It is not true that P[2]=P[6]=.2 and P[1]=P[3]=P[4]=P[5]=.6/4; at least some of these
equalities are wrong.
1.87 or 1.88 What is the value of the test statistic?
$ts = (85 - 90)^2/90 + (130 - 120)^2/120 + (85 - 90)^2/90 + (95 - 90)^2/90 + (90 - 90)^2/90 + (115 - 120)^2/120 = 1.875$.
$\chi^2_5$ What distribution do you use to find the P-value? (Include degrees of freedom.)
What is the P-value? (Give a range if appropriate.) (0.8, 0.9)
Between 0.8 and 0.9.
11.07 What is the critical value for a 0.05 level test?
State your conclusion for a 0.05 level test in words.
No reason to reject the null; this was a good job of shaving.
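The goodness-of-fit calculation can be sketched in Python as follows. This is an illustration, not part of the original key; the critical value 11.07 is copied from the answer above rather than computed, and the variable names are made up.

```python
observed = [85, 130, 85, 95, 90, 115]               # counts for faces 1..6
null_probs = [0.6 / 4, 0.2, 0.6 / 4, 0.6 / 4, 0.6 / 4, 0.2]

n_rolls = sum(observed)                              # 600
expected = [n_rolls * p for p in null_probs]         # 90 or 120 per face

# Pearson chi-square statistic: sum of (O - E)^2 / E over the six faces.
ts = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

critical_value = 11.07   # chi-square, 5 df, 0.05 level (from the key)
reject_null = ts > critical_value

print(round(ts, 3), df, reject_null)
```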
Suppose your test leads you to believe that your shaving effort was successful. But
you want further confirmation, and now toss the shaved die 6 billion times. What
conclusion will you reach and why? (2 points)
We would reject the null hypothesis of perfect shaving (1 pt); very small errors will
be found with such a large sample (1 pt).
6. 60.92 You sample 15 children from a class of 30 and administer a standardized test.
Their average score is 60, with a sample variance of 8. What is the upper bound for
a two-sided 90% confidence interval on the mean score for the entire class?
The point estimate is 60. The standard error must be adjusted by the finite population
correction factor, so it is $\sqrt{(8/15)(30 - 15)/(30 - 1)} = 0.5252$. And since this is a
small sample situation and the population sd is estimated by the sample sd, we must
use a t-distribution with df = 15 − 1 = 14. For a two-sided 90% interval, the critical
values are 1.76 and -1.76. Thus L = 60 − (0.5252)(1.76) = 59.0756 and U = 60.9244.
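A short Python sketch of the same interval, for checking the arithmetic. It is not part of the original key; the t critical value 1.76 is taken from the answer above rather than computed.

```python
from math import sqrt

n, N = 15, 30                  # sample size and class size
mean, variance = 60.0, 8.0     # sample mean and sample variance
t_crit = 1.76                  # t, 14 df, two-sided 90% (from the key)

# Standard error with the finite population correction (N - n) / (N - 1).
se = sqrt(variance / n * (N - n) / (N - 1))

lower = mean - t_crit * se
upper = mean + t_crit * se
print(round(se, 4), round(lower, 4), round(upper, 4))
```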
7. In most states, about 10% more men than women vote Republican. You sample 50
men and 100 women in NC, and find that 30 men and 55 women will vote Republican.
You want to show, at the 0.05 level, that the gender gap in NC is smaller than the
national gap. Subtract women from men.
In symbols, what is your null hypothesis?
H0 : pm − pw ≥ 0.1.
-0.59 What is your test statistic?
$ts = (\hat{p}_m - \hat{p}_w - 0.1) / \sqrt{\hat{p}_m(1 - \hat{p}_m)/n_m + \hat{p}_w(1 - \hat{p}_w)/n_w}$
$= (0.6 - 0.55 - 0.1) / \sqrt{(0.6)(0.4)/50 + (0.55)(0.45)/100} = -0.5862$.
0.28 What is your significance probability?
Go to the z-table. The probability of a result below -0.59 under the null is 0.278.
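The test statistic and its lower-tail probability can be checked with the Python sketch below, which is an illustration and not part of the original key. It uses the unrounded statistic, so the probability comes out close to, but not exactly equal to, the table value found with -0.59.

```python
from math import sqrt
from statistics import NormalDist

n_m, n_w = 50, 100
p_m = 30 / n_m                 # 0.60 of sampled men
p_w = 55 / n_w                 # 0.55 of sampled women
null_gap = 0.1                 # national gap under H0

se = sqrt(p_m * (1 - p_m) / n_m + p_w * (1 - p_w) / n_w)
ts = (p_m - p_w - null_gap) / se

# The alternative is a smaller gap, so the P-value is the lower tail.
p_value = NormalDist().cdf(ts)
print(round(ts, 4), round(p_value, 3))
```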
What is your conclusion?
The significance probability is much larger than the α = 0.05 level, so we fail to reject.
There is no evidence that the gender gap in NC is less than the national difference.
Suppose you had sampled married couples. Name (a) one good thing about that plan
and (b) one bad thing about that plan.
Note to Grader: I’ve given the two most obvious answers, but there will be a wide
range of responses. At a minimum, the first answer should certainly say that pairing
controls for important factors. For the second answer, if there is anything that is sort
of sensible, give it to them.
(a) The pairing controls for income, location, etc.
(b) The sample will be biased towards older heterosexuals.
8. 0.58 There are three coins, with probabilities of Heads equal to 0.4, 0.6 and 0.7. You
draw one at random and toss it 5 times, getting three Heads. What is the probability
that the next toss will be Heads?
Make a table as in the RU-486 example. The probability of 3 Heads with the 0.4
coin is 0.2304, with the 0.6 coin is 0.3456, and with the 0.7 coin is 0.3087. So the
posterior probabilities of those coins are 0.2604, 0.3906, and 0.3489, respectively. So
the probability that a sixth toss is a Heads is 0.4 * 0.2604 + 0.6 * 0.3906 + 0.7 *
0.3489 = 0.5828.
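The table can be reproduced with a few lines of Python. This sketch is an illustration, not part of the original key; it assumes each coin is equally likely to be drawn, as "at random" suggests, and the variable names are made up.

```python
from math import comb

coins = [0.4, 0.6, 0.7]                 # P[Heads] for each coin
prior = [1 / 3, 1 / 3, 1 / 3]           # assumed: each coin equally likely

# Likelihood of 3 Heads in 5 tosses for each coin.
likelihood = [comb(5, 3) * p**3 * (1 - p) ** 2 for p in coins]

# Bayes' rule: posterior is proportional to prior times likelihood.
joint = [pr * lk for pr, lk in zip(prior, likelihood)]
posterior = [j / sum(joint) for j in joint]

# Predictive probability that the sixth toss is Heads.
p_next_heads = sum(p * w for p, w in zip(coins, posterior))

print([round(w, 4) for w in posterior], round(p_next_heads, 4))
```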
9. List all and only the true statements. B, E, F
A. As points cluster more tightly around a regression line, correlation increases.
B. If you make many tests, your overall alpha increases.
C. The percentile bootstrap is better than the pivot.
D. For fixed n, as the alpha level decreases, power increases.
E. For fixed α, as the sample size increases, power increases.
F. As confidence increases, the width of the interval increases.
10. The computer output on the last page shows a regression of predicting fat content (in
grams) from the number of calories for 75 brands of candy bars; each brand averages
information from 5 random bars. The mean number of calories is 243 and the sd of
the number of calories is 61.58.
8.67 What is the estimated number of fat grams in a 200-calorie candy bar?
Plug 200 into the regression equation for X.
5.33 What is the residual for a candy bar that has 200 calories and 14 grams of fat?
The residual is the observed Y minus the estimated value for Y, or 14-8.673.
0.81 What is the correlation?
The square root of the $R^2$ value; in this case, since the slope is positive, we take
the positive square root.
17.17 What is the upper bound of a two-sided 95% confidence interval on the average
grams of fat in 300 calorie candy bars?
$(-6.247 + 300 \cdot 0.07456) \pm 3.4050 \cdot \sqrt{(1/75) + (300 - 243)^2/(74 \cdot 61.58^2)} \cdot 1.96$
22.88 What is the upper bound of a two-sided 95% prediction interval on the grams of
fat in the 300 calorie Mars bar you just ate?
$(-6.247 + 300 \cdot 0.07456) \pm 3.4050 \cdot \sqrt{1 + (1/75) + (300 - 243)^2/(74 \cdot 61.58^2)} \cdot 1.96$
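Both upper bounds can be checked with the Python sketch below. It is an illustration, not part of the original key. The intercept, slope, and residual standard error (3.4050) are the values that appear in the key's formulas; the computer output itself is not reproduced here, so treat them as copied rather than verified.

```python
from math import sqrt

b0, b1 = -6.247, 0.07456       # intercept and slope as written in the key
s = 3.4050                     # residual standard error as written in the key
n = 75                         # number of candy bar brands
x_bar, s_x = 243, 61.58        # mean and sd of calories
z = 1.96                       # two-sided 95% normal critical value
x0 = 300                       # calories of interest

fit = b0 + b1 * x0

# Shared term: 1/n + (x0 - x_bar)^2 / ((n - 1) * s_x^2)
leverage = 1 / n + (x0 - x_bar) ** 2 / ((n - 1) * s_x**2)

ci_upper = fit + z * s * sqrt(leverage)       # mean fat at 300 calories
pi_upper = fit + z * s * sqrt(1 + leverage)   # one new 300-calorie bar

print(round(ci_upper, 2), round(pi_upper, 2))
```

The only difference between the two intervals is the extra 1 under the square root, which accounts for the variability of a single new observation around the regression line.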
What does the $R^2$ value tell you?
The proportion of the variation in Y explained by knowledge of X.
Look at the residual plot. Is there any problem? If so, describe it.
It is not shaped like a cigar. There is more spread for smaller values of X.
Yes Is this an ecological correlation?
Yes. Each point is the average of five candy bars, so we don’t know whether, for
that dot, the high-fat bars tend to have higher calories, or whether the low-fat
bars have the higher calories.