Math 243 – Summer 2012
PS 15
1 Conducting a hypothesis test can be considered a four-step process:

a) State the null and alternative hypotheses.
b) Compute a test statistic.
   The test statistic should be a number computed from data that measures the evidence for or against the null hypothesis.
c) Determine a p-value.
   The p-value is a probability that helps us determine whether our test statistic would be unusually large or small if the null hypothesis were true.
d) Draw a conclusion.
   If the p-value is small, it says that data like ours would be unusual if the null hypothesis were true. The smaller the p-value, the stronger the evidence against the null hypothesis.

In modeling, a sensible null hypothesis is that one or more explanatory variables are unrelated to the response variable.

For our test statistic, we need a number that takes on one sort of value when the null hypothesis is true and another sort of value when the null hypothesis is false. There are several possibilities, but here we will use R² from the model, since we would expect R² to be smaller when the null hypothesis is true and larger when the null hypothesis is false.

Let’s compute our test statistic on our data.

    r.squared <- function(model) {
      summary(model)$r.squared
    }
    r.squared(lm(width ~ length + sex, data = KidsFeet))

    [1] 0.4595

So, how unusual would it be to get a test statistic this large, assuming the null hypothesis is true? One way to find out is to simulate a situation in which the null hypothesis is true by shuffling the variables. For example, here are two trials of a simulation of the null hypothesis in a model of the KidsFeet data:

    coef(lm(width ~ length + shuffle(sex), data =
        KidsFeet))

      (Intercept)        length shuffle(sex)G
          2.95915       0.24560      -0.07982

    coef(lm(width ~ length + shuffle(sex), data =
        KidsFeet))

      (Intercept)        length shuffle(sex)G
         2.861253      0.247925      0.003256

Since we are randomly shuffling the sex of the kids, sex can’t have any meaningful relationship with width in these shuffled data sets. Let’s try a few more.

    r.squared(lm(width ~ length + shuffle(sex),
        data = KidsFeet))

    [1] 0.4147

    r.squared(lm(width ~ length + shuffle(sex),
        data = KidsFeet))

    [1] 0.4298

    r.squared(lm(width ~ length + shuffle(sex),
        data = KidsFeet))

    [1] 0.4362

By computing many such trials, we construct the sampling distribution under the null hypothesis, that is, the distribution of the test statistic in a world in which the null hypothesis holds true. We can automate this process using do():

    nullDist = do(1000) * r.squared(lm(width ~
        length + shuffle(sex), data = KidsFeet))
    head(nullDist)

      result
    1 0.4173
    2 0.4111
    3 0.4131
    4 0.4588
    5 0.4432
    6 0.4112

The p-value is the probability of seeing a value of the test statistic from the null hypothesis simulation that is more extreme than our actual value. The meaning of “more extreme” depends on what the test statistic is. In this example, since a better fitting model will always have a larger R², we check the probability of getting a larger R² from our simulation than from the actual data.
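The shuffle-and-refit procedure can also be sketched in base R, without the shuffle() and do() helpers. This is only an illustration under stated assumptions: the data frame kids is simulated as a hypothetical stand-in for KidsFeet, and the names r_squared, one_trial, null_dist, and p_value are invented for this sketch. Because the data are simulated, the numbers will not match the ones reported for KidsFeet.

```r
# Simulated stand-in for KidsFeet (hypothetical data, not the real data set).
set.seed(243)
n <- 39
kids <- data.frame(
  length = rnorm(n, mean = 24.7, sd = 1.3),
  sex    = factor(sample(c("B", "G"), n, replace = TRUE))
)
# width depends on length only, so the null hypothesis about sex holds here.
kids$width <- 3 + 0.25 * kids$length + rnorm(n, sd = 0.4)

r_squared <- function(model) summary(model)$r.squared

# Test statistic on the unshuffled data.
observed <- r_squared(lm(width ~ length + sex, data = kids))

# One trial: permute sex with sample(), refit the model, record R^2.
one_trial <- function() {
  shuffled <- transform(kids, sex = sample(sex))
  r_squared(lm(width ~ length + sex, data = shuffled))
}

# Many trials give the simulated null distribution of R^2.
null_dist <- replicate(1000, one_trial())

# Proportion of shuffled trials with R^2 at least as large as observed.
p_value <- mean(null_dist >= observed)
p_value
```

Here sample() applied to a column returns a random permutation of it, which plays the role of shuffle(), and replicate() plays the role of do().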
    table(nullDist >= 0.4595)

    FALSE  TRUE
      910    90

    prop(~ result >= 0.4595, data = nullDist)

    TRUE
    0.09

    statTally(0.4595, nullDist)

    Null distribution appears to be asymmetric. (p = 9.75e-12)

    Test statistic applied to sample data = 0.4595

    Quantiles of test statistic applied to random data:
       50%    90%    95%    99%
    0.4188 0.4577 0.4712 0.5091

    Of the random samples
       0 ( 0 % ) had test stats = 0.4595
      90 ( 9 % ) had test stats > 0.4595

[Figure: density plot of the simulated null distribution; horizontal axis: stat (0.45 to 0.55); vertical axis: Density (0 to 40).]

Our p-value is about 9%. What do we conclude from this? Our R² value of 0.4595 is a bit on the large side, but not extremely unusual. We would get R² values at least this large from roughly 9% of samples even if the null hypothesis were true. That probably isn’t enough evidence to reject the null hypothesis.

Here are various computer modeling statements that implement possible null hypotheses. Connect each computer statement to the corresponding null hypothesis.

a) lm(width ~ length + shuffle(sex), data = KidsFeet)
b) lm(width ~ shuffle(length) + shuffle(sex), data = KidsFeet)
c) lm(width ~ shuffle(length), data = KidsFeet)
d) lm(width ~ shuffle(sex), data = KidsFeet)
e) lm(width ~ length + sex, data = shuffle(KidsFeet))

• Foot width is unrelated to foot length or to sex.
  a b c d e
• Foot width is unrelated to sex, but it is related to foot length.
  a b c d e
• Foot width is unrelated to sex, and we won’t consider any possible relationship to foot length.
  a b c d e
• Foot width is unrelated to foot length, and we won’t consider any possible relationship to sex.
  a b c d e
• This isn’t a hypothesis test; the randomization won’t change anything from the original data.
  a b c d e

2 Often we are interested in whether two groups are different. For example, we might ask if girls have a different mean foot length than do boys. We can answer this question by constructing a suitable model.
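As a rough illustration of why a model can answer a two-group question: when the only explanatory variable is a two-level categorical variable, the fitted coefficients are the mean of the reference group and the difference between the two group means. The sketch below uses simulated foot lengths (the data frame feet and its values are hypothetical, not taken from KidsFeet).

```r
# Hypothetical simulated foot lengths (cm); these are not the KidsFeet values.
set.seed(15)
feet <- data.frame(
  sex    = factor(rep(c("B", "G"), times = c(20, 19))),
  length = c(rnorm(20, mean = 25, sd = 1.3), rnorm(19, mean = 24.3, sd = 1.3))
)

# Group means computed directly.
group_means <- tapply(feet$length, feet$sex, mean)

# The same information expressed as a model.
fit <- lm(length ~ sex, data = feet)

# (Intercept) is the mean of the reference group (B);
# sexG is mean(G) minus mean(B).
coef(fit)
group_means["G"] - group_means["B"]
```

The model form has the advantage that summary() also reports a standard error and a p-value for the difference.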
    summary(lm(length ~ sex, data = KidsFeet))

    Call:
    lm(formula = length ~ sex, data = KidsFeet)

    Residuals:
       Min     1Q Median     3Q    Max
    -2.721 -0.713 -0.121  0.795  2.395

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   25.105      0.285   88.18   <2e-16
    sexG          -0.784      0.408   -1.92    0.062

    Residual standard error: 1.27 on 37 degrees of freedom
    Multiple R-squared: 0.0908, Adjusted R-squared: 0.0662
    F-statistic: 3.69 on 1 and 37 DF, p-value: 0.0623

Interpret this report, keeping in mind that the foot length is reported in centimeters. (The reported value <2e-16 means p < 2 × 10^-16.)

a) What is the point estimate of the difference between the lengths of boys’ and girls’ feet?
   A Girls’ feet are, on average, 25 centimeters long.
   B Girls’ feet are 0.4079 cm shorter than boys’.
   C Girls’ feet are 0.7839 cm shorter than boys’.
   D Girls’ feet are 1.922 cm shorter than boys’.

b) The confidence interval can be written as a point estimate plus-or-minus a margin of error: P ± M. What is the 95% margin of error, M, on the difference between boys’ and girls’ foot lengths?
   -0.78 0.28 0.41 0.60 0.80

c) What is the null hypothesis being tested by the reported p-value 0.0623?
   A Boys’ feet are, on average, longer than girls’ feet.
   B Girls’ feet are, on average, shorter than boys’ feet.
   C All boys’ feet are longer than all girls’ feet.
   D No girl’s foot is shorter than all boys’ feet.
   E There is no difference, on average, between boys’ foot lengths and girls’ foot lengths.

d) What is the null hypothesis being tested by the p-value on the intercept?
   A Boys’ and girls’ feet are, on average, the same length.
   B The length of kids’ feet is, on average, zero.
   C The length of boys’ feet is, on average, zero.
   D The length of girls’ feet is, on average, zero.
   E Girls’ and boys’ feet don’t intercept.

Here is the report from a related, but slightly different model:

    summary(lm(length ~ sex - 1, data = KidsFeet))

    Call:
    lm(formula = length ~ sex - 1, data = KidsFeet)

    Residuals:
       Min     1Q Median     3Q    Max
    -2.721 -0.713 -0.121  0.795  2.395

    Coefficients:
         Estimate Std. Error t value Pr(>|t|)
    sexB   25.105      0.285    88.2   <2e-16
    sexG   24.321      0.292    83.3   <2e-16

    Residual standard error: 1.27 on 37 degrees of freedom
    Multiple R-squared: 0.997, Adjusted R-squared: 0.997
    F-statistic: 7.35e+03 on 2 and 37 DF, p-value: <2e-16

Note that the p-values for both coefficients are practically zero, p < 2 × 10^-16.

What is the null hypothesis tested by the p-value on sexG?
   A Girls’ feet have a different length, on average, than boys’.
   B Girls’ feet are no different in length, on average, than boys’.
   C Girls’ foot lengths are, on average, zero.
   D Girls’ foot lengths are, on average, greater than zero.

3 P-values concern the “statistical significance” of evidence for a relationship. This can be a different thing from the real-world importance of the observed relationship. It’s possible for a weak connection to be strongly statistically significant (if there is a lot of data for it) and for a strong relationship to lack statistical significance (if there is not much data).

Consider the data on the times it took runners to complete the Cherry Blossom ten-mile race in March 2005:

    names(TenMileRace)

    [1] "state" "time"  "net"   "age"   "sex"

Consider the net variable, which gives the time it took the runners to get from the start line to the finish line.

Answer each of the following questions, giving both a quantitative argument and also an everyday English explanation. Assessing statistical significance is a technical matter, but to interpret the substance of a relationship, you will have to put it in a real-world context.
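The gap between statistical significance and real-world importance can be illustrated with simulated data. The sketch below is hypothetical throughout (it does not use TenMileRace, and the slopes, sample sizes, and variable names are chosen only for illustration). It fits one model to a weak relationship with many cases and another to a strong relationship with very few cases. The exact numbers depend on the random seed, and the small-sample fit may or may not come out significant on any given run.

```r
set.seed(2005)

# Weak relationship, lots of data: slope 0.1 with unit-variance noise.
n_big <- 10000
x_big <- rnorm(n_big)
y_big <- 0.1 * x_big + rnorm(n_big)
fit_big <- summary(lm(y_big ~ x_big))

# Strong relationship, little data: slope 1 with the same noise, 5 cases.
n_small <- 5
x_small <- rnorm(n_small)
y_small <- 1 * x_small + rnorm(n_small)
fit_small <- summary(lm(y_small ~ x_small))

# Slope estimate (substance) next to its p-value (significance).
fit_big$coefficients["x_big", c("Estimate", "Pr(>|t|)")]
fit_small$coefficients["x_small", c("Estimate", "Pr(>|t|)")]

# R^2 shows how little of the variation the weak relationship explains.
fit_big$r.squared
```

In the large sample the slope is tiny and explains very little of the variation, yet its p-value is small; the p-value alone says nothing about whether a slope of that size matters in context.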
a) What is the relationship between net running time and the runner’s age? Is the relationship significant? Is it substantial?

b) What is the relationship between net running time and the runner’s sex? Is the relationship significant? Is it substantial?

c) Is there an interaction between sex and age? Is the relationship significant? Is it substantial?

4 Consider the following analysis of the kids’ feet data looking for a relationship between foot width and whether the child is left or right handed. The variable domhand gives the handedness, either L or R. We’ll construct the model in two different ways. There are 39 cases altogether.

    anova(lm(width ~ domhand, data = KidsFeet))

    Analysis of Variance Table

    Response: width
              Df Sum Sq Mean Sq F value Pr(>F)
    domhand    1   0.33   0.325    1.26   0.27
    Residuals 37   9.54   0.258

    anova(lm(width ~ domhand - 1, data = KidsFeet))

    Analysis of Variance Table

    Response: width
              Df Sum Sq Mean Sq F value Pr(>F)
    domhand    2   3154    1577    6115 <2e-16
    Residuals 37     10       0

• Explain why, in the first case, the p-value is not significant, but in the second case it is.
• Why does domhand have 1 degree of freedom in the first ANOVA report, but 2 degrees of freedom in the second?

5 Here is an ANOVA table (with the “intercept” term included) from a fictional study of scores assigned to various flavors, textures, densities, and chunkiness of ice cream. Some of the values in the table have been left out. Figure out from the rest of the table what they should be.

                 Df Sum-Sq Mean-Sq F-value p-value
    (intercept)   1    _A_     200     _B_     _C_
    flavor        8    640      80     _D_   0.134
    density     _E_    100     100       2   0.160
    fat_content   1    300     _F_       6   0.015
    chunky        1    200     200       4   0.048
    Residuals   100   5000      50

(a) The value of A:
    1 2 100 200 400 600
(b) The value of B:
    1 2 3 4 5 6 7 8 10 20 200
(c) The value of C: (Hint: There’s enough information in the table to find this.)
    0.00 0.015 0.030 0.048 0.096 0.134 0.160 0.320 0.480
(d) The value of D:
    0.0 0.8 1.6 3.2 4.8
(e) The value of E:
    0 1 2 3 4 5 6
(f) The value of F:
    100 200 300 400 500
(g) How many cases are involved altogether?
    50 100 111 112 200 5000
(h) How many different flavors were tested?
    1 3 5 8 9 10 12 100
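The columns of an ANOVA table are tied together by simple arithmetic, which can be checked against the output of anova(). The sketch below is a minimal illustration on simulated data (the data frame d, its three-level factor group, and the variable names are hypothetical); it recomputes the Mean Sq, F value, and p-value columns from Df and Sum Sq.

```r
set.seed(5)
# Hypothetical data: a response and a three-level factor, 30 cases.
d <- data.frame(
  group = factor(rep(c("a", "b", "c"), each = 10)),
  y     = rnorm(30)
)
tab <- anova(lm(y ~ group, data = d))

df     <- tab[["Df"]]
sum_sq <- tab[["Sum Sq"]]

# Mean Sq is Sum Sq divided by Df, row by row.
mean_sq <- sum_sq / df

# F for a term is its Mean Sq divided by the residual Mean Sq.
f_value <- mean_sq[1] / mean_sq[2]

# The p-value is the upper tail of the F distribution
# with (term Df, residual Df) degrees of freedom.
p_value <- pf(f_value, df[1], df[2], lower.tail = FALSE)

# Without an intercept row, the Df column sums to (number of cases - 1).
total_df <- sum(df)

tab
c(mean_sq = mean_sq[1], f_value = f_value, p_value = p_value)
```

The same relationships let missing entries in a table be filled in from the entries that are present.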