Solutions 9

Homework 9 Solutions
(1) What is the purpose of an ANOVA?
(A) To test if the means of two populations are the same against the alternative that
they are different.
(B) To test if the means of more than two populations are same against the alternative
that at least one is different.
(C) To test if the variances of more than two populations are the same against the
alternative that at least one is different.
B
(2) A meterologist wants to understand whether the mean temperature in College Station
is the different to the mean temperature in Snook (which is about 20 miles away from
College Station). In order to test this hypothesis she measures the temperature at
10am in College Station and Snook between April 6th and April 15th (10 readings
from both towns).
(a) What test would you recommend she does to test equality of the mean temperatures against the alternative that they are different?
(A)
(B)
(C)
(D)
(E)
A two sample independent t-test.
A paired t-test.
A Wilcoxon sign rank test.
A Wilcoxon sum rank test.
ANOVA.
– C - Since College Station and Snook are close to each other there is
likely to be a dependence in their daily data, thus a Paired test is
required. However, the sample size (10 observations) is small, so it
seems sensible to do a Wilcoxon sign rank test.
(b) Considering how the data was collected, what fundemental assumption (that all
the above tests require) is likely to have been violated.
These observations are take over a period of 10 days, thus it is highly
likely that there is a dependency between the observations (today’s
temperature has an influence on tomorrow’s temperature). Therefore
the assumption of independence of the observations (a major assumption in all the tests we have been done so far) is unlikely to hold.
(3) In a few weeks time local elections will take place in Sutown. Mr. Big is contesting
against Dr. Small. In a recent poll of 700 people, it was found that 380 were in favour
of Dr. Small and 320 were in favour of Mr. Big.
(i) Test the hypothesis that Dr. Small will win the election (this will be the alternative hypothesis) (state precisely the null and alternative and do the test at the
5% level).
Let p = the proportion of the vote Dr. Small will
q get. H0 : p ≤ 0.5
2
against HA : p > 0.5. p̂ = 380/700 = 0.542, the s.e. = 0.5
= 0.0188. The
700
z-transform is Z = (0.542 − 0.5)/0.0188 = 2.23. Therefore P(Z > 2.23) =
0.0127 = 1.27%. The p-value is 1.27%, as this is smaller than 5% we reject
the null at the 5% level. Therefore, given the data is seems likely that
Dr. Small will win the election.
(ii) How many people need to be in the sample such that the 95% confidence interval
for the proportion who will vote for Dr. Small will have length 3% (look at lecture
28)?
q
. We want this length to be
The length of the CI is 2 × 1.96 × π(1−π)
n
q
0.03. Thus we solve 2 × 1.96 × π(1−π)
= 0.03. However, the proportions
n
are unknown, however as a polling company we want to be sure at least
95% confident this interval contains the mean, so we use the π 0 s that
make this CI the longest, that is π = 0.05 (a different π and the interval
will be narrower) - note that this choice of π has nothing to do with
part (i) of the question, always in the case of proportions you should
use π = 0.5 in constructing the CI to construct a confidence interval
which is at least at least a 95% CI.
p
Solving 2 × 1.96 × 0.25/n = 0.03 gives n = 4269.
(4) Castaneda v Partida is an important court case in which statistical methods were used
as part of the legal argument.
In Castaneda the plantiffs alleged that the method for selecting juries in a county in
Texas as biased against Mexican Americans. For the period of time of issue, there were
181,535 persons eligible for jury duty, of whom 143,611 were Mexican Americans. Of
the 870 people selected for jury duty, 339 were Mexican American.
(a) We can do a one-sample proportion test, by using p = 143,611
= 0.79 as the overall
181,535
proportion of mexicans in the county and comparing the proportion of Mexican
american on jury to this.
Figure 1:
(i) Explain why we are testing H0 : p = 0.79 against HA : p < 0.79?
The platiff alleges that the method of selection is biased against Mexican Americans. This means they want to see if there is evidence
to suggest that Mexican Americans are under-represented in Juries.
We articulate this in the alternative as the proportion of Mexican
Americans doing jury service being less than the total proportion
in the Mexican Americans in the county, ie. HA : p < 0.79.
(ii) Using the output in Figure 1 calculate the z-transform and the the corresponding p-value.
= −29.7. The alternative is pointing to the left, so the
z = 0.38−0.79
0.138
p-value is the area to the left of z = −29.7. If we use the tables we
see that it only goes up to z = −3.49, which corresponds to the area
of 0.02%. Therefore the p-value is a lot less than 0.02% and there
is evidence to suggest that the proportion of Mexican Americans
who did jury service is less than 0.79.z = 0.38−0.79
= −29.7. The
0.138
alternative is pointing to the left, so the p-value is the area to the
left of z = −29.7. If we use the tables we see that it only goes up
to z = −3.49, which corresponds to the area of 0.02%. Therefore
the p-value is a lot less than 0.02% and there is evidence to suggest
that the proportion of Mexican Americans who did jury service is
less than 0.79.
(iii) Is there any evidence to support the view that Mexican Americans are being
underrepresented in jury duty?
Yes, since the p-value is a lot smaller than 5%. If the p-value was
quite large the defence would argue that the proportion of mexican
americans on juries in less than the proportion of non-Mexican
Americans, however this difference can be explained by random
chance.
(b) An alternative method is to do a two-sided test and compare proportions of Mexican Americans who did jury duty with the proportion of non-Mexican American
who did jury duty.
Figure 2:
(i) What hypothesis are you testing using this method?
H0 : pA − pM = 0 against HA : pA − pM > 0 (where pM = proportion of Mexican Americans and pA = proportion of non-Mexican
Americans).
(ii) Using the output in Figure 2 calculate the z-transform and the the corresponding p-value.
The z-transform is z = 0.0116/(3.98 × 10−4 ) = 29.1. Since the alternative is pointing to the right, the p-value is the area to the right of
29.1. Using the same arguments as those in (aii) this corresponds
to the a p-value which is less than 0.02%.
(iii) Is there any evidence to support the view that Mexican Americans are being
underrepresented in jury duty?
As the p-value is less than 0.02%, which is less than the 5% significance level, there is evidence to suggest that the proportion of
Mexican Americans who did jury service is less than the proportion
of non-Mexican Americans who did jury service.
(5) Three instructors gave STAT 651 exams last week. The number of As and Bs are
recorded below. Students want to know whether the instructor has an influence on the
grade a student obtains.
(a) State the null and alternative hypothesis to investigate.
H0 :There is no association between grade and instructor HA :There is
an association between grade and instructor.
(b) State the test you would do and why?
A chi-squared test for independence.
A
B
Instructor 1
20
30
50
Instructor 2
40
30
70
Instructor 3
20
30
50
80
90
170
(c) Do the test at the 5% level, and report your findings.
Since χ2 = 4.85 ≯ 5.99 = χ20.05,2 , we cannot reject H0 . There is no evidence
to assume the dependence between grade and instructor.
A
B
Instructor 1
80
× 50
170
90
× 50
170
50
Instructor 2
80
× 70
170
90
× 70
170
70
Instructor 3
80
× 50
170
90
× 50
170
50
80
90
170
(6) A gaming magazine is conducting a survey on the number of different gaming consoles
an individual owns. There are currrently three different gaming consoles on the market
(Nintendo, Playstation and Xbox) and it thought that the probability an individual
should own any one of these consoles is 0.4 (ie, If I randomly pick a person and ask
them do you own a Nintendo the probability is 40%. If I randomly pick a person and
ask do you own a Playstation the probability is 40%. I randomly pick a person and
ask them if they own a Xbox it is 40%).
(a) Assuming that ownership of one system is completely independent of ownership
of another system calculate the probability that an individual owns none, one,
two or all three systems (hint: use the Binomial distribution).
No consoles
1 console
2 consoles
3 consoles
0.216
0.432
0.288
0.064
Probability
(b) The gaming magazine interviews 220 people, and asks them how many consoles
they own (if any). The results are given below. Based on the above survey data,
0 consoles
Number
120
0.545
1 console
70
0.318
2 consoles
20
0.091
3 consoles
10
0.046
is there evidence to suggest that the probabilities derived in part A are not the
correct probabilities? State the test you will do, do the test at the 5% level and
give the conclusions of the test.
Use Chi-squared test. Based on the expected and observed probability
we can get χ2 = 148.01 > 7.815 = χ20.05,3 and therefore we can reject the
null. That is, there is evidence to suggest the probabilities derived in
part A are not correct.