Class 14 Assignment
Unless otherwise stated, assume the underlying population (probability distribution) to be normal.
1. (EMBS 19, p 372). In 2001, the US Department of Labor reported the mean hourly earnings for US
production workers to be $14.32. A random sample of 75 production workers in 2003 showed a
sample mean of $14.68 per hour. Can we be certain that mean earnings are higher in 2003? Assume a
normal population with standard deviation of $1.45.
                                    Value      Calculation
mu                                  14.32
sigma                               1.45
n                                   75
standard deviation of sample mean   0.167432   =1.45/75^0.5
Observed sample mean                14.68
Pr(sample mean>14.68 given H0)      0.015772   =1-normdist(14.68,14.32,.167,true)
H0: mu=14.32. Ha: mu>14.32. We would reject H0 because the p-value calculated using the normal
(0.016) is less than 0.05. BTW, the Z-value is (14.68-14.32)/0.1674 = 2.15.
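The spreadsheet arithmetic above can be sketched in Python. This is only an illustration using the standard library; the variable names are my own, and NormalDist().cdf stands in for Excel's normdist(..., true).

```python
from math import sqrt
from statistics import NormalDist

mu0 = 14.32    # mean under H0 (the 2001 figure)
sigma = 1.45   # population standard deviation, assumed known
n = 75         # sample size
xbar = 14.68   # observed 2003 sample mean

se = sigma / sqrt(n)                # standard deviation of the sample mean
z = (xbar - mu0) / se               # Z-value
p_value = 1 - NormalDist().cdf(z)   # upper-tail probability under H0

print(f"se = {se:.6f}, z = {z:.2f}, p-value = {p_value:.4f}")
```

The printed values should agree with the table to rounding. The tail is one-sided because Ha is mu>14.32.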
2. (EMBS 29, p 380 modified). Diamond Sources USA thinks $5,600 is the “correct average” price for a
one-carat, V2 clarity, H-color diamond. In an effort to see whether this assumption was correct, they
made calls to 25 randomly selected dealers in the diamond district of New York City to get their prices.
a. Formulate a null and alternative hypothesis.
b. Use the data in the accompanying spreadsheet to test your hypothesis. Be certain to provide the
test statistic, a p-value, and your conclusion.
H0: mean = $5,600. Ha: mean is not equal to $5,600.
Average     $5,784
count       25
stdev       376.0319135
std error   75.20638271
mu          $5,600
t-stat      2.446600852   (avg - mu)/ std error
pvalue      0.022123619   =t.dist.2t(t-stat,24)
The calculated t-stat is 2.45, which is significant in a two-tailed test, as noted by the p-value of
0.022. We can reject H0. We must use the t (and not the normal Z) because 376.03 is a SAMPLE standard
deviation…calculated from the data.
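As a cross-check on the spreadsheet, here is a minimal Python sketch of the same one-sample t-test from the summary statistics. The helper names t_pdf and t_upper_tail are my own; the tail probability is approximated by numerically integrating the t density (Simpson's rule) so that only the standard library is needed. It is a stand-in for Excel's t.dist.2t, not its actual implementation.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 via Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

avg = 5784.0         # sample mean
mu0 = 5600.0         # mean under H0
stdev = 376.0319135  # sample standard deviation
count = 25           # sample size

std_error = stdev / sqrt(count)
t_stat = (avg - mu0) / std_error
p_value = 2 * t_upper_tail(abs(t_stat), count - 1)  # two-tailed

print(f"std error = {std_error:.4f}, t = {t_stat:.4f}, p = {p_value:.4f}")
```

The p-value is doubled because Ha is two-sided (mean not equal to $5,600).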
3. Summary statistics for the length (in minutes) of randomly selected games from major league
baseball are as follows:
                    x̄       s      n
2002 game length    172.1   12.2   61
2003 game length    165.9   13.7   51
In 2003 the league implemented rules to speed up the game. Did the rules work? (You know, of
course, that I want a formal hypothesis, a test statistic, a p-value, and a conclusion.) (This question
comes from Ken Kelley lecture notes.)
H0: mean length is equal for the two groups. Ha: mean length is lower for 2003. This is a two-sample
one-tailed t-test of means.
pooled variance     166.4990909   =(60*12.2^2+50*13.7^2)/(60+50)
Numerator of t      6.2           =172.1 - 165.9
Denominator of t    2.448301728   =166.5^.5*(1/61+1/51)^.5
t-stat              2.532367612
one-tailed pvalue   0.006370281   =t.dist.rt(2.53,110)
The difference in average minutes per game is statistically significant.
We reject H0 in favor of Ha: the data support shorter games in 2003.
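The pooled two-sample calculation can be sketched in Python as follows. The helper names are my own, and the tail probability is a numerical approximation (Simpson's rule on the t density) standing in for Excel's t.dist.rt. Note that this sketch uses the unrounded t-stat, so its p-value may differ slightly from the spreadsheet, which plugged in 2.53.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 via Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Summary statistics from the table: mean, sample std dev, sample size
x1, s1, n1 = 172.1, 12.2, 61   # 2002 games
x2, s2, n2 = 165.9, 13.7, 51   # 2003 games

df = n1 + n2 - 2
pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
denominator = sqrt(pooled_var) * sqrt(1 / n1 + 1 / n2)
t_stat = (x1 - x2) / denominator
p_value = t_upper_tail(t_stat, df)   # one-tailed, since Ha is directional

print(f"pooled var = {pooled_var:.4f}, t = {t_stat:.4f}, p = {p_value:.4f}")
```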
4. (EMBS 39, p 443). A well-known automobile magazine measured the Miles per gallon (MPG) for five
cars each from three mid-sized US models. The 15 cars were driven on an identical course by a single
driver. The test took 3 days to complete, and the 15 cars were tested in random order. The idea was
to discover any meaningful differences in the MPGs of the three models.
a. State an appropriate null and alternative hypothesis.
b. Use the results (found in the assignment spreadsheet) to test your hypothesis. Be certain to provide
a p-value and state your conclusion.
c. Is the driver married?
a. H0 is that the mean MPG is equal for all three models. Ha is that they are not all equal. Note this is a
2-tailed hypothesis. Differences among the sample means will be the evidence we will use to reject H0.
Because there are more than 2 groups, the t-test will not work. Instead, we use an ANOVA single-factor
test…available as the first option under Excel’s data analysis tool.
b. Step 1, use a pivot table to get the data into three columns. Step 2, Data/Data Analysis/Anova Single
Factor. Step 3, interpret the results.
Notice how nicely the results came out. The sample means are all integers! The three sample means
were 20, 21, and 25. It looks like model C distinguished itself from the other two.
The test statistic is the F-statistic. The bigger the F, the rarer is the result under H0. You might
think of the F as a scaled sum of squares of all possible 2-sample t-stats. However you think of the F,
and however little we know about its calculation, the F-statistic is but a means to calculate the
p-value. The calculated p-value here is 0.003. Our conclusion is that the differences in sample means
are statistically significant. We reject H0.
c. The driver is single, as noted in the question.
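For anyone curious about what the F-statistic in part b is doing, here is a minimal Python sketch. The data are made-up toy numbers, NOT the MPG values from the assignment spreadsheet, and the function name one_way_anova_f is my own. With equal group sizes, the F works out to the sum of the squared pairwise t-stats (each computed with the pooled within-group variance) divided by the number of pairs, which the sketch checks.

```python
from itertools import combinations
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, mean square within) for a single-factor ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within, ms_within

# Hypothetical toy data: three groups of five (not the assignment's data)
groups = [
    [10.0, 12.0, 11.0, 13.0, 9.0],
    [14.0, 15.0, 13.0, 16.0, 12.0],
    [20.0, 18.0, 19.0, 21.0, 17.0],
]

f_stat, ms_within = one_way_anova_f(groups)

# Squared 2-sample t-stats for every pair, using the pooled within variance
sum_t2 = sum(
    (mean(a) - mean(b)) ** 2 / (ms_within * (1 / len(a) + 1 / len(b)))
    for a, b in combinations(groups, 2)
)
pairs = len(groups) * (len(groups) - 1) // 2

print(f"F = {f_stat:.4f}, average squared pairwise t = {sum_t2 / pairs:.4f}")
```

The p-value would then come from the F distribution with (k-1, N-k) degrees of freedom, which Excel's Anova tool reports directly.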