Unit 1 Review
1) Researchers are interested in whether or not administering a motivational interview prior to
starting an exercise program affects the completion rate. 120 participants from an exercise
center in Des Moines, Iowa were randomly divided into two groups. The intervention group
received the motivational interview and the control group participated in a short class on an
unrelated topic. Researchers reported the following: participant ID#, group assignment (1:
intervention, 2: control), age, gender, height, weight, and completion status (completed
program, failed to complete program) for each participant.
a) Identify the W’s
Who: exercise program participants
What (variables): participant ID#, group assignment, age, gender, height, weight,
completion status
When: not given
Where: Des Moines, Iowa
Why: to see whether the motivational interview affects completion rates
How: 120 participants were randomly assigned to the motivational interview or the control
group, and completion status was observed.
b) For the variables above, indicate whether it should be treated as categorical or
quantitative. If quantitative, identify the units in which it was measured.
Participant ID# - categorical
Group assignment - categorical
Age - quantitative (presumably years)
Gender - categorical
Height - quantitative (units not given)
Weight - quantitative (presumably pounds or kilograms)
Completion status - categorical
2) The boxplots below summarize the distribution of salaries for four different states based on
the same occupation.
a) In what state are “typical” salaries lowest?
NV
b) Which states’ salaries appear to be skewed to the right?
MA and NV
c) Supposing cost of living was the same for all four states (we know that isn’t true), in
which state would you prefer to work and why?
MA because “typical” salaries are highest
3) The table below shows whether students in an introductory statistics class like dogs and/or
cats. (De Veaux et al., 2012)

                     Like Cats
Like Dogs      Yes      No      Total
Yes            194      110     304
No             21       10      31
Total          215      120     335

a) What percent of students like dogs?
304/335 = 90.75%
b) What percent of students like dogs and cats?
194/335 = 57.91%
c) What percent of students like dogs but not cats?
110/335 = 32.84%
d) What is the marginal distribution (counts and percents) of “liking dogs”?
Like Dogs:     Yes       No       Total
Count          304       31       335
Percent        90.75%    9.25%    100%
e) What percent of students who like dogs also like cats?
194/304 = 63.82%
f) What percent of students who like cats do not like dogs?
21/215 = 9.77%
g) What percent of students like dogs or cats?
(194 + 110 + 21)/335 = (215 + 304 - 194)/335 = 325/335 = 97.01%
h) What is the conditional distribution of “liking dogs” for students who like cats?
Like Cats
Like Dogs:     Yes       No       Total
Count          194       21       215
Percent        90.2%     9.8%     100%
i) What is the conditional distribution of “liking dogs” for students who do not like cats?
Do Not Like Cats
Like Dogs:     Yes       No       Total
Count          110       10       120
Percent        91.7%     8.3%     100%
j) Do “liking dogs” and “liking cats” appear to be independent? Give statistical evidence to
support your conclusion.
The conditional distributions of liking dogs are very similar for students who like and
don’t like cats. Therefore, I would say that these variables appear to be independent.
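The percentages in problem 3 can be checked with a short script. This is a minimal sketch, not part of the original key: the dictionary layout and the helper name `pct` are illustrative choices, and the counts are copied from the two-way table.

```python
# Counts from the two-way table; keys are (likes_dogs, likes_cats).
counts = {
    ("yes", "yes"): 194,
    ("yes", "no"): 110,
    ("no", "yes"): 21,
    ("no", "no"): 10,
}
total = sum(counts.values())  # 335 students


def pct(part, whole):
    """Return part/whole as a percentage rounded to 2 decimals."""
    return round(100 * part / whole, 2)


# Marginal totals.
dogs_yes = counts[("yes", "yes")] + counts[("yes", "no")]
cats_yes = counts[("yes", "yes")] + counts[("no", "yes")]
cats_no = counts[("yes", "no")] + counts[("no", "no")]

like_dogs = pct(dogs_yes, total)                      # part a
like_both = pct(counts[("yes", "yes")], total)        # part b
dogs_or_cats = pct(total - counts[("no", "no")], total)  # part g

# Conditional percentages of liking dogs, given each cat group (parts h, i).
dogs_given_cats = pct(counts[("yes", "yes")], cats_yes)
dogs_given_no_cats = pct(counts[("yes", "no")], cats_no)

print(like_dogs, like_both, dogs_or_cats)
print(dogs_given_cats, dogs_given_no_cats)
```

The two conditional percentages differ by less than two points, which is the evidence used in part j.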
4) The following data represent unit exam scores from a group of 20 students:
85 97 70 76 0 92 65 91 84 85
77 93 90 81 68 78 87 95 89 72
a) Find the 5 number summary for the distribution.
From TI:
Min     Q1      Med     Q3      Max
0       74      84.5    90.5    97
b) Find the mean, standard deviation, range and IQR.
Mean: $\bar{x} = 78.75$
Standard deviation: $s_x = 20.74$
Range = 97 - 0 = 97
IQR = 90.5 - 74 = 16.5
c) Create a boxplot for these data. Calculate upper and lower fences. Label the plot.
Upper fence: $Q_3 + 1.5(\text{IQR}) = 90.5 + 1.5(16.5) = 115.25$
Lower fence: $Q_1 - 1.5(\text{IQR}) = 74 - 1.5(16.5) = 49.25$
Plot: [Boxplot on an axis from 0 to 100: outlier (*) at 0, lower whisker at 65, Q1 = 74,
median = 84.5, Q3 = 90.5, upper whisker at 97.]
d) What would be the most appropriate measure of center and spread for this distribution?
Why?
Center: Median
Spread: IQR
Why? Because the distribution has an outlier
(and a slight skew to the left).
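The summary statistics and fences in problem 4 can be reproduced as follows. This is a sketch under one stated assumption: quartiles are taken as the medians of the lower and upper halves of the sorted data, which matches the values reported from the TI calculator here. Other software may use a different quartile rule and give slightly different Q1 and Q3.

```python
from statistics import mean, median, stdev

scores = [85, 97, 70, 76, 0, 92, 65, 91, 84, 85,
          77, 93, 90, 81, 68, 78, 87, 95, 89, 72]

ordered = sorted(scores)
half = len(ordered) // 2

# Assumed quartile rule: medians of the lower and upper halves
# (for an odd number of values, the middle value is left out of both halves).
q1 = median(ordered[:half])
med = median(ordered)
q3 = median(ordered[-half:])

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Any value outside the fences is flagged as an outlier.
outliers = [x for x in ordered if x < lower_fence or x > upper_fence]

print("five-number summary:", ordered[0], q1, med, q3, ordered[-1])
print("mean:", mean(scores), "sd:", round(stdev(scores), 2))
print("fences:", lower_fence, upper_fence, "outliers:", outliers)
```

Only the score of 0 falls below the lower fence, which is why the median and IQR are preferred in part d.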
5) Suppose one of the authors of our textbook collected the times (in minutes) it took him to run
4 miles on various courses during a 10-year period. Here is a histogram of the times: (De
Veaux et al., 2014)
a) Describe this histogram’s shape. Be sure to mention modality, symmetry, and unusual
features.
This distribution is unimodal and skewed to the right, with a gap between 33.5 and 34
minutes.
b) Based on this shape, would you expect the mean or median to be higher? Why?
Mean. The mean will be pulled in the direction of the skew.
c) What would be the most appropriate measure of center and spread for this distribution?
Why?
Median and IQR, because the distribution is skewed.
d) Approximately how many times did the author run 4 miles in 33 minutes or more?
Approximately 5 + 4 + 1 + 1 = 11 times (we would know more precisely if the bar frequency
counts were labeled).
6) Jennifer took the ACT and scored a 29. She also took the SAT and scored a 1350 (Critical
Reading + Mathematics). ACT scores have a mean of 21.0 and a standard deviation of
approximately 4.7. SAT scores have a mean of 1026 and a standard deviation of 209. On
which test did she perform relatively better?
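ACT: $x = 29$, $\bar{x} = 21.0$, $s = 4.7$
SAT: $x = 1350$, $\bar{x} = 1026$, $s = 209$
$z_{\text{ACT}} = \frac{x - \bar{x}}{s} = \frac{29 - 21}{4.7} \approx 1.70$
$z_{\text{SAT}} = \frac{x - \bar{x}}{s} = \frac{1350 - 1026}{209} \approx 1.55$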
She performed relatively better on the ACT.
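The comparison above comes down to two z-scores. A minimal sketch, where the helper name `z_score` is an illustrative choice rather than anything from the original key:

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


z_act = z_score(29, 21.0, 4.7)
z_sat = z_score(1350, 1026, 209)

# The larger z-score marks the relatively better performance.
better = "ACT" if z_act > z_sat else "SAT"

print(round(z_act, 2), round(z_sat, 2), better)
```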
7) The costs for standard veterinary services at a local animal hospital follow a Normal model
with a mean of $90 and a standard deviation of $30. (De Veaux et al., 2012)
Diagrams are recommended.
$\mu = 90$, $\sigma = 30$; model: $N(90, 30)$
a) What percentage of standard veterinary bills will be greater than $175?
normalcdf(175, 1E99, 90, 30) = 0.0023
0.23%
[Sketch: Normal curve with 90 and 175 marked on the axis.]
b) What percentage of standard veterinary bills will be between $80 and $150?
normalcdf(80, 150, 90, 30) = 0.6078
60.78%
[Sketch: Normal curve with 80, 90, and 150 marked on the axis.]
c) What would be the veterinary bill amount that separates the cheapest 25% of visits for
standard services?
invNorm(0.25, 90, 30) = 69.765
$69.77
[Sketch: Normal curve with the lowest 0.25 of the area marked.]
d) What is the range of costs for the middle 70% of standard visits?
invNorm(0.15, 90, 30) = 58.91
invNorm(0.85, 90, 30) = 121.09
The range is $58.91 to $121.09.
[Sketch: Normal curve with 15% in the lower tail and 70% in the middle.]
e) What is the IQR for the costs of standard veterinary services?
invNorm(0.25, 90, 30) = 69.765
invNorm(0.75, 90, 30) = 110.235
IQR = Q3 - Q1 = 110.235 - 69.765 = $40.47
[Sketch: Normal curve with 25% in the lower tail and 50% in the middle.]
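For readers without a TI calculator, the answers to problem 7 can be checked with Python's standard library. This is a sketch, not the method used in the key: `NormalDist.cdf` plays the role of normalcdf (with the lower bound at negative infinity) and `NormalDist.inv_cdf` plays the role of invNorm. The variable names are illustrative.

```python
from statistics import NormalDist

vet = NormalDist(mu=90, sigma=30)

# a) Proportion of bills above $175: upper-tail area.
over_175 = 1 - vet.cdf(175)

# b) Proportion of bills between $80 and $150: difference of two cdf values.
between_80_150 = vet.cdf(150) - vet.cdf(80)

# c) Cutoff for the cheapest 25% of visits (also Q1).
q1 = vet.inv_cdf(0.25)

# d) Middle 70% runs from the 15th to the 85th percentile.
low_15, high_85 = vet.inv_cdf(0.15), vet.inv_cdf(0.85)

# e) IQR is the distance between the 25th and 75th percentiles.
q3 = vet.inv_cdf(0.75)
iqr = q3 - q1

print(round(over_175, 4), round(between_80_150, 4))
print(round(q1, 2), round(low_15, 2), round(high_85, 2), round(iqr, 2))
```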
8) A machine fills boxes of pasta according to a Normal model with mean 16.2 ounces and
standard deviation of 0.1 ounces.
N(16.2,0.1)
a) If the boxes claim to have 16 ounces (1 lb) of pasta each, what percent of boxes are
under-filled?
normalcdf(-1E99, 16, 16.2, 0.1) = 0.02275
2.28% are underfilled
[Sketch: Normal curve with 16 and 16.2 marked on the axis.]
b) What is the z-score of a box of pasta that contains only 15.92 ounces?
$z = \frac{x - \mu}{\sigma} = \frac{15.92 - 16.2}{0.1} = -2.8$
c) How many ounces of pasta does a box contain if its z-score is 1.7?
$1.7 = \frac{x - 16.2}{0.1}$, so $x = 16.2 + 1.7(0.1) = 16.37$ oz
d) Explain in your own words: What does a z-score indicate?
A z-score tells us how many standard deviations a value is above (+) or below (-) the
mean.
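Parts b and c of problem 8 are the same relationship read in opposite directions: value to z-score, and z-score back to value. A short sketch of that round trip, with variable names chosen for illustration:

```python
from statistics import NormalDist

mu, sigma = 16.2, 0.1
pasta = NormalDist(mu, sigma)

# a) Proportion of boxes holding less than the claimed 16 ounces.
underfilled = pasta.cdf(16)

# b) Value to z-score: standard deviations below the mean for 15.92 oz.
z_for_1592 = (15.92 - mu) / sigma

# c) z-score to value: undo the standardization for z = 1.7.
ounces_for_z17 = mu + 1.7 * sigma

print(round(underfilled, 4), round(z_for_1592, 2), round(ounces_for_z17, 2))
```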
9) Describe the scatterplots in terms of shape (linear/non-linear), direction, and strength.
Plot 1: linear, negative, moderate to weak
Plot 2: linear, positive, strong
10) Several scatterplots are given with calculated correlations. Which is which?
c) 0.004
d) 0.753
a) -0.944
b) -0.435
11) Georgetown University reports the average GPA of graduating seniors for a selection of
years:
Year    Average GPA
1974    3.05
1980    3.13
1987    3.22
1994    3.29
1999    3.29
2002    3.37
2005    3.42
2006    3.42
Source: Committee on Intellectual Life, Georgetown University, March 2007
a) What would be appropriate choices for the explanatory and response variables? Why?
Explanatory: year
Response: GPA
It is not possible for GPA to affect the year, but it is possible that GPAs have changed
over time due to a variety of factors.
b) Create and describe the scatterplot in terms of form, direction, and strength. Is a linear
model appropriate?
Strong, positive and linear. Yes, a linear model is appropriate.
c) Compute the correlation coefficient. In what ways is the correlation coefficient
consistent with your description in part b?
$r = 0.988$; this is consistent with a strong, positive, linear association.
d) Compute the linear model to 5 decimal places.
$\hat{y} = 0.01112x - 18.90183$ (Note: if you round too much, this will affect part f.)
e) Interpret the slope and y-intercept in the context of the scenario.
Slope: The model predicts that for each additional year, average GPA will increase
by 0.01112 points.
y-intercept: The model suggests that in the year 0, average GPA would have been -18.90.
Clearly the model does not apply that far back in time.
f) Use the linear model to predict the average GPA for the year 2015. Round to 2 decimal
places.
$\hat{y} = 0.01112(2015) - 18.90183 \approx 3.50$
g) Suppose the actual average GPA of graduating seniors in the year 2015 is 3.48. Compute
and interpret the residual.
Residual: $e = y - \hat{y} = 3.48 - 3.50 = -0.02$
Interpretation: The model overestimated the average GPA of graduating seniors by 0.02.
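The correlation, model, prediction, and residual in problem 11 can be reproduced from the data with the usual least-squares formulas. This is a sketch, not the calculator procedure the key used. It rounds the coefficients to 5 decimal places before predicting, as part d asks. With unrounded coefficients the 2015 prediction comes out closer to 3.51, so the level of rounding does matter here.

```python
years = [1974, 1980, 1987, 1994, 1999, 2002, 2005, 2006]
gpas = [3.05, 3.13, 3.22, 3.29, 3.29, 3.37, 3.42, 3.42]

n = len(years)
x_bar = sum(years) / n
y_bar = sum(gpas) / n

# Sums of squared deviations and cross-products.
sxx = sum((x - x_bar) ** 2 for x in years)
syy = sum((y - y_bar) ** 2 for y in gpas)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, gpas))

# Correlation coefficient (part c).
r = sxy / (sxx * syy) ** 0.5

# Least-squares line, rounded to 5 decimal places (part d).
slope = round(sxy / sxx, 5)
intercept = round(y_bar - (sxy / sxx) * x_bar, 5)

# Prediction for 2015 from the rounded model (part f) and residual (part g).
predicted_2015 = round(slope * 2015 + intercept, 2)
residual = round(3.48 - predicted_2015, 2)

print(round(r, 3), slope, intercept, predicted_2015, residual)
```

A negative residual means the observed value is below the prediction, which matches the interpretation in part g.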