Module 3 Review - Seattle Central College

Module 3 Review
Directions: Answer the following questions on another sheet of paper.
Questions 1-16 refer to the following situation: Is there a relationship between the crime rate and the
unemployment rate among young men? The data below are for several US cities, collected by
the FBI's Uniform Crime Report and other government agencies. The unemployment rate is given
as a percentage for men aged 16-24. The crime rate is the number of offenses
reported to police per million population.
City          Unemployment rate (%)    Crime rate (offenses per million)
Allentown     23.1                     195
Busytown       9.4                     105
Charleston    14.3                     187
Daisyville    12.5                     165
Easyville     11.2                     124
1. What is the explanatory variable?
2. What is the response variable?
3. Make a scatterplot, then describe it.
4. Compute (with a calculator) the mean and standard deviation of the unemployment rate.
5. Compute (with a calculator) the mean and standard deviation of the crime rate.
6. Compute the correlation (r), then explain what the result means.
7. Compute the least-squares regression line.
8. In the least-squares regression line (from problem 7), which number represents the slope? Explain the
meaning of this number in the context of the problem. Include units in your answer.
9. In the least-squares regression line (from problem 7), which number represents the y-intercept?
Explain the meaning of this number in the context of the problem. Include units in your answer.
10. Suppose the unemployment rate for young men of another city, Fairview, is 13.6. What would you
predict the crime rate to be, according to the least-squares regression line?
11. How confident are you of your prediction in question 10? Explain your answer.
12. Suppose Fairview's actual crime rate is 154 incidents per million people. What is the residual?
Explain what a negative residual would mean. Explain what a positive residual would mean.
13. If you had to compute the SSE by hand, show what the first two rows of the table would look like.
14. Does the scatterplot, regression line, etc. indicate that the unemployment rate among young males
*causes* crime? Why or why not? If it doesn't, what does the scatterplot tell you about the relationship
between unemployment and crime?
15. What are some confounding variables in this situation?
16. Give an example of extrapolation in this situation. Explain why the extrapolation is not necessarily
valid.
17. (multiple choice) Measurements on young children in Mumbai, India, found this least-squares
regression line for predicting the height y from armspan x: y = 6.4 + 0.93x. All measurements are in
centimeters. How much, on the average, does height increase for each additional centimeter of armspan?
a) 0.15 cm
b) 6.40 cm
c) 2.00 cm
d) 0.93 cm
e) 7.33 cm
Questions 18-19 refer to the following scatterplot:
[Scatterplot not reproduced: vertical axis from 0 to 100, horizontal axis from 0 to 15.]
18. Describe the scatterplot:
19. In the scatterplot from the previous question, suppose r = 0.7. If an outlier were added to the
scatterplot at the point (10, 10), would r increase, decrease, or stay about the same?
20. A researcher wanted to find the relationship between the weight and height of middle-aged women.
Suppose the mean height was 168 cm and the standard deviation of the height was 4.5 cm. Suppose the mean
weight was 58 kg and the standard deviation of the weight was 5.1 kg. Suppose r = 0.6153.
a) Find the equation of the best-fit line. Let x = height and y = weight
b) If a woman was 174 cm tall, use the best-fit line from part (a) to predict her weight.
c) How reliable is your prediction from part (b)? Explain using concepts about correlation.
21. Match the following graphs to the following correlations (one value of r will not be used).
a) r = 0.97
b) r = -0.52
c) r = 0.76
d) r = 0.04
e) r = -0.96
[Graphs 1-4: four scatterplots, not reproduced.]
22. (multiple choice) The points on a scatterplot lie very close to the line y = 4 - 3x. The correlation
between x and y is close to...
a) -3
b) -1
c) 0
d) 1
e) 4
23. Below is a scatterplot.
a) Use the scatterplot below to estimate the slope of the least-squares regression line.
b) Use the scatterplot below to estimate the y-intercept.
c) Use parts (a) and (b) to write out the equation of the least-squares regression line.
Answers
1. Unemployment rate. This is the variable being used to try to explain and/or predict the crime rate.
2. Crime rate. This is the variable that responds to changes in the unemployment rate.
3. [Scatterplot not reproduced.]
The direction is positive. There is a moderate amount of scatter. The form looks curvilinear, but there are only 5
points, so it's hard to establish any definite trends.
4. Mean unemployment rate: $\bar{x} = 14.1$ percent
SD of unemployment rate: $s_x = 5.34$ percentage points
5. Mean of crime rate: $\bar{y} = 155.2$ incidents per million people
SD of crime rate: $s_y = 39.32$ incidents per million people
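The calculator results in answers 4 and 5 can be checked with a few lines of Python. This is only a sketch using the standard library; the variable names are my own, and `stdev` is the sample standard deviation (dividing by n - 1), which is what the values above appear to use.

```python
from statistics import mean, stdev

# Data copied from the table in the problem statement.
unemployment = [23.1, 9.4, 14.3, 12.5, 11.2]  # percent, men aged 16-24
crime = [195, 105, 187, 165, 124]             # offenses per million population

# stdev() is the sample standard deviation (divides by n - 1).
x_bar, s_x = mean(unemployment), stdev(unemployment)
y_bar, s_y = mean(crime), stdev(crime)

print(f"x_bar = {x_bar:.1f}, s_x = {s_x:.2f}")
print(f"y_bar = {y_bar:.1f}, s_y = {s_y:.2f}")
```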
6. r = 0.804 (rounded). This means the correlation is fairly strong and the direction is positive. Below are the
calculations:
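unemployment   crime       x minus     y minus     product of
rate (x)       rate (y)    mean of x   mean of y   deviations
23.1           195          9.0         39.8        358.20
 9.4           105         -4.7        -50.2        235.94
14.3           187          0.2         31.8          6.36
12.5           165         -1.6          9.8        -15.68
11.2           124         -2.9        -31.2         90.48
                                        total       675.30
Divide the total by 5.34, then by 39.32, then by 4 (that is, n - 1):
$r = \frac{675.3}{(5.34)(39.32)(4)} \approx 0.804$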
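The same table-based calculation of r can be sketched in Python. The helper name `correlation` is hypothetical, chosen for this illustration; it follows the hand method (sum the products of deviations, then divide by n - 1 and by both standard deviations).

```python
from statistics import mean, stdev

unemployment = [23.1, 9.4, 14.3, 12.5, 11.2]  # percent
crime = [195, 105, 187, 165, 124]             # offenses per million population

def correlation(xs, ys):
    """Pearson r: sum of deviation products over (n - 1) * s_x * s_y."""
    x_bar, y_bar = mean(xs), mean(ys)
    products = [(x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)]
    return sum(products) / ((len(xs) - 1) * stdev(xs) * stdev(ys))

r = correlation(unemployment, crime)
print(f"r = {r:.3f}")
```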
7. $b = r \cdot \frac{s_y}{s_x} = 0.804 \cdot \frac{39.32}{5.34} = 5.92$
$a = \bar{y} - b\bar{x} = 155.2 - (5.92)(14.1) = 71.7$
The least-squares regression line is (using symbols) $\hat{y} = 71.7 + 5.92x$
Or (using words) predicted crime rate = 71.7 + 5.92(unemployment rate)
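As a check on answer 7, the slope and intercept can be computed directly from the data. The sketch below assumes the formulas b = r(s_y / s_x) and a = (mean of y) - b(mean of x); the function name `least_squares_line` is my own and not part of the worksheet.

```python
from statistics import mean, stdev

unemployment = [23.1, 9.4, 14.3, 12.5, 11.2]  # percent
crime = [195, 105, 187, 165, 124]             # offenses per million population

def least_squares_line(xs, ys):
    """Return (intercept a, slope b) of the least-squares line y_hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    products = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    r = products / ((len(xs) - 1) * s_x * s_y)
    b = r * s_y / s_x          # slope
    a = y_bar - b * x_bar      # intercept
    return a, b

a, b = least_squares_line(unemployment, crime)
print(f"predicted crime rate = {a:.1f} + {b:.2f}(unemployment rate)")
```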
8. The slope is 5.92. It means that each time the unemployment rate increases by one percentage point, the predicted
crime rate increases by 5.92 incidents per million people.
9. The y-intercept is 71.7. This means that if the unemployment rate were zero, the predicted crime rate would be
71.7 incidents per million people.
10. $\hat{y} = 71.7 + 5.92x = 71.7 + (5.92)(13.6) = 152.2$
The predicted crime rate would be 152.2 incidents per million people.
11. Looking at the scatterplot, there is a moderate amount of scatter, so the prediction would be
somewhat accurate. However (this is just a side comment), the sample size of only 5 cities is not very large. With a
larger sample we could make a more confident prediction.
12. Residual = $y - \hat{y}$ = 154.0 - 152.2 = 1.8 incidents per million people.
The residual is positive, which means:
The actual data is 1.8 units higher than the predicted amount.
A negative residual would mean the actual data is lower than the predicted amount.
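The prediction in answer 10 and the residual in answer 12 can be reproduced as follows. This sketch uses the rounded coefficients 71.7 and 5.92 from the worksheet; the function name `predict_crime_rate` is hypothetical.

```python
# Rounded coefficients of the least-squares line from answer 7.
INTERCEPT = 71.7   # incidents per million people
SLOPE = 5.92       # incidents per million people, per percentage point

def predict_crime_rate(unemployment_rate):
    """Predicted crime rate for a given unemployment rate (percent)."""
    return INTERCEPT + SLOPE * unemployment_rate

predicted = predict_crime_rate(13.6)   # Fairview's unemployment rate
actual = 154.0                         # Fairview's actual crime rate
residual = actual - predicted          # residual = actual - predicted

print(f"predicted = {predicted:.1f}, residual = {residual:.1f}")
# A positive residual means the actual value lies above the line;
# a negative residual means it lies below the line.
```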
13.
x       y      predicted y   residual   residual squared
23.1    195    208.5         -13.5      182.25
 9.4    105    127.4         -22.4      501.76
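The first two rows of the SSE table can be generated with a short script. This is a sketch under two assumptions of mine: the line is fitted with unrounded coefficients, and each predicted value is rounded to one decimal before the residual is taken, as one would do by hand.

```python
from statistics import mean

unemployment = [23.1, 9.4, 14.3, 12.5, 11.2]  # percent
crime = [195, 105, 187, 165, 124]             # offenses per million population

x_bar, y_bar = mean(unemployment), mean(crime)
# Slope as S_xy / S_xx, which is algebraically the same as r * s_y / s_x.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(unemployment, crime))
s_xx = sum((x - x_bar) ** 2 for x in unemployment)
b = s_xy / s_xx
a = y_bar - b * x_bar

rows = []
for x, y in zip(unemployment, crime):
    y_hat = round(a + b * x, 1)       # assumption: prediction rounded to 1 decimal
    residual = round(y - y_hat, 1)    # residual = actual - predicted
    rows.append((x, y, y_hat, residual, round(residual ** 2, 2)))

# Columns: x, y, predicted y, residual, residual squared
for row in rows[:2]:
    print(row)
```

Summing the last column over all five rows would give the SSE.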
14. No, because correlation does not imply causation. There could be other variables that cause both the
unemployment rate and the crime rate to increase together. However, the scatterplot DOES tell us that there is some
kind of relationship between unemployment rate and crime rate.
15. There are many possible answers. Here are some examples: level of education of the young men, drug use,
disabilities. Anything that increases both the unemployment and the crime rate would be a confounding variable.
16. Any example where the unemployment rate is outside the range of the given data (less than 9.4% or more than
23.1%) would be extrapolation. This is not necessarily valid because there is no data to show the crime rate will
continue to increase in the same way.
17. d
18. Direction is positive, strength is moderate to strong, form is linear
19. It would decrease because the outlier would create a greater amount of scatter and a weaker correlation. r is
sensitive to outliers.
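The effect of an outlier on r can be illustrated numerically. The scatterplot's actual data are not available, so the points below are hypothetical values invented for this sketch (positive, roughly linear, on the same axis ranges); only the direction of the change in r is meaningful, not the specific values.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Pearson r: sum of deviation products over (n - 1) * s_x * s_y."""
    x_bar, y_bar = mean(xs), mean(ys)
    products = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return products / ((len(xs) - 1) * stdev(xs) * stdev(ys))

# Hypothetical points, NOT the worksheet's data.
xs = [2, 4, 5, 7, 8, 10, 11, 13]
ys = [30, 25, 50, 45, 70, 60, 85, 80]

r_without = correlation(xs, ys)
# Add the outlier (10, 10), which lies far below the general pattern.
r_with = correlation(xs + [10], ys + [10])

print(f"r without outlier: {r_without:.3f}")
print(f"r with outlier:    {r_with:.3f}")
```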
20. a) $b = r \cdot \frac{s_y}{s_x} = 0.6153 \cdot \frac{5.1}{4.5} = 0.69734$
$a = \bar{y} - b\bar{x} = 58 - 0.69734(168) = -59.15$
$\hat{y} = -59.15 + 0.69734x$
b) $\hat{y} = -59.15 + 0.69734(174) = 62.18$ kg (carrying the unrounded intercept, -59.153)
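c) This prediction is not very reliable because r = 0.6153 indicates only a moderate correlation, so the data
points scatter noticeably around the best-fit line.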
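Parts (a) and (b) use only summary statistics, so they can be checked without the raw data. The sketch below assumes the formulas b = r(s_y / s_x) and a = (mean of y) - b(mean of x); the function name `line_from_summary` is hypothetical.

```python
def line_from_summary(x_bar, s_x, y_bar, s_y, r):
    """Return (intercept a, slope b) of the best-fit line from summary statistics."""
    b = r * s_y / s_x
    a = y_bar - b * x_bar
    return a, b

# x = height (cm), y = weight (kg), values from question 20.
a, b = line_from_summary(x_bar=168, s_x=4.5, y_bar=58, s_y=5.1, r=0.6153)
predicted_weight = a + b * 174   # woman who is 174 cm tall

print(f"y_hat = {a:.2f} + {b:.5f} x")
print(f"predicted weight at 174 cm: {predicted_weight:.2f} kg")
```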
21.
1. e
2. d
3. a
4. c
22. b (the line has a negative slope and the points lie very close to it, so r is close to -1)
23. a) slope is about -0.8
b) y-intercept is about 85
c) $\hat{y} = 85 - 0.8x$