Module 3 Review

Directions: Answer the following questions on another sheet of paper.

Questions 1-16 refer to the following situation:

Is there a relationship between the crime rate and the unemployment rate among young men? The data below are for several US cities, collected by the FBI's Uniform Crime Report and other government agencies. The first column is the unemployment rate (as a percentage) for men aged 16-24. The second column is the crime rate (number of offenses reported to police per million population).

city          unemployment rate   crime rate
Allentown     23.1                195
Busytown       9.4                105
Charleston    14.3                187
Daisyville    12.5                165
Easyville     11.2                124

1. What is the explanatory variable?
2. What is the response variable?
3. Make a scatterplot, then describe it.
4. Compute (with a calculator) the mean and standard deviation of the unemployment rate.
5. Compute (with a calculator) the mean and standard deviation of the crime rate.
6. Compute the correlation (r), then explain what the result means.
7. Compute the least-squares regression line.
8. In the least-squares regression line (from problem 7), which number represents the slope? Explain the meaning of this number in the context of the problem. Include units in your answer.
9. In the least-squares regression line (from problem 7), which number represents the y-intercept? Explain the meaning of this number in the context of the problem. Include units in your answer.
10. Suppose the unemployment rate for young men of another city, Fairview, is 13.6. What would you predict the crime rate to be, according to the least-squares regression line?
11. How confident are you of your prediction in question 10? Explain your answer.
12. Suppose Fairview's actual crime rate is 154 incidents per million people. What is the residual? Explain what a negative residual would mean. Explain what a positive residual would mean.
13. If you had to compute the SSE by hand, show what the first two rows of the table would look like.
14.
Does the scatterplot, regression line, etc. indicate that the unemployment rate among young males *causes* crime? Why or why not? If it doesn't, what does the scatterplot tell you about the relationship between unemployment and crime?
15. What are some confounding variables in this situation?
16. Give an example of extrapolation in this situation. Explain why the extrapolation is not necessarily valid.
17. (multiple choice) Measurements on young children in Mumbai, India, found this least-squares regression line for predicting the height y from armspan x: y = 6.4 + 0.93x. All measurements are in centimeters. How much, on average, does height increase for each additional centimeter of armspan?
   a) 0.15 cm
   b) 6.40 cm
   c) 2.00 cm
   d) 0.93 cm
   e) 7.33 cm

Questions 18-19 refer to the following scatterplot:

[Scatterplot: vertical axis from 0 to 100, horizontal axis from 0 to 15]

18. Describe the scatterplot.
19. In the scatterplot from the previous question, suppose r = 0.7. If an outlier were added to the scatterplot at the point (10, 10), would r increase, decrease, or stay about the same?
20. A researcher wanted to find the relationship between the weight and height of middle-aged women. Suppose the mean height was 168 cm and the standard deviation of the height was 4.5 cm. Suppose the mean weight was 58 kg and the standard deviation of the weight was 5.1 kg. Suppose r = 0.6153.
   a) Find the equation of the best-fit line. Let x = height and y = weight.
   b) If a woman was 174 cm tall, use the best-fit line from part (a) to predict her weight.
   c) How reliable is your prediction from part (b)? Explain using concepts about correlation.
21. Match the following graphs to the following correlations (one value of r will not be used):
   a) r = 0.97
   b) r = -0.52
   c) r = 0.76
   d) r = 0.04
   e) r = -0.96

   [Graphs 1, 2, 3, and 4]

22. (multiple choice) The points on a scatterplot lie very close to the line y = 4 - 3x. The correlation between x and y is close to...
   a) -3
   b) -1
   c) 0
   d) 1
   e) 4
23. Below is a scatterplot.
   a) Use the scatterplot below to estimate the slope of the least-squares regression line.
   b) Use the scatterplot below to estimate the y-intercept.
   c) Use parts (a) and (b) to write out the equation of the least-squares regression line.

Answers

1. Unemployment rate. This is the variable being used to try to explain and/or predict the crime rate.
2. Crime rate. This is the variable that responds to the unemployment rate.
3. The direction is positive. There is a moderate amount of scatter. The form looks curvilinear, but there are only 5 points, so it's hard to establish any definite trends.
4. Mean unemployment rate: $\bar{x} = 14.1$ percent
   SD of unemployment rate: $s_x = 5.34$ percentage points
5. Mean of crime rate: $\bar{y} = 155.2$ incidents per million people
   SD of crime rate: $s_y = 39.32$ incidents per million people
6. r = 0.804 (rounded off). This means the correlation is fairly strong and the direction is positive. Below are the calculations:

   unemployment rate (x)   crime rate (y)   $x - \bar{x}$   $y - \bar{y}$   $(x - \bar{x})(y - \bar{y})$
   23.1                    195                9.0             39.8           358.20
    9.4                    105               -4.7            -50.2           235.94
   14.3                    187                0.2             31.8             6.36
   12.5                    165               -1.6              9.8           -15.68
   11.2                    124               -2.9            -31.2            90.48
                                                              total          675.30

   Divide the total by 5.34, then by 39.32, then by 4 (which is n - 1):
   $r = \frac{675.3}{(4)(5.34)(39.32)} = 0.804$
7. $b = r \cdot \frac{s_y}{s_x} = 0.804 \cdot \frac{39.32}{5.34} = 5.92$
   $a = \bar{y} - b\bar{x} = 155.2 - (5.92)(14.1) = 71.7$
   The least-squares regression line is (using symbols)
   $\hat{y} = 71.7 + 5.92x$
   or (using words)
   predicted crime rate = 71.7 + 5.92(unemployment rate)
8. The slope is 5.92. It means that each time the unemployment rate increases by one percentage point, the predicted crime rate increases by 5.92 incidents per million people.
9. The y-intercept is 71.7. This means that if the unemployment rate were zero, the predicted crime rate would be 71.7 incidents per million people.
10. $\hat{y} = 71.7 + 5.92x = 71.7 + (5.92)(13.6) = 152.2$
    The predicted crime rate would be 152.2 incidents per million people.
11. The scatterplot shows a moderate amount of scatter, so the prediction would be somewhat accurate.
    However (this is just a side comment), the sample size of only 5 cities is not very large. With a larger sample we could make a more confident prediction.
12. Residual = $y - \hat{y}$ = 154.0 - 152.2 = 1.8 incidents per million people.
    The residual is positive, which means the actual value is 1.8 units higher than the predicted value. A negative residual would mean the actual value is lower than the predicted value.
13. x      y     $\hat{y}$   residual   residual squared
    23.1   195   208.5       -13.5      182.25
     9.4   105   127.4       -22.4      501.76
14. No, because correlation does not imply causation. There could be other variables that cause both the unemployment rate and the crime rate to increase together. However, the scatterplot DOES tell us that there is some kind of relationship between unemployment rate and crime rate.
15. There are many possible answers. Here are some examples: level of education of the young men, drug use, disabilities. Anything that increases both the unemployment rate and the crime rate would be a confounding variable.
16. Any example where the unemployment rate is outside the range of the given data (less than 9% or more than 23%) would be extrapolation. This is not necessarily valid because there is no data to show that the crime rate will continue to increase in the same way.
17. d
18. Direction is positive, strength is moderate to strong, form is linear.
19. It would decrease because the outlier would create a greater amount of scatter and a weaker correlation. r is sensitive to outliers.
20. a) $b = r \cdot \frac{s_y}{s_x} = 0.6153 \cdot \frac{5.1}{4.5} = 0.69734$
       $a = \bar{y} - b\bar{x} = 58 - 0.69734(168) = -59.15$
       $\hat{y} = -59.15 + 0.69734x$
    b) $\hat{y} = -59.15 + 0.69734(174) \approx 62.18$ kg
    c) This is not too reliable because r is fairly low (0.6153), which means there is a lot of scatter around the line.
21. Graph 1: e; Graph 2: d; Graph 3: a; Graph 4: c
22. b (the line has a negative slope and the points lie very close to it, so r is close to -1)
23. a) The slope is about -0.8.
    b) The y-intercept is about 85.
    c) $\hat{y} = 85 - 0.8x$
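
To check the hand calculations in answers 4-12 and 20, the same formulas can be run in a short script. This is only a sketch using Python's standard library. The helper names `correlation` and `line_from_summary` are made up for this illustration and are not part of the module. It works with unrounded intermediate values, so the last digit can differ slightly from a calculation that rounds at each step.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Sample correlation: sum of (x - x_bar)(y - y_bar), divided by (n - 1) * s_x * s_y."""
    x_bar, y_bar = mean(xs), mean(ys)
    products = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return products / ((len(xs) - 1) * stdev(xs) * stdev(ys))


def line_from_summary(x_bar, s_x, y_bar, s_y, r):
    """Return (intercept a, slope b) using b = r * s_y / s_x and a = y_bar - b * x_bar."""
    b = r * s_y / s_x
    a = y_bar - b * x_bar
    return a, b


# Answers 4-12: data from the table of five cities.
unemployment = [23.1, 9.4, 14.3, 12.5, 11.2]  # x, percent
crime = [195, 105, 187, 165, 124]             # y, offenses per million people

r = correlation(unemployment, crime)
a, b = line_from_summary(mean(unemployment), stdev(unemployment),
                         mean(crime), stdev(crime), r)
predicted = a + b * 13.6    # Fairview's predicted crime rate (question 10)
residual = 154 - predicted  # observed minus predicted (question 12)

print(f"r = {r:.3f}, slope = {b:.2f}, intercept = {a:.1f}")
print(f"prediction = {predicted:.1f}, residual = {residual:.1f}")

# Answer 20: only summary statistics are given (x = height in cm, y = weight in kg).
a20, b20 = line_from_summary(168, 4.5, 58, 5.1, 0.6153)
weight = a20 + b20 * 174    # predicted weight for a height of 174 cm

print(f"slope = {b20:.5f}, intercept = {a20:.2f}, predicted weight = {weight:.2f} kg")
```

Building the line from the means, standard deviations, and r (rather than from the raw data) lets one function handle both problems, since question 20 supplies only the summary statistics.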