Unit 1 Review

1) Researchers are interested in whether administering a motivational interview prior to starting an exercise program affects the completion rate. 120 participants from an exercise center in Des Moines, Iowa were randomly divided into two groups. The intervention group received the motivational interview, and the control group participated in a short class on an unrelated topic. Researchers reported the following for each participant: participant ID#, group assignment (1: intervention, 2: control), age, gender, height, weight, and completion status (completed program, failed to complete program).

a) Identify the W’s.
Who: exercise program participants
What (variables): participant ID#, group assignment, age, gender, height, weight, completion status
When: not given
Where: Des Moines, Iowa
Why: to see if the motivational interview affects completion rates
How: 120 participants were randomly assigned to the motivational interview or the control group. Completion status was observed.

b) For the variables above, indicate whether each should be treated as categorical or quantitative. If quantitative, identify the units in which it was measured.
Participant ID#: categorical
Group assignment: categorical
Age: quantitative (presumably years)
Gender: categorical
Height: quantitative (units not given)
Weight: quantitative (presumably pounds or kilograms)
Completion status: categorical

2) The boxplots below summarize the distribution of salaries for four different states based on the same occupation.

a) In what state are “typical” salaries lowest? NV
b) Which states’ salaries appear to be skewed to the right? MA and NV
c) Supposing cost of living was the same for all four states (we know that isn’t true), in which state would you prefer to work and why? MA, because “typical” salaries are highest.

3) The table below shows whether students in an introductory statistics class like dogs and/or cats. (De Veaux et al., 2012)

                  Like Cats
Like Dogs     Yes     No      Total
Yes           194     110     304
No             21      10      31
Total         215     120     335

a) What percent of students like dogs? 304/335 = 90.75%
b) What percent of students like dogs and cats? 194/335 = 57.91%
c) What percent of students like dogs but not cats? 110/335 = 32.84%
d) What is the marginal distribution (counts and percents) of “liking dogs”?
Like Dogs:    Yes       No       Total
Count         304       31       335
Percent       90.75%    9.25%    100%
e) What percent of students who like dogs also like cats? 194/304 = 63.82%
f) What percent of students who like cats do not like dogs? 21/215 = 9.77%
g) What percent of students like dogs or cats? (194 + 110 + 21)/335 = (215 + 304 - 194)/335 = 325/335 = 97.01%
h) What is the conditional distribution of “liking dogs” for students who like cats?
Like Cats
Like Dogs:    Yes       No       Total
Count         194       21       215
Percent       90.2%     9.8%     100%
i) What is the conditional distribution of “liking dogs” for students who do not like cats?
Do Not Like Cats
Like Dogs:    Yes       No       Total
Count         110       10       120
Percent       91.7%     8.3%     100%
j) Do “liking dogs” and “liking cats” appear to be independent? Give statistical evidence to support your conclusion. The conditional distributions of liking dogs are very similar for students who like cats (90.2% like dogs) and students who do not like cats (91.7% like dogs). Therefore, these variables appear to be independent.

4) The following data represent unit exam scores from a group of 20 students:
85 97 70 76 0 92 65 91 84 85 77 93 90 81 68 78 87 95 89 72

a) Find the 5 number summary for the distribution. From the TI:
Min    Q1    Med     Q3      Max
0      74    84.5    90.5    97
b) Find the mean, standard deviation, range and IQR.
x̄ = 78.75, sx = 20.74, range = 97 - 0 = 97, IQR = 90.5 - 74 = 16.5
c) Create a boxplot for these data. Calculate upper and lower fences. Label the plot.
Upper fence: Q3 + 1.5(IQR) = 90.5 + 1.5(16.5) = 115.25
Lower fence: Q1 - 1.5(IQR) = 74 - 1.5(16.5) = 49.25
Plot: [boxplot on an axis from 0 to 100: outlier (*) at 0, whiskers from 65 to 97, box from 74 to 90.5 with the median line at 84.5]
d) What would be the most appropriate measure of center and spread for this distribution? Why? Center: median. Spread: IQR. Why? Because the distribution has an outlier (and a slight skew to the left).
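The percentages in Problem 3 all come from dividing one cell or margin of the table by another. The following is a minimal sketch of that arithmetic, assuming plain Python; the dictionary layout, the `pct` helper, and the variable names are illustrative and not part of the worksheet.

```python
# Counts from the Problem 3 table; keys are (likes_dogs, likes_cats).
counts = {
    ("yes", "yes"): 194,
    ("yes", "no"): 110,
    ("no", "yes"): 21,
    ("no", "no"): 10,
}
total = sum(counts.values())  # all students in the table


def pct(part, whole):
    """Return part/whole as a percentage rounded to 2 decimals."""
    return round(100 * part / whole, 2)


# Marginal totals
dogs_yes = counts[("yes", "yes")] + counts[("yes", "no")]
cats_yes = counts[("yes", "yes")] + counts[("no", "yes")]
cats_no = total - cats_yes

like_dogs = pct(dogs_yes, total)                            # part a
like_both = pct(counts[("yes", "yes")], total)              # part b
dogs_not_cats = pct(counts[("yes", "no")], total)           # part c
cats_given_dogs = pct(counts[("yes", "yes")], dogs_yes)     # part e
no_dogs_given_cats = pct(counts[("no", "yes")], cats_yes)   # part f
dogs_or_cats = pct(total - counts[("no", "no")], total)     # part g
dogs_given_cats = pct(counts[("yes", "yes")], cats_yes)     # part h
dogs_given_no_cats = pct(counts[("yes", "no")], cats_no)    # part i

print(like_dogs, like_both, dogs_not_cats, cats_given_dogs)
print(no_dogs_given_cats, dogs_or_cats, dogs_given_cats, dogs_given_no_cats)
```

Comparing `dogs_given_cats` with `dogs_given_no_cats` is the check used in part j: the closer the two conditional percentages, the weaker the evidence of an association.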
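The summary statistics and fences in Problem 4 can be checked without a calculator. This sketch assumes quartiles are taken as the medians of the lower and upper halves of the sorted data, which reproduces the TI values quoted in the answer; other software may use a different quartile rule and give slightly different Q1 and Q3. The function and variable names are illustrative.

```python
import statistics

# Exam scores from Problem 4
scores = [85, 97, 70, 76, 0, 92, 65, 91, 84, 85,
          77, 93, 90, 81, 68, 78, 87, 95, 89, 72]


def five_number_summary(data):
    """Min, Q1, median, Q3, max, with quartiles as medians of each half."""
    s = sorted(data)
    n = len(s)
    half = n // 2
    lower = s[:half]           # values below the median position
    upper = s[half + n % 2:]   # skip the middle value when n is odd
    return (s[0], statistics.median(lower), statistics.median(s),
            statistics.median(upper), s[-1])


minimum, q1, med, q3, maximum = five_number_summary(scores)
iqr = q3 - q1

# 1.5 * IQR rule for flagging outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in scores if x < lower_fence or x > upper_fence]

mean = statistics.mean(scores)
sd = statistics.stdev(scores)  # sample standard deviation (n - 1 divisor)

print(minimum, q1, med, q3, maximum)
print(iqr, lower_fence, upper_fence, outliers)
print(mean, round(sd, 2))
```

The single score of 0 falls below the lower fence, which is why the answer to part d prefers the median and IQR over the mean and standard deviation.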
5) Suppose one of the authors of our textbook collected the times (in minutes) it took him to run 4 miles on various courses during a 10-year period. Here is a histogram of the times: (De Veaux et al., 2014)

a) Describe this histogram’s shape. Be sure to mention modality, symmetry, and unusual features. This distribution is unimodal, skewed to the right, and has a gap between 33.5 and 34 minutes.
b) Based on this shape, would you expect the mean or median to be higher? Why? Mean. The mean is pulled in the direction of the skew.
c) What would be the most appropriate measure of center and spread for this distribution? Why? Median and IQR, because the distribution is skewed.
d) Approximately how many times did the author run 4 miles in 33 minutes or more? Approximately 5 + 4 + 1 + 1 = 11 times. We would know more precisely if the bar frequency counts were labeled.

6) Jennifer took the ACT and scored a 29. She also took the SAT and scored a 1350 (Critical Reading + Mathematics). ACT scores have a mean of 21.0 and a standard deviation of approximately 4.7. SAT scores have a mean of 1026 and a standard deviation of 209. On which test did she perform relatively better?
ACT: x = 29, x̄ = 21.0, s = 4.7, so z_ACT = (x - x̄)/s = (29 - 21.0)/4.7 = 1.70
SAT: x = 1350, x̄ = 1026, s = 209, so z_SAT = (x - x̄)/s = (1350 - 1026)/209 = 1.55
She performed relatively better on the ACT, because her ACT score is more standard deviations above the mean.

7) The costs for standard veterinary services at a local animal hospital follow a Normal model with a mean of $90 and a standard deviation of $30. (De Veaux et al., 2012)
Diagrams are recommended. μ = 90, σ = 30, model N(90, 30)

a) What percentage of standard veterinary bills will be greater than $175? normalcdf(175, 1E99, 90, 30) = .0023 = 0.23%
b) What percentage of standard veterinary bills will be between $80 and $150? normalcdf(80, 150, 90, 30) = .6078 = 60.78%
c) What would be the veterinary bill amount that separates the cheapest 25% of visits for standard services? invNorm(.25, 90, 30) = 69.765, or $69.77
d) What is the range of costs for the middle 70% of standard visits?
The middle 70% leaves 15% in each tail: invNorm(.15, 90, 30) = 58.91 and invNorm(.85, 90, 30) = 121.09, so the range is $58.91 to $121.09.
e) What is the IQR for the costs of standard veterinary services? invNorm(.25, 90, 30) = 69.765 and invNorm(.75, 90, 30) = 110.235, so IQR = Q3 - Q1 = 110.235 - 69.765 = $40.47

8) A machine fills boxes of pasta according to a Normal model with mean 16.2 ounces and standard deviation of 0.1 ounces. Model: N(16.2, 0.1)

a) If the boxes claim to have 16 ounces (1 lb) of pasta each, what percent of boxes are under-filled? normalcdf(-1E99, 16, 16.2, 0.1) = .02275, so about 2.28% are under-filled.
b) What is the z-score of a box of pasta that contains only 15.92 ounces? z = (x - μ)/σ = (15.92 - 16.2)/0.1 = -2.8
c) How many ounces of pasta does a box contain if its z-score is 1.7? (x - 16.2)/0.1 = 1.7, so x = 16.2 + 1.7(0.1) = 16.37 oz
d) Explain in your own words: What does a z-score indicate? A z-score tells us how many standard deviations a value is above (+) or below (-) the mean.

9) Describe the scatterplots in terms of shape (linear/non-linear), direction, and strength.
First plot: linear, negative, moderate/weak
Second plot: linear, positive, strong

10) Several scatterplots are given with calculated correlations. Which is which?
c) 0.004 d) 0.753 a) -0.944 b) -0.435

11) Georgetown University reports the average GPA of graduating seniors for a selection of years:
Year    Average GPA
1974    3.05
1980    3.13
1987    3.22
1994    3.29
1999    3.29
2002    3.37
2005    3.42
2006    3.42
Source: Committee on Intellectual Life, Georgetown University, March 2007

a) What would be appropriate choices for the explanatory and response variables? Why? Explanatory: year. Response: GPA. It is not possible for GPA to affect the year, but it is possible that GPAs have changed over time due to a variety of factors.
b) Create and describe the scatterplot in terms of form, direction, and strength. Is a linear model appropriate? Strong, positive, and linear. Yes, a linear model is appropriate.
c) Compute the correlation coefficient. In what ways is the correlation coefficient consistent with your description in part b?
r = .988. This is consistent with a strong, positive linear association.
d) Compute the linear model to 5 decimal places. ŷ = 0.01112x - 18.90183 (if you round too much, this will affect part f)
e) Interpret the slope and y-intercept in the context of the scenario. Slope: the model predicts that for every additional year, average GPA will increase by 0.01112 points. y-intercept: the model suggests that in the year 0, average GPA would have been -18.90. Clearly the model does not apply that far back in time.
f) Use the linear model to predict the average GPA for the year 2015. Round to 2 decimal places. ŷ = 0.01112(2015) - 18.90183 = 3.50
g) Suppose the actual average GPA of graduating seniors in the year 2015 is 3.48. Compute and interpret the residual. Residual: e = y - ŷ = 3.48 - 3.50 = -0.02. Interpretation: the model over-estimated the average GPA of graduating seniors by 0.02.
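The correlation, fitted line, prediction, and residual in Problem 11 can be reproduced from the data table. This is a minimal sketch using the textbook least-squares formulas in plain Python; the helper name `least_squares` and the variable names are illustrative, and the rounding to 5 decimals mirrors the answer to part d rather than any required procedure.

```python
import math

# Georgetown data from Problem 11
years = [1974, 1980, 1987, 1994, 1999, 2002, 2005, 2006]
gpas = [3.05, 3.13, 3.22, 3.29, 3.29, 3.37, 3.42, 3.42]


def least_squares(xs, ys):
    """Return (slope, intercept, r) for the least-squares line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


slope, intercept, r = least_squares(years, gpas)

# Coefficients rounded to 5 decimal places, as in part d
b1 = round(slope, 5)
b0 = round(intercept, 5)

# Part f: prediction for 2015 from the rounded model
predicted_2015 = round(b1 * 2015 + b0, 2)

# Part g: residual = observed - predicted
residual = round(3.48 - predicted_2015, 2)

print(round(r, 3), b1, b0, predicted_2015, residual)
```

Because the year values are large, small changes in the slope shift the prediction noticeably: carrying the unrounded coefficients through gives a value closer to 3.51, which illustrates the worksheet's warning about rounding.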
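The Normal-model answers in Problems 7 and 8 use the calculator's normalcdf and invNorm functions. As a cross-check, here is a hedged sketch with Python's standard-library `statistics.NormalDist`, where `cdf` plays the role of normalcdf with a lower bound of negative infinity and `inv_cdf` plays the role of invNorm. The variable names are illustrative and not from the worksheet.

```python
from statistics import NormalDist

# Problem 7: veterinary costs, N(90, 30)
vet = NormalDist(mu=90, sigma=30)

p_over_175 = 1 - vet.cdf(175)                    # part a: P(X > 175)
p_80_to_150 = vet.cdf(150) - vet.cdf(80)         # part b: P(80 < X < 150)
cheapest_25 = vet.inv_cdf(0.25)                  # part c: 25th percentile
middle_70 = (vet.inv_cdf(0.15), vet.inv_cdf(0.85))  # part d: 15% in each tail
vet_iqr = vet.inv_cdf(0.75) - vet.inv_cdf(0.25)  # part e: Q3 - Q1

# Problem 8: pasta boxes, N(16.2, 0.1)
pasta = NormalDist(mu=16.2, sigma=0.1)

p_underfilled = pasta.cdf(16)                    # part a: P(X < 16)
z_for_1592 = (15.92 - pasta.mean) / pasta.stdev  # part b: z = (x - mu) / sigma
x_for_z17 = pasta.mean + 1.7 * pasta.stdev       # part c: x = mu + z * sigma

print(round(p_over_175, 4), round(p_80_to_150, 4), round(cheapest_25, 2))
print(tuple(round(v, 2) for v in middle_70), round(vet_iqr, 2))
print(round(p_underfilled, 4), round(z_for_1592, 2), round(x_for_z17, 2))
```

Writing the upper-tail probability as `1 - cdf(175)` avoids the calculator's 1E99 stand-in for infinity, but the two approaches should agree to the precision shown in the answers.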