Monday, February 13: 3.1 Scatterplots

Candy Grab Activity

Read 143–144
What is the purpose of studying relationships between two variables?
What is the difference between an explanatory variable and a response variable?

Read 145–149
How do you know which variable to put on which axis of a scatterplot?
Where do you start each axis?
What is the easiest way to lose points when making a scatterplot? (xkcd.com/833)
What four characteristics should you consider when interpreting a scatterplot?

The following scatterplot shows the amount of sodium (in milligrams) and amount of fat (in grams) in salads from McDonald’s (with no dressing). Describe the relationship between sodium and fat.

HW #16 p159 (1, 2, 3, 4, 5, 7, 9, 11)

Tuesday, February 14: 3.1 Correlation

Read 150–151
What is the correlation r?
What are some characteristics of the correlation?
Which association is more linear: one with r = 0.50 or one with r = 0.90?
Is correlation a resistant measure of strength?

Read 152–157
How do you calculate correlation?

Here are data on the lengths in centimeters of the femur (a leg bone) and the humerus (a bone in the upper arm) for the five specimens of archaeopteryx that preserve both bones:

Femur (x):     38   56   59   64   74
Humerus (y):   41   63   70   72   84

(a) Make a scatterplot. Describe what you see.
(b) Find the correlation for these data. Does it agree with your description in part (a)?

What are some additional facts about correlation?

HW #16: page 159 (15–18, 21–24)

Wednesday, February 15: 3.2 Least-Squares Regression Lines

Read 164–168
What is the general form of a regression equation?
What is the difference between y and ŷ?

Don’t you hate it when you open a can of soda and some of the contents spray out of the can? Two AP Statistics students, Kerry and Danielle, wanted to investigate if tapping on a can of soda would reduce the amount of soda expelled after the can has been shaken.
For their experiment, they vigorously shook 40 cans of soda and randomly assigned each can to be tapped for 0 seconds, 4 seconds, 8 seconds, or 12 seconds. Then, after opening the can and cleaning up the mess, the students measured the amount of soda left in each can (in ml). Here is a scatterplot with the least-squares regression line.

Predict the amount remaining for a can that has been tapped for 10 seconds.
Predict the amount remaining for a can that has been tapped for 60 seconds. How confident are you in this prediction?
What is extrapolation? Is it a good idea to extrapolate?
Interpret the slope and y-intercept of the regression line.

Read 168–172
What is a residual? How do you interpret a residual?
Calculate and interpret the residual for the can that was tapped for 4 seconds and had 260 ml of soda remaining.

Here are data on some of McDonald’s beef sandwiches:

x = Carbs (g)   y = Fat (g)
31              9
33              12
34              23
37              19
40              26
40              42
45              29
37              24
38              28

How can we determine the “best” regression line for a set of data?
Is the least-squares regression line resistant to outliers?
Calculate the equation of the least-squares regression line using technology. Make sure to define variables!
Sketch the scatterplot with the graph of the least-squares regression line.
Interpret the slope and y-intercept in context.
Calculate and interpret the residual for the Big Mac, with 45g of carbs and 29g of fat.

HW #17: page 163 (27–32), page 193 (39–42, 45, 46)

Friday, February 17: 3.2 Assessing a Regression Line

Read 172–176
What is a residual plot? What is the purpose of a residual plot?
What do you look for in a residual plot? How can you tell if a linear model is appropriate?
Construct and interpret a residual plot for the Candy Grab data.

Read page 177–181
In the Candy Grab activity, we started with only one variable: number of starburst grabbed.
To predict how many candies a randomly selected student would grab, we decided it would be easier if we used another variable to help us make the prediction (hand span) rather than just the mean number of starburst. How much better will our predictions be if we use hand span to predict number of starburst rather than just the mean number of starburst? We can answer this question two ways: s and r².

What is the standard deviation of the residuals s? How do you calculate and interpret it?
For the candy grab data, the standard deviation of the residuals is s = _______. Interpret this value.
Instead of comparing the standard deviations, we can also compare the sum of squared residuals using each model.
How do you interpret r², the coefficient of determination?
How is r² related to r? How is r² related to s?

HW #18: page 194 (47, 49, 51, 56, 57)

Monday, February 20: Interpreting Computer Output, Regression to the Mean (90 min)

Read 181–182
When Mentos are dropped into a newly opened bottle of Diet Coke, carbon dioxide is released from the Diet Coke very rapidly, causing the Diet Coke to be expelled from the bottle. Will more Diet Coke be expelled when there is a larger number of Mentos dropped in the bottle? Two statistics students, Brittany and Allie, decided to find out. Using 16 ounce (2 cup) bottles of Diet Coke, they dropped either 2, 3, 4, or 5 Mentos into a randomly selected bottle, waited for the fizzing to die down, and measured the number of cups remaining in the bottle. Then, they subtracted this measurement from the original amount in the bottle to calculate the amount of Diet Coke expelled (in cups).

[Figures: scatterplot of Amount Expelled vs. Mentos; residual plot of Residual vs. Fitted Value]

(a) What is the equation of the least-squares regression line? Define any variables you use.
Predictor   Coef      SE Coef   T       P
Constant    1.00208   0.04511   22.21   0.000
Mentos      0.07083   0.01228   5.77    0.000
S = 0.0672442   R-Sq = 60.2%   R-Sq(adj)

(b) Interpret the slope of the least-squares regression line.
(c) What is the correlation?
(d) Is a linear model appropriate for this data? Explain.
(e) Would you be willing to use the linear model to predict the amount of Diet Coke expelled when 10 Mentos are used? Explain.
(f) Calculate and interpret the residual for the bottle of Diet Coke that had 2 Mentos and lost 1.25 cups.
(g) Interpret the values of r² and s.
(h) If the amount expelled was measured in ounces instead of cups, how would the values of r² and s be affected? Explain.

Read 182–185
How can you calculate the equation of the least-squares regression line using summary statistics?
What happens to the predicted value of y for each increase of 1 standard deviation in x?
What is regression to the mean?

HW #19 page 195 (59*, 61, 63, 65, 71–78)
*output below for ebook users

Predictor        Coef     SE Coef   T       P
Constant         266.07   52.15     5.10    0.004
Breeding pairs   -6.650   1.736     -3.83   0.012
S = 7.76227   R-Sq = 74.6%   R-Sq(adj) = 69.5%

Tuesday, February 21: Putting it all Together: Regression and Correlation / QUIZ (90 min)

Read 185–191
Does it matter which variable is x and which is y?
Which of the following has the greatest correlation?
How do outliers affect the correlation, least-squares regression line, and standard deviation of the residuals? Are all outliers influential?

Here is a scatterplot showing the cost in dollars and the battery life in hours for a sample of netbooks (small laptop computers). What effect do the two netbooks that cost $500 have on the equation of the least-squares regression line, correlation, standard deviation of the residuals, and r²? Explain.

Here is a scatterplot showing the relationship between the number of fouls and the number of points scored for NBA players in the 2010-2011 season.
a) Describe the association.
b) Should NBA players commit more fouls if they want to score more points? Explain.

HW #20 page 202 Chapter 3 Review Exercises

Monday, February 27: Chapter 12 Introduction

Many people believe that students learn better if they sit closer to the front of the classroom. Does sitting closer cause higher achievement, or do better students simply choose to sit in the front? To investigate, an AP Statistics teacher randomly assigned students to seat locations in his classroom for a particular chapter and recorded the test score for each student at the end of the chapter. The explanatory variable in this experiment is which row the student was assigned (Row 1 is closest to the front and Row 7 is the farthest away). Do these data provide convincing evidence that sitting closer causes students to get higher grades?

Row 1: 76, 77, 94, 99
Row 2: 83, 85, 74, 79
Row 3: 90, 88, 68, 78
Row 4: 94, 72, 101, 70, 79
Row 5: 76, 65, 90, 67, 96
Row 6: 88, 79, 90, 83
Row 7: 79, 76, 77, 63

[Figure: scatterplot of Score vs. Row]

Predictor   Coef      SE Coef   T       P
Constant    85.706    4.239     20.22   0.000
Row         -1.1171   0.9472    -1.18   0.248
S = 10.0673   R-Sq = 4.7%   R-Sq(adj) = 1.3%

1. Describe the association shown in the scatterplot.
2. Using the computer output, determine the equation of the least-squares regression line.
3. Calculate the value of the correlation.
4. Calculate and interpret the residual for the student who sat in Row 1 and scored 76.
5. Interpret the slope of the least-squares regression line.
6. Interpret the standard deviation of the residuals.
7. Interpret the value of r².
8. Explain why it was important to randomly assign the students to seats rather than letting each student choose his or her own seat.
9. What other variables did the teacher control? What is the purpose of keeping these variables the same?
10.
Does the negative slope provide convincing evidence that sitting closer causes higher achievement, or is it plausible that the association is due to the chance variation in the random assignment? Let’s do a simulation to find out!

Ask me about the Flash Cards!

Tuesday, February 28: 12.1 Significance Tests for β

Read 742–746
What is the regression model when the conditions for inference are met?
Suppose that the true regression line for the seating chart study is ŷ = 87 – 1x with σ = 10 and that the regression model is valid. What is the probability that someone sitting in the second row will get at least 90 on the test?
What are the conditions for regression inference? How do we check them?

Read 753–757
What is the standard error of the slope? Is this on the formula sheet? How do you interpret it?
For the seating chart experiment, state and interpret the standard error of the slope.
What is the standardized test statistic for a significance test for the slope? Is this formula on the formula sheet? What degrees of freedom should you use?
What is the evidence for Ha and the two explanations in the crying and IQ example?

Do customers who stay longer at buffets give larger tips? Charlotte, an AP statistics student who worked at an Asian buffet, decided to investigate this question for her second semester project. While she was doing her job as a hostess, she obtained a random sample of receipts, which included the length of time (in minutes) the party was in the restaurant and the amount of the tip (in dollars). Output from a linear regression analysis on these data is shown below.
Predictor        Coef      SE Coef   T      P
Constant         4.535     1.657     2.74   0.021
Time (minutes)   0.03013   0.02448   1.23   0.247
S = 1.77931   R-Sq = 13.2%   R-Sq(adj) = 4.5%

Time (minutes):   23     39     44     55     61     65     67     70     74     85     90     99
Tip (dollars):    5.00   2.75   7.75   5.00   7.00   8.88   9.01   5.00   7.29   7.50   6.00   6.50

(a) What is the equation of the least-squares regression line for predicting the amount of the tip from the length of the stay? Define any variables you use.
(b) Interpret the slope and y-intercept of the least-squares regression line in context.
(c) Do these data provide convincing evidence that customers who stay longer give larger tips?

Can you use your calculator to conduct a test for the slope?
What is p-hacking? http://fivethirtyeight.com/features/science-isnt-broken/

HW #21: page 761 (1, 2, 13, 15)

Wednesday, March 1: Confidence Intervals for β (half-day)

Read 747–751
What is the formula for constructing a confidence interval for a slope? Where do you get the value of t*? How many degrees of freedom should you use?

For their second-semester project, two AP Statistics students decided to investigate the effect of sugar on the life of cut flowers. They went to the local grocery store and randomly selected 12 carnations. All the carnations seemed equally healthy when they were selected. When the students got home, they prepared 12 identical vases with exactly the same amount of water in each vase. They put one tablespoon of sugar in 3 vases, two tablespoons of sugar in 3 vases, and three tablespoons of sugar in 3 vases. In the remaining 3 vases, they put no sugar. After the vases were prepared and placed in the same location, the students randomly assigned one flower to each vase and observed how many hours each flower continued to look fresh. Here are the data and computer output.

Predictor     Coef      SE Coef   T       P
Constant      181.200   3.635     49.84   0.000
Sugar (tbs)   15.200    1.943     7.82    0.000
S = 7.52596   R-Sq = 86.0%   R-Sq(adj) = 84.5%

Sugar (tbs.)
0 0 0 1 1 1 2 2 2 3 3 3
Freshness (hours)
168 180 192 192 204 204 204 210 210 222 228 234

Construct and interpret a 99% confidence interval for the slope of the true regression line.
Can you use your calculator for the DO step?

HW #22: page 759 (3–11 odd, 17, 19–24)

Thursday, March 3: Chapters 3/12.1 Review

A random sample of 11 high schools was selected from Michigan. The percent of students who qualify for free/reduced lunch and the average SAT math score of each high school in the sample were recorded. Here are a scatterplot with the least-squares regression line, a residual plot, and some computer output.

Predictor     Coef     SE Coef   T       P
Constant      577.9    12.5      46.16   0.000
Foot length   -1.993   0.276     -7.22   0.000
S = 23.3168   R-Sq = 85.29%   R-Sq(adj) = 83.66%

(a) Describe the association in the scatterplot.
(b) What is the equation of the least-squares regression line that describes the relationship between percent free/reduced lunch and average SAT math score? Define any variables that you use.
(c) Interpret the slope and y-intercept of the least-squares regression line.
(d) Calculate and interpret the residual for East Kentwood High School, with 58% FRL and 490 SAT.
(e) By about how much do the actual average SAT math scores typically vary from the values predicted by the least-squares regression line with x = percent free/reduced lunch?
(f) Interpret the value of r².
(g) Find the correlation.
(h) What effect does the school with 19% FRL and 545 SAT have on the correlation? The standard deviation of the residuals? The equation of the least-squares regression line? Explain.
(i) Do these data provide convincing evidence that schools with a greater percentage of students that qualify for free/reduced lunch have smaller average SAT scores?
(j) Construct and interpret a 95% confidence interval for the true slope of the least-squares regression line relating average SAT score to percent of students that qualify for free/reduced lunch.
Is this interval consistent with your decision in the previous question? Explain.

HW #23: page 795 (1–4), page 797 (1, 3–8, 11)

Monday, March 6: Chapters 3/12.1 Review

HW #24: page 203 Chapter 3 AP Practice Test (1–13)

Tuesday, March 7: Chapters 3/12.1 Test

Wednesday, March 8: 12.2 Transformations to Achieve Linearity—Power Models

Read 765–771
When associations are non-linear, what two approaches can we take to model the associations? Which approach will we use?
What is a power model? What are some examples of power models?

What does a country’s income per person (measured in gross domestic product per person, adjusted for purchasing power) say about the under-5 child mortality rate (per 1000 live births) in that country? Here are the data for a random sample of 14 countries in 2009 (data from www.gapminder.org).

(a) Sketch a scatterplot and describe why it would not be appropriate to compute a least-squares regression line.

Country       Income Per Person   Under-5 Mortality Rate
Switzerland   38,004              4.4
Timor-Leste   2476                56.4
Uganda        1202                127.5
Ghana         1383                68.5
Peru          7859                21.3
Cambodia      1831                87.5
Suriname      8199                26.3
Armenia       4523                21.6
Sweden        32,021              2.8
Niger         643                 160.3
Serbia        10,005              7.1
Kenya         1494                84
Fiji          4016                17.6
Grenada       8827                14.5

(b) What kind of power model might be appropriate to use for these data?
(c) Use an appropriate power transformation to make the association linear. Sketch the resulting scatterplot, calculate the equation of the least-squares regression line, and sketch the residual plot.
(d) Predict the child mortality rate for the United States, which has an income per person of 41,256.

HW #25 page 785 (31–34) **NOTE: On 33 and 34, make sure to use BOTH models.

Friday, March 10: Transforming with Logs: Power and Exponential Models

Read 771–774
Besides using power transformations, how can you linearize an association that follows a power model in the form y = ax^p?

(e) Refer to the child mortality example.
Use an appropriate logarithmic transformation to make the association linear. Sketch the resulting scatterplot, calculate the equation of the least-squares regression line, and sketch the residual plot.
(f) Predict the child mortality rate for the United States, which has an income per person of 41,256.

Read 774–782
What is an exponential model? What are some examples of exponential models?
How can you linearize a relationship that follows an exponential model?

In the April 1, 2011 edition of the Arizona Daily Star, the following data was presented about mandatory fees at the University of Arizona.

Year      Fees
2004-05   89
2005-06   93
2006-07   160
2007-08   213
2008-09   257
2009-10   302
2010-11   623
2011-12   913

(a) Letting x = 4 represent 2004-05, graph a scatterplot. Does the relationship look linear?
(b) Sketch a scatterplot of ln(fees) vs. year, calculate the equation of the least-squares regression line, and sketch a residual plot.
(c) Could a power model be better? Sketch a scatterplot of ln(fees) vs. ln(year), calculate the equation of the least-squares regression line, and sketch a residual plot.
(d) Based on your answers to (b) and (c), would an exponential model or a power model be more appropriate for this data? Explain.
(e) Use the model you chose in part (d) to predict the fees for the 2012–13 school year. Do you expect this prediction to be too low, too high, or just about right? Explain.

HW #26: page 787 (35–41 odd)

Monday, March 20: Prop 301 Test (sub)

HW #27 Article and Questions

Tuesday, March 21: Chapter 12.2 Review

A student opened a bag of M&M’s, dumped them out, and ate all the ones with the M on top. When he finished, he put the remaining 30 M&M’s back in the bag and repeated the same process over and over until all the M&M’s were gone. Here is a table showing the number of M&M’s remaining at the end of each “course”.

(a) Sketch a scatterplot of this data. Does the relationship look like it follows a linear, exponential, or power model?
Course:             1    2    3    4   5   6   7
M&M’s remaining:    30   13   10   3   2   1   0

(b) A scatterplot of the natural log of the number of M&M’s remaining versus course number is shown below. The last observation in the table is not included since ln(0) is undefined. Explain why it would be reasonable to use an exponential model to describe the relationship between the number of M&M’s remaining and the course number.

[Figure: scatterplot of lnRemaining vs. Course]

(c) Minitab output from a linear regression analysis on the transformed data is shown below. Give the equation of the least-squares regression line, defining any variables you use.

Predictor   Coef       SE Coef   T        P
Constant    4.0593     0.1852    21.92    0.000
Course      -0.68073   0.04755   -14.32   0.000
S = 0.198897   R-Sq = 98.1%   R-Sq(adj) = 97.6%

(d) Use your model from part (c) to predict the original number of M&M’s in the bag.
(e) Calculate a 95% confidence interval for the slope of the least-squares regression line using the transformed data. Assume all the conditions for inference have been met.
(f) Super-fun bonus question! In theory, the number of M&M’s remaining after each course should be half of the number left after the previous course. Is your confidence interval in part (e) consistent with this theory?

HW #28: page 789 (43, 45, 47–50), page 796 (5, 6)

Wednesday, March 22: Chapter 12.2 Review / FRAPPY

Friday, March 24: Quiz on 12.2
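
One way to check the arithmetic for parts (e) and (f) of the M&M’s example is sketched below. It is only an illustration, not part of the assigned work. It takes the slope and SE Coef for Course from the Minitab output, uses df = n – 2 = 4 because six courses were plotted, and assumes t* = 2.776 from a standard t table. The function name slope_confidence_interval is made up for this sketch. Under the halving theory, the slope of ln(remaining) vs. course would be ln(0.5).

```python
import math

# Values read from the Minitab output in part (c)
slope = -0.68073      # Coef for Course
se_slope = 0.04755    # SE Coef for Course
n = 6                 # courses 1-6 (course 7 is dropped because ln(0) is undefined)
df = n - 2            # degrees of freedom for inference about a slope

# Assumption: critical value for 95% confidence with df = 4, from a t table
T_STAR = 2.776


def slope_confidence_interval(b, se, t_star):
    """Return (lower, upper) for the interval b +/- t* x SE."""
    margin = t_star * se
    return b - margin, b + margin


lower, upper = slope_confidence_interval(slope, se_slope, T_STAR)

# If half of the M&M's are eaten each course, then
# ln(remaining) drops by ln(2) per course, so the slope is ln(0.5).
theory_slope = math.log(0.5)

print(f"df = {df}")
print(f"95% CI for slope: ({lower:.4f}, {upper:.4f})")
print(f"slope under the halving theory: {theory_slope:.4f}")
print("theory slope inside interval:", lower < theory_slope < upper)
```

With these inputs the interval runs from about –0.813 to –0.549, which contains ln(0.5) ≈ –0.693, so the output is consistent with the halving theory. A full answer would still need the interpretation in context.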