CHAPTER 3 BIVARIATE DATA

Day 0: 3.1 Describing Relationships
READ PAGES 141-149 & COMPLETE THE TWO EXAMPLES AFTER THE CHAPTER 2 TEST
***BE READY TO START DAY 1 ON TUESDAY, OCTOBER 21st***

Read 141-142
Why do we study relationships between two variables?
To make predictions, to explain phenomena

Read 143-144
On page 143, describe how we used these principles in chapters 1-2.
Explanatory variables also help predict changes in the response variable.

Alternate Example: Identify the explanatory and response variables
a) amount of rain; weed growth
b) winning percentage of a basketball team; attendance at games
c) resting pulse rate; amount of daily exercise

Read 144-149
Scatterplots are the only choice for displaying relationships between two quantitative (Q) variables.

How do you know which variable to put on which axis? Where do you start each axis?
Always plot the explanatory variable, if there is one, on the horizontal axis (the x-axis) of a scatterplot. As a reminder, we usually call the explanatory variable x and the response variable y. If there is no explanatory-response distinction, either variable can go on the horizontal axis.
The axes do not necessarily start at (0,0)!

What is the easiest way to lose points when making a scatterplot?
Not labeling the x and y axes and the scale.

Alternate Example: Track and Field Day!
The table below shows data for 13 students in a statistics class. Each member of the class ran a 40-yard sprint and then did a long jump (with a running start). Make a scatterplot of the relationship between sprint time (in seconds) and long jump distance (in inches).

Sprint Time (s)    Long Jump Distance (in)
5.41               171
5.05               184
9.49               48
8.09               151
7.01               90
7.17               65
6.83               94
6.73               78
8.01               71
5.68               130
5.78               173
6.31               143
6.04               141

Day 1: Describing Relationships
What four characteristics should you consider when interpreting a scatterplot?
Direction: positive or negative, but not an absolute relationship
Form: linear, non-linear (don't let one or two points sway you!)
Clusters (show two positive-association clusters that form a negative association overall)
Strength: how closely the points follow the form
Outliers: outside the overall pattern (no specific rule)

Does a strong association between two variables indicate a cause-and-effect relationship?
No. Examples:
AP Calculus exams and global warming
Ice cream sales and drowning deaths
(Page 148) Other explanations for manatee deaths: changes in food supply; population getting larger, but death rate staying the same.

Alternate Example: The following scatterplot shows the amount of carbs (in grams) and amount of fat (in grams) of beef sandwiches from McDonald's.
(a) Describe the relationship between carbs and fat.
(b) Is there a cause-and-effect relationship between carbs and fat? Explain.

Read 149-150: Using technology to create scatterplots
How to use different marking symbols

HW #1: page 158 (1, 5, 7, 9, 11)

Day 2: 3.1 Correlation
Just like two distributions can have the same shape and center with different spreads, two associations can have the same direction and form, but very different strengths.

Read 150-151
What is the correlation r?
Correlation measures the direction and strength of a LINEAR association between two quantitative variables. It does not assess form!

What are some characteristics of the correlation?
-1 ≤ r ≤ 1
r < 0 means negative association, r > 0 means positive association
r close to 0 means weak
r close to -1 or 1 means strong

Can you determine the form of a relationship using only the correlation?
No. You should always look at a graph of the data as well.

Is correlation a resistant measure of strength?
No.
An "outlier" in the pattern strengthens the correlation (r moves closer to -1 or 1).
An "outlier" out of the pattern weakens the correlation (r moves closer to 0).

Read 152-155
Do you need to know the formula for correlation?
NO! You should use your calculator to calculate r.

Read 155-156
What are some additional facts about correlation?
Correlation makes no distinction between the explanatory and response variables.
Because r uses the standardized values of the observations, r does not change when we change the units of x, y, or both.
The correlation r itself has no unit of measure. It is just a number.
Correlation does not describe a curved relationship.

HW #2: page 160 (15-18, 21, 27-32)

Day 3: 3.2 Least-Squares Regression
Read 164-167
What is the general form of a regression equation? What is the difference between y and ŷ?
ŷ = a + bx
ŷ (read "y hat") is the predicted value of the response variable for a given value of the explanatory variable x.
b is the slope, the amount by which y is predicted to change when x is increased by one unit.
a is the y-intercept, the predicted value of y when x = 0.
y is the actual value of the response variable for a given value of the explanatory variable x.
Formula sheet on AP exam

Alternate Example: Used Hondas
The following scatterplot shows the number of miles driven (in thousands) and advertised price (in thousands) for 11 used Honda CR-Vs from the 2002-2006 model years. The regression line shown on the scatterplot is ŷ = 18773 - 0.08618x. Note that the y-intercept doesn't appear on the graph. Why?

Miles Driven    Price ($)
22000           17998
29000           16450
35000           14998
39000           13998
45000           14599
49000           14988
55000           13599
56000           14599
69000           11998
70000           14450
86000           10998

a) Interpret the slope and y-intercept of the regression line.
b) Predict the price of a used CR-V with 50,000 miles.
c) Predict the price of a used CR-V with 250,000 miles. How confident are you in this prediction?

What is extrapolation? Is it a good idea to extrapolate?
Extrapolation is the use of a regression line for prediction far outside the interval of values of the explanatory variable x used to obtain the line. Such predictions are often not accurate, so it is not a good idea to extrapolate.
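As a rough check on parts (b) and (c) of the Used Hondas example, the sketch below plugs a mileage into the regression line quoted in these notes and flags any mileage outside the range of the data table. The function names predicted_price and is_extrapolation are made up for this illustration, and the sketch assumes x is entered in miles and ŷ comes out in dollars, which is what the table values suggest.

```python
# Regression line quoted in the notes: y-hat = 18773 - 0.08618x.
# Assumption: x = miles driven, y-hat = predicted price in dollars.
INTERCEPT = 18773
SLOPE = -0.08618

# Smallest and largest mileage in the data table.
MIN_MILES, MAX_MILES = 22000, 86000


def predicted_price(miles):
    """Predicted price from the regression line (hypothetical helper)."""
    return INTERCEPT + SLOPE * miles


def is_extrapolation(miles):
    """True when the mileage lies outside the range of the data."""
    return miles < MIN_MILES or miles > MAX_MILES


for miles in (50000, 250000):
    note = "extrapolation!" if is_extrapolation(miles) else "within the data"
    print(f"{miles} miles: predicted price {predicted_price(miles):.0f} ({note})")
```

The prediction for 250,000 miles comes out negative, which is a concrete reason not to trust the line that far beyond the data.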
Alternate Example: Using the Track and Field data from earlier, the equation of the least-squares regression line is ŷ = 305 - 27.6x, where y = long jump distance and x = sprint time.
a) Interpret the slope.
b) Does it make sense to interpret the y-intercept? Explain.

Read 168-171
What is a residual? How do you interpret a residual?
Residuals are vertical deviations that represent "leftover" variation in the response variable after fitting the regression line.
Residual = Actual - Predicted (AP)
Connection with standard deviation and z-scores
An interpretation must include the size and direction of the error (the line overpredicts/underpredicts the y value by this much).

Calculate and interpret the residual for the Honda CR-V with 86,000 miles and an asking price of $10,998.

How can we determine the "best" regression line for a set of data?
The line that makes the sum of the squared residuals as small as possible. (Least-Squares Regression Line, LSRL)

Is the least-squares regression line resistant to outliers?
Absolutely not. The LSRL is influenced by outliers.

Alternate Example: McDonald's Beef Sandwiches

Carbs (g)    Fat (g)
31           9
33           12
34           23
37           19
40           26
40           42
45           29
37           24
38           28

(a) Calculate the equation of the least-squares regression line using technology. Make sure to define variables! Sketch the scatterplot with the graph of the least-squares regression line.
(b) Interpret the slope and y-intercept in context.
(c) Calculate and interpret the residual for the Big Mac, with 45 g of carbs and 29 g of fat.

HW #3: page 191 (35, 37, 39, 41, 45)

Day 4: 3.2 More about Residuals
Read 174-178
What is a residual plot? What is the purpose of a residual plot?
A residual plot is a scatterplot of the residuals against the explanatory variable x.
The purpose of a residual plot is to investigate FORM! It helps us assess how well a regression line fits the data.

What two things do you look for in a residual plot? How can you tell if a linear model is appropriate?
Leftover patterns
Size of residuals
A linear model is appropriate if the residual plot has no pattern: a tube-like scatter around the zero line.

Construct and interpret a residual plot for the Honda CR-V data.

What is the standard deviation of the residuals? How do you calculate and interpret it?
s = how far "off" the predicted values are from the actual values, on average
We calculate s using one-variable statistics for the residual list. (The calculator divides by n - 1, while the formula for s divides by n - 2, so this gives a close approximation.)

Calculate and interpret the standard deviation of the residuals for the Honda CR-V data.
Suppose that you see a used Honda CR-V for sale. Predict the asking price for this CR-V.

Read 179-181
What is the coefficient of determination r²? How do you calculate and interpret r²?
r² is another numerical quantity that tells us how well the least-squares regression line predicts values of the response variable y.
Interpret: ____% (or proportion) of the variation in (response variable) is explained by the linear model relating (response variable) to (explanatory variable).

How is r² related to r? How is r² related to s?
r² is literally r times r, or r squared.
r² and s both measure how well the least-squares regression line models the data (how much scatter there is from the least-squares regression line).
s is measured in the units of the response variable; r² is on a standard scale.

Using the McDonald's beef sandwich data:
a) Construct and interpret a residual plot.
b) Calculate and interpret the standard deviation of the residuals.
c) Calculate and interpret r².

HW #4: page 191 (53, 55, 57, 61)

Day 5: Chapter 3 Review (so far)
Using the top ten money winners from the 2009 LPGA Tour, we can investigate the relationship between average driving distance and driving accuracy using a scatterplot. Here we will use average driving distance (in yards) as the explanatory variable and driving accuracy (proportion of drives that land in the fairway) as the response variable.

Player              Average Driving Distance    Driving Accuracy
Jiyai Shin          246.8                       0.824
Lorena Ochoa        265.2                       0.718
Ai Miyazato         254.3                       0.757
Cristie Kerr        263.7                       0.714
Na Yeon Choi        255.5                       0.733
Suzann Pettersen    268.1                       0.660
Yani Tseng          269.2                       0.654
In-Kyung Kim        249.3                       0.748
Paula Creamer       248.6                       0.811
Anna Nordqvist      245.7                       0.770

1. Explain why average driving distance should be the explanatory variable.
2. Draw a scatterplot for this association and discuss the noticeable features.
3. Calculate the equation of the least-squares regression line and graph it on the scatterplot.
4. Interpret the slope and y-intercept in the context of the problem.
5. Calculate and interpret the value of the correlation coefficient.
6. If the distance was measured in feet instead of yards, how would the correlation change? Explain.
7. Calculate and interpret the residual for Lorena Ochoa.
8. Sketch the residual plot. What information does this provide?
9. Calculate and interpret the value of s in the context of the problem.
10. Calculate and interpret the value of r² in the context of the problem.
11. If you were to use the percentage of shots that land in the fairway instead of the proportion of shots that land in the fairway, how would the values of r² and s change?
12. Predict the driving accuracy for an LPGA golfer with an average driving distance of 300 yards. How confident are you in your prediction? Explain.

HW #6: page 193 (54, 56, 58, 59, 60)

Day 6: 3.2 Computer Output / Regression Wisdom
Read 181-183
Alternate Example: Mentos and Diet Coke
When Mentos are dropped into a newly opened bottle of Diet Coke, carbon dioxide is released from the Diet Coke very rapidly, causing the Diet Coke to be expelled from the bottle. Will more Diet Coke be expelled when there is a larger number of Mentos dropped in the bottle? Two statistics students, Brittany and Allie, decided to find out.
Using 16 ounce (2 cup) bottles of Diet Coke, they dropped either 2, 3, 4, or 5 Mentos into a randomly selected bottle, waited for the fizzing to die down, and measured the number of cups remaining in the bottle. Then, they subtracted this measurement from the original amount in the bottle to calculate the amount of Diet Coke expelled (in cups). Output from a regression analysis is shown below.

[Scatterplot: Amount Expelled vs. Mentos. Residual plot: Residual vs. Fitted Value.]

Predictor    Coef       SE Coef    T        P
Constant     1.00208    0.04511    22.21    0.000
Mentos       0.07083    0.01228    5.77     0.000

S = 0.0672442    R-Sq = 60.2%    R-Sq(adj) = 58.4%

(a) What is the equation of the least-squares regression line? Define any variables you use.
(b) Interpret the slope of the least-squares regression line.
(c) What is the correlation?
(d) Is a linear model appropriate for this data? Explain.
(e) Calculate and interpret the residual for the bottle of Diet Coke that had 2 Mentos and lost 1.25 cups.
(f) Interpret the values of r² and s.

Read 183-188
Does it matter which variable is x and which is y?
Not for correlation, yes for regression.

What should you always do before calculating the correlation or least-squares regression line?
Look at the scatterplot!!!!!

How do outliers affect the correlation, least-squares regression line, and standard deviation of the residuals? Are all outliers influential?
CORRELATION    LSRL    S

Here is a scatterplot showing the cost in dollars and the battery life in hours for a sample of netbooks (small laptop computers). What effect do the two netbooks that cost $500 have on the equation of the least-squares regression line, correlation, standard deviation of the residuals, and r²? Explain.

Here is a scatterplot showing the relationship between the number of fouls and the number of points scored for NBA players in the 2010-2011 season.
a) Describe the association.
b) Should NBA players commit more fouls if they want to score more points? Explain.

HW #7: page 194 (63, 65, 71-78) OUTLIERS

Day 7: 3.2 Regression to the Mean
Read 172-174
How can you calculate the equation of the least-squares regression line using summary statistics?
These are on the formula sheet!
What happens to the predicted value of y for each increase of 1 standard deviation in x?

HW #8: page 198 Chapter 3 Review Exercises

Day 8: Review & QUIZ
Day 9: Review & FRAPPY
MONDAY, NOVEMBER 3rd: CHAPTER 3 TEST
TUESDAY, NOVEMBER 4th: Cumulative Review
WEDNESDAY, NOVEMBER 5th: TEST (CHAPTERS 1-4) No Retest Available
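For the Day 7 question about summary statistics, here is a minimal sketch using the Track and Field data from Day 0. It assumes the standard relationships slope b = r(s_y / s_x) and intercept a = ȳ - b·x̄, and it computes r as the average product of z-scores (dividing by n - 1). The helper name correlation is made up for this illustration; on the AP exam you would rely on the formula sheet and your calculator instead.

```python
from statistics import mean, stdev

# Track and Field data from Day 0 (13 students).
sprint = [5.41, 5.05, 9.49, 8.09, 7.01, 7.17, 6.83,
          6.73, 8.01, 5.68, 5.78, 6.31, 6.04]   # sprint time (s)
jump = [171, 184, 48, 151, 90, 65, 94,
        78, 71, 130, 173, 143, 141]             # long jump distance (in)


def correlation(xs, ys):
    """Correlation r as the sum of z-score products divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)


r = correlation(sprint, jump)

# LSRL from summary statistics only.
b = r * stdev(jump) / stdev(sprint)   # slope
a = mean(jump) - b * mean(sprint)     # y-intercept

print(f"y-hat = {a:.0f} + ({b:.1f})x")
```

The result should agree with the line ŷ = 305 - 27.6x quoted on Day 3. The slope formula also answers the last Day 7 question: for each increase of 1 standard deviation in x, the predicted y changes by r standard deviations of y.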