10/2/2014

2-Variable Statistics

Now that we have used one-variable statistics to "store" our necessary numbers, let's learn another way that's even better ☺

Find the mean and standard deviation of the x's and y's using 2-var stats. Use this when using your lists to find r.

  x    y
 21    6
 18    9
 30    3
 35    4

Find the correlation coefficient: standardize each list (Zx, Zy), multiply the pairs (Zx*Zy), add the products, and divide by n - 1.

  x    y
  4    6
  8   15
 15   22
 19   18
 22   27

\[ r = \frac{\sum z_x z_y}{n-1} = \frac{3.599887921}{4} = 0.900 \]

Find the correlation coefficient:

  x   32   40   30   18   15   25
  y   27   82   34   14    1   22

\[ r = \frac{4.558674571}{5} = 0.912 \]

Find the correlation coefficient:

  x    2    8   10   14   28   32   18
  y   72   60   64   52   43   40   32

\[ r = \frac{-4.868894211}{6} = -0.811 \]

A student wonders if tall women tend to date taller men than do short women. She measures herself, her dormitory roommate, and the women in the adjoining rooms. Then she measures the next man each woman dates. Draw and discuss the scatterplot and calculate the correlation coefficient.

 Women (x)   Men (y)      Zx        Zy      Zx*Zy
    66         72        0        1.1859    0
    64         68       -0.9535  -0.3953    0.3769
    66         70        0        0.3953    0
    65         68       -0.4767  -0.3953    0.1884
    70         71        1.9069   0.7906    1.5076
    65         65       -0.4767  -1.581     0.7538

\[ r = \frac{2.826668855}{5} = 0.565 \]

Guess the correlation coefficient: http://istics.net/stat/Correlations/

Linear Regression

Can we make a line of best fit?
Want:
1) The distances to the line to be the same.
2) The smallest distances.

Regression Line

When a scatterplot shows a linear relationship, we'd like to summarize the overall pattern by drawing a line on the scatterplot.
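Before moving on with regression lines, the correlation calculations above can be checked by hand or with a short script. The sketch below is only an illustration in Python (the slides themselves use a calculator's 2-var stats); the function name `correlation` is made up for this example, and it assumes sample standard deviations (dividing by n - 1), which is what the slide totals imply.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Return r as the sum of Zx*Zy divided by n - 1 (hypothetical helper)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1)
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)

# Heights from the dating example above.
women = [66, 64, 66, 65, 70, 65]
men = [72, 68, 70, 68, 71, 65]
print(round(correlation(women, men), 3))  # 0.565, matching the slide
```

The same function can be run on the other three data sets to compare against 0.900, 0.912, and -0.811.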
A regression line summarizes the relationship between two variables, but only in a specific setting: when one of the variables helps explain or predict the other. Regression, unlike scatterplots, requires that we have an explanatory variable and a response variable.

Regression Line

This is a line that describes how a response variable (y) changes as an explanatory variable (x) changes. It's used to predict the value of y for a given value of x. The regression line is a model for the data.

Let's try some! http://illuminations.nctm.org/ActivityDetail.aspx?ID=146

Regression Line

When given the response variable (y) and the explanatory variable (x), the regression line relating y to x has an equation of the following form:

\[ \hat{y} = a + bx \]

Predicted value (\( \hat{y} \)): the predicted value of y for a given value of x.
y-intercept (a): the predicted value of y when x is 0.
Slope (b): the amount by which y is predicted to change when x increases by 1 unit.

The following data shows the number of miles driven and advertised price for 11 used Honda CR-Vs from the 2002-2006 model years (prices found at www.carmax.com). The scatterplot below shows a strong, negative linear association between number of miles and advertised cost. The correlation is -0.874. The line on the plot is the regression line for predicting advertised price based on number of miles.

 Thousand Miles Driven   Cost (dollars)
          22                 17998
          29                 16450
          35                 14998
          39                 13998
          45                 14599
          49                 14988
          55                 13599
          56                 14599
          69                 11998
          70                 14450
          86                 10998

[Scatterplot: advertised cost versus thousand miles driven, with the regression line drawn]

Use the regression line to answer the following.

\[ \widehat{\text{cost}} = 18773 - 86.18(\text{thousand miles}) \]

Slope: The predicted price of the car decreases by $86.18 for every additional thousand miles driven.

y-intercept: The predicted cost ($18,773) of a used 2002 to 2006 Honda CR-V with 0 miles.

Predict the price for a Honda with 50,000 miles. (Use 50 in the equation!)

\[ \widehat{\text{cost}} = 18773 - 86.18(50) = \$14{,}464 \]
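The prediction above is just the regression equation evaluated at x = 50. As a small illustrative sketch in Python (the names `predicted_cost` and `within_observed_range` are invented for this example and are not from the slides), the same arithmetic looks like this, along with a check of whether an input falls inside the mileage range the data actually covers:

```python
# Regression line from the slide: predicted cost = 18773 - 86.18 * (thousand miles).
INTERCEPT = 18773.0
SLOPE = -86.18
OBSERVED_RANGE = (22, 86)  # smallest and largest mileage in the table, in thousands

def predicted_cost(thousand_miles):
    """Predicted advertised price in dollars (hypothetical helper)."""
    return INTERCEPT + SLOPE * thousand_miles

def within_observed_range(thousand_miles):
    """True when the mileage lies inside the range of the 11 cars in the table."""
    low, high = OBSERVED_RANGE
    return low <= thousand_miles <= high

print(round(predicted_cost(50)))   # 14464
print(within_observed_range(50))   # True
```

Remember to enter mileage in thousands, as the slide says: 50, not 50000.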
Extrapolation

This refers to using a regression line for prediction far outside the interval of values of the explanatory variable x used to obtain the line. Such predictions are usually not very accurate.

Should we predict the asking price for a used 2002-2006 Honda CR-V with 250,000 miles? No! We only have data for cars with between 22,000 and 86,000 miles. We don't know if the linear pattern will continue beyond these values. In fact, if we did predict the asking price for a car with 250 thousand miles, it would be -$2772!

Slope: b = 40. The rat will increase its weight by 40 grams per week.
y-intercept: a = 100. The rat will weigh 100 grams at birth.
Predict weight after 16 weeks: \( \hat{y} = 100 + 40(16) = 740 \) grams
Predict weight at 2 years (2 yrs = 104 wks): \( \hat{y} = 100 + 40(104) = 4260 \) grams (about 9.4 pounds)
This is unreasonable and is a result of extrapolation.

Residual

A residual is the difference between an observed value of the response variable and the value predicted by the regression line.

\[ \text{residual} = \text{observed } y - \text{predicted } y = y - \hat{y} \]

Example

The equation of the least-squares regression line for the sprint time and long-jump distance data is

predicted long-jump distance = 304.56 - 27.63 (sprint time).

Find and interpret the residual for the student who had a sprint time of 8.09 seconds.

\[ \hat{y} = 304.56 - 27.63(8.09) = 81.03 \text{ inches (predicted distance)} \]
\[ \text{residual} = y - \hat{y} = 151 - 81.03 = 69.97 \]

This student jumped 69.97 inches farther than we expected based on his sprint time.

Regression

Let's see how a regression line is calculated.

Fat vs Calories in Burgers

 Fat (g)    19    31    34    35    39    39    43
 Calories  410   580   590   570   640   680   660

Let's standardize the variables

 Fat   Cal    z - x's   z - y's
  19   410    -1.959    -2
  31   580    -0.42     -0.1
  34   590    -0.036     0
  35   570     0.09     -0.2
  39   640     0.6       0.56
  39   680     0.6       1
  43   660     1.12      0.78

Let's clarify a little.
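The standardized values in the burger table can be reproduced with a few lines of code. This is a rough sketch in Python rather than the calculator steps used in class; the helper name `z_scores` is hypothetical, and it assumes the sample standard deviation (n - 1) is the one used on the slide.

```python
from statistics import mean, stdev

fat = [19, 31, 34, 35, 39, 39, 43]
calories = [410, 580, 590, 570, 640, 680, 660]

def z_scores(values):
    """Standardize a list: subtract the mean, divide by the sample st. dev."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Print one row per burger: fat, calories, z-score of fat, z-score of calories.
for f, c, zx, zy in zip(fat, calories, z_scores(fat), z_scores(calories)):
    print(f"{f:>3} {c:>4} {zx:7.3f} {zy:7.3f}")
```

The printed z-scores carry more decimal places than the slide, which rounds several of them.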
(Just watch & listen)

The equation for a line that passes through the origin can be written with just a slope and no intercept: \( y = mx \). But we're using z-scores (the z - x's and z - y's), so our equation should reflect this, and thus it's

\[ \hat{z}_y = m z_x \]

The line must contain the point \( (\bar{x}, \bar{y}) \). The z-score of a mean is 0, so in z-scores that point is the origin and the line must pass through the origin.

Many lines with different slopes pass through the origin. Which one fits our data the best? That is, which slope determines the line that minimizes the sum of the squared residuals?

Line of Best Fit: Least Squares Regression Line

It's the line for which the sum of the squared residuals is smallest. Focus on the vertical deviations from the line.

Let's find it. (Just watch & soak it in)

Note: MSR is "Mean Squared Residual". We want to find the mean squared residual.

Residual = Observed - Predicted

\[ \text{MSR} = \frac{\sum (z_y - \hat{z}_y)^2}{n-1} \]

Since \( \hat{z}_y = m z_x \):

\[ \text{MSR} = \frac{\sum (z_y - m z_x)^2}{n-1} \]
\[ \text{MSR} = \frac{\sum (z_y^2 - 2 m z_x z_y + m^2 z_x^2)}{n-1} \]
\[ \text{MSR} = \frac{\sum z_y^2}{n-1} - 2m \frac{\sum z_x z_y}{n-1} + m^2 \frac{\sum z_x^2}{n-1} \]

The standard deviation of z-scores is 1, so the variance is 1 also; the first and last fractions are each 1. The middle fraction is r! This gives us

\[ \text{MSR} = 1 - 2mr + m^2 \]

Since this is a parabola in m, it reaches its minimum at \( m = \frac{-b}{2a} \). With \( \text{MSR} = 1 - 2mr + m^2 \), a = 1 and b = -2r, so

\[ m = \frac{-(-2r)}{2(1)} = r \]

Hence the slope of the best-fit line for z-scores is the correlation coefficient, r.

Slope: rise over run

A slope of r for z-scores means that for every increase of 1 standard deviation in \( z_x \), there is an increase of r standard deviations in \( z_y \). "Over 1 and up r." Translate back to x and y values: "over one standard deviation in x, up r standard deviations in y."

Slope of the regression line is:

\[ b = \frac{r s_y}{s_x} \]

Why is correlation "r"?

Because it was calculated from the regression of y on x after standardizing the variables, just like we have just done, r was used to stand for (standardized) regression.

Let's Write the Equation

From algebra: \( y = mx + b \)
\( y = b_0 + b_1 x \), where \( b_0 \) is the y-intercept and \( b_1 \) is the slope.

 Fat (g)    19    31    34    35    39    39    43
 Calories  410   580   590   570   640   680   660

Slope:

\[ b_1 = \frac{r s_y}{s_x} = \frac{0.961(89.815)}{7.804} = 11.056 \]

Explain the slope: Predicted calories increase by 11.056 for each additional gram of fat.

Y-intercept: Remember, the line has to pass through the point \( (\bar{x}, \bar{y}) \), so \( \bar{y} = b_0 + b_1 \bar{x} \).

Solve for the y-intercept, with \( \bar{x} = 34.28571429 \) and \( \bar{y} = 590 \):

\[ b_0 = \bar{y} - b_1 \bar{x} = 210.954 \]

Now for the final part: the equation!

\[ \hat{y} = 210.954 + 11.056x \]

Now it can be used to predict. How many calories do I expect to find in a hamburger that has 25 grams of fat?

Try another problem

 Mean call-to-shock time    2    6    7    9   12
 Survival rate             90   45   30    5    2

\[ r = \frac{-3.84030233}{4} = -0.960 \]
\[ b_1 = \frac{r s_y}{s_x} = -9.2956 \]
\[ b_0 = \bar{y} - b_1 \bar{x} = 101.3285 \]
\[ \widehat{\text{survival}} = 101.3285 - 9.2956(\text{call-to-shock}) \]

Interpret the slope: The predicted survival rate decreases by 9.2956 for every additional minute of call-to-shock time.

Interpret the y-int: The predicted survival rate is 101.3285 when there is NO call-to-shock time.

Predict the survival rate for a 10 min. call-to-shock time:

\[ \widehat{\text{survival}} = 101.3285 - 9.2956(10) = 8.3725 \text{ percent} \]

Predict the survival rate for a 20 min. call-to-shock time:

\[ \widehat{\text{survival}} = 101.3285 - 9.2956(20) = -84.5835 \text{ percent} \]

A negative survival rate is impossible; 20 minutes is outside the range of the data (2 to 12 minutes), so this is extrapolation.

Try another problem

verbal = 1.05 (math) + 20.7

 SAT Math   SAT Verbal
   600         650
   720         800
   540         600
   450         500
   620         620

Interpret the slope:
Interpret the y-int:
Predict the verbal score for a math score of 400:
Predict the verbal score for a math score of 500:

That's all, folks!

Homework: p. 191 (27-32, 35, 37, 39, 41, 47)
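To check your work on the problems above, the slope and intercept formulas from this lesson (slope = r times s_y over s_x, intercept = y-bar minus slope times x-bar) can be sketched in Python. This is only one possible way to do it, not the calculator method from class; the function name `least_squares_line` is made up for this example, and sample standard deviations (n - 1) are assumed.

```python
from statistics import mean, stdev

def least_squares_line(xs, ys):
    """Return (b0, b1, r) using b1 = r * s_y / s_x and b0 = y_bar - b1 * x_bar."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1)
    n = len(xs)
    # r is the sum of the z-score products divided by n - 1.
    r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
            for x, y in zip(xs, ys)) / (n - 1)
    b1 = r * s_y / s_x
    b0 = y_bar - b1 * x_bar
    return b0, b1, r

# Burger data from the lesson.
fat = [19, 31, 34, 35, 39, 39, 43]
calories = [410, 580, 590, 570, 640, 680, 660]
b0, b1, r = least_squares_line(fat, calories)
print(round(b0, 3), round(b1, 3), round(r, 3))  # compare with 210.954, 11.056, 0.961

# Predicted calories for a burger with 25 grams of fat (inside the 19 to 43 g range).
print(b0 + b1 * 25)
```

Passing in the call-to-shock or SAT lists the same way gives the slope and intercept for those problems.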