CH 15 Regression and Correlation

15.1 Stochastic Relationships and Scatter Diagrams

Two Quantitative Variables
Plot observed data on a graph.
Horizontal (X axis): independent variable
Vertical (Y axis): dependent variable
We call the graph a scatter diagram or scatter plot.

Example
X = Dosage of Drug, Y = Reduction in Blood Pressure

  x:  100  200  300  400  500
  y:   10   18   32   44   56

Scatter Diagram
Note the strong relationship between X and Y.

[Six example scatter plots: perfect positive linear correlation, perfect negative linear correlation, positive linear correlation, negative linear correlation, non-linear correlation, no correlation]

15.9 The Correlation Coefficient

We quantify the strength of a linear relationship with the Pearson correlation coefficient, r:

  r = Σ(x − x̄)(y − ȳ) / √[ Σ(x − x̄)² Σ(y − ȳ)² ]
    = (n Σxy − Σx Σy) / √[ (n Σx² − (Σx)²)(n Σy² − (Σy)²) ]

Positive r suggests large values of X and Y occur together and that small values of X and Y occur together. Ex: experience and salary.
Negative r suggests large values of one variable tend to occur with small values of the other variable. Ex: weight of a car and gas mileage.

−1 ≤ r ≤ 1
r = 1: all data lie exactly on a straight line with positive slope
r = −1: all data lie exactly on a straight line with negative slope
r = 0: no linear relationship
The stronger the linear relationship, the larger |r| is.

Example
Back to the Dosage of Drug and Blood Pressure data (n = 5):

    x     y       x²       xy      y²
  100    10   10,000    1,000     100
  200    18   40,000    3,600     324
  300    32   90,000    9,600   1,024
  400    44  160,000   17,600   1,936
  500    56  250,000   28,000   3,136

  Σx = 1,500   Σy = 160   Σx² = 550,000   Σxy = 59,800   Σy² = 6,520

  r = (n Σxy − Σx Σy) / √[ (n Σx² − (Σx)²)(n Σy² − (Σy)²) ]
    = (5(59,800) − (1,500)(160)) / √[ (5(550,000) − 1,500²)(5(6,520) − 160²) ]
    = 59,000 / √3,500,000,000
    ≈ 0.99728

The existence of correlation does not imply a cause-and-effect relationship.
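The hand computation of r above can be checked with a short script. This is a sketch in Python using the five dosage/blood-pressure pairs from the example; it applies the computational formula for Pearson's r directly:

```python
import math

# Dosage of drug (x) and reduction in blood pressure (y) from the example
x = [100, 200, 300, 400, 500]
y = [10, 18, 32, 44, 56]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

# Computational formula for Pearson's r
num = n * sum_xy - sum_x * sum_y
den = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = num / den

print(round(r, 5))  # 0.99728
```

The intermediate sums (Σx = 1,500, Σy = 160, Σxy = 59,800, Σx² = 550,000, Σy² = 6,520) match the table, and the numerator 59,000 and radicand 3,500,000,000 match the worked example.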
Yields of tomatoes and beans have a positive correlation (the driving force is weather).

15.2 The Simple Linear Regression Model

Purpose of linear regression: to predict the value of a difficult-to-measure variable, Y (the response variable), from an easy-to-measure variable, X (the explanatory variable).

Example
Predict reaction time from blood alcohol level.

Before using linear regression, make sure the model is reasonable: the points on the scatter plot should fall around a straight line, and the correlation coefficient should be strong. If the model is not reasonable, do not fit a straight line.

The linear regression model is

  y = b0 + b1 x

where b0 is the y-intercept and b1 is the slope.

Example
Back to the dosage of drug and reduction in blood pressure data. The strong relationship between these variables has been established. We will now predict the Reduction in Blood Pressure, y, based on the Dosage of Drug, x.

[Regression plot: y = −3.4 + 0.118x, R² = 99.5%; Pressure (y) vs. Drug (x)]

Notice b0 = −3.4 and b1 = 0.118.

Let's predict the Reduction in Blood Pressure if the Dosage of Drug is 250:

  y = −3.4 + 0.118x = −3.4 + 0.118(250) = 26.1

Interpolation – predicting Y values for X values that are within the range of the scatter plot. (This is what regression should be used for.)
Extrapolation – predicting Y values for X values beyond the range of the observations. (This should not be done with a basic regression model; it is a complex problem.)

The R² value is the percent of the variation in y explained by the model.

Example
For the dosage of drug and blood pressure data:

  R² = (0.99728)² ≈ 0.995 = 99.5%

0% ≤ R² ≤ 100%. The higher R² is, the better the model.

15.3 Method of Least Squares

The regression model expresses Y as a function of X plus random error. Random error reflects variation in Y values among items or individuals having the same X value. (Draw picture.)

We need a line that is the "best" fit for our data. We will use the method of least squares.
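The prediction step (plugging a dosage into the fitted line) can be sketched in Python. Here b0 and b1 are the fitted values from the example; the function name `predict` is just an illustrative choice:

```python
# Fitted line from the example: y = -3.4 + 0.118 x
b0, b1 = -3.4, 0.118

def predict(dosage):
    """Predicted reduction in blood pressure at a given drug dosage."""
    return b0 + b1 * dosage

# Interpolation: 250 lies inside the observed dosage range 100-500,
# so this prediction is a legitimate use of the model.
print(round(predict(250), 1))  # 26.1

# A call such as predict(2000) would be extrapolation (outside the
# observed range) and should not be trusted with this basic model.
```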
This means the line is chosen so that the sum of the squares of the vertical distances from the points to the line is minimized. (Draw picture.)

It can be shown that

  b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² = (n Σxy − Σx Σy) / (n Σx² − (Σx)²)

  b0 = ȳ − b1 x̄ = Σy/n − b1 (Σx/n)

Note: the least-squares line can be affected greatly by extreme data points.

Example

  b1 = (n Σxy − Σx Σy) / (n Σx² − (Σx)²)
     = (5(59,800) − (1,500)(160)) / (5(550,000) − 1,500²)
     = 59,000 / 500,000 = 0.118

  b0 = ȳ − b1 x̄ = 160/5 − 0.118(1,500/5) = 32 − 35.4 = −3.4

Notice these match the model we used before.

Residual – the difference between an actual value and the fitted value:

  e = y − (b0 + b1 x)

Example
The residual for the point (400, 44):

  e = y − (b0 + b1 x) = 44 − (−3.4 + 0.118(400)) = 44 − 43.8 = 0.2
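The least-squares formulas can be sketched in Python with the same data; this reproduces the slope b1 = 0.118, the intercept b0 = −3.4, and the residual 0.2 at the point (400, 44):

```python
# Dosage of drug (x) and reduction in blood pressure (y) from the example
x = [100, 200, 300, 400, 500]
y = [10, 18, 32, 44, 56]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)

# Least-squares slope and intercept
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = sum_y / n - b1 * (sum_x / n)

# Residual e = y - (b0 + b1 x) at the observed point (400, 44)
e = 44 - (b0 + b1 * 400)

print(round(b1, 3), round(b0, 1), round(e, 1))  # 0.118 -3.4 0.2
```

Note the slope numerator (59,000) is the same quantity as the numerator of r; only the denominators differ.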