Elementary Statistics and Inference
22S:025 or 7P:025
Lecture 13
Chapter 10

11. Regression

A. Introduction

Consider the heights and weights of the 988 men aged 18-24 from the Health and Nutrition Survey, HANES (see page 58), as shown in Figure 1 below.

[Figure 1: heights and weights of the 988 men, with the SD line and the regression line.]

Average height = 70 inches, SD = 3 inches
Average weight = 180 pounds, SD = 45 pounds
r = .40

The SD line and the “regression line” are shown on the graph. Note that the regression line has less “slant” than the SD line: its slope is r times the slope of the SD line. The vertical strip identifies men who were one SD above average in height; those who were also one SD above average in weight are on the SD line.

The association between height and weight is summarized as follows: for every height X, the associated estimated weight Y′ is

\[ Y' = \left( r \frac{S_Y}{S_X} \right) X + \left( \bar{Y} - r \frac{S_Y}{S_X} \bar{X} \right) \]

where

\[ m = r \frac{S_Y}{S_X}, \qquad b = \bar{Y} - m \bar{X}. \]

This is called the regression line:

\[ Y' = mX + b = (\text{slope}) X + \text{intercept} \]

For a man who is 73 inches tall, the best estimate of his weight can be determined as follows:

\[ m = r \frac{S_Y}{S_X} = (.40) \left( \frac{45}{3} \right) = 6.0 \]
\[ b = \bar{Y} - m \bar{X} = 180 - (6.0)(70) = -240 \]
\[ Y' = 6.0X - 240 \]
\[ Y' = (6.0)(73) - 240 = 198 \text{ pounds} \]

See the explanation given by the authors on pages 158-159.

Men who were 1 SD above average in height (i.e., 73 inches) were, on average, less than 1 SD above average in weight: only r × (SD of Y) above. Whenever r < 1.00, the estimated weight is “regressed” toward the average weight. Similarly, for men who were 1 SD below average in weight (i.e., 180 - 45 = 135 pounds), the estimated height is closer to the average height than 1 SD below it.

[Figure: the regression line of weight on height. Horizontal axis: Height (67, 70, 73 inches). Vertical axis: Weight (average 180 pounds). A step of +3 inches in height corresponds to (.40)(45) = 18 pounds in estimated weight; a step of -3 inches corresponds to (-.40)(45) = -18 pounds.]
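The worked example for the 73-inch man uses nothing but the five summary statistics, so it can be checked with a few lines of Python. This is only a sketch: the function name regression_line and the variable names are hypothetical, chosen for illustration, and are not from the lecture or the text.

```python
def regression_line(mean_x, sd_x, mean_y, sd_y, r):
    """Return (slope, intercept) for the regression of Y on X,
    computed from summary statistics only (hypothetical helper)."""
    slope = r * sd_y / sd_x                # m = r * S_Y / S_X
    intercept = mean_y - slope * mean_x    # b = Ybar - m * Xbar
    return slope, intercept


# Summary statistics from the slides: X = height (inches), Y = weight (pounds).
mean_height, sd_height = 70, 3
mean_weight, sd_weight = 180, 45
r = 0.40

m, b = regression_line(mean_height, sd_height, mean_weight, sd_weight, r)

height = 73                        # 1 SD above average height
estimated_weight = m * height + b  # Y' = mX + b

# The same estimate in standard units: 1 SD in X corresponds to r SDs in Y.
z_height = (height - mean_height) / sd_height
z_weight = (estimated_weight - mean_weight) / sd_weight

print(f"slope = {m:.1f}, intercept = {b:.1f}")
print(f"estimated weight at {height} inches = {estimated_weight:.1f} pounds")
print(f"z(height) = {z_height:.2f}, z(estimated weight) = {z_weight:.2f}")
```

The last line of output shows the estimate in standard units: a man 1 SD above average in height is estimated to be r = .40 SDs above average in weight.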
For a man 1 SD above average in height (73 inches), the estimated weight is 180 + 18 = 198 pounds, which is less than 1 SD above average weight (i.e., 180 + 45 = 225 pounds). This is called the “regression” effect. Suppose a man were 1 SD below average height (i.e., 67 inches). His estimated weight would be 180 - 18 = 162 pounds, which is less than 1 SD below average weight (i.e., 180 - 45 = 135 pounds); again the estimate is “regressed” toward the average weight.

The regression line is a line of estimated subpopulation means of Y for different values of X. We can use the regression equation to predict the value of Y for any given value of X within the range of the observed X-values. The estimate of Y is essentially the estimated average Y for the group of persons with the same score on X.

See Examples (page 161), Exercise Set A #1, 2, 3, 4, 5

B. The Graph of Averages

For each subgroup of scores on X, compute the average value of Y. The regression line is the estimated straight line through these averages of Y across the different X-values.

The regression line is the “best fitting” line to all the score points. The sum of the squared vertical distances from the points to the line is a minimum; that is, any other line used to estimate the Y-means would give a larger sum of squared distances.

Each point on the regression line is an estimate of the average Y-value for subjects (men) with the same X-value. This is called an unbiased estimate of the “conditional mean” of Y given the value of X.

If the scatter diagram is not linear, the linear regression line

\[ Y' = r \frac{S_Y}{S_X} X + \left( \bar{Y} - r \frac{S_Y}{S_X} \bar{X} \right) \]

is not the best fitting regression line (see illustration on page 163 of text).

Exercise Set C (page 167) #1, 2, 3

C. Two Regression Lines

Suppose

\[ \bar{X} = 20.5, \quad S_X = 4.0, \quad \bar{Y} = 2.8, \quad S_Y = 0.6, \quad r = .50 \]

where X = ACT Composite Score and Y = Freshman Grade Point Average.

The regression of GPA on ACT Composite is:

\[ Y' = \frac{(.50)(0.6)}{4.0} X + \left( 2.8 - \frac{(.50)(0.6)}{4.0} (20.5) \right) \]
\[ Y' = .075X + (2.8 - 1.54) \]
\[ Y' = .075X + 1.26 \]

The regression of ACT Composite on GPA is:

\[ X' = \frac{(.50)(4.0)}{0.6} Y + \left( 20.5 - \frac{(.50)(4.0)}{0.6} (2.8) \right) \]
\[ X' = 3.33Y + 11.17 \]

Note the difference in the equations! The second equation is not obtained by solving the first one for X; each line is fitted separately for its own prediction problem.
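To see concretely why the two lines differ, both can be computed from the same summary statistics by swapping the roles of X and Y. The Python below is an illustrative sketch only; regression_line and the other names are hypothetical and do not come from the lecture.

```python
def regression_line(mean_x, sd_x, mean_y, sd_y, r):
    """Return (slope, intercept) for predicting Y from X,
    using summary statistics only (hypothetical helper)."""
    slope = r * sd_y / sd_x
    return slope, mean_y - slope * mean_x


# Summary statistics from the slides.
mean_act, sd_act = 20.5, 4.0   # X = ACT Composite Score
mean_gpa, sd_gpa = 2.8, 0.6    # Y = Freshman Grade Point Average
r = 0.50

# Regression of GPA on ACT: predict Y from X.
m_gpa_on_act, b_gpa_on_act = regression_line(mean_act, sd_act, mean_gpa, sd_gpa, r)

# Regression of ACT on GPA: the same formula with the roles of X and Y swapped.
m_act_on_gpa, b_act_on_gpa = regression_line(mean_gpa, sd_gpa, mean_act, sd_act, r)

# Solving the GPA-on-ACT line for X would give slope 1/m, which is a different line.
slope_if_inverted = 1 / m_gpa_on_act

print(f"GPA' = {m_gpa_on_act:.3f} * ACT + {b_gpa_on_act:.2f}")
print(f"ACT' = {m_act_on_gpa:.2f} * GPA + {b_act_on_gpa:.2f}")
print(f"slope from inverting the first line = {slope_if_inverted:.2f}")
print(f"product of the two regression slopes = {m_gpa_on_act * m_act_on_gpa:.2f}")
```

The product of the two regression slopes equals r squared, so the two lines would coincide only if r were 1.00 or -1.00.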