Elementary Statistics and Inference
22S:025 or 7P:025
Lecture 13
Chapter 10
11. Regression
A. Introduction
Consider the combination of heights and weights for the 988 men aged 18-24 in the Health and Nutrition Examination Survey (HANES; see page 58), shown in Figure 1 below.
11. Regression (cont.)
• Average height = 70 inches, SD = 3 inches
• Average weight = 180 pounds, SD = 45 pounds
• Correlation between height and weight: r = .40
• The SD line and the "regression line" are both shown on the graph. Note that the regression line is flatter (has less "slant") than the SD line.
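A minimal Python sketch comparing the two slopes from the summary statistics above (the constant and function names are illustrative, not from the text). The SD line rises one SD of weight (45 pounds) for each SD of height (3 inches); the regression line rises only r times that amount.

```python
# HANES summary statistics quoted on this slide
MEAN_HEIGHT, SD_HEIGHT = 70.0, 3.0    # inches
MEAN_WEIGHT, SD_WEIGHT = 180.0, 45.0  # pounds
R = 0.40                              # correlation between height and weight


def sd_line_slope(sd_x: float, sd_y: float) -> float:
    """Slope of the SD line for positively correlated data: one SD of y per SD of x."""
    return sd_y / sd_x


def regression_slope(r: float, sd_x: float, sd_y: float) -> float:
    """Slope of the regression line: only r SDs of y per SD of x."""
    return r * sd_y / sd_x


sd_slope = sd_line_slope(SD_HEIGHT, SD_WEIGHT)
reg_slope = regression_slope(R, SD_HEIGHT, SD_WEIGHT)
print(f"SD line:         {sd_slope:.1f} pounds per inch")
print(f"Regression line: {reg_slope:.1f} pounds per inch")
```

Since r = .40 is less than 1, the regression slope (6 pounds per inch) is smaller than the SD-line slope (15 pounds per inch), which is why the regression line looks flatter.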
• The vertical strip identifies men who were one SD above average in height; those who were also one SD above average in weight lie on the SD line.
11. Regression (cont.)
• For the association between height and weight, the estimated weight for every height X is:
$$Y' = \left(r\,\frac{S_Y}{S_X}\right)X + \left(\bar{Y} - r\,\frac{S_Y}{S_X}\,\bar{X}\right)$$
where
$$m = r\,\frac{S_Y}{S_X}, \qquad b = \bar{Y} - m\bar{X}.$$
This is called the regression line.
- Y′ = mX + b = (slope)X + intercept
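The two formulas for m and b translate directly into code. Below is a minimal Python sketch that works from summary statistics only; `regression_line` and `predict` are hypothetical helper names chosen for illustration.

```python
from typing import Tuple


def regression_line(mean_x: float, sd_x: float,
                    mean_y: float, sd_y: float, r: float) -> Tuple[float, float]:
    """Return (slope m, intercept b) of the regression of Y on X,
    computed from summary statistics: m = r*SY/SX, b = mean_y - m*mean_x."""
    m = r * sd_y / sd_x
    b = mean_y - m * mean_x
    return m, b


def predict(x: float, m: float, b: float) -> float:
    """Estimated (average) Y for a given X: Y' = m*X + b."""
    return m * x + b


# Usage with the HANES height/weight figures (X = height, Y = weight)
m, b = regression_line(70, 3, 180, 45, 0.40)
print(m, b)                 # slope and intercept
print(predict(73, m, b))    # estimated weight of a 73-inch man
```

With these inputs the sketch gives a slope of 6.0, an intercept of −240, and an estimate of 198 pounds for a 73-inch man, matching the hand calculation on the next slide.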
11. Regression (cont.)
• For a man who is 73 inches tall, the best estimate of his weight can be determined as follows:
$$m = r\,\frac{S_Y}{S_X} = (.40)\left(\frac{45}{3}\right) = 6.0$$
$$b = \bar{Y} - m\bar{X} = 180 - (6.0)(70) = -240$$
$$Y' = 6.0X - 240$$
$$Y' = (6.0)(73) - 240 = 198 \text{ pounds}$$
See the explanation given by the authors on pages 158-159.
11. Regression (cont.)
For men who were 1 SD above average in height (i.e., 73 inches), the estimated weight is only r = .40 SD above average weight (r × SD of Y = (.40)(45) = 18 pounds), rather than a full SD. Whenever r < 1.00, the estimated weight is "regressed" toward the average weight. Similarly, for men who were 1 SD below average weight (i.e., 180 − 45 = 135 pounds), the estimated height is closer to the average height.
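Rewriting the regression line in standard units makes the regression effect explicit. A short derivation, using only the definitions m = r·S_Y/S_X and b = Ȳ − mX̄ of the regression line:

$$Y' = mX + b = \bar{Y} + r\,\frac{S_Y}{S_X}\,(X - \bar{X}) \quad\Longrightarrow\quad \frac{Y' - \bar{Y}}{S_Y} = r\cdot\frac{X - \bar{X}}{S_X}$$

For a man 1 SD above average height, (73 − 70)/3 = 1, so his estimated weight is .40 SD above average: 180 + (.40)(45) = 198 pounds. Whenever |r| < 1, the estimate lies closer to the average, in SD units, than the X-value it was computed from.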
11. Regression (cont.)
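[Figure: Regression line of weight on height. At heights of 67, 70, and 73 inches (average − 1 SD, average, average + 1 SD), the line is at 180 − 18, 180, and 180 + 18 pounds, since (.40)(45) = 18.]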
11. Regression (cont.)
The estimated weight would be 180 + 18 = 198 pounds, which is less than 1 SD above average weight (i.e., 180 + 45 = 225 pounds). This is called the "regression" effect.
Suppose a man were 1 SD below average height (i.e., 67 inches). His estimated weight would be 180 − 18 = 162 pounds, which is less than 1 SD below average weight (180 − 45 = 135 pounds); again, the estimate is "regressed" toward the average weight.
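A small Python sketch that tabulates the regression estimate at 1 SD below average, at average, and 1 SD above average height, next to the corresponding point on the SD line, using the slide's summary statistics. The helper name `estimated_weight` is hypothetical.

```python
MEAN_X, SD_X = 70.0, 3.0     # height, inches
MEAN_Y, SD_Y = 180.0, 45.0   # weight, pounds
R = 0.40


def estimated_weight(height: float) -> float:
    """Regression estimate: move r SDs of weight for each SD of height."""
    z_height = (height - MEAN_X) / SD_X
    return MEAN_Y + R * z_height * SD_Y


for height in (67.0, 70.0, 73.0):
    est = estimated_weight(height)
    sd_line = MEAN_Y + (height - MEAN_X) / SD_X * SD_Y  # point on the SD line
    print(f"{height:.0f} in: regression estimate {est:.0f} lb, SD line {sd_line:.0f} lb")
```

The regression estimates (162, 180, 198 pounds) stay closer to 180 than the SD-line values (135, 180, 225 pounds); that gap is the regression effect.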
11. Regression (cont.)
• The regression line is a line of estimated subpopulation means of Y for different values of X. We can use the regression equation to predict the value of Y for any given value of X within the range of observed X-values. The estimate of Y is essentially the estimated average Y for the group of persons with the same score on X.
See the examples (page 161) and Exercise Set A #1, 2, 3, 4, 5.
11. Regression (cont.)
B. The Graph of Averages
- For each group of subjects with the same score on X, plot the average value of Y. The regression line is a smoothed version of this graph of averages: when the averages fall close to a straight line, that line is the regression line.
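To see the connection, the Python sketch below uses a tiny made-up dataset (hypothetical, not from HANES) whose group averages happen to fall exactly on a straight line. In that case the least-squares regression line passes through every point of the graph of averages; with real data the averages scatter around the line instead.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (x, y) pairs, made up for illustration only
points = [(1, 1), (1, 3), (2, 2), (2, 4), (2, 6), (3, 5), (3, 7)]


def graph_of_averages(pts):
    """Average y for each distinct x value, sorted by x."""
    groups = defaultdict(list)
    for x, y in pts:
        groups[x].append(y)
    return {x: mean(ys) for x, ys in sorted(groups.items())}


def least_squares_line(pts):
    """Slope and intercept minimizing the sum of squared vertical distances."""
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


averages = graph_of_averages(points)
slope, intercept = least_squares_line(points)
for x, avg_y in averages.items():
    print(x, avg_y, slope * x + intercept)  # x, average y, value on the line
```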
11. Regression (cont.)
• The regression line is the "best-fitting" line to all the points: the sum of the squared vertical distances from the points to the line is a minimum. If any other line were used to estimate the Y-means, the sum of the squared distances to that line would be larger.
• Each point on the regression line is an estimate of the average Y-value for subjects (men) with the same X-value. This is called an unbiased estimate of the "conditional mean" of Y given the value of X (unbiased provided the true relationship between X and Y is linear).
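The least-squares property can be checked numerically. The Python sketch below uses a small hypothetical dataset (made up for illustration), computes the slope and intercept from r, S_Y/S_X, and the means as in the regression formula, and confirms that nudging the slope or intercept increases the sum of squared vertical distances. SDs are computed with the n divisor (`statistics.pstdev`); the ratio S_Y/S_X, and hence the line, is the same with either divisor.

```python
from statistics import mean, pstdev

# Hypothetical data for illustration only (not from the text)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]


def correlation(xs, ys):
    """Pearson r: average product of the values in standard units."""
    mx, my, sx, sy = mean(xs), mean(ys), pstdev(xs), pstdev(ys)
    return mean(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))


def sse(m, b):
    """Sum of squared vertical distances from the points to y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


r = correlation(xs, ys)
m = r * pstdev(ys) / pstdev(xs)       # slope from r * SY / SX
b = mean(ys) - m * mean(xs)           # intercept from Ybar - m * Xbar
best = sse(m, b)

# Any nearby line does worse than the regression line
for dm, db in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1), (0.1, -0.3)]:
    assert sse(m + dm, b + db) > best
print(f"m = {m:.2f}, b = {b:.2f}, SSE = {best:.2f}")
```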
11. Regression (cont.)
• If the scatter diagram is not linear, the linear regression line
$$Y' = \left(r\,\frac{S_Y}{S_X}\right)X + \left(\bar{Y} - r\,\frac{S_Y}{S_X}\,\bar{X}\right)$$
is not a good description of the relationship between X and Y (see the illustration on page 163 of the text).
Exercise Set C – (page 167) #1, 2, 3
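A Python sketch of what goes wrong, using hypothetical points that lie exactly on the curve y = x² (not data from the text). The correlation is 0, so the regression line is flat at the average of y even though y is completely determined by x.

```python
from statistics import mean

# Hypothetical curved data: y = x**2 exactly (illustration only)
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

mx, my = mean(xs), mean(ys)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
sxx = sum((x - mx) ** 2 for x in xs)
slope = sxy / sxx              # equals r * SY / SX
intercept = my - slope * mx

residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
print("slope:", slope, "intercept:", intercept)
print("residuals:", residuals)
```

The residuals are positive at both ends and negative in the middle; a systematic pattern like this signals that a straight line is the wrong summary.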
11. Regression (cont.)
C. Two Regression Lines
Suppose
$$\bar{X} = 20.5,\quad S_X = 4.0,\quad \bar{Y} = 2.8,\quad S_Y = 0.6,\quad r = .50$$
where X = ACT Composite Score and Y = Freshman Grade Point Average.
11. Regression (cont.)
• The regression of GPA on ACT Composite is:
$$Y' = \frac{(.50)(.6)}{4.0}\,X + \left(2.8 - \frac{(.50)(.6)}{4.0}\,(20.5)\right)$$
$$Y' = .075X + (2.8 - 1.54)$$
$$Y' = .075X + 1.26$$
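A minimal Python sketch that recomputes this line from the summary statistics and uses it for one prediction; `regression_coefficients` is a hypothetical helper name. It confirms that the slope is .075.

```python
def regression_coefficients(mean_x, sd_x, mean_y, sd_y, r):
    """Slope and intercept of the regression of Y on X from summary statistics."""
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Summary statistics from the slide: X = ACT Composite, Y = freshman GPA
slope, intercept = regression_coefficients(20.5, 4.0, 2.8, 0.6, 0.50)
print(f"Y' = {slope:.3f} X + {intercept:.2f}")

# Estimated GPA for a student 1 SD above the ACT average (20.5 + 4.0 = 24.5)
predicted = slope * 24.5 + intercept
print(f"Estimated GPA at ACT 24.5: {predicted:.2f}")
```

The estimate of 3.1 is only .50 SD (0.3 grade points) above the average GPA of 2.8: the same regression effect seen with height and weight.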
11. Regression (cont.)
• The regression of ACT Composite on GPA is:
$$X' = \frac{(.50)(4.0)}{.60}\,Y + \left(20.5 - \frac{(.50)(4.0)}{.60}\,(2.8)\right)$$
$$X' = 3.33Y + 11.17$$
Note the difference in the equations! The regression of X on Y is not the regression of Y on X solved for X.
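A Python sketch contrasting the two lines, using the slide's summary statistics (the function and variable names are illustrative). It also solves the GPA-on-ACT equation for X algebraically, which gives a different line from the regression of ACT on GPA.

```python
def regress(mean_x, sd_x, mean_y, sd_y, r):
    """(slope, intercept) for predicting the second variable from the first."""
    slope = r * sd_y / sd_x
    return slope, mean_y - slope * mean_x


ACT_MEAN, ACT_SD = 20.5, 4.0
GPA_MEAN, GPA_SD = 2.8, 0.6
R = 0.50

gpa_slope, gpa_int = regress(ACT_MEAN, ACT_SD, GPA_MEAN, GPA_SD, R)   # GPA on ACT
act_slope, act_int = regress(GPA_MEAN, GPA_SD, ACT_MEAN, ACT_SD, R)   # ACT on GPA

# Algebraically inverting the GPA-on-ACT line: X = (Y - intercept) / slope
inv_slope, inv_int = 1 / gpa_slope, -gpa_int / gpa_slope

print(f"ACT on GPA:            X' = {act_slope:.2f} Y + {act_int:.2f}")
print(f"GPA line solved for X: X  = {inv_slope:.2f} Y + {inv_int:.2f}")
```

The product of the two regression slopes is .075 × 3.33 = .25 = r², so the two lines coincide only when r = ±1.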