Math 141 - Lecture 23: More Dummy Variables: Interactions

Albyn Jones
Library 304
[email protected]
www.people.reed.edu/~jones/courses/141
Albyn Jones
Math 141
Analysis of Covariance
We will study the relationship between the linear models below:
Model                                        R formula
Y = β0 + βx X + ε                            Y ~ X
Y = β0 + βx X + βd D + ε                     Y ~ X + D
Y = β0 + βx X + βd D + βx:d (X · D) + ε      Y ~ X * D
where X is numeric, D is a dummy (0/1) variable, and ε is the error term.
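As a minimal sketch of how the three formulas map onto lm() calls, the following simulates a small data set and fits each model. The data frame sim, the sample size, and the true coefficients are invented for illustration; they are not the Berkeley data.

```r
# Simulated data: all names and numbers here are illustrative assumptions.
set.seed(141)
n <- 60
X <- runif(n, 80, 95)              # numeric covariate
D <- rep(c(0, 1), each = n / 2)    # dummy variable: 0 = baseline group
Y <- 50 + 1.3 * X + 12 * D + rnorm(n, sd = 4)
sim <- data.frame(Y, X, D)

fit1 <- lm(Y ~ X,     data = sim)  # one line for everyone
fit2 <- lm(Y ~ X + D, data = sim)  # parallel lines
fit3 <- lm(Y ~ X * D, data = sim)  # separate intercepts and slopes

# Each model adds one coefficient to the previous one.
n_coef <- sapply(list(fit1, fit2, fit3), function(f) length(coef(f)))
n_coef
names(coef(fit3))
```

The formula Y ~ X * D expands to Y ~ X + D + X:D, which is why the last fit reports a coefficient labelled X:D.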
Example: Berkeley Longitudinal Study, 1
> B.lm1 <- lm(ht18 ~ ht2, data = Berkeley)
> summary(B.lm1)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  32.1203    26.6572   1.205    0.233
ht2           1.5998     0.3031   5.278  2.2e-06
---
Residual standard error: 7.572 on 56 degrees of freedom
One Line to Fit them All!
[Figure: scatterplot of ht18 (vertical axis, about 160 to 190) against ht2 (horizontal axis, about 82 to 94) with the single fitted line; plot title "ht18 ~ ht2".]
Example: Berkeley Longitudinal Study, take 2
> B.lm2 <- lm(ht18 ~ ht2 + sex, data = Berkeley)
> summary(B.lm2)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  49.4174    16.4052   3.012  0.00391
ht2           1.3416     0.1873   7.162 2.05e-09
sexMale      12.0192     1.2356   9.727 1.49e-13
---
Residual standard error: 4.633 on 55 degrees of freedom
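Written out as two lines, the additive fit reads roughly as follows. This is a worked restatement of the coefficient table, rounded to two decimals; the baseline group is whichever level of sex is not Male (presumably Female).

```latex
\begin{align*}
\text{baseline } (\texttt{sexMale} = 0):\quad
  \widehat{\text{ht18}} &= 49.42 + 1.34\,\text{ht2} \\
\text{Male } (\texttt{sexMale} = 1):\quad
  \widehat{\text{ht18}} &= (49.42 + 12.02) + 1.34\,\text{ht2}
                         = 61.44 + 1.34\,\text{ht2}
\end{align*}
```

Both lines share the slope 1.34, so the fitted gap between the groups is 12.02 at every value of ht2.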
Parallel Lines: No Interaction
[Figure: scatterplot of ht18 against ht2 with two parallel fitted lines, one per sex; plot title "ht18 ~ ht2 + sex".]
Parallel Lines: Making the Plot
> coef(B.lm2)
(Intercept)         ht2     sexMale
  49.417437    1.341649   12.019233
# scatterplot, with points colored by sex
with(Berkeley, plot(ht2, ht18, pch=19, col=
     ifelse(Berkeley$sex=='Male','blue','red')))
# baseline group: intercept and common slope
abline(49.41, 1.34, lty=2, lwd=2, col='red')
# males: intercept shifted by the sexMale coefficient, same slope
abline(49.41 + 12.02, 1.34,
       lty=2, lwd=2, col='blue')
title('ht18 ~ ht2 + sex')
The ht2 by sex Interaction Model
> B.lm3 <- lm(ht18 ~ ht2 * sex, data = Berkeley)
> summary(B.lm3)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  54.7541    20.9285   2.616   0.0115
ht2           1.2806     0.2391   5.356 1.79e-06
sexMale      -2.2400    34.3193  -0.065   0.9482
ht2:sexMale   0.1619     0.3895   0.416   0.6792
---
Residual standard error: 4.668 on 54 degrees of freedom
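As a reading aid for the table, each t value is the estimate divided by its standard error. The arithmetic below uses only the printed numbers.

```latex
\begin{align*}
t_{\texttt{ht2:sexMale}} &= \frac{0.1619}{0.3895} \approx 0.416,
&
t_{\texttt{sexMale}} &= \frac{-2.2400}{34.3193} \approx -0.065 .
\end{align*}
```

The interaction estimate is small relative to its standard error (p = 0.6792), and the residual standard error (4.668) is essentially the same as for the additive model (4.633), so these data give little evidence that the two slopes differ.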
Interaction Model: non-parallel lines
[Figure: scatterplot of ht18 against ht2 with two non-parallel fitted lines, one per sex; plot title "ht18 ~ ht2 * sex".]
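Interaction: Making the Plot
> coef(B.lm3)
(Intercept)         ht2     sexMale ht2:sexMale
 54.7541433   1.2806343  -2.2400286   0.1619488
# scatterplot, with points colored by sex
with(Berkeley, plot(ht2, ht18, pch=19, col=
     ifelse(Berkeley$sex=='Male','blue','red')))
# baseline group: intercept and slope as estimated
abline(54.75, 1.28, lty=2, lwd=2, col='red')
# males: intercept and slope each shifted by the matching coefficient
abline(54.75 - 2.24, 1.28 + .16,
       lty=2, lwd=2, col='blue')
title('ht18 ~ ht2 * sex')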
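The same plot can be drawn from coef() directly, which avoids retyping rounded numbers. The sketch below is one way to do that; because the Berkeley data frame is not reproduced here, it uses a simulated stand-in called heights whose values are assumptions.

```r
# Stand-in data: 'heights' and its numbers are invented for illustration.
set.seed(23)
heights <- data.frame(
  ht2 = runif(58, 82, 94),
  sex = factor(rep(c("Female", "Male"), each = 29))
)
heights$ht18 <- 55 + 1.3 * heights$ht2 +
  ifelse(heights$sex == "Male", 12, 0) + rnorm(58, sd = 4.5)

fit <- lm(ht18 ~ ht2 * sex, data = heights)
b <- coef(fit)

plot(ht18 ~ ht2, data = heights, pch = 19,
     col = ifelse(heights$sex == "Male", "blue", "red"))
# Baseline (Female) line: intercept and slope as estimated.
abline(b["(Intercept)"], b["ht2"], lty = 2, lwd = 2, col = "red")
# Male line: add the dummy and interaction coefficients.
abline(b["(Intercept)"] + b["sexMale"], b["ht2"] + b["ht2:sexMale"],
       lty = 2, lwd = 2, col = "blue")
title("ht18 ~ ht2 * sex")
```

Indexing the coefficient vector by name keeps the code correct even if the order of terms in the formula changes.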
Both Models
[Figure: scatterplot of ht18 against ht2 showing the fitted lines from both the additive and the interaction models; plot title "Both Models".]
The dotted lines are for the interaction model.
Interpretation of Coefficients
Y ~ X: β0 is the intercept, βx is the slope.
Y ~ X + D: β0 is the intercept for the baseline group (D = 0),
β0 + βd is the intercept for the other group (D = 1),
βx is the common slope.
Y ~ X * D: β0 is the intercept for the baseline group (D = 0),
βx is the slope for that group;
β0 + βd is the intercept for the other group (D = 1),
βx + βx:d is the slope for that group.
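The bookkeeping above can be sketched as a small function. The helper group_lines() is hypothetical (it is not part of base R); the coefficient values passed to it are the ones printed by coef(B.lm3) and coef(B.lm2) in this lecture.

```r
# Hypothetical helper: turn the coefficients of Y ~ X * D into an
# intercept and a slope for each group.
group_lines <- function(b0, bx, bd, bxd = 0) {
  data.frame(
    group     = c("D = 0", "D = 1"),
    intercept = c(b0, b0 + bd),
    slope     = c(bx, bx + bxd)
  )
}

# Interaction model: both the intercept and the slope differ by group.
lines_int <- group_lines(b0 = 54.7541433, bx = 1.2806343,
                         bd = -2.2400286, bxd = 0.1619488)
lines_int

# With bxd = 0 the same helper describes the additive model Y ~ X + D.
lines_add <- group_lines(b0 = 49.417437, bx = 1.341649, bd = 12.019233)
lines_add
```

For the interaction fit this gives an intercept of about 52.51 and a slope of about 1.44 for the D = 1 group.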
What does Interaction mean?
[Figure: two non-parallel lines, Y against X, both axes running from −2 to 2; plot title "Interaction".]
Interaction: the distance between the two lines depends on
the X coordinate. Equivalently, the slope depends on the group.
What does No Interaction mean?
[Figure: two parallel lines, Y against X, both axes running from −2 to 2; plot title "No Interaction".]
Additive or Parallel Lines: the distance between the two lines
is constant.
Interpretation Again
Additive or No Interaction: The difference between the
groups is constant; it does not depend on the value of the
covariate X.
Additive or No Interaction: The slope for X does not depend
on group membership.
Interaction: The difference between the groups is not
constant; it varies with the value of the covariate X.
Interaction: The slope for X depends on group membership.
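To make "constant" versus "varies with X" concrete, the sketch below evaluates the fitted gap between the two groups at a few values of ht2, using the coefficients printed by coef(B.lm2) and coef(B.lm3). The function names are illustrative.

```r
# Fitted gap (D = 1 line minus D = 0 line) at covariate value x.
gap_additive    <- function(x) rep(12.019233, length(x))    # beta_d only
gap_interaction <- function(x) -2.2400286 + 0.1619488 * x   # beta_d + beta_x:d * x

x <- c(82, 86, 90, 94)
gaps <- data.frame(ht2 = x,
                   additive    = gap_additive(x),
                   interaction = gap_interaction(x))
gaps
```

Under the additive model the gap is the same in every row; under the interaction model it grows from about 11.0 at ht2 = 82 to about 13.0 at ht2 = 94.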
Generic Interpretation
Additive or No Interaction: The response to one factor does
not depend on the value of the other.
Interaction: The response to one factor depends on the value
of the other.
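Not Interaction!! Interaction does not mean the explanatory
variables are correlated with each other!! Interaction describes
how the response depends on the explanatory variables jointly, not
how the explanatory variables relate to one another.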
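A toy example may help separate the two ideas. In the sketch below (made-up numbers, no noise term) the explanatory variables are exactly uncorrelated, yet there is a strong interaction.

```r
# Balanced toy design: every X value appears in both groups,
# so X and D are uncorrelated by construction.
X <- rep(1:10, times = 2)
D <- rep(c(0, 1), each = 10)
r_xd <- cor(X, D)
r_xd

# The slope still depends on the group: 2 when D = 0, 3.5 when D = 1.
# There is no error term, so lm() recovers the coefficients exactly.
Y <- 1 + 2 * X + 3 * D + 1.5 * X * D
b <- coef(lm(Y ~ X * D))
b
```

The correlation is zero while the X:D coefficient is 1.5, so the presence of an interaction says nothing about whether X and D are correlated.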
Analysis of Covariance
The models again:
Model                                        R formula
Y = β0 + βx X + ε                            Y ~ X
Y = β0 + βx X + βd D + ε                     Y ~ X + D
Y = β0 + βx X + βd D + βx:d (X · D) + ε      Y ~ X * D
where X is numeric, D is a dummy (0/1) variable, and ε is the error term.
Analysis of Covariance: Summary
For the model with parallel lines,
Y = β0 + βx X + βd D + ε,
the coefficient for the dummy variable βd represents the
(constant!) vertical distance between the lines.
For the interaction model,
Y = β0 + βx X + βd D + βx:d (X · D) + ε,
the coefficient for the dummy variable βd represents the
difference between the intercepts for the two lines; the
interaction coefficient βx:d represents the difference between
their slopes.
What if there are more categories?
Suppose we have a variable called Group with 3 categories: A,
B and C.
R will automatically code two dummy variables, GroupB and
GroupC, with A as the baseline category by default. The model
Y ~ X + Group
represents 3 parallel lines.
The model
Y ~ X * Group
represents 3 arbitrary lines.
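A minimal sketch of the dummy coding, using made-up data in which each group lies exactly on its own line (A: Y = X, B: Y = 1 + 2X, C: Y = 11 − X):

```r
# Illustrative data: Group, X and Y are invented names and values.
Group <- factor(rep(c("A", "B", "C"), each = 4))
X <- rep(c(1, 2, 3, 4), times = 3)
Y <- c(1, 2, 3, 4,   3, 5, 7, 9,   10, 9, 8, 7)

# Columns R builds for the additive model: A is absorbed into the intercept.
cols_add <- colnames(model.matrix(~ X + Group))
cols_add

# The interaction model adds one intercept shift and one slope shift
# for each non-baseline group.
b <- coef(lm(Y ~ X * Group))
b
```

Here GroupB and GroupC are the intercept shifts relative to A (1 and 11), and X:GroupB and X:GroupC are the slope shifts relative to A (1 and −2).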
Interactions between Numeric Variables
One can fit models like
lm(Y ~ X*U)
# the same as
XU <- X*U
lm(Y ~ X + U + XU)
where X and U are both numeric variables. This is really a
special case of the full quadratic model, with the coefficients
of the squared terms set to zero:
X2 <- X^2
U2 <- U^2
XU <- X*U
lm(Y ~ X+U+X2+XU+U2)
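The equivalence can be checked numerically. The sketch below uses simulated predictors (names and true coefficients are illustrative) and also shows I(), which lets the quadratic terms be written inside the formula without creating new variables.

```r
# Simulated numeric predictors; all values are illustrative assumptions.
set.seed(1)
X <- rnorm(50)
U <- rnorm(50)
Y <- 1 + 2 * X - U + 0.5 * X * U + rnorm(50)

fit_star <- lm(Y ~ X * U)        # R expands this to X + U + X:U
XU <- X * U
fit_hand <- lm(Y ~ X + U + XU)   # product column built by hand

# Same columns in the design matrix, so the estimates agree;
# only the coefficient label differs (X:U versus XU).
same <- all.equal(unname(coef(fit_star)), unname(coef(fit_hand)))
same

# Full quadratic model written with I() instead of helper variables.
fit_quad <- lm(Y ~ X + U + I(X^2) + I(X * U) + I(U^2))
length(coef(fit_quad))
```

Inside a formula, ^ and * have special meanings, which is why the arithmetic versions need to be wrapped in I() or computed beforehand as in the slide.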
Summary
3 types of models
No Group dependence    Y ~ X
Additive models        Y ~ X + G
Interaction models     Y ~ X * G