Lecture 12 - Wharton Statistics Department

Stat 112: Lecture 19 Notes
• Chapter 7.2: Interaction Variables
• Thursday: Paragraph on Project Due
Interaction
• Interaction is a three-variable concept. One of
these is the response variable (Y) and the other
two are explanatory variables (X1 and X2).
• There is an interaction between X1 and X2 if the
impact of an increase in X2 on Y depends on the
level of X1.
• To incorporate interaction in a multiple regression
model, we add the explanatory variable
$(X_1 - \bar{X}_1)(X_2 - \bar{X}_2)$. There is evidence of an
interaction if the coefficient on $(X_1 - \bar{X}_1)(X_2 - \bar{X}_2)$
is significant (t-test has p-value < .05).
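A minimal sketch of building the centered product column in plain Python. The function names `centered_product` and `x2_slope` are hypothetical, chosen only for illustration. The second function restates what the fitted model implies: if $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + b_3 (X_1 - \bar{X}_1)(X_2 - \bar{X}_2)$, a one-unit increase in $X_2$ changes $\hat{Y}$ by $b_2 + b_3 (X_1 - \bar{X}_1)$, which depends on the level of $X_1$ exactly when $b_3 \neq 0$.

```python
from statistics import mean


def centered_product(x1, x2):
    """Return the interaction column (X1 - mean(X1)) * (X2 - mean(X2))."""
    if len(x1) != len(x2):
        raise ValueError("x1 and x2 must have the same length")
    m1, m2 = mean(x1), mean(x2)
    return [(a - m1) * (b - m2) for a, b in zip(x1, x2)]


def x2_slope(b2, b3, x1_value, x1_mean):
    """Change in the fitted mean of Y per one-unit increase in X2,
    holding X1 fixed at x1_value (b2, b3 are fitted coefficients)."""
    return b2 + b3 * (x1_value - x1_mean)


# Usage: the extra column that would be added to the regression.
print(centered_product([1, 2, 3], [4, 5, 6]))
```

Centering does not change the coefficient on the product term; it only changes what the coefficients on $X_1$ and $X_2$ mean (effects at the average of the other variable).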
An experiment to study how noise affects the performance of children tested second
grade hyperactive children and a control group of second graders who were not
hyperactive. One of the tasks involved solving math problems. The children solved
problems under both high-noise and low-noise conditions. Here are the mean scores:
[Figure: bar chart of Mean Mathematics Score (axis 0 to 250) for Control and Hyperactive children under High Noise and Low Noise]
Let Y = Mean Mathematics Score, X1 = Type of Child (0 = Control, 1 = Hyperactive),
X2 = Type of Noise (0 = Low Noise, 1 = High Noise). There is an interaction between
type of child and type of noise: the impact of increasing noise from low to high depends on
the type of child.
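With both variables coded 0/1, the interaction can be read directly off the four cell means. A minimal sketch follows; the function name and the dictionary layout are hypothetical, and no numbers from the bar chart are filled in. The returned contrast (the low-to-high noise change for hyperactive children minus the same change for control children) is zero exactly when there is no interaction in the cell means. When all four cells are observed, the coefficient on the product term in a regression of Y on X1, X2 and their product equals this contrast, whether or not the product is centered.

```python
def noise_effects(cell_means):
    """cell_means maps (child, noise) -> mean score, with
    child: 0 = control, 1 = hyperactive; noise: 0 = low, 1 = high.
    Returns the high-minus-low noise effect for each type of child
    and the interaction contrast (difference in differences)."""
    effect_control = cell_means[(0, 1)] - cell_means[(0, 0)]
    effect_hyper = cell_means[(1, 1)] - cell_means[(1, 0)]
    return {
        "control": effect_control,
        "hyperactive": effect_hyper,
        "interaction": effect_hyper - effect_control,
    }
```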
Interaction variables in JMP
• To add an interaction variable in Fit Model
in JMP, add the usual explanatory
variables first, then highlight X1 in the
Select Columns box and X2 in the
Construct Model Effects box. Then click
Cross in the Construct Model Effects box.
• JMP creates the explanatory variable
$(X_1 - \bar{X}_1)(X_2 - \bar{X}_2)$.
Interaction Example
• The number of car accidents on a stretch of highway
seems to be related to the number of vehicles that travel
over it and the speed at which they are traveling.
• A city alderman has decided to ask the county sheriff to
provide her with statistics covering the last few years,
with the intention of examining these data statistically so
that she can introduce new speed laws that will reduce
traffic accidents.
• accidents.JMP contains data for different time periods on
the number of cars passing along the stretch of road, the
average speed of the cars and the number of accidents
during the time period.
Interactions in Accident Data
Response Accidents
Parameter Estimates

Term                            Estimate    Std Error   t Ratio   Prob>|t|
Intercept                       -0.852117   7.314465      -0.12     0.9077
Cars                            0.4154531   0.136048       3.05     0.0035
Speed                           0.0644162   0.118519       0.54     0.5889
(Speed-60.0017)*(Cars-9.935)    1.0763228   0.087791      12.26     <.0001
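Each t Ratio in the table is the estimate divided by its standard error. The short check below recomputes that column from the two columns to its left. The p-values would also need the residual degrees of freedom, which the table does not show, so they are not recomputed here.

```python
# Estimates and standard errors copied from the Parameter Estimates table.
table = {
    "Intercept": (-0.852117, 7.314465),
    "Cars": (0.4154531, 0.136048),
    "Speed": (0.0644162, 0.118519),
    "(Speed-60.0017)*(Cars-9.935)": (1.0763228, 0.087791),
}

# t Ratio = Estimate / Std Error
t_ratios = {term: est / se for term, (est, se) in table.items()}

for term, t in t_ratios.items():
    print(f"{term:30s} {t:7.2f}")
```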
$$\hat{E}(\text{Accidents} \mid \text{Cars} = 8, \text{Speed} = 66) - \hat{E}(\text{Accidents} \mid \text{Cars} = 8, \text{Speed} = 65)$$
$$= [-0.852 + 0.415 \cdot 8 + 0.064 \cdot 66 + 1.076 \cdot (66 - 60.0017)(8 - 9.935)] - [-0.852 + 0.415 \cdot 8 + 0.064 \cdot 65 + 1.076 \cdot (65 - 60.0017)(8 - 9.935)]$$
$$= 0.064 \cdot (66 - 65) + 1.076 \cdot (66 - 65)(8 - 9.935) \approx -2.02$$
$$\hat{E}(\text{Accidents} \mid \text{Cars} = 11, \text{Speed} = 66) - \hat{E}(\text{Accidents} \mid \text{Cars} = 11, \text{Speed} = 65)$$
$$= [-0.852 + 0.415 \cdot 11 + 0.064 \cdot 66 + 1.076 \cdot (66 - 60.0017)(11 - 9.935)] - [-0.852 + 0.415 \cdot 11 + 0.064 \cdot 65 + 1.076 \cdot (65 - 60.0017)(11 - 9.935)]$$
$$= 0.064 \cdot (66 - 65) + 1.076 \cdot (66 - 65)(11 - 9.935) \approx 1.21$$
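Increases in speed have a worse impact on the number of accidents when there are
a large number of cars on the road than when there are a small number of cars on
the road. In general, the estimated effect of a one-unit increase in Speed is
$0.064 + 1.076(\text{Cars} - 9.935)$, which grows with Cars.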
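The two differences above are instances of one formula: because the model is linear in Speed for fixed Cars, the effect of a one-unit increase in Speed is $0.064 + 1.076(\text{Cars} - 9.935)$. A small Python check using the full-precision estimates from the table (the function names are illustrative, not JMP output):

```python
# Coefficients from the Parameter Estimates table for the accident data.
B0, B_CARS, B_SPEED, B_INT = -0.852117, 0.4154531, 0.0644162, 1.0763228
SPEED_MEAN, CARS_MEAN = 60.0017, 9.935  # centering constants shown by JMP


def predicted_accidents(cars, speed):
    """Fitted mean number of accidents under the interaction model."""
    return (B0 + B_CARS * cars + B_SPEED * speed
            + B_INT * (speed - SPEED_MEAN) * (cars - CARS_MEAN))


def speed_effect(cars):
    """Change in fitted mean accidents for a one-unit increase in Speed
    at a given value of Cars."""
    return B_SPEED + B_INT * (cars - CARS_MEAN)


for cars in (8, 11):
    diff = predicted_accidents(cars, 66) - predicted_accidents(cars, 65)
    print(cars, round(diff, 2), round(speed_effect(cars), 2))
```

Both columns agree: about -2.02 at Cars = 8 and about +1.21 at Cars = 11.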
Notes on Interactions
• The need for interactions is not easily spotted
with residual plots. It is best to try including an
interaction term and see if it is significant.
• To better understand the multiple regression
relationship when there is an interaction, it is
useful to make an Interaction Plot. After Fit
Model, click the red triangle next to Response, click
Factor Profiling and then click Interaction Plots.
Interaction Profiles
[Figure: interaction profile plots of Accidents against Cars and against Speed; lines labeled Speed = 56.6, 62.5 and Cars = 7, 12.6]
The plot on the left displays E(Accidents | Cars, Speed = 56.6) and E(Accidents | Cars, Speed = 62.5)
as functions of Cars. The plot on the right displays E(Accidents | Cars = 12.6, Speed) and
E(Accidents | Cars = 7, Speed) as functions of Speed. We can see that the impact of speed on
Accidents depends critically on the number of cars on the road.
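Each line in the left-hand profile plot is a straight line in Cars with slope $0.415 + 1.076(\text{Speed} - 60.0017)$. The sketch below (function name hypothetical) evaluates that slope at the two speeds used in the plot. Under the fitted model the slopes have opposite signs, about -3.25 at Speed = 56.6 and about +3.10 at Speed = 62.5, so the two lines are far from parallel. Swapping the roles of Cars and Speed gives the right-hand plot in the same way.

```python
B_CARS, B_INT = 0.4154531, 1.0763228  # from the Parameter Estimates table
SPEED_MEAN = 60.0017                  # centering constant for Speed


def cars_slope(speed):
    """Slope in Cars of the fitted line E(Accidents | Cars, Speed=speed)."""
    return B_CARS + B_INT * (speed - SPEED_MEAN)


for speed in (56.6, 62.5):
    print(f"Speed = {speed}: slope in Cars = {cars_slope(speed):.2f}")
```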
Toy Factory Manager Data
[Figure: Bivariate Fit of Time for Run By Run Size; scatterplot of Time for Run (about 150 to 300) vs. Run Size (about 50 to 350); squares = Manager A, + = Manager B, x = Manager C]
Model without Interaction
Response Time for Run
[Figure: Whole Model Regression Plot of Time for Run vs. Run Size, with fitted lines for managers a, b and c]

Expanded Estimates (nominal factors expanded to all levels)

Term          Estimate    Std Error   t Ratio   Prob>|t|
Intercept     176.70882   5.658644      31.23     <.0001
Run Size       0.243369   0.025076       9.71     <.0001
Manager[a]    38.409663   3.005923      12.78     <.0001
Manager[b]    -14.65115   3.031379      -4.83     <.0001
Manager[c]    -23.75851   2.995898      -7.93     <.0001
This model assumes that the effect of increasing run size is the same
for each of the three managers.
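Concretely, the model without interaction is three parallel lines, one per manager: each manager's intercept is the overall intercept plus that manager's expanded estimate, and all three share the Run Size slope. A minimal sketch (variable names are hypothetical):

```python
# Expanded estimates from the model without interaction.
INTERCEPT = 176.70882
RUN_SIZE_SLOPE = 0.243369
MANAGER_EFFECT = {"a": 38.409663, "b": -14.65115, "c": -23.75851}

# The expanded (effect-coded) manager estimates sum to essentially zero.
assert abs(sum(MANAGER_EFFECT.values())) < 1e-4

# Each manager: own intercept, SAME slope in run size.
lines = {m: (INTERCEPT + e, RUN_SIZE_SLOPE) for m, e in MANAGER_EFFECT.items()}

for m, (b0, b1) in lines.items():
    print(f"Manager {m}: {b0:.3f} + {b1} * x")
```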
Interaction Model
Response Time for Run
Expanded Estimates (nominal factors expanded to all levels)

Term                             Estimate    Std Error   t Ratio   Prob>|t|
Intercept                        179.59191   5.619643      31.96     <.0001
Run Size                         0.2344284   0.024708       9.49     <.0001
Manager[a]                       38.188168   2.900342      13.17     <.0001
Manager[b]                        -13.5381   2.936288      -4.61     <.0001
Manager[c]                       -24.65007   2.887839      -8.54     <.0001
Manager[a]*(Run Size-209.317)    0.0728366   0.035263       2.07     0.0437
Manager[b]*(Run Size-209.317)    -0.097651   0.037178      -2.63     0.0112
Manager[c]*(Run Size-209.317)    0.0248147   0.032207       0.77     0.4444

$$\hat{E}(\text{Time for Run} \mid \text{Run Size} = x, \text{Manager} = A) = 179.59 + 0.234x + 38.188 \cdot 1 - 13.538 \cdot 0 - 24.651 \cdot 0 + 0.073 \cdot 1 \cdot (x - 209.317) - 0.098 \cdot 0 \cdot (x - 209.317) + 0.025 \cdot 0 \cdot (x - 209.317)$$
$$= 179.59 + 0.234x + 38.188 + 0.073(x - 209.317)$$

$$\hat{E}(\text{Time for Run} \mid \text{Run Size} = x, \text{Manager} = A) = (179.59 + 38.188 - 0.073 \cdot 209.317) + (0.234 + 0.073)x$$
$$\hat{E}(\text{Time for Run} \mid \text{Run Size} = x, \text{Manager} = B) = (179.59 - 13.538 + 0.098 \cdot 209.317) + (0.234 - 0.098)x$$
$$\hat{E}(\text{Time for Run} \mid \text{Run Size} = x, \text{Manager} = C) = (179.59 - 24.651 - 0.025 \cdot 209.317) + (0.234 + 0.025)x$$
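The collapse of each manager's equation into an intercept and a slope can be sketched as below. It uses the rounded coefficients from the derivation above, so the intercepts match the slide to three decimals; using the full-precision estimates from the table shifts them slightly. Names are hypothetical.

```python
# Rounded coefficients as used in the derivation above.
INTERCEPT, SLOPE, CENTER = 179.59, 0.234, 209.317
MAIN = {"A": 38.188, "B": -13.538, "C": -24.651}
INTERACTION = {"A": 0.073, "B": -0.098, "C": 0.025}


def manager_line(manager):
    """(intercept, slope) of INTERCEPT + MAIN + SLOPE*x + INTERACTION*(x - CENTER)."""
    intercept = INTERCEPT + MAIN[manager] - INTERACTION[manager] * CENTER
    slope = SLOPE + INTERACTION[manager]
    return intercept, slope


for m in "ABC":
    b0, b1 = manager_line(m)
    print(f"Manager {m}: {b0:.3f} + {b1:.3f} * x")
```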
Interaction Model in JMP
• To add interactions involving categorical
variables in JMP, follow the same procedure as
with two continuous variables. Run Fit Model in
JMP, add the usual explanatory variables first,
then highlight one of the variables in the
interaction in the Construct Model Effects box
and highlight the other variable in the interaction
in the Columns box and then click Cross in the
Construct Model Effects box.
Interaction Model
• Interaction between run size and Manager: The effect on
mean run time of increasing run size by one is different
for different managers.
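$$\hat{E}(\text{Time for Run} \mid \text{Run Size} = x + 1, \text{Manager} = A) - \hat{E}(\text{Time for Run} \mid \text{Run Size} = x, \text{Manager} = A) = 0.234 + 0.073 = 0.307$$
$$\hat{E}(\text{Time for Run} \mid \text{Run Size} = x + 1, \text{Manager} = B) - \hat{E}(\text{Time for Run} \mid \text{Run Size} = x, \text{Manager} = B) = 0.234 - 0.098 = 0.136$$
$$\hat{E}(\text{Time for Run} \mid \text{Run Size} = x + 1, \text{Manager} = C) - \hat{E}(\text{Time for Run} \mid \text{Run Size} = x, \text{Manager} = C) = 0.234 + 0.025 = 0.259$$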
• Effect Test for Interaction:
Effect Tests

Source              Nparm   DF   Sum of Squares   F Ratio   Prob > F
Run Size                1    1        22070.614   90.0192     <.0001
Manager                 2    2        43981.452   89.6934     <.0001
Manager*Run Size        2    2         1778.661    3.6273     0.0333
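Each F Ratio is the effect's mean square (Sum of Squares / DF) divided by the residual mean square. The residual mean square is not shown in this table, so the sketch below backs it out of the Run Size row (about 245.18) and checks that it reproduces the other two F ratios. Computing the p-values would additionally need the residual degrees of freedom, which the table does not report.

```python
# Effect Tests table: source -> (DF, Sum of Squares, reported F Ratio).
effect_tests = {
    "Run Size": (1, 22070.614, 90.0192),
    "Manager": (2, 43981.452, 89.6934),
    "Manager*Run Size": (2, 1778.661, 3.6273),
}

# F = (SS / DF) / MSE. MSE is not printed, so back it out of the
# Run Size row (DF = 1) and reuse it for the other rows.
mse = effect_tests["Run Size"][1] / effect_tests["Run Size"][2]

recomputed = {src: (ss / df) / mse for src, (df, ss, _) in effect_tests.items()}

for src, (_, _, reported) in effect_tests.items():
    print(f"{src:18s} reported {reported:8.4f}  recomputed {recomputed[src]:8.4f}")
```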
• The Manager*Run Size effect test tests the null hypothesis that
there is no interaction (the effect on mean run time of
increasing run size is the same for all managers) against the
alternative hypothesis that there is an interaction
between run size and manager. The p-value is 0.0333, so
there is evidence of an interaction.
$$\hat{E}(\text{Time for Run} \mid \text{Run Size} = x, \text{Manager} = A) = 202.498 + 0.307x$$
$$\hat{E}(\text{Time for Run} \mid \text{Run Size} = x, \text{Manager} = B) = 186.565 + 0.136x$$
$$\hat{E}(\text{Time for Run} \mid \text{Run Size} = x, \text{Manager} = C) = 149.706 + 0.259x$$
• The runs supervised by Manager A appear
abnormally time consuming. Manager B
has a higher fixed setup time than
Manager C (186.565 > 149.706) but has
a lower per-unit production time
(0.136 < 0.259).
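The B-versus-C trade-off (higher fixed setup time, lower per-unit time) means their fitted lines cross. A minimal sketch of where, using the fitted lines for the three managers (the function name is hypothetical). This is arithmetic on the fitted model only; whether the crossing point lies inside the range of observed run sizes should be checked against the scatterplot. Under these lines, B's predicted time is lower than C's for run sizes above about 300.

```python
# Fitted (intercept, slope) pairs from the interaction model.
lines = {"A": (202.498, 0.307), "B": (186.565, 0.136), "C": (149.706, 0.259)}


def crossover(m1, m2):
    """Run size x at which the fitted lines for managers m1 and m2 are equal."""
    (a1, s1), (a2, s2) = lines[m1], lines[m2]
    if s1 == s2:
        raise ValueError("parallel lines never cross")
    return (a2 - a1) / (s1 - s2)


x_bc = crossover("B", "C")
print(f"B and C predict the same time at run size {x_bc:.1f}")
```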
Interaction Profile Plot
[Figure: interaction profile plots of Time for Run against Run Size and against Manager, with lines for managers a, b and c]
The lower left-hand plot shows mean time for run vs. run size for each of the three managers,
a, b and c.
Interactions Involving Categorical
Variables: General Approach
• First fit a model with an interaction between the
categorical explanatory variable and the continuous
explanatory variable. Use the effect test on the
interaction to see if there is evidence of an
interaction.
• If there is evidence of an interaction (p-value
< 0.05 for the effect test), use the interaction model.
• If there is not strong evidence of an interaction
(p-value > 0.05 for the effect test), use the model without
interactions.
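The decision rule above, written as a tiny function. The name and the `alpha` parameter are hypothetical; the lecture does not say what to do at exactly 0.05, and this sketch treats that case as "not strong evidence".

```python
def choose_model(interaction_p_value, alpha=0.05):
    """Pick between the interaction and no-interaction models using the
    p-value of the effect test for the interaction term."""
    if not 0.0 <= interaction_p_value <= 1.0:
        raise ValueError("p-value must be between 0 and 1")
    if interaction_p_value < alpha:
        return "interaction model"
    return "model without interactions"


# Manager*Run Size effect test from the toy factory example.
print(choose_model(0.0333))
```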