Multiple Regression Models

Class 5
Multiple Regression Models
• We can readily imagine that there may be
several factors that we can include in our
model to explain test scores.
GPA    Hours of Study    Test Score
2.6    1                 65
3.4    2                 73
2.8    2                 73
2.0    3                 75
2.2    4                 81
3.1    5                 87
3.3    6                 92
3.0    6                 96
3.5    5                 98
3.7    8                 100
Using EXCEL
• The procedure is the same: tools/data
analysis/regression.
• Note that the independent variables have to be in
contiguous columns.
• The F-test now tests whether at least one of
the variables explains variation in y.
• The problem becomes tricky because the
degree to which a variable appears to be
important in explaining the variation in y
depends on the other variables present!
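For readers working outside Excel, the same kind of fit can be sketched in plain Python. This is only a minimal sketch, not the course's procedure. It assumes the GPA, hours, and test score columns line up row by row in the order shown, and the helper name `ols` is hypothetical. It solves the normal equations directly, which is adequate for a ten-row example but is not a substitute for a statistics package.

```python
# Assumption: row i of each column belongs to the same student.
gpa = [2.6, 3.4, 2.8, 2.0, 2.2, 3.1, 3.3, 3.0, 3.5, 3.7]
hours = [1, 2, 2, 3, 4, 5, 6, 6, 5, 8]
score = [65, 73, 73, 75, 81, 87, 92, 96, 98, 100]


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y.

    Hypothetical helper: solved by Gauss-Jordan elimination with pivoting.
    """
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(p):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][p] for i in range(p)]


# Design matrix: a column of ones (intercept), then hours, then GPA.
X = [[1.0, h, g] for h, g in zip(hours, gpa)]
b0, b_hours, b_gpa = ols(X, score)

fitted = [b0 + b_hours * h + b_gpa * g for h, g in zip(hours, gpa)]
residuals = [yi - fi for yi, fi in zip(score, fitted)]
mean_y = sum(score) / len(score)
sse = sum(e * e for e in residuals)
sst = sum((yi - mean_y) ** 2 for yi in score)
r_squared = 1 - sse / sst

print(f"score = {b0:.2f} + {b_hours:.2f}*hours + {b_gpa:.2f}*gpa")
print(f"R^2 = {r_squared:.3f}")
```

Refitting with only one of the two columns in `X` is a quick way to see how a coefficient changes depending on which other variables are present.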
Hypothesis Testing
• The F-test tests to see if all of the
coefficients of the independent variables are
zero. For our model:
$H_0: \beta_1 = \beta_2 = 0$
$H_1: \beta_i \neq 0$ for at least one $i = 1, 2$
• The t-test tests to see if the coefficient of
an individual independent variable is zero.
$H_0: \beta_i = 0$
$H_1: \beta_i \neq 0$
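For reference, the standard test statistics behind these two sets of hypotheses can be written as below. The notation is an assumption and does not come from the slides: n is the number of observations, k the number of independent variables, SSR and SSE the regression and error sums of squares, and b_i and s_{b_i} an estimated coefficient and its standard error.

```latex
% Overall F-test (all slope coefficients zero under H_0)
F = \frac{\mathrm{SSR}/k}{\mathrm{SSE}/(n-k-1)}
    \sim F_{k,\,n-k-1} \quad \text{under } H_0

% t-test for an individual coefficient
t = \frac{b_i}{s_{b_i}}
    \sim t_{\,n-k-1} \quad \text{under } H_0
```

For the test score model with two independent variables and ten observations, this gives k = 2 and n - k - 1 = 7 degrees of freedom.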
Some Final Comments
• The first step in building a regression model
is to develop a list of candidate variables.
• Notice that measurement might be a problem.
• Note that the t-test now takes on an
important role. But all you need are the p-values!
• Examination of residuals may provide clues
about other factors that you have left out.
Adding Qualitative Factors
• Qualitative factors can be added to the
model through the use of dummy variables.
• Consider the following data:
Salary ($1,000's)    Gender
35                   Female
33                   Female
42                   Female
41                   Female
45                   Male
43                   Male
40                   Male
46                   Male
48                   Male
Is there information available that shows
discrimination based on gender?
Coding the Data
• We can add the gender factor by coding a
variable in the following way:
• If Female, then x = 1,
• If Male, then x = 0.
• What does our model say about salary?
$E(y) = \text{expected salary} = \beta_0 + \beta_1 x$
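To see what the model says, it can help to fit it to the salary data and compare the coefficients with the group means. The sketch below assumes the salary and gender columns pair up in the order listed. The variable names are illustrative only. With a single 0/1 dummy, the intercept estimates the mean of the group coded 0 and the slope estimates the difference between the two group means.

```python
# Assumption: salaries and genders pair up in the order listed.
salary = [35, 33, 42, 41, 45, 43, 40, 46, 48]  # in $1,000's
gender = ["Female"] * 4 + ["Male"] * 5

# Dummy coding from the slide: Female -> 1, Male -> 0.
x = [1 if g == "Female" else 0 for g in gender]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(salary) / n
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, salary))
s_xx = sum((xi - x_bar) ** 2 for xi in x)

b1 = s_xy / s_xx         # estimated difference: female mean minus male mean
b0 = y_bar - b1 * x_bar  # estimated mean salary when x = 0 (male)

# Group means computed directly, for comparison.
female = [s for s, g in zip(salary, gender) if g == "Female"]
male = [s for s, g in zip(salary, gender) if g == "Male"]
female_mean = sum(female) / len(female)
male_mean = sum(male) / len(male)

print(f"b0 = {b0:.2f}, male mean = {male_mean:.2f}")
print(f"b0 + b1 = {b0 + b1:.2f}, female mean = {female_mean:.2f}")
```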
Doing the Analysis
• After doing the regression analysis, what
hypothesis should we test?
• Is there another way of doing this test?
From prior material?
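One plausible answer to these questions is to test whether the dummy coefficient is zero, and to note that the pooled two-sample t-test is an equivalent route. The sketch below, again assuming the data pair up in the order listed, computes both t statistics so they can be compared. It stops at the statistic and does not compute a p-value, which would need a t-distribution table or a statistics library.

```python
import math

female = [35, 33, 42, 41]       # salaries in $1,000's
male = [45, 43, 40, 46, 48]


def mean(values):
    return sum(values) / len(values)


def sum_sq_dev(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


# Route 1: pooled two-sample t statistic (female minus male).
n_f, n_m = len(female), len(male)
df = n_f + n_m - 2
s2_pooled = (sum_sq_dev(female) + sum_sq_dev(male)) / df
t_pooled = (mean(female) - mean(male)) / math.sqrt(s2_pooled * (1 / n_f + 1 / n_m))

# Route 2: t statistic for the slope in y = b0 + b1*x, with x = 1 for female.
y = female + male
x = [1] * n_f + [0] * n_m
n = len(y)
x_bar, y_bar = mean(x), mean(y)
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
se_b1 = math.sqrt(sse / (n - 2) / s_xx)
t_slope = b1 / se_b1

print(f"pooled t = {t_pooled:.4f}, slope t = {t_slope:.4f}, df = {df}")
```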
Coding Variables with More than Two Levels
• Consider the following data set. How
would you code the qualitative factor
additive for the model?
The additives were added to the gasoline
and resulted in the following miles per
gallon (MPG). Is there a difference in
the additives? What model should we
build to check this? Be careful about
what the model implies!
MPG    Additive
30     A
32     A
28     A
30     A
22     B
27     B
27     B
24     B
30     C
37     C
34     C
35     C
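One reading of the warning about what the model implies is that coding the additives as a single numeric variable (A = 1, B = 2, C = 3) forces the expected MPG to change by the same amount from A to B as from B to C. The sketch below contrasts that coding with two dummy variables, taking A as the baseline. The baseline choice, the helper `ols`, and the variable names are assumptions made for illustration.

```python
mpg = [30, 32, 28, 30, 22, 27, 27, 24, 30, 37, 34, 35]
additive = ["A"] * 4 + ["B"] * 4 + ["C"] * 4


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y.

    Hypothetical helper: solved by Gauss-Jordan elimination with pivoting.
    """
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(p):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][p] for i in range(p)]


# Option 1 (questionable): one numeric code, which imposes equal steps.
code = {"A": 1, "B": 2, "C": 3}
X_num = [[1.0, code[a]] for a in additive]
a0, a1 = ols(X_num, mpg)
fitted_numeric = {lvl: a0 + a1 * c for lvl, c in code.items()}

# Option 2: two dummy variables, with additive A as the baseline.
X_dummy = [[1.0, float(a == "B"), float(a == "C")] for a in additive]
b0, b_B, b_C = ols(X_dummy, mpg)
fitted_dummy = {"A": b0, "B": b0 + b_B, "C": b0 + b_C}

for lvl in "ABC":
    print(lvl, round(fitted_numeric[lvl], 2), round(fitted_dummy[lvl], 2))
```

With the dummy coding, the fitted values reproduce the three group means, while the numeric coding places the fitted value for B midway between those for A and C regardless of the data.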
Coding Qualitative Variables-Summary
• The coding of dummy variables depends
upon the number of levels that the
qualitative factor has. For k levels, use k-1
dummy variables. The case where k=5:
Level    Dummy 1    Dummy 2    Dummy 3    Dummy 4
1        0          0          0          0
2        1          0          0          0
3        0          1          0          0
4        0          0          1          0
5        0          0          0          1
This adds four variables to the model
(four columns in your spreadsheet).
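The coding table above can be generated mechanically for any number of levels. The following is a small sketch. The function name `dummy_code` is hypothetical, and it assumes, as in the table, that the first level listed is the baseline with all dummies set to 0.

```python
def dummy_code(levels):
    """Map each level to its k-1 dummy values, using the first level as baseline."""
    baseline, *others = levels
    table = {baseline: [0] * len(others)}
    for i, level in enumerate(others):
        # Dummy i is 1 for this level and 0 for every other level.
        table[level] = [1 if j == i else 0 for j in range(len(others))]
    return table


coding = dummy_code([1, 2, 3, 4, 5])
for level, row in coding.items():
    print(level, row)
```

Each list in `coding` corresponds to one row of the table, so a data set with a five-level factor gains four new columns.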
More on Dummy Variables
• Of course, these dummy variables just
define different populations whose means
we are comparing.
• If there are only two populations (one
dummy variable), you can use the pooled t-test.
• In a regression model, we have the luxury
of including other factors!
Controlling for other factors!
• If you have only a set of dummy variables
(like the fuel additive problem), you can use
ANOVA.
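As an illustration of the ANOVA route, the sketch below computes the one-way ANOVA F statistic for the fuel additive data by hand. The variable names are assumptions. The resulting F is the same statistic that the overall F-test of a regression on two additive dummies would report. No p-value is computed here, since that needs an F-distribution table or a statistics library.

```python
groups = {
    "A": [30, 32, 28, 30],
    "B": [22, 27, 27, 24],
    "C": [30, 37, 34, 35],
}

all_values = [v for g in groups.values() for v in g]
n = len(all_values)   # total observations
k = len(groups)       # number of levels (populations)
grand_mean = sum(all_values) / n

# Between-group variation: how far each group mean is from the grand mean.
ss_between = sum(
    len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
)
# Within-group variation: how far each value is from its own group mean.
ss_within = sum(
    (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n - k)
f_stat = ms_between / ms_within

print(f"F = {f_stat:.3f} with {k - 1} and {n - k} degrees of freedom")
```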