Class 5: Multiple Regression Models

Multiple Regression Models
• We can readily imagine that several factors could be included in our model to explain test scores.

GPA    Hours of Study    Test Score
2.6    1                 65
3.4    2                 73
2.8    2                 73
2.0    3                 75
2.2    4                 81
3.1    5                 87
3.3    6                 92
3.0    6                 96
3.5    5                 98
3.7    8                 100

Using EXCEL
• The procedure is the same: Tools / Data Analysis / Regression.
• Note that the independent variables have to be in contiguous columns.
• The F-test now tests whether the independent variables, taken together, explain variation in y.
• The problem becomes tricky because the degree to which a variable appears to be important in explaining the variation in y depends on the other variables present!

Hypothesis Testing
• The F-test tests whether all of the coefficients of the independent variables are zero. For our model:
  $H_0: \beta_1 = \beta_2 = 0$
  $H_1: \beta_i \neq 0$ for at least one $i = 1, 2$
• The t-test tests whether the coefficient of an individual independent variable is zero:
  $H_0: \beta_i = 0$
  $H_1: \beta_i \neq 0$

Some Final Comments
• The first step in building a regression model is to develop a list of candidate variables.
• Notice that measurement might be a problem.
• Note that the t-test now takes on an important role. But all you need are the p-values!
• Examination of residuals may provide clues about other factors that you have left out.

Adding Qualitative Factors
• Qualitative factors can be added to the model through the use of dummy variables.
• Consider the following data:

Salary (thousands of dollars)    Gender
35                               Female
33                               Female
42                               Female
41                               Female
45                               Male
43                               Male
40                               Male
46                               Male
48                               Male

• Do these data show evidence of discrimination based on gender?

Coding the Data
• We can add the gender factor by coding a variable in the following way:
  • If Female, then x = 1.
  • If Male, then x = 0.
• What does our model say about salary?
  $E(y) = \text{expected salary} = \beta_0 + \beta_1 x$

Doing the Analysis
• After doing the regression analysis, what hypothesis should we test?
• Is there another way of doing this test, from prior material?
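To make the gender coding concrete, here is a minimal sketch in plain Python (the slides themselves use Excel) that fits E(y) = β0 + β1·x to the salary data with x = 1 for Female and x = 0 for Male. The function names `dummy_regression` and `pooled_t` are illustrative, not part of the course material. The sketch also computes the two-sample pooled t statistic so the two approaches can be compared.

```python
from math import sqrt

# Salary data from the slide, in thousands of dollars.
female = [35, 33, 42, 41]
male = [45, 43, 40, 46, 48]


def mean(values):
    return sum(values) / len(values)


def dummy_regression(group1, group0):
    """Least-squares fit of y = b0 + b1*x, with x = 1 for group1 and x = 0 for group0.

    Returns (b0, b1, t statistic for H0: beta1 = 0, error degrees of freedom).
    """
    y = group1 + group0
    x = [1] * len(group1) + [0] * len(group0)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    df = len(y) - 2
    se_b1 = sqrt((sse / df) / sxx)  # standard error of the slope
    return b0, b1, b1 / se_b1, df


def pooled_t(group1, group0):
    """Two-sample t statistic with a pooled variance estimate."""
    n1, n0 = len(group1), len(group0)
    m1, m0 = mean(group1), mean(group0)
    ss1 = sum((v - m1) ** 2 for v in group1)
    ss0 = sum((v - m0) ** 2 for v in group0)
    sp2 = (ss1 + ss0) / (n1 + n0 - 2)  # pooled variance
    return (m1 - m0) / sqrt(sp2 * (1 / n1 + 1 / n0))


b0, b1, t_slope, df = dummy_regression(female, male)
t_pooled = pooled_t(female, male)

print(f"b0 = {b0:.2f} (mean salary when x = 0, i.e. Male)")
print(f"b1 = {b1:.2f} (Female mean minus Male mean)")
print(f"t for H0: beta1 = 0 -> {t_slope:.3f} with {df} df")
print(f"pooled two-sample t -> {t_pooled:.3f}")
```

With this coding the intercept is the mean salary of the x = 0 group and the slope is the difference between the two group means, so testing β1 = 0 asks whether the group means differ. The two t statistics agree, which is one way to answer the question of whether there is another way to do this test. The sketch stops at the test statistic; a p-value would come from a t table or from the Excel regression output.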
Coding Variables with More than Two Levels
• Consider the following data set. How would you code the qualitative factor for the model?
• The additives were added to the gasoline and resulted in the following miles per gallon (MPG).
• Is there a difference in the additives? What model should we build to check this? Be careful about what the model implies!

MPG    Additive
30     A
32     A
28     A
30     A
22     B
27     B
27     B
24     B
30     C
37     C
34     C
35     C

Coding Qualitative Variables: Summary
• The coding of dummy variables depends upon the number of levels that the qualitative factor has. For k levels, use k-1 dummy variables.
• The case where k = 5:

Level    Dummy 1    Dummy 2    Dummy 3    Dummy 4
1        0          0          0          0
2        1          0          0          0
3        0          1          0          0
4        0          0          1          0
5        0          0          0          1

• This adds four variables to the model (four columns in your spreadsheet).

More on Dummy Variables
• Of course, these dummy variables just define different populations whose means we are comparing.
• If there are only two populations (one dummy variable), you can use the pooled t-test.
• In a regression model, we have the luxury of including other factors. Controlling for other factors!
• If you have only a set of dummy variables (like the fuel additive problem), you can use ANOVA.
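As an illustration of the k-1 coding rule, the following plain-Python sketch builds two dummy variables for the three additives, fits the regression by least squares, and computes the overall F statistic. It assumes the twelve MPG values pair with the additive labels in the order listed, with the first 30 belonging to additive A. The helper names (`dummy_code`, `solve`, `fit_ols`, `overall_f`) are hypothetical, and additive A is taken as the baseline level to mirror the all-zeros first row of the summary table.

```python
# Fuel additive data; the pairing of MPG values with labels is assumed from list order.
mpg = [30, 32, 28, 30, 22, 27, 27, 24, 30, 37, 34, 35]
additive = list("AAAABBBBCCCC")


def dummy_code(labels):
    """Return (levels, rows) with k-1 indicator columns; the first sorted level is the baseline."""
    levels = sorted(set(labels))
    rows = [[1 if lab == lev else 0 for lev in levels[1:]] for lab in labels]
    return levels, rows


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(x_rows, y):
    """Least squares with an intercept via the normal equations; returns [b0, b1, ...]."""
    design = [[1] + list(row) for row in x_rows]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


def overall_f(x_rows, y, coefs):
    """F statistic for H0: all slope coefficients are zero, with its degrees of freedom."""
    n, k = len(y), len(coefs) - 1
    y_bar = sum(y) / n
    fitted = [coefs[0] + sum(c * v for c, v in zip(coefs[1:], row)) for row in x_rows]
    ssr = sum((f - y_bar) ** 2 for f in fitted)           # explained variation
    sse = sum((yi - f) ** 2 for yi, f in zip(y, fitted))  # unexplained variation
    return (ssr / k) / (sse / (n - k - 1)), k, n - k - 1


levels, rows = dummy_code(additive)
coefs = fit_ols(rows, mpg)
f_stat, df_model, df_error = overall_f(rows, mpg, coefs)

print("levels:", levels, "baseline:", levels[0])
print("coefficients [b0, b_B, b_C]:", [round(c, 3) for c in coefs])
print(f"F = {f_stat:.3f} on ({df_model}, {df_error}) df")
```

Under this coding the intercept is the mean MPG for additive A, and each dummy coefficient is the difference between that additive's mean and A's mean. This is what the model implies: every comparison is made against the baseline level. Because the model contains only dummy variables, the overall F statistic is the same one a one-way ANOVA on the three additives would give.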