Common Issues in Multiple Regression
• Multicollinearity
• Dummy variables
• Interaction variables
• Higher-order (polynomial) terms

Perfect Multicollinearity
If the explanatory variables in a multiple regression model satisfy an exact linear relationship,
$c_0 + c_1 X_1 + c_2 X_2 + \cdots + c_k X_k = 0$ for constants $c_0, c_1, \ldots, c_k$ with at least one of $c_1, \ldots, c_k$ nonzero,
this is called perfect multicollinearity. That is, at least one explanatory variable can be written as a linear combination of the other explanatory variables. Once perfect multicollinearity is present, the model contains a redundant variable, so the regression coefficients are not identified and cannot be computed.

Note that perfect multicollinearity is defined in terms of a linear relationship among the explanatory variables. A nonlinear relationship such as [expression missing] therefore does not constitute perfect multicollinearity. Conversely, [expression missing] does constitute perfect multicollinearity (why?).

Under perfect multicollinearity Excel will not run the regression at all, so in practice it is not the real problem. The real problem is high multicollinearity, that is, several of the explanatory variables being very similar to one another.

How to Deal with Multicollinearity
• Remove redundant explanatory variables.
• Re-express explanatory variables.
• Do nothing if the explanatory variables are significant with sensible estimates.

Copyright © 2011 Pearson Education, Inc.

4M Example 24.2: RETAIL PROFITS

Motivation
A chain of pharmacies is looking to expand into a new community. It has data for 110 cities on the following variables: income, disposable income, birth rate, social security recipients, cardiovascular deaths, and percentage of the local population aged 65 or more.

Method
Use multiple regression. The response variable is profit. Examine the correlation matrix and the scatterplot matrix.
[Table: correlation matrix] Several high correlations are present (shaded in the table) and indicate the presence of collinearity.
[Figure: partial scatterplot matrix] This partial scatterplot matrix identifies communities that are distinct from the others. The linearity and no-lurking-variables conditions are met.

Mechanics: Estimation Results
[Table: regression output]

Mechanics: Examine Plots
These and other plots (not shown here) indicate that all MRM conditions are satisfied.

Mechanics
The F-statistic indicates that this collection of explanatory variables explains statistically significant variation in profits.
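The two collinearity situations described in this section can be sketched numerically. The example below is only an illustration on simulated data (the variables x1 to x4 and the helper `vif` are hypothetical, not the retail-profit data): it shows that an exact linear combination makes the design matrix rank deficient, and it computes variance inflation factors using the standard definition $VIF_j = 1/(1 - R_j^2)$, where $R_j^2$ comes from regressing the j-th explanatory variable on the others.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X (X has no intercept column)."""
    n, k = X.shape
    out = []
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])      # intercept + other regressors
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        centered = target - target.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(0)
n = 110                                   # simulated sample, size chosen arbitrarily
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)        # nearly a copy of x1: high collinearity
x3 = rng.normal(size=n)                   # unrelated to x1 and x2

# High (imperfect) collinearity: the regression runs, but VIFs are large.
vifs = vif(np.column_stack([x1, x2, x3]))
print("VIFs:", vifs)

# Perfect collinearity: x4 is an exact linear combination of x1 and x3,
# so the 4-column design matrix only has rank 3.
x4 = 2 * x1 - 3 * x3
X_perfect = np.column_stack([np.ones(n), x1, x3, x4])
rank = np.linalg.matrix_rank(X_perfect)
print("columns:", X_perfect.shape[1], "rank:", rank)
```

In this sketch the first two VIFs are large because x2 is almost a copy of x1, while the rank-deficient matrix in the second part is the situation in which the coefficients cannot be computed at all.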
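The VIFs indicate that some explanatory variables are redundant and should be removed (one at a time) from the model.

Mechanics: Simplified Model
This multiple regression separates the effects of birth rates from age (and income). It reveals that cities with higher birth rates produce higher profits when compared to cities with lower birth rates but comparable income and comparable percentage of the local population aged 65 or more.

Dummy Variables
So far, all the explanatory variables we have considered have been continuous random variables. Sometimes the explanatory variable of interest is discrete. For example, return to A-Zhong's delivery example. If the hours spent on the road are also affected by the weather, we add the explanatory variable
$D_i = 1$ if day $i$ is rainy, $D_i = 0$ if day $i$ is sunny,
which is called a dummy variable.

Writing $Y_i$ for hours on the road, $X_{1i}$ for delivery distance, and $X_{2i}$ for the number of delivery stops, our model becomes
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \gamma D_i + \varepsilon_i$.
Given a sunny day, the conditional expectation of hours on the road is
$E(Y_i \mid X_{1i}, X_{2i}, D_i = 0) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}$.
Given a rainy day, the conditional expectation of hours on the road is
$E(Y_i \mid X_{1i}, X_{2i}, D_i = 1) = \beta_0 + \gamma + \beta_1 X_{1i} + \beta_2 X_{2i}$.

The difference between the two, $\gamma$, is the effect of the weather on the conditional mean of hours on the road after controlling for the other variables (that is, for the same delivery distance and the same number of delivery stops). In general, visibility is poor and road conditions are bad on rainy days, so we expect hours on the road to increase on average, that is, $\gamma > 0$.

There is an important rule for specifying dummy variables when the regression model contains an intercept $\beta_0$: if there are m different categories to consider, only m − 1 dummy variables may be included. The reason is that including m dummy variables together with the intercept causes perfect multicollinearity, because the m dummy variables sum to one for every observation and therefore reproduce the intercept column.

Return to A-Zhong's example. If the company has four trucks (trucks I, II, III, and IV), and differences in vehicle condition also affect hours on the road, then we can define only 3 dummy variables, one for each of three of the trucks, with the remaining truck serving as the baseline.
[Slide: setting up dummy variables; content not recovered]

Interaction Models

Interaction Model With 2 Independent Variables
• Hypothesizes interaction between pairs of x variables
  - Response to one x variable varies at different levels of another x variable
• Contains two-way cross product terms
  $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$
• Can be combined with other models
  - Example: dummy-variable model

Effect of Interaction
Given: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$
• Without interaction term, effect of $x_1$ on y is measured by $\beta_1$
• With interaction term, effect of $x_1$ on y is measured by $\beta_1 + \beta_3 x_2$
  - Effect increases as $x_2$ increases (when $\beta_3 > 0$)

Interaction Model Relationships
$E(y) = 1 + 2x_1 + 3x_2 + 4x_1 x_2$
For $x_2 = 1$: $E(y) = 1 + 2x_1 + 3(1) + 4x_1(1) = 4 + 6x_1$
For $x_2 = 0$: $E(y) = 1 + 2x_1 + 3(0) + 4x_1(0) = 1 + 2x_1$
[Figure: E(y) plotted against x1 for x2 = 0 and x2 = 1]
Effect (slope) of $x_1$ on E(y) depends on the value of $x_2$.

Interaction Model Worksheet
Case, i    y_i    x_1i    x_2i    x_1i x_2i
1          1      1       3       3
2          4      8       5       40
3          1      3       2       6
4          3      5       6       30
:          :      :       :       :
Multiply x1 by x2 to get x1x2.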
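The arithmetic behind the interaction slides can be checked with a short sketch. It uses the illustrative equation E(y) = 1 + 2x1 + 3x2 + 4x1x2 and the four worksheet cases shown above; the function names are hypothetical and chosen only for this illustration.

```python
import numpy as np

def expected_y(x1, x2):
    # Illustrative interaction model from the slides:
    # E(y) = 1 + 2*x1 + 3*x2 + 4*x1*x2
    return 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2

def slope_in_x1(x2):
    # Change in E(y) for a one-unit increase in x1, holding x2 fixed.
    # For this model it equals beta1 + beta3 * x2 = 2 + 4 * x2.
    return expected_y(1.0, x2) - expected_y(0.0, x2)

slope_at_x2_0 = slope_in_x1(0.0)   # line E(y) = 1 + 2*x1
slope_at_x2_1 = slope_in_x1(1.0)   # line E(y) = 4 + 6*x1
print(slope_at_x2_0, slope_at_x2_1)

# Building the cross-product column for the four worksheet cases.
x1 = np.array([1, 8, 3, 5])
x2 = np.array([3, 5, 2, 6])
x1x2 = x1 * x2
print(x1x2)
```

The slope in x1 changes from 2 to 6 as x2 moves from 0 to 1, which is what "the effect of x1 depends on x2" means in this model.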
Run regression with y, x1, x2, x1x2.

Interaction Example
You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.), x1, and newspaper circulation (thousands), x2, on the number of ad responses (hundreds), y. Conduct a test for interaction. Use α = .05.

Interaction Model Worksheet
y_i    x_1i    x_2i    x_1i x_2i
1      1       2       2
4      8       8       64
1      3       1       3
3      5       7       35
2      6       4       24
4      10      6       60
Multiply x1 by x2 to get x1x2. Run regression with y, x1, x2, x1x2.

Excel Computer Output Solution
[Table: Excel regression output]
The global F-test indicates that at least one parameter is not zero (see the F statistic and its P-value in the output).

Interaction Test Solution
• H0: $\beta_3 = 0$
• Ha: $\beta_3 \neq 0$
• α = .05
• df = 6 − 4 = 2
• Critical value(s): t = ±4.3027 (reject H0 in either tail, area .025 in each)
Test statistic: $t = \hat{\beta}_3 / s_{\hat{\beta}_3} = 1.8528$
Decision: Do not reject H0 at α = .05
Conclusion: There is no evidence of interaction

Dummy Variables + Interaction Variables
Does Wal-Mart discriminate against female employees? Are they paid less than men?
Use multiple regression with a categorical explanatory variable representing gender to analyze pay data. Regression analysis can adjust the comparison between men and women to account for other variables that may affect pay.

Example: Mid-Level Managers' Salaries
The average salary for women is $140,000 and the average salary for men is $144,700.
The 95% confidence interval for the difference in mean salaries is $740 to $8,591 (since 0 is not in this interval, the difference is statistically significant). Assume the conditions for inference are satisfied.
Without a randomized experiment, we must be careful about lurking variables that could account for the significant difference between average salaries (e.g., experience). Experience is a confounding variable if it is correlated with salary and the two groups (men and women) differ with regard to experience.

Restrict the analysis to a subset of cases with matching levels of the confounding variable (e.g., compare men and women with 5 years of experience).

The 95% confidence interval for the difference in average salaries between men and women within the subset of managers with 5 years of experience includes 0 (the difference is not significant). However, the standard error of the difference is much larger; the cases in the subset do not produce a precise estimate.

What about the difference between average salaries for managers with 2, 10, or 15 years of experience?
Analysis of covariance: a regression that combines categorical and numerical explanatory variables; it adjusts the comparison of means for the effects of confounding variables.

Simple regressions fit separately to men and women show that estimated salary rises faster with experience for women than for men.

Combining the separate regressions for men and women requires a dummy variable identifying whether a manager is male or female (Group = 1 for men; Group = 0 for women). It also requires the interaction term Group × Years. An interaction term is the product of two explanatory variables in a regression model.
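How a dummy variable and an interaction term merge two separate regressions can be sketched on simulated data. Everything in this example is hypothetical (the salary figures, the coefficients used to generate them, and the helper names `ols` and `simple_fit`); it is not the managers' salary data, only an illustration of the algebra.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 60
years = rng.uniform(1, 20, size=n)
group = np.repeat([0, 1], n // 2)          # 0 = women (baseline), 1 = men
# Simulated salaries (in $000); women are given the steeper slope here.
salary = 130 + 1.2 * years + group * (5 - 0.4 * years) + rng.normal(scale=3, size=n)

def ols(X, y):
    """Least-squares coefficients for design matrix X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def simple_fit(mask):
    """Simple regression of salary on years within one group: (intercept, slope)."""
    X = np.column_stack([np.ones(mask.sum()), years[mask]])
    return ols(X, salary[mask])

b_women = simple_fit(group == 0)
b_men = simple_fit(group == 1)

# Combined model: salary = b0 + b1*Years + b2*Group + b3*(Group x Years)
X = np.column_stack([np.ones(n), years, group, group * years])
b0, b_years, b_group, b_inter = ols(X, salary)

print("baseline (women):", b0, b_years)
print("dummy coefficient vs intercept gap:", b_group, b_men[0] - b_women[0])
print("interaction coefficient vs slope gap:", b_inter, b_men[1] - b_women[1])
```

Because the combined model has a separate intercept and slope for each group, its fit reproduces the two simple regressions exactly: the baseline coefficients match the group coded 0, and the dummy and interaction coefficients are the differences in intercepts and slopes.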
Combining Regressions
The equation for the group coded as 0 in the dummy variable forms a baseline for comparison. The slope of the dummy variable is the difference between the estimated intercepts in the simple regressions. The slope of the interaction is the difference between the estimated slopes in the simple regressions.

Second-Order Models

Second-Order Model With 1 Independent Variable
• Relationship between 1 dependent and 1 independent variable is a quadratic function
• Useful 1st model if non-linear relationship suspected
• Model: $E(y) = \beta_0 + \beta_1 x + \beta_2 x^2$, where $\beta_1 x$ is the linear effect and $\beta_2 x^2$ is the curvilinear effect

Second-Order Model Relationships
[Figure: four panels of y against x1, two with $\beta_2 > 0$ and two with $\beta_2 < 0$]

Second-Order Model Worksheet
Case, i    y_i    x_i    x_i^2
1          1      1      1
2          4      8      64
3          1      3      9
4          3      5      25
:          :      :      :
Create the x² column. Run regression with y, x, x².

2nd Order Model Example
The data show the number of weeks employed and the number of errors made per day for a sample of assembly line workers. Find a 2nd order model, conduct the global F-test, and test if $\beta_2 \neq 0$. Use α = .05 for all tests.

Errors (y)    Weeks (x)
20            1
18            1
16            2
10            4
8             4
4             5
3             6
1             8
2             10
1             11
0             12
1             12

Second-Order Model Worksheet
y_i    x_i    x_i^2
20     1      1
18     1      1
16     2      4
10     4      16
:      :      :
Create the x² column. Run regression with y, x, x².

Excel Computer Output Solution
$\hat{y} = 23.728 - 4.784x + .242x^2$

Overall Model Test Solution
The global F-test indicates that at least one parameter is not zero (see the F statistic and its P-value in the Excel output).

β2 Parameter Test Solution
The β2 test indicates that a curvilinear relationship exists (see the t statistic and its P-value in the Excel output).
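The second-order fit for the errors and weeks data can be reproduced with a short sketch. It is a minimal illustration that uses ordinary least squares from NumPy instead of Excel; the variable names are chosen here, and the F and t statistics are computed from the usual textbook formulas, to be compared with critical values on (2, 9) and 9 degrees of freedom respectively.

```python
import numpy as np

errors = np.array([20, 18, 16, 10, 8, 4, 3, 1, 2, 1, 0, 1], dtype=float)
weeks = np.array([1, 1, 2, 4, 4, 5, 6, 8, 10, 11, 12, 12], dtype=float)

# Design matrix with intercept, x, and the created x^2 column.
X = np.column_stack([np.ones_like(weeks), weeks, weeks ** 2])
beta, *_ = np.linalg.lstsq(X, errors, rcond=None)

n, p = X.shape                       # 12 observations, 3 parameters
resid = errors - X @ beta
sse = resid @ resid                  # residual sum of squares
sst = ((errors - errors.mean()) ** 2).sum()

# Global F statistic: H0 is beta1 = beta2 = 0.
f_stat = ((sst - sse) / (p - 1)) / (sse / (n - p))

# t statistic for beta2: estimate divided by its standard error.
cov = (sse / (n - p)) * np.linalg.inv(X.T @ X)
t_beta2 = beta[2] / np.sqrt(cov[2, 2])

print("coefficients (b0, b1, b2):", np.round(beta, 3))
print("F:", f_stat, "t for beta2:", t_beta2)
```

The estimated coefficients agree with the fitted equation reported above up to rounding.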