Lecture 19  Multiple Regression II (Chapter 7, Section 2)

Coefficients of Partial Determination

Recall that the coefficient of multiple determination, $R^2$, measures the proportionate reduction in the variation of Y gained by introducing the entire set of X variables considered in the model. A coefficient of partial determination, in contrast, measures the marginal contribution of one X variable when the other variables are already in the model. In other words, it measures the proportionate reduction in the remaining variation of Y (the SSE for the model containing the other predictor variables) that is gained by adding this one X variable.

Note: $R^2$ measures the relation between Y and the entire set of X variables considered in the model, whereas a coefficient of partial determination measures the relation between Y and one X variable, given that some other predictor variables are already in the model.

Example

1. Consider the case of two predictor variables $X_1$ and $X_2$. The coefficient of partial determination between Y and $X_1$, given that $X_2$ is already in the model, is
$$R^2_{Y1|2} = \frac{SSE(X_2) - SSE(X_1, X_2)}{SSE(X_2)} = \frac{SSR(X_1 \mid X_2)}{SSE(X_2)}.$$
And similarly, we have
$$R^2_{Y2|1} = \frac{SSR(X_2 \mid X_1)}{SSE(X_1)}.$$
Which is larger, $R^2_{Y2|1}$ or $R^2_{Y2}$ (the coefficient of simple determination between Y and $X_2$)?

2. Consider the general case of three or more predictor variables in the multiple regression model. The following are some possible coefficients of partial determination:
$$R^2_{Y1|23} = \frac{SSR(X_1 \mid X_2, X_3)}{SSE(X_2, X_3)}, \qquad R^2_{Y2|13} = \frac{SSR(X_2 \mid X_1, X_3)}{SSE(X_1, X_3)}, \qquad R^2_{Y4|123} = \frac{SSR(X_4 \mid X_1, X_2, X_3)}{SSE(X_1, X_2, X_3)}.$$
Note: In the subscript, the entries to the left of the vertical bar show, in turn, the variable taken as the response and the X variable being added. The entries to the right of the vertical bar show the X variables already in the model.

3. For the body fat example, the following is the SAS output:

proc reg data=fat;
   model Y=X2 X3 X1/SS1 SS2;
run;

                        Analysis of Variance
                               Sum of        Mean
Source            DF          Squares      Square    F Value    Pr > F
Model              3        396.98461   132.32820      21.52    <.0001
Error             16         98.40489     6.15031
Corrected Total   19        495.38950

Root MSE            2.47998     R-Square    0.8014
Dependent Mean     20.19500     Adj R-Sq    0.7641
Coeff Var          12.28017

                           Parameter Estimates
             Parameter    Standard
Variable  DF  Estimate       Error  t Value  Pr > |t|   Type I SS  Type II SS
Intercept  1  117.08469    99.78240     1.17    0.2578  8156.76050     8.46816
X2         1   -2.85685     2.58202    -1.11    0.2849   381.96582     7.52928
X3         1   -2.18606     1.59550    -1.37    0.1896     2.31390    11.54590
X1         1    4.33409     3.01551     1.44    0.1699    12.70489    12.70489

Please find $R^2_{Y3|2}$ and $R^2_{Y1|23}$.

$$R^2_{Y3|2} = \frac{SSR(X_3 \mid X_2)}{SSE(X_2)} = \frac{SSR(X_3 \mid X_2)}{SSE(X_1, X_2, X_3) + SSR(X_1 \mid X_2, X_3) + SSR(X_3 \mid X_2)} = \frac{2.314}{98.405 + 12.705 + 2.314} = \frac{2.314}{113.424} = .0204$$

$$R^2_{Y1|23} = \frac{SSR(X_1 \mid X_2, X_3)}{SSE(X_2, X_3)} = \frac{SSR(X_1 \mid X_2, X_3)}{SSE(X_1, X_2, X_3) + SSR(X_1 \mid X_2, X_3)} = \frac{12.705}{98.405 + 12.705} = .114.$$

Comments:
1. A coefficient of partial determination takes values between 0 and 1.
2. Let $e_i(Y \mid X_2) = Y_i - \hat{Y}_i(X_2)$ and $e_i(X_1 \mid X_2) = X_{i1} - \hat{X}_{i1}(X_2)$, where $\hat{Y}_i(X_2)$ denotes the fitted value of Y when only $X_2$ is in the model, and $\hat{X}_{i1}(X_2)$ denotes the fitted value of $X_1$ in the regression of $X_1$ on $X_2$. Then $R^2_{Y1|2}$ is the coefficient of simple determination $R^2$ between $e_i(Y \mid X_2)$ and $e_i(X_1 \mid X_2)$ (a SAS sketch of this computation follows these comments).
3. The plot of the residuals $e_i(Y \mid X_2)$ against $e_i(X_1 \mid X_2)$ provides a graphical representation of the strength of the relationship between Y and $X_1$, adjusted for $X_2$. Such plots, called added variable plots or partial regression plots, are discussed in Chapter 10.
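These quantities can also be obtained in SAS. The following is a minimal sketch, assuming the body fat data set fat with variables Y, X1, X2, X3 used above. The PCORR2 option on the MODEL statement prints squared partial correlations based on Type II sums of squares, which for X1 should match $R^2_{Y1|23} = .114$; the second part uses the residual characterization from Comment 2 (extended here to two extra predictors), and the data set and variable names resY, resX1, eY, eX1 are made up for this sketch.

* Squared partial correlations printed directly (PCORR2 uses Type II SS);
proc reg data=fat;
   model Y = X2 X3 X1 / ss1 ss2 pcorr2;
run;

* Residual-based computation of R-square(Y1|23):
  regress Y on X2, X3 and X1 on X2, X3, keep the residuals,
  then regress the Y-residuals on the X1-residuals;
proc reg data=fat noprint;
   model Y = X2 X3;
   output out=resY r=eY;          /* e_i(Y | X2, X3) */
run;
proc reg data=fat noprint;
   model X1 = X2 X3;
   output out=resX1 r=eX1;        /* e_i(X1 | X2, X3) */
run;
data resids;
   merge resY(keep=eY) resX1(keep=eX1);
run;
proc reg data=resids;
   model eY = eX1;                /* the R-Square printed here is R-square(Y1|23) = .114 */
run;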
Coefficients of Partial Correlation

The square root of a coefficient of partial determination is called a coefficient of partial correlation. It is given the same sign as that of the corresponding regression coefficient in the fitted regression function. Coefficients of partial correlation are useful in finding the best predictor variable to be selected next for inclusion in the regression model. This will be discussed in Chapter 9.

Example

Continuing the body fat example,
$$r_{Y1|23} = \sqrt{R^2_{Y1|23}} = \sqrt{.114} = .338.$$
But is $r_{Y3|2} = +\sqrt{R^2_{Y3|2}} = +\sqrt{.0204} = .143$, or is it $-.143$? We do not know yet; we need the sign of the coefficient of $X_3$ in the fitted regression function when only $X_2$ and $X_3$ are in the model.

proc reg data=fat;
   model Y=X2 X3;
run;

                   Parameter Estimates
             Parameter    Standard
Variable  DF  Estimate       Error  t Value  Pr > |t|
Intercept  1 -25.99695     6.99732    -3.72    0.0017
X2         1   0.85088     0.11245     7.57    <.0001
X3         1   0.09603     0.16139     0.60    0.5597

Since the estimated coefficient of $X_3$ is positive, $r_{Y3|2} = +\sqrt{.0204} = .143$.

Standardized Multiple Regression Model

Round-off error tends to enter the solution of the normal equations, particularly when $X'X$ is inverted. There are two reasons for this:
1. $X'X$ has a determinant close to zero. This condition occurs when the explanatory variables are highly correlated among themselves; it is called multicollinearity. We will discuss this problem later.
2. The explanatory variables have greatly different magnitudes, so that the elements of $X'X$ cover a wide range of values.

The solution to this problem is to transform the variables so that they are all of the same relative order of magnitude. This process is called standardized regression.

Remark: Another problem with non-standardized regression is that the regression coefficients cannot be directly compared. The regression coefficient of a variable taking large values will tend to be smaller than the regression coefficient of a variable taking small values. As you may have noticed, a large regression coefficient may not always be significant, while a regression coefficient that takes a small value may be highly significant.

The Correlation Transformation

Let
$$\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i, \qquad \bar{X}_k = \frac{1}{n}\sum_{i=1}^{n} X_{ik}, \quad k = 1, \ldots, p-1,$$
$$s_Y = \sqrt{\frac{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}{n-1}}, \qquad s_k = \sqrt{\frac{\sum_{i=1}^{n}(X_{ik} - \bar{X}_k)^2}{n-1}}, \quad k = 1, \ldots, p-1.$$
The correlation transformation is
$$Y_i^* = \frac{Y_i - \bar{Y}}{\sqrt{n-1}\, s_Y}, \qquad X_{ik}^* = \frac{X_{ik} - \bar{X}_k}{\sqrt{n-1}\, s_k}, \quad k = 1, \ldots, p-1.$$

The Standardized Regression Model

For the model $Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_{p-1} X_{i,p-1} + \varepsilon_i$, the standardized regression model, based on the correlation transformation, is
$$Y_i^* = \beta_1^* X_{i1}^* + \cdots + \beta_{p-1}^* X_{i,p-1}^* + \varepsilon_i^*.$$

Note:
1. There is no need to include an intercept term in this model, since the intercept, if included, would have a least squares estimator that is identically zero.
2. Since the Y's have been transformed, the $\varepsilon_i^*$ are no longer $N(0, \sigma^2)$.
3. $\beta_k = \dfrac{s_Y}{s_k}\,\beta_k^*$ ($k = 1, \ldots, p-1$) and $\beta_0 = \bar{Y} - \beta_1 \bar{X}_1 - \cdots - \beta_{p-1}\bar{X}_{p-1}$.
4. The normal equations for the transformed model $Y_i^* = \beta_1^* X_{i1}^* + \cdots + \beta_{p-1}^* X_{i,p-1}^* + \varepsilon_i^*$ are (see the sketch below)
$$r_{XX}\, \mathbf{b}^* = r_{YX}, \qquad \mathbf{b}^* = \begin{pmatrix} b_1^* \\ b_2^* \\ \vdots \\ b_{p-1}^* \end{pmatrix},$$
where
$$r_{XX} = \begin{pmatrix} 1 & r_{12} & \cdots & r_{1,p-1} \\ r_{21} & 1 & \cdots & r_{2,p-1} \\ \vdots & \vdots & & \vdots \\ r_{p-1,1} & r_{p-1,2} & \cdots & 1 \end{pmatrix}, \qquad r_{kl} = \frac{\sum_i (X_{ik} - \bar{X}_k)(X_{il} - \bar{X}_l)}{\sqrt{\sum_i (X_{ik} - \bar{X}_k)^2 \sum_i (X_{il} - \bar{X}_l)^2}}$$
being the sample correlation between $X_k$ and $X_l$, and
$$r_{YX} = \begin{pmatrix} r_{Y1} \\ r_{Y2} \\ \vdots \\ r_{Y,p-1} \end{pmatrix}, \qquad r_{Yk} = \frac{\sum_i (Y_i - \bar{Y})(X_{ik} - \bar{X}_k)}{\sqrt{\sum_i (Y_i - \bar{Y})^2 \sum_i (X_{ik} - \bar{X}_k)^2}}$$
being the sample correlation between Y and $X_k$.
5. Let $\mathbf{b} = (b_0, b_1, \ldots, b_{p-1})'$ and $\mathbf{b}^* = (b_1^*, \ldots, b_{p-1}^*)'$ be the least squares estimators of the coefficient vectors of the original model and the transformed model, respectively. Then
$$b_k = \frac{s_Y}{s_k}\, b_k^* \quad (k = 1, \ldots, p-1), \qquad b_0 = \bar{Y} - b_1 \bar{X}_1 - \cdots - b_{p-1}\bar{X}_{p-1}.$$
6. $R^2$ is the same for the original model and the transformed model.
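The standardized coefficients $b_k^*$ do not have to be computed by building the transformed variables by hand. The following is a minimal SAS sketch, assuming a data set named sales containing Y, X1, X2 as in the Dwaine Studios example below: the STB option prints the standardized estimates, and, if SAS/IML is licensed, the standardized normal equations $r_{XX}\,\mathbf{b}^* = r_{YX}$ can be solved directly (the names R, rXX, rYX, bstar are made up for this sketch).

* Standardized regression coefficients without hand transformation;
proc reg data=sales;
   model Y = X1 X2 / stb;        * the Standardized Estimate column gives b1*, b2*;
run;

* Solving the standardized normal equations r_XX b* = r_YX (requires SAS/IML);
proc iml;
   use sales;
   read all var {X1 X2} into X;
   read all var {Y} into Y;
   R     = corr(X || Y);         * 3 x 3 sample correlation matrix of (X1, X2, Y);
   rXX   = R[1:2, 1:2];          * correlations among the predictors;
   rYX   = R[1:2, 3];            * correlations of each predictor with Y;
   bstar = solve(rXX, rYX);      * b* = inverse(r_XX) * r_YX;
   print bstar;
quit;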
Example

Consider the Dwaine Studios example in Section 6.9 of your text. (Dwaine Studios, Inc., operates portrait studios in 21 cities of medium size; these studios specialize in portraits of children.)
Y  -- Sales in a community (in thousands of dollars).
X1 -- Number of persons aged 16 or younger in the community (in thousands of persons).
X2 -- Per capita disposable personal income in the community (in thousands of dollars).

data sales;
   infile 'F:\teaching\STAT512\data\CH06FI05.txt';
   input X1 X2 Y;
run;
proc means Noprint N Mean Std;
   output out=sales1 N=N Mean=X1bar X2bar Ybar
                     Std=StdX1 StdX2 StdY;
run;
proc print data=sales1;
   var X1bar X2bar Ybar StdX1 StdX2 StdY;
run;

data sales2;
   set sales1;
   Do k=1 to N;
      output;
   end;
   drop k _Type_ _Freq_;
run;
data salesnew;
   merge sales sales2;
   X1star=(X1-X1bar)/(sqrt(N-1)*StdX1);
   X2star=(X2-X2bar)/(sqrt(N-1)*StdX2);
   Ystar=(Y-Ybar)/(sqrt(N-1)*StdY);
   keep X1 X2 Y X1star X2star Ystar;
run;

Obs     X1bar     X2bar      Ybar     StdX1     StdX2      StdY
  1   62.0190   17.1429   181.905   18.6203   0.97035   36.1913

Obs     X1     X2       Y     X1star     X2star      Ystar
  1   68.5   16.7   174.4    0.07783   -0.10205   -0.04637
  2   45.2   16.8   164.4   -0.20198   -0.07901   -0.10815
  3   91.3   18.2   244.2    0.35163    0.24361    0.38489
  4   47.8   16.3   154.6   -0.17075   -0.19423   -0.16870
  5   46.9   17.3   181.6   -0.18156    0.03621   -0.00188
  6   66.1   18.2   207.5    0.04901    0.24361    0.15814
  .      .      .       .          .          .          .
  .      .      .       .          .          .          .
 15   52.5   17.8   161.1   -0.11431    0.15143   -0.12854
 16   85.7   18.4   209.7    0.28438    0.28970    0.17173
 17   41.3   16.5   146.4   -0.24881   -0.14814   -0.21937
 18   51.7   16.3   144.0   -0.12392   -0.19423   -0.23419
 19   89.6   18.1   232.6    0.33121    0.22056    0.31322
 20   82.7   19.1   224.1    0.24835    0.45100    0.26070
 21   52.3   16.0   166.5   -0.11671   -0.26336   -0.09518

proc reg;
   model Y=X1 X2;
run;

                        Analysis of Variance
                               Sum of        Mean
Source            DF          Squares      Square    F Value    Pr > F
Model              2            24015       12008      99.10    <.0001
Error             18       2180.92741   121.16263
Corrected Total   20            26196

Root MSE           11.00739     R-Square    0.9167
Dependent Mean    181.90476     Adj R-Sq    0.9075
Coeff Var           6.05118

                   Parameter Estimates
             Parameter    Standard
Variable  DF  Estimate       Error  t Value  Pr > |t|
Intercept  1 -68.85707    60.01695    -1.15    0.2663
X1         1   1.45456     0.21178     6.87    <.0001
X2         1   9.36550     4.06396     2.30    0.0333

proc reg;
   model Ystar=X1star X2star / noint;
run;

                          Analysis of Variance
                                 Sum of        Mean
Source              DF          Squares      Square    F Value    Pr > F
Model                2          0.91675     0.45837     104.61    <.0001
Error               19          0.08325     0.00438
Uncorrected Total   21          1.00000

Root MSE            0.06619        R-Square    0.9167
Dependent Mean      2.97381E-17    Adj R-Sq    0.9080
Coeff Var           2.225928E17

                   Parameter Estimates
             Parameter    Standard
Variable  DF  Estimate       Error  t Value  Pr > |t|
X1star     1   0.74837     0.10605     7.06    <.0001
X2star     1   0.25110     0.10605     2.37    0.0287

The fitted regression functions for the original data and for the transformed data are
$$\hat{Y} = -68.85707 + 1.45456 X_1 + 9.3655 X_2 \qquad \text{and} \qquad \hat{Y}^* = 0.74837 X_1^* + 0.2511 X_2^*.$$
Now we obtain the fitted regression function for the original data from the standardized regression coefficients:
$$b_1 = \frac{s_Y}{s_1}\, b_1^* = \frac{36.1913}{18.6203}(0.74837) = 1.454567,$$
$$b_2 = \frac{s_Y}{s_2}\, b_2^* = \frac{36.1913}{0.97035}(0.2511) = 9.365317,$$
$$b_0 = \bar{Y} - b_1 \bar{X}_1 - b_2 \bar{X}_2 = 181.905 - 1.454567(62.019) - 9.365317(17.1429) = -68.85448.$$
Hence the regression coefficients agree with the ones obtained without standardization, except for slight rounding differences. (A short SAS check of this back-transformation is given after the question below.)

Question: Can we say here that $X_1$ has a much greater impact on sales than $X_2$, since $b_1^*$ is much greater than $b_2^*$? The answer is no! In our example, $X_1$ and $X_2$ are highly correlated, and the regression coefficients are affected by the other predictor variables in the model.
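As a quick numerical check of the back-transformation above, the following is a minimal SAS sketch; the data set name backcheck and the variable names (b1star, b2star, and so on) are made up here, and all numeric values are copied from the output above.

data backcheck;
   /* sample statistics and standardized estimates copied from the output above */
   Ybar = 181.905;    X1bar = 62.0190;   X2bar = 17.1429;
   sY   = 36.1913;    s1    = 18.6203;   s2    = 0.97035;
   b1star = 0.74837;  b2star = 0.25110;
   b1 = (sY/s1)*b1star;                 /* back-transformed slope for X1, about 1.4546 */
   b2 = (sY/s2)*b2star;                 /* back-transformed slope for X2, about 9.3653 */
   b0 = Ybar - b1*X1bar - b2*X2bar;     /* back-transformed intercept, about -68.85    */
run;
proc print data=backcheck;
   var b0 b1 b2;
run;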
See the following output:

proc corr;
   var X1 X2 X1star X2star;
run;

Pearson Correlation Coefficients, N = 21

              X1        X2    X1star    X2star
X1       1.00000   0.78130   1.00000   0.78130
X2       0.78130   1.00000   0.78130   1.00000
X1star   1.00000   0.78130   1.00000   0.78130
X2star   0.78130   1.00000   0.78130   1.00000

Remark: The magnitudes of the standardized regression coefficients are affected not only by the presence of correlations among the predictor variables but also by the spacing of the observations on each of these variables. Hence it is ordinarily not wise to interpret the standardized regression coefficients as reflecting the comparative importance of the predictor variables.

Multicollinearity and Its Effects

Uncorrelated Predictor Variables

Example: The data are from a small-scale experiment studying the effect of work crew size ($X_1$) and level of bonus pay ($X_2$) on crew productivity (Y). The predictor variables $X_1$ and $X_2$ are uncorrelated here. See the data below:

        Crew Size   Bonus Pay (Dollars)   Crew Productivity
  i       X_i1             X_i2                  Y_i
  1         4                2                    42
  2         4                2                    39
  3         4                3                    48
  4         4                3                    51
  5         6                2                    49
  6         6                2                    53
  7         6                3                    61
  8         6                3                    60

proc reg;
   model Y=X1 X2/ss1 ss2;
run;

                        Analysis of Variance
                               Sum of        Mean
Source            DF          Squares      Square    F Value    Pr > F
Model              2        402.25000   201.12500      57.06    0.0004
Error              5         17.62500     3.52500
Corrected Total    7        419.87500

Root MSE            1.87750     R-Square    0.9580
Dependent Mean     50.37500     Adj R-Sq    0.9412
Coeff Var           3.72704

                           Parameter Estimates
             Parameter    Standard
Variable  DF  Estimate       Error  t Value  Pr > |t|   Type I SS  Type II SS
Intercept  1   0.37500     4.74045     0.08    0.9400       20301     0.02206
X1         1   5.37500     0.66380     8.10    0.0005   231.12500   231.12500
X2         1   9.25000     1.32759     6.97    0.0009   171.12500   171.12500

proc reg;
   model Y=X1;
run;

                        Analysis of Variance
                               Sum of        Mean
Source            DF          Squares      Square    F Value    Pr > F
Model              1        231.12500   231.12500       7.35    0.0351
Error              6        188.75000    31.45833
Corrected Total    7        419.87500

Root MSE            5.60877     R-Square    0.5505
Dependent Mean     50.37500     Adj R-Sq    0.4755
Coeff Var          11.13404

                   Parameter Estimates
             Parameter    Standard
Variable  DF  Estimate       Error  t Value  Pr > |t|
Intercept  1  23.50000    10.11136     2.32    0.0591
X1         1   5.37500     1.98300     2.71    0.0351

proc reg;
   model Y=X2;
run;

                        Analysis of Variance
                               Sum of        Mean
Source            DF          Squares      Square    F Value    Pr > F
Model              1        171.12500   171.12500       4.13    0.0885
Error              6        248.75000    41.45833
Corrected Total    7        419.87500

Root MSE            6.43881     R-Square    0.4076
Dependent Mean     50.37500     Adj R-Sq    0.3088
Coeff Var          12.78177

                   Parameter Estimates
             Parameter    Standard
Variable  DF  Estimate       Error  t Value  Pr > |t|
Intercept  1  27.25000    11.60774     2.35    0.0572
X2         1   9.25000     4.55293     2.03    0.0885

From the output we notice the following facts (a short SAS sketch reproducing these fits follows this list):
1. The fitted regression functions are $\hat{Y} = .375 + 5.375 X_1 + 9.250 X_2$, $\hat{Y} = 23.5 + 5.375 X_1$, and $\hat{Y} = 27.250 + 9.250 X_2$. So the regression coefficient of $X_1$ is unchanged when $X_2$ is added to the model and, equivalently, the regression coefficient of $X_2$ is unchanged when $X_1$ is added to the model.
2. $SSR(X_1) = SSR(X_1 \mid X_2)$ and $SSR(X_2) = SSR(X_2 \mid X_1)$.
3. $SSR(X_1, X_2) = SSR(X_1) + SSR(X_2)$ and hence $R^2(X_1, X_2) = R^2(X_1) + R^2(X_2)$.
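As referenced above, the following is a minimal sketch for reproducing the crew productivity fits; the data set name crew is made up here, and the data values are typed in from the table above. Because $X_1$ and $X_2$ are uncorrelated, the Type I (sequential) and Type II (partial) sums of squares printed by the SS1 and SS2 options coincide, for example $SSR(X_1) = SSR(X_1 \mid X_2) = 231.125$.

data crew;
   input X1 X2 Y;
   datalines;
4 2 42
4 2 39
4 3 48
4 3 51
6 2 49
6 2 53
6 3 61
6 3 60
;
run;
proc reg data=crew;
   model Y = X1 X2 / ss1 ss2;   * Type I SS and Type II SS agree for each predictor;
run;
proc corr data=crew;
   var X1 X2;                   * sample correlation between X1 and X2 is 0;
run;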
Effects of Multicollinearity

The textbook example in which the predictor variables are perfectly correlated (pages 281-283) implies the following:
1. The perfect relation between $X_1$ and $X_2$ did not inhibit our ability to obtain a good fit to the data.
2. But since many different response functions provide the same good fit, we cannot interpret any one set of regression coefficients as reflecting the effects of the different predictor variables.

In practice, we seldom find predictor variables that are perfectly related, or data that do not contain some random error component; nevertheless, the implications just noted for our idealized example still have relevance.
1. Multicollinearity does not inhibit the ability to obtain a good fit to the data. Consequently, it does not affect inferences about mean responses or predictions of new observations.
2. The estimated regression coefficients tend to have large sampling variability when the predictor variables are highly correlated. As a result, many of the estimated regression coefficients individually may be statistically non-significant even though a definite statistical relation exists between the response variable and the set of predictor variables. Think about our body fat example.
3. The common interpretation of a regression coefficient, namely that it measures the change in the expected value of the response variable when the given predictor variable is increased by one unit while the other predictor variables are held constant, is not fully applicable when multicollinearity exists.

Example: Consider our body fat example:

proc corr data=fat;
   var X1 X2 X3;
run;
proc reg data=fat;
   model Y=X1;
run;
proc reg data=fat;
   model Y=X2;
run;
proc reg data=fat;
   model Y=X1 X2/ss1 ss2;
run;
proc reg data=fat;
   model Y=X1 X2 X3/ss1 ss2;
run;

Pearson Correlation Coefficients, N = 20

           X1        X2        X3
X1    1.00000   0.92384   0.45778
X2    0.92384   1.00000   0.08467
X3    0.45778   0.08467   1.00000

                   Parameter Estimates
             Parameter    Standard
Variable  DF  Estimate       Error  t Value  Pr > |t|
Intercept  1  -1.49610     3.31923    -0.45    0.6576
X1         1   0.85719     0.12878     6.66    <.0001

                   Parameter Estimates
             Parameter    Standard
Variable  DF  Estimate       Error  t Value  Pr > |t|
Intercept  1 -23.63449     5.65741    -4.18    0.0006
X2         1   0.85655     0.11002     7.79    <.0001

                           Parameter Estimates
             Parameter    Standard
Variable  DF  Estimate       Error  t Value  Pr > |t|   Type I SS  Type II SS
Intercept  1 -19.17425     8.36064    -2.29    0.0348  8156.76050    34.01785
X1         1   0.22235     0.30344     0.73    0.4737   352.26980     3.47289
X2         1   0.65942     0.29119     2.26    0.0369    33.16891    33.16891

                           Parameter Estimates
             Parameter    Standard
Variable  DF  Estimate       Error  t Value  Pr > |t|   Type I SS  Type II SS
Intercept  1  117.08469    99.78240     1.17    0.2578  8156.76050     8.46816
X1         1    4.33409     3.01551     1.44    0.1699   352.26980    12.70489
X2         1   -2.85685     2.58202    -1.11    0.2849    33.16891     7.52928
X3         1   -2.18606     1.59550    -1.37    0.1896    11.54590    11.54590

From the output, some of the predictor variables are highly correlated: multicollinearity exists in our example. Its effects are the following (a short SAS sketch reproducing the marginal contributions in item 2 follows this list):
1. The regression coefficients depend on which other variables are included in the model.
2. The marginal contribution of a predictor variable in reducing the error sum of squares (equivalently, in increasing the regression sum of squares) varies depending on which other variables are already in the model. For example, $SSR(X_1) = 352.27$, $SSR(X_1 \mid X_2) = 3.47$, and $SSR(X_1 \mid X_2, X_3) = 12.70489$. The reason $SSR(X_1 \mid X_2)$ is so small compared with $SSR(X_1)$ is that $X_1$ and $X_2$ are highly correlated with each other and with the response variable, so $X_2$ contains much of the same information as $X_1$.
3. The estimated regression coefficients become more imprecise as more correlated predictor variables are added to the regression model.
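As referenced above, the following is a minimal sketch of the sequential fits behind the marginal contributions quoted in item 2, assuming the body fat data set fat used throughout. With the SS1 option, the Type I sum of squares for X1, entered last in each model, is the extra sum of squares for X1 given the variables entered before it.

proc reg data=fat;
   model Y = X1 / ss1;          * Type I SS for X1: SSR(X1) = 352.27;
   model Y = X2 X1 / ss1;       * Type I SS for X1: SSR(X1|X2) = 3.47;
   model Y = X2 X3 X1 / ss1;    * Type I SS for X1: SSR(X1|X2,X3) = 12.70;
run;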