Lecture 19 Multiple Regression II
Chapter 7, sec. 2
Coefficients of Partial Determination
Recall that the coefficient of multiple determination, R^2, measures the proportionate reduction in
the variation of Y gained by the introduction of the entire set of X variables considered in the
model.
A coefficient of partial determination, in contrast, measures the marginal contribution of one X
variable when the other variables are already in the model. In other words, it measures the
proportionate reduction in the remaining variation of Y (the SSE for the model with the other predictor
variables) that is gained by adding this one X variable.
Note: R^2 measures the relation between Y and the entire set of X variables considered in the
model, whereas a coefficient of partial determination measures the relation between Y and one X
variable given that some other predictor variables are already in the model.
Example
1. Consider the case of two predictor variables X_1 and X_2. We calculate the coefficient of
partial determination between Y and X_1, given that X_2 is already in the model, as follows:

R^2_{Y1|2} = \frac{SSE(X_2) - SSE(X_1, X_2)}{SSE(X_2)} = \frac{SSR(X_1 \mid X_2)}{SSE(X_2)}.

And similarly, we have

R^2_{Y2|1} = \frac{SSR(X_2 \mid X_1)}{SSE(X_1)}.

Which is larger, R^2_{Y2|1} or R^2_{Y2}?
2. Consider the general case in which three or more predictor variables are included in the multiple
regression model. The following are some possible coefficients of partial determination:

R^2_{Y1|23} = \frac{SSR(X_1 \mid X_2, X_3)}{SSE(X_2, X_3)}, \qquad
R^2_{Y2|13} = \frac{SSR(X_2 \mid X_1, X_3)}{SSE(X_1, X_3)}, \qquad
R^2_{Y4|123} = \frac{SSR(X_4 \mid X_1, X_2, X_3)}{SSE(X_1, X_2, X_3)}.
Note: The entries to the left of the vertical bar show in turn the variable taken as the response and
the X variable being added. The entries to the right of the vertical bar show the X variables
already in the model.
3. For the body fat example, the following is the SAS output:
proc reg data=fat;
model Y=X2 X3 X1/SS1 SS2;
run;
Analysis of Variance

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              3         396.98461      132.32820      21.52    <.0001
Error             16          98.40489        6.15031
Corrected Total   19         495.38950

Root MSE          2.47998    R-Square    0.8014
Dependent Mean   20.19500    Adj R-Sq    0.7641
Coeff Var        12.28017

Parameter Estimates

Variable    DF    Parameter Estimate    Standard Error    t Value    Pr > |t|    Type I SS     Type II SS
Intercept    1            117.08469          99.78240       1.17      0.2578    8156.76050       8.46816
X2           1             -2.85685           2.58202      -1.11      0.2849     381.96582       7.52928
X3           1             -2.18606           1.59550      -1.37      0.1896       2.31390      11.54590
X1           1              4.33409           3.01551       1.44      0.1699      12.70489      12.70489
Please find R^2_{Y3|2} and R^2_{Y1|23}.

R^2_{Y3|2} = \frac{SSR(X_3 \mid X_2)}{SSE(X_2)}
           = \frac{SSR(X_3 \mid X_2)}{SSE(X_1, X_2, X_3) + SSR(X_1 \mid X_2, X_3) + SSR(X_3 \mid X_2)}
           = \frac{2.314}{98.405 + 12.705 + 2.314} = \frac{2.314}{113.424} = .0204

R^2_{Y1|23} = \frac{SSR(X_1 \mid X_2, X_3)}{SSE(X_2, X_3)}
            = \frac{SSR(X_1 \mid X_2, X_3)}{SSE(X_1, X_2, X_3) + SSR(X_1 \mid X_2, X_3)}
            = \frac{12.705}{98.405 + 12.705} = \frac{12.705}{111.110} = .114.
Comments:
1. A coefficient of partial determination takes values between 0 and 1.
2. Let e_i(Y \mid X_2) = Y_i - \hat{Y}_i(X_2) and e_i(X_1 \mid X_2) = X_{i1} - \hat{X}_{i1}(X_2),
where
\hat{Y}_i(X_2) denotes the fitted value of Y when only X_2 is in the model, and
\hat{X}_{i1}(X_2) denotes the fitted value of X_1 in the regression of X_1 on X_2.
Then R^2_{Y1|2} is the coefficient of simple determination R^2 between e_i(Y \mid X_2) and
e_i(X_1 \mid X_2).
3. The plot of the residuals e_i(Y \mid X_2) against e_i(X_1 \mid X_2) provides a graphical representation
of the strength of the relationship between Y and X_1, adjusted for X_2. Such plots, called
added-variable plots or partial regression plots, are discussed in Chapter 10.
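The following SAS sketch (not part of the original notes) illustrates Comments 2 and 3 for the body fat data, assuming the data set is named fat as in the earlier runs; the residual variable names eY_X2 and eX1_X2 are made up for this illustration.

proc reg data=fat noprint;
model Y = X2;
output out=resY r=eY_X2;    /* e_i(Y | X2): residuals of Y regressed on X2 */
run;
proc reg data=fat noprint;
model X1 = X2;
output out=resX r=eX1_X2;   /* e_i(X1 | X2): residuals of X1 regressed on X2 */
run;
data resids;
merge resY resX;            /* one-to-one merge; both data sets are in the original order */
run;
proc reg data=resids;
model eY_X2 = eX1_X2;       /* the R-Square from this fit should equal R2_Y1|2 */
run;

Plotting eY_X2 against eX1_X2 gives the added-variable plot described in Comment 3; PROC REG also has a PARTIAL option on the MODEL statement that requests partial regression plots directly.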
Coefficients of Partial Correlation
The square root of a coefficient of partial determination is called a coefficient of partial correlation.
It is given the same sign as that of the corresponding regression coefficient in the fitted regression
function.
Coefficients of Partial Correlation are useful in finding the best predictor variable to be selected next for
inclusion in the regression model. This will be discussed in Chapter 9.
Example (continuing the body fat example)
r_{Y1|23} = \sqrt{R^2_{Y1|23}} = \sqrt{.114} = .338.
But is r_{Y3|2} = -\sqrt{R^2_{Y3|2}} = -\sqrt{.0204} = -.143? We do not know yet; we need to know the
coefficients of the fitted regression function when only X_2 and X_3 are in the model.
proc reg data=fat;
model Y=X2 X3;
run;
Parameter Estimates

Variable    DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept    1            -25.99695           6.99732      -3.72      0.0017
X2           1              0.85088           0.11245       7.57      <.0001
X3           1              0.09603           0.16139       0.60      0.5597
Hence, since the coefficient of X_3 is positive in this fit, r_{Y3|2} = +\sqrt{R^2_{Y3|2}} = +\sqrt{.0204} = .143.
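As a cross-check (not in the original notes), PROC CORR with a PARTIAL statement computes partial correlations directly; run on the body fat data it should reproduce r_{Y3|2} of about .143, assuming the data set is named fat.

proc corr data=fat;
var Y X3;       /* correlation between Y and X3 ... */
partial X2;     /* ... adjusted for X2, i.e., the partial correlation r_Y3|2 */
run;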
Standardized Multiple Regression Model
Round-off errors tend to enter the solution of the normal equations, particularly when X'X is
inverted. There are two reasons for this:
1. X'X has a determinant close to zero. This condition occurs when the explanatory variables
are highly correlated among themselves. This condition is called multicollinearity. We will
discuss this problem later.
2. The explanatory variables have greatly different magnitudes, so that the elements of X'X
cover a wide range of values. The solution to this problem is to transform the variables so that
they are all of the same relative order of magnitude. This process is called standardized
regression.
Remark: Another problem with non-standardized regression is that the regression coefficients
cannot be compared directly. The regression coefficient of a variable taking large values will tend
to be smaller than the regression coefficient of a variable taking small values. As you may have
noticed, a large regression coefficient may not always be significant, while a regression
coefficient that takes a small value may be highly significant.
The Correlation Transformation
Let

\bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i, \qquad
\bar{X}_k = \frac{1}{n} \sum_{i=1}^{n} X_{ik}, \quad k = 1, \dots, p-1,

S_Y = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n-1}}, \qquad
S_k = \sqrt{\frac{\sum_{i=1}^{n} (X_{ik} - \bar{X}_k)^2}{n-1}}, \quad k = 1, \dots, p-1.

The correlation transformation is

Y_i^* = \frac{Y_i - \bar{Y}}{\sqrt{n-1}\, S_Y}, \qquad
X_{ik}^* = \frac{X_{ik} - \bar{X}_k}{\sqrt{n-1}\, S_k}, \quad k = 1, \dots, p-1.
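A minimal SAS sketch of this transformation (not from the notes), assuming the body fat data set fat with n = 20 observations and variables Y, X1, X2, X3; the output data set name zfat and the starred variable names are made up here.

proc standard data=fat mean=0 std=1 out=zfat;
var Y X1 X2 X3;            /* center each variable and scale it to standard deviation 1 */
run;
data zfat;
set zfat;
/* dividing by sqrt(n-1) completes the correlation transformation */
Ystar  = Y  / sqrt(20 - 1);
X1star = X1 / sqrt(20 - 1);
X2star = X2 / sqrt(20 - 1);
X3star = X3 / sqrt(20 - 1);
run;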
The Standardized Regression Model
For the model Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_{p-1} X_{i,p-1} + \varepsilon_i, the standardized regression model, based on the
correlation transformation, is

Y_i^* = \beta_1^* X_{i1}^* + \cdots + \beta_{p-1}^* X_{i,p-1}^* + \varepsilon_i^*.

Note:
1. There is no need to include an intercept term in this model, since the intercept, if included,
would have a least squares estimator that is identically zero.
2. Since the Y's have been transformed, the error terms \varepsilon_i^* are no longer N(0, \sigma^2).
3. \beta_k = \frac{S_Y}{S_k} \beta_k^* \quad (k = 1, \dots, p-1), \qquad
\beta_0 = \bar{Y} - \beta_1 \bar{X}_1 - \cdots - \beta_{p-1} \bar{X}_{p-1}.
4. The normal equations for the transformed model Y_i^* = \beta_1^* X_{i1}^* + \cdots + \beta_{p-1}^* X_{i,p-1}^* + \varepsilon_i^* are

r_{XX} b^* = r_{YX},

where

b^* = \begin{pmatrix} b_1^* \\ b_2^* \\ \vdots \\ b_{p-1}^* \end{pmatrix},

r_{XX} = \begin{pmatrix}
1 & r_{12} & \cdots & r_{1,p-1} \\
r_{21} & 1 & \cdots & r_{2,p-1} \\
\vdots & \vdots & & \vdots \\
r_{p-1,1} & r_{p-1,2} & \cdots & 1
\end{pmatrix}
\quad \text{with} \quad
r_{kl} = \frac{\sum_i (X_{ik} - \bar{X}_k)(X_{il} - \bar{X}_l)}
{\sqrt{\sum_i (X_{ik} - \bar{X}_k)^2 \sum_i (X_{il} - \bar{X}_l)^2}}

being the sample correlation between X_k and X_l, and

r_{YX} = \begin{pmatrix} r_{Y1} \\ r_{Y2} \\ \vdots \\ r_{Y,p-1} \end{pmatrix}
\quad \text{with} \quad
r_{Yk} = \frac{\sum_i (Y_i - \bar{Y})(X_{ik} - \bar{X}_k)}
{\sqrt{\sum_i (Y_i - \bar{Y})^2 \sum_i (X_{ik} - \bar{X}_k)^2}}

being the sample correlation between Y and X_k.
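A minimal PROC IML sketch (not from the notes) of how the correlation-form normal equations r_{XX} b^* = r_{YX} can be solved for a two-predictor model; the correlation values below are made-up placeholders, not taken from any example in this lecture.

proc iml;
rXX = {1.0 0.5,
       0.5 1.0};             /* placeholder sample correlations among X1 and X2 */
rYX = {0.8, 0.6};            /* placeholder sample correlations of Y with X1 and X2 */
bstar = solve(rXX, rYX);     /* standardized regression coefficients b1-star and b2-star */
print bstar;
quit;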
5. Let
b = \begin{pmatrix} b_0 \\ b_1 \\ \vdots \\ b_{p-1} \end{pmatrix}
\quad \text{and} \quad
b^* = \begin{pmatrix} b_1^* \\ b_2^* \\ \vdots \\ b_{p-1}^* \end{pmatrix}
be the least squares estimators of the coefficient vectors of the original model and the transformed
model, respectively. Then

b_k = \frac{S_Y}{S_k} b_k^* \quad (k = 1, \dots, p-1), \qquad
b_0 = \bar{Y} - b_1 \bar{X}_1 - \cdots - b_{p-1} \bar{X}_{p-1}.

6. R^2 = R^{*2}, i.e., the original and the transformed models have the same coefficient of multiple determination.
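A short sketch of the algebra behind Notes 3 and 5 (my addition, not in the original notes): substitute the correlation transformation into the fitted standardized regression function, using the fact that the fitted values transform the same way as the observations, and rearrange.

\hat{Y}_i^* = \sum_{k=1}^{p-1} b_k^* X_{ik}^*
\;\Longrightarrow\;
\frac{\hat{Y}_i - \bar{Y}}{\sqrt{n-1}\, S_Y} = \sum_{k=1}^{p-1} b_k^* \, \frac{X_{ik} - \bar{X}_k}{\sqrt{n-1}\, S_k}
\;\Longrightarrow\;
\hat{Y}_i = \Big( \bar{Y} - \sum_{k=1}^{p-1} \frac{S_Y}{S_k} b_k^* \bar{X}_k \Big) + \sum_{k=1}^{p-1} \frac{S_Y}{S_k} b_k^* X_{ik}.

Matching this with \hat{Y}_i = b_0 + b_1 X_{i1} + \cdots + b_{p-1} X_{i,p-1} gives b_k = (S_Y / S_k)\, b_k^* and b_0 = \bar{Y} - \sum_k b_k \bar{X}_k; the same algebra applied to the parameters gives Note 3.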
Example: Consider the Dwaine Studios example in Section 6.9 of your text. (Dwaine Studios, Inc.,
operates portrait studios in 21 cities of medium size, and these studios specialize in portraits of children.)
Y -- Sales in a community (in thousands of dollars).
X_1 -- Number of persons aged 16 or younger in the community (in thousands of persons).
X_2 -- Per capita disposable personal income in the community (in thousands of dollars).
data sales;
infile 'F:\teaching\STAT512\data\CH06FI05.txt';
input X1 X2 Y;
run;

proc means Noprint N Mean Std;
output out=sales1 N=N Mean=X1bar X2bar Ybar Std=StdX1 StdX2 StdY;
run;

proc print data=sales1;
var X1bar X2bar Ybar StdX1 StdX2 StdY;
run;

data sales2;
set sales1;
Do k=1 to N;
output;
end;
drop k _Type_ _Freq_;
run;

data salesnew;
merge sales sales2;
X1star=(X1-X1bar)/(sqrt(N-1)*StdX1);
X2star=(X2-X2bar)/(sqrt(N-1)*StdX2);
Ystar=(Y-Ybar)/(sqrt(N-1)*StdY);
keep X1 X2 Y X1star X2star Ystar;
run;
Obs      X1bar      X2bar       Ybar      StdX1      StdX2       StdY
  1    62.0190    17.1429    181.905    18.6203    0.97035    36.1913
Obs      X1      X2        Y      X1star     X2star      Ystar
  1    68.5    16.7    174.4     0.07783   -0.10205   -0.04637
  2    45.2    16.8    164.4    -0.20198   -0.07901   -0.10815
  3    91.3    18.2    244.2     0.35163    0.24361    0.38489
  4    47.8    16.3    154.6    -0.17075   -0.19423   -0.16870
  5    46.9    17.3    181.6    -0.18156    0.03621   -0.00188
  6    66.1    18.2    207.5     0.04901    0.24361    0.15814
  .      .       .        .         .          .          .
  .      .       .        .         .          .          .
 15    52.5    17.8    161.1    -0.11431    0.15143   -0.12854
 16    85.7    18.4    209.7     0.28438    0.28970    0.17173
 17    41.3    16.5    146.4    -0.24881   -0.14814   -0.21937
 18    51.7    16.3    144.0    -0.12392   -0.19423   -0.23419
 19    89.6    18.1    232.6     0.33121    0.22056    0.31322
 20    82.7    19.1    224.1     0.24835    0.45100    0.26070
 21    52.3    16.0    166.5    -0.11671   -0.26336   -0.09518
proc reg;
model Y=X1 X2;
run;

proc reg;
model Ystar=X1star X2star;
run;
Analysis of Variance

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              2             24015          12008      99.10    <.0001
Error             18        2180.92741      121.16263
Corrected Total   20             26196

Root MSE          11.00739    R-Square    0.9167
Dependent Mean   181.90476    Adj R-Sq    0.9075
Coeff Var          6.05118

Parameter Estimates

Variable    DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept    1            -68.85707          60.01695      -1.15      0.2663
X1           1              1.45456           0.21178       6.87      <.0001
X2           1              9.36550           4.06396       2.30      0.0333
Analysis of Variance

Source              DF    Sum of Squares    Mean Square    F Value    Pr > F
Model                2           0.91675        0.45837     104.61    <.0001
Error               19           0.08325        0.00438
Uncorrected Total   21           1.00000

Root MSE             0.06619    R-Square    0.9167
Dependent Mean   2.97381E-17    Adj R-Sq    0.9080
Coeff Var        2.225928E17

Parameter Estimates

Variable    DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
X1star       1              0.74837           0.10605       7.06      <.0001
X2star       1              0.25110           0.10605       2.37      0.0287
The fitted lines to the original data and to the transformed data are
\hat{Y} = -68.85707 + 1.45456 X_1 + 9.3655 X_2 \quad \text{and} \quad \hat{Y}^* = 0.74837 X_1^* + 0.2511 X_2^*.
Now we want to obtain the fitted line for the original data from the standardized regression coefficients:

b_1 = \frac{S_Y}{S_1} b_1^* = \frac{36.1913}{18.6203} \times 0.74837 = 1.454567

b_2 = \frac{S_Y}{S_2} b_2^* = \frac{36.1913}{0.97035} \times 0.2511 = 9.365317

b_0 = \bar{Y} - b_1 \bar{X}_1 - b_2 \bar{X}_2 = 181.905 - 1.454567 \times 62.019 - 9.365317 \times 17.1429 = -68.85448

Hence, the regression coefficients agree with the ones obtained without standardization, apart from slight
rounding differences.
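As a side note (not in the original notes), PROC REG can report standardized coefficients directly through the STB option of the MODEL statement, so the correlation transformation and back-transformation need not be done by hand; the Standardized Estimate column it prints should match b_1^* and b_2^* above up to rounding.

proc reg data=salesnew;
model Y = X1 X2 / stb;    /* STB adds a Standardized Estimate column to the output */
run;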
Question: Can we say that X_1 has a much greater impact on sales than X_2, since b_1^* is much
greater than b_2^*?
The answer is no! In our example, X_1 and X_2 are highly correlated, and the regression coefficients
are affected by the other predictor variables in the model. See the following output:
proc corr;
var X1 X2 X1star X2star;
run;
Pearson Correlation Coefficients, N = 21

              X1         X2     X1star     X2star
X1       1.00000    0.78130    1.00000    0.78130
X2       0.78130    1.00000    0.78130    1.00000
X1star   1.00000    0.78130    1.00000    0.78130
X2star   0.78130    1.00000    0.78130    1.00000
Remark: The magnitudes of the standardized regression coefficients are affected not only by the
presence of correlations among the predictor variables but also by the spacing of the observations
on each of these variables. Hence it is ordinarily not wise to use the standardized coefficients to interpret
the comparative importance of the predictor variables.
Multicollinearity and Its Effects
Uncorrelated Predictor Variables
Example: The data are from a small-scale experiment studying the effect of work crew size (X_1)
and level of bonus pay (X_2) on crew productivity (Y). The predictor variables X_1 and X_2 are
uncorrelated here. See the data below:
Case    Crew Size    Bonus Pay (Dollars)    Crew Productivity
  i        X_i1             X_i2                   Y_i
  1          4                2                     42
  2          4                2                     39
  3          4                3                     48
  4          4                3                     51
  5          6                2                     49
  6          6                2                     53
  7          6                3                     61
  8          6                3                     60
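The notes do not show how these data are read into SAS; a minimal sketch is given below, with the data set name crew chosen here for illustration. The PROC REG calls that follow, which do not name a data set, would then use crew as the most recently created data set.

data crew;
input X1 X2 Y;     /* crew size, bonus pay, productivity */
datalines;
4 2 42
4 2 39
4 3 48
4 3 51
6 2 49
6 2 53
6 3 61
6 3 60
;
run;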
proc reg;
model Y=X1 X2/ss1 ss2;
run;

proc reg;
model Y=X1;
run;

proc reg;
model Y=X2;
run;
Analysis of Variance

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              2         402.25000      201.12500      57.06    0.0004
Error              5          17.62500        3.52500
Corrected Total    7         419.87500

Root MSE          1.87750    R-Square    0.9580
Dependent Mean   50.37500    Adj R-Sq    0.9412
Coeff Var         3.72704

Parameter Estimates

Variable    DF    Parameter Estimate    Standard Error    t Value    Pr > |t|    Type I SS    Type II SS
Intercept    1              0.37500           4.74045       0.08      0.9400        20301       0.02206
X1           1              5.37500           0.66380       8.10      0.0005    231.12500     231.12500
X2           1              9.25000           1.32759       6.97      0.0009    171.12500     171.12500
Analysis of Variance

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              1         231.12500      231.12500       7.35    0.0351
Error              6         188.75000       31.45833
Corrected Total    7         419.87500

Root MSE          5.60877    R-Square    0.5505
Dependent Mean   50.37500    Adj R-Sq    0.4755
Coeff Var        11.13404

Parameter Estimates

Variable    DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept    1             23.50000          10.11136       2.32      0.0591
X1           1              5.37500           1.98300       2.71      0.0351
Analysis of Variance

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              1         171.12500      171.12500       4.13    0.0885
Error              6         248.75000       41.45833
Corrected Total    7         419.87500

Root MSE          6.43881    R-Square    0.4076
Dependent Mean   50.37500    Adj R-Sq    0.3088
Coeff Var        12.78177

Parameter Estimates

Variable    DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept    1             27.25000          11.60774       2.35      0.0572
X2           1              9.25000           4.55293       2.03      0.0885
From the output we notice the following facts:
1. The fitted lines are
\hat{Y} = .375 + 5.375 X_1 + 9.250 X_2, \quad \hat{Y} = 23.5 + 5.375 X_1, \quad \text{and} \quad \hat{Y} = 27.250 + 9.250 X_2.
So the regression coefficient of X_1 is unchanged when X_2 is added to the model, and
equivalently, the regression coefficient of X_2 is unchanged when X_1 is added to the model.
2. SSR(X_1) = SSR(X_1 \mid X_2) and SSR(X_2) = SSR(X_2 \mid X_1).
3. SSR(X_1, X_2) = SSR(X_1) + SSR(X_2), and hence R^2(X_1, X_2) = R^2(X_1) + R^2(X_2).
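A quick numerical check of facts 2 and 3 against the output above (my arithmetic, not part of the original notes):

SSR(X_1) = 231.125 = SSR(X_1 \mid X_2), \qquad SSR(X_2) = 171.125 = SSR(X_2 \mid X_1),
SSR(X_1) + SSR(X_2) = 231.125 + 171.125 = 402.25 = SSR(X_1, X_2),
R^2(X_1) + R^2(X_2) = 0.5505 + 0.4076 = 0.9581 \approx 0.9580 = R^2(X_1, X_2), up to rounding.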
Effects of Multicollinearity
From the textbook example in which the predictor variables are perfectly correlated (pages 281-283),
we see that
1. The perfect relation between X 1 and X 2 did not inhibit our ability to obtain a good fit to the data.
2. But since many different response functions provide the same good fit, we cannot interpret any one
set of regression coefficients as reflecting the effects of the different predictor variables.
In practice, we seldom find predictor variables that are perfectly related or data that do not contain some
random error component; nevertheless, the implications just noted for our idealized example still have
relevance.
1. Multicollinearity will not inhibit the ability to obtain a good fit to the data. So, it will not affect
inferences about mean responses or predictions of new observations.
2. The estimated regression coefficients tend to have large sampling variability when the predictor
variables are highly correlated. As a result, many of the estimated regression coefficients
individually may be statistically not significant even though a definite statistical relation exists
between the response variable and the set of predictor variables. Think about our body fat example.
3. The common interpretation of a regression coefficient as measuring the change in the expected value
of the response variable when the given predictor variable is increased by one unit while other
predictor variables are held constant is not fully applicable when multicollinearity exists.
Example: Consider our body fat example:
Proc corr;
var X1 X2 X3;
run;

proc reg data=fat;
model Y=X1;
run;

proc reg data=fat;
model Y=X2;
run;

proc reg data=fat;
model Y=X1 X2;
run;

proc reg data=fat;
model Y=X1 X2 X3/ss1 ss2;
run;
Pearson Correlation Coefficients, N = 20

           X1         X2         X3
X1    1.00000    0.92384    0.45778
X2    0.92384    1.00000    0.08467
X3    0.45778    0.08467    1.00000
Parameter Estimates

Variable    DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept    1             -1.49610           3.31923      -0.45      0.6576
X1           1              0.85719           0.12878       6.66      <.0001
Parameter Estimates

Variable    DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept    1            -23.63449           5.65741      -4.18      0.0006
X2           1              0.85655           0.11002       7.79      <.0001
Parameter Estimates

Variable    DF    Parameter Estimate    Standard Error    t Value    Pr > |t|    Type I SS     Type II SS
Intercept    1            -19.17425           8.36064      -2.29      0.0348    8156.76050      34.01785
X1           1              0.22235           0.30344       0.73      0.4737     352.26980       3.47289
X2           1              0.65942           0.29119       2.26      0.0369      33.16891      33.16891
Parameter Estimates

Variable    DF    Parameter Estimate    Standard Error    t Value    Pr > |t|    Type I SS     Type II SS
Intercept    1            117.08469          99.78240       1.17      0.2578    8156.76050       8.46816
X1           1              4.33409           3.01551       1.44      0.1699     352.26980      12.70489
X2           1             -2.85685           2.58202      -1.11      0.2849      33.16891       7.52928
X3           1             -2.18606           1.59550      -1.37      0.1896      11.54590      11.54590
From the output we see that some of the predictor variables are highly correlated. Multicollinearity exists in our
example, and its effects are the following:
1. The regression coefficients depend on which other variables are included in the model.
2. The marginal contribution of a predictor variable in reducing the error sum of squares (increasing
the regression sum of squares) varies depending on which other variables are already in the model.
For example, SSR(X_1) = 352.27, SSR(X_1 \mid X_2) = 3.47, and SSR(X_1 \mid X_2, X_3) = 12.70489.
The reason SSR(X_1 \mid X_2) is so small compared with SSR(X_1) is that X_1 and X_2 are
highly correlated with each other and with the response variable, so X_2 contains much of the same
information as X_1.
3. The estimated regression coefficients become more imprecise as more predictor variables are added
to the regression model.