
J.M. Hughes-Oliver
ST708 Applied Least Squares
Fall 2006
Remedial Measures for Collinearity
Impact of collinearity with respect to inferential goals:
• causes variance inflation of estimated parameters
  “. . . collinearity creates serious problems if the purpose of the regression is to understand the process, to identify important variables in the process, or to obtain meaningful estimates of the regression coefficients.” (Rawlings, Pantula, Dickey, 1998, p. 458)
• ineffective determination of “relative importance” of explanatory variables
  “The best recourse to the collinearity problem when the objective is to assign relative importance is to recognize that the data are inadequate for the purpose and obtain better data, perhaps from controlled experiments.” (Rawlings, Pantula, Dickey, 1998, p. 446)
• no effect on precision of estimated responses (and predictions) at observed points in the X-space
  “If the regression analysis is intended solely for prediction of the dependent variable, the presence of near singularities in the data does not create serious problems as long as certain very important conditions are met . . . ” (Rawlings, Pantula, Dickey, 1998, p. 457)
• causes variance inflation of estimated responses (and predictions) at
unobserved points in the X-space
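A small numerical sketch of the observed-versus-unobserved contrast in the bullets above. It assumes two predictors in the centered-and-scaled form used later in these notes, so that Z'Z = [[1, r], [r, 1]], and σ² = 1; the helper names `inv2`, `vif` and `fit_var` are hypothetical. When r is near 1 the observed points lie close to the line z1 = z2, so z0 = (0.5, 0.5) stands in for an observed location and z0 = (0.5, -0.5) for an unobserved one.

```python
def inv2(a, b, d):
    """Inverse of the symmetric 2x2 matrix [[a, b], [b, d]]."""
    det = a * d - b * b
    return [[d / det, -b / det], [-b / det, a / det]]


def vif(r):
    """VIF of either of two predictors with correlation r: a diagonal entry of (Z'Z)^{-1}."""
    return inv2(1.0, r, 1.0)[0][0]


def fit_var(r, z0, sigma2=1.0):
    """Variance of the slope part of a fitted value, z0' delta_hat: sigma^2 z0'(Z'Z)^{-1} z0."""
    m = inv2(1.0, r, 1.0)
    return sigma2 * sum(z0[i] * m[i][j] * z0[j] for i in range(2) for j in range(2))


for r in (0.0, 0.9, 0.99):
    along = fit_var(r, (0.5, 0.5))    # z1 = z2: where highly correlated data actually lie
    across = fit_var(r, (0.5, -0.5))  # z1 = -z2: essentially unobserved when r is near 1
    print(f"r = {r:.2f}: VIF = {vif(r):7.2f}, var along = {along:.4f}, var across = {across:8.2f}")
```

For r = 0.99 each VIF is about 50, the variance along z1 = z2 stays near 0.25, and the variance across it is about 50: the same inflation that hits the coefficients hits predictions only away from the observed data.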
The OLS estimator is BLUE:
• collinearity maintains unbiasedness
• collinearity causes an increase in variances
Sacrifice unbiasedness to get smaller variance ⇒ Biased Regression:
• improved (with respect to MSE) estimation of parameters
• improved (with respect to MSE) estimation of responses at unobserved points in certain regions of the X-space
• (still) ineffective determination of “relative importance” of explanatory variables
Some biased regression approaches:
• Principal Components Regression
• Ridge Regression
• Partial Least Squares
Principal Components Regression
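• Assuming X is n × (p + 1) with a column of ones,
$$
\begin{aligned}
Y &= X\beta + \epsilon \\
  &= 1_n\beta_0 + X^*\beta^* + \epsilon
     && \text{where } X^* \text{ is } n\times p,\; X = [1_n\ X^*],\; \beta' = (\beta_0\ \beta^{*\prime}) \\
  &= 1_n\beta_0 + \bar{X}^*\beta^* + X_c\beta^* + \epsilon
     && \text{where } \bar{X}^* = [1_n\bar{X}_1\ 1_n\bar{X}_2\ \cdots\ 1_n\bar{X}_p] \text{ is } n\times p,\; X_c = X^* - \bar{X}^* \\
  &= \underbrace{1_n\beta_0 + \bar{X}^*\beta^*}_{1_n\bar{Y}}
     + \underbrace{X_c D^{-1/2}}_{Z}\,\underbrace{D^{1/2}\beta^*}_{\delta} + \epsilon
     && \text{where } D = \operatorname{diag}(X_c'X_c) \\
  &= 1_n\bar{Y} + Z\delta + \epsilon,
\end{aligned}
$$
where $X_c$ is the centered version of $X^*$ and Z is the centered and scaled version of X, omitting the intercept column.
• $Z = UL^{1/2}V'$ by singular value decomposition, with $L = \operatorname{diag}(\lambda_1, \ldots, \lambda_p)$
• $W = ZV$ converts the p columns of Z into p principal components corresponding to eigenvalues $\lambda_1 \ge \cdots \ge \lambda_p$.
  – columns of W are orthogonal
  – reasonable to drop principal components with “small” eigenvalues
• $Y = 1_n\bar{Y} + Z\delta + \epsilon \iff Y = 1_n\bar{Y} + \underbrace{ZV}_{W}\,\underbrace{V'\delta}_{\gamma} + \epsilon = 1_n\bar{Y} + W\gamma + \epsilon$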
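The reparametrization can be checked numerically. A minimal plain-Python sketch, assuming an illustrative design (x1 = day - 7.5 and x2 = x1³ for days 1-14, one observation per day) and arbitrary slopes β*; `center_scale` is a hypothetical helper. It builds X_c, D = diag(X_c'X_c), Z = X_c D^{-1/2} and δ = D^{1/2}β*, then confirms that Zδ = X_cβ* and that Z'Z has a unit diagonal (it is the correlation matrix of the predictors).

```python
import math


def center_scale(X):
    """Return (Z, Xc, d): Xc = centered X*, d[j] = (Xc'Xc)_jj, Z = Xc D^{-1/2}."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    d = [sum(r[j] ** 2 for r in Xc) for j in range(p)]
    Z = [[r[j] / math.sqrt(d[j]) for j in range(p)] for r in Xc]
    return Z, Xc, d


# Illustrative design: x1 = day - 7.5 and x2 = x1**3 for days 1..14 (one observation per day).
X = [[t - 7.5, (t - 7.5) ** 3] for t in range(1, 15)]
beta_star = [0.4, -0.002]                                    # arbitrary slopes, illustration only
Z, Xc, d = center_scale(X)
delta = [math.sqrt(d[j]) * beta_star[j] for j in range(2)]   # delta = D^{1/2} beta*

# X_c beta* and Z delta are the same vector, so the fit is unchanged by the reparametrization.
gap = max(abs(sum(x * b for x, b in zip(xr, beta_star)) - sum(z * g for z, g in zip(zr, delta)))
          for xr, zr in zip(Xc, Z))
ZtZ = [[sum(r[i] * r[j] for r in Z) for j in range(2)] for i in range(2)]
print("max |X_c beta* - Z delta| =", gap)
print("Z'Z =", ZtZ)   # unit diagonal; the off-diagonal entry is corr(x1, x2)
```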
Steps:
1. Regress Y on W to get $\hat\gamma$.
2. Eliminate all principal components that (a) have condition index > 10 and (b) have nonsignificant coefficients $\hat\gamma_j$. End-product is $V_{(g)}$ and $\hat\gamma_{(g)}$.
3. Convert back to centered and scaled variables:
$$\delta^{+}_{(g)} = V_{(g)}\hat\gamma_{(g)}, \qquad \operatorname{var}(\delta^{+}_{(g)}) = V_{(g)}L_{(g)}^{-1}V_{(g)}'\,MSE,$$
where MSE is from the reduced model (this is what SAS does).
4. Convert back to original variables:
$$\beta^{*+}_{(g)} = D^{-1/2}\delta^{+}_{(g)} = D^{-1/2}V_{(g)}\hat\gamma_{(g)}$$
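Steps 1-4 can be sketched in plain Python as follows. This is an illustration under stated assumptions, not a description of how SAS computes PCOMIT= results: the eigenvectors V of Z'Z come from a cyclic Jacobi routine (for this purpose equivalent to the right singular vectors of Z), the retained set (g) is simply the `keep` leading components instead of being chosen by condition indices and significance tests, and `jacobi_eigen`, `pcr` and the toy data are hypothetical. Because W'W = L is diagonal, step 1 reduces to γ̂_m = W_m'Y / λ_m.

```python
import math


def jacobi_eigen(A, sweeps=50):
    """Eigenvalues (descending) and eigenvectors (columns of V) of a symmetric matrix A,
    computed with cyclic Jacobi rotations."""
    p = len(A)
    A = [row[:] for row in A]
    V = [[float(i == j) for j in range(p)] for i in range(p)]
    for _ in range(sweeps):
        if sum(A[i][j] ** 2 for i in range(p) for j in range(p) if i != j) < 1e-22:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if A[i][j] == 0.0:
                    continue
                theta = (A[j][j] - A[i][i]) / (2.0 * A[i][j])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(p):  # rotate columns i, j: A <- A P and V <- V P
                    A[k][i], A[k][j] = c * A[k][i] - s * A[k][j], s * A[k][i] + c * A[k][j]
                    V[k][i], V[k][j] = c * V[k][i] - s * V[k][j], s * V[k][i] + c * V[k][j]
                for k in range(p):  # rotate rows i, j: A <- P' A
                    A[i][k], A[j][k] = c * A[i][k] - s * A[j][k], s * A[i][k] + c * A[j][k]
    order = sorted(range(p), key=lambda m: -A[m][m])
    return [A[m][m] for m in order], [[V[r][m] for m in order] for r in range(p)]


def pcr(X, y, keep):
    """Principal components regression keeping the `keep` leading components.
    Returns (b0, beta_star, lam, V, gamma_hat) on the original scale of X."""
    n, p = len(X), len(X[0])
    means = [sum(r[j] for r in X) / n for j in range(p)]
    Xc = [[r[j] - means[j] for j in range(p)] for r in X]
    d = [sum(r[j] ** 2 for r in Xc) for j in range(p)]                  # D = diag(Xc'Xc)
    Z = [[r[j] / math.sqrt(d[j]) for j in range(p)] for r in Xc]        # Z = Xc D^{-1/2}
    ZtZ = [[sum(z[a] * z[b] for z in Z) for b in range(p)] for a in range(p)]
    lam, V = jacobi_eigen(ZtZ)                                          # Z'Z = V L V'
    W = [[sum(z[a] * V[a][m] for a in range(p)) for m in range(p)] for z in Z]  # W = Z V
    # Step 1: W'W = L, so gamma_m = W_m'Y / lambda_m (W is centered, so the intercept is Ybar)
    gamma = [sum(w[m] * yi for w, yi in zip(W, y)) / lam[m] for m in range(p)]
    # Steps 2-3: delta+_(g) = V_(g) gamma_(g), using the first `keep` columns of V
    delta = [sum(V[a][m] * gamma[m] for m in range(keep)) for a in range(p)]
    # Step 4: beta*+_(g) = D^{-1/2} delta+_(g); the intercept follows from the means
    beta = [delta[a] / math.sqrt(d[a]) for a in range(p)]
    b0 = sum(y) / n - sum(mu * b for mu, b in zip(means, beta))
    return b0, beta, lam, V, gamma


# Hypothetical data: x2 is x1 plus a small alternating wiggle, so the two are highly collinear.
X = [[float(t), t + (0.3 if t % 2 == 0 else -0.3)] for t in range(1, 11)]
y = [1.0 + 2.0 * a + 0.5 * b + (0.2 if i % 3 == 0 else -0.1) for i, (a, b) in enumerate(X)]

b0_full, beta_full, lam, V, gamma = pcr(X, y, keep=2)   # nothing dropped: same as OLS
b0_g, beta_g, *_ = pcr(X, y, keep=1)                    # drop the smallest-eigenvalue component
print("eigenvalues:", lam)
print("OLS:      ", b0_full, beta_full)
print("PCR (g=1):", b0_g, beta_g)
```

With `keep` equal to p nothing is dropped and the fit reproduces OLS, which is a convenient check; with `keep=1` the estimate is pulled onto the direction of the leading eigenvector.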
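Why is this called biased regression?
$$\delta^{+}_{(g)} = V_{(g)}V_{(g)}'\hat\delta_{OLS} \;\Longrightarrow\; E(\delta^{+}_{(g)}) = V_{(g)}V_{(g)}'\delta \neq \delta \text{ (in general)}.$$
In fact,
$$\text{bias} = E(\delta^{+}_{(g)} - \delta) = -V_{(s)}\gamma_{(s)},$$
where $V_{(s)}$ holds the eigenvectors (columns of V) of the principal components dropped earlier and $\gamma_{(s)}$ is the corresponding set of coefficients.
Because step 2 omits only components whose $\hat\gamma_j$ are nonsignificant, the omitted $\gamma_j$ are plausibly near zero, which keeps the bias small.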
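The bias expression needs one extra step: split $V = [V_{(g)}\ V_{(s)}]$ into retained and dropped eigenvectors and use $VV' = I$.

```latex
\begin{aligned}
\delta = VV'\delta &= V_{(g)}V_{(g)}'\delta + V_{(s)}V_{(s)}'\delta
                    = V_{(g)}V_{(g)}'\delta + V_{(s)}\gamma_{(s)},
  \qquad \gamma_{(s)} = V_{(s)}'\delta, \\
E(\delta^{+}_{(g)}) - \delta &= V_{(g)}V_{(g)}'\delta - \delta = -V_{(s)}\gamma_{(s)} .
\end{aligned}
```

So the bias vanishes exactly when every dropped component has $\gamma_j = 0$.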
Algae Example–13-degree non-orthogonal case
Summary:
• VIF flags all explanatory variables. Drop them all?
• COLLINOINT flags 9 condition indices > 10, with all explanatory variables appearing in the flagged principal components
• Test if we can drop the bottom 9 principal components:
$$F = \frac{[(RMSE_9)^2(14 + 9) - (RMSE)^2(14)]/9}{(RMSE)^2} = 2.6809, \qquad \text{p-value} = \Pr(F_{9,14} > 2.6809) = .04781$$
  Cannot drop bottom 9 principal components.
• Test if we can drop the bottom 8 principal components:
$$F = 0.3184, \qquad \text{p-value} = \Pr(F_{8,14} > 0.3184) = 0.94571$$
  Drop bottom 8 principal components.
• After dropping bottom 8 principal components, get estimate as

                 $\beta^{+}_{(g)}$    s.e.$(\beta^{+}_{(g)})$
  intercept        3.75067          .05671
  day               .39956          .02522
  day2             −.01968          .00349
  day3              .00065          .00075
  day4             −.00023          .00003
  day5             −.00004          .00002
  day6-day13        0 (each)        0 (each)
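The two tests above can be reproduced from the printed root mean squared errors. A plain-Python sketch that mirrors the FTEST/PVALUE data step in the SAS code that follows; because the Python standard library has no F distribution, the upper-tail probability is computed from the regularized incomplete beta function with the standard continued-fraction (modified Lentz) evaluation. The function names are hypothetical, and the inputs are the rounded RMSE values from the SAS output, so the last printed digit can differ.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_upper_tail(f, d1, d2):
    """Pr(F_{d1,d2} > f) = I_{d2/(d2 + d1 f)}(d2/2, d1/2)."""
    return betainc(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


def drop_test(rmse_full, edf_full, rmse_reduced, j):
    """F statistic and p-value for omitting the bottom j principal components."""
    mse0 = rmse_full ** 2
    edf = edf_full + j                                    # error df of the reduced model
    f = (rmse_reduced ** 2 * edf - mse0 * edf_full) / (j * mse0)
    return f, f_upper_tail(f, j, edf_full)


for j, rmse_j in ((9, 0.27408), (8, 0.18461)):
    f, p = drop_test(0.21287, 14, rmse_j, j)
    print(f"drop bottom {j}: F = {f:.4f}, p-value = {p:.5f}")
```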
options nodate ls=85 ps=25 nonumber;
data algae;
input day density @@;
datalines;
1 .530 1 .184
2 1.183 2 .664
3 1.603 3 1.553
4 1.994 4 1.910
5 2.708 5 2.585
6 3.006 6 3.009
7 3.867 7 3.403
8 4.059 8 3.892
9 4.349 9 4.367
10 4.699 10 4.551
11 4.983 11 4.656
12 5.100 12 4.754
13 5.288 13 4.842
14 5.374 14 4.969
;
title1 height=.15in "Algae Example: Polynomial Regression";
run;
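data algae2; set algae;
day=day-7.5;             /* center day at 7.5, the midpoint of days 1-14 */
day2=day*day; day3=day2*day; day4=day3*day; day5=day4*day; day6=day5*day;
day7=day6*day; day8=day7*day; day9=day8*day; day10=day9*day; day11=day10*day;
day12=day11*day; day13=day12*day; run;
/* 13-degree polynomial, based on "day-7.5" variable */
/* Principal components regression: PCOMIT=1 to 12 writes one IPC row (and, */
/* with OUTSEB, one IPCSEB row) to FIXCOLL per number of omitted components */
proc reg data=algae2 outest=fixcoll noprint;
title2 height=.15in "13-degree polynomial around day 7.5";
title3 height=.15in "Principal Components Regression";
model density=day day2 day3 day4 day5 day6 day7 day8 day9 day10 day11 day12 day13/
vif collin pcomit=1 to 12 edf outseb;
run;
proc print data=fixcoll; run;
/* keep the full-model row (PARMS, obs 1) and the IPC rows; */
/* retain the full-model MSE and error df                   */
data testdrop; set fixcoll; if _n_=1 or _type_="IPC";
if _n_=1 then do;
mse0=_rmse_*_rmse_;
edf0=_edf_;
end;
retain mse0 edf0;
run;
/* F test for dropping the bottom j components:           */
/* F = [SSE(reduced) - SSE(full)]/j / MSE(full), (j, edf0) df */
data testdrop; set testdrop;
j=_n_-1;                 /* number of omitted components  */
edf=edf0+j;              /* error df of the reduced model */
ftest=(_rmse_*_rmse_*edf - mse0*edf0)/( j*mse0 );
pvalue=1-cdf('F',ftest,j,edf0);
run;
proc print; run;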
SAS Output
Algae Example: Polynomial Regression
13-degree polynomial around day 7.5
Principal Components Regression

(In every observation _MODEL_ = MODEL1, _DEPVAR_ = density, _RIDGE_ = . and density = -1. _IN_ = 13, _P_ = 14, _EDF_ = 14 and _RSQ_ = 0.99091 in obs 1; these four columns are missing in all other observations. The parameter columns are split into two panels.)

Obs  _TYPE_  _PCOMIT_   _RMSE_  Intercept      day      day2      day3       day4       day5         day6
  1  PARMS          .  0.21287    3.83499  0.31061  -0.12773   0.12901   0.036240  -0.038954  -.004839719
  2  SEB            .  0.21287    0.13527  0.26248   0.18359   0.22876   0.061370   0.061351  0.007760111
  3  IPC            1  0.20919    3.83499  0.46060  -0.12773  -0.02351   0.036240   0.003337  -.004839719
  4  IPCSEB         1  0.20919    0.13293  0.14753   0.18042   0.06516   0.060308   0.008712  0.007625813
  5  IPC            2  0.20549    3.78899  0.46060  -0.02126  -0.02351  -0.002636   0.003337  0.000191853
  6  IPCSEB         2  0.20549    0.11142  0.14493   0.08104   0.06401   0.014064   0.008558  0.000797163
  7  IPC            3  0.20058    3.78899  0.41038  -0.02126   0.00336  -0.002636  -0.000422  0.000191853
  8  IPCSEB         3  0.20058    0.10876  0.08840   0.07910   0.02022   0.013727   0.001187  0.000778107
  9  IPC            4  0.19530    3.80343  0.41038  -0.03928   0.00336   0.000784  -0.000422  -.000005966
 10  IPCSEB         4  0.19530    0.09036  0.08607   0.03431   0.01969   0.002735   0.001156  0.000035186
 11  IPC            5  0.19083    3.80343  0.43635  -0.03928  -0.00375   0.000784   0.000011  -.000005966
 12  IPCSEB         5  0.19083    0.08829  0.05001   0.03352   0.00516   0.002673   0.000063  0.000034381
 13  IPC            6  0.18640    3.78959  0.43635  -0.03051  -0.00375   0.000031   0.000011  0.000003618
 14  IPCSEB         6  0.18640    0.07187  0.04884   0.01259   0.00504   0.000297   0.000062  0.000006155
 15  IPC            7  0.18541    3.78959  0.39956  -0.03051   0.00065   0.000031  -0.000042  0.000003618
 16  IPCSEB         7  0.18541    0.07149  0.02533   0.01252   0.00075   0.000295   0.000017  0.000006122
 17  IPC            8  0.18461    3.75067  0.39956  -0.01968   0.00065  -0.000233  -0.000042  -.000001891
 18  IPCSEB         8  0.18461    0.05671  0.02522   0.00349   0.00075   0.000033   0.000017  0.000000240
 19  IPC            9  0.27408    3.75067  0.27437  -0.01968   0.00460  -0.000233   0.000050  -.000001891
 20  IPCSEB         9  0.27408    0.08419  0.01407   0.00518   0.00020   0.000048   0.000002  0.000000357
 21  IPC           10  0.32105    3.57408  0.27437  -0.00346   0.00460  -0.000087   0.000050  -.000002107
 22  IPCSEB        10  0.32105    0.07360  0.01649   0.00067   0.00023   0.000017   0.000002  0.000000410
 23  IPC           11  0.97296    3.57408  0.03958  -0.00346   0.00147  -0.000087   0.000040  -.000002107
 24  IPCSEB        11  0.97296    0.22306  0.00585   0.00204   0.00022   0.000052   0.000006  0.000001243
 25  IPC           12  1.00738    3.36007  0.03958   0.00000   0.00147   0.000000   0.000040            0
 26  IPCSEB        12  1.00738    0.19038  0.00605   0.00000   0.00022   0.000000   0.000006            0

Obs         day7         day8         day9        day10        day11        day12        day13
  1  0.004613589  0.000282030  -.000249945  -.000007271  0.000006163  0.000000068  -5.57374E-8
  2  0.006891571  0.000436953  0.000361705  0.000011034  0.000008738  0.000000101  7.795563E-8
  3  -.000178178  0.000282030  0.000002015  -.000007271  0.000000077  0.000000068  -1.47597E-9
  4  0.000418654  0.000429392  0.000004170  0.000010843  0.000000173  0.000000100  3.140033E-9
  5  -.000178178  -.000002818  0.000002015  -.000000079  0.000000077  0.000000002  -1.47597E-9
  6  0.000411266  0.000010305  0.000004096  0.000000310  0.000000170  0.000000006  3.084627E-9
  7  0.000004307  -.000002818  0.000000213  -.000000079  0.000000002  0.000000002  -1.1229E-10
  8  0.000008473  0.000010059  0.000000523  0.000000303  0.000000004  0.000000006  2.64953E-10
  9  0.000004307  -.000000277  0.000000213  -.000000002  0.000000002  1.38218E-10  -1.1229E-10
 10  0.000008250  0.000001179  0.000000510  0.000000006  0.000000004  6.29802E-10  2.57978E-10
 11  0.000001365  -.000000277  0.000000022  -.000000002  -5.7976E-11  1.38218E-10  -1.6296E-11
 12  0.000002507  0.000001152  0.000000032  0.000000006  3.49858E-10  6.15397E-10   3.2638E-11
 13  0.000001365  0.000000049  0.000000022  -1.9966E-10  -5.7976E-11  -3.4562E-11  -1.6296E-11
 14  0.000002448  0.000000045  0.000000031  0.000000002   3.4172E-10  8.43127E-11  3.18788E-11
 15  -.000000790  0.000000049  -.000000005  -1.9966E-10  2.12317E-10  -3.4562E-11  1.14131E-11
 16  0.000000185  0.000000045  0.000000002  0.000000002  1.50804E-10  8.38678E-11  5.50334E-12
 17  -.000000790  0.000000011  -.000000005  0.000000001  2.12317E-10  3.97994E-11  1.14131E-11
 18  0.000000184  0.000000014  0.000000002  5.04088E-10  1.50155E-10  1.46208E-11  5.47965E-12
 19  0.000000167  0.000000011  -.000000012  0.000000001  -5.5104E-10  3.97994E-11  -1.7135E-11
 20  0.000000064  0.000000021  0.000000002  7.48362E-10  7.02193E-11  2.17058E-11  1.89009E-12
 21  0.000000167  -.000000050  -.000000012  -.000000001  -5.5104E-10  -2.7115E-11  -1.7135E-11
 22  0.000000075  0.000000010  0.000000003  2.26086E-10  8.22544E-11  5.27968E-12  2.21404E-12
 23  0.000000986  -.000000050  0.000000024  -.000000001  5.61273E-10  -2.7115E-11  1.32346E-11
 24  0.000000146  0.000000029  0.000000003  6.85169E-10  8.29035E-11  1.60004E-11  1.95483E-12
 25  0.000000986            0  0.000000024            0  5.61273E-10            0  1.32346E-11
 26  0.000000151            0  0.000000004            0   8.5836E-11            0  2.02398E-12
SAS Output
Algae Example: Polynomial Regression
13-degree polynomial around day 7.5
Principal Components Regression

Obs  _TYPE_  _PCOMIT_   _RMSE_      mse0  edf0   j  edf    ftest   pvalue
  1  PARMS          .  0.21287  0.045313    14   0   14        .        .
  2  IPC            1  0.20919  0.045313    14   1   15   0.4853  0.49743
  3  IPC            2  0.20549  0.045313    14   2   16   0.4553  0.64335
  4  IPC            3  0.20058  0.045313    14   3   17   0.3647  0.77955
  5  IPC            4  0.19530  0.045313    14   4   18   0.2879  0.88096
  6  IPC            5  0.19083  0.045313    14   5   19   0.2540  0.93074
  7  IPC            6  0.18640  0.045313    14   6   20   0.2225  0.96287
  8  IPC            7  0.18541  0.045313    14   7   21   0.2760  0.95322
  9  IPC            8  0.18461  0.045313    14   8   22   0.3184  0.94571
 10  IPC            9  0.27408  0.045313    14   9   23   2.6809  0.04781
 11  IPC           10  0.32105  0.045313    14  10   24   4.0593  0.00878
 12  IPC           11  0.97296  0.045313    14  11   25  46.2079  0.00000
 13  IPC           12  1.00738  0.045313    14  12   26  47.3571  0.00000

(This listing also contains _MODEL_ = MODEL1, _DEPVAR_ = density, _RIDGE_ = . and density = -1 in every observation, and _IN_ = 13, _P_ = 14, _EDF_ = 14, _RSQ_ = 0.99091 in obs 1, missing elsewhere. Its columns Intercept through day13 repeat the PARMS row and the IPC rows (_PCOMIT_ = 1 to 12) of the previous listing, except that day12 is printed with more digits in obs 1-4: 6.765007E-8, 6.765008E-8, 1.704715E-9 and 1.704715E-9.)
Ridge Regression
• Assuming X is n × (p + 1) with a column of ones, write the model exactly as for principal components regression:
$$Y = X\beta + \epsilon = 1_n\beta_0 + X^*\beta^* + \epsilon = 1_n\bar{Y} + Z\delta + \epsilon,$$
where $X_c = X^* - \bar{X}^*$ is the centered version of $X^*$, $D = \operatorname{diag}(X_c'X_c)$, $Z = X_cD^{-1/2}$ is the centered and scaled version of X omitting the intercept column, and $\delta = D^{1/2}\beta^*$.
• Ridge estimator from ridge factor k ≥ 0 is
$$\tilde\delta(k) = (Z'Z + kI)^{-1}Z'Y; \qquad \tilde\beta(k) = \begin{pmatrix}\tilde\beta_0(k)\\ \tilde\beta^*(k)\end{pmatrix}, \quad \tilde\beta^*(k) = D^{-1/2}\tilde\delta(k), \quad \tilde\beta_0(k) = \bar{Y} - \sum_{j=1}^{p}\bar{X}_j\tilde\beta_j(k)$$
• How to choose k?
  – $k = p\cdot MSE \big/ \sum_{j=1}^{p}[\tilde\delta_j(0)]^2$ (Hoerl, Kennard, Baldwin (1975))
  – Ridge trace: plot $\tilde\beta_j(k)$ versus k for each j = 1, . . . , p. Select k when the estimates stop changing rapidly (not when you get a flat line!)
  – When $VIF_j$ stabilizes with respect to k, for each j = 1, . . . , p.
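A plain-Python sketch of the ridge estimator, its back-transformation to the original scale, the Hoerl-Kennard-Baldwin choice of k, and ridge VIFs. Assumptions: the toy data are made up, `solve`, `ridge`, `hkb_k` and `ridge_vif` are hypothetical helper names, and the ridge VIFs are taken to be the diagonal of (Z'Z + kI)⁻¹Z'Z(Z'Z + kI)⁻¹, a common definition that reduces to the ordinary VIFs at k = 0.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ridge(X, y, k):
    """Ridge fit for ridge factor k. Returns (b0, beta_star, delta, ZtZ)."""
    n, p = len(X), len(X[0])
    means = [sum(r[j] for r in X) / n for j in range(p)]
    d = [sum((r[j] - means[j]) ** 2 for r in X) for j in range(p)]      # D = diag(Xc'Xc)
    Z = [[(r[j] - means[j]) / math.sqrt(d[j]) for j in range(p)] for r in X]
    ZtZ = [[sum(z[a] * z[b] for z in Z) for b in range(p)] for a in range(p)]
    Zty = [sum(z[a] * yi for z, yi in zip(Z, y)) for a in range(p)]
    A = [[ZtZ[a][b] + (k if a == b else 0.0) for b in range(p)] for a in range(p)]
    delta = solve(A, Zty)                                             # (Z'Z + kI)^{-1} Z'Y
    beta = [delta[j] / math.sqrt(d[j]) for j in range(p)]            # D^{-1/2} delta
    b0 = sum(y) / n - sum(m * bj for m, bj in zip(means, beta))      # Ybar - sum_j Xbar_j beta_j
    return b0, beta, delta, ZtZ


def hkb_k(X, y):
    """Hoerl-Kennard-Baldwin k = p * MSE / sum_j delta_j(0)^2, with MSE from the OLS fit."""
    n, p = len(X), len(X[0])
    b0, beta, delta, _ = ridge(X, y, 0.0)
    sse = sum((yi - b0 - sum(bj * xj for bj, xj in zip(beta, r))) ** 2 for r, yi in zip(X, y))
    return p * (sse / (n - p - 1)) / sum(dj * dj for dj in delta)


def ridge_vif(ZtZ, k):
    """Diagonal of (Z'Z + kI)^{-1} Z'Z (Z'Z + kI)^{-1}; the ordinary VIFs when k = 0."""
    p = len(ZtZ)
    A = [[ZtZ[a][b] + (k if a == b else 0.0) for b in range(p)] for a in range(p)]
    inv_cols = [solve(A, [float(a == j) for a in range(p)]) for j in range(p)]  # A^{-1} is symmetric
    return [sum(inv_cols[j][a] * ZtZ[a][b] * inv_cols[j][b] for a in range(p) for b in range(p))
            for j in range(p)]


# Hypothetical data: x2 is x1 plus a small alternating wiggle, so the two are highly collinear.
X = [[float(t), t + (0.3 if t % 2 == 0 else -0.3)] for t in range(1, 11)]
y = [1.0 + 2.0 * a + 0.5 * b + (0.2 if i % 3 == 0 else -0.1) for i, (a, b) in enumerate(X)]
k_hkb = hkb_k(X, y)
for k in (0.0, k_hkb, 0.01, 0.1):
    b0, beta, delta, ZtZ = ridge(X, y, k)
    print(f"k = {k:.5f}: b0 = {b0:.4f}, beta* = {[round(b, 4) for b in beta]}, "
          f"VIF = {[round(v, 2) for v in ridge_vif(ZtZ, k)]}")
```

Printing the estimates over a grid of k values in this way is the numerical content of a ridge trace; the VIF column shows the stabilization criterion.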
Why is this called biased regression?
$$\tilde\delta(k) = (Z'Z + kI)^{-1}(Z'Z)\hat\delta_{OLS}$$
$$\Rightarrow\ \text{bias} = E[\tilde\delta(k) - \delta] = \left[(Z'Z + kI)^{-1}Z'Z - I\right]\delta = -k(Z'Z + kI)^{-1}\delta,$$
so the size of the bias increases as the ridge factor k increases.
Variance decreases as the ridge factor k increases:
$$\operatorname{var}[\tilde\delta(k)] = \sigma^2(Z'Z + kI)^{-1}(Z'Z)(Z'Z + kI)^{-1}$$
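The bias and variance formulas can be evaluated directly. A minimal sketch, assuming Z'Z = [[1, 0.99], [0.99, 1]], a true δ = (1, 1)' and σ² = 1 (all illustrative values; the helper names are hypothetical). It reports the squared bias, the total variance (trace of var[δ̃(k)]) and their sum, the total MSE. The squared bias grows and the variance falls as k increases; for these values a small k gives a far smaller MSE than OLS (k = 0).

```python
def inv2(m):
    """Inverse of a 2x2 matrix m = [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def ridge_bias_cov(R, delta, k, sigma2=1.0):
    """bias = -k (R + kI)^{-1} delta and var = sigma^2 (R + kI)^{-1} R (R + kI)^{-1}, R = Z'Z."""
    Ainv = inv2([[R[0][0] + k, R[0][1]], [R[1][0], R[1][1] + k]])
    bias = [-k * (Ainv[i][0] * delta[0] + Ainv[i][1] * delta[1]) for i in range(2)]
    cov = [[sigma2 * v for v in row] for row in matmul(matmul(Ainv, R), Ainv)]
    return bias, cov


def total_mse(R, delta, k, sigma2=1.0):
    """E||delta_tilde(k) - delta||^2 = trace(var) + ||bias||^2."""
    bias, cov = ridge_bias_cov(R, delta, k, sigma2)
    return cov[0][0] + cov[1][1] + bias[0] ** 2 + bias[1] ** 2


R = [[1.0, 0.99], [0.99, 1.0]]   # assumed Z'Z: two highly correlated predictors
delta = [1.0, 1.0]               # assumed true delta
for k in (0.0, 0.01, 0.1, 1.0):
    bias, cov = ridge_bias_cov(R, delta, k)
    print(f"k = {k:4}: squared bias = {bias[0] ** 2 + bias[1] ** 2:.4f}, "
          f"total variance = {cov[0][0] + cov[1][1]:.4f}, MSE = {total_mse(R, delta, k):.4f}")
```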
Why is this called a shrinkage estimator?
• $\tilde\delta(k) \to 0$ as $k \to \infty$
• Bayesian interpretation:
  – prior is $\delta \sim \left(0, \tfrac{\sigma^2}{k}I\right)$
  – large k indicates strong belief that δ ≈ 0; prior dominates
  – small k indicates little prior knowledge; data dominate
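The Bayesian interpretation follows from completing the square. A short derivation, assuming normal errors and a normal prior, and, for simplicity, treating the intercept as fixed at $\bar{Y}$ (so the response is centered). Since the posterior is normal, its mean equals its mode, and the mode minimizes a penalized least-squares criterion:

```latex
\begin{aligned}
Y - 1_n\bar{Y} \mid \delta &\sim N(Z\delta,\ \sigma^2 I), \qquad
  \delta \sim N\!\left(0,\ \tfrac{\sigma^2}{k} I\right), \\
-2\log p(\delta \mid Y) &= \frac{1}{\sigma^2}\Big[\lVert Y - 1_n\bar{Y} - Z\delta\rVert^2
  + k\lVert\delta\rVert^2\Big] + \text{const}, \\
E(\delta \mid Y) &= (Z'Z + kI)^{-1}Z'(Y - 1_n\bar{Y}) = (Z'Z + kI)^{-1}Z'Y = \tilde\delta(k).
\end{aligned}
```

The last step uses $Z'1_n = 0$, which holds because the columns of Z are centered.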
Algae Example–13-degree non-orthogonal case
Summary:
• VIF flags all explanatory variables. Drop them all?
• COLLINOINT flags 9 condition indices > 10, with all explanatory variables appearing in the flagged principal components
• PCR: drop bottom 8 principal components, get estimate as

                 $\beta^{+}_{(g)}$    s.e.$(\beta^{+}_{(g)})$
  intercept        3.75067          .05671
  day               .39956          .02522
  day2             −.01968          .00349
  day3              .00065          .00075
  day4             −.00023          .00003
  day5             −.00004          .00002
  day6-day13        0 (each)        0 (each)

• Ridge Reg’n: what k to use?
  – k = 2.1077 × 10⁻⁹ (Hoerl, Kennard, Baldwin (1975))
  – Ridge Trace: k = .001
  – $VIF_j$: k = .004. I choose this one. Get estimate as

                 $\tilde\beta(.004)$    s.e.$[\tilde\beta(.004)]$
  intercept        3.77660            .07919
  day               .39375            .03081
  day2             −.02762            .01152
  day3              .00001            .00166
  day4             −.00000            .00030
  day5             −.00002            .00003
  day6-day13        0 (each)          0 (each)
options nodate ls=85 ps=25 nonumber;
data algae;
input day density @@;
datalines;
1 .530 1 .184
2 1.183 2 .664
3 1.603 3 1.553
4 1.994 4 1.910
5 2.708 5 2.585
6 3.006 6 3.009
7 3.867 7 3.403
8 4.059 8 3.892
9 4.349 9 4.367
10 4.699 10 4.551
11 4.983 11 4.656
12 5.100 12 4.754
13 5.288 13 4.842
14 5.374 14 4.969
;
title1 height=.15in "Algae Example: Polynomial Regression"; run;
data algae2; set algae;
day=day-7.5;
day2=day*day; day3=day2*day; day4=day3*day; day5=day4*day; day6=day5*day;
day7=day6*day; day8=day7*day; day9=day8*day; day10=day9*day; day11=day10*day;
day12=day11*day; day13=day12*day;
run;
/* 13-degree polynomial, based on "day-7.5" variable */
/* Ridge regression for ridge factors k = 0, 0.001, ..., 0.020 */
proc reg data=algae2 outest=fixcoll noprint;
title2 height=.15in "13-degree polynomial around day 7.5";
title3 height=.15in "Ridge Regression";
model density=day day2 day3 day4 day5 day6 day7 day8 day9 day10 day11 day12 day13/
ss1 ss2 vif collinoint ridge=0 to 0.02 by .001 edf outseb outstb outvif;
plot / ridgeplot vref=0 nomodel nostat;    /* ridge trace */
title4 height=.15in "Ridge Trace";
run;
/* plot the ridge VIFs against k; day13<30 drops the rows with very large */
/* VIFs (the smallest k values) so that the vertical scale stays readable  */
data choosek; set fixcoll; run;
proc gplot data=choosek;
where (_type_="RIDGEVIF" and day13<30);
plot (day--day13) * _ridge_ / overlay legend;
title4 "Ridge VIFs";
run;
proc print data=fixcoll; title4; run;
/* Hoerl-Kennard-Baldwin k = p*MSE / sum of delta_j(0)**2, with p = _IN_.  */
/* delta_j(0) = (standardized OLS estimate) * sqrt(SST), the standardized  */
/* estimates coming from the RIDGESTB row at k=0, and                      */
/* SST = SSE/(1-R**2) = _EDF_*_RMSE_**2/(1-_RSQ_).                         */
data chooseka; set fixcoll;
if (_type_="PARMS");
run;
data choosekb; set fixcoll(drop=_in_ _p_ _edf_ _rsq_);
if (_type_="RIDGESTB" and _ridge_=0.00);
run;
data choosek; merge chooseka choosekb;   /* one-to-one merge of two single-row data sets */
k=_in_*_rmse_*_rmse_/ ( (day**2+day2**2+day3**2+day4**2+day5**2+day6**2+
day7**2+day8**2+day9**2+day10**2+day11**2+day12**2+day13**2) *
(_edf_*_rmse_*_rmse_/(1-_rsq_)) );
run;
title4 "Hoerl, Kennard, Baldwin Recommendation for k";
proc print; run;
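The data step that computes k can be checked by hand. SAS's standardized estimates (the RIDGESTB row at k = 0) are assumed here to be b_j = β̂_j s_j / s_Y, with s_j and s_Y sample standard deviations; since d_j = (n - 1)s_j², this gives δ̂_j(0) = β̂_j√d_j = b_j√SST, so Σ_j δ̂_j(0)² = SST·Σ_j b_j², with SST = SSE/(1 - R²). A plain-Python sketch using the values printed in the "Hoerl, Kennard, Baldwin Recommendation for k" listing:

```python
# Values copied from the SAS listing "Hoerl, Kennard, Baldwin Recommendation for k".
stb = [0.79318, -1.16696, 10.4058, 14.1412, -119.310, -79.3262, 568.565,
       195.058, -1270.56, -212.863, 1307.64, 83.9458, -496.605]   # standardized OLS estimates
rmse, n_in, edf, rsq = 0.21287, 13, 14, 0.99091                   # _RMSE_, _IN_, _EDF_, _RSQ_

mse = rmse ** 2
sst = edf * mse / (1.0 - rsq)                    # SSE / (1 - R^2) = corrected total SS
sum_delta_sq = sst * sum(b * b for b in stb)     # delta_j(0) = stb_j * sqrt(SST)
k = n_in * mse / sum_delta_sq                    # p * MSE / sum_j delta_j(0)^2
print(f"Hoerl-Kennard-Baldwin k = {k:.4e}")
```

Up to rounding of the printed _RSQ_ and estimates, this reproduces the SAS value k = 2.1078E-9.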
SAS Output
The REG Procedure
[Pages 1-2 of this output are graphical (the ridge trace from the PLOT statement and the Ridge VIFs plot from PROC GPLOT) and are not reproduced here.]
SAS Output
Algae Example: Polynomial Regression
13-degree polynomial around day 7.5
Ridge Regression

(Listing of the OUTEST= data set, 86 observations, regrouped here by _TYPE_. Obs 1 is PARMS and obs 2 is SEB; then, for each _RIDGE_ value k = 0.000, 0.001, ..., 0.020, four observations follow in the order RIDGEVIF, RIDGE, RIDGESTB, RIDGESEB. In every observation _MODEL_ = MODEL1, _DEPVAR_ = density, _PCOMIT_ = . and density = -1. _IN_ = 13, _P_ = 14, _EDF_ = 14 and _RSQ_ = 0.99091 appear only in the PARMS observation.)

PARMS and SEB (_RIDGE_ = .):
_TYPE_   _RMSE_  Intercept    day   day2   day3   day4   day5   day6   day7   day8   day9  day10  day11  day12  day13
PARMS   0.21287    3.83499  0.311  -0.13   0.13   0.04  -0.04  -0.00   0.00   0.00  -0.00  -0.00   0.00   0.00  -0.00
SEB     0.21287    0.13527  0.262   0.18   0.23   0.06   0.06   0.01   0.01   0.00   0.00   0.00   0.00   0.00   0.00

RIDGEVIF (_RMSE_ and Intercept are missing). At _RIDGE_ = 0.000 the VIFs are: day 691.776, day2 4332.20, day3 524275.06, day4 883079.00, day5 54373099.72, day6 24912055.76, day7 1110720884.98, day8 140632757.42, day9 5205889507.87, day10 160682024.12, day11 5293103867.19, day12 24351249.58, day13 742862608.10.
_RIDGE_     day    day2    day3    day4    day5    day6    day7    day8    day9   day10   day11   day12   day13
  0.001  15.065   26.40  107.68   76.78   33.62   33.13   57.14   33.89   19.60    5.11    5.12   47.48   55.98
  0.002  11.056   20.34   54.41   35.01   16.76   19.05   27.60   10.96    8.50    3.49    3.26   24.17   28.16
  0.003   9.337   17.20   34.17   23.79   11.36   14.45   16.43    5.74    4.90    2.87    2.58   17.44   17.94
  0.004   8.357   14.96   24.25   18.44    8.78   11.85   11.02    3.70    3.23    2.47    2.23   13.99   12.96
  0.005   7.694   13.23   18.61   15.15    7.29   10.05    7.98    2.68    2.32    2.18    2.00   11.77   10.12
  0.006   7.195   11.83   15.05   12.87    6.32    8.70    6.10    2.08    1.77    1.96    1.84   10.16    8.32
  0.007   6.793   10.68   12.64   11.15    5.63    7.64    4.85    1.69    1.40    1.77    1.72    8.92    7.10
  0.008   6.454    9.72   10.90    9.81    5.11    6.77    3.98    1.42    1.15    1.62    1.61    7.94    6.21
  0.009   6.159    8.91    9.61    8.72    4.70    6.05    3.35    1.22    0.97    1.49    1.53    7.13    5.54
  0.010   5.898    8.22    8.60    7.82    4.36    5.44    2.87    1.07    0.83    1.38    1.45    6.45    5.02
  0.011   5.662    7.63    7.79    7.07    4.07    4.93    2.50    0.95    0.73    1.29    1.39    5.88    4.59
  0.012   5.448    7.11    7.14    6.43    3.83    4.48    2.20    0.86    0.65    1.21    1.33    5.39    4.24
  0.013   5.251    6.65    6.59    5.88    3.61    4.10    1.96    0.78    0.58    1.14    1.28    4.96    3.95
  0.014   5.069    6.25    6.12    5.40    3.42    3.76    1.77    0.72    0.53    1.08    1.23    4.59    3.69
  0.015   4.900    5.90    5.72    4.98    3.25    3.46    1.60    0.66    0.48    1.02    1.18    4.27    3.48
  0.016   4.743    5.58    5.37    4.61    3.10    3.20    1.47    0.62    0.45    0.97    1.14    3.98    3.28
  0.017   4.595    5.30    5.06    4.29    2.96    2.97    1.35    0.58    0.42    0.93    1.10    3.73    3.11
  0.018   4.457    5.05    4.78    4.00    2.83    2.76    1.25    0.54    0.39    0.89    1.06    3.50    2.96
  0.019   4.327    4.82    4.54    3.74    2.71    2.58    1.16    0.51    0.37    0.85    1.03    3.30    2.82
  0.020   4.204    4.61    4.31    3.51    2.60    2.41    1.08    0.49    0.35    0.82    1.00    3.12    2.69

RIDGE (ridge estimates; the row at _RIDGE_ = 0.000 equals PARMS):
_RIDGE_   _RMSE_  Intercept    day   day2   day3   day4   day5   day6   day7   day8   day9  day10  day11  day12  day13
  0.000  0.21287    3.83499  0.311  -0.13   0.13   0.04  -0.04  -0.00   0.00   0.00  -0.00  -0.00   0.00   0.00  -0.00
  0.001  0.22364    3.78767  0.416  -0.03  -0.00   0.00  -0.00   0.00   0.00  -0.00   0.00  -0.00   0.00   0.00  -0.00
  0.002  0.22495    3.78308  0.406  -0.03  -0.00   0.00  -0.00   0.00   0.00   0.00   0.00  -0.00   0.00  -0.00  -0.00
  0.003  0.22616    3.77957  0.399  -0.03  -0.00   0.00  -0.00   0.00  -0.00   0.00   0.00  -0.00   0.00  -0.00   0.00
  0.004  0.22738    3.77660  0.394  -0.03   0.00  -0.00  -0.00   0.00  -0.00   0.00   0.00   0.00   0.00  -0.00   0.00
  0.005  0.22863    3.77399  0.389  -0.03   0.00  -0.00  -0.00   0.00  -0.00   0.00   0.00   0.00   0.00  -0.00   0.00
  0.006  0.22995    3.77164  0.385  -0.03   0.00  -0.00  -0.00   0.00  -0.00   0.00  -0.00   0.00   0.00  -0.00   0.00
  0.007  0.23132    3.76952  0.381  -0.03   0.00  -0.00  -0.00   0.00  -0.00   0.00  -0.00   0.00   0.00  -0.00   0.00
  0.008  0.23275    3.76757  0.378  -0.03   0.00  -0.00  -0.00   0.00  -0.00   0.00  -0.00   0.00   0.00   0.00   0.00
  0.009  0.23423    3.76576  0.375  -0.03   0.00  -0.00  -0.00   0.00  -0.00   0.00  -0.00   0.00   0.00   0.00   0.00
  0.010  0.23575    3.76408  0.372  -0.02   0.00  -0.00  -0.00   0.00  -0.00   0.00  -0.00   0.00   0.00   0.00   0.00
  0.011  0.23730    3.76251  0.369  -0.02   0.00  -0.00  -0.00   0.00  -0.00   0.00  -0.00   0.00   0.00   0.00   0.00
  0.012  0.23888    3.76102  0.366  -0.02   0.00  -0.00  -0.00   0.00  -0.00   0.00  -0.00   0.00   0.00   0.00   0.00
  0.013  0.24049    3.75962  0.364  -0.02   0.00  -0.00  -0.00   0.00  -0.00   0.00  -0.00   0.00   0.00   0.00   0.00
  0.014  0.24212    3.75829  0.361  -0.02   0.00  -0.00  -0.00   0.00  -0.00   0.00  -0.00   0.00   0.00   0.00   0.00
  0.015  0.24376    3.75701  0.359  -0.02   0.00  -0.00  -0.00   0.00  -0.00   0.00  -0.00   0.00   0.00   0.00   0.00
  0.016  0.24541    3.75580  0.356  -0.02   0.00  -0.00  -0.00   0.00  -0.00   0.00  -0.00   0.00  -0.00   0.00   0.00
  0.017  0.24708    3.75463  0.354  -0.02   0.00  -0.00  -0.00   0.00  -0.00   0.00  -0.00   0.00  -0.00   0.00   0.00
  0.018  0.24875    3.75351  0.352  -0.02   0.00  -0.00  -0.00   0.00  -0.00   0.00  -0.00   0.00  -0.00   0.00   0.00
  0.019  0.25042    3.75242  0.350  -0.02   0.00  -0.00  -0.00  -0.00  -0.00   0.00  -0.00   0.00  -0.00   0.00   0.00
  0.020  0.25210    3.75138  0.348  -0.02   0.00  -0.00  -0.00  -0.00  -0.00   0.00  -0.00   0.00  -0.00   0.00   0.00

RIDGESTB (standardized ridge estimates; Intercept = 0.00000 and _RMSE_ as in the RIDGE row for the same k). At _RIDGE_ = 0.000: day 0.793, day2 -1.17, day3 10.41, day4 14.14, day5 -119.31, day6 -79.33, day7 568.56, day8 195.06, day9 -1270.56, day10 -212.86, day11 1307.64, day12 83.95, day13 -496.60.
_RIDGE_    day   day2   day3   day4   day5   day6   day7   day8   day9  day10  day11  day12  day13
  0.001  1.061  -0.28  -0.13   0.05  -0.06   0.03   0.06  -0.01   0.07  -0.01   0.02   0.00  -0.06
  0.002  1.036  -0.27  -0.06   0.02  -0.06   0.03   0.01   0.01   0.04  -0.01   0.02  -0.01  -0.01
  0.003  1.019  -0.26  -0.02   0.01  -0.07   0.03  -0.01   0.01   0.02  -0.00   0.02  -0.01   0.01
  0.004  1.005  -0.25   0.00  -0.00  -0.07   0.03  -0.02   0.02   0.01   0.00   0.02  -0.01   0.02
  0.005  0.994  -0.25   0.02  -0.01  -0.07   0.02  -0.03   0.02   0.00   0.00   0.02  -0.01   0.02
  0.006  0.983  -0.24   0.04  -0.02  -0.06   0.02  -0.03   0.02  -0.00   0.01   0.02  -0.00   0.02
  0.007  0.974  -0.24   0.05  -0.02  -0.06   0.02  -0.04   0.02  -0.01   0.01   0.01  -0.00   0.02
  0.008  0.965  -0.23   0.06  -0.02  -0.06   0.02  -0.04   0.02  -0.01   0.01   0.01   0.00   0.02
  0.009  0.957  -0.23   0.07  -0.03  -0.06   0.01  -0.04   0.02  -0.01   0.01   0.01   0.00   0.02
  0.010  0.949  -0.23   0.08  -0.03  -0.05   0.01  -0.04   0.02  -0.01   0.01   0.01   0.00   0.02
  0.011  0.942  -0.22   0.09  -0.03  -0.05   0.01  -0.04   0.01  -0.01   0.01   0.01   0.01   0.02
  0.012  0.935  -0.22   0.09  -0.04  -0.05   0.01  -0.04   0.01  -0.02   0.01   0.01   0.01   0.02
  0.013  0.928  -0.22   0.10  -0.04  -0.04   0.01  -0.04   0.01  -0.02   0.01   0.00   0.01   0.02
  0.014  0.922  -0.22   0.11  -0.04  -0.04   0.01  -0.04   0.01  -0.02   0.01   0.00   0.01   0.02
  0.015  0.916  -0.21   0.11  -0.04  -0.04   0.00  -0.04   0.01  -0.02   0.01   0.00   0.01   0.01
  0.016  0.910  -0.21   0.12  -0.04  -0.03   0.00  -0.04   0.01  -0.02   0.01  -0.00   0.01   0.01
  0.017  0.904  -0.21   0.12  -0.05  -0.03   0.00  -0.04   0.01  -0.02   0.01  -0.00   0.01   0.01
  0.018  0.899  -0.21   0.13  -0.05  -0.03   0.00  -0.04   0.01  -0.02   0.01  -0.00   0.01   0.01
  0.019  0.894  -0.21   0.13  -0.05  -0.02  -0.00  -0.04   0.01  -0.02   0.01  -0.01   0.01   0.01
  0.020  0.889  -0.20   0.14  -0.05  -0.02  -0.00  -0.04   0.01  -0.02   0.01  -0.01   0.02   0.01

RIDGESEB (standard errors of the ridge estimates; _RMSE_ as in the RIDGE row for the same k). At _RIDGE_ = 0.000 this row equals SEB. For every k ≥ 0.001, day3 through day13 are 0.00.
_RIDGE_  Intercept    day   day2
  0.001    0.08392  0.041   0.02
  0.002    0.08179  0.035   0.01
  0.003    0.08031  0.032   0.01
  0.004    0.07919  0.031   0.01
  0.005    0.07833  0.030   0.01
  0.006    0.07769  0.029   0.01
  0.007    0.07721  0.028   0.01
  0.008    0.07687  0.028   0.01
  0.009    0.07664  0.027   0.01
  0.010    0.07650  0.027   0.01
  0.011    0.07643  0.026   0.01
  0.012    0.07643  0.026   0.01
  0.013    0.07648  0.026   0.01
  0.014    0.07657  0.026   0.01
  0.015    0.07670  0.025   0.01
  0.016    0.07686  0.025   0.01
  0.017    0.07704  0.025   0.01
  0.018    0.07725  0.025   0.01
  0.019    0.07747  0.024   0.01
  0.020    0.07771  0.024   0.01
SAS Output
Algae Example: Polynomial Regression
13-degree polynomial around day 7.5
Ridge Regression
Hoerl, Kennard, Baldwin Recommendation for k

Obs  _MODEL_  _TYPE_    _DEPVAR_  _RIDGE_  _PCOMIT_   _RMSE_  Intercept
  1  MODEL1   RIDGESTB  density         0         .  0.21287          0

    day      day2     day3     day4      day5      day6     day7     day8      day9     day10    day11    day12     day13
0.79318  -1.16696  10.4058  14.1412  -119.310  -79.3262  568.565  195.058  -1270.56  -212.863  1307.64  83.9458  -496.605

density  _IN_  _P_  _EDF_    _RSQ_          k
     -1    13   14     14  0.99091  2.1078E-9