Homework #2 (Multiple and stepwise regression)

HZAU MULTIVARIATE HOMEWORK #2
MULTIPLE AND STEPWISE LINEAR REGRESSION
Using the malt quality dataset on the class’s Web page:
1. Determine the simple linear correlation of extract with the remaining variables.
a. Which variables have a correlation with extract that is significantly different from zero?
b. Would you consider the correlation of the variables identified in part a as having a strong
or weak association with extract? Explain your answer.
2. Determine the partial correlation between extract and viscosity, while controlling for beta-glucan
content.
a. Report the partial correlation value.
b. When controlling for beta-glucan content, would you consider the relationship between
malt extract and viscosity to be spurious? Explain your answer.
3. Develop a regression model to predict the percent malt extract using the remaining variables as
the independent variables.
a. What is the regression equation?
b. What independent variables have regression coefficients significantly different from
zero?
c. What percent of the variation in extract is explained by your regression model?
d. Would you consider the regression model to adequately explain the variation in extract?
Explain your answer.
4. Using stepwise regression, develop a regression model that includes those independent variables
that significantly contribute to explaining the variation in extract.
a. What is the regression equation?
b. What percent of the variation in extract is explained by your regression model?
c. Would you consider the regression model to adequately explain the variation in extract?
Explain your answer.
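options pageno=1;
data hmwk2;
  /* Variable order reconstructed from the extracted listing; it must match the
     column order of the malt quality data file from the class Web page. */
  input Line $ Extract Plump Protein amylase DP kolbach Solprot Color FAN
        Betagluc Viscosity Fructose Glucose Maltose Maltotriose;
  datalines;
.
.   Insert malt quality data from class Web page.
.
;;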
proc corr;
  var extract;
  with Plump Protein amylase DP kolbach Solprot Color FAN
       Betagluc Viscosity Fructose Glucose Maltose Maltotriose;
  title 'Correlation of Extract with Independent Variables Related to Quality';
run;
proc corr;
  var extract viscosity;
  partial betagluc;
  title 'Partial Correlation of Extract and Viscosity While Controlling Beta-glucan content';
run;
proc reg;
  model extract = Plump Protein amylase DP kolbach Solprot Color FAN
                  Betagluc Viscosity Fructose Glucose Maltose Maltotriose;
  title 'Multiple Regression of Extract with the Remaining Quality Traits';
run;
proc stepwise;
  model extract = Plump Protein amylase DP kolbach Solprot Color FAN
                  Betagluc Viscosity Fructose Glucose Maltose Maltotriose;
  title 'Stepwise Regression of Extract with the Remaining Quality Traits';
run;
12:28 Thursday, June 21, 2012 1
Correlation of Extract with Independent Variables Related to Quality
The CORR Procedure
14  With Variables:  Plump     Protein   amylase   DP        kolbach   Solprot   Color
                     FAN       Betagluc  Viscosity Fructose  Glucose   Maltose   Maltotriose
 1       Variables:  Extract
                                  Simple Statistics

Variable        N         Mean      Std Dev          Sum      Minimum      Maximum

Plump          61     93.01475      1.97753         5674     89.10000     96.55000
Protein        61     12.53934      0.36721    764.90000     11.75000     13.45000
amylase        61     67.36557      6.41523         4109     55.95000     81.45000
DP             61    143.04590     12.44975         8726    118.30000    176.20000
kolbach        61     50.69426      3.69884         3092     40.60000     57.80000
Solprot        61      6.35467      0.45995    387.63500      4.83500      7.14000
Color          61      2.57869      0.52974    157.30000      1.80000      3.75000
FAN            61    320.06803     31.37553        19524    258.05000    406.80000
Betagluc       61    175.49262     36.41330        10705    124.40000    261.95000
Viscosity      61      1.46426      0.02101     89.32000      1.41500      1.51500
Fructose       61      0.08822      0.07869      5.38150            0      0.31700
Glucose        61      1.04295      0.09717     63.62000      0.89000      1.28500
Maltose        61      3.99230      0.16880    243.53000      3.50500      4.27000
Maltotriose    61      0.98893      0.07334     60.32500      0.68000      1.25500
Extract        61     78.31721      0.70651         4777     76.80000     79.70000
        Pearson Correlation Coefficients, N = 61
               Prob > |r| under H0: Rho=0

Variable          Extract     Prob > |r|

Plump            -0.10224         0.4330
Protein          -0.26267         0.0408
amylase           0.04370         0.7381
DP               -0.26047         0.0426
kolbach           0.16305         0.2093
Solprot           0.05783         0.6580
Color             0.18480         0.1539
FAN              -0.07966         0.5417
Betagluc         -0.45925         0.0002
Viscosity        -0.31965         0.0120
Fructose          0.12030         0.3557
Glucose           0.19140         0.1395
Maltose          -0.05729         0.6610
Maltotriose       0.18916         0.1443
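As a cross-check for question 1, each p-value in the table can be reproduced from r alone. The test of H0: rho = 0 uses t = r*sqrt(n - 2)/sqrt(1 - r^2) with n - 2 = 59 degrees of freedom. The sketch below is plain Python rather than SAS. The helper names `t_pdf`, `two_sided_p` and `corr_test` are hypothetical, and the p-value comes from integrating the t density numerically with Simpson's rule. That method is an illustrative stand-in for a statistics library, not what SAS does internally.

```python
import math

# Pearson correlations of Extract with each trait (N = 61), copied from the table above.
R = {
    "Plump": -0.10224, "Protein": -0.26267, "amylase": 0.04370, "DP": -0.26047,
    "kolbach": 0.16305, "Solprot": 0.05783, "Color": 0.18480, "FAN": -0.07966,
    "Betagluc": -0.45925, "Viscosity": -0.31965, "Fructose": 0.12030,
    "Glucose": 0.19140, "Maltose": -0.05729, "Maltotriose": 0.18916,
}
N = 61


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """P(|T| > |t|), integrating the density over [0, |t|] with Simpson's rule (even steps)."""
    b = abs(t)
    h = b / steps
    s = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * (s * h / 3.0)


def corr_test(r, n):
    """t statistic and two-sided p-value for H0: rho = 0 (n - 2 degrees of freedom)."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1.0 - r * r)
    return t, two_sided_p(t, df)


results = {name: corr_test(r, N) for name, r in R.items()}
significant = sorted(name for name, (_, p) in results.items() if p < 0.05)

for name, (t, p) in results.items():
    print(f"{name:12s} r = {R[name]:8.5f}  t = {t:7.3f}  p = {p:.4f}")
print("Significant at the 0.05 level:", significant)
```

The traits flagged this way should be the same ones whose Prob > |r| falls below 0.05 in the SAS table. Small differences in the last digit can come from the correlations being printed to only five decimals.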
Partial Correlation of Extract and Viscosity While Controlling Beta-glucan content
The CORR Procedure
1  Partial Variables:  Betagluc
2          Variables:  Extract   Viscosity
                                      Simple Statistics

                                                                                Partial     Partial
Variable     N        Mean     Std Dev        Sum     Minimum     Maximum      Variance     Std Dev

Betagluc    61   175.49262    36.41330      10705   124.40000   261.95000
Extract     61    78.31721     0.70651       4777    76.80000    79.70000       0.40055     0.63289
Viscosity   61     1.46426     0.02101   89.32000     1.41500     1.51500     0.0004387     0.02095
     Pearson Partial Correlation Coefficients, N = 61
           Prob > |r| under H0: Partial Rho=0

                  Extract     Viscosity

Extract           1.00000      -0.28477
                                 0.0274

Viscosity        -0.28477       1.00000
                   0.0274
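The partial correlation can be checked against sums of squares printed later in the stepwise output. The squared partial correlation of Extract and Viscosity given Betagluc equals the fraction of the Betagluc-only model's error SS (step 1) that Viscosity removes, which is its Type II SS in step 2. The sketch below is plain Python and takes only numbers from this listing. It also shows that t squared for the partial correlation equals the step 2 F value for Viscosity, and it recovers the partial variance of Extract reported above.

```python
import math

N = 61
N_PARTIAL = 1  # one variable (Betagluc) is partialled out

# From the stepwise output: step 1 (Betagluc only) and step 2 (Betagluc + Viscosity).
SSE_BETAGLUC_ONLY = 23.63265     # error SS of Extract regressed on Betagluc
TYPE_II_SS_VISCOSITY = 1.91648   # SS added by Viscosity once Betagluc is in the model
VISCOSITY_ESTIMATE = -8.60486    # the partial correlation takes the sign of this slope

# Squared partial correlation: share of the Betagluc-adjusted variation in Extract
# that Viscosity accounts for.
r2_partial = TYPE_II_SS_VISCOSITY / SSE_BETAGLUC_ONLY
r_partial = math.copysign(math.sqrt(r2_partial), VISCOSITY_ESTIMATE)

# t test of H0: partial rho = 0 on n - 2 - (number of partial variables) df.
df = N - 2 - N_PARTIAL
t = r_partial * math.sqrt(df) / math.sqrt(1.0 - r2_partial)

# Partial variance of Extract: residual SS over n - 1 - (number of partial variables).
partial_var_extract = SSE_BETAGLUC_ONLY / (N - 1 - N_PARTIAL)

print(f"partial r = {r_partial:.5f}, t = {t:.3f} on {df} df, t^2 = {t * t:.2f}")
print(f"partial variance of Extract = {partial_var_extract:.5f}")
```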
Number of Observations Read    61
Number of Observations Used    61

                           Analysis of Variance

                                 Sum of         Mean
Source              DF          Squares       Square    F Value    Pr > F

Model               14         12.79922      0.91423       2.45    0.0113
Error               46         17.15020      0.37283
Corrected Total     60         29.94943

Root MSE             0.61060    R-Square     0.4274
Dependent Mean      78.31721    Adj R-Sq     0.2531
Coeff Var            0.77965
                           Parameter Estimates

                     Parameter     Standard
Variable      DF      Estimate        Error    t Value    Pr > |t|

Intercept      1     104.48451     28.92954       3.61      0.0007
Plump          1       0.02014      0.05121       0.39      0.6959
Protein        1      -0.75220      2.00083      -0.38      0.7087
amylase        1       0.00776      0.01664       0.47      0.6432
DP             1      -0.01021      0.01091      -0.94      0.3544
kolbach        1      -0.06250      0.48064      -0.13      0.8971
Solprot        1       0.52908      3.91020       0.14      0.8930
Color          1       0.13624      0.31566       0.43      0.6680
FAN            1      -0.00303      0.00299      -1.01      0.3175
Betagluc       1      -0.00673      0.00276      -2.44      0.0187
Viscosity      1     -12.02975      4.45003      -2.70      0.0096
Fructose       1       1.44893      1.32733       1.09      0.2807
Glucose        1      -0.44204      1.17175      -0.38      0.7077
Maltose        1       0.36908      0.72849       0.51      0.6148
Maltotriose    1       0.41147      1.45583       0.28      0.7787
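For question 3, the fit statistics printed under the ANOVA table all follow from its sums of squares:

- R-Square = SS_model / SS_total
- Adj R-Sq = 1 - MSE / (SS_total / (n - 1))
- F = MS_model / MSE
- Root MSE = sqrt(MSE)
- Coeff Var = 100 * Root MSE / mean

The plain-Python sketch below recomputes each one from the numbers in this listing. The gap between R-Square and Adj R-Sq is the adjustment's penalty for the 14 fitted slopes on 61 observations.

```python
import math

N = 61        # observations used
K = 14        # independent variables (Model DF)
SS_MODEL = 12.79922
SS_ERROR = 17.15020
SS_TOTAL = 29.94943
Y_MEAN = 78.31721   # Dependent Mean

df_error = N - K - 1
ms_model = SS_MODEL / K
ms_error = SS_ERROR / df_error

f_value = ms_model / ms_error
r_square = SS_MODEL / SS_TOTAL
# Adjusted R-square compares the error mean square with the total variance of Extract.
adj_r_square = 1.0 - ms_error / (SS_TOTAL / (N - 1))
root_mse = math.sqrt(ms_error)
coeff_var = 100.0 * root_mse / Y_MEAN
# SS_TOTAL / (N - 1) is the sample variance of Extract, so its root is the Std Dev.
sd_extract = math.sqrt(SS_TOTAL / (N - 1))

print(f"F = {f_value:.2f} on ({K}, {df_error}) df")
print(f"R-Square = {r_square:.4f}, Adj R-Sq = {adj_r_square:.4f}")
print(f"Root MSE = {root_mse:.5f}, Coeff Var = {coeff_var:.5f}, Std Dev = {sd_extract:.5f}")
```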
Stepwise Regression of Extract with the Remaining Quality Traits
The STEPWISE Procedure
Model: MODEL1
Dependent Variable: Extract
Number of Observations Read    61
Number of Observations Used    61
Stepwise Selection: Step 1
Variable Betagluc Entered: R-Square = 0.2109 and C(p) = 6.3871
                        Analysis of Variance

                               Sum of         Mean
Source              DF        Squares       Square    F Value    Pr > F

Model                1        6.31677      6.31677      15.77    0.0002
Error               59       23.63265      0.40055
Corrected Total     60       29.94943

               Parameter     Standard
Variable        Estimate        Error    Type II SS    F Value    Pr > F

Intercept       79.88098      0.40203         15814    39479.1    <.0001
Betagluc        -0.00891      0.00224       6.31677      15.77    0.0002

Bounds on condition number: 1, 1
Stepwise Selection: Step 2
Variable Viscosity Entered: R-Square = 0.2749 and C(p) = 3.2468
                        Analysis of Variance

                               Sum of         Mean
Source              DF        Squares       Square    F Value    Pr > F

Model                2        8.23325      4.11663      10.99    <.0001
Error               58       21.71617      0.37442
Corrected Total     60       29.94943

               Parameter     Standard
Variable        Estimate        Error    Type II SS    F Value    Pr > F

Intercept       92.34866      5.52445     104.62588     279.44    <.0001
Betagluc        -0.00816      0.00219       5.17312      13.82    0.0005
Viscosity       -8.60486      3.80337       1.91648       5.12    0.0274

Bounds on condition number: 1.0235, 4.0941
Stepwise Selection: Step 3
Variable DP Entered: R-Square = 0.3424 and C(p) = -0.1711
                        Analysis of Variance

                               Sum of         Mean
Source              DF        Squares       Square    F Value    Pr > F

Model                3       10.25321      3.41774       9.89    <.0001
Error               57       19.69621      0.34555
Corrected Total     60       29.94943

               Parameter     Standard
Variable        Estimate        Error    Type II SS    F Value    Pr > F

Intercept       97.60920      5.73588     100.06673     289.59    <.0001
DP              -0.01539      0.00636       2.01996       5.85    0.0188
Betagluc        -0.00714      0.00215       3.80458      11.01    0.0016
Viscosity      -10.81702      3.76662       2.84984       8.25    0.0057

Bounds on condition number: 1.0898, 9.7265
Stepwise Selection: Step 4
Variable Protein Entered: R-Square = 0.3786 and C(p) = -1.0835
                        Analysis of Variance

                               Sum of         Mean
Source              DF        Squares       Square    F Value    Pr > F

Model                4       11.33903      2.83476       8.53    <.0001
Error               56       18.61039      0.33233
Corrected Total     60       29.94943

               Parameter     Standard
Variable        Estimate        Error    Type II SS    F Value    Pr > F

Intercept      104.18019      6.69752      80.40991     241.96    <.0001
Protein         -0.39460      0.21830       1.08582       3.27    0.0760
DP              -0.01341      0.00634       1.48878       4.48    0.0388
Betagluc        -0.00621      0.00217       2.71760       8.18    0.0059
Viscosity      -12.22975      3.77565       3.48674      10.49    0.0020

Bounds on condition number: 1.1602, 18.191
All variables left in the model are significant at the 0.1500 level.
No other variable met the 0.1500 significance level for entry into the model.
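The step 4 parameter estimates define the fitted equation:

Extract = 104.18019 - 0.39460 Protein - 0.01341 DP - 0.00621 Betagluc - 12.22975 Viscosity

A least-squares fit with an intercept always passes through the point of means. Plugging in the sample means from the first Simple Statistics table should therefore return roughly the mean extract of 78.31721. Any small gap comes from the estimates being printed to five decimals. The function name `predict_extract` is hypothetical, and the sketch is plain Python, not SAS.

```python
# Final (step 4) stepwise estimates, copied from the output above.
INTERCEPT = 104.18019
COEF = {"Protein": -0.39460, "DP": -0.01341, "Betagluc": -0.00621, "Viscosity": -12.22975}

# Sample means from the Simple Statistics table of the first PROC CORR run.
MEANS = {"Protein": 12.53934, "DP": 143.04590, "Betagluc": 175.49262, "Viscosity": 1.46426}
EXTRACT_MEAN = 78.31721


def predict_extract(x):
    """Predicted percent malt extract from the four traits retained by stepwise selection."""
    return INTERCEPT + sum(COEF[name] * x[name] for name in COEF)


at_means = predict_extract(MEANS)
print(f"prediction at the means = {at_means:.4f} (observed mean {EXTRACT_MEAN})")
```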
                              Summary of Stepwise Selection

       Variable     Variable     Number     Partial      Model
Step   Entered      Removed     Vars In    R-Square   R-Square       C(p)    F Value    Pr > F

  1    Betagluc                       1      0.2109     0.2109     6.3871      15.77    0.0002
  2    Viscosity                      2      0.0640     0.2749     3.2468       5.12    0.0274
  3    DP                             3      0.0674     0.3424    -0.1711       5.85    0.0188
  4    Protein                        4      0.0363     0.3786    -1.0835       3.27    0.0760
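Every column of the summary table can be rebuilt from the error SS at each step:

- Partial R-Square is the gain in model R-Square when the variable enters.
- The F value is that gain in SS divided by the new model's MSE.
- C(p) is Mallows' statistic, SSE_p / MSE_full - (n - 2p). Here p counts parameters including the intercept, and MSE_full is the error mean square of the 14-variable PROC REG model.

The plain-Python sketch below rebuilds these columns using only numbers from this listing. Last-digit differences come from rounding in the printed SS.

```python
N = 61
SS_TOTAL = 29.94943
# Error mean square of the full 14-variable model (PROC REG run): SSE / error DF.
MSE_FULL = 17.15020 / 46

# (variable entered, error SS of the model after that step), from the stepwise output.
STEPS = [
    ("Betagluc", 23.63265),
    ("Viscosity", 21.71617),
    ("DP", 19.69621),
    ("Protein", 18.61039),
]

results = {}
prev_sse = SS_TOTAL  # intercept-only model: error SS equals the corrected total SS
for k, (name, sse) in enumerate(STEPS, start=1):
    p = k + 1                                     # parameters, including the intercept
    model_r2 = 1.0 - sse / SS_TOTAL
    partial_r2 = (prev_sse - sse) / SS_TOTAL      # gain in model R-square at this step
    f_enter = (prev_sse - sse) / (sse / (N - p))  # extra SS over the new model's MSE
    cp = sse / MSE_FULL - (N - 2 * p)             # Mallows' C(p)
    results[name] = (partial_r2, model_r2, cp, f_enter)
    prev_sse = sse

for name, (pr2, r2, cp, f) in results.items():
    print(f"{name:10s} partial R2 = {pr2:.4f}  R2 = {r2:.4f}  C(p) = {cp:8.4f}  F = {f:.2f}")
```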