ST430: Introduction to Regression Analysis
Chapter 6, Sections 6.1-6.4
Luo Xiao
October 21, 2015

Variable Screening Methods
Variable screening
You will often have many candidate variables to use as independent
variables in a regression model.
Using all of them may be infeasible (more parameters than observations).
Even if feasible, a prediction equation with many parameters may not
perform well:
in validation;
in application.
Stepwise regression
How to choose the subset to use?
One approach: stepwise regression.
Example
Executive salary, with 10 candidate variables:
setwd("/Users/xiaoyuesixi/Dropbox/teaching/2015Fall/R_datasets")
load("EXEXSAL2.Rdata")   # load the data
dim(EXEXSAL2)            # dimensions of the data
EXEXSAL2[1:5, ]          # look at the first 5 rows
pairs(EXEXSAL2[, -1])    # pairwise scatter plots
Variables
X1 Experience (years)
X2 Education (years)
X3 Gender (1 if male, 0 if female)
X4 Number of employees supervised
X5 Corporate assets ($ millions)
X6 Board member (1 if yes, 0 if no)
X7 Age (years)
X8 Company profits (past 12 months, $ millions)
X9 Has international responsibility (1 if yes, 0 if no)
X10 Company’s total sales (past 12 months, $ millions)
Note that X3, X6, and X9 are indicator variables.
The complete second-order model is quadratic in the 7 quantitative
variables, with interactions with all combinations of the indicator variables.
A quadratic function of 7 variables has 36 coefficients: 1 intercept, 7 linear
terms, 7 squared terms, and 21 pairwise interactions.
The 3 indicators give 2³ = 8 combinations, so the complete second-order
model has 36 × 8 = 288 parameters.
Infeasible: the data set has only 100 observations.
Forward stepwise selection
First, consider all the one-variable models
E(Y) = β0 + βj Xj, j = 1, 2, . . . , k.
For each, test the hypothesis H0: βj = 0 at some level α.
If none is significant, the model is E(Y) = β0.
Otherwise, choose the best (in terms of R², Ra², |t|, |r|; it doesn't matter
which); call the variable Xj1.
7 / 23
Variable Screening Methods
ST430 Introduction to Regression Analysis
Now consider all two-variable models that include Xj1:
E(Y) = β0 + βj1 Xj1 + βj Xj, j ≠ j1.
For each, test the significance of the new coefficient βj.
If none is significant, the model is E(Y) = β0 + βj1 Xj1.
Otherwise, choose the best new variable; call it Xj2.
Continue adding variables until no remaining variable is significant at level
α, as in the sketch below.
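As an illustration (not from the slides), here is a minimal sketch of
test-based forward selection in R using add1(), assuming EXEXSAL2 is
loaded and α = 0.05:

full <- Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9 + X10
fit <- lm(Y ~ 1, data = EXEXSAL2)  # start from the intercept-only model
alpha <- 0.05
repeat {
  cand <- add1(fit, scope = full, test = "F")    # F-test for each candidate
  pvals <- cand[-1, "Pr(>F)"]                    # drop the "<none>" row
  if (all(is.na(pvals)) || min(pvals, na.rm = TRUE) > alpha) break
  best <- rownames(cand)[-1][which.min(pvals)]   # most significant variable
  fit <- update(fit, as.formula(paste(". ~ . +", best)))
}
summary(fit)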
Backward stepwise elimination
Alternatively, begin with the model containing all the variables, the full
first-order model (assuming you can fit it).
Test the significance of each coefficient at some level α.
If all are significant, use that model.
Otherwise, eliminate the least significant variable (smallest |t|, smallest
reduction in R², . . . ; again, it doesn't matter which).
Continue eliminating variables until all remaining variables are significant at
level α.
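Similarly (again an illustration, not from the slides), a minimal sketch of
test-based backward elimination using drop1():

fit <- lm(Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9 + X10,
          data = EXEXSAL2)  # the full first-order model
alpha <- 0.05
repeat {
  tests <- drop1(fit, test = "F")                 # F-test for each term
  pvals <- tests[-1, "Pr(>F)"]                    # drop the "<none>" row
  if (length(pvals) == 0 || max(pvals, na.rm = TRUE) <= alpha) break
  worst <- rownames(tests)[-1][which.max(pvals)]  # least significant term
  fit <- update(fit, as.formula(paste(". ~ . -", worst)))
}
summary(fit)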
Either forward selection or backward elimination could be used to select a
subset of variables for further study.
Problem: forward selection and backward elimination may identify different
subsets.
Bidirectional stepwise regression
A combination of forward selection and backward elimination.
Choose a starting model; it could be:
no independent variables;
all independent variables;
some other subset of independent variables suggested a priori.
Look for a variable to add to the model, by adding each candidate, one at a
time, and testing the significance of the coefficient.
Then look for a variable to eliminate, by testing all coefficients.
You could use a different α-to-enter and α-to-remove, with the α-to-enter
smaller than the α-to-remove.
Repeat both steps until no variable can be added or eliminated.
The final model is one at which both forward selection and backward
elimination would terminate.
But it is still possible that you get different final models depending on the
choice of initial model.
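In R, step() implements this bidirectional search with direction = "both"
(its default), using AIC rather than per-test α levels; a minimal sketch:

start <- lm(Y ~ 1, data = EXEXSAL2)  # initial model: no independent variables
full <- Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9 + X10
both <- step(start, scope = full, direction = "both")  # alternates add/drop steps
summary(both)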
Criterion-based stepwise regression
In hypothesis-test-based subset selection, many tests are used.
Each test, in isolation, has a specified error rate α.
The per-test error rate α controls the choice of final subset, but in an
indirect way.
Modern methods are instead based on improving a criterion such as:
Adjusted coefficient of determination, Ra²;
MSE, s² (equivalent to Ra²);
Mallows's Cp criterion;
PRESS criterion;
Akaike's information criterion, AIC.
PRESS and AIC are equivalent when n is large.
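As an aside (not from the slides), PRESS can be computed from a single
linear-model fit via the leverages, since the leave-one-out residual is
e_i / (1 − h_ii); a minimal sketch:

press <- function(fit) {
  r <- residuals(fit)    # ordinary residuals e_i
  h <- hatvalues(fit)    # leverages h_ii
  sum((r / (1 - h))^2)   # prediction error sum of squares
}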
R code for stepwise regression with AIC
start <- lm(Y ~ 1, data = EXEXSAL2)  # initial model
full <- Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9 + X10

# forward stepwise selection; see "output1.txt"
forward <- step(start, scope = full, direction = "forward")
summary(forward)

# backward stepwise selection; see "output2.txt"
backward <- step(lm(full, data = EXEXSAL2), direction = "backward")
summary(backward)
Note:
AIC = n log(σ̂²) + 2(k + 1)
This works well when choosing from nested models.
But in the example, the 5-variable model is the best of
(10 choose 5) = 252 possible models.
Some statisticians prefer the Bayesian information criterion,
BIC = n log(σ̂²) + (log n)(k + 1)
BIC imposes a higher penalty on the number of parameters in the model.
In R:
Use ’step(start, scope = full, k = log(nrow(EXEXSAL2)))’; the default for
’k’ is 2, corresponding to AIC.
The final model is the same in this case, and the BIC-selected model will
never be larger than the AIC-selected model, but may be smaller.
R function for extracting AIC/BIC from a fit:
’extractAIC(fit, k = 2)’ for AIC; ’extractAIC(fit, k = log(n))’ for BIC.
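For example (an illustration, assuming the ’forward’ fit from the earlier
slide):

n <- nrow(EXEXSAL2)
extractAIC(forward, k = 2)       # returns (edf, AIC)
extractAIC(forward, k = log(n))  # returns (edf, BIC)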
Best subset regression
When used with a criterion, stepwise regression terminates with a subset of
variables that cannot be improved by adding or dropping a single variable.
That is, it is locally optimal.
But some other subset may have a better value of the criterion.
In R, the bestglm package implements best subset regression for various
criteria.
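A minimal sketch with bestglm (an illustration; the slides name the
package but not this call), assuming EXEXSAL2 holds the response Y and
predictors X1-X10; bestglm() expects the response in the last column:

library(bestglm)
Xy <- cbind(EXEXSAL2[, paste0("X", 1:10)], y = EXEXSAL2$Y)
bestBIC <- bestglm(Xy, IC = "BIC")  # exhaustive search over all 2^10 subsets
bestBIC$BestModel                   # the lm fit for the best subset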
Concerning best subset methods, the text asserts that
these techniques lack the objectivity of a stepwise regression
procedure.
I disagree.
Finding the subset of variables that optimizes some criterion is
completely objective.
In fact, because of the opaque way that choosing α controls the
procedure, I argue that stepwise regression lacks the transparency of
best subset regression.
Why not use stepwise methods to build a complete model?
We need to try second-order terms like products of independent variables
(interactions) and squared terms (curvatures).
Some software tools do not know that an interaction should be included
only if both main effects are also included, but ’step()’ does.
Try the full second-order model:

all <- Y ~ ((X1 + X2 + X4 + X5 + X7 + X8 + X10)^2
            + I(X1^2) + I(X2^2) + I(X4^2) + I(X5^2)
            + I(X7^2) + I(X8^2) + I(X10^2)) * X3 * X6 * X9
summary(step(start, scope = all, k = log(nrow(EXEXSAL2))))
Oops!
The model includes I(X4^2) but not X4.
The R function ’step()’ is smart enough not to include an interaction
without all its main effects, but does not know that I(X4^2) should not be
included without X4.
We can fix it manually, by forcing X4 into the model:

start <- lm(Y ~ X4, data = EXEXSAL2)
summary(step(start, scope = list(lower = Y ~ X4, upper = all),
             k = log(nrow(EXEXSAL2))))
Footnote
The model that we arrived at using BIC-based stepwise regression is the
same as was used in Example 4.10, where it was proposed with no
discussion.
Testing for gender bias with a partial F-test of the gender terms X3 and
X4:X3:
lmFull <- lm(Y ~ X4 + X1 + X3 + X2 + X5 + I(X1^2) + X4:X3,
data = EXEXSAL2)
lmReduced <- lm(Y ~ X4 + X1 + X2 + X5 + I(X1^2),
data = EXEXSAL2)
anova(lmReduced, lmFull)
Alternative method: penalized regression with the LASSO.
LASSO minimizes
SSE + λ (|β1| + |β2| + · · · + |βk|),
where λ is a tuning parameter.
If λ = 0: usual least squares estimation.
If λ = ∞: all parameters are 0.
A finite λ might be ideal: forcing some parameters to be zero.
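A minimal sketch with the glmnet package (an assumption: the slides do
not name a package), again assuming predictors X1-X10 and response Y:

library(glmnet)
x <- as.matrix(EXEXSAL2[, paste0("X", 1:10)])  # glmnet needs a numeric matrix
y <- EXEXSAL2$Y
cvfit <- cv.glmnet(x, y, alpha = 1)  # alpha = 1 is the LASSO penalty; λ by CV
coef(cvfit, s = "lambda.min")        # zero coefficients mark screened-out variables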