Lecture 25

Statistics 350
Today
• Last day: started Chapter 9 (9.1-9.3); please read 9.1 and 9.2 thoroughly
• Today: more of Chapter 9, stepwise regression
Stepwise Variable Selection
• Three categories of stepwise variable selection:
1. Forward Selection
2. Backward Elimination
3. Stepwise Selection
Forward Selection
• For all methods considered today, there are P-1 possible predictors and 2^(P-1) possible models
• Start with no variables in the model
• Select a significance level, α_ENTER, at which variables can be included in the model
• Find the critical value of F for this level: F_ENTER
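The 2^(P-1) count is why fitting every possible model quickly becomes impractical and a stepwise search is attractive. A minimal standard-library sketch that enumerates every subset of a hypothetical list of predictor names (including the empty, intercept-only model) and confirms the count:

```python
from itertools import combinations


def all_subsets(predictors):
    """Return every subset of `predictors`, from the empty model up to the full model."""
    subsets = []
    for size in range(len(predictors) + 1):
        subsets.extend(combinations(predictors, size))
    return subsets


# Hypothetical example with P - 1 = 4 candidate predictors.
predictors = ["X1", "X2", "X3", "X4"]
models = all_subsets(predictors)
print(len(models))  # 16, i.e. 2 ** 4

# With P - 1 = 20 predictors there are already 2 ** 20 = 1,048,576 possible models.
print(2 ** 20)
```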
Forward Selection
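• Consider every possible 1-variable model, computing F*_k = MSR(X_k)/MSE(X_k) for each candidate X_k; if the largest F*_k exceeds F_ENTER, enter that variable into the model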
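For one candidate X_k, the first-step statistic is the ordinary simple-regression F test: MSR on 1 df divided by MSE on n - 2 df, with SSR = Sxy^2 / Sxx. A minimal pure-Python sketch; the function name `one_variable_f` and the data are hypothetical:

```python
def one_variable_f(x, y):
    """F* = MSR / MSE for the simple linear regression of y on x (with intercept)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sxy ** 2 / sxx   # equals b1^2 * Sxx, where b1 = Sxy / Sxx
    sse = sst - ssr
    msr = ssr / 1          # one predictor -> 1 regression df
    mse = sse / (n - 2)    # intercept + slope -> n - 2 error df
    return msr / mse


# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(one_variable_f(x, y))  # about 4.5 (up to floating-point rounding)
```

In forward selection this value would be computed for every candidate, and only the largest one compared with F_ENTER.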
Forward Selection
• Each time a variable is entered into the model (i.e. the maximum F* is big enough), use the newly augmented model as the base model
• Check the extra SS for each remaining variable
• For example, if X_a is entered, then at the next step check SSR(X_k | X_a) for every variable X_k other than X_a (which is already in the model), using F*_k = MSR(X_k | X_a)/MSE(X_a, X_k)
• Keep adding variables and revising the base model until, at some step, even the largest F* < F_ENTER; then no more variables can be added
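The extra sum of squares for adding X_k to a base model containing X_a is SSR(X_k | X_a) = SSE(X_a) - SSE(X_a, X_k), on 1 df, and the forward-step F* divides it by the MSE of the augmented model. A small sketch working directly from the two SSE values; the function name and the numbers are hypothetical:

```python
def partial_f(sse_base, sse_augmented, n, p_augmented):
    """Partial F* for adding one candidate variable to a base model.

    sse_base:      SSE of the current base model, e.g. SSE(X_a)
    sse_augmented: SSE after adding the candidate, e.g. SSE(X_a, X_k)
    n:             number of observations
    p_augmented:   number of parameters (including the intercept) in the augmented model
    """
    extra_ss = sse_base - sse_augmented                # SSR(X_k | X_a), on 1 df
    mse_augmented = sse_augmented / (n - p_augmented)  # MSE(X_a, X_k)
    return extra_ss / mse_augmented


# Hypothetical values: n = 13, augmented model = intercept + X_a + X_k (3 parameters).
f_star = partial_f(sse_base=50.0, sse_augmented=20.0, n=13, p_augmented=3)
print(f_star)  # 15.0
```

X_k would be entered only if its F* is the largest among the remaining candidates and exceeds F_ENTER.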
Forward Selection
• The final model is the last base model
• The procedure gives a single model, which it declares best
• Also, once a variable is added, it can never be removed, even if subsequent additions render it unimportant (e.g. through multicollinearity)
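The forward selection steps can be put together in a minimal, dependency-free Python sketch. Assumptions of this sketch (not from the lecture): each predictor is a list of numbers keyed by a name, every model includes an intercept, OLS is solved through the normal equations (fine for small illustrations, not numerically robust), and F_ENTER is passed in as a fixed number rather than recomputed from α_ENTER at each step. All names and the demo data are hypothetical.

```python
def fit_sse(cols, y):
    """SSE of the OLS fit of y on an intercept plus the predictor columns in `cols`."""
    n, p = len(y), len(cols) + 1
    X = [[1.0] + [float(c[i]) for c in cols] for i in range(n)]
    # Augmented normal equations [X'X | X'y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[piv] = A[piv], A[k]
        for r in range(p):
            if r != k:
                factor = A[r][k] / A[k][k]
                A[r] = [a - factor * b for a, b in zip(A[r], A[k])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    return sum((y[r] - sum(b * v for b, v in zip(beta, X[r]))) ** 2 for r in range(n))


def forward_selection(X, y, f_enter):
    """X maps predictor names to columns; returns the names in order of entry."""
    n = len(y)
    selected, remaining = [], list(X)
    while remaining:
        base_sse = fit_sse([X[v] for v in selected], y)
        df_error = n - (len(selected) + 2)  # intercept + base predictors + candidate
        best, best_f = None, float("-inf")
        for v in remaining:
            sse_new = fit_sse([X[u] for u in selected + [v]], y)
            f_star = (base_sse - sse_new) / (sse_new / df_error)  # SSR(X_v | base) / MSE
            if f_star > best_f:
                best, best_f = v, f_star
        if best_f < f_enter:     # even the best candidate is not big enough: stop
            break
        selected.append(best)    # once entered, a variable is never removed
        remaining.remove(best)
    return selected


# Hypothetical demo data: an orthogonal 2^3 design where y depends mainly on x1 and x2.
x1 = [-1, 1, -1, 1, -1, 1, -1, 1]
x2 = [-1, -1, 1, 1, -1, -1, 1, 1]
x3 = [-1, -1, -1, -1, 1, 1, 1, 1]
y = [10 + 3 * a + 2 * b + 0.1 * c + 0.5 * a * b - 0.4 * a * c + 0.3 * b * c + 0.2 * a * b * c
     for a, b, c in zip(x1, x2, x3)]
print(forward_selection({"x1": x1, "x2": x2, "x3": x3}, y, f_enter=4.0))  # ['x1', 'x2']
```

The orthogonal design keeps the arithmetic easy to check by hand: x1 enters first (F* about 11.9), then x2 (F* about 36.4), and x3 is never entered (F* about 0.07).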
Backward Elimination
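• Start with all variables in the model (the full model with all P-1 predictors), and select a significance level α_STAY, with corresponding critical value F_STAY, for variables to remain in the model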
Backward Elimination
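• Consider every possible 1-variable reduction in the model size: for each X_k in the model, compute the partial F*_k = MSR(X_k | all other variables in the model)/MSE(current model); if the smallest F*_k is below F_STAY, drop that variable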
Backward Elimination
• Each time a variable is dropped, use the revised model as the base model and check the extra SS for every variable remaining in the model
• Keep eliminating variables and revising the model until all variables remaining in the model have F*_k > F_STAY
Backward Elimination
• The procedure gives a single best model according to this criterion
• That model may differ from the one given by Forward Selection
• Once a variable is removed, it stays out of the model, even if subsequent eliminations render it useful
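Backward elimination can be sketched the same way, under the same assumptions (hypothetical names, intercept always included, normal-equation OLS for small illustrations only, F_STAY passed in as a fixed number). At each step the variable with the smallest partial F* is dropped if that F* is not above F_STAY.

```python
def fit_sse(cols, y):
    """SSE of the OLS fit of y on an intercept plus the predictor columns in `cols`."""
    n, p = len(y), len(cols) + 1
    X = [[1.0] + [float(c[i]) for c in cols] for i in range(n)]
    # Augmented normal equations [X'X | X'y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[piv] = A[piv], A[k]
        for r in range(p):
            if r != k:
                factor = A[r][k] / A[k][k]
                A[r] = [a - factor * b for a, b in zip(A[r], A[k])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    return sum((y[r] - sum(b * v for b, v in zip(beta, X[r]))) ** 2 for r in range(n))


def backward_elimination(X, y, f_stay):
    """Start from all predictors in X (name -> column); return the names that remain."""
    n = len(y)
    selected = list(X)
    while selected:
        full_sse = fit_sse([X[v] for v in selected], y)
        mse = full_sse / (n - (len(selected) + 1))  # error df of the current base model
        worst, worst_f = None, float("inf")
        for v in selected:
            reduced_sse = fit_sse([X[u] for u in selected if u != v], y)
            f_star = (reduced_sse - full_sse) / mse  # SSR(X_v | others) / MSE
            if f_star < worst_f:
                worst, worst_f = v, f_star
        if worst_f > f_stay:     # every remaining variable has F* > F_STAY: stop
            break
        selected.remove(worst)   # once removed, a variable never re-enters
    return selected


# Hypothetical demo data: an orthogonal 2^3 design where y depends mainly on x1 and x2.
x1 = [-1, 1, -1, 1, -1, 1, -1, 1]
x2 = [-1, -1, 1, 1, -1, -1, 1, 1]
x3 = [-1, -1, -1, -1, 1, 1, 1, 1]
y = [10 + 3 * a + 2 * b + 0.1 * c + 0.5 * a * b - 0.4 * a * c + 0.3 * b * c + 0.2 * a * b * c
     for a, b, c in zip(x1, x2, x3)]
print(backward_elimination({"x1": x1, "x2": x2, "x3": x3}, y, f_stay=4.0))  # ['x1', 'x2']
```

Here x3 is dropped from the full model (F* about 0.07), after which both x1 and x2 clear F_STAY comfortably.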
Stepwise Selection
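• Alternates between Forward and Backward steps to address the problems noted above (in Forward Selection an entered variable can never be removed; in Backward Elimination a removed variable can never re-enter)
• Start with no variables in the model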
Stepwise Selection
• After each Backward phase, use the revised model as the base model from which to begin another round of Forward/Backward
• Continue until no further variables can be added or removed
Stepwise Selection
• Note that in each forward phase, only one variable can be added before the new model is trimmed with (possibly multiple steps of) backward elimination
• The final model may or may not match either of the models obtained using Forward Selection or Backward Elimination alone
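A minimal sketch of the stepwise procedure, under the same assumptions as a plain forward or backward search (hypothetical names, intercept always included, normal-equation OLS for small illustrations only, fixed F thresholds). Each round enters at most one variable, then runs backward steps until the weakest variable meets F_STAY. The `max_rounds` cap is an added safety assumption of this sketch, not part of the lecture's procedure.

```python
def fit_sse(cols, y):
    """SSE of the OLS fit of y on an intercept plus the predictor columns in `cols`."""
    n, p = len(y), len(cols) + 1
    X = [[1.0] + [float(c[i]) for c in cols] for i in range(n)]
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(p)]
    for k in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[piv] = A[piv], A[k]
        for r in range(p):
            if r != k:
                factor = A[r][k] / A[k][k]
                A[r] = [a - factor * b for a, b in zip(A[r], A[k])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    return sum((y[r] - sum(b * v for b, v in zip(beta, X[r]))) ** 2 for r in range(n))


def stepwise_selection(X, y, f_enter, f_stay, max_rounds=100):
    """One forward step, then backward steps, repeated until nothing changes."""
    if f_enter < f_stay:
        raise ValueError("require F_ENTER >= F_STAY (alpha_ENTER <= alpha_STAY)")
    n = len(y)
    selected = []
    for _ in range(max_rounds):  # safety cap (assumption of this sketch)
        # Forward phase: try to enter the single best remaining candidate.
        base_sse = fit_sse([X[v] for v in selected], y)
        df_error = n - (len(selected) + 2)
        best, best_f = None, float("-inf")
        for v in X:
            if v not in selected:
                sse_new = fit_sse([X[u] for u in selected + [v]], y)
                f_star = (base_sse - sse_new) / (sse_new / df_error)
                if f_star > best_f:
                    best, best_f = v, f_star
        if best is None or best_f < f_enter:
            return selected  # nothing more can be added
        selected.append(best)
        # Backward phase: drop the weakest variable while its F* < F_STAY.
        while selected:
            full_sse = fit_sse([X[v] for v in selected], y)
            mse = full_sse / (n - (len(selected) + 1))
            worst, worst_f = None, float("inf")
            for v in selected:
                f_star = (fit_sse([X[u] for u in selected if u != v], y) - full_sse) / mse
                if f_star < worst_f:
                    worst, worst_f = v, f_star
            if worst_f >= f_stay:
                break
            selected.remove(worst)
    return selected


# Hypothetical demo data: an orthogonal 2^3 design where y depends mainly on x1 and x2.
x1 = [-1, 1, -1, 1, -1, 1, -1, 1]
x2 = [-1, -1, 1, 1, -1, -1, 1, 1]
x3 = [-1, -1, -1, -1, 1, 1, 1, 1]
y = [10 + 3 * a + 2 * b + 0.1 * c + 0.5 * a * b - 0.4 * a * c + 0.3 * b * c + 0.2 * a * b * c
     for a, b, c in zip(x1, x2, x3)]
print(stepwise_selection({"x1": x1, "x2": x2, "x3": x3}, y, f_enter=4.0, f_stay=3.0))
```

On this orthogonal toy data all three procedures agree on x1 and x2; with correlated predictors they can disagree, as noted above.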
Comments
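• In all cases, the methods are based on criteria for inserting or deleting a variable
• In forward steps: enter the candidate with the largest F* (equivalently, the smallest p-value), provided F* > F_ENTER
• In backward steps: remove the variable with the smallest F* (equivalently, the largest p-value), provided F* < F_STAY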
Comments
• The choice of significance level is a personal decision
• It is common practice in regression to use somewhat higher levels of α for allowing variables to enter or remain in the model than in other testing situations
Comments
• Note that in Stepwise Selection, you must arrange for α_ENTER ≤ α_STAY, or, equivalently, F_ENTER ≥ F_STAY
• Otherwise, a variable's p-value could be small enough for it to enter but large enough for it to be eliminated, so it would be added and dropped at every step, leading to an infinite loop
• One suggestion is to use α_STAY = 2α_ENTER
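The infinite loop can be seen in a toy trace. Right after a variable enters, the model used to decide whether it stays is the same model in which it entered, so its F* is identical in both tests. This hypothetical sketch holds that F* fixed and records the enter/remove decisions; `max_steps` only bounds the trace.

```python
def simulate_one_variable(f_star, f_enter, f_stay, max_steps=6):
    """Toy trace of enter/remove decisions for one variable whose F* never changes."""
    in_model = False
    history = []
    for _ in range(max_steps):
        if not in_model and f_star >= f_enter:
            in_model = True
            history.append("enter")
        elif in_model and f_star < f_stay:
            in_model = False
            history.append("remove")
        else:
            break  # no change: the procedure has settled
    return history


# Hypothetical thresholds with F_ENTER < F_STAY: the variable flips back and forth.
print(simulate_one_variable(3.5, f_enter=3.0, f_stay=4.0))
# Thresholds with F_ENTER >= F_STAY: at most one decision, then the trace settles.
print(simulate_one_variable(3.5, f_enter=4.0, f_stay=3.0))  # []
print(simulate_one_variable(5.0, f_enter=4.0, f_stay=3.0))  # ['enter']
```

Because a larger α gives a smaller critical value, a choice such as α_STAY = 2α_ENTER automatically makes F_STAY smaller than F_ENTER.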