Linear Regression

Regression
Lesson 12
Predicting Outcomes
Predictions
 Based on what has happened in the past
 Predict length of stay (LOS) in hospital
after a heart attack
 Use mean LOS as predictor
 More information → better prediction
 Predictors: age, gender, smoker,
treatment, etc.
 Regression:
 Predictor(s) → Outcome

The General Linear Model

Relationship between predictor & outcome
variables forms a straight line
 Correlation, regression, t-tests,
analysis of variance
 Other more complex models
Describing Lines
All lines defined by simple equation
 Relationship between X and Y
 Only 2 points required
 Slope (or gradient)
 Amount Y changes, when X increases
by 1
 Intercept
 Value of Y when X = 0
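Because two points are enough to define a line, the slope and intercept can be recovered from any two points on it. A minimal sketch in Python; the function name `line_through` and the example points are illustrative assumptions, not part of the lesson:

```python
def line_through(p1, p2):
    """Return (intercept, slope) of the straight line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("vertical line: slope is undefined")
    slope = (y2 - y1) / (x2 - x1)   # amount Y changes when X increases by 1
    intercept = y1 - slope * x1     # value of Y when X = 0
    return intercept, slope

# Hypothetical points chosen only for illustration
intercept, slope = line_through((1, 5), (3, 9))
print("intercept:", intercept, "slope:", slope)
```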

Describing Lines
Intercept = 2
Slope = 1
If X = 2, then Y = 4
[Figure: straight line with intercept 2 and slope 1; X axis from 0 to 12, Y axis from 0 to 8]
Regression
Correlation
 Measures strength of relationship
 Regression
 Predict value of variable
 Predictor (X) → outcome (Y)
 Multiple predictor variables (Xn)
 More complex model, but...
 Same logic and basic process
 Regression equation
 Defines regression line

Regression Coefficients
Give slope & intercept of regression line
 b1 (or b)
 Slope (or gradient)
 Amount Y changes when X increases by 1
 b0 (or a)
 Intercept
 Value of Y when X = 0


$e_i$ = residual or error
• Theoretical, not used in calculation
Regression Model
$\text{outcome}_i = \text{model} + \text{error}$
$Y_i = (b_0 + b_1 X_i) + e_i$
or
$Y_i = b X_i + a$
Method of Least Squares


Residuals ($e_i$)
 Like deviation score
 Error between predicted score & actual
score
Best fit line
 Minimizes the sum of squared residuals
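The least-squares idea can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: the data are made up, and the helper names `least_squares` and `sum_squared_residuals` are hypothetical. It uses the standard closed-form solution for one predictor (slope = co-variation of X and Y divided by variation of X).

```python
def least_squares(xs, ys):
    """Fit Y = b0 + b1*X by ordinary least squares; return (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and sum of squared X deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx                 # slope
    b0 = mean_y - b1 * mean_x      # intercept: line passes through the means
    return b0, b1

def sum_squared_residuals(xs, ys, b0, b1):
    """Total squared error between actual and predicted scores."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Made-up data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0, b1 = least_squares(xs, ys)
best = sum_squared_residuals(xs, ys, b0, b1)
shifted = sum_squared_residuals(xs, ys, b0 + 0.5, b1)  # any other line fits worse
print("b0:", round(b0, 3), "b1:", round(b1, 3))
print("best-fit error:", round(best, 3), "shifted-line error:", round(shifted, 3))
```

Moving the line away from the fitted intercept or slope always increases the squared error, which is what "best fit" means here.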
Assessing Fit of Model


Model = regression line
$R^2$
 Coefficient of determination

Goodness of Fit
F test
 Is regression model better predictor than
mean?
 If p < .05: model better predictor of Y than
the mean


$R^2 = \frac{\text{variance explained by model}}{\text{total variance}}$
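The coefficient of determination can be computed directly from this ratio. A minimal sketch, assuming made-up scores and predictions taken from a least-squares line (here Y = 2.2 + 0.6X for X = 1 to 5); the function name `r_squared` is hypothetical:

```python
def r_squared(ys, predicted):
    """Proportion of total variation in Y explained by the model.

    Assumes `predicted` comes from a least-squares regression line,
    so that explained + residual variation = total variation.
    """
    mean_y = sum(ys) / len(ys)
    ss_model = sum((p - mean_y) ** 2 for p in predicted)  # explained by model
    ss_total = sum((y - mean_y) ** 2 for y in ys)         # total variation
    return ss_model / ss_total

# Made-up outcome scores and their predictions from the line 2.2 + 0.6*X
ys = [2, 4, 5, 4, 5]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]

print("R squared:", round(r_squared(ys, predicted), 3))
```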
Regression Equation & Prediction

My yearly YMCA costs
 Y = my total annual cost
 X = # premium classes taken
 Each pilates or tae kwon do class
Annual fee: $500
 Intercept (b0)
Extra $10 for each premium class
 Slope (b1)
$Y = b_0 + b_1 X$
$Y = 500 + 10(6)$
$Y = 500 + 60$
$Y = 560$
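The same prediction can be written as a small function. A minimal sketch using the fee and per-class cost from the example; the constant and function names are illustrative assumptions:

```python
ANNUAL_FEE = 500       # intercept (b0): cost when no premium classes are taken
COST_PER_CLASS = 10    # slope (b1): extra cost for each premium class

def annual_cost(n_classes):
    """Predicted total annual cost: Y = b0 + b1 * X."""
    return ANNUAL_FEE + COST_PER_CLASS * n_classes

print(annual_cost(6))  # 560, matching the worked example for 6 classes
```

Calling `annual_cost(0)` returns the intercept, 500, which is the value of Y when X = 0.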
Regression Models

Simple regression
$Y_i = (b_0 + b_1 X_i) + e_i$
Multiple regression
$Y_i = (b_0 + b_1 X_1 + b_2 X_2 + \dots + b_n X_n) + e_i$
Regression Coefficients

b0
 is the intercept
 value of Y when all Xs = 0
 where regression plane crosses the Y-axis
b1
 regression coefficient for predictor variable 1 (X1)
b2
 regression coefficient for predictor variable 2 (X2)
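Prediction with several predictors follows the same pattern as with one: start at the intercept and add each coefficient times its predictor. A minimal sketch; the function name `predict` and all coefficient values are hypothetical numbers chosen for illustration, not results from any data set:

```python
def predict(b0, coefficients, xs):
    """Predicted Y = b0 + b1*X1 + b2*X2 + ... + bn*Xn."""
    if len(coefficients) != len(xs):
        raise ValueError("need exactly one coefficient per predictor")
    return b0 + sum(b * x for b, x in zip(coefficients, xs))

# Hypothetical model: b0 = 1.0, b1 = 2.0, b2 = 0.5
b0 = 1.0
coefficients = [2.0, 0.5]

print(predict(b0, coefficients, [3, 4]))  # 1.0 + 2.0*3 + 0.5*4
print(predict(b0, coefficients, [0, 0]))  # all Xs = 0 gives the intercept
```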
Interpreting Regression
Model summary
 R = r (correlation coefficient)
 $R^2$ = % variance explained by model
 ANOVA (analysis of variance)
 F test
 Tests H0: model = mean as predictor
 H1: model better predictor
 Sig. < .05: model is better
predictor than mean
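The F value in the ANOVA table compares the variation explained by the model with the variation left over in the residuals, each divided by its degrees of freedom. A minimal sketch with made-up data (the same kind of scores and least-squares predictions used for illustration elsewhere); the function name `f_statistic` is hypothetical. The p-value (Sig.) comes from the F distribution with these degrees of freedom and is not computed here.

```python
def f_statistic(ys, predicted, n_predictors):
    """F = mean square for the model / mean square for the residuals."""
    n = len(ys)
    mean_y = sum(ys) / n
    ss_model = sum((p - mean_y) ** 2 for p in predicted)          # improvement over the mean
    ss_resid = sum((y - p) ** 2 for y, p in zip(ys, predicted))   # error left in the model
    df_model = n_predictors
    df_resid = n - n_predictors - 1
    return (ss_model / df_model) / (ss_resid / df_resid)

# Made-up outcome scores and predictions from the line 2.2 + 0.6*X, X = 1..5
ys = [2, 4, 5, 4, 5]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]

f_value = f_statistic(ys, predicted, n_predictors=1)
print("F:", round(f_value, 3))
```

A larger F means the model improves more on the mean as a predictor, relative to the error that remains.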

Regression in SPSS
Data entry
 1 column per variable, like correlation
 Menus
 Analyze → Regression → Linear
 Dialog box
 Outcome variable → Dependent
 Predictor variable → Independent(s)
 Only one for simple regression
 Do not use options

SPSS: Multiple Regression



Data entry
 1 column per variable, like simple
Menus
 Analyze → Regression → Linear
Dialog box
 Dependent: Outcome variable
 Independent(s): Predictor variables
 Method: Stepwise
 Options