
Psychology 282
Lecture #3 Outline
Simple Linear Regression (SLR)
Given variables X, Y.
Sample of n observations.
In study and use of correlation coefficients, X and Y
are interchangeable.
In regression analysis, one variable is defined as
dependent (Y) and the other as independent (X).
SLR represents Y as linear function of X.
(If we have more than one IV, then method is called
multiple linear regression, or MLR.)
Relationship may be causal, but not necessarily.
Use regression for various purposes:
• Explanation
• Prediction
Will study SLR in terms of both geometric and
algebraic representations.
Geometric representation of SLR
Recall scatterplot:
[Figure: scatterplot of Y (70 to 130) against X (−2 to 8).]
Consider representation or approximation of
relationship between X and Y using straight line.
[Figure: same scatterplot with a fitted straight line superimposed.]
Using this line, for any selected individual, can
obtain predicted Y, called \( \hat{Y} \).
Can then define residual or error in prediction, e, as
vertical deviation of point from line.
Can define residuals for entire sample.
Note that different lines produce different residuals.
Consider objective of choosing the best line: Find
line that produces smallest residuals for sample.
Algebraic representation of SLR
Equation for a straight line:
\( \hat{Y} = B_0 + B_{YX} X \)
where \( \hat{Y} \) is predicted value of Y.
\( B_0 \) is intercept: value of Y where line crosses Y-axis,
or predicted value of Y when X = 0.
\( B_{YX} \) is regression coefficient, or slope of line.
Slope is defined as the ratio of vertical change to
horizontal change, or \( \Delta Y / \Delta X \).
This value represents the change in predicted Y
corresponding to a 1-unit increase in X.
For any individual, given X and Y and an equation of
the form \( \hat{Y} = B_0 + B_{YX} X \), we can then obtain predicted
value \( \hat{Y} \) and residual \( e = Y - \hat{Y} \).
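A quick numeric sketch of this step in Python (the line and the observation are hypothetical, chosen only to illustrate the arithmetic):

```python
# Hypothetical candidate line and one observation, for illustration only.
B0, BYX = 75.0, 5.0      # intercept and slope of a candidate line
X, Y = 4.0, 98.0         # one individual's observed scores

Y_hat = B0 + BYX * X     # predicted value: 75 + 5(4) = 95
e = Y - Y_hat            # residual: 98 - 95 = 3
print(Y_hat, e)
```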
Can obtain residuals for each of n individuals in
sample.
Problem: Would like to find equation \( \hat{Y} = B_0 + B_{YX} X \)
that makes these residuals small, thus providing best
approximation of observed Y values.
Need to define aggregate measure of size of residuals
in sample.
Cannot just sum them to obtain \( \sum e \), because
positive and negative residuals will cancel out.
Consider squaring residuals, then summing: \( \sum e^2 \)
This value will become smaller as selected equation,
and corresponding line, approximates relationship
better.
Objective: Given scores for n individuals on X and Y,
find \( B_0 \) and \( B_{YX} \) such that \( \sum e^2 \) is as small as
possible, where \( e = Y - \hat{Y} \) and \( \hat{Y} = B_0 + B_{YX} X \).
This is the principle of least squares, and \( \sum e^2 \) is the
sum of squared residuals, to be minimized.
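A small sketch of the objective in Python (sample values hypothetical): compute \( \sum e^2 \) for any candidate intercept and slope, then compare candidate lines.

```python
import numpy as np

# Hypothetical sample of n = 5 observations.
X = np.array([0.0, 2.0, 4.0, 6.0, 8.0])
Y = np.array([78.0, 90.0, 95.0, 108.0, 121.0])

def sum_sq_resid(B0, BYX):
    """Sum of squared residuals for the candidate line Y_hat = B0 + BYX * X."""
    e = Y - (B0 + BYX * X)
    return np.sum(e ** 2)

# Different candidate lines produce different sums of squared residuals;
# the least-squares line is the one that minimizes this quantity.
print(sum_sq_resid(75.0, 5.0))
print(sum_sq_resid(80.0, 4.0))
```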
Minimizing the sum of squared residuals:
Define
\[ \theta = \sum e^2 = \sum (Y - \hat{Y})^2 = \sum (Y - (B_0 + B_{YX} X))^2 = \sum (Y - B_0 - B_{YX} X)^2 \]
Problem: Find \( B_0 \) and \( B_{YX} \) that minimize this quantity.
Solution obtained by basic calculus:
Obtain partial derivatives of \( \theta \) with respect to \( B_0 \) and
\( B_{YX} \).
Set these two partial derivatives equal to zero.
Solve this system of two equations for \( B_0 \) and \( B_{YX} \).
Result:
\[ B_{YX} = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} \quad \text{and} \quad B_0 = \bar{Y} - B_{YX}\bar{X} \]
These equations provide linear regression coefficient
and intercept for SLR that minimize sum of squared
residuals. Any other intercept and slope would
produce higher \( \sum e^2 \).
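The closed-form solution is easy to verify numerically; a sketch in Python (same hypothetical sample as above):

```python
import numpy as np

# Hypothetical sample.
X = np.array([0.0, 2.0, 4.0, 6.0, 8.0])
Y = np.array([78.0, 90.0, 95.0, 108.0, 121.0])
n = len(X)

# Least-squares slope and intercept from the formulas above.
BYX = (np.sum(X * Y) - n * X.mean() * Y.mean()) / (np.sum(X ** 2) - n * X.mean() ** 2)
B0 = Y.mean() - BYX * X.mean()
print(B0, BYX)   # agrees with np.polyfit(X, Y, 1), which also uses least squares
```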
See applet:
http://hadm.sph.sc.edu/COURSES/J716/demos/LeastSquares/LeastSquaresDemo.html
• Relationship between regression coefficient and
correlation coefficient:
\[ B_{YX} = r \left( \frac{sd_Y}{sd_X} \right) \]
Given r, can compute \( B_{YX} \) easily.
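A sketch verifying this relationship on the hypothetical sample used above:

```python
import numpy as np

X = np.array([0.0, 2.0, 4.0, 6.0, 8.0])
Y = np.array([78.0, 90.0, 95.0, 108.0, 121.0])

r = np.corrcoef(X, Y)[0, 1]
BYX = r * Y.std(ddof=1) / X.std(ddof=1)   # B_YX = r (sd_Y / sd_X)
print(BYX)   # matches the least-squares slope computed earlier
```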
• Expressing regression equation in terms of Y:
Given \( \hat{Y} = B_0 + B_{YX} X \)
Residuals: \( e = Y - \hat{Y} \)
Then: \( Y = \hat{Y} + e \)
Implies: \( Y = B_0 + B_{YX} X + e \)
• Relationships among X, Y, \( \hat{Y} \), and e:
\( r_{XY} \) observed from data.
\( r_{X\hat{Y}} = 1 \)
\( r_{Xe} = 0 \)
\( r_{Y\hat{Y}} = r_{XY} \)
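These properties can be checked numerically; a Python sketch (hypothetical sample as before):

```python
import numpy as np

X = np.array([0.0, 2.0, 4.0, 6.0, 8.0])
Y = np.array([78.0, 90.0, 95.0, 108.0, 121.0])

BYX, B0 = np.polyfit(X, Y, 1)        # least-squares slope, intercept
Y_hat = B0 + BYX * X
e = Y - Y_hat

print(np.corrcoef(X, Y_hat)[0, 1])   # 1.0: Y_hat is a linear function of X
print(np.corrcoef(X, e)[0, 1])       # ~0.0: residuals are uncorrelated with X
print(np.corrcoef(Y, Y_hat)[0, 1])   # equals r_XY
```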
SLR for standardized variables
Development of SLR above was for raw scores on X
and Y. Suppose we wished to do SLR after
standardizing IV and DV:
\[ z_X = \frac{X - \bar{X}}{sd_X}, \qquad z_Y = \frac{Y - \bar{Y}}{sd_Y} \]
Means are zero: \( \bar{z}_X = 0 \) and \( \bar{z}_Y = 0 \)
Variances are 1.0: \( \sum z_X^2 /(n-1) = 1 \) and \( \sum z_Y^2 /(n-1) = 1 \)
Regression equation:
\( \hat{z}_Y = B_0^* + B_{z_Y z_X}^* z_X \)
Residuals:
\( e_{z_Y} = z_Y - \hat{z}_Y \)
Wish to find SLR equation that will make residuals
as small as possible. Apply principle of least squares.
Problem: Find \( B_0^* \) and \( B_{z_Y z_X}^* \) that will minimize \( \sum e_{z_Y}^2 \).
Solution: Use solution for raw score regression
coefficient and intercept, converting to standardized
variables.
Results: Standardized regression coefficient:
\[ B_{z_Y z_X}^* = \frac{\sum z_X z_Y - n \bar{z}_X \bar{z}_Y}{\sum z_X^2 - n \bar{z}_X^2} = \frac{\sum z_X z_Y - n(0)(0)}{(n-1) - n(0)^2} = \frac{\sum z_X z_Y}{n-1} = r_{XY} \]
Standardized intercept:
\[ B_0^* = \bar{z}_Y - B_{z_Y z_X}^* \bar{z}_X = 0 - B_{z_Y z_X}^* (0) = 0 \]
In SLR for standardized variables, the intercept will
be zero, meaning the regression line must pass
through the origin.
The slope, or standardized regression coefficient, will
be equal to the Pearson correlation coefficient.
SLR for standard scores can be represented as:
\( \hat{z}_Y = r_{XY} z_X \)
\( z_Y = r_{XY} z_X + e_{z_Y} \)
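A sketch confirming both results on the hypothetical sample: standardize X and Y, fit the line, and compare the slope to \( r_{XY} \).

```python
import numpy as np

X = np.array([0.0, 2.0, 4.0, 6.0, 8.0])
Y = np.array([78.0, 90.0, 95.0, 108.0, 121.0])

zX = (X - X.mean()) / X.std(ddof=1)
zY = (Y - Y.mean()) / Y.std(ddof=1)

slope, intercept = np.polyfit(zX, zY, 1)
print(slope, np.corrcoef(X, Y)[0, 1])   # slope equals r_XY
print(intercept)                        # ~0: line passes through the origin
```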
Thus, the Pearson correlation coefficient has two
distinct interpretations and uses:
• Measure of linear relationship.
• Standardized regression coefficient in SLR.
SLR using raw vs. standard scores:
Choice based on context, purpose.
Desire to predict raw score, vs. desire to predict
relative standing with respect to the mean.
Relationship:
\[ B_{YX} = r_{XY} \frac{sd_Y}{sd_X}, \qquad r_{XY} = B_{YX} \frac{sd_X}{sd_Y} \]
Regression toward the mean
Standard scores represent deviation from the mean, in
sd units.
SLR for standard scores:
\( \hat{z}_Y = r_{XY} z_X \)
Bounds on value of \( r_{XY} \):
\( |r_{XY}| \le 1.0 \)
Implies that:
\( |\hat{z}_Y| \le |z_X| \)
Predicted score on Y must be relatively closer to
mean than was observed score on X.
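For example, if \( r_{XY} = .50 \) and an individual scores two
standard deviations above the mean on X (\( z_X = 2.0 \)), then
\( \hat{z}_Y = .50 \times 2.0 = 1.0 \): the predicted score is only one
standard deviation above the mean on Y.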
This is regression toward the mean, a statistical
phenomenon associated with regression and least
squares.
Substantive interpretation of this phenomenon is not
justified.
Measures of strength of association
In SLR we are interested in strength of association
between IV and DV.
Can be measured by \( r_{XY} \) and by \( \sum e^2 \).
Consider other possible measures.
Use notion of partitioning variance in Y.
Variance in Y is partially accounted for by X, with the
remainder unaccounted for. Determine these portions.
Consider first in standardized SLR:
\( z_Y = r_{XY} z_X + e_{z_Y} \)
\( z_Y = \hat{z}_Y + e_{z_Y} \)
\( \mathrm{Var}(z_Y) = \mathrm{Var}(\hat{z}_Y) + \mathrm{Var}(e_{z_Y}) \)
\( \mathrm{Var}(z_Y) = \mathrm{Var}(r_{XY} z_X) + \mathrm{Var}(e_{z_Y}) \)
\( 1 = r_{XY}^2 \, \mathrm{Var}(z_X) + \mathrm{Var}(e_{z_Y}) \)
\( 1 = r_{XY}^2 + \mathrm{Var}(e_{z_Y}) \)
This expression shows that the variance in zY, which
is 1.0, can be partitioned into two parts:
Variance accounted for by X, which is equal to \( r^2 \).
Variance not accounted for by X, which is equal to
\( (1 - r^2) \).
Thus, \( r^2 \) indicates proportion of variance in DV
accounted for by its linear relationship with IV.
This is an important device for interpreting strength
of relationship implied by r.
We can define similar partitioning of variance in Y
using raw variance. Divide that variance into two
portions:
Variance in Y accounted for by its linear relationship
with X:
\[ sd_{\hat{Y}}^2 = r^2 sd_Y^2 \]
Variance in Y not accounted for by its linear
relationship with X:
\[ sd_{(Y-\hat{Y})}^2 = (1 - r^2) sd_Y^2 \]
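A sketch verifying the partition on the hypothetical sample:

```python
import numpy as np

X = np.array([0.0, 2.0, 4.0, 6.0, 8.0])
Y = np.array([78.0, 90.0, 95.0, 108.0, 121.0])

r = np.corrcoef(X, Y)[0, 1]
BYX, B0 = np.polyfit(X, Y, 1)
Y_hat = B0 + BYX * X
e = Y - Y_hat

# Variance accounted for: variance of Y_hat equals r^2 * variance of Y.
print(np.var(Y_hat, ddof=1), r ** 2 * np.var(Y, ddof=1))
# Variance not accounted for: variance of residuals equals (1 - r^2) * variance of Y.
print(np.var(e, ddof=1), (1 - r ** 2) * np.var(Y, ddof=1))
```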
In regression analysis we are often interested in the
standard deviation of the residuals. From this last
term we can define standard deviation of raw
residuals:
\[ sd_{(Y-\hat{Y})} = \sqrt{(1 - r^2)\, sd_Y^2} \]
This is a sample statistic. We often wish to estimate
the standard deviation of residuals that would be
obtained in the population. This value is called the
standard error of estimate (SE).
The sample value of \( sd_{(Y-\hat{Y})} \) tends to underestimate
this value; i.e., it is a biased estimate of the true
standard error of estimate.
An unbiased estimate of SE can be obtained by:
\[ SE = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}} \]
This value is provided by regression software and is
used to construct confidence intervals for predicted
scores and for other purposes.
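A sketch of the computation (hypothetical sample as before):

```python
import numpy as np

X = np.array([0.0, 2.0, 4.0, 6.0, 8.0])
Y = np.array([78.0, 90.0, 95.0, 108.0, 121.0])
n = len(X)

BYX, B0 = np.polyfit(X, Y, 1)
e = Y - (B0 + BYX * X)

SE = np.sqrt(np.sum(e ** 2) / (n - 2))   # standard error of estimate
print(SE)
```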