Dr. Thomas Sanocki, PSY 3204
"Regress&CorrKeys"
Outline
Relations between variables -- between X and Y
Regression ("Linear Regression")
Regression as a line
slope and intercept
error
Predicting with regression
Three pitfalls with regression
Correlation
Correlation is the strength of a relation
Pitfalls with correlation
Correlation and Error
(not on exam: )
Multiple Regression
Correlation matrix
Unique variance of combination
Regression and Correlation…
concern relations between two variables, X and Y.
Let's begin with a typical experiment (not Regression/Correlation):
Relation in an Experiment
[Figure: "Minutes helping in two conditions", a bar chart of the two condition means. Y = Minutes Helping (DV); X = Mood Condition (IV)]

Minutes helping, one row per subject:

           Control   Good mood
           8         10
           9         16
           12        12
           7         14
           13        11
           11        15
Means      10        13
SD         2.4       2.4
Note:
X is a manipulated IV in an experiment.
1. Only the means are shown.
2. Only two levels of the X variable are used.
What if we showed each subject, instead of means?
"Scatterplot" w/ each subject
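[Figure: scatterplot with one point per subject. Y = Minutes Helping; X = Mood Value (1 = Control, 2 = Good mood)]

Condition    Mood Value (X)   Minutes Helping (Y)
Control      1                8
Control      1                9
Control      1                12
Control      1                7
Control      1                13
Control      1                11
Good mood    2                10
Good mood    2                16
Good mood    2                12
Good mood    2                14
Good mood    2                11
Good mood    2                15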
NEW STUDY…
Now, what if X varied continuously?
>> This is possible with NEW data: Regression/correlation data
X = Mood (ratings from 1 to 7; 7 is best)
Y = Helping (minutes)
Each subject has two values, x and y:

X (Mood)   Y (Helping, minutes)
4          8
5          14
2          2
6          11
7          22
1          1
2          7
6          19
3          6
Now we can do a real scatterplot.
[Figure: A Typical Scatterplot. Y = Minutes Helping; X = Rated Mood]
Now, the big Regression & Correlation Question:
How can we describe the relation between X and Y?
(using a simple math concept?)
Let's Regress!
Line formula: Y = mX + b,
or Y = slope*X + intercept
m = slope = how much Y changes for each unit of X
b = intercept = where the line starts, i.e., the value of Y when X = 0
Regression finds the best line to describe the data, i.e., the m and b values. For the mood data:
m = slope = 3.14
b = intercept = -2.55
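To see where m and b come from, here is a minimal sketch of a least-squares line fit in plain Python. It assumes the nine mood ratings and nine helping times from the NEW STUDY data pair up in the order listed. The function name fit_line is made up for this illustration; it is not something from the spreadsheet.

```python
# Mood ratings (X) and minutes helping (Y) for the nine subjects,
# assuming the values pair up in the order they are listed in the notes.
mood = [4, 5, 2, 6, 7, 1, 2, 6, 3]
helping = [8, 14, 2, 11, 22, 1, 7, 19, 6]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How X and Y vary together, and how much X varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The best line always passes through the point (mean of X, mean of Y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


m, b = fit_line(mood, helping)
print(f"m = {m:.2f}, b = {b:.2f}")
```

Under that pairing the slope comes out at about 3.14, and the intercept is about -2.556, which matches the -2.55 in the notes up to rounding.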
[Figure: A Typical Scatterplot with the regression line Y = mX + b, here Y = 3.14X + (-2.55). Y = Minutes Helping; X = Rated Mood]
Using regression for prediction…
Y = mX + b
Y = 3.14X + (-2.55)
Just plug in X and calculate Y
e.g., X = 8?
Y = 3.14*8 + (-2.55) = 22.57
Easier example to try (simple ones like this are on the exam):
Y = 3X + 1
Y = (slope = 3) X + (intercept = 1)
If X = 2?
X = Mood (ratings from 1 to 7; 7 is best)
Y = Helping (minutes)

X (Mood)   Y (Helping, minutes)
1          4
2          7
3          10
4          13
5          16
6          19
7          22

Regression line is the set of predicted values.
[Figure: the predicted values plotted as a straight line; axes X and Y]
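Prediction is just plugging X into the line. The sketch below does this for the easier example (Y = 3X + 1) at each mood rating from 1 to 7, and for the mood line at X = 8. The helper name predict is an assumption made for this illustration.

```python
def predict(x, slope, intercept):
    """Predicted Y for a given X on the line Y = slope*X + intercept."""
    return slope * x + intercept


# Easier example from the notes: Y = 3X + 1, for X = 1 through 7.
for x in range(1, 8):
    print(x, predict(x, 3, 1))

# Mood example from the notes: Y = 3.14X + (-2.55) at X = 8.
y_at_8 = predict(8, 3.14, -2.55)
print(round(y_at_8, 2))
```

The first loop gives 4, 7, 10, 13, 16, 19, 22, so X = 2 predicts Y = 7. The mood line at X = 8 predicts about 22.57 minutes.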
Summary: Regression is based on the best "line" description,
i.e., m = slope
How much Y for each unit of X?
(e.g., how much more helping for each one-point increase in mood?)
b = intercept
Where does the line start (X = 0)?
Error is also present. What is error?
(Error is how far each point departs from the line,
i.e., Y - predicted Y)
[Figure: scatterplot with the regression line; the error for each point is shown in red as its deviation from predicted Y. Y = Minutes Helping; X = Rated Mood]
Note: The regression line is the line (values of m and b) that
produces the least error over all points (specifically, the smallest sum of squared errors).
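A rough sketch of the error idea, using the mood data and the m and b values from the notes. It assumes the X and Y values pair up in the order listed. The two comparison lines (a steeper slope, a higher intercept) are arbitrary choices made for this illustration, only to show that other lines produce more total squared error.

```python
# Mood (X) and helping (Y) data, assuming they pair up in the order listed.
mood = [4, 5, 2, 6, 7, 1, 2, 6, 3]
helping = [8, 14, 2, 11, 22, 1, 7, 19, 6]
m, b = 3.14, -2.55  # slope and intercept from the notes


def sum_squared_error(xs, ys, slope, intercept):
    """Add up (Y - predicted Y) squared over all points."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Error for each subject: actual Y minus predicted Y.
errors = [y - (m * x + b) for x, y in zip(mood, helping)]
for x, y, e in zip(mood, helping, errors):
    print(f"X = {x}, Y = {y}, error = {e:+.2f}")

best = sum_squared_error(mood, helping, m, b)
steeper = sum_squared_error(mood, helping, 4.0, b)  # arbitrary other slope
higher = sum_squared_error(mood, helping, m, 0.0)   # arbitrary other intercept
print(f"regression line: {best:.1f}, steeper: {steeper:.1f}, higher: {higher:.1f}")
```

Both alternative lines give a larger total than the regression line, which is what "least error" means here.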
Another example
Years in College (X)   Lifetime Earnings, thousands of dollars (Y)
0                      1200
2                      1450
4                      1800
8                      2200
What is the relation?
[Figure: College and earnings. Y = lifetime earnings ($$, in thousands); X = Years of College]
b = intercept = 1220 (predicted earnings, in thousands, with no college)
Most important:
m = slope = how much earnings (Y) change per unit of X (one year of college):
m = 126, i.e., about 126K more in lifetime earnings per year of college
For one more year of college (X), how much do earnings (Y) change? About 126K.
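As a check on the college example, here is a short sketch that fits the line to the four data points listed in the notes. The function name fit_line is made up for this illustration.

```python
years = [0, 2, 4, 8]                 # X: years in college
earnings = [1200, 1450, 1800, 2200]  # Y: lifetime earnings, thousands of dollars


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


m, b = fit_line(years, earnings)
print(f"m = {m:.1f}K per year of college, b = {b:.0f}K with no college")
```

The slope works out to about 126.4 and the intercept to 1220, in line with the values given in the notes.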
Three pitfalls to know about
1. Only linear relations are handled (full name is "Linear Regression")
This ain't no line, yet…
[Figure: scatterplot of a relation between X and Y that is not a line]
You do get regression values:
m = slope = .45
b = intercept = 5.1
[Figure: the same kind of data with the fitted regression line. Not a great "fit"]
Linear regression should NOT be used here!
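To make this pitfall concrete, here is a small sketch with made-up data. These are NOT the data behind the figure in the notes; they are a hypothetical inverted-U chosen so the arithmetic is easy to follow. Y depends perfectly on X, yet the best-fitting line is flat.

```python
# Hypothetical data (invented for this illustration): a perfect inverted U.
xs = list(range(11))                 # X = 0, 1, ..., 10
ys = [25 - (x - 5) ** 2 for x in xs]  # Y rises to 25 at X = 5, then falls


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


m, b = fit_line(xs, ys)
print(f"m = {m:.2f}, b = {b:.2f}")
```

Regression still returns numbers (a slope of 0 and an intercept of 15 for this made-up set), but the line says "no relation" when X actually determines Y completely. Always look at the scatterplot first.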
2. Even if there is a good relation, it does not imply causality
(e.g., in the mood example, maybe people were in a good mood because
they had less to do, and so were able to spend more time helping;
in the earnings example, maybe intelligence is the cause: more intelligent
people take more college and earn more money)
The Third Variable Problem
A third variable could cause changes in both X and Y
3. Regression does not measure the strength of a relation
[Figure: two scatterplots of Minutes Helping (Y) against Rated Mood (X), each with a dashed regression line]
These two data sets have the same regression lines (dashed lines),
yet one relation is much stronger!
How do you measure the strength of a relation?
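One way to see this pitfall is to build a second data set that has the same regression line but more scatter. The sketch below takes the mood data (assuming X and Y pair up in the order listed) and doubles each point's distance from the line. That second data set is hypothetical, constructed only for this illustration, and is not the second data set plotted in the notes.

```python
mood = [4, 5, 2, 6, 7, 1, 2, 6, 3]
helping = [8, 14, 2, 11, 22, 1, 7, 19, 6]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def sum_squared_error(xs, ys, slope, intercept):
    """Add up (Y - predicted Y) squared over all points."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


m1, b1 = fit_line(mood, helping)
predicted = [m1 * x + b1 for x in mood]

# Hypothetical second data set: same predicted values, errors twice as large.
noisier = [p + 2 * (y - p) for p, y in zip(predicted, helping)]
m2, b2 = fit_line(mood, noisier)

sse1 = sum_squared_error(mood, helping, m1, b1)
sse2 = sum_squared_error(mood, noisier, m2, b2)
print(f"original: m = {m1:.2f}, b = {b1:.2f}, squared error = {sse1:.1f}")
print(f"noisier:  m = {m2:.2f}, b = {b2:.2f}, squared error = {sse2:.1f}")
```

The slope and intercept come out the same for both sets, while the total squared error is four times larger for the noisier one. The regression line alone cannot tell these two situations apart.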
Correlation
Measures the strength of the relation between X and Y
Values (r) range -1 …… 0 …… +1
0 means no relation, close to 0 is weak,
closer to -1 or +1 is stronger
Intuitive Formula
1. Get z-scores on each measure
2. Calculate "cross-products" (z-score on X times z-score on Y) for each person
3. Take the mean of the cross-products to get r
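The three steps of the intuitive formula can be sketched directly in Python, using the mood data (assuming X and Y pair up in the order listed). One assumption to flag: the z-scores here use the population SD (dividing by N), which is what makes a plain mean of the cross-products come out equal to r. The function names are made up for this illustration.

```python
from math import sqrt

mood = [4, 5, 2, 6, 7, 1, 2, 6, 3]
helping = [8, 14, 2, 11, 22, 1, 7, 19, 6]


def z_scores(values):
    """Step 1: convert raw scores to z-scores (population SD, dividing by N)."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def correlation(xs, ys):
    """Steps 2 and 3: cross-products of z-scores, then their mean."""
    cross_products = [zx * zy for zx, zy in zip(z_scores(xs), z_scores(ys))]
    return sum(cross_products) / len(cross_products)


r = correlation(mood, helping)
print(f"r = {r:.2f}")
```

For these data r comes out at about 0.92: people with above-average mood tend to have above-average helping, so most cross-products are positive.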
[Figures: example scatterplots. Minutes Helping (Y) against Rated Mood (X): r = 0.92 and r = 0.69. Further examples of Y against X: r = -0.83, r = 0.30, and one to judge for yourself (r = ??)]
Three problems to know, as in regression
1. Only linear relations are handled (sometimes called "Linear Correlation")
This ain't no line, yet…
[Figure: scatterplot of a relation between X and Y that is not a line]
You can get correlation values:
r = 0.37
[Figure: a second scatterplot of a relation between X and Y that is not a line]
2. Even if there is a good relation, it does not imply causality
The Third Variable Problem
A third variable could cause changes in both X and Y
3. Correlation is not the same as regression; they are complementary
Correlation describes how strong a relation is,
Regression describes how Y changes per unit of change in X
Correlation and Error
Recall that in ANOVA, there is "variance" for the IV and for error,
and that IV + Error = Total. Is there something like this for Regression/Correlation?
Yes!
The correlation value, r, can be converted to R²
Proportion of variance accounted for: R² = r * r
This is the part of the total variance explained by variable X.
And Error = 1 - R².
Error is the proportion due to nuisance variables -- all the other possible variables that
could have been used in the correlation.
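A brief sketch of the "explained + error = total" idea for the mood data (assuming X and Y pair up in the order listed). It computes r, squares it, and separately checks how much of the total variation in Y is left over as error around the regression line. Variable names are made up for this illustration.

```python
from math import sqrt

mood = [4, 5, 2, 6, 7, 1, 2, 6, 3]
helping = [8, 14, 2, 11, 22, 1, 7, 19, 6]

n = len(mood)
mean_x = sum(mood) / n
mean_y = sum(helping) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(mood, helping))
sxx = sum((x - mean_x) ** 2 for x in mood)
syy = sum((y - mean_y) ** 2 for y in helping)  # total variation in Y

r = sxy / sqrt(sxx * syy)
r_squared = r * r

# Variation left over around the regression line (the error part).
slope = sxy / sxx
intercept = mean_y - slope * mean_x
sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(mood, helping))
error_proportion = sse / syy

print(f"r = {r:.2f}, R squared = {r_squared:.2f}, Error = {error_proportion:.2f}")
```

Here R² is about 0.85 and Error is about 0.15, and the two add up to 1, just as IV + Error = Total in ANOVA.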
END