Correlation
Until now, we have been concerned with describing the central tendency and/or
variability of 1 variable at a time. Correlation describes the association(s) or
relationship(s) between 2 or more variables. Correlation procedures describe how changes in
1 variable are associated with changes in a 2nd, 3rd, or further variable(s). The statistic is called
the correlation coefficient, and it reduces what can be complex relationships between variables
to 1 number that describes both the strength (strong, weak, or none) and direction (positive
or negative) of the association between the variables. Correlation coefficients have the
following properties (which will be depicted in a follow-up slide).
1) Correlation coefficients may have positive values (indicating that the 2 or more
variables change in the same direction), negative values (indicating that the variables
change in opposite directions), or a value of zero (indicating no association between the variables).
2) Correlation coefficients may range from -1.00, indicating perfect negative
correlation (as 1 variable changes by a given amount in one direction, the other
variable(s) change by a proportional amount in the opposite direction), through zero,
indicating that there is no discernible relationship between the variables, to 1.00,
indicating perfect positive correlation (as 1 variable changes by a given amount in one
direction, the other variable(s) change by a proportional amount in the same direction).
The sample correlation coefficient is symbolized by the letter r. Hence, we can state the
limits of the correlation coefficient as:
-1.00 ≤ r ≤ 1.00
The following graphic depicts how we interpret the correlation coefficient.
NEGATIVE CORRELATION          NO CORRELATION          POSITIVE CORRELATION
<<< Stronger                                                  Stronger >>>
-1.00___________-0.50_____________0_____________0.50_______________1.00
              Weaker >>>                   <<< Weaker
As this graphic suggests, the further away from zero the correlation coefficient is in either
direction, the stronger the correlation between the variables. Conversely, the closer the
correlation coefficient is to zero, the weaker the correlation between the variables.
Here are some arbitrary guidelines (emphasis on arbitrary) for interpreting the
correlation coefficient.
If r =  +.70 or higher    Very strong positive relationship
        +.40 to +.69      Strong positive relationship
        +.30 to +.39      Moderate positive relationship
        +.20 to +.29      Weak positive relationship
        +.01 to +.19      No or negligible relationship
        -.01 to -.19      No or negligible relationship
        -.20 to -.29      Weak negative relationship
        -.30 to -.39      Moderate negative relationship
        -.40 to -.69      Strong negative relationship
        -.70 or lower     Very strong negative relationship
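The guidelines above can be sketched as a small lookup function. This is only an illustration: the function name interpret_r is hypothetical, and two choices are assumptions not stated on the slide, namely that a value falling between two listed bands (for example 0.695) is assigned to the lower band, and that r = 0 is treated as negligible.

```python
def interpret_r(r: float) -> str:
    """Label a correlation coefficient using the slide's informal cut-offs."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1.00 and 1.00")
    size = abs(r)  # strength depends only on the distance from zero
    if size < 0.20:
        # Below .20 in either direction the guidelines call the relationship negligible.
        return "No or negligible relationship"
    if size >= 0.70:
        strength = "Very strong"
    elif size >= 0.40:
        strength = "Strong"
    elif size >= 0.30:
        strength = "Moderate"
    else:
        strength = "Weak"
    direction = "positive" if r > 0 else "negative"  # the sign gives the direction
    return f"{strength} {direction} relationship"


for r in (0.69, -0.32, 0.10, -1.00):
    print(f"{r:+.2f}: {interpret_r(r)}")
```

Because the cut-offs are arbitrary, the labels should be read as a convenience, not as a statistical test.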
An example of perfect positive correlation.
Scores on tests of mathematical (Math) and verbal (Verb) skills of 10 high school students.
Example A                     Example B
Student    Math   Verb        Student    Math   Verb
Alice       20     30         Alice       38     48
Bobby       22     32         Bobby       36     46
Charlie     24     34         Charlie     34     44
Dave        26     36         Dave        32     42
Eddie       28     38         Eddie       30     40
Fran        30     40         Fran        28     38
Gary        32     42         Gary        26     36
Hillary     34     44         Hillary     24     34
Inez        36     46         Inez        22     32
Jim         38     48         Jim         20     30
r = 1.00                      r = 1.00
In Example A, note that each student’s Math skills score is 2 points higher than the
preceding student’s, and his/her Verbal skills score is also 2 points higher.
In Example B, note that each student’s Math skills score is 2 points lower than the
preceding student’s, and his/her Verbal skills score is also 2 points lower.
Before commenting further, let us consider another example.
Example C                     Example D
Student    Math   Verb        Student    Math   Verb
Alice       20     30         Alice       38     75
Bobby       22     35         Bobby       36     70
Charlie     24     40         Charlie     34     65
Dave        26     45         Dave        32     60
Eddie       28     50         Eddie       30     55
Fran        30     55         Fran        28     50
Gary        32     60         Gary        26     45
Hillary     34     65         Hillary     24     40
Inez        36     70         Inez        22     35
Jim         38     75         Jim         20     30
r = 1.00                      r = 1.00
In Example C, each student’s Math skills score is 2 points higher than the preceding
student’s, and his/her Verbal skills score is 5 points higher.
In Example D, each student’s Math skills score is 2 points lower than the preceding
student’s, and his/her Verbal skills score is 5 points lower.
In all 4 of these examples, both variables (Math and Verbal skills scores) changed in
the same direction (from lowest to highest in Examples A and C; from highest to
lowest in Examples B and D) and in proportionally the same amounts.
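The four perfect-correlation examples can be checked numerically. The slides do not state a formula for r, so the sketch below assumes the standard Pearson product-moment definition (the sum of the products of deviations from the means, divided by the square root of the product of the two sums of squared deviations); the function name pearson_r is hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [xi - mean_x for xi in x]  # deviations of each Math score from its mean
    dy = [yi - mean_y for yi in y]  # deviations of each Verbal score from its mean
    covariation = sum(a * b for a, b in zip(dx, dy))
    return covariation / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


math_up = [20, 22, 24, 26, 28, 30, 32, 34, 36, 38]
verb_by_2 = [30, 32, 34, 36, 38, 40, 42, 44, 46, 48]
verb_by_5 = [30, 35, 40, 45, 50, 55, 60, 65, 70, 75]

# Examples B and D list the same pairs of scores from highest to lowest.
examples = {
    "A": (math_up, verb_by_2),
    "B": (math_up[::-1], verb_by_2[::-1]),
    "C": (math_up, verb_by_5),
    "D": (math_up[::-1], verb_by_5[::-1]),
}
results = {name: pearson_r(m, v) for name, (m, v) in examples.items()}
for name, r in results.items():
    print(f"Example {name}: r = {r:.2f}")
```

All four examples give r = 1.00: the size of the step (2 points versus 5 points) and the order in which the students are listed do not matter, only the consistency of the proportional change.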
Now let’s look at a more realistic example of positive correlation between variables.
Here are more realistic scores on the Math and Verbal skills tests from the
preceding examples.
Example E
Student    Math   Verb
Alice       33     42
Bobby       36     39
Charlie     21     32
Dave        19     29
Eddie       32     40
Fran        25     30
Gary        20     35
Hillary     37     36
Inez        24     31
Jim         29     30
r = 0.69

Example E’
Student    Math   Change      Verb   Change
Alice       33                 42
Bobby       36    Increase     39    Decrease
Charlie     21    Decrease     32    Decrease
Dave        19    Decrease     29    Decrease
Eddie       32    Increase     40    Increase
Fran        25    Decrease     30    Decrease
Gary        20    Decrease     35    Increase
Hillary     37    Increase     36    Increase
Inez        24    Decrease     31    Decrease
Jim         29    Increase     30    Decrease
r = 0.69
In Example E, we see a somewhat typical set of Math and Verbal skills scores for the
same students as in the previous examples. Note that not all the scores change in the
same direction nor by the same proportional amount.
Example E’ is a more analytical look at these sets of scores and how they change from
student to student. In particular:
a) In 3 cases, as a student’s Math score changed in 1 direction compared to the
preceding student, his/her Verbal score changed in the opposite direction;
b) Even in cases where both scores changed in the same direction, they did not change
in proportionally equal amounts.
While the overall correlation is positive, each of these facts (a and b above) weakens
the correlation coefficient from perfect positive to strong (see previous slide).
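Both observations can be reproduced from the Example E scores. The following is a rough sketch, again assuming the standard Pearson formula (which the slides do not spell out); pearson_r and opposite_moves are hypothetical helper names.

```python
from math import sqrt

math_scores = [33, 36, 21, 19, 32, 25, 20, 37, 24, 29]
verb_scores = [42, 39, 32, 29, 40, 30, 35, 36, 31, 30]


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def opposite_moves(x, y):
    """Count student-to-student steps where the two scores move in opposite directions."""
    steps_x = [b - a for a, b in zip(x, x[1:])]
    steps_y = [b - a for a, b in zip(y, y[1:])]
    # A negative product means one score went up while the other went down.
    return sum(1 for sx, sy in zip(steps_x, steps_y) if sx * sy < 0)


r_e = pearson_r(math_scores, verb_scores)
n_opposite = opposite_moves(math_scores, verb_scores)
print(f"r = {r_e:.2f}, opposite-direction steps = {n_opposite}")
# r = 0.69, opposite-direction steps = 3
```

Note that the step count depends on the order in which the students are listed, whereas r itself does not; the count is a teaching aid for reading the table, not part of the calculation of r.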
An example of perfect negative correlation.
Scores on tests of mathematical (Math) and verbal (Verb) skills of 10 high school students.
Example F                     Example G
Student    Math   Verb        Student    Math   Verb
Alice       20     48         Alice       38     30
Bobby       22     46         Bobby       36     32
Charlie     24     44         Charlie     34     34
Dave        26     42         Dave        32     36
Eddie       28     40         Eddie       30     38
Fran        30     38         Fran        28     40
Gary        32     36         Gary        26     42
Hillary     34     34         Hillary     24     44
Inez        36     32         Inez        22     46
Jim         38     30         Jim         20     48
r = -1.00                     r = -1.00
Example H                     Example I
Student    Math   Verb        Student    Math   Verb
Alice       20     75         Alice       38     30
Bobby       22     70         Bobby       36     35
Charlie     24     65         Charlie     34     40
Dave        26     60         Dave        32     45
Eddie       28     55         Eddie       30     50
Fran        30     50         Fran        28     55
Gary        32     45         Gary        26     60
Hillary     34     40         Hillary     24     65
Inez        36     35         Inez        22     70
Jim         38     30         Jim         20     75
r = -1.00                     r = -1.00
In all of these examples, the Math and Verbal scores change in opposite directions,
hence the negative sign before each correlation coefficient. In Example F, as each
Math score increases by 2 points, the corresponding Verbal score decreases by 2
points. In Example G, as each Math score decreases by 2 points, the corresponding
Verbal score increases by 2 points. In Example H, as each Math score increases by
2 points, the corresponding Verbal score decreases by 5 points. In Example I, as
each Math score decreases by 2 points, the corresponding Verbal score increases by
5 points. In other words, as one set of scores increases by a certain amount, the
corresponding set of scores decreases by a consistently proportional amount. It is
the consistency of these proportional changes that makes the correlation perfect.
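To see that it is the consistency of the change, not its size, that makes the correlation perfect, Examples F and H can be compared side by side. This sketch assumes the standard Pearson formula and, purely for illustration, also reports the Verbal points changed per Math point (the least-squares slope, a quantity the slides do not discuss); r_and_slope is a hypothetical name.

```python
from math import sqrt

math_scores = [20, 22, 24, 26, 28, 30, 32, 34, 36, 38]
verb_f = [48, 46, 44, 42, 40, 38, 36, 34, 32, 30]  # Example F: down 2 per step
verb_h = [75, 70, 65, 60, 55, 50, 45, 40, 35, 30]  # Example H: down 5 per step


def r_and_slope(x, y):
    """Return (Pearson r, change in y per 1-point change in x)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy), sxy / sxx


r_f, slope_f = r_and_slope(math_scores, verb_f)
r_h, slope_h = r_and_slope(math_scores, verb_h)
print(f"Example F: r = {r_f:.2f}, Verbal change per Math point = {slope_f:.2f}")
print(f"Example H: r = {r_h:.2f}, Verbal change per Math point = {slope_h:.2f}")
```

The rate of change differs (1 Verbal point per Math point in F, 2.5 in H), yet both coefficients are -1.00.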
Now let’s look at a more realistic example.
Example J
Student    Math   Change      Verb   Change
Alice       33                 30
Bobby       36    Increase     31    Increase
Charlie     21    Decrease     36    Increase
Dave        19    Decrease     35    Decrease
Eddie       32    Increase     30    Decrease
Fran        25    Decrease     40    Increase
Gary        20    Decrease     29    Decrease
Hillary     37    Increase     32    Increase
Inez        24    Decrease     39    Increase
Jim         29    Increase     42    Increase
r = -0.32
Here we see that:
a) in 4 cases the Math and Verbal scores change in opposite directions; and
b) the changes in Math and Verbal scores don’t increase or decrease by the same
or consistent amounts.
As a result, the correlation coefficient is negative and, according to the (informal)
guidelines suggested in a previous slide, it indicates a moderate negative
relationship.
Both this example and Example E from a previous slide represent outcomes you are
most likely to find in your own research.
Now that we have an understanding of the concept of correlation (in particular, the
Pearson product-moment correlation coefficient), we might ask if there’s anything
else we can do with this statistic.
Let us consider this question in the next few slides.
Let’s start with Example A from a previous slide.
Student    Math   Verb
Alice       20     30
Bobby       22     32
Charlie     24     34
Dave        26     36
Eddie       28     38
Fran        30     40
Gary        32     42
Hillary     34     44
Inez        36     46
Jim         38     48
r = 1.00

We can see that, for this group, as their Math scores change, so
do their Verbal scores. The changeability of these sets of scores
is referred to as variation. We find the correlation coefficient to
determine whether there is some relationship or connection between
the variation in 1 set of scores and the variation in the other
set of scores. The correlation coefficient summarizes the
strength and direction of the relationship between these
variations. However, there is another question for which the
correlation coefficient does not directly provide an answer. That
question might be phrased this way: how much of the variation in 1 set of scores is
related or connected to the variation in the other set of scores? In the present case,
we might ask how much of the variation in these students’ Verbal scores is related to
the variation in their Math scores. We could answer “all of it”, “a lot of it”, “some of it”,
“a little of it”, or “none of it”. These verbal answers may (or may not) be correct, but
they are not precise enough for a statistical analysis. To begin to answer our question,
let us phrase it this way: What percentage of the variation in 1 set of scores is related
or connected to the variation in the other set of scores? The term “percentage” calls
for a numerical answer and the correlation coefficient provides a basis for our
answer.
Here is another presentation of Example C from a previous slide.
Student    Math   Change      Verb   Change
Alice       20                 30
Bobby       22    Up 2         35    Up 5
Charlie     24    Up 2         40    Up 5
Dave        26    Up 2         45    Up 5
Eddie       28    Up 2         50    Up 5
Fran        30    Up 2         55    Up 5
Gary        32    Up 2         60    Up 5
Hillary     34    Up 2         65    Up 5
Inez        36    Up 2         70    Up 5
Jim         38    Up 2         75    Up 5
r = 1.00
(On the slide, the two Change columns are linked by arrows: VARIATION <--> VARIATION.)
This presentation shows that every time a Verbal score increases by a certain amount,
the corresponding Math score also increases by a consistent amount. This is another
way of saying that 100 percent of the variation in Verbal scores is related or connected
to variation in Math scores. [Pause here briefly and convince yourselves that this
statement is correct.]
How did we arrive at this 100 percent figure? Actually, it was relatively simple. We
took the correlation coefficient (r = 1.00), squared it (r² = 1.00), and multiplied this
number by 100. (You don’t have to worry about why we squared r; just know that
it is a necessary step.)
R-squared (r²) is called the COEFFICIENT OF DETERMINATION and, when we
multiply it by 100, it gives the percentage of variation in one set of quantitative scores
that is associated with or related to the variation in another set of quantitative scores.
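The two steps just described (square r, then multiply by 100) can be written as a one-line function. This is only a sketch; the name percent_variation_shared is hypothetical.

```python
def percent_variation_shared(r: float) -> float:
    """Coefficient of determination as a percentage: r squared, times 100."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1.00 and 1.00")
    return r * r * 100  # squaring removes the sign, so the result is never negative


print(percent_variation_shared(1.00))   # 100.0
print(percent_variation_shared(-1.00))  # 100.0
print(percent_variation_shared(0.50))   # 25.0
```

Perfect positive and perfect negative correlations both give 100 percent, while a coefficient of 0.50 gives only 25 percent, so the percentage falls off faster than r itself.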
A more realistic application of this statistic follows on the next slide.
Here is Example E from a previous slide.
Student    Math   Verb
Alice       33     42
Bobby       36     39
Charlie     21     32
Dave        19     29
Eddie       32     40
Fran        25     30
Gary        20     35
Hillary     37     36
Inez        24     31
Jim         29     30
r = 0.69
In this more realistic example of positive correlation between these sets of scores, we
quickly find the coefficient of determination.
Coefficient of determination = r² × 100 = 0.69² × 100 = 0.4761 × 100 = 47.61%
In this illustration, 47.61% of the variation in these students’ Verbal scores is related to
the variation in their Math scores.
It is important to point out that we can also conclude that 47.61% of the variation in the
students’ Math scores is related to the variation in their Verbal scores. While the 2
sets of scores may be related, we are not justified in concluding that Math ability is
causally related to Verbal ability or that Verbal ability is causally related to Math
ability. We may have a common sense belief that because people are good at math,
they are also good verbally. (And, conversely, if they are bad in 1, they are bad in the
other.) But, we all know people who may be good at one, but not the other. And, in
any event, we must caution you that correlation is not the same as causation.
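The statement that the percentage is the same in both directions follows from the fact that r does not depend on which variable is listed first. The sketch below checks this for the Example E scores, assuming the standard Pearson formula (not given in the slides); pearson_r is a hypothetical helper. It also shows that the 47.61% figure comes from squaring the rounded coefficient 0.69.

```python
from math import sqrt

math_scores = [33, 36, 21, 19, 32, 25, 20, 37, 24, 29]
verb_scores = [42, 39, 32, 29, 40, 30, 35, 36, 31, 30]


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


r_mv = pearson_r(math_scores, verb_scores)  # Math listed first
r_vm = pearson_r(verb_scores, math_scores)  # Verbal listed first

pct_from_rounded = round(r_mv, 2) ** 2 * 100  # the slide's arithmetic, starting from 0.69
pct_from_unrounded = r_mv ** 2 * 100          # same formula without rounding r first

print(f"r(Math, Verb) = {r_mv:.4f}, r(Verb, Math) = {r_vm:.4f}")
print(f"From rounded r:   {pct_from_rounded:.2f}%")
print(f"From unrounded r: {pct_from_unrounded:.2f}%")
```

Swapping the two lists leaves r unchanged, which is why the calculation cannot say which variable, if either, influences the other. Carrying the unrounded r through the calculation gives a slightly smaller percentage (about 47.2%) than the 47.61% obtained from the rounded value.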
To show a final property of the coefficient of determination, we need to re-examine
Example J from a previous slide.
Student    Math   Verb
Alice       33     30
Bobby       36     31
Charlie     21     36
Dave        19     35
Eddie       32     30
Fran        25     40
Gary        20     29
Hillary     37     32
Inez        24     39
Jim         29     42
r = -0.32
Here the correlation coefficient is -0.32, indicating a negative relationship between
these scores. Is the coefficient of determination applicable here? Yes, and the
procedure for finding it is the same as in the previous example.
Coefficient of determination = r² × 100 = (-0.3156)² × 100 = 0.0996 × 100 = 9.96%
(Here r is carried at its unrounded value, -0.3156, which rounds to the -0.32 shown
above; squaring the rounded value -0.32 instead gives 0.1024, or 10.24%.)
(Remember that when we square a (real) number, whether the number is positive or
negative, the result is never negative.)
We can conclude that even though the association between the variation in Math and
Verbal scores is negative, 9.96% (approx. 10%) of the variation in Math scores is
associated with variation in Verbal scores.
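The Example J figures can be reproduced as follows. As before, this is a sketch that assumes the standard Pearson formula, with pearson_r as a hypothetical helper name.

```python
from math import sqrt

math_scores = [33, 36, 21, 19, 32, 25, 20, 37, 24, 29]
verb_scores = [30, 31, 36, 35, 30, 40, 29, 32, 39, 42]


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


r_j = pearson_r(math_scores, verb_scores)
pct_unrounded = r_j ** 2 * 100              # r is a variable here, so the whole value is squared
pct_rounded = round(r_j, 2) ** 2 * 100      # squaring the rounded coefficient instead

print(f"r = {r_j:.4f}")
print(f"Percentage from unrounded r: {pct_unrounded:.2f}%")
print(f"Percentage from rounded r:   {pct_rounded:.2f}%")
```

One caution when typing the calculation by hand: in Python the literal expression -0.32 ** 2 evaluates to -0.1024, because the exponent is applied before the minus sign. Writing (-0.32) ** 2, or squaring a variable as above, gives the intended positive result.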