Correlation

Until now, we have been concerned with describing the central tendency and/or variability of 1 variable at a time. Correlation provides a description of the association or relationship between 2 or more variables. Correlation procedures describe how changes in 1 variable are associated with changes in a 2nd, 3rd, or further variable. The statistic is called the correlation coefficient, and it reduces what can be complex relationships between variables to 1 number that describes both the strength (strong, weak, or none) and the direction (positive or negative) of the association between the variables.

Correlation coefficients have the following properties (which will be depicted in a follow-up slide).

1) Correlation coefficients may have positive values, indicating that the variables change in the same direction; negative values, indicating that the variables change in opposite directions; or zero, indicating no association between the variables.

2) Correlation coefficients may range from -1.00, indicating perfect negative correlation (as 1 variable changes by a given amount in one direction, the other variable changes by a proportional amount in the opposite direction), through zero, indicating that there is no discernible relationship between the variables, to 1.00, indicating perfect positive correlation (as 1 variable changes by a given amount in one direction, the other variable changes by a proportional amount in the same direction).

The sample correlation coefficient is symbolized by the letter r. Because r can equal either endpoint (perfect correlation), we state the limits of the correlation coefficient as:

-1.00 ≤ r ≤ 1.00

The following graphic depicts how we interpret the correlation coefficient.
    NEGATIVE CORRELATION           NO CORRELATION           POSITIVE CORRELATION
    <<< Stronger                                                    Stronger >>>
    -1.00 ___________ -0.50 _____________ 0 _____________ 0.50 _____________ 1.00
                      Weaker >>>                   <<< Weaker

As this graphic suggests, the further away from zero the correlation coefficient is in either direction, the stronger the correlation between the variables. Conversely, the closer the correlation coefficient is to zero, the weaker the correlation between the variables.

Here are some arbitrary guidelines (emphasis on arbitrary) for interpreting the correlation coefficient.

    If r =
    +.70 or higher     Very strong positive relationship
    +.40 to +.69       Strong positive relationship
    +.30 to +.39       Moderate positive relationship
    +.20 to +.29       Weak positive relationship
    +.01 to +.19       No or negligible relationship
    -.01 to -.19       No or negligible relationship
    -.20 to -.29       Weak negative relationship
    -.30 to -.39       Moderate negative relationship
    -.40 to -.69       Strong negative relationship
    -.70 or lower      Very strong negative relationship

An example of perfect positive correlation. Scores on tests of mathematical (Math) and verbal (Verb) skills of 10 high school students.

                Example A        Example B
    Student     Math  Verb       Math  Verb
    Alice        20    30         38    48
    Bobby        22    32         36    46
    Charlie      24    34         34    44
    Dave         26    36         32    42
    Eddie        28    38         30    40
    Fran         30    40         28    38
    Gary         32    42         26    36
    Hillary      34    44         24    34
    Inez         36    46         22    32
    Jim          38    48         20    30
    r =            1.00             1.00

In Example A, note that as the Math skills score increases by 2 points from one student to the next, the Verbal skills score also increases by 2 points. In Example B, note that as the Math skills score decreases by 2 points from one student to the next, the Verbal skills score also decreases by 2 points. Before commenting further, let us consider another example.
                Example C        Example D
    Student     Math  Verb       Math  Verb
    Alice        20    30         38    75
    Bobby        22    35         36    70
    Charlie      24    40         34    65
    Dave         26    45         32    60
    Eddie        28    50         30    55
    Fran         30    55         28    50
    Gary         32    60         26    45
    Hillary      34    65         24    40
    Inez         36    70         22    35
    Jim          38    75         20    30
    r =            1.00             1.00

In Example C here, as the Math skills score increases by 2 points from one student to the next, the Verbal skills score increases by 5 points. In Example D here, as the Math skills score decreases by 2 points from one student to the next, the Verbal skills score decreases by 5 points.

In all 4 of these examples, both variables (Math and Verbal skills scores) changed in the same direction (from lowest to highest in Examples A and C; from highest to lowest in Examples B and D) and in proportionally the same amounts.

Now let’s look at a more realistic example of positive correlation between variables. Here are more realistic scores on the Math and Verbal skills tests from the preceding examples.

    Example E
    Student     Math  Verb
    Alice        33    42
    Bobby        36    39
    Charlie      21    32
    Dave         19    29
    Eddie        32    40
    Fran         25    30
    Gary         20    35
    Hillary      37    36
    Inez         24    31
    Jim          29    30
    r = 0.69

    Example E’
    Student     Math  Change       Verb  Change
    Alice        33                 42
    Bobby        36   Increase      39   Decrease
    Charlie      21   Decrease      32   Decrease
    Dave         19   Decrease      29   Decrease
    Eddie        32   Increase      40   Increase
    Fran         25   Decrease      30   Decrease
    Gary         20   Decrease      35   Increase
    Hillary      37   Increase      36   Increase
    Inez         24   Decrease      31   Decrease
    Jim          29   Increase      30   Decrease
    r = 0.69

In Example E, we see a somewhat typical set of Math and Verbal skills scores for the same students as in the previous examples. Note that not all the scores change in the same direction nor by the same proportional amount. Example E’ is a more analytical look at these sets of scores and how they change from student to student.
In particular:

a) In 3 cases (Bobby, Gary, and Jim), as a student’s Math score changed in 1 direction compared to the preceding student, his/her Verbal score changed in the opposite direction;

b) Even in cases where both scores changed in the same direction, they did not change in proportionally equal amounts.

While the overall correlation is positive, each of these facts (a and b above) weakens the correlation coefficient from perfect positive to strong (see previous slide).

An example of perfect negative correlation. Scores on tests of mathematical (Math) and verbal (Verb) skills of 10 high school students.

                Example F        Example G
    Student     Math  Verb       Math  Verb
    Alice        20    48         38    30
    Bobby        22    46         36    32
    Charlie      24    44         34    34
    Dave         26    42         32    36
    Eddie        28    40         30    38
    Fran         30    38         28    40
    Gary         32    36         26    42
    Hillary      34    34         24    44
    Inez         36    32         22    46
    Jim          38    30         20    48
    r =           -1.00            -1.00

                Example H        Example I
    Student     Math  Verb       Math  Verb
    Alice        20    75         38    30
    Bobby        22    70         36    35
    Charlie      24    65         34    40
    Dave         26    60         32    45
    Eddie        28    55         30    50
    Fran         30    50         28    55
    Gary         32    45         26    60
    Hillary      34    40         24    65
    Inez         36    35         22    70
    Jim          38    30         20    75
    r =           -1.00            -1.00

In all of these examples, the Math and Verbal scores change in opposite directions, hence the negative sign before each correlation coefficient. In Example F, as each Math score increases by 2 points, the corresponding Verbal score decreases by 2 points. In Example G, as each Math score decreases by 2 points, the corresponding Verbal score increases by 2 points. In Example H, as each Math score increases by 2 points, the corresponding Verbal score decreases by 5 points. In Example I, as each Math score decreases by 2 points, the corresponding Verbal score increases by 5 points. In other words, as one set of scores increases by a certain amount, the corresponding set of scores decreases by a consistently proportional amount.
It is the consistency of these proportional changes that makes the correlation perfect. Now let’s look at a more realistic example.

    Example J
    Student     Math  Change       Verb  Change
    Alice        33                 30
    Bobby        36   Increase      31   Increase
    Charlie      21   Decrease      36   Increase
    Dave         19   Decrease      35   Decrease
    Eddie        32   Increase      30   Decrease
    Fran         25   Decrease      40   Increase
    Gary         20   Decrease      29   Decrease
    Hillary      37   Increase      32   Increase
    Inez         24   Decrease      39   Increase
    Jim          29   Increase      42   Increase
    r = -0.32

Here we see that: a) in 4 cases (Charlie, Eddie, Fran, and Inez) the Math and Verbal scores change in opposite directions; and b) the changes in Math and Verbal scores don’t increase or decrease by the same or consistent amounts. As a result, the correlation coefficient is negative and, according to the (informal) guidelines suggested in a previous slide, it is moderate. Both this example and Example E from a previous slide represent outcomes you are most likely to find in your own research.

Now that we have an understanding of the concept of correlation (in particular, the Pearson product-moment correlation coefficient), we might ask if there’s anything else we can do with this statistic. Let us consider this question in the next few slides.

Let’s start with Example A from a previous slide.

    Student     Math  Verb
    Alice        20    30
    Bobby        22    32
    Charlie      24    34
    Dave         26    36
    Eddie        28    38
    Fran         30    40
    Gary         32    42
    Hillary      34    44
    Inez         36    46
    Jim          38    48
    r = 1.00

We can see that, for this group, as their Math scores change, so do their Verbal scores. The changeability of these sets of scores is referred to as variation. We find the correlation coefficient to determine if there is some relationship or connection between the variation in 1 set of scores and the variation in the other set of scores. The correlation coefficient summarizes the strength and direction of the relationship between these variations.

However, there is another question for which the correlation coefficient does not directly provide an answer. That question might be phrased this way: how much of the variation in 1 set of scores is related or connected to the variation in the other set of scores? In the present case, we might ask how much of the variation in these students’ Verbal scores is related to the variation in their Math scores. We could answer “all of it”, “a lot of it”, “some of it”, “a little of it”, or “none of it”. These verbal answers may (or may not) be correct, but they are not precise enough for a statistical analysis.

To begin to answer our question, let us phrase it this way: What percentage of the variation in 1 set of scores is related or connected to the variation in the other set of scores? The term “percentage” calls for a numerical answer, and the correlation coefficient provides a basis for our answer.

Here is another presentation of Example C from a previous slide.

                VARIATION      <-->     VARIATION
    Student     Math  Change            Verb  Change
    Alice        20                      30
    Bobby        22   Up 2      <-->     35   Up 5
    Charlie      24   Up 2      <-->     40   Up 5
    Dave         26   Up 2      <-->     45   Up 5
    Eddie        28   Up 2      <-->     50   Up 5
    Fran         30   Up 2      <-->     55   Up 5
    Gary         32   Up 2      <-->     60   Up 5
    Hillary      34   Up 2      <-->     65   Up 5
    Inez         36   Up 2      <-->     70   Up 5
    Jim          38   Up 2      <-->     75   Up 5
    r = 1.00

This presentation shows that every time a Verbal score increases by a certain amount, the corresponding Math score also increases by a consistent amount. This is another way of saying that 100 percent of the variation in Verbal scores is related or connected to variation in Math scores. [Pause here briefly and convince yourselves that this statement is correct.]

How did we arrive at this 100 percent figure? Actually, it was relatively simple. We took the correlation coefficient (r = 1.00), squared it (r² = 1.00), and multiplied this number by 100. (You don’t have to worry about why we squared r; just know that it is a necessary step.)
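The squaring step just described can be tried out in a few lines of Python. This is only a minimal sketch and not part of the original slides: the function names pearson_r and percent_related_variation are hypothetical, and the formula inside pearson_r is the standard Pearson product-moment definition, which the slides name but do not spell out. The data are the Example C scores.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of each pair's deviations from the two means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def percent_related_variation(r):
    """Square r and multiply by 100, as described on the slide."""
    return r ** 2 * 100


# Example C: Math rises by 2 and Verbal by 5 from each student to the next
math_c = [20, 22, 24, 26, 28, 30, 32, 34, 36, 38]
verb_c = [30, 35, 40, 45, 50, 55, 60, 65, 70, 75]

r_c = pearson_r(math_c, verb_c)
pct_c = percent_related_variation(r_c)
print(f"r = {r_c:.2f}, related variation = {pct_c:.2f}%")
```

Swapping in the Example A scores should give the same result, since those scores also change by consistently proportional amounts.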
R-squared (r²) is called the COEFFICIENT OF DETERMINATION and, when we multiply it by 100, it gives the percentage of variation in one set of quantitative scores that is associated with or related to the variation in another set of quantitative scores. A more realistic application of this statistic follows on the next slide.

Here is Example E from a previous slide.

    Student     Math  Verb
    Alice        33    42
    Bobby        36    39
    Charlie      21    32
    Dave         19    29
    Eddie        32    40
    Fran         25    30
    Gary         20    35
    Hillary      37    36
    Inez         24    31
    Jim          29    30
    r = 0.69

In this more realistic example of positive correlation between these sets of scores, we quickly find the coefficient of determination.

Coefficient of determination = r² × 100 = (0.69)² × 100 = 0.4761 × 100 = 47.61%

In this illustration, 47.61% of the variation in these students’ Verbal scores is related to the variation in their Math scores. It is important to point out that we can also conclude that 47.61% of the variation in the students’ Math scores is related to the variation in their Verbal scores.

While the 2 sets of scores may be related, we are not justified in concluding that Math ability is causally related to Verbal ability or that Verbal ability is causally related to Math ability. We may have a common sense belief that because people are good at math, they are also good verbally. (And, conversely, if they are bad in 1, they are bad in the other.) But we all know people who may be good at one, but not the other. And, in any event, we must caution you that correlation is not the same as causation.

To show a final property of the coefficient of determination, we need to re-examine Example J from a previous slide.

    Student     Math  Verb
    Alice        33    30
    Bobby        36    31
    Charlie      21    36
    Dave         19    35
    Eddie        32    30
    Fran         25    40
    Gary         20    29
    Hillary      37    32
    Inez         24    39
    Jim          29    42
    r = -0.32

Here the correlation coefficient is -0.32, indicating a negative relationship between these scores. Is the coefficient of determination applicable here?
Yes, and the procedure for finding it is the same as in the previous example.

Coefficient of determination = r² × 100 = (-0.32)² × 100 = 0.1024 × 100 = 10.24%

(Remember that when we square a (real) number, whether the number is positive or negative, the result is never negative.)

We can conclude that even though the association between the variation in Math and Verbal scores is negative, 10.24% (approx. 10%) of the variation in Math scores is associated with variation in Verbal scores.
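The calculations for Examples E and J can be checked with a short Python sketch. It is an illustration under stated assumptions rather than part of the original slides: the function names are hypothetical, pearson_r uses the standard Pearson product-moment formula (which the slides name but do not write out), and guideline_label encodes the slides' own arbitrary guidelines, treating an r of exactly 0 as negligible. The sketch rounds r to two decimals before squaring, as the slides do for Example E; squaring the unrounded r instead gives slightly different percentages (roughly 47.2% for Example E and 9.96% for Example J).

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def coefficient_of_determination_pct(r):
    """r squared times 100; the sign of r drops out when it is squared."""
    return r ** 2 * 100


def guideline_label(r):
    """Informal label from the slides' arbitrary guidelines (r to 2 decimals)."""
    size = abs(r)
    if size >= 0.70:
        strength = "very strong"
    elif size >= 0.40:
        strength = "strong"
    elif size >= 0.30:
        strength = "moderate"
    elif size >= 0.20:
        strength = "weak"
    else:
        # Assumption: r = 0 is grouped with the negligible band
        return "no or negligible relationship"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} relationship"


# Scores copied from Examples E and J (same Math scores in both)
math_scores = [33, 36, 21, 19, 32, 25, 20, 37, 24, 29]
verb_e = [42, 39, 32, 29, 40, 30, 35, 36, 31, 30]
verb_j = [30, 31, 36, 35, 30, 40, 29, 32, 39, 42]

# Round r to two decimals first, mirroring the slides' arithmetic
r_e = round(pearson_r(math_scores, verb_e), 2)
r_j = round(pearson_r(math_scores, verb_j), 2)

for name, r in (("Example E", r_e), ("Example J", r_j)):
    pct = coefficient_of_determination_pct(r)
    print(f"{name}: r = {r:.2f}, {pct:.2f}% of variation, {guideline_label(r)}")
```

Because pearson_r treats its two arguments symmetrically, swapping Math and Verbal gives the same r, which is why the percentage can be read in either direction.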