How to calculate correlation

Now you have an idea what correlation r is: it’s a number between -1 and 1 that measures the strength of
the linear relationship between two variables. But how is it computed? The procedure is based on
deviations. If the correlation is close to 1, then a high deviation of x should correspond with a high
deviation of y. The following example shows the steps in calculating correlation.
Example: Is there a relationship between blood alcohol level and response time? To find out, researchers
used a breathalyzer to measure each individual’s blood alcohol level, then gave each person a test to
measure how long it took to push a button after hearing a beep. Let x = blood alcohol level and let
y = response time. Compute the correlation. Below is the data:
Subject    Blood Alcohol Level (x)    Response Time (y)
A          0.03                       0.9
B          0.06                       1.3
C          0.08                       1.9
D          0.09                       2.2
Here is a scatterplot of the data:
From the scatterplot, it looks like a strong positive linear relationship, so we suspect the correlation will
be close to 1.
Here is how to compute it:
Step 1: Find the mean and standard deviation of x and y. This step was done for you. Usually, use a
calculator for this part to save time.
x̄ = 0.065
ȳ = 1.575
sx = 0.0265 (rounded to 4 decimal places)
sy = 0.5852 (rounded to 4 decimal places)
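If you would rather check Step 1 with software than with a calculator, the short Python sketch below is one way to do it. It assumes Python's standard statistics module, and the variable names (x_bar, s_x, and so on) are chosen here only for illustration. Note that statistics.stdev divides by n – 1, which matches the standard deviations given above.

```python
import statistics

# Data from the example: blood alcohol level (x) and response time (y)
x = [0.03, 0.06, 0.08, 0.09]
y = [0.9, 1.3, 1.9, 2.2]

x_bar = statistics.mean(x)   # mean of x
y_bar = statistics.mean(y)   # mean of y
s_x = statistics.stdev(x)    # sample standard deviation of x (divides by n - 1)
s_y = statistics.stdev(y)    # sample standard deviation of y (divides by n - 1)

# Round to 4 decimal places, as in the lesson
print(round(x_bar, 4), round(y_bar, 4))
print(round(s_x, 4), round(s_y, 4))
```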
Step 2: Make a table with five columns. The headings and the calculations for the first row are done for
you. Complete the rest of the table yourself:
The first column is for the values of x
The second column is for the values of y
The third column is for the deviations of x (in other words, x − x̄)
The fourth column is for the deviations of y (in other words, y − ȳ)
The fifth column is for multiplying the deviations
Blood Alcohol    Response     x − x̄      y − ȳ      (x − x̄)(y − ȳ)
Level (x)        Time (y)
0.03             0.9          -0.035     -0.675     0.0236 (rounded to 4 decimal places)
0.06             1.3
0.08             1.9
0.09             2.2

Deviations of x go in the third column, and deviations of y go in the fourth column. Multiply the
deviations together, and put the results in the last column.
Step 3: Find the total of all the numbers in the last column. The formula for this step is ∑(x − x̄)(y − ȳ)
Step 4: Divide the total by sx (the standard deviation of x), then divide by sy (the standard deviation of y),
and then divide by n – 1. In this case n, the number of individuals, is 4, so n – 1 = 3.
The answer should be about r = 0.98 (rounded to 2 decimal places).
The four steps described above can be summarized by this formula:
$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(s_x)(s_y)(n - 1)} $$
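The four steps can also be sketched in a few lines of Python. This is only an illustration, assuming the standard statistics module and the data from the example; the variable names (products, total, r) are made up for this sketch and follow the steps in order.

```python
import statistics

# Data from the example: blood alcohol level (x) and response time (y)
x = [0.03, 0.06, 0.08, 0.09]
y = [0.9, 1.3, 1.9, 2.2]
n = len(x)

# Step 1: means and sample standard deviations
x_bar, y_bar = statistics.mean(x), statistics.mean(y)
s_x, s_y = statistics.stdev(x), statistics.stdev(y)

# Step 2: multiply each deviation of x by the matching deviation of y
products = [(xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)]

# Step 3: total of the products
total = sum(products)

# Step 4: divide by s_x, then by s_y, then by n - 1
r = total / s_x / s_y / (n - 1)

print(round(r, 2))
```

Because the code keeps full precision until the end, its intermediate values may differ slightly in the last decimal place from a table filled in with rounded numbers.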
Why does this procedure work? Let’s look again at the original data and the scatterplot.
Subject    Blood Alcohol Level (x)    Response Time (y)
A          0.03                       0.9
B          0.06                       1.3
C          0.08                       1.9
D          0.09                       2.2
Notice subject D has a blood-alcohol level that is quite high compared to the others. In statistical
terms, you would say subject D has a high deviation. Notice his or her response time is also high.
In other words, the response time for subject D has a high deviation as well. This brings us to an
important concept about correlation:
When r is close to 1, then (in terms of deviations) large values of x correspond to large values of
y. When r is close to 0, then large values of x do not necessarily correspond with large values of
y.
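Multiplying the deviations together gives us a way to combine them: when a high deviation of x is
paired with a high deviation of y, the product is positive, and the same is true when two low
(negative) deviations are paired. Then, when the total is divided by the standard deviations and by
n – 1, the result is a number close to 1.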
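After you have completed the table yourself, a small Python sketch like the one below can help you see this idea in the blood alcohol data. It is only an illustration, assuming the standard statistics module; the names rows and all_positive are invented for this sketch. It prints each subject's two deviations and their product, then checks whether every product is positive.

```python
import statistics

subjects = ["A", "B", "C", "D"]
x = [0.03, 0.06, 0.08, 0.09]   # blood alcohol level
y = [0.9, 1.3, 1.9, 2.2]       # response time

x_bar = statistics.mean(x)
y_bar = statistics.mean(y)

rows = []
for name, xi, yi in zip(subjects, x, y):
    dx = xi - x_bar            # deviation of x
    dy = yi - y_bar            # deviation of y
    rows.append((name, dx, dy, dx * dy))
    print(f"{name}: x dev = {dx:+.3f}, y dev = {dy:+.3f}, product = {dx * dy:+.4f}")

# True when each deviation of x has the same sign as its deviation of y
all_positive = all(product > 0 for _, _, _, product in rows)
print(all_positive)
```

When every product is positive, the total in Step 3 is positive, so r is positive as well.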
Question 1: Scan the following data. Without doing any calculations, should r be negative or
positive? Why?
name of cereal           sugar per serving (grams)    Consumer Reports rating
Bran_Flakes              5                            53
Cap'n'Crunch             12                           18
Cheerios                 1                            51
Cinnamon_Toast_Crunch    9                            20
Clusters                 7                            40
Question 2: Use the 4 steps to calculate the correlation of this data. How are the deviations related
in this example?
Take It Home
Name_______________________________
Question 1: Calculate the correlation of the following data. Follow the 4 steps as shown in the
lesson.
x    23     42     51     60     96
y    2.3    6.5    9.8    11.5   10.1
Question 2: Calculate the correlation of the following data. Follow the 4 steps as shown in the
lesson.
x    120    254    169
y    32     24     13
Question 3: Calculate the correlation of the following data. Follow the 4 steps as shown in the
lesson.
x    3.2    5.9    9.8    10.5   12.9
y    65     62     41     36     35
Answers: 1) r = 0.748
2) r = -0.275
3) r = -0.963
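If you want to check your hand calculations against these answers, one possible approach is the Python sketch below. It wraps the lesson's 4 steps in a helper function; the function name correlation and the dictionary take_home are hypothetical names used only here, and the code assumes the standard statistics module.

```python
import statistics

def correlation(xs, ys):
    """Correlation r computed with the lesson's 4 steps (illustrative helper)."""
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)      # Step 1: means
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)        # Step 1: standard deviations
    total = sum((x - x_bar) * (y - y_bar)                        # Steps 2 and 3
                for x, y in zip(xs, ys))
    return total / (s_x * s_y * (len(xs) - 1))                   # Step 4

# Take-home data, keyed by question number
take_home = {
    1: ([23, 42, 51, 60, 96], [2.3, 6.5, 9.8, 11.5, 10.1]),
    2: ([120, 254, 169], [32, 24, 13]),
    3: ([3.2, 5.9, 9.8, 10.5, 12.9], [65, 62, 41, 36, 35]),
}

# Round to 3 decimal places, as in the answer key
results = {q: round(correlation(xs, ys), 3) for q, (xs, ys) in take_home.items()}
print(results)
```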