How to calculate correlation

Now you have an idea of what the correlation r is: it's a number between -1 and 1 that measures the strength of the linear relationship between two variables. But how is it computed? The procedure is based on deviations. If the correlation is close to 1, then a high deviation of x should correspond to a high deviation of y. The following example shows the steps in calculating correlation.

Example: Is there a relationship between blood alcohol level and response time? To find out, researchers used a breathalyzer to measure individuals' blood alcohol level, then gave them a test to measure how long it took to push a button after hearing a beep. Let x = blood alcohol level and let y = response time. Compute the correlation. Below is the data:

| Subject | Blood Alcohol Level (x) | Response Time (y) |
|---------|-------------------------|-------------------|
| A       | .03                     | 0.9               |
| B       | .06                     | 1.3               |
| C       | .08                     | 1.9               |
| D       | .09                     | 2.2               |

[Scatterplot of the data: response time (y) against blood alcohol level (x)]

From the scatterplot, it looks like a strong positive linear relationship, so we suspect the correlation will be close to 1. Here is how to compute it:

Step 1: Find the mean and standard deviation of x and y. This step was done for you. Usually, use a calculator for this part to save time.

$\bar{x} = 0.065$
$\bar{y} = 1.575$
$s_x = 0.0265$ (rounded to 4 decimal places)
$s_y = 0.5852$ (rounded to 4 decimal places)

Step 2: Make a table with five columns. The headings and the calculations for the first row are done for you.
Complete the rest of the table yourself:

The first column is for the values of x.
The second column is for the values of y.
The third column is for the deviations of x (in other words, $x - \bar{x}$).
The fourth column is for the deviations of y (in other words, $y - \bar{y}$).
The fifth column is for multiplying the deviations together (in other words, $(x - \bar{x})(y - \bar{y})$).

| Blood Alcohol Level (x) | Response Time (y) | $x - \bar{x}$ | $y - \bar{y}$ | $(x - \bar{x})(y - \bar{y})$      |
|-------------------------|-------------------|---------------|---------------|-----------------------------------|
| .03                     | 0.9               | -0.035        | -0.675        | 0.0236 (rounded to 4 decimal places) |
| .06                     | 1.3               |               |               |                                   |
| .08                     | 1.9               |               |               |                                   |
| .09                     | 2.2               |               |               |                                   |

Step 3: Find the total of all the numbers in the last column. The formula for this step is $\sum (x - \bar{x})(y - \bar{y})$.

Step 4: Divide by $s_x$ (the standard deviation of x), then divide by $s_y$ (the standard deviation of y), and then divide by $n - 1$. In this case n is the number of individuals, which is 4, so $n - 1 = 3$. The answer should be about r = 0.98 (rounded to 2 decimal places).

The four steps described above can be summarized by this formula:

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(s_x)(s_y)(n - 1)}$$

Why does this procedure work? Let's look again at the original data and the scatterplot.

| Subject | Blood Alcohol Level (x) | Response Time (y) |
|---------|-------------------------|-------------------|
| A       | .03                     | 0.9               |
| B       | .06                     | 1.3               |
| C       | .08                     | 1.9               |
| D       | .09                     | 2.2               |

Notice that subject D has a blood alcohol level that is quite high compared to the others. In statistical terms, you would say subject D has a high deviation. Notice that his or her response time is also high. In other words, the response time for subject D has a high deviation as well. This brings us to an important concept about correlation:

When r is close to 1, then (in terms of deviations) large values of x correspond to large values of y.
When r is close to 0, then large values of x do not necessarily correspond to large values of y.

Multiplying the deviations together gives us a way to combine them: when high deviations of x are paired with high deviations of y (and low with low), the products are positive and their total is large. Dividing that total by the standard deviations and by $n - 1$ then rescales it, so that for data like these the result is close to 1.
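The four steps can also be checked with a short program. The following is a minimal sketch in Python, not part of the original lesson; the function name `correlation` and the variable names are chosen here for illustration. It assumes the sample standard deviation (the one that divides by $n - 1$), which is what `statistics.stdev` computes.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Compute r by following the four steps of the lesson."""
    n = len(xs)

    # Step 1: mean and (sample) standard deviation of x and y.
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)

    # Step 2: multiply each deviation of x by the matching deviation of y.
    products = [(x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)]

    # Step 3: total of the last column of the table.
    total = sum(products)

    # Step 4: divide by s_x, then by s_y, then by n - 1.
    return total / s_x / s_y / (n - 1)


blood_alcohol = [0.03, 0.06, 0.08, 0.09]
response_time = [0.9, 1.3, 1.9, 2.2]

print(round(correlation(blood_alcohol, response_time), 2))  # prints 0.98
```

The program keeps full precision in Step 1 instead of the rounded values 0.0265 and 0.5852, so intermediate numbers may differ slightly from a hand calculation, but the result still rounds to 0.98.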
Question 1: Scan the following data. Without doing any calculations, should r be negative or positive? Why?

| Name of cereal        | Sugar per serving (grams) | Consumer Reports rating |
|-----------------------|---------------------------|-------------------------|
| Bran_Flakes           | 5                         | 53                      |
| Cap'n'Crunch          | 12                        | 18                      |
| Cheerios              | 1                         | 51                      |
| Cinnamon_Toast_Crunch | 9                         | 20                      |
| Clusters              | 7                         | 40                      |

Question 2: Use the 4 steps to calculate the correlation of this data. How are the deviations related in this example?

Take It Home

Name_______________________________

Question 1: Calculate the correlation of the following data. Follow the 4 steps as shown in the lesson.

| x | 23  | 42  | 51  | 60   | 96   |
|---|-----|-----|-----|------|------|
| y | 2.3 | 6.5 | 9.8 | 11.5 | 10.1 |

Question 2: Calculate the correlation of the following data. Follow the 4 steps as shown in the lesson.

| x | 120 | 254 | 169 |
|---|-----|-----|-----|
| y | 32  | 24  | 13  |

Question 3: Calculate the correlation of the following data. Follow the 4 steps as shown in the lesson.

| x | 3.2 | 5.9 | 9.8 | 10.5 | 12.9 |
|---|-----|-----|-----|------|------|
| y | 65  | 62  | 41  | 36   | 35   |

Answers:
1) r = 0.748
2) r = -0.275
3) r = -0.963
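After working the take-home questions by hand, the answers can be checked with a short program. This is only a sketch in Python, not part of the original worksheet; the name `four_step_r` and the dictionary `take_home` are hypothetical, introduced here for illustration. It assumes the sample standard deviation, which is what `statistics.stdev` computes.

```python
from statistics import mean, stdev


def four_step_r(xs, ys):
    """Total of the deviation products, divided by s_x, s_y and n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return total / (stdev(xs) * stdev(ys) * (len(xs) - 1))


# Take-home data, copied from the three questions.
take_home = {
    1: ([23, 42, 51, 60, 96], [2.3, 6.5, 9.8, 11.5, 10.1]),
    2: ([120, 254, 169], [32, 24, 13]),
    3: ([3.2, 5.9, 9.8, 10.5, 12.9], [65, 62, 41, 36, 35]),
}

for number, (xs, ys) in take_home.items():
    print(number, round(four_step_r(xs, ys), 3))
# prints:
# 1 0.748
# 2 -0.275
# 3 -0.963
```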