A Correlation

S1 – Correlation
Chapter 5
A: Measuring Correlation
Measure the strength of any connection between two variables
What do the graphs tell you about any connection between the two variables?
Data that involves 2 variables = bivariate data.
Graphs:
Snails: Snails with larger foot areas weigh more.
Coins: Generally, older coins weigh less, although the relationship is weak.
Reactions: Those with a higher heart rate generally have a shorter reaction time.
Blood pressure: Little relationship between blood pressure and weight.
Would you say that this data has a poor, good, excellent or perfect positive correlation?
[Scatter diagram: y against x]
We need a measure of how well correlated data sets are.
If we look at the diagram again, we can divide it into 4 quadrants.
[Scatter diagram: y against x, divided into quadrants 1 to 4 by the lines $x = \bar{x}$ and $y = \bar{y}$]
If we treat the quadrant lines as our new axes, the coordinates of the points are now all of the form:
$(x - \bar{x},\ y - \bar{y})$
Now, let us consider $(x - \bar{x})(y - \bar{y})$ in each quadrant.

Quad   $(x - \bar{x})$   $(y - \bar{y})$   $(x - \bar{x})(y - \bar{y})$ **
1      +                 +                 +
2      -                 +                 -
3      -                 -                 +
4      +                 -                 -

The product ** is + for every point in quadrants 1 and 3, and - for every point in quadrants 2 and 4.
So if we have no or very weak correlation:
• The points are scattered randomly in all four quadrants.
• The values of $(x - \bar{x})(y - \bar{y})$ will be both positive and negative and will mostly cancel each other out.
• The sum of the values will be very small.
For a positive correlation:
• The majority of the points lie in the first and third quadrants.
• The sum of these terms will be positive and large.
For negative correlation:
• The majority of the points lie in the second and fourth quadrants.
• The sum of these terms will be negative and large in magnitude.
$S_{xy} = \sum (x - \bar{x})(y - \bar{y})$
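As a rough illustration of how this sum behaves, the sketch below computes it for three small data sets. The data and the helper name `s_xy` are invented for illustration and do not come from the textbook.

```python
def s_xy(xs, ys):
    """Sum of (x - mean_x) * (y - mean_y) over all data points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))


# Hypothetical data sets, made up purely to show the sign of the sum.
xs = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 4, 6]     # y generally increases with x
falling = [6, 4, 5, 4, 2]    # y generally decreases with x
scattered = [4, 2, 5, 2, 4]  # no clear pattern

print("rising:   ", s_xy(xs, rising))     # positive
print("falling:  ", s_xy(xs, falling))    # negative
print("scattered:", s_xy(xs, scattered))  # close to zero
```

The positive and negative products cancel in the scattered set, which is why its sum ends up near zero.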
It appears that the sum could be used to measure how strong a correlation is.
• The actual sum itself is not very useful since it does not take into account:
  • The number of data items
  • The units of x and y, and hence the spread of the data
• But a good starting point!
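Both limitations can be seen numerically. The sketch below uses invented data and a hypothetical helper `s_xy`: changing the units of x, or simply repeating every data point, changes the sum even though the pattern of the points is the same.

```python
def s_xy(xs, ys):
    """Sum of (x - mean_x) * (y - mean_y) over all data points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))


# Hypothetical measurements, invented for illustration.
xs_m = [1.0, 2.0, 3.0, 4.0]        # x recorded in metres
ys = [2.0, 3.0, 5.0, 6.0]
xs_cm = [100 * x for x in xs_m]    # the same x values in centimetres

base = s_xy(xs_m, ys)
rescaled = s_xy(xs_cm, ys)            # units changed: sum is 100 times larger
doubled = s_xy(xs_m + xs_m, ys + ys)  # every point repeated: sum is twice as large

print(base, rescaled, doubled)
```

A useful measure of correlation should give the same value in all three cases, which motivates the standardising step.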
$S_{xy} = \sum (x - \bar{x})(y - \bar{y})$
The sign of the sum tells us what type of correlation there is between x and y.
$S_{xy} > 0$: positive correlation (+)
$S_{xy} < 0$: negative correlation (-)
$S_{xy} \approx 0$: no correlation
Pearson's product moment correlation coefficient (PMCC)
• Takes the number of data items and the spread of the data into account by standardising the values $(x - \bar{x})$ and $(y - \bar{y})$.
• This is done in a similar way to standardising the normal distribution (using the standard deviation).
$r = \dfrac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$
where
$S_{xx} = \sum x^2 - \dfrac{(\sum x)^2}{n}$
$S_{yy} = \sum y^2 - \dfrac{(\sum y)^2}{n}$
$S_{xy} = \sum xy - \dfrac{\sum x \sum y}{n}$
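The formulas for r can be sketched as a small function. The name `pmcc` and the test data are assumptions made for illustration. The sketch also assumes that neither variable is constant, since otherwise $S_{xx}$ or $S_{yy}$ is zero and the division fails.

```python
import math


def pmcc(xs, ys):
    """Pearson's product moment correlation coefficient from summary sums."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data lying exactly on straight lines.
xs = [1, 2, 3, 4]
print(pmcc(xs, [3, 5, 7, 9]))   # y = 2x + 1, so r is 1 up to rounding error
print(pmcc(xs, [9, 7, 5, 3]))   # y = 11 - 2x, so r is -1 up to rounding error
```

Because r divides $S_{xy}$ by $\sqrt{S_{xx} S_{yy}}$, rescaling x or y, or repeating the data, leaves the result unchanged.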
The value of r always lies in the interval:
$-1 \le r \le 1$
If
r = 1 then perfect positive correlation
r = -1 then perfect negative correlation
r = 0 then the two variables are uncorrelated
Example 1
Calculate Pearson's product moment correlation coefficient.

x   85   50   75   56   65
y   70   66   68   40   50

$\sum x = 331$, $\sum y = 294$, $\sum x^2 = 22711$, $\sum y^2 = 17980$, $\sum xy = 19840$
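One way to check the arithmetic for Example 1 is to substitute the totals given above into the formulas for $S_{xx}$, $S_{yy}$, $S_{xy}$ and r. This is a sketch of that substitution, not the textbook's own working, and the variable names are chosen here for illustration.

```python
import math

# Totals taken from Example 1 (n = 5 data pairs).
n = 5
sum_x, sum_y = 331, 294
sum_x2, sum_y2, sum_xy = 22711, 17980, 19840

s_xx = sum_x2 - sum_x ** 2 / n       # approximately 798.8
s_yy = sum_y2 - sum_y ** 2 / n       # approximately 692.8
s_xy = sum_xy - sum_x * sum_y / n    # approximately 377.2

r = s_xy / math.sqrt(s_xx * s_yy)
print(round(r, 3))                   # 0.507
```

A value of about 0.507 indicates a moderate positive correlation between x and y.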
Work through Example 1, page 73
Exercise A, page 73: numbers 1 and 2