R Yi - Cengage

Chapter 5
Correlation
I Introduction to Correlation and
Regression
A. Describing the Linear Relationship
Between Two Variables, X and Y
1. Pearson product-moment correlation coefficient (r)
2. Bivariate frequency distributions (scatterplots)
for various correlation coefficients (r)
[Figure: six scatterplots of Y against X, with both axes running from 10 to 50, illustrating r = +1, r = .80, r = .30, r = –.20, r = 0, and r = –1.]
3. Upper and lower limits for r: +1 to –1
B. Correlation and Regression Distinguished
1. Characteristics of regression situations
• One dependent variable, Y, and one or more independent variables, X
• Levels of the independent variables are selected in advance
• The value of the dependent variable for a given level of the independent variable is free to vary
• The researcher is primarily interested in predicting Y from a knowledge of X
2. Characteristics of correlation situations
• Neither variable is considered the independent variable
• The researcher is primarily interested in assessing the strength of the relationship between X and Y
• X and Y are both free to vary
II Correlation
A. Formula for Pearson Product-Moment
Correlation Coefficient
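$$
r = \frac{S_{XY}}{S_X S_Y}
  = \frac{\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n}}
         {\sqrt{\left[\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n}\right]\left[\dfrac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n}\right]}}
$$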
1. Understanding the formula for r; what the numerator tells you
• Covariance
$$
S_{XY} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n}
$$
• Information in the cross products
$$
\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})
$$
[Figure, panels a and b: two scatterplots of Variable Y against Variable X, each divided into four quadrants by the means $\bar{X}$ and $\bar{Y}$. In Quadrants 1 (upper right) and 3 (lower left), $(X_i - \bar{X})(Y_i - \bar{Y}) > 0$; in Quadrants 2 (upper left) and 4 (lower right), $(X_i - \bar{X})(Y_i - \bar{Y}) < 0$.]
2. If the majority of the data points fall in quadrants 1 and 3, the sum of the cross products is positive and r > 0
3. If the majority of the data points fall in quadrants 2 and 4, the sum of the cross products is negative and r < 0
4. If the data points are equally dispersed over the four quadrants, the sum of the cross products equals zero and r = 0
5. The sum of the cross products is largest when the data points fall on a straight line
6. The sum of the cross products is small when the data points fall in an elongated circle (ellipse)
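
The quadrant rules above can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the function name `cross_products` and the three small data sets are hypothetical and were chosen only so that the points sit mainly in quadrants 1 and 3, mainly in quadrants 2 and 4, or evenly in all four.

```python
def cross_products(xs, ys):
    """Return the cross products (X_i - mean of X)(Y_i - mean of Y) for paired scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]

# Hypothetical data: points rise from lower left to upper right (quadrants 3 and 1).
rising = cross_products([1, 2, 3, 4], [1, 2, 3, 4])

# Hypothetical data: points fall from upper left to lower right (quadrants 2 and 4).
falling = cross_products([1, 2, 3, 4], [4, 3, 2, 1])

# Hypothetical data: one point in each quadrant, equally far from both means.
balanced = cross_products([1, 1, 3, 3], [1, 3, 1, 3])

print("quadrants 1 and 3:", sum(rising))     # positive sum
print("quadrants 2 and 4:", sum(falling))    # negative sum
print("all four quadrants:", sum(balanced))  # sum of zero
```

Because the sign of each cross product depends only on the quadrant in which a point falls, the sign of the sum (and therefore of r) follows from where most of the points lie.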
Table 1. Height and Weight of Girls’ Basketball Team

(1)    (2)          (3)          (4)            (5)            (6)
Girl   Height X_i   Weight Y_i   (X_i − X̄)²     (Y_i − Ȳ)²     (X_i − X̄)(Y_i − Ȳ)
1      7.0          140          .64            289            13.6
2      6.5          130          .09            49             2.1
3      6.5          140          .09            289            5.1
4      6.5          130          .09            49             2.1
5      6.5          120          .09            9              –0.9
6      6.0          120          .04            9              0.6
7      6.0          130          .04            49             –1.4
8      6.0          110          .04            169            2.6
9      5.5          100          .49            529            16.1
10     5.5          110          .49            169            9.1
       X̄ = 6.2      Ȳ = 123      Σ = 2.10       Σ = 1610       Σ = 49.0
B. Scatterplot for Data in Table 1
[Figure: scatterplot of Weight (vertical axis, 90 to 140) against Height (horizontal axis, 5.5 to 7.0) for the data in Table 1.]
C. Computation of r for Data in Table 1
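$$
r = \frac{\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n}}
         {\sqrt{\left[\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n}\right]\left[\dfrac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n}\right]}}
  = \frac{\dfrac{49.0}{10}}{\sqrt{\left[\dfrac{2.10}{10}\right]\left[\dfrac{1610}{10}\right]}}
  = \frac{4.90}{5.815} = .84
$$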
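
The hand computation can be reproduced with a short script. This is a minimal sketch in Python, assuming the Table 1 heights are the X scores and the weights are the Y scores; the function name `pearson_r` and the variable names are illustrative and do not come from the slides. It divides by n, as the formula in this chapter does.

```python
import math

# Heights (X) and weights (Y) of the ten girls in Table 1.
heights = [7.0, 6.5, 6.5, 6.5, 6.5, 6.0, 6.0, 6.0, 5.5, 5.5]
weights = [140, 130, 140, 130, 120, 120, 130, 110, 100, 110]

def pearson_r(xs, ys):
    """Pearson r as the covariance S_XY divided by the product S_X * S_Y (n in every denominator)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_sq_x = sum((x - mean_x) ** 2 for x in xs)
    sum_sq_y = sum((y - mean_y) ** 2 for y in ys)
    covariance = sum_cross / n
    sd_x = math.sqrt(sum_sq_x / n)
    sd_y = math.sqrt(sum_sq_y / n)
    return covariance / (sd_x * sd_y)

r = pearson_r(heights, weights)
print(round(r, 2))  # 0.84
```

Because n appears in both the numerator and the denominator it cancels, so dividing every sum by n − 1 instead would give the same value of r.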
III Interpretation of the Correlation Coefficient
A. Coefficient of Determination, r², and Nondetermination, k²
$$
\frac{S_Y^2}{S_Y^2} = r^2 + k^2
$$
(Total Y variance expressed as a proportion) = (Proportion of Y variance explained by X variance) + (Proportion of Y variance not explained by X variance)
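
Since the two proportions sum to 1, k² follows directly from r. A minimal Python sketch, assuming only that relation; the function name `determination` is hypothetical, and the four values of r are the ones used in the visual representation in section B.

```python
def determination(r):
    """Return (r squared, k squared), where k squared = 1 - r squared."""
    r_squared = r ** 2
    k_squared = 1 - r_squared
    return r_squared, k_squared

for r in (0.84, 0.40, 1.0, 0.0):
    r2, k2 = determination(r)
    print(f"r = {r:.2f}   r^2 = {r2:.2f}   k^2 = {k2:.2f}")
```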
B. Visual Representation of r² and k²
[Figure: four panels showing the overlap between the variance in X and the variance in Y.]
a. r = .84: r² = .71, k² = .29
b. r = .40: r² = .16, k² = .84
c. r = 1: r² = 1, k² = 0
d. r = 0: r² = 0, k² = 1
IV Common Errors in Interpreting r
A. Interpreting r in Direct Proportion to its Size
B. Interpreting r in Terms of Arbitrary Labels
• r ≥ .90: very high
• r = .70 to .89: high
• r = .30 to .69: medium
• r < .30: low
1. Typical reliability coefficients
2. Typical validity coefficients
C. Inferring Causation from Correlation
V Some Factors That Affect the Correlation
Coefficient
A. Nature of the Relationship Between X and Y
[Figure: three scatterplots of Y against X, panels a, b, and c.]
1. Eta or eta squared can be used to describe the
curvilinear relation between X and Y
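
To illustrate why eta squared is useful here, the sketch below computes it in the usual way, as the between-group sum of squares divided by the total sum of squares, with the Y scores grouped by level of X. This definition is supplied as an assumption, since the slides do not give a formula. The function names and the inverted-U data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson r with n in every denominator."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)

def eta_squared(groups):
    """Eta squared = SS between groups / SS total; groups holds the Y scores at each level of X."""
    all_scores = [y for group in groups for y in group]
    grand_mean = sum(all_scores) / len(all_scores)
    ss_total = sum((y - grand_mean) ** 2 for y in all_scores)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    return ss_between / ss_total

# Hypothetical inverted-U data: Y rises and then falls across X = 1, 2, 3, 4, 5.
y_by_level = [[1, 2], [4, 5], [6, 7], [4, 5], [1, 2]]
xs = [level for level, group in enumerate(y_by_level, start=1) for _ in group]
ys = [y for group in y_by_level for y in group]

r = pearson_r(xs, ys)
eta2 = eta_squared(y_by_level)
print(f"r = {r:.2f}, eta squared = {eta2:.2f}")
```

For these made-up scores the linear coefficient is essentially zero even though Y depends strongly on X, which is the situation in which r understates a curvilinear relationship.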
B. Truncated Range
[Figure: scatterplot of Production units per day (Y, vertical axis, 60 to 110) against Aptitude score (X, horizontal axis, 30 to 90).]
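
The effect of a truncated range can be demonstrated with simulated data. The following Python sketch is an illustration under stated assumptions, not a reanalysis of the figure: the aptitude and production scores are randomly generated, and the means, slope, noise level, and cutoff of 70 are all hypothetical choices.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson r with n in every denominator."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)

random.seed(0)  # fixed seed so the sketch gives the same numbers on every run

# Hypothetical applicants: production depends linearly on aptitude plus random noise.
aptitude = [random.gauss(60, 10) for _ in range(2000)]
production = [85 + 0.7 * (x - 60) + random.gauss(0, 7) for x in aptitude]

full_r = pearson_r(aptitude, production)

# Truncate the range: keep only cases with an aptitude score of 70 or higher.
kept = [(x, y) for x, y in zip(aptitude, production) if x >= 70]
truncated_r = pearson_r([x for x, _ in kept], [y for _, y in kept])

print(f"full range:      r = {full_r:.2f} (n = {len(aptitude)})")
print(f"truncated range: r = {truncated_r:.2f} (n = {len(kept)})")
```

Restricting X to a narrow band shrinks the variance in X while leaving the scatter around the line unchanged, so the correlation computed in the selected group is smaller than in the full group.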
C. Subgroups with Different Means or Standard
Deviations
[Figure: six scatterplots of Y against X, panels a to f, with subgroup means marked on the axes. Axis labels recoverable from the figure: School achievement (Y) and Anxiety (X).]
a. Combined r is spuriously high (subgroups L and M).
b. Combined r is spuriously low (subgroups A and B).
c. Combined r is spuriously high for B and low for A.
d. Combined r is spuriously low.
e., f. Subgroups A and B, with the sign of r marked for each subgroup and for the combined group.
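
The case in which combining subgroups inflates r can be reproduced with a few points. This Python sketch uses hypothetical scores for two subgroups, labeled L and M after panel a; within each subgroup X and Y are unrelated, but the subgroups differ in their means on both variables.

```python
import math

def pearson_r(xs, ys):
    """Pearson r with n in every denominator."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)

# Hypothetical subgroup L: low means on X and Y, no relationship within the subgroup.
x_low, y_low = [1, 1, 3, 3], [1, 3, 1, 3]
# Hypothetical subgroup M: higher means on X and Y, no relationship within the subgroup.
x_mid, y_mid = [7, 7, 9, 9], [7, 9, 7, 9]

r_low = pearson_r(x_low, y_low)
r_mid = pearson_r(x_mid, y_mid)
r_combined = pearson_r(x_low + x_mid, y_low + y_mid)

print(f"r within L = {r_low:.2f}")
print(f"r within M = {r_mid:.2f}")
print(f"r combined = {r_combined:.2f}")
```

The combined coefficient reflects the distance between the subgroup means rather than any relationship within either subgroup, which is why it is described as spuriously high.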
D. Discontinuous Distribution
[Figure: scatterplot of Son's authoritarianism (vertical axis, 16 to 44) against Father's authoritarianism (horizontal axis, 16 to 40), with a "Region of discontinuity" marked.]
E. Non-Normal Distributions
[Figure: four panels showing the distributions of X and Y, each divided into quadrants at $\bar{X}$ and $\bar{Y}$, with one quadrant in each panel labeled "Most scores will fall in this quadrant".]
F. Heterogeneous & Homogeneous Array
Variances
[Figure: four scatterplots of Y against X, panels a to d.]
VI Spearman Rank Correlation (rs)
A. Strength of Monotonic Relationship Based On
Ranks, RXi and RYi
$$
r_s = 1 - \frac{6\sum_{i=1}^{n} (R_{X_i} - R_{Y_i})^2}{n(n^2 - 1)}
$$
B. Computational Example
Table 2. Progress of Patients in Therapy as Ranked by Occupational Therapist, R_X, and Physical Therapist, R_Y

(1)       (2)     (3)     (4)             (5)
Patient   R_Xi    R_Yi    R_Xi − R_Yi     (R_Xi − R_Yi)²
1         5       7       –2              4
2         3       3       0               0
3         1       2       –1              1
4         7       6       1               1
5         4       5       –1              1
6         2       1       1               1
7         8       8       0               0
8         6       4       2               4
                          Σ = 0           Σ = 12

C. Computation of rs
$$
r_s = 1 - \frac{6\sum_{i=1}^{n} (R_{X_i} - R_{Y_i})^2}{n(n^2 - 1)}
    = 1 - \frac{6(12)}{8\left[(8)^2 - 1\right]}
    = 1 - \frac{72}{504} = .86
$$
1. Dealing with tied ranks
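
The slides do not say how ties are to be handled. One common convention, assumed here, is to give each tied score the mean of the ranks it would otherwise occupy. The Python sketch below shows that convention together with the r_s formula from section A, applied to the ranks in Table 2; the function names `midranks` and `spearman_rs` and the tied raw scores are hypothetical.

```python
def midranks(scores):
    """Rank scores from 1 (lowest) upward; tied scores share the mean of the ranks they span."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted score equals the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks

def spearman_rs(rank_x, rank_y):
    """r_s = 1 - 6 * (sum of squared rank differences) / (n * (n^2 - 1))."""
    n = len(rank_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Ranks assigned by the two therapists in Table 2 (no ties).
rank_x = [5, 3, 1, 7, 4, 2, 8, 6]
rank_y = [7, 3, 2, 6, 5, 1, 8, 4]
rs_table2 = spearman_rs(rank_x, rank_y)
print(f"r_s for Table 2 = {rs_table2:.3f}")

# Hypothetical raw scores with one tie: the two scores of 20 share ranks 2 and 3.
print(midranks([10, 20, 20, 30]))
```

When ties are present the shortcut formula is only approximate; computing the Pearson r on the mid-ranks is a widely used alternative.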
VII Other Kinds of Correlation Coefficients
Coefficient                Symbol   Characteristics
1. Eta                     η        X and Y quantitative, curvilinear relationship
2. Biserial                r_b      X and Y quantitative, but one variable forced into a dichotomy
3. Cramér’s correlation    V        X and Y both dichotomous
4. Multiple correlation    R        All X’s and Y’s quantitative, linear relationships