Chapter 5 Correlation

I Introduction to Correlation and Regression

A. Describing the Linear Relationship Between Two Variables, X and Y
   1. Pearson product-moment correlation coefficient (r)
   2. Bivariate frequency distributions (scatterplots) for various correlation coefficients (r)
      [Figure: six scatterplots of Y against X, both axes running from 10 to 50, for r = +1, r = .80, r = .30, r = -.20, r = 0, and r = -1]
   3. Upper and lower limits for r: +1 to -1

B. Correlation and Regression Distinguished
   1. Characteristics of regression situations
      - One dependent variable, Y, and one or more independent variables, X
      - Levels of the independent variables are selected in advance
      - The value of the dependent variable for a given level of the independent variable is free to vary
      - The researcher is primarily interested in predicting Y from a knowledge of X
   2. Characteristics of correlation situations
      - Neither variable is considered the independent variable
      - The researcher is primarily interested in assessing the strength of the relationship between X and Y
      - X and Y are both free to vary

II Correlation

A. Formula for Pearson Product-Moment Correlation Coefficient

$$
r = \frac{S_{XY}}{S_X S_Y}
  = \frac{\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n}}
         {\sqrt{\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n} \cdot \dfrac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n}}}
$$

   1. Understanding the formula for r; what the numerator tells you
      Covariance:
      $$ S_{XY} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n} $$
      Information in the cross products:
      $$ \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) $$
      a. [Figure: scatterplot of Variable Y against Variable X, divided at $\bar{X}$ and $\bar{Y}$ into four quadrants]
         Quadrant 1 (upper right): $(X_i - \bar{X})(Y_i - \bar{Y}) > 0$
         Quadrant 2 (upper left): $(X_i - \bar{X})(Y_i - \bar{Y}) < 0$
         Quadrant 3 (lower left): $(X_i - \bar{X})(Y_i - \bar{Y}) > 0$
         Quadrant 4 (lower right): $(X_i - \bar{X})(Y_i - \bar{Y}) < 0$
      b. [Figure: the same four-quadrant layout with a second set of data points]
   2. If the majority of the data points fall in quadrants 1 and 3, the sum of the cross products is positive and r > 0
   3. If the majority of the data points fall in quadrants 2 and 4, the sum of the cross products is negative and r < 0
   4. If the data points are equally dispersed over the four quadrants, the sum of the cross products equals zero and r = 0
   5. The sum of the cross products is largest when the data points fall on a straight line
   6. The sum of the cross products is small when the data points fall in an elongated circle (ellipse)

Table 1. Height and Weight of Girls' Basketball Team

(1) Girl   (2) $X_i$ (height)   (3) $Y_i$ (weight)   (4) $(X_i - \bar{X})^2$   (5) $(Y_i - \bar{Y})^2$   (6) $(X_i - \bar{X})(Y_i - \bar{Y})$
1          7.0                  140                  .64                       289                       13.6
2          6.5                  130                  .09                       49                        2.1
3          6.5                  140                  .09                       289                       5.1
4          6.5                  130                  .09                       49                        2.1
5          6.5                  120                  .09                       9                         -0.9
6          6.0                  120                  .04                       9                         0.6
7          6.0                  130                  .04                       49                        -1.4
8          6.0                  110                  .04                       169                       2.6
9          5.5                  100                  .49                       529                       16.1
10         5.5                  110                  .49                       169                       9.1
Mean/Sum   $\bar{X}$ = 6.2      $\bar{Y}$ = 123      $\sum$ = 2.10             $\sum$ = 1610             $\sum$ = 49.0

B. Scatterplot for Data in Table 1
   [Figure: scatterplot of weight (vertical axis, 90 to 140) against height (horizontal axis, 5.5 to 7.0)]

C. Computation of r for Data in Table 1

$$
r = \frac{\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n}}
         {\sqrt{\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n} \cdot \dfrac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n}}}
  = \frac{\dfrac{49.0}{10}}{\sqrt{\dfrac{2.10}{10} \cdot \dfrac{1610}{10}}}
  = \frac{4.90}{5.8146}
  = .84
$$

III Interpretation of the Correlation Coefficient

A. Coefficient of Determination, $r^2$, and Nondetermination, $k^2$

$$ \frac{S_Y^2}{S_Y^2} = r^2 + k^2 $$

   - $S_Y^2 / S_Y^2$: total Y variance expressed as a proportion
   - $r^2$: proportion of Y variance explained by X variance
   - $k^2$: proportion of Y variance not explained by X variance

B. Visual Representation of $r^2$ and $k^2$
   [Figure: four panels relating variance in X to variance in Y]
   a. r = .84: $r^2$ = .71, $k^2$ = .29
   b. r = .40: $r^2$ = .16, $k^2$ = .84
   c. r = 1: $r^2$ = 1, $k^2$ = 0
   d. r = 0: $r^2$ = 0, $k^2$ = 1

IV Common Errors in Interpreting r

A. Interpreting r in Direct Proportion to its Size

B. Interpreting r in Terms of Arbitrary Labels
      r ≥ .90          very high
      r = .70 to .89   high
      r = .30 to .69   medium
      r < .30          low
   1. Typical reliability coefficients
   2. Typical validity coefficients

C. Inferring Causation from Correlation

V Some Factors That Affect the Correlation Coefficient

A. Nature of the Relationship Between X and Y
   [Figure: three scatterplots, panels a to c, of Y against X]
   1. Eta or eta squared can be used to describe the curvilinear relation between X and Y

B. Truncated Range
   [Figure: scatterplot of production units per day (Y, 60 to 110) against aptitude score (X, 30 to 90)]

C. Subgroups with Different Means or Standard Deviations
   a. Combined r is spuriously high.
   b. Combined r is spuriously low.
      [Figure, panels a and b: scatterplots for subgroups L and M (panel a) and A and B (panel b) with the subgroup means marked; surviving axis labels: School achievement, Anxiety]
   c. Combined r is spuriously high for B and low for A.
   d. Combined r is spuriously low.
      [Figure, panels c and d: scatterplots of Y against X for subgroups A and B]
   e., f. [Figure: within-subgroup correlations for A and B compared with the combined correlation; in one panel the combined r is negative, in the other it is positive]

D. Discontinuous Distribution
   [Figure: scatterplot of son's authoritarianism (Y, 16 to 44) against father's authoritarianism (X, 16 to 40), with a region of discontinuity marked]

E. Non-Normal Distributions
   [Figure: four panels of Y against X, each marking the quadrant, relative to $\bar{X}$ and $\bar{Y}$, in which most scores will fall]

F. Heterogeneous & Homogeneous Array Variances
   [Figure: four scatterplots, panels a to d, of Y against X]

VI Spearman Rank Correlation ($r_s$)

A. Strength of Monotonic Relationship Based on Ranks, $R_{X_i}$ and $R_{Y_i}$

$$ r_s = 1 - \frac{6 \sum_{i=1}^{n} (R_{X_i} - R_{Y_i})^2}{n(n^2 - 1)} $$

B. Computational Example

Table 2. Progress of Patients in Therapy as Ranked by Occupational Therapist, $R_X$, and Physical Therapist, $R_Y$

(1) Patient   (2) $R_{X_i}$   (3) $R_{Y_i}$   (4) $R_{X_i} - R_{Y_i}$   (5) $(R_{X_i} - R_{Y_i})^2$
1             5               7               -2                        4
2             3               3               0                         0
3             1               2               -1                        1
4             7               6               1                         1
5             4               5               -1                        1
6             2               1               1                         1
7             8               8               0                         0
8             6               4               2                         4
Sum                                           0                         12

C. Computation of $r_s$

$$
r_s = 1 - \frac{6 \sum_{i=1}^{n} (R_{X_i} - R_{Y_i})^2}{n(n^2 - 1)}
    = 1 - \frac{6(12)}{8[(8)^2 - 1]}
    = 1 - \frac{72}{504}
    = .86
$$

   1. Dealing with tied ranks

VII Other Kinds of Correlation Coefficients

Coefficient                Symbol    Characteristics
1. Eta                     $\eta$    X and Y quantitative, curvilinear relationship
2. Biserial                $r_b$     X and Y quantitative, but one variable forced into a dichotomy
3. Cramér's correlation    V         X and Y both dichotomous
4. Multiple correlation    R         All X's and Y's quantitative, linear relationships
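
The two computational examples (Pearson r for the height and weight data in Table 1, and Spearman r_s for the therapist rankings in Table 2) can be checked with a few lines of Python. This is a minimal sketch rather than a reference implementation: the function names `pearson_r` and `spearman_rs` and the variable names are illustrative choices, both functions divide by n as in the formulas of section II, and the Spearman shortcut formula assumes there are no tied ranks.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson r as covariance divided by the product of standard deviations.

    Uses n (not n - 1) in every denominator, matching S_XY / (S_X * S_Y).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
    s_x = sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)
    s_y = sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)
    return s_xy / (s_x * s_y)


def spearman_rs(rank_x, rank_y):
    """Spearman r_s from the squared rank differences (assumes no tied ranks)."""
    n = len(rank_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Table 1: height (X) and weight (Y) of the ten players
height = [7.0, 6.5, 6.5, 6.5, 6.5, 6.0, 6.0, 6.0, 5.5, 5.5]
weight = [140, 130, 140, 130, 120, 120, 130, 110, 100, 110]

r = pearson_r(height, weight)
r_squared = r ** 2          # coefficient of determination
k_squared = 1 - r_squared   # coefficient of nondetermination

# Table 2: ranks assigned by the occupational (R_X) and physical (R_Y) therapist
rank_ot = [5, 3, 1, 7, 4, 2, 8, 6]
rank_pt = [7, 3, 2, 6, 5, 1, 8, 4]

rs = spearman_rs(rank_ot, rank_pt)

print(f"r = {r:.2f}, r^2 = {r_squared:.2f}, k^2 = {k_squared:.2f}")  # r = 0.84, r^2 = 0.71, k^2 = 0.29
print(f"r_s = {rs:.2f}")                                             # r_s = 0.86
```

For Table 2 the squared rank differences (4, 0, 1, 1, 1, 1, 0, 4) sum to 12, so the sketch returns r_s = 1 - 72/504, about .86. With untied ranks, applying `pearson_r` directly to the two rank lists gives the same value, which is one way to see that r_s is Pearson's r computed on ranks.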