Outline Correlation Correlation∗ Alan T. Arnholt Department of Mathematical Sciences Appalachian State University [email protected] Spring 2006 R Notes ∗ 1 c 2006 Alan T. Arnholt Copyright The R Script Outline Correlation Correlation Overview of Correlation The R Script 2 The R Script Outline Correlation The R Script Correlation The correlation coefficient, denoted by r, measures the strength and direction of the linear relationship between two numeric variables X and Y and is defined by n 1 X r= n−1 i=1 3 xi − x̄ sX yi − ȳ sY (1) Outline Correlation Correlation Properties • The value for r will always be between −1 and +1. 4 The R Script Outline Correlation Correlation Properties • The value for r will always be between −1 and +1. • When r is close to +1, it indicates a strong positive linear relationship. That is, when x increases so does y and vice versa. 5 The R Script 6 Outline Correlation The R Script Correlation Properties • The value for r will always be between −1 and +1. • When r is close to +1, it indicates a strong positive linear relationship. That is, when x increases so does y and vice versa. • When the value of r is close to −1, it indicates a strong negative linear relationship. Values of r close to zero indicate weak linear relationships. Outline Correlation The R Script Correlation Properties • The value for r will always be between −1 and +1. • When r is close to +1, it indicates a strong positive linear relationship. That is, when x increases so does y and vice versa. • When the value of r is close to −1, it indicates a strong negative linear relationship. Values of r close to zero indicate weak linear relationships. • To compute the correlation between two numeric vectors with R, one may use the function cor(x,y). 7 Outline Correlation Example Use the data frame Correlat from the BSDA package to: 1. Create a scatterplot of Y versus X. 8 The R Script Outline Correlation Example Use the data frame Correlat from the BSDA package to: 1. Create a scatterplot of Y versus X. 2. Compute the sample correlation coefficient r using the definition. 9 The R Script Outline Correlation Example Use the data frame Correlat from the BSDA package to: 1. Create a scatterplot of Y versus X. 2. Compute the sample correlation coefficient r using the definition. 3. Compute the sample correlation coefficient r using the R function cor(). > library(BSDA) > attach(Correlat) > plot(X,Y,col="blue",main="Scatterplot") 10 The R Script 11 Outline Correlation The R Script Scatterplot of Y Versus X 40 50 60 Y 70 80 90 Scatterplot 20 40 60 X 80 Outline Correlation Using the Formula > m.x <- mean(X) > m.y <- mean(Y) > s.x <- sd(X) > s.y <- sd(Y) > Z.x <- (X-m.x)/s.x > Z.y <- (Y-m.y)/s.y > ZxZy <- Z.x*Z.y > r <- (1/(length(X)-1))*sum(ZxZy) > r [1] -0.813717 > cor(X,Y) [1] -0.813717 12 The R Script Outline Correlation Code Continued > c(m.x, m.y, s.x, s.y) [1] 54.53846 68.92308 21.81596 18.81693 > stuff <- cbind(X,Y,Z.x,Z.y,ZxZy) > stuff X Y Z.x Z.y ZxZy [1,] 42 75 -0.57473814 0.3229497 -0.18561153 [2,] 61 49 0.29618407 -1.0587846 -0.31359513 [3,] 12 95 -1.94987849 1.3858223 -2.70218502 [4,] 71 64 0.75456419 -0.2616302 -0.19741675 [5,] 52 83 -0.11635803 0.7480987 -0.08704730 [6,] 48 84 -0.29971007 0.8012424 -0.24014041 [7,] 74 38 0.89207822 -1.6433645 -1.46600964 [8,] 65 58 0.47953612 -0.5804919 -0.27836684 [9,] 53 81 -0.07052002 0.6418115 -0.04526056 [10,] 63 47 0.38786010 -1.1650718 -0.45188487 [11,] 55 78 0.02115601 0.4823806 0.01020525 [12,] 94 51 1.80883845 -0.9524973 -1.72291376 [13,] 19 93 -1.62901241 1.2795350 -2.08437841 13 The R Script Outline Correlation The R Script Correlation and Causation • A positive correlation between two variables means that large values of one variable tend to be associated with large values in the other variable. 14 Outline Correlation The R Script Correlation and Causation • A positive correlation between two variables means that large values of one variable tend to be associated with large values in the other variable. • This does not necessarily mean that the large values of the first variable caused the large values of the other variable. 15 Outline Correlation The R Script Correlation and Causation • A positive correlation between two variables means that large values of one variable tend to be associated with large values in the other variable. • This does not necessarily mean that the large values of the first variable caused the large values of the other variable. • Correlation measures the linear association between two variables, not the causal effect. 16 Outline Correlation The R Script Link to the R Script • Go to my web page Script for Correlation • Homework: problems 2.20, 2.21, 2.23, 2.27, 2.28, 2.30 and 2.31 • See me if you need help! 17
© Copyright 2026 Paperzz