Inferences for Correlation
Quantitative Methods II

Plan for Today
• Recall: correlation coefficient
• Bivariate normal distributions
• Hypothesis testing for population correlation
• Confidence intervals for population correlation

Bivariate Analysis
• Is there a relationship between two variables?
• For example, is there a relationship between a person’s income and their level of educational attainment?
• Questions of this type are studied in bivariate analysis, where “bi” indicates two variables.
• Statisticians also work with more than two variables, which leads to multivariate analysis.

Correlation Coefficient r
• It was developed by Karl Pearson in the early 1900s as a numerical measure of the strength and direction of the linear association between the independent variable x and the dependent variable y.
• The value of r is always between −1 and 1.
• If r > 0, the correlation is positive (when x increases, y tends to increase as well).
• If r < 0, the correlation is negative (when x increases, y tends to decrease).

• r ≈ −1 : perfect negative correlation
• −1 < r < −0.6 : strong negative correlation
• −0.6 < r < −0.3 : moderate negative correlation
• −0.3 < r < 0 : weak negative correlation
• r ≈ 0 : no linear correlation
• 0 < r < 0.3 : weak positive correlation
• 0.3 < r < 0.6 : moderate positive correlation
• 0.6 < r < 1 : strong positive correlation
• r ≈ 1 : perfect positive correlation

A formula for the coefficient of correlation
$$ r = \frac{s_x}{s_y} \cdot b $$
where $s_x$ and $s_y$ are the standard deviations of the x- and y-data respectively, and the slope
$$ b = \frac{\overline{xy} - \bar{x} \cdot \bar{y}}{\overline{x^2} - \bar{x}^2} $$
However, the correlation coefficient is much faster and easier to compute using your scientific calculator!

Salaries and education
• The table on the left shows a sample of 11 individuals working for the government of Quebec, their annual salary in thousands of dollars, and their educational attainment in years.
Computing the correlation coefficient
Using the formulas from before or the built-in calculator functions, compute: r = 0.6395
• This is an example of a strong positive correlation.

Bivariate normal distribution
We shall always assume that the set of ordered pairs (x, y) comes from a bivariate normal distribution. This means that for a fixed value of x, the values of y are normally distributed, and for a fixed value of y, the values of x are normally distributed as well.
In most cases, the results are still accurate if the distributions are bell-shaped and symmetrical, and the y-variances are approximately equal.

Hypothesis testing for correlation
The population correlation is denoted by the Greek letter ρ (“rho”) and the sample correlation by r.
The null hypothesis is always going to be that the values of x and y have no linear correlation, that is, $H_0: \rho = 0$.
The alternative hypothesis will always be $H_A: \rho \neq 0$ (two-tailed test).

The test statistic
To test hypotheses for a population correlation, we are going to use the Student’s t distribution with n − 2 degrees of freedom: $df = n - 2$
The test statistic $t^*$ is given by
$$ t^* = \frac{r \cdot \sqrt{n-2}}{\sqrt{1 - r^2}} $$
Here n is the number of pairs of data (x, y).

Example: study hours and grades
Five students have recorded the number of hours they studied for an exam and their grades:
Hours   2   5   1   4   2
Grade  80  80  70  90  60
Assuming a bivariate normal population, test at a 5% level of significance whether the correlation between the number of hours of study and the grade is significant.

First of all, let us compute the sample correlation coefficient, using formulas or a calculator. We have r = 0.6138.
State the hypotheses: $H_0: \rho = 0$, $H_A: \rho \neq 0$. (A two-tailed test.)
The test statistic:
$$ t^* = \frac{0.6138 \cdot \sqrt{5-2}}{\sqrt{1 - 0.6138^2}} = 1.35 $$
The number of degrees of freedom is df = 3.
The critical values: $\pm t(3,\, 0.025) = \pm 3.182$
The p-value ≈ 2 ∙ 0.135 = 0.27 > 0.05 = α
Decision: fail to reject $H_0$, since $|t^*| = 1.35 < 3.182$.
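The study-hours test can also be checked numerically. The following is a minimal Python sketch, not the method used on the slides, which rely on a calculator and t tables. The function names are illustrative, and the p-value is obtained by integrating the Student's t density with Simpson's rule, an assumption made here only to keep the sketch within the standard library.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient of paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """Test statistic t* = r * sqrt(n - 2) / sqrt(1 - r^2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def t_pdf(t, df):
    """Density of the Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value: 1 - 2 * integral of the density over [0, |t|].

    The integral uses Simpson's rule; steps must be even.
    """
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)

# Data from the study hours and grades example
hours = [2, 5, 1, 4, 2]
grades = [80, 80, 70, 90, 60]

n = len(hours)
r = pearson_r(hours, grades)
t_star = t_statistic(r, n)
p_value = two_tailed_p(t_star, n - 2)

print(f"r = {r:.4f}, t* = {t_star:.2f}, p-value = {p_value:.2f}")
```

Comparing `p_value` with α = 0.05 leads to the same decision as comparing `t_star` with the critical value 3.182.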
Example: reading time and TV
Do reading and TV viewing compete for leisure time? To find out, a psychologist interviewed a random sample of 15 children regarding the number of books they had read during the last year and the number of hours they had spent watching TV on a daily basis. If a correlation coefficient of −0.715 is obtained, is the correlation significant at the 5% level of significance? Assume a bivariate normal population.

Let us state the hypotheses: $H_0: \rho = 0$, $H_A: \rho \neq 0$. (A two-tailed test.)
The test statistic:
$$ t^* = \frac{-0.715 \cdot \sqrt{15-2}}{\sqrt{1 - (-0.715)^2}} = -3.69 $$
The number of degrees of freedom is df = 13.
The critical values: $\pm t(13,\, 0.025) = \pm 2.16$
(Sketch the curve and the regions of rejection.)
The p-value ≈ 2 ∙ 0.0014 ≈ 0.003 < 0.05 = α
Decision: reject $H_0$.

The confidence intervals
We start with the Fisher transformation:
$$ Z = \frac{1}{2} \cdot \ln\frac{1+r}{1-r} $$
It turns out that for a bivariate normal population, Z is (approximately) normally distributed with a standard deviation of $1/\sqrt{n-3}$.
So the confidence interval for $\mu_Z$ is
$$ c = Z - \frac{z(\alpha/2)}{\sqrt{n-3}} < \mu_Z < Z + \frac{z(\alpha/2)}{\sqrt{n-3}} = d $$
Now we perform the inverse Fisher transformation to get the confidence interval for the population correlation ρ:
$$ \frac{e^{2c} - 1}{e^{2c} + 1} < \rho < \frac{e^{2d} - 1}{e^{2d} + 1} $$
Make sure you can compute these quantities correctly on your scientific calculator! Let us consider examples.

Example: study hours and grades
Let us find the 95% confidence interval for ρ. Recall that we have r = 0.6138. Do the Fisher transformation:
$$ Z = \frac{1}{2} \cdot \ln\frac{1+0.6138}{1-0.6138} = 0.7150 $$
$z(\alpha/2) = 1.96$. The confidence interval for $\mu_Z$:
$$ 0.7150 - \frac{1.96}{\sqrt{2}} < \mu_Z < 0.7150 + \frac{1.96}{\sqrt{2}} $$
so $c = -0.6709 < \mu_Z < 2.1009 = d$
Now we’ll do the inverse Fisher transformation:
$$ \frac{e^{2 \cdot (-0.6709)} - 1}{e^{2 \cdot (-0.6709)} + 1} < \rho < \frac{e^{2 \cdot 2.1009} - 1}{e^{2 \cdot 2.1009} + 1} $$
After computation, we find the 95% confidence interval for the population correlation coefficient ρ:
$$ -0.5856 < \rho < 0.9705 $$

Example: reading time and TV
Let us find the 94% confidence interval for ρ. Recall that we have r = −0.715. Do the Fisher transformation:
$$ Z = \frac{1}{2} \cdot \ln\frac{1+(-0.715)}{1-(-0.715)} = -0.8973 $$
$z(\alpha/2) = 1.88$. The confidence interval for $\mu_Z$:
$$ -0.8973 - \frac{1.88}{\sqrt{12}} < \mu_Z < -0.8973 + \frac{1.88}{\sqrt{12}} $$
so $c = -1.4400 < \mu_Z < -0.3546 = d$
Now we’ll do the inverse Fisher transformation.
$$ \frac{e^{2 \cdot (-1.44)} - 1}{e^{2 \cdot (-1.44)} + 1} < \rho < \frac{e^{2 \cdot (-0.3546)} - 1}{e^{2 \cdot (-0.3546)} + 1} $$
After computation, we find the 94% confidence interval for the population correlation coefficient ρ:
$$ -0.8937 < \rho < -0.3404 $$

Example: salt and anxiety (practice)
Is there a correlation between one’s salt intake and his or her level of stress and anxiety? A study of 32 volunteers has found a correlation coefficient of −0.26 between the participants’ salt intake and the amplitude of their adrenaline spikes.
(a) Test at a 5% level of significance whether the population correlation is significant.
(b) Construct a 95% confidence interval for the population correlation coefficient.
Assume a bivariate normal population.

Example: immigration and GDP (practice)
Do immigration rates correlate with GDP (gross domestic product)? A researcher took data from 40 different countries and found the correlation coefficient for her sample to be equal to 0.44.
(a) Test at a 1% level of significance whether there is a significant correlation between immigration rates and GDP.
(b) Construct a 98% confidence interval for the population correlation coefficient.
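
The Fisher-transformation procedure used in the two worked confidence-interval examples can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not part of the original slides: the function name `fisher_ci` is hypothetical, and the critical value z(α/2) is computed with the standard library's `NormalDist` instead of being read from a table. Because the slides round the critical value (for example, 1.88 for 94%), the endpoints may differ slightly in the fourth decimal place.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, confidence):
    """Confidence interval for the population correlation rho.

    r          -- sample correlation coefficient
    n          -- number of (x, y) pairs, must exceed 3
    confidence -- confidence level as a fraction, e.g. 0.95
    """
    # Fisher transformation: Z = 0.5 * ln((1 + r) / (1 - r)) = atanh(r)
    z = math.atanh(r)
    # Critical value z(alpha/2) of the standard normal distribution
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Z is approximately normal with standard deviation 1 / sqrt(n - 3)
    margin = z_crit / math.sqrt(n - 3)
    c, d = z - margin, z + margin
    # Inverse Fisher transformation: (e^(2c) - 1) / (e^(2c) + 1) = tanh(c)
    return math.tanh(c), math.tanh(d)

# Study hours and grades: r = 0.6138, n = 5, 95% confidence
low_hours, high_hours = fisher_ci(0.6138, 5, 0.95)
print(f"{low_hours:.4f} < rho < {high_hours:.4f}")

# Reading time and TV: r = -0.715, n = 15, 94% confidence
low_tv, high_tv = fisher_ci(-0.715, 15, 0.94)
print(f"{low_tv:.4f} < rho < {high_tv:.4f}")
```

The sketch uses `math.atanh` and `math.tanh` because they are algebraically identical to the Fisher transformation and its inverse, which avoids typing the exponential formulas by hand.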