Estimation and testing for Corr(X, Y ) • Problem set-up. Suppose that (X1 , Y1 ), . . ., (Xn , Yn ) form a random sample of n pairs and (Xi , Yi ) ∼ (X, Y ). Let ρ = Corr(X, Y ). We would like to – estimate ρ and – test H0 : ρ = 0 versus H1 : ρ 6= 0 based on (X1 , Y1 ), . . ., (Xn , Yn ). • Suppose that (X1 , Y1 ), . . ., (Xn , Yn ) form a random sample of n pairs and (Xi , Yi ) ∼ (X, Y ). Let ρ = Corr(X, Y ), then ρ can be estimated by the sample correlation coefficient Pn i=1 (Xi − X̄)(Yi − Ȳ ) p p r= , (1) 2 (n − 1)SX (n − 1)SY2 where n X̄ = 2 (n − 1)SX = n X 1X Xi , n i=1 n Ȳ = 1X Yi , n i=1 (Xi − X̄)2 and (n − 1)SY2 = i=1 n X (Yi − Ȳ )2 . i=1 • The sample correlation coefficient r defined in (1) is called the sample correlation coefficient between (X1 , . . . , Xn ) and (Y1 , . . . , Yn ). • The quantity n 1 X (Xi − X̄)(Yi − Ȳ ) n − 1 i=1 is called the sample covariance between (X1 , . . . , Xn ) and (Y1 , . . . , Yn ). Therefore, (1) can be written as r= p sample covariance between (X1 , . . . , Xn ) and (Y1 , . . . , Yn ) p . sample variance for (X1 , . . . , Xn ) sample variance for (Y1 , . . . , Yn ) • Note that the sample correlation coefficient between (X1 , . . . , Xn ) and (Y1 , . . . , Yn ) is the same as the sample correlation coefficient between (aX1 + b, . . . , aXn + b) and (cY1 + d, . . . , cYn + d). Pn Pn • i=1 (Xi − X̄)(Yi − Ȳ ) = ( i=1 Xi Yi ) − nX̄ Ȳ . 1 • Example 1. 下表為101/04/19 – 101/04/24 期間, 中華電和台灣大的每日股 票收盤價. 計算這段期間中華電收盤價和台灣大收盤價的sample correlation coefficient. 101/04/19 101/04/20 101/04/23 101/04/24 中華電收盤價 88.3 87.6 88.3 89.7 台灣大收盤價 91.1 90.5 91.6 93 Sol. 考慮將中華電收盤價-87, 台灣大收盤價-90後計算 sample correlation coefficient. 中華電收盤價-87後 sample mean 為(1.3 + 0.6 + 1.3 + 2.7)/4 = 5.9/4, sample variance 為 (1.32 + 0.62 + 1.32 + 2.72 − 4(5.9/4)2 )/3 = 9.31/12. 台灣大收盤價-90後 sample mean 為(1.1 + 0.5 + 1.6 + 3.0)/4 = 6.2/4, sample variance 為 (1.12 + 0.52 + 1.62 + 3.02 − 4(6.2/4)2 )/3 = 13.64/12. Sample correlation coefficient 為 = (1.3 × 1.1 + 0.6 × 0.5 + 1.3 × 1.6 + 2.7 × 3.0) − 4(5.9/4)(6.2/4) p (9.31/4)(13.64/4) 11.06/4 p = 0.9814611. 126.9884/16 • Suppose that (X1 , Y1 ), . . ., (Xn , Yn ) form a random sample of n pairs and (Xi , Yi ) ∼ (X, Y ). Let ρ = Corr(X, Y ). Suppose that the joint distribution for (X, Y ) is bivariate normal, then ρ = 0 implies that X and Y are independent. In such case, for n ≥ 3, √ r n−2 √ ∼ t(n − 2), 1 − r2 where t(n − 2) denotes the t distribution with n − 2 degrees of freedom. Let √ r n−2 T = √ , 1 − r2 then for testing H0 : ρ = 0 versus H1 : ρ 6= 0, one can reject H0 at level α if |T | > tα/2,n−2 . • Example 2. A babyfinder is a wireless locating device that contains a transmitter and a receiver. The transmitter is carried by a child and the receiver is carried by a parent. The transmitter sends out signals, and the receiver can estimate the distance to the transmitter using the 2 strength of the signals. When the receiver detects that the transmitter is too far away, it gives warnings. Let X be the log distance between the transmitter and the receiver, and Y be the signal strength (also at the log scale) that the receiver receives. Theoretically, Y ≈ a + bX. It is of interest to learn whether such a linear relationship is significant by testing whether Corr(X, Y ) = 0. Suppose that we have 20 pairs of observations for (X, Y ): (X1 , Y1 ), . . ., (X20 , Y20 ), where the sample correlation is 0.4. Suppose that (X, Y ) is bivariate normal and the 20 pairs are IID and each (Xi , Yi ) ∼ (X, Y ). Can we conclude that Corr(X, Y ) 6= 0 at the 0.05 level? Can we conclude that Corr(X, Y ) 6= 0 at the 0.1 level? • Sol. The observed value for |T | is 0.4√20 − 2 = 1.851640. p 1 − (0.4)2 From the table “Quantiles for t distributions”, 1.851640 is between t0.025,18 = 2.101 and t0.05,18 = 1.734. Since t0.05,18 < 1.851640 < t0.025,18 , we can conclude that Corr(X, Y ) 6= 0 at the 0.1 level but not at the 0.05 level. • R commands. 假設兩組樣本(X1 , . . . , Xn )及(Y1 , . . . , Yn )已分別儲存於R中 的兩向量x及 y. R 指令 var(x) cov(x,y) cor(x,y) length(x) 作用 計算 x 的 sample variance 計算 x 和 y 的 sample covariance 計算 x 和 y 的 sample correlation coefficient 計算 x 的長度, 即n. • Example 3. Suppose that two samples (X1 , . . . , Xn ) and (Y1 , . . . , Yn ) are stored in two vectors x and y in R respectively. Suppose that we have the following R outputs: > var(x) [1] 0.7758333 > var(y) [1] 1.136667 > cov(x,y) [1] 0.9216667 3 Find the sample correlation coefficient between (X1 , . . . , Xn ) and (Y1 , . . . , Yn ). Sol. The sample p correlation coefficient between (X1 , . . . , Xn ) and (Y1 , . . . , Yn ) is 0.9216667/ (0.7758333 × 1.136667) = 0.981461. • Example 4. Suppose that two samples (X1 , . . . , Xn ) and (Y1 , . . . , Yn ) are stored in two vectors x and y in R respectively. Suppose that we have the following R outputs: > length(x) [1] 20 > cor(x,y) [1] 0.4 Let ρ be the correlation between X1 and Y1 . Can we conclude that ρ 6= 0 at the 0.05 significant level? Since n =√20 and√r = 0.4, from the solution to Example 2, the observed |T | = |r| n − 2/ 1 − r2 = 1.851640 < t0.025,18 , so we cannot conclude ρ 6= 0 at the 0.05 significant level. • One-side tests. Suppose that (X1 , Y1 ), . . ., (Xn , Yn ) form a random sample of n pairs and (Xi , Yi ) ∼ (X, Y ). Let ρ = Corr(X, Y ) and √ r n−2 , T = √ 1 − r2 where r is the sample correlation coefficient between (X1 , . . . , Xn ) and (Y1 , . . . , Yn ). 1. Testing problem: H0 : ρ ≤ 0 versus H1 : ρ > 0. – We reject H0 at level α if T > tα,n−2 . 2. Testing problem: H0 : ρ ≥ 0 versus H1 : ρ < 0. – We reject H0 at level α if T < −tα,n−2 . • Example 5. Suppose that (X1 , Y1 ), . . ., (X12 , Y12 ) form a random sample of 12 pairs and the sample correlation coefficient is 0.32. Let ρ be the correlation between X1 and Y1 . Can we conclude that ρ > 0 at the 0.05 significant level? Ans. No, the observed T = 1.068092 < t0.05,10 = 1.812. • Example 6. Suppose that (X1 , Y1 ), . . ., (X10 , Y10 ) form a random sample of 10 pairs and the sample correlation coefficient is −0.1. Let ρ be the correlation between X1 and Y1 . Can we conclude that ρ < 0 at the 0.05 significant level? Ans. No, the observed T = −0.2842676 > −t0.05,8 = −1.860. 4
© Copyright 2026 Paperzz