Estimation and testing for Corr(X, Y ) • Problem set

Estimation and testing for Corr(X, Y )
• Problem set-up. Suppose that (X1 , Y1 ), . . ., (Xn , Yn ) form a random
sample of n pairs and (Xi , Yi ) ∼ (X, Y ). Let ρ = Corr(X, Y ). We would
like to
– estimate ρ and
– test
H0 : ρ = 0 versus H1 : ρ 6= 0
based on (X1 , Y1 ), . . ., (Xn , Yn ).
• Suppose that (X1 , Y1 ), . . ., (Xn , Yn ) form a random sample of n pairs and
(Xi , Yi ) ∼ (X, Y ). Let ρ = Corr(X, Y ), then ρ can be estimated by the
sample correlation coefficient
Pn
i=1 (Xi − X̄)(Yi − Ȳ )
p
p
r=
,
(1)
2
(n − 1)SX
(n − 1)SY2
where
n
X̄ =
2
(n − 1)SX
=
n
X
1X
Xi ,
n i=1
n
Ȳ =
1X
Yi ,
n i=1
(Xi − X̄)2 and (n − 1)SY2 =
i=1
n
X
(Yi − Ȳ )2 .
i=1
• The sample correlation coefficient r defined in (1) is called the sample
correlation coefficient between (X1 , . . . , Xn ) and (Y1 , . . . , Yn ).
• The quantity
n
1 X
(Xi − X̄)(Yi − Ȳ )
n − 1 i=1
is called the sample covariance between (X1 , . . . , Xn ) and (Y1 , . . . , Yn ).
Therefore, (1) can be written as
r= p
sample covariance between (X1 , . . . , Xn ) and (Y1 , . . . , Yn )
p
.
sample variance for (X1 , . . . , Xn ) sample variance for (Y1 , . . . , Yn )
• Note that the sample correlation coefficient between (X1 , . . . , Xn ) and
(Y1 , . . . , Yn ) is the same as the sample correlation coefficient between
(aX1 + b, . . . , aXn + b) and (cY1 + d, . . . , cYn + d).
Pn
Pn
•
i=1 (Xi − X̄)(Yi − Ȳ ) = (
i=1 Xi Yi ) − nX̄ Ȳ .
1
• Example 1. 下表為101/04/19 – 101/04/24 期間, 中華電和台灣大的每日股
票收盤價. 計算這段期間中華電收盤價和台灣大收盤價的sample correlation
coefficient.
101/04/19
101/04/20
101/04/23
101/04/24
中華電收盤價
88.3
87.6
88.3
89.7
台灣大收盤價
91.1
90.5
91.6
93
Sol. 考慮將中華電收盤價-87, 台灣大收盤價-90後計算 sample correlation
coefficient. 中華電收盤價-87後 sample mean 為(1.3 + 0.6 + 1.3 + 2.7)/4 =
5.9/4, sample variance 為 (1.32 + 0.62 + 1.32 + 2.72 − 4(5.9/4)2 )/3 =
9.31/12. 台灣大收盤價-90後 sample mean 為(1.1 + 0.5 + 1.6 + 3.0)/4 =
6.2/4, sample variance 為 (1.12 + 0.52 + 1.62 + 3.02 − 4(6.2/4)2 )/3 =
13.64/12. Sample correlation coefficient 為
=
(1.3 × 1.1 + 0.6 × 0.5 + 1.3 × 1.6 + 2.7 × 3.0) − 4(5.9/4)(6.2/4)
p
(9.31/4)(13.64/4)
11.06/4
p
= 0.9814611.
126.9884/16
• Suppose that (X1 , Y1 ), . . ., (Xn , Yn ) form a random sample of n pairs
and (Xi , Yi ) ∼ (X, Y ). Let ρ = Corr(X, Y ). Suppose that the joint
distribution for (X, Y ) is bivariate normal, then ρ = 0 implies that X
and Y are independent. In such case, for n ≥ 3,
√
r n−2
√
∼ t(n − 2),
1 − r2
where t(n − 2) denotes the t distribution with n − 2 degrees of freedom.
Let
√
r n−2
T = √
,
1 − r2
then for testing H0 : ρ = 0 versus H1 : ρ 6= 0, one can reject H0 at level
α if
|T | > tα/2,n−2 .
• Example 2.
A babyfinder is a wireless locating device that contains
a transmitter and a receiver. The transmitter is carried by a child and
the receiver is carried by a parent. The transmitter sends out signals,
and the receiver can estimate the distance to the transmitter using the
2
strength of the signals. When the receiver detects that the transmitter
is too far away, it gives warnings. Let X be the log distance between the
transmitter and the receiver, and Y be the signal strength (also at the
log scale) that the receiver receives. Theoretically, Y ≈ a + bX. It is of
interest to learn whether such a linear relationship is significant by testing
whether Corr(X, Y ) = 0. Suppose that we have 20 pairs of observations
for (X, Y ): (X1 , Y1 ), . . ., (X20 , Y20 ), where the sample correlation is 0.4.
Suppose that (X, Y ) is bivariate normal and the 20 pairs are IID and
each (Xi , Yi ) ∼ (X, Y ). Can we conclude that Corr(X, Y ) 6= 0 at the
0.05 level? Can we conclude that Corr(X, Y ) 6= 0 at the 0.1 level?
• Sol. The observed value for |T | is
0.4√20 − 2 = 1.851640.
p
1 − (0.4)2 From the table “Quantiles for t distributions”, 1.851640 is between
t0.025,18 = 2.101 and t0.05,18 = 1.734.
Since
t0.05,18 < 1.851640 < t0.025,18 ,
we can conclude that Corr(X, Y ) 6= 0 at the 0.1 level but not at the 0.05
level.
• R commands. 假設兩組樣本(X1 , . . . , Xn )及(Y1 , . . . , Yn )已分別儲存於R中
的兩向量x及 y.
R 指令
var(x)
cov(x,y)
cor(x,y)
length(x)
作用
計算 x 的 sample variance
計算 x 和 y 的 sample covariance
計算 x 和 y 的 sample correlation coefficient
計算 x 的長度, 即n.
• Example 3. Suppose that two samples (X1 , . . . , Xn ) and (Y1 , . . . , Yn )
are stored in two vectors x and y in R respectively. Suppose that we have
the following R outputs:
> var(x)
[1] 0.7758333
> var(y)
[1] 1.136667
> cov(x,y)
[1] 0.9216667
3
Find the sample correlation coefficient between (X1 , . . . , Xn ) and (Y1 , . . . , Yn ).
Sol. The sample
p correlation coefficient between (X1 , . . . , Xn ) and (Y1 , . . . , Yn )
is 0.9216667/ (0.7758333 × 1.136667) = 0.981461.
• Example 4. Suppose that two samples (X1 , . . . , Xn ) and (Y1 , . . . , Yn )
are stored in two vectors x and y in R respectively. Suppose that we have
the following R outputs:
> length(x)
[1] 20
> cor(x,y)
[1] 0.4
Let ρ be the correlation between X1 and Y1 . Can we conclude that ρ 6= 0
at the 0.05 significant level?
Since n =√20 and√r = 0.4, from the solution to Example 2, the observed
|T | = |r| n − 2/ 1 − r2 = 1.851640 < t0.025,18 , so we cannot conclude
ρ 6= 0 at the 0.05 significant level.
• One-side tests. Suppose that (X1 , Y1 ), . . ., (Xn , Yn ) form a random sample of n pairs and (Xi , Yi ) ∼ (X, Y ). Let ρ = Corr(X, Y ) and
√
r n−2
,
T = √
1 − r2
where r is the sample correlation coefficient between (X1 , . . . , Xn ) and
(Y1 , . . . , Yn ).
1. Testing problem: H0 : ρ ≤ 0 versus H1 : ρ > 0.
– We reject H0 at level α if T > tα,n−2 .
2. Testing problem: H0 : ρ ≥ 0 versus H1 : ρ < 0.
– We reject H0 at level α if T < −tα,n−2 .
• Example 5. Suppose that (X1 , Y1 ), . . ., (X12 , Y12 ) form a random sample
of 12 pairs and the sample correlation coefficient is 0.32. Let ρ be the
correlation between X1 and Y1 . Can we conclude that ρ > 0 at the 0.05
significant level?
Ans. No, the observed T = 1.068092 < t0.05,10 = 1.812.
• Example 6. Suppose that (X1 , Y1 ), . . ., (X10 , Y10 ) form a random sample
of 10 pairs and the sample correlation coefficient is −0.1. Let ρ be the
correlation between X1 and Y1 . Can we conclude that ρ < 0 at the 0.05
significant level?
Ans. No, the observed T = −0.2842676 > −t0.05,8 = −1.860.
4