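1. In the linear regression model
\[ \text{price} = a + b \cdot \text{age} + \text{error}, \]
$b$ is estimated by
\[ \hat{b} = \frac{r \cdot S_{\text{price}}}{S_{\text{age}}} = \frac{-0.6 \cdot 1.9}{2.2} = -0.518 \]
and $a$ is estimated by
\[ \hat{a} = \overline{\text{price}} - \hat{b} \cdot \overline{\text{age}} = 6.9 - \frac{-0.6 \cdot 1.9}{2.2} \cdot 8.9 = 11.512, \]
so we have the estimated regression equation
\[ \widehat{\text{price}} = 11.512 - 0.518 \cdot \text{age}. \]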
2. 36%, since $r^2 = (-0.6)^2 = 0.36$.
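The arithmetic in Problems 1 and 2 can be re-checked with a short Python sketch. It only uses the summary statistics quoted above; the variable names are illustrative and not part of the original solution.

```python
# Summary statistics quoted in Problem 1 (variable names are illustrative).
r = -0.6                          # sample correlation between price and age
s_price, s_age = 1.9, 2.2         # sample standard deviations
mean_price, mean_age = 6.9, 8.9   # sample means

b_hat = r * s_price / s_age             # slope estimate
a_hat = mean_price - b_hat * mean_age   # intercept estimate
r_squared = r ** 2                      # proportion of variation explained

print(round(b_hat, 3), round(a_hat, 3), round(r_squared, 2))
```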
3. Let $\rho$ be the correlation between price and age. Consider the testing problem
\[ H_0 : \rho = 0 \quad \text{v.s.} \quad H_1 : \rho \neq 0. \]
The linear relation between price and age is significant if $H_0$ is rejected. We reject $H_0$ at level 0.05 if
\[ |T_n| = \left| \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \right| > t_{28,0.025} = 2.048. \]
The observed $T_n$ is
\[ \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{-0.6 \cdot \sqrt{28}}{\sqrt{1-0.6^2}} = -3.9686. \]
Since $|-3.9686| > t_{28,0.025} = 2.048$, we can conclude that the linear relation between price and age is significant at the 0.05 level.
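As a quick check of the test statistic in Problem 3, the following Python sketch recomputes $T_n$ from $r$ and $n$. The critical value 2.048 is copied from the solution rather than computed, since the standard library has no t quantile function.

```python
import math

n, r = 30, -0.6
t_crit = 2.048  # t_{28, 0.025}, taken from the solution above

# Test statistic for H0: rho = 0
t_obs = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
reject_h0 = abs(t_obs) > t_crit

print(round(t_obs, 4), reject_h0)
```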
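4. A 95% prediction interval for car price when car age is 8 is of the form
\[ (\hat{a} + \hat{b} \cdot 8 - E,\ \hat{a} + \hat{b} \cdot 8 + E), \]
where
\[ \hat{a} + \hat{b} \cdot 8 = 6.9 - \frac{-0.6 \cdot 1.9}{2.2} \cdot 8.9 + 8 \cdot \frac{-0.6 \cdot 1.9}{2.2} = \frac{16.206}{2.2} \]
and
\[
\begin{aligned}
E &= t_{0.025,30-2} \cdot \sqrt{\frac{(30-1) \cdot S_{\text{price}}^2 \cdot (1-r^2)}{30-2}} \sqrt{1 + \frac{1}{30} + \frac{(8 - \overline{\text{age}})^2}{(30-1) S_{\text{age}}^2}} \\
&= 2.048 \sqrt{\frac{29 \cdot 1.9^2 \cdot 0.64}{28}} \sqrt{1 + \frac{1}{30} + \frac{(8-8.9)^2}{29 \cdot 2.2^2}} \\
&= 2.048 \sqrt{\frac{67.0016}{28}} \sqrt{1 + \frac{1}{30} + \frac{0.81}{140.36}}.
\end{aligned}
\]
The lower bound for the prediction interval is
\[ \frac{16.206}{2.2} - 2.048 \sqrt{\frac{67.0016}{28}} \sqrt{1 + \frac{1}{30} + \frac{0.81}{140.36}} = 4.137 \]
and the upper bound for the prediction interval is
\[ \frac{16.206}{2.2} + 2.048 \sqrt{\frac{67.0016}{28}} \sqrt{1 + \frac{1}{30} + \frac{0.81}{140.36}} = 10.596, \]
so the 95% prediction interval for car price when car age is 8 is (4.137, 10.596).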
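5. A 95% confidence interval for the expected car price when car age is 8 is of the form
\[ (\hat{a} + \hat{b} \cdot 8 - E,\ \hat{a} + \hat{b} \cdot 8 + E), \]
where
\[ \hat{a} + \hat{b} \cdot 8 = 6.9 - \frac{-0.6 \cdot 1.9}{2.2} \cdot 8.9 + 8 \cdot \frac{-0.6 \cdot 1.9}{2.2} = \frac{16.206}{2.2} \]
and
\[
\begin{aligned}
E &= t_{0.025,30-2} \cdot \sqrt{\frac{(30-1) \cdot S_{\text{price}}^2 \cdot (1-r^2)}{30-2}} \sqrt{\frac{1}{30} + \frac{(8 - \overline{\text{age}})^2}{(30-1) S_{\text{age}}^2}} \\
&= 2.048 \sqrt{\frac{29 \cdot 1.9^2 \cdot 0.64}{28}} \sqrt{\frac{1}{30} + \frac{(8-8.9)^2}{29 \cdot 2.2^2}} \\
&= 2.048 \sqrt{\frac{67.0016}{28}} \sqrt{\frac{1}{30} + \frac{0.81}{140.36}}.
\end{aligned}
\]
The lower bound for the confidence interval is
\[ \frac{16.206}{2.2} - 2.048 \sqrt{\frac{67.0016}{28}} \sqrt{\frac{1}{30} + \frac{0.81}{140.36}} = 6.740 \]
and the upper bound for the confidence interval is
\[ \frac{16.206}{2.2} + 2.048 \sqrt{\frac{67.0016}{28}} \sqrt{\frac{1}{30} + \frac{0.81}{140.36}} = 7.993, \]
so the 95% confidence interval for the expected car price when car age is 8 is (6.740, 7.993).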
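The prediction interval of Problem 4 and the confidence interval of Problem 5 differ only by the extra "1 +" under the second square root. The Python sketch below, which assumes only the summary statistics quoted in Problem 1 and the critical value 2.048 from the solutions, computes both half-widths side by side; the variable names are illustrative.

```python
import math

n = 30
r = -0.6
s_price, s_age = 1.9, 2.2
mean_price, mean_age = 6.9, 8.9
t_crit = 2.048   # t_{0.025, 28}, copied from the solution
x0 = 8           # car age at which we predict

b_hat = r * s_price / s_age
a_hat = mean_price - b_hat * mean_age
fit = a_hat + b_hat * x0

# Residual standard deviation: sqrt(SSE / (n - 2)), SSE = (n - 1) S_price^2 (1 - r^2)
resid_sd = math.sqrt((n - 1) * s_price ** 2 * (1 - r ** 2) / (n - 2))
# Term shared by both intervals: 1/n + (x0 - mean_age)^2 / ((n - 1) S_age^2)
leverage = 1 / n + (x0 - mean_age) ** 2 / ((n - 1) * s_age ** 2)

e_pred = t_crit * resid_sd * math.sqrt(1 + leverage)  # prediction interval half-width
e_conf = t_crit * resid_sd * math.sqrt(leverage)      # confidence interval half-width

pred_interval = (fit - e_pred, fit + e_pred)
conf_interval = (fit - e_conf, fit + e_conf)
print([round(v, 3) for v in pred_interval], [round(v, 3) for v in conf_interval])
```

The prediction interval is wider because it accounts for the variability of a single new observation in addition to the uncertainty in the estimated mean.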
6. (a) The p-value for testing H0 : the coefficient for weight is 0 versus H1 :
the coefficient is not 0 is 0.00373, and this is also the p-value for
testing H0 : the correlation between height and weight is 0 versus
H1 : the correlation is not 0. Since the p-value is less than 0.05, we
can conclude that the linear relation between height and weight is
significant at the 0.05 level.
(b) 99.25%, since $r^2 = 0.9925$.
(c) The estimated intercept and coefficient for weight are 108.30301 and 1.04918 respectively, so the estimated regression equation is
\[ \widehat{\text{height}} = 108.30301 + 1.04918 \cdot \text{weight}. \]
8. The alternative hypothesis $H_1$ is that $\mu_1 \neq \mu_2$, so the testing problem is
\[ H_0 : \mu_1 = \mu_2 \quad \text{v.s.} \quad H_1 : \mu_1 \neq \mu_2. \]
We reject $H_0$ at level 0.05 if
\[ |T_n| \equiv \left| \frac{\bar{X} - \bar{Y}}{\hat{\sigma}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \right| > t_{0.025,15+35-2}, \]
where $\bar{X}$ and $\bar{Y}$ are the two sample means with sample sizes $n_1$ and $n_2$ respectively,
\[ \hat{\sigma} = \sqrt{\frac{(n_1-1)S_X^2 + (n_2-1)S_Y^2}{n_1+n_2-2}}, \]
and $S_X^2$ and $S_Y^2$ are the two sample variances with sample sizes $n_1$ and $n_2$ respectively. The observed $\hat{\sigma}$ is
\[ \sqrt{\frac{14 \cdot 0.1^2 + 34 \cdot 0.2^2}{15+35-2}} = \sqrt{\frac{1.5}{48}} \]
and the observed $T_n$ is
\[ \frac{5.5 - 4.5}{\sqrt{\frac{1.5}{48}\left(\frac{1}{15} + \frac{1}{35}\right)}} = 18.3303. \]
Since $|18.3303| > t_{48,0.025} = 2.011$, we can conclude $\mu_1 \neq \mu_2$ at the 0.05 level.
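The pooled two-sample t statistic of Problem 8 can be recomputed from the summary statistics with the Python sketch below. The critical value 2.011 is copied from the solution; the variable names are illustrative.

```python
import math

n1, n2 = 15, 35
mean_x, mean_y = 5.5, 4.5
s_x, s_y = 0.1, 0.2
t_crit = 2.011  # t_{48, 0.025}, taken from the solution

# Pooled variance estimate (sigma-hat squared)
pooled_var = ((n1 - 1) * s_x ** 2 + (n2 - 1) * s_y ** 2) / (n1 + n2 - 2)
t_obs = (mean_x - mean_y) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
reject_h0 = abs(t_obs) > t_crit

print(round(t_obs, 4), reject_h0)
```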
9. The alternative hypothesis $H_1$ is that $\sigma_1 \neq \sigma_2$, so the testing problem is
\[ H_0 : \sigma_1 = \sigma_2 \quad \text{v.s.} \quad H_1 : \sigma_1 \neq \sigma_2. \]
We reject $H_0$ at level 0.1 if
\[ F \equiv \frac{S_Y^2}{S_X^2} > f_{0.05,n_2-1,n_1-1} \quad \text{or} \quad \frac{1}{F} > f_{0.05,n_1-1,n_2-1}, \]
where $S_X^2$ and $S_Y^2$ are the two sample variances with sample sizes $n_1$ and $n_2$ respectively. With $n_1 = 15$ and $n_2 = 35$, we have $f_{0.05,n_2-1,n_1-1} = f_{0.05,35-1,15-1} = 2.288691$ (from R) and the observed $F$ is
\[ \frac{0.2^2}{0.1^2} = 4. \]
Since $4 > 2.288691$, we can conclude $\sigma_1 \neq \sigma_2$ at the 0.1 level.
Note that from the F table in Appendix B.4, we cannot find $f_{0.05,34,14}$. However, we can find $f_{0.05,30,14} = 2.31$ and $f_{0.05,40,14} = 2.27$, so we know $f_{0.05,34,14}$ is in the interval (2.27, 2.31) and $4 > f_{0.05,34,14}$.
10. The alternative hypothesis $H_1$ is that $\mu_1 \neq \mu_2$, so the testing problem is
\[ H_0 : \mu_1 = \mu_2 \quad \text{v.s.} \quad H_1 : \mu_1 \neq \mu_2. \]
Since the weights in the beginning and the weights at the end are not independent, we use the paired t test. Let $(D_1, \ldots, D_5)$ be the pairwise differences between the weights in the beginning and the weights at the end and let $\bar{D}$ and $S_D$ be the sample mean and sample standard deviation for $(D_1, \ldots, D_5)$. The paired t test rejects $H_0$ at level 0.05 if
\[ |T_n| = \left| \frac{\sqrt{5}\,\bar{D}}{S_D} \right| > t_{0.025,5-1}. \]
From R, $t_{0.025,5-1} = 2.776445$, and the observed $\bar{D}$ and $S_D$ are 2.6 and $\sqrt{19.3}$ respectively. The observed $T_n$ is then
\[ \frac{\sqrt{5} \cdot 2.6}{\sqrt{19.3}} = 1.323365. \]
Since $|1.323365| < 2.776445$, we cannot conclude $\mu_1 \neq \mu_2$ at the 0.05 level.
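11. Note that $F_{1,2} > f_{a/2,n_1-1,n_2-1}$ is equivalent to
\[ a > \underbrace{2P\left(F(n_1-1, n_2-1) > \text{observed } F_{1,2}\right)}_{\mathrm{I}} \]
and $1/F_{1,2} > f_{a/2,n_2-1,n_1-1}$ is equivalent to
\[ a > \underbrace{2P\left(F(n_2-1, n_1-1) > \frac{1}{\text{observed } F_{1,2}}\right)}_{\mathrm{II}}. \]
Therefore,
\[ \text{the test rejects at level } a \iff a > \mathrm{I} \text{ or } a > \mathrm{II} \iff a > \min(\mathrm{I}, \mathrm{II}), \]
so the p-value is the minimum of I and II.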
12. Since the distribution of $T$ is the distribution of $Z/\sqrt{U/m}$, where $Z \sim N(0,1)$ and $U \sim \chi^2(m)$ are independent variables, the distribution of $T^2$ is the distribution of
\[ \frac{Z^2}{U/m} = \frac{Z^2/1}{U/m}. \]
Note that $Z^2 \sim \chi^2(1)$ and is independent of $U/m$, so in the ratio $(Z^2/1)/(U/m)$, the numerator and the denominator are $\chi^2$ variables divided by their degrees of freedom (1 and $m$ respectively). Therefore, the distribution of the ratio is $F(1, m)$ and so is the distribution of $T^2$.
13. (a) The null hypothesis is that the mean height for males is the same as
that for females.
(b) The p-value is 0.04366 < 0.05, so we can reject the null hypothesis
at level 0.05.
14. Let µ1 , µ2 , and µ3 be the mean scores for the three classes (the means for
the three score distributions). The testing problem is
H0 : µ1 = µ2 = µ3 v.s. H1 : µ1 , µ2 , and µ3 , are not all the same.
The grand mean (total mean) is
\[ \frac{10(704.7/10) + 17(1084.1/17) + 10(734.7/10)}{37} = 68.2027, \]
\[
\begin{aligned}
\mathrm{SST} &= 10\left((704.7/10) - 68.2027\right)^2 + 17\left((1084.1/17) - 68.2027\right)^2 + 10\left((734.7/10) - 68.2027\right)^2 \\
&= 662.7924,
\end{aligned}
\]
\[
\begin{aligned}
\mathrm{SSE} &= (10-1)(219.601/9) + (17-1)(786.1753/16) + (10-1)(257.201/9) \\
&= 1262.9773,
\end{aligned}
\]
and
\[ F = \frac{\mathrm{SST}/(3-1)}{\mathrm{SSE}/(37-3)} = 8.921357. \]
The p-value is
\[ P(F(3-1, 37-3) > 8.921357) = 0.0007681574 < 0.05, \]
so we can conclude H1 : there is a significant difference in the mean scores
for the three classes at the 0.05 level.
In the above solution, the probability P (F (3 − 1, 37 − 3) > 8.921357) is
found using R. If we use the table for F distributions in the text (Appendix
B.4), we can find f0.05,2,30 = 3.32 and f0.05,2,40 = 3.23, so f0.05,2,34 is in
the range (3.23, 3.32) and is less than the observed F value 8.921357. Thus
we can conclude H1 at the 0.05 level based on the table as well.
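The one-way ANOVA quantities in Problem 14 can be recomputed from the per-class summaries with the Python sketch below. Each tuple holds the class size, the sum of scores, and the within-class sum of squared deviations, as read off from the expressions in the solution; the tabulated bound 3.32 is copied from the text because the standard library has no F distribution.

```python
# (class size, sum of scores, within-class sum of squared deviations)
groups = [(10, 704.7, 219.601), (17, 1084.1, 786.1753), (10, 734.7, 257.201)]

k = len(groups)
n_total = sum(n for n, _, _ in groups)
grand_mean = sum(total for _, total, _ in groups) / n_total

# Between-class and within-class sums of squares
sst = sum(n * (total / n - grand_mean) ** 2 for n, total, _ in groups)
sse = sum(ss for _, _, ss in groups)

f_obs = (sst / (k - 1)) / (sse / (n_total - k))
f_upper_bound = 3.32  # f_{0.05,2,30} from the table, an upper bound for f_{0.05,2,34}

print(round(grand_mean, 4), round(sst, 4), round(sse, 4), round(f_obs, 4))
print(f_obs > f_upper_bound)
```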
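15. (a)
\[ \hat{\sigma} = \sqrt{\frac{(10-1)\left(\sqrt{219.601/9}\right)^2 + (10-1)\left(\sqrt{257.201/9}\right)^2}{10+10-2}} = \sqrt{\frac{476.802}{18}} = \sqrt{26.489}, \]
\[ T = \frac{(704.7/10) - (734.7/10)}{\sqrt{\hat{\sigma}^2 (1/10 + 1/10)}} = \frac{-3}{\sqrt{5.2978}} = -1.303387. \]
(b) The grand mean (total mean) is
\[ \frac{10(704.7/10) + 10(734.7/10)}{20} = 71.97, \]
\[ \mathrm{SST} = 10\left(\frac{704.7}{10} - 71.97\right)^2 + 10\left(\frac{734.7}{10} - 71.97\right)^2 = 45, \]
\[ \mathrm{SSE} = (10-1)\frac{219.601}{9} + (10-1)\frac{257.201}{9} = 476.802, \]
and
\[ F = \frac{\mathrm{SST}/(2-1)}{\mathrm{SSE}/(20-2)} = \frac{45}{26.489} = 1.698818. \]
(c)
\[ T^2 - F = \left(\frac{-3}{\sqrt{5.2978}}\right)^2 - \frac{45}{26.489} = \frac{9 \cdot 26.489 - 45 \cdot 5.2978}{5.2978 \cdot 26.489} = 0. \]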
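The identity $T^2 = F$ for two groups in Problem 15 can also be checked numerically. The Python sketch below uses the two class means (70.47 and 73.47) and the within-class sums of squares from the solution; the variable names are illustrative.

```python
import math

n1 = n2 = 10
mean1, mean2 = 704.7 / 10, 734.7 / 10
ss1, ss2 = 219.601, 257.201   # within-class sums of squared deviations

# Pooled two-sample t statistic
pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
t_obs = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# One-way ANOVA F statistic with k = 2 groups
grand_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)
sst = n1 * (mean1 - grand_mean) ** 2 + n2 * (mean2 - grand_mean) ** 2
sse = ss1 + ss2
f_obs = (sst / (2 - 1)) / (sse / (n1 + n2 - 2))

print(round(t_obs, 6), round(f_obs, 6), abs(t_obs ** 2 - f_obs) < 1e-9)
```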
16. (a) The p-value is $3.618 \times 10^{-5} < 0.001$, so we can conclude that not all
classes have the same mean scores at level 0.001.
(b) The number of classes is k: the number of treatment levels. The
degrees of freedom for SST is k − 1 = 2, so the number of classes is
k = 2 + 1 = 3.
(c) The degrees of freedom for SSE is n − k = 36, so the total number of
students is n = 36 + 3 = 39.
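18. Note that we have
\begin{align}
5 &= \mu + \alpha_1 + \beta_1 + \gamma_{1,1}, \tag{1} \\
7 &= \mu + \alpha_1 + \beta_2 + \gamma_{1,2}, \tag{2} \\
3 &= \mu + \alpha_1 + \beta_3 + \gamma_{1,3}, \tag{3} \\
4 &= \mu + \alpha_2 + \beta_1 + \gamma_{2,1}, \tag{4} \\
5 &= \mu + \alpha_2 + \beta_2 + \gamma_{2,2}, \tag{5} \\
3 &= \mu + \alpha_2 + \beta_3 + \gamma_{2,3}, \tag{6}
\end{align}
where
\[ \alpha_1 + \alpha_2 = 0 = \beta_1 + \beta_2 + \beta_3 \]
and
\[ \gamma_{1,1} + \gamma_{1,2} + \gamma_{1,3} = 0 = \gamma_{2,1} + \gamma_{2,2} + \gamma_{2,3}. \]
Adding up (1) – (6) gives $5 + 7 + 3 + 4 + 5 + 3 = 6\mu$, so
\[ \mu = \frac{5+7+3+4+5+3}{6} = 4.5. \]
Adding up (1) – (3) and (4) – (6) respectively gives us $5 + 7 + 3 = 3\mu + 3\alpha_1$ and $4 + 5 + 3 = 3\mu + 3\alpha_2$, so
\[ \alpha_1 = \frac{5+7+3}{3} - \mu = 5 - 4.5 = 0.5 \]
and
\[ \alpha_2 = \frac{4+5+3}{3} - \mu = 4 - 4.5 = -0.5. \]
Similarly, from (1) + (4), (2) + (5) and (3) + (6), we have
\[ \beta_1 = \frac{5+4}{2} - \mu = 4.5 - 4.5 = 0, \]
\[ \beta_2 = \frac{7+5}{2} - \mu = 6 - 4.5 = 1.5, \]
and
\[ \beta_3 = \frac{3+3}{2} - \mu = 3 - 4.5 = -1.5. \]
Plugging the values of $\mu$, $\alpha_1$, $\alpha_2$, $\beta_1$, $\beta_2$, and $\beta_3$ into (1) – (6), we have
\[
\begin{aligned}
\gamma_{1,1} &= 5 - (4.5 + 0.5 + 0) = 0, \\
\gamma_{1,2} &= 7 - (4.5 + 0.5 + 1.5) = 0.5, \\
\gamma_{1,3} &= 3 - (4.5 + 0.5 - 1.5) = -0.5, \\
\gamma_{2,1} &= 4 - (4.5 - 0.5 + 0) = 0, \\
\gamma_{2,2} &= 5 - (4.5 - 0.5 + 1.5) = -0.5,
\end{aligned}
\]
and
\[ \gamma_{2,3} = 3 - (4.5 - 0.5 - 1.5) = 0.5. \]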
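The decomposition in Problem 18 amounts to taking row means, column means, and what is left over. A minimal Python sketch, assuming the six observations are laid out as a 2 × 3 table with rows indexed by $i$ and columns by $j$:

```python
# Observations x[i][j] from equations (1)-(6): row i = 1, 2; column j = 1, 2, 3
table = [[5, 7, 3],
         [4, 5, 3]]
rows, cols = len(table), len(table[0])

mu = sum(sum(row) for row in table) / (rows * cols)    # grand mean
alpha = [sum(row) / cols - mu for row in table]        # row effects
beta = [sum(table[i][j] for i in range(rows)) / rows - mu
        for j in range(cols)]                          # column effects
gamma = [[table[i][j] - (mu + alpha[i] + beta[j]) for j in range(cols)]
         for i in range(rows)]                         # remaining terms

print(mu, alpha, beta, gamma)
```

By construction the row effects, the column effects, and each row of the remaining terms sum to zero, which matches the constraints stated in the solution.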
19. (a) Let α1 , α2 , α3 and α4 be the mean salaries for students of majors
accounting, administration, finance and marketing respectively. The
testing problem is
H0 : α1 = α2 = α3 = α4 v.s. H1 : α1 , α2 , α3 and α4 , are not all the same.
The grand mean (total mean) is
\[ \frac{(2.8 + 3.0 + 2.7 + 2.5 + 2.6 + 2.8 + 2.7 + 2.8 + 3.0 + 2.8 + 2.7 + 2.5) \times 6}{12 \times 6} = \frac{32.9}{12} \approx 2.7417, \]
and the sample means for the four major groups are
\[ \bar{X}_{1\cdot} = \frac{6(2.8 + 3.0 + 2.7)}{6 \times 3} = \frac{8.5}{3} \approx 2.8333, \]
\[ \bar{X}_{2\cdot} = \frac{6(2.5 + 2.6 + 2.8)}{6 \times 3} = \frac{7.9}{3} \approx 2.6333, \]
\[ \bar{X}_{3\cdot} = \frac{6(2.7 + 2.8 + 3.0)}{6 \times 3} = \frac{8.5}{3} \approx 2.8333, \]
and
\[ \bar{X}_{4\cdot} = \frac{6(2.8 + 2.7 + 2.5)}{6 \times 3} = \frac{8}{3} \approx 2.6667. \]
Let Factor A be the major factor, then
\[
\begin{aligned}
\mathrm{SSA} &= 3 \times 6 \times \left[(8.5/3 - 32.9/12)^2 + (7.9/3 - 32.9/12)^2 + (8.5/3 - 32.9/12)^2 + (8/3 - 32.9/12)^2\right] \\
&= 0.615.
\end{aligned}
\]
Since
\[
\begin{aligned}
\mathrm{SSE} &= (6-1) \times (0.3 + 0.3 + 0.3 + 0.4 + 0.5 + 0.5 + 0.3 + 0.4 + 0.4 + 0.4 + 0.3 + 0.3) \\
&= 22,
\end{aligned}
\]
\[ F = \frac{\mathrm{SSA}/(4-1)}{\mathrm{SSE}/(4 \times 3 \times 6 - 4 \times 3)} = \frac{0.615/3}{22/60} \approx 0.5591 < f_{0.05,3,60} = 2.76 \]
and we cannot reject $H_0$ at level 0.05. That is, we cannot conclude that the mean salaries for the four majors are not all the same at level 0.05.
The same conclusion can be reached based on the p-value for the test. The p-value is
\[ P\left(F(3, 60) > \frac{0.615/3}{22/60}\right) \approx 0.6441 > 0.05. \]
(b) Let β1 , β2 , and β3 be the mean salaries for students with graduation
years 2003, 2004 and 2005 respectively. The testing problem is
H0 : β1 = β2 = β3 v.s. H1 : β1 , β2 , and β3 , are not all the same.
The sample means for the three year groups are
\[ \bar{X}_{\cdot 1} = \frac{6(2.8 + 2.5 + 2.7 + 2.8)}{6 \times 4} = 2.7, \]
\[ \bar{X}_{\cdot 2} = \frac{6(3.0 + 2.6 + 2.8 + 2.7)}{6 \times 4} = 2.775 \]
and
\[ \bar{X}_{\cdot 3} = \frac{6(2.7 + 2.8 + 3.0 + 2.5)}{6 \times 4} = 2.75. \]
Let Factor B be the year factor, then
\[
\begin{aligned}
\mathrm{SSB} &= 4 \times 6 \times \left[(2.7 - 32.9/12)^2 + (2.775 - 32.9/12)^2 + (2.75 - 32.9/12)^2\right] \\
&= 0.07
\end{aligned}
\]
and
\[ F = \frac{\mathrm{SSB}/(3-1)}{\mathrm{SSE}/(4 \times 3 \times 6 - 4 \times 3)} = \frac{0.07/2}{22/60} \approx 0.0955 < f_{0.05,2,60} = 3.15, \]
so we cannot reject $H_0$ at the 0.05 significance level. That is, we cannot conclude that the mean salaries for the three graduation years are not all the same at the 0.05 level.
The same conclusion can be reached based on the p-value for the test. The p-value is
\[ P\left(F(2, 60) > \frac{0.07/2}{22/60}\right) \approx 0.9091 > 0.05. \]
20. (a) Let α1 , α2 , α3 and α4 be the mean salaries for students of majors
accounting, administration, finance and marketing respectively. The
testing problem is
H0 : α1 = α2 = α3 = α4 v.s. H1 : α1 , α2 , α3 and α4 , are not all the same.
The grand mean (total mean) is
\[ \frac{2.8 + 3.0 + 2.7 + 2.5 + 2.6 + 2.8 + 2.7 + 2.8 + 3.0 + 2.8 + 2.7 + 2.5}{12} = \frac{32.9}{12} \approx 2.7417, \]
and the sample means for the four major groups are
\[ \bar{X}_{1\cdot} = \frac{2.8 + 3.0 + 2.7}{3} = \frac{8.5}{3} \approx 2.8333, \]
\[ \bar{X}_{2\cdot} = \frac{2.5 + 2.6 + 2.8}{3} = \frac{7.9}{3} \approx 2.6333, \]
\[ \bar{X}_{3\cdot} = \frac{2.7 + 2.8 + 3.0}{3} = \frac{8.5}{3} \approx 2.8333, \]
and
\[ \bar{X}_{4\cdot} = \frac{2.8 + 2.7 + 2.5}{3} = \frac{8}{3} \approx 2.6667. \]
Let Factor A be the major factor, then
\[
\begin{aligned}
\mathrm{SSA} &= 3 \times \left[(8.5/3 - 32.9/12)^2 + (7.9/3 - 32.9/12)^2 + (8.5/3 - 32.9/12)^2 + (8/3 - 32.9/12)^2\right] \\
&= 0.1025.
\end{aligned}
\]
The sample means for the three year groups are
\[ \bar{X}_{\cdot 1} = \frac{2.8 + 2.5 + 2.7 + 2.8}{4} = 2.7, \]
\[ \bar{X}_{\cdot 2} = \frac{3.0 + 2.6 + 2.8 + 2.7}{4} = 2.775 \]
and
\[ \bar{X}_{\cdot 3} = \frac{2.7 + 2.8 + 3.0 + 2.5}{4} = 2.75. \]
Let Factor B be the year factor, then
\[
\begin{aligned}
\mathrm{SSB} &= 4 \times \left[(2.7 - 32.9/12)^2 + (2.775 - 32.9/12)^2 + (2.75 - 32.9/12)^2\right] \\
&= 0.07/6 \approx 0.0117
\end{aligned}
\]
and
\[
\begin{aligned}
\mathrm{SSE} &= \mathrm{SS}_{\text{total}} - \mathrm{SSA} - \mathrm{SSB} \\
&= \left[2.8^2 + 3.0^2 + 2.7^2 + 2.5^2 + 2.6^2 + 2.8^2 + 2.7^2 + 2.8^2 + 3.0^2 + 2.8^2 + 2.7^2 + 2.5^2\right] \\
&\quad - 4 \times 3 \times (32.9/12)^2 - 0.1025 - 0.07/6 = 0.175.
\end{aligned}
\]
Thus
\[ F = \frac{\mathrm{SSA}/(4-1)}{\mathrm{SSE}/((4-1) \times (3-1))} = \frac{0.1025/3}{0.175/6} \approx 1.1714 < f_{0.05,3,6} = 4.76 \]
and we cannot reject $H_0$ at level 0.05. That is, we cannot conclude that the mean salaries for the four majors are not all the same at level 0.05.
The same conclusion can be reached based on the p-value for the test. The p-value is
\[ P\left(F(3, 6) > \frac{0.1025/3}{0.175/6}\right) \approx 0.3958 > 0.05. \]
(b) From Part (a), the test statistic is
\[ F = \frac{\mathrm{SSB}/(3-1)}{\mathrm{SSE}/((4-1) \times (3-1))} = \frac{0.07/(6 \times 2)}{0.175/6} = 0.2 < f_{0.05,2,6} = 5.14, \]
so we cannot conclude that the mean salaries for the three graduation years are not all the same at level 0.05. The same conclusion can be reached based on the p-value for the test. The p-value is
\[ P(F(2, 6) > 0.2) \approx 0.8240 > 0.05. \]
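The sums of squares in Problem 20 can be recomputed directly from the 4 × 3 table of salaries. In the Python sketch below the table layout (rows = majors, columns = graduation years) is inferred from the row and column sums used in the solution, and the critical values are copied from the text.

```python
# Rows: accounting, administration, finance, marketing; columns: 2003, 2004, 2005.
# Layout inferred from the sums used in the solution (an assumption of this sketch).
salary = [[2.8, 3.0, 2.7],
          [2.5, 2.6, 2.8],
          [2.7, 2.8, 3.0],
          [2.8, 2.7, 2.5]]
a, b = len(salary), len(salary[0])

grand_mean = sum(sum(row) for row in salary) / (a * b)
row_means = [sum(row) / b for row in salary]
col_means = [sum(salary[i][j] for i in range(a)) / a for j in range(b)]

ssa = b * sum((m - grand_mean) ** 2 for m in row_means)
ssb = a * sum((m - grand_mean) ** 2 for m in col_means)
ss_total = sum((x - grand_mean) ** 2 for row in salary for x in row)
sse = ss_total - ssa - ssb

df_error = (a - 1) * (b - 1)
f_major = (ssa / (a - 1)) / (sse / df_error)
f_year = (ssb / (b - 1)) / (sse / df_error)

print(round(ssa, 4), round(ssb, 4), round(sse, 4))
print(round(f_major, 4), round(f_year, 4))
print(f_major < 4.76, f_year < 5.14)  # compare with f_{0.05,3,6} and f_{0.05,2,6}
```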
21. We can test for the interaction effect between major and graduation year
for the data in Problem 19. From the calculation in Problem 19,
\[
\begin{aligned}
\mathrm{SSI} &= \mathrm{SS}_{\text{total}} - \mathrm{SSA} - \mathrm{SSB} - \mathrm{SSE} \\
&= \mathrm{SSE} + 6 \times \left[2.8^2 + 3.0^2 + 2.7^2 + 2.5^2 + 2.6^2 + 2.8^2 + 2.7^2 + 2.8^2 + 3.0^2 + 2.8^2 + 2.7^2 + 2.5^2\right] \\
&\quad - 6 \times 4 \times 3 \times (32.9/12)^2 - 0.615 - 0.07 - \mathrm{SSE} = 1.05
\end{aligned}
\]
and
\[ F = \frac{\mathrm{SSI}/((4-1)(3-1))}{\mathrm{SSE}/(4 \times 3 \times 6 - 4 \times 3)} = \frac{1.05/6}{22/60} \approx 0.4773 < f_{0.05,6,60} = 2.25, \]
so we cannot conclude that there is a significant major-graduation year interaction effect on salary at the 0.05 level.
The same conclusion can be reached based on the p-value for the test. The p-value is
\[ P\left(F(6, 60) > \frac{1.05/6}{22/60}\right) \approx 0.8226 > 0.05. \]
23. (a) The p-value is 0.001223 > 0.001, so we cannot conclude that there
is a significant interaction effect between drug and exercise at level
0.001.
(b) The p-value is $2.2 \times 10^{-16} < 0.001$, so we can conclude that drug has
a significant effect on weight loss at level 0.001.
(c) The p-value is $7.027 \times 10^{-5} < 0.001$, so we can conclude that exercise
has a significant effect on weight loss at level 0.001.
(d) The number of degrees of freedom for SS due to drug is k − 1 = 4, so
the drug factor has k = 4 + 1 = 5 levels.
(e) The number of degrees of freedom for SS due to exercise is b − 1 = 3,
so the exercise factor has b = 3 + 1 = 4 levels.
(f) The number of degrees of freedom for SSE is n − k × b = 180, so the
total number of students is n = 180 + 5 × 4 = 200.
24. Let $(i_1, i_2, i_3, i_4)$ be a permutation of $(1, 2, 3, 4)$, then we have the following results.

Cases                                                Probability   R_+
$0 < X_{i_1} < X_{i_2} < X_{i_3} < X_{i_4}$          1/16          10
$0 < -X_{i_1} < X_{i_2} < X_{i_3} < X_{i_4}$         1/16          9
$0 < X_{i_1} < -X_{i_2} < X_{i_3} < X_{i_4}$         1/16          8
$0 < X_{i_1} < X_{i_2} < -X_{i_3} < X_{i_4}$         1/16          7
$0 < X_{i_1} < X_{i_2} < X_{i_3} < -X_{i_4}$         1/16          6
$0 < -X_{i_1} < -X_{i_2} < X_{i_3} < X_{i_4}$        1/16          7
$0 < -X_{i_1} < X_{i_2} < -X_{i_3} < X_{i_4}$        1/16          6
$0 < -X_{i_1} < X_{i_2} < X_{i_3} < -X_{i_4}$        1/16          5
$0 < X_{i_1} < -X_{i_2} < -X_{i_3} < X_{i_4}$        1/16          5
$0 < X_{i_1} < -X_{i_2} < X_{i_3} < -X_{i_4}$        1/16          4
$0 < X_{i_1} < X_{i_2} < -X_{i_3} < -X_{i_4}$        1/16          3
$0 < -X_{i_1} < -X_{i_2} < -X_{i_3} < X_{i_4}$       1/16          4
$0 < -X_{i_1} < -X_{i_2} < X_{i_3} < -X_{i_4}$       1/16          3
$0 < -X_{i_1} < X_{i_2} < -X_{i_3} < -X_{i_4}$       1/16          2
$0 < X_{i_1} < -X_{i_2} < -X_{i_3} < -X_{i_4}$       1/16          1
$0 < -X_{i_1} < -X_{i_2} < -X_{i_3} < -X_{i_4}$      1/16          0

The possible values for $R_+$ are 0, 1, ..., 10, and
\[ P(R_+ = k) = \begin{cases} 1/16 & \text{if } k \in \{0, 1, 2, 8, 9, 10\}; \\ 1/8 & \text{if } k \in \{3, 4, 5, 6, 7\}. \end{cases} \]
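The table in Problem 24 can be generated by brute force: under the null hypothesis each of the $2^4$ sign patterns is equally likely, and $R_+$ is the sum of the ranks carrying a positive sign. A small Python sketch of this enumeration (it assumes no ties and no zero observations):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

n = 4
counts = Counter()
# Each rank 1..n independently belongs to a positive or a negative observation.
for signs in product([False, True], repeat=n):
    r_plus = sum(rank for rank, positive in zip(range(1, n + 1), signs) if positive)
    counts[r_plus] += 1

pmf = {k: Fraction(c, 2 ** n) for k, c in sorted(counts.items())}
for k, p in pmf.items():
    print(k, p)
```

Changing `n` gives the null distribution of the signed rank statistic for other small sample sizes.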
25. Let $\mu$ be the median for the distribution of $X_1$. For testing
\[ H_0 : \mu = 0 \quad \text{v.s.} \quad H_1 : \mu \neq 0, \]
the signed rank test rejects $H_0$ at level 0.05 if
\[ \min(R_+, R_-) \le 25, \]
when the sample size is 15. Since $R_- = 1 + 2 + 3 = 6$, $R_+ = 15(15+1)/2 - 6 = 114$ and $\min(R_+, R_-) = 6 \le 25$, we can conclude that the median for the distribution of $X_1$ is different from 0 at the 0.05 significance level.
26. (a) Let µ be the population median rent. For testing
H0 : µ ≤ 9000 v.s. H1 : µ > 9000,
the signed rank test rejects H0 if R− ≤ C. The observed R− is 0, so
the p-value for the signed rank test is P (DSR (4) ≤ 0) = 0.0625 < 0.07
and we can conclude that the population median rent is more than
NT$9,000 per month at the 0.07 significance level.
(b) For testing
H0 : µ ≥ 22000 v.s. H1 : µ < 22000,
the signed rank test rejects H0 if R+ ≤ C. The observed R+ is 0, so
the p-value for the signed rank test is P (DSR (4) ≤ 0) = 0.0625 > 0.05
and we cannot conclude that the population median rent is less than
NT$22,000 per month at the 0.05 significance level.
(c) For testing
H0 : µ = 13000 v.s. H1 : µ ≠ 13000,
the signed rank test rejects H0 if min(R+ , R− ) ≤ C.
The observed min(R+ , R− ) = min(3, 7) = 3, so the p-value for the
signed rank test is 2P (DSR (4) ≤ 3) = 2 × (0.0625 + 0.0625 + 0.0625 +
0.1250) = 0.625 > 0.13 and we cannot conclude that the population median rent
is different from NT$13,000 per month at the 0.13 significance level.
The same conclusion can be reached if the critical value is used. Note
that
P (DSR (4) ≤ 0) = 0.0625 < 0.13/2 < P (DSR (4) ≤ 1),
so the signed rank test rejects H0 at level 0.13 if min(R+ , R− ) ≤ 0.
The test does not reject H0 at level 0.13 based on the given data.
27. Let µ denote the population median weight loss. The testing problem of
interest is
H0 : µ ≤ 0 v.s. H1 : µ > 0.    (7)
For testing (7), the signed rank test rejects H0 at level 0.05 if R− ≤ 2. The
observed R− = 1+2 = 3 > 2, so we cannot conclude that the weight-losing
program is effective at the 0.05 significance level.
28. For every number $t$,
\[ P(Y > t) = P(X + 1 > t) = P(X > t - 1) \ge P(X > t), \]
so $Y \succeq X$. Note that
\[ P(X + 1 > X) = 1 \;\Rightarrow\; X + 1 \text{ and } X \text{ do not have the same distribution } (X + 1 \not\sim X) \;\Rightarrow\; Y \not\sim X. \]
Since $Y \succeq X$ and $Y \not\sim X$, $Y \succ X$.
29. For testing
\[ H_0 : D_B \preceq D_A \quad \text{v.s.} \quad H_1 : D_B \succ D_A, \]
the Wilcoxon rank-sum test rejects $H_0$ if $W$, the rank sum for the lifetimes of tires sold by Company B, is large. The observed $W$ is $8 + 5 + 7 + 9 = 29$. The distribution of $W$ under $D_B = D_A$ can be approximated by $N(\mu, \sigma^2)$, where
\[ \mu = \frac{4(4 + 5 + 1)}{2} = 20 \]
and
\[ \sigma^2 = \frac{4 \times 5 \times (4 + 5 + 1)}{12} = \frac{200}{12} \approx 16.6667. \]
The observed Z statistic is
\[ \frac{W - 20}{\sqrt{200/12}} = 9\sqrt{12/200} \approx 2.2045 > z_{0.05} = 1.645, \]
so the approximate test rejects $H_0$ at level 0.05. The p-value for the approximate test is $P(N(0,1) > 9\sqrt{12/200})$, which is in the interval
\[ (P(N(0,1) > 2.21),\ P(N(0,1) > 2.20)) = (0.5 - 0.4864,\ 0.5 - 0.4861) = (0.0136,\ 0.0139), \]
so we can also see that the approximate test rejects $H_0$ at level 0.05 based on the range of the p-value.
We may also use the R command 1-pnorm(9*sqrt(12/200)) to evaluate
the p-value, which gives 0.01374317.
30. For the testing problem
\[ H_0 : \mu_A = \mu_B \quad \text{v.s.} \quad H_1 : \mu_A \neq \mu_B, \tag{8} \]
we still use the Z statistic in Problem 29. The approximate Wilcoxon test rejects $H_0$ if $|Z|$ is large. The p-value for the approximate Wilcoxon test is $2P(N(0,1) > 9\sqrt{12/200})$, which is in the range
\[ (0.0136 \times 2,\ 0.0139 \times 2) = (0.0272,\ 0.0278). \]
Since the p-value is less than 0.05, we can conclude that $\mu_A$ and $\mu_B$ are
different at level 0.05.
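The normal approximation used in Problems 29 and 30 can be sketched in Python as follows. The two samples are taken from the R commands in Problem 31, under the assumption (consistent with the observed rank sum 29) that `y` there holds the Company B lifetimes and `x` the Company A lifetimes; the ranking step assumes no ties.

```python
import math
from statistics import NormalDist

# Samples as entered in the R code of Problem 31 (assumed: x = Company A, y = Company B)
tires_a = [5.5, 4.5, 4, 4.1, 4.2]
tires_b = [6.3, 5.1, 5.6, 6.5]

combined = sorted(tires_a + tires_b)
# Rank of each B observation in the combined sample (valid because there are no ties)
w = sum(combined.index(v) + 1 for v in tires_b)

m, n_a = len(tires_b), len(tires_a)
mu_w = m * (m + n_a + 1) / 2          # mean of W when the two distributions coincide
var_w = m * n_a * (m + n_a + 1) / 12  # variance of W when the two distributions coincide

z = (w - mu_w) / math.sqrt(var_w)
p_one_sided = 1 - NormalDist().cdf(z)  # Problem 29
p_two_sided = 2 * p_one_sided          # Problem 30

print(w, round(z, 4), round(p_one_sided, 8), round(p_two_sided, 8))
```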
31. (a) Run the R commands
x <- c(5.5,4.5,4,4.1,4.2); y <- c(6.3,5.1,5.6,6.5)
wilcox.test(x,y, alternative="two.sided")
and the p-value is 0.03175, which is larger than the p-value in Problem
30 (between 0.0272 and 0.0278). It is easier to reject the H0 in (8)
using the p-value in Problem 30.
(b) For testing
H0 : µB ≤ µA v.s. H1 : µB > µA
using the Wilcoxon rank-sum test, we run
x <- c(5.5,4.5,4,4.1,4.2); y <- c(6.3,5.1,5.6,6.5)
wilcox.test(y, x, alternative="greater")
or
x <- c(5.5,4.5,4,4.1,4.2); y <- c(6.3,5.1,5.6,6.5)
wilcox.test(x, y, alternative="less")
The p-value is 0.01587, which is larger than the p-value in Problem
29 (between 0.0136 and 0.0139). It is easier to reject H0 using the
p-value in Problem 29.
32. Let µ1 , µ2 and µ3 be the medians for the three score distributions. For
testing
H0 : µ1 = µ2 = µ3 v.s. H1 : µ1 , µ2 , µ3 are not all the same,
we use the Kruskal-Wallis rank sum test. The observed H test statistic is
12 × [5 · (55/5)2 + 8 · (36/8)2 + 7 · (119/7)2 ]/(20 · 21) − 3 · 21 ≈ 16.71429.
Run the R command 1-pchisq(16.71429, df=2) and we have that the
p-value is 0.0002347135 < 0.05, so we can conclude that the population
median scores for the 3 classes are not all the same at level 0.05.
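The H statistic in Problem 32 depends only on the group sizes and rank sums. A Python sketch of the computation follows; the p-value line uses the fact that a $\chi^2$ variable with 2 degrees of freedom has survival function $e^{-x/2}$, so it applies only to the three-group case.

```python
import math

# (group size, rank sum) for the three classes, as used in the solution
groups = [(5, 55), (8, 36), (7, 119)]
n = sum(size for size, _ in groups)

# Kruskal-Wallis statistic: 12/(n(n+1)) * sum n_i * (mean rank_i)^2 - 3(n+1)
h = 12 / (n * (n + 1)) * sum(size * (rank_sum / size) ** 2
                             for size, rank_sum in groups) - 3 * (n + 1)

# Chi-square(2) upper tail probability; valid because k - 1 = 2 here
p_value = math.exp(-h / 2)

print(round(h, 5), p_value)
```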
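33. (a) We will first compute $E(U_1)$ and $E(V_1)$. To find $E(U_1)$, note that
\[ E(\bar{R}_1) = \frac{1}{n_1}\left(E(R_{1,1}) + \cdots + E(R_{1,n_1})\right) = \frac{1}{n_1} \cdot \frac{n+1}{2} \times n_1 = \frac{n+1}{2} = \bar{R}, \]
so
\[ E(U_1) = \frac{\sqrt{n_1}}{\sqrt{n(n+1)/12}} \left(E(\bar{R}_1) - \bar{R}\right) = 0. \]
To find $E(V_1)$, note that for $i = 1, \ldots, k$,
\[ E(\bar{Y}_i) = \frac{1}{n_i}\left(E(Y_{i,1}) + \cdots + E(Y_{i,n_i})\right) = \frac{1}{n_i}(\mu_i \times n_i) = \mu_i \]
and
\[
\begin{aligned}
E(\bar{Y}) &= E\left(\frac{1}{n_1 + \cdots + n_k}\left(n_1\bar{Y}_1 + \cdots + n_k\bar{Y}_k\right)\right) \\
&= \frac{1}{n_1 + \cdots + n_k}\left(n_1 E(\bar{Y}_1) + \cdots + n_k E(\bar{Y}_k)\right) \\
&= \frac{n_1\mu_1 + \cdots + n_k\mu_k}{n}.
\end{aligned}
\]
Therefore,
\[ E(V_1) = \frac{\sqrt{n_1}}{\sigma}\left(\mu_1 - \frac{n_1\mu_1 + \cdots + n_k\mu_k}{n}\right). \]
The statement $E(V_1) = 0$ does not hold in general, but it holds when $\mu_1 = \cdots = \mu_k$. Below we compute $Var(U_1)$ and $Var(V_1)$.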
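\[
\begin{aligned}
Var(U_1) &= \left(\frac{\sqrt{n_1}}{\sqrt{n(n+1)/12}}\right)^2 Var\left(\frac{1}{n_1}(R_{1,1} + \cdots + R_{1,n_1}) - \frac{n+1}{2}\right) \\
&= \left(\frac{\sqrt{n_1}}{\sqrt{n(n+1)/12}}\right)^2 \frac{1}{n_1^2} Var(R_{1,1} + \cdots + R_{1,n_1}) \\
&= \frac{12}{n_1 n(n+1)} \left(\sum_{1 \le i,j \le n_1} Cov(R_{1,i}, R_{1,j})\right) \\
&= \frac{12}{n_1 n(n+1)} \left(\sum_{i=1}^{n_1} Cov(R_{1,i}, R_{1,i}) + \sum_{1 \le i,j \le n_1,\, i \neq j} Cov(R_{1,i}, R_{1,j})\right) \\
&= \frac{12}{n_1 n(n+1)} \left(\sum_{i=1}^{n_1} \frac{n^2 - 1}{12} + \sum_{1 \le i,j \le n_1,\, i \neq j} \left(-\frac{n+1}{12}\right)\right) \\
&= \frac{12}{n_1 n(n+1)} \left(\frac{n_1(n^2-1)}{12} - \frac{(n+1)(n_1^2 - n_1)}{12}\right) \\
&= \frac{12}{n_1 n(n+1)} \cdot \frac{n_1(n+1)}{12}\left((n-1) - (n_1 - 1)\right) = \frac{n - n_1}{n} = 1 - \frac{n_1}{n}.
\end{aligned}
\]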
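To compute $Var(V_1)$, note that
\[ \bar{Y}_1 - \bar{Y} = \bar{Y}_1 - \frac{n_1\bar{Y}_1 + \cdots + n_k\bar{Y}_k}{n} = \left(1 - \frac{n_1}{n}\right)\bar{Y}_1 - \frac{1}{n}\left(n_2\bar{Y}_2 + \cdots + n_k\bar{Y}_k\right) \]
and for $i = 1, \ldots, k$,
\[ Var(\bar{Y}_i) = Var\left(\frac{1}{n_i}(Y_{i,1} + \cdots + Y_{i,n_i})\right) = \frac{1}{n_i^2}\left(Var(Y_{i,1}) + \cdots + Var(Y_{i,n_i})\right) = \frac{1}{n_i^2}\sigma^2 \times n_i = \frac{\sigma^2}{n_i}, \tag{9} \]
so
\[
\begin{aligned}
Var(V_1) &= \left(\frac{\sqrt{n_1}}{\sigma}\right)^2 Var(\bar{Y}_1 - \bar{Y}) \\
&= \frac{n_1}{\sigma^2} Var\left(\left(1 - \frac{n_1}{n}\right)\bar{Y}_1 - \frac{1}{n}\left(n_2\bar{Y}_2 + \cdots + n_k\bar{Y}_k\right)\right) \\
&= \frac{n_1}{\sigma^2}\left(\left(1 - \frac{n_1}{n}\right)^2 Var(\bar{Y}_1) + \frac{1}{n^2}\left(n_2^2 Var(\bar{Y}_2) + \cdots + n_k^2 Var(\bar{Y}_k)\right)\right) \\
&= \frac{n_1}{\sigma^2}\left(\left(1 - \frac{n_1}{n}\right)^2 \frac{\sigma^2}{n_1} + \frac{1}{n^2}\left(n_2^2 \frac{\sigma^2}{n_2} + \cdots + n_k^2 \frac{\sigma^2}{n_k}\right)\right) \\
&= \frac{n_1}{\sigma^2}\left(\left(1 - \frac{2n_1}{n}\right)\frac{\sigma^2}{n_1} + \frac{1}{n^2}\left(n_1^2 \frac{\sigma^2}{n_1} + \cdots + n_k^2 \frac{\sigma^2}{n_k}\right)\right) \\
&= \frac{n_1}{\sigma^2}\left(\frac{\sigma^2}{n_1} - \frac{2\sigma^2}{n} + \frac{\sigma^2(n_1 + \cdots + n_k)}{n^2}\right) \\
&= \frac{n_1}{\sigma^2}\left(\frac{\sigma^2}{n_1} - \frac{\sigma^2}{n}\right) = 1 - \frac{n_1}{n}.
\end{aligned}
\]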
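(b)
\[
\begin{aligned}
Cov(U_1, U_2) &= \left(\frac{\sqrt{n_1}}{\sqrt{n(n+1)/12}}\right)\left(\frac{\sqrt{n_2}}{\sqrt{n(n+1)/12}}\right) Cov\left(\bar{R}_1 - \frac{n+1}{2},\ \bar{R}_2 - \frac{n+1}{2}\right) \\
&= \frac{12\sqrt{n_1 n_2}}{n(n+1)} Cov(\bar{R}_1, \bar{R}_2) \\
&= \frac{12\sqrt{n_1 n_2}}{n(n+1)} \cdot \frac{1}{n_1 n_2} Cov\left(R_{1,1} + \cdots + R_{1,n_1},\ R_{2,1} + \cdots + R_{2,n_2}\right) \\
&= \frac{12}{n(n+1)\sqrt{n_1 n_2}} \left(\sum_{i=1}^{n_1}\sum_{j=1}^{n_2} Cov(R_{1,i}, R_{2,j})\right) \\
&= \frac{12}{n(n+1)\sqrt{n_1 n_2}} \left(\sum_{i=1}^{n_1}\sum_{j=1}^{n_2} \left(-\frac{n+1}{12}\right)\right) \\
&= \frac{12}{n(n+1)\sqrt{n_1 n_2}} \left(-\frac{n_1 n_2 (n+1)}{12}\right) = -\frac{\sqrt{n_1 n_2}}{n}.
\end{aligned}
\]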
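The formulas $Var(U_1) = 1 - n_1/n$ and $Cov(U_1, U_2) = -\sqrt{n_1 n_2}/n$ can be sanity-checked by exact enumeration for a small case. The Python sketch below assumes no ties and that, under the null hypothesis, every assignment of the ranks $1, \ldots, n$ to the observations is equally likely; the sizes $n_1 = 2$, $n_2 = 2$, $n = 5$ are chosen only for illustration.

```python
import math
from itertools import permutations

n1, n2, n = 2, 2, 5   # illustrative group sizes; the third group has n - n1 - n2 = 1 member
scale = math.sqrt(n * (n + 1) / 12)
center = (n + 1) / 2

u1_vals, u2_vals = [], []
# First n1 positions hold group 1's ranks, the next n2 positions hold group 2's ranks.
for ranks in permutations(range(1, n + 1)):
    mean_rank_1 = sum(ranks[:n1]) / n1
    mean_rank_2 = sum(ranks[n1:n1 + n2]) / n2
    u1_vals.append(math.sqrt(n1) * (mean_rank_1 - center) / scale)
    u2_vals.append(math.sqrt(n2) * (mean_rank_2 - center) / scale)

count = len(u1_vals)
mean_u1 = sum(u1_vals) / count
mean_u2 = sum(u2_vals) / count
var_u1 = sum((u - mean_u1) ** 2 for u in u1_vals) / count
cov_u12 = sum((a - mean_u1) * (b - mean_u2)
              for a, b in zip(u1_vals, u2_vals)) / count

print(round(mean_u1, 12), round(var_u1, 12), round(cov_u12, 12))
print(1 - n1 / n, -math.sqrt(n1 * n2) / n)
```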
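To compute $Cov(V_1, V_2)$, note that
\[ \bar{Y}_1 - \bar{Y} = \left(1 - \frac{n_1}{n}\right)\bar{Y}_1 - \frac{1}{n}\left(n_2\bar{Y}_2 + \cdots + n_k\bar{Y}_k\right) \]
and
\[ \bar{Y}_2 - \bar{Y} = \bar{Y}_2 - \frac{n_1\bar{Y}_1 + \cdots + n_k\bar{Y}_k}{n} = -\frac{1}{n}n_1\bar{Y}_1 + \left(1 - \frac{n_2}{n}\right)\bar{Y}_2 - \frac{1}{n}\left(n_3\bar{Y}_3 + \cdots + n_k\bar{Y}_k\right), \]
so
\[
\begin{aligned}
Cov(\bar{Y}_1 - \bar{Y}, \bar{Y}_2 - \bar{Y})
&= Cov\left(\left(1 - \frac{n_1}{n}\right)\bar{Y}_1 - \frac{1}{n}\left(n_2\bar{Y}_2 + \cdots + n_k\bar{Y}_k\right),\ -\frac{1}{n}n_1\bar{Y}_1\right) \\
&\quad + Cov\left(\left(1 - \frac{n_1}{n}\right)\bar{Y}_1 - \frac{1}{n}\left(n_2\bar{Y}_2 + \cdots + n_k\bar{Y}_k\right),\ \left(1 - \frac{n_2}{n}\right)\bar{Y}_2\right) \\
&\quad + Cov\left(\left(1 - \frac{n_1}{n}\right)\bar{Y}_1 - \frac{1}{n}\left(n_2\bar{Y}_2 + \cdots + n_k\bar{Y}_k\right),\ -\frac{1}{n}\left(n_3\bar{Y}_3 + \cdots + n_k\bar{Y}_k\right)\right) \\
&= -\frac{n_1}{n}\left(1 - \frac{n_1}{n}\right)Var(\bar{Y}_1) - \frac{n_2}{n}\left(1 - \frac{n_2}{n}\right)Var(\bar{Y}_2) + Var\left(-\frac{1}{n}\left(n_3\bar{Y}_3 + \cdots + n_k\bar{Y}_k\right)\right) \\
&\overset{(9)}{=} -\frac{n_1}{n}\left(1 - \frac{n_1}{n}\right)\frac{\sigma^2}{n_1} - \frac{n_2}{n}\left(1 - \frac{n_2}{n}\right)\frac{\sigma^2}{n_2} + \sum_{i=3}^{k}\left(-\frac{n_i}{n}\right)^2\frac{\sigma^2}{n_i} \\
&= -\frac{n_1}{n}\cdot\frac{\sigma^2}{n_1} - \frac{n_2}{n}\cdot\frac{\sigma^2}{n_2} + \sum_{i=1}^{k}\left(-\frac{n_i}{n}\right)^2\frac{\sigma^2}{n_i} \\
&= -\frac{2\sigma^2}{n} + \frac{\sigma^2\sum_{i=1}^{k} n_i}{n^2} = -\frac{\sigma^2}{n}
\end{aligned}
\]
and
\[ Cov(V_1, V_2) = \frac{\sqrt{n_1}}{\sigma} \cdot \frac{\sqrt{n_2}}{\sigma}\, Cov(\bar{Y}_1 - \bar{Y}, \bar{Y}_2 - \bar{Y}) = \frac{\sqrt{n_1 n_2}}{\sigma^2}\left(-\frac{\sigma^2}{n}\right) = -\frac{\sqrt{n_1 n_2}}{n}. \]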
34. (a) Since the weight-loss distribution is not expected to be symmetric
about µ, we use the sign test for testing
H0 : µ ≤ 0 v.s. H1 : µ > 0.
5 of the 6 participants lost weight, so S+ = 5. The p-value is
P (Bin(6, 0.5) ≥ 5) = 0.109375 > 0.05, so we cannot conclude µ > 0
at level 0.05.
(b) For testing
H0 : µ = 0 v.s. H1 : µ ≠ 0
using the sign test, we reject H0 if max(S+ , S− ) is large. The observed (S+ , S− ) is (5, 1), so the observed max(S+ , S− ) = 5. The
p-value is 2P (Bin(6, 0.5) ≥ 5) = 2 × 0.109375 = 0.21875 > 0.05, so
we cannot conclude that µ ≠ 0 at the 0.05 significance level.
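The sign test p-values in Problem 34 are exact binomial tail probabilities, which can be computed with a few lines of Python. The sketch assumes, as in the solution, 6 participants of whom 5 lost weight.

```python
from math import comb

n, s_plus = 6, 5

# P(Bin(n, 0.5) >= s_plus): count the favourable sign patterns out of 2^n
p_one_sided = sum(comb(n, k) for k in range(s_plus, n + 1)) / 2 ** n
p_two_sided = 2 * p_one_sided   # two-sided test based on max(S+, S-)

print(p_one_sided, p_two_sided)
```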
35. Let $\mu$ be the median weight loss. The statement that most people lose weight after finishing the program means that $\mu > 0$. For testing
\[ H_0 : \mu \le 0 \quad \text{v.s.} \quad H_1 : \mu > 0 \]
using the sign test, we reject $H_0$ if $S_+$ is large. The observed $S_+ = 65$ and the p-value is
\[
\begin{aligned}
P(\mathrm{Bin}(81, 0.5) \ge 65) &\approx P\left(N\left(\frac{81}{2}, \frac{81}{4}\right) \ge 65 - 0.5\right) \\
&= P(N(0,1) > 16/3) \\
&< P(N(0,1) > 3.09) = 0.001 < 0.05,
\end{aligned}
\]
so we conclude that most people lose weight after finishing the program at the 0.05 significance level.
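For Problem 35, the normal approximation with continuity correction can be compared with the exact binomial tail. The Python sketch below is only a cross-check of the bound used in the solution; the exact tail is computed with integer arithmetic before the final division.

```python
from math import comb, sqrt
from statistics import NormalDist

n, s_plus = 81, 65

# Exact tail P(Bin(81, 0.5) >= 65)
exact_p = sum(comb(n, k) for k in range(s_plus, n + 1)) / 2 ** n

# Normal approximation N(n/2, n/4) with continuity correction
z = (s_plus - 0.5 - n / 2) / sqrt(n / 4)
approx_p = 1 - NormalDist().cdf(z)

print(z, exact_p, approx_p)
print(exact_p < 0.001, approx_p < 0.001)
```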