AMS572.01 Practice Final Exam Fall, 2013

Name ___________________________________ID ______________________Signature________________________
Instructions: This is a closed-book exam. Anyone who cheats on the exam will receive a grade of F. Please provide
complete solutions for full credit. The exam runs from 11:15am to 1:45pm. Good luck!
1. The following table gives the amount of additive (x) and the reduction in nitrogen oxides (y) in 7 cars.
Amount of additive (x)              1     2     3     4     5     6     7
Reduction in nitrogen oxide (y)     2.5   3.1   3.8   3.2   3.9   4.4   4.8
(a) Find the least squares regression line.
(b) Test at α = 0.05 whether there is a significant linear relationship between these two variables.
(c) What percentage of variation in nitrogen oxide is explained by the amount of additive?
(d) Please write up the entire SAS code necessary to answer questions (a), (b), (c) above.
Solution: This is a simple linear regression problem.
(a) $n = 7$, $\bar{x} = 4$, $\bar{y} = 3.67$
$$S_{xy} = \sum xy - n\bar{x}\bar{y} = 112.4 - 7 \cdot 4 \cdot 3.67 = 9.6$$
$$S_{xx} = \sum x^2 - n\bar{x}^2 = 140 - 7 \cdot 4^2 = 28$$
$$S_{yy} = \sum y^2 - n\bar{y}^2 = 98.15 - 7 \cdot 3.67^2 = 3.79$$
$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{9.6}{28} = 0.343$$
$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 3.67 - 0.343 \cdot 4 = 2.298$$
The fitted least squares regression line is:
$$\hat{y} = 2.298 + 0.343x$$
(b) The mean square error estimate of σ is:
$$\hat{\sigma} = \sqrt{MSE} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{SST - SSR}{n-2}} = \sqrt{\frac{S_{yy} - \hat{\beta}_1^2 S_{xx}}{n-2}} = \sqrt{\frac{3.79 - 0.343^2 \cdot 28}{7-2}} = 0.315$$
The hypotheses are: $H_0: \beta_1 = 0$ versus $H_a: \beta_1 \neq 0$
Test statistic:
$$t_0 = \frac{\hat{\beta}_1 - 0}{SE(\hat{\beta}_1)} = \frac{\hat{\beta}_1}{\hat{\sigma}/\sqrt{S_{xx}}} = \frac{0.343}{0.315/\sqrt{28}} = 5.76 > t_{5,0.025} = 2.571$$
Therefore we reject the null hypothesis at α = 0.05 and conclude that there is a significant linear
relationship between these two variables.
(c)
$$R^2 = \frac{S_{xy}^2}{S_{xx} S_{yy}} = \frac{9.6^2}{28 \cdot 3.79} = 0.8684$$
Therefore we claim that 86.84% of the variation in nitrogen oxide is explained by the amount of additive.
(d)
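data nitro_ox;
input x y;
datalines;
1 2.5
2 3.1
3 3.8
4 3.2
5 3.9
6 4.4
7 4.8
;
run;
* PROC REG reports the parameter estimates for (a), the t-test of the slope for (b), and R-Square for (c);
proc reg data = nitro_ox;
model y = x;
run;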
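As a cross-check of parts (a) to (c) outside SAS, the hand computation can be sketched in Python. This is only an illustrative sketch: the function name `simple_linear_regression` is an assumption made here, not part of the exam. Because it carries full precision instead of the rounded intermediate values used above, the t statistic and R-square differ slightly in the second decimal place, but the conclusions are the same.

```python
import math

# Data from the table in Problem 1.
x = [1, 2, 3, 4, 5, 6, 7]
y = [2.5, 3.1, 3.8, 3.2, 3.9, 4.4, 4.8]


def simple_linear_regression(x, y):
    """Return slope, intercept, t statistic for H0: slope = 0, and R^2."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Corrected sums of squares and cross-products.
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    # SSE = SST - SSR, and sigma-hat = sqrt(SSE / (n - 2)).
    sse = s_yy - slope ** 2 * s_xx
    sigma_hat = math.sqrt(sse / (n - 2))
    t_stat = slope / (sigma_hat / math.sqrt(s_xx))
    r_squared = s_xy ** 2 / (s_xx * s_yy)
    return slope, intercept, t_stat, r_squared


slope, intercept, t_stat, r_squared = simple_linear_regression(x, y)
T_CRITICAL = 2.571  # t_{5, 0.025}, taken from the solution above
print(f"slope={slope:.3f} intercept={intercept:.3f} t={t_stat:.2f} R^2={r_squared:.4f}")
print("reject H0" if abs(t_stat) > T_CRITICAL else "fail to reject H0")
```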
2. Based on interviews of couples seeking divorces, a social worker compiles the following data related to the
period of acquaintanceship before marriage and the duration of marriage:
Acquaintanceship        Duration of marriage
before marriage         ≤ 4 years     > 4 years
Under 0.5 years         11 (10)       8 (9)
0.5–1.5 years           28 (28)       24 (24)
Over 1.5 years          21 (22)       19 (18)
(Observed counts, with expected counts under independence, rounded to integers, in parentheses.)
(a) Perform a test to determine if there is a relationship between the period of acquaintanceship before
marriage and the duration of marriage. Use α = 0.05.
(b) Please write up the entire SAS code necessary to answer question (a) above.
Solution: This is a two-way contingency table problem.
(a) We are performing a test for independence (multinomial sampling).
$H_0: \pi_{ij} = \pi_{i\cdot}\,\pi_{\cdot j}$, for all $i, j$
$H_a$: the above is not true
Let
$$e_{ij} = \frac{n_{i\cdot}\, n_{\cdot j}}{n_{\cdot\cdot}}, \quad \text{for all } i, j$$
(Note: for simplicity, the expected values are rounded to integers here; in practice one does not need to do so.)
The test statistic is:
$$\chi_0^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(n_{ij}-e_{ij})^2}{e_{ij}} = \sum_{i=1}^{3}\sum_{j=1}^{2} \frac{(n_{ij}-e_{ij})^2}{e_{ij}} = \frac{1}{10} + \frac{1}{22} + \frac{1}{9} + \frac{1}{18} = 0.312 < \chi^2_{2,0.05} = 5.991$$
We cannot reject the null hypothesis; we do not have enough evidence to show any
relationship between the period of acquaintanceship before marriage and the duration of marriage.
(b)
data marriage;
input acquint $ duration $ number;
datalines;
short le4 11
short gt4 8
med le4 28
med gt4 24
long le4 21
long gt4 19
;
run;
proc freq data=marriage;
weight number;
tables acquint*duration / chisq ;
run;
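The chi-square computation in part (a) can be cross-checked with a short Python sketch. The function name `chi_square_independence` and its `round_expected` switch are assumptions introduced here for illustration. With rounded expected counts it reproduces the hand value of 0.312; with unrounded expected counts the statistic is smaller, and either way it falls well below the critical value 5.991.

```python
# Observed counts from Problem 2: rows are acquaintanceship periods,
# columns are duration of marriage (<= 4 years, > 4 years).
observed = [[11, 8], [28, 24], [21, 19]]


def chi_square_independence(table, round_expected=False):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            # Expected count under independence: e_ij = n_i. * n_.j / n_..
            e_ij = row_totals[i] * col_totals[j] / n
            if round_expected:
                e_ij = round(e_ij)  # mimics the integer rounding in the hand solution
            stat += (n_ij - e_ij) ** 2 / e_ij
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


stat_rounded, df = chi_square_independence(observed, round_expected=True)
stat_exact, _ = chi_square_independence(observed)
CHI2_CRITICAL = 5.991  # chi-square_{2, 0.05}, taken from the solution above
print(f"rounded expected: {stat_rounded:.3f}, exact expected: {stat_exact:.3f}, df={df}")
print("reject H0" if stat_exact > CHI2_CRITICAL else "fail to reject H0")
```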
3. The following table records the observed number of births at a hospital in four consecutive quarterly periods.
Quarters            Jan-Mar   Apr-June   July-Sept   Oct-Dec
Number of births    110       57         53          80
(a) It is conjectured that twice as many babies are born during the Jan-Mar quarter as are born in any of
the other three quarters. At α = 0.05, test if these data strongly contradict the stated conjecture.
(b) Please write up the entire SAS code necessary to answer question (a) above.
Solution: This is a one-way contingency table problem.
(a) We are performing a Chi-square goodness of fit test.
$H_0: P_{JM} = 0.4,\ P_{AJ} = P_{JS} = P_{OD} = 0.2$
Ha : the above is not true
The test statistic is:
$$\chi_0^2 = \frac{(110 - 300 \cdot 0.4)^2}{300 \cdot 0.4} + \frac{(57 - 300 \cdot 0.2)^2}{300 \cdot 0.2} + \frac{(53 - 300 \cdot 0.2)^2}{300 \cdot 0.2} + \frac{(80 - 300 \cdot 0.2)^2}{300 \cdot 0.2} = 8.47 > \chi^2_{3,0.05} = 7.815$$
We reject the null hypothesis and conclude that these data strongly contradict the stated conjecture.
(b)
DATA BIRTH;
INPUT QUARTER $ NUMBER;
DATALINES;
Jan-Mar 110
Apr-Jun 57
Jul-Sep 53
Oct-Dec 80
;
* HYPOTHESIZING A 2:1:1:1 RATIO;
PROC FREQ DATA=BIRTH ORDER=DATA; WEIGHT NUMBER;
TITLE3 'GOODNESS OF FIT ANALYSIS';
TABLES QUARTER / CHISQ NOCUM TESTP=(0.4 0.2 0.2 0.2);
RUN;
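The goodness-of-fit statistic in part (a) can be verified with a few lines of Python. This is a minimal sketch; the variable names are chosen here for illustration, and the critical value is copied from the solution rather than computed.

```python
# Observed births per quarter and the hypothesized 2:1:1:1 ratio from Problem 3.
observed = [110, 57, 53, 80]
null_probs = [0.4, 0.2, 0.2, 0.2]

n = sum(observed)
expected = [n * p for p in null_probs]

# Pearson chi-square goodness-of-fit statistic with k - 1 degrees of freedom.
chi2_stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

CHI2_CRITICAL = 7.815  # chi-square_{3, 0.05}, taken from the solution above
print(f"chi-square = {chi2_stat:.2f} on {df} df")
print("reject H0" if chi2_stat > CHI2_CRITICAL else "fail to reject H0")
```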
4. Suppose the National Transportation Safety Board (NTSB) wants to examine the safety of compact cars,
midsize cars, and full-size cars. It collects a sample of three for each of the car types.
Compact   Midsize cars   Full-size cars
643       469            484
655       427            456
702       525            402
(a) Using the hypothetical data provided above, test at α = 0.05 whether the mean pressure applied to the
driver's head during a crash test is equal for each type of car. What assumptions are necessary for your
test?
(b) Please write up the entire SAS code necessary to answer question (a) above.
Solution: This is a one-way ANOVA problem with 3 independent samples.
(a) We need to perform an ANOVA F-test. The first assumption is that all three populations are normal. The
second is that all three population variances are unknown but equal.
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_a$: the above is not true
Analysis of Variance
Source      SS          d.f.   MS          F
Car Type    86049.55    2      43024.78    25.17
Error       10254       6      1709
Total       96303.55    8
Since $F_0 = 25.17 > F_{2,6,0.05} = 5.14$, we reject the null hypothesis and claim that the mean pressures
applied to the driver's head during a crash test are NOT all equal for these three types of car.
(b)
data car;
input type $ pressure;
datalines;
Compact 643
Compact 655
Compact 702
Midsize 469
Midsize 427
Midsize 525
Fulsize 484
Fulsize 456
Fulsize 402
;
run;
* One-way ANOVA of pressure by car type, which gives the F-test used in part (a);
proc anova data = car;
class type;
model pressure = type;
run;
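The ANOVA table in part (a) can be reproduced by hand in Python as a cross-check. This sketch uses only the standard library; the dictionary `groups` and the variable names are assumptions made here, and the critical value is copied from the solution rather than computed.

```python
# Crash-test pressures from Problem 4, grouped by car type.
groups = {
    "Compact": [643, 655, 702],
    "Midsize": [469, 427, 525],
    "Fulsize": [484, 456, 402],
}

all_values = [v for values in groups.values() for v in values]
grand_mean = sum(all_values) / len(all_values)

# Between-group (treatment) and within-group (error) sums of squares.
ss_between = 0.0
ss_within = 0.0
for values in groups.values():
    group_mean = sum(values) / len(values)
    ss_between += len(values) * (group_mean - grand_mean) ** 2
    ss_within += sum((v - group_mean) ** 2 for v in values)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

F_CRITICAL = 5.14  # F_{2, 6, 0.05}, taken from the solution above
print(f"SSB={ss_between:.2f} SSE={ss_within:.2f} F={f_stat:.2f}")
print("reject H0" if f_stat > F_CRITICAL else "fail to reject H0")
```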
5. The length of time to recovery was recorded for patients randomly assigned and subjected to two different
surgical procedures. The data (recorded in days) are as follows:
                   Procedure 1   Procedure 2
Sample mean        7.3           8.9
Sample variance    1.23          1.49
Sample size        11            13
(a) Test at α = 0.01 whether the data present sufficient evidence to indicate a difference between the mean
recovery times for the two surgical procedures. What assumptions are necessary? Test the assumptions
necessary if you can.
(b) Please derive the corresponding general test using the pivotal quantity method. Please derive the pivotal
quantity and its distribution, list the test statistic, and derive the rejection region for a 2-sided test at the
significance level of α.
(c) (extra credit) Please derive the general test using the likelihood ratio test method. Prove whether this test is
equivalent to the one derived using the pivotal quantity method in part (b).
Solution:
(a) Inference on two population means. Two small and independent samples.
Procedure 1: $\bar{X} = 7.3$, $S_1^2 = 1.23$, $n_1 = 11$
Procedure 2: $\bar{Y} = 8.9$, $S_2^2 = 1.49$, $n_2 = 13$
[1] Under the normality assumption, we first test if the two population variances are equal. That is, $H_0: \sigma_1^2 = \sigma_2^2$ versus
$H_a: \sigma_2^2 > \sigma_1^2$. The test statistic is
$$F_0 = \frac{S_2^2}{S_1^2} = \frac{1.49}{1.23} = 1.21 < F_{12,10,0.05,U} = 2.91$$
We cannot reject $H_0$; it is reasonable to assume that $\sigma_1^2 = \sigma_2^2$.
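[2] This is inference on two population means, independent samples. The first assumption is that both
populations are normal. The second is the equal variance assumption, which we have checked in (a) [1].
Now we perform the pooled-variance t-test with hypotheses $H_0: \mu_1 - \mu_2 = 0$ versus $H_a: \mu_1 - \mu_2 \neq 0$.
The pooled sample variance is
$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} = \frac{10 \cdot 1.23 + 12 \cdot 1.49}{22} = 1.37$$
and the test statistic is
$$t_0 = \frac{(\bar{X} - \bar{Y}) - 0}{S_p\sqrt{1/n_1 + 1/n_2}} = \frac{7.3 - 8.9 - 0}{\sqrt{1.37}\,\sqrt{1/11 + 1/13}} = -3.33$$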
Since $|t_0| = 3.33 > t_{22,0.005} = 2.819$, we reject $H_0$ and conclude that the data present sufficient evidence to
indicate a difference between the mean recovery times for the two surgical procedures at the significance level
of 0.01.
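The two tests in part (a) use only the summary statistics, so they are easy to check numerically. The following Python sketch is illustrative; the function name `pooled_t_from_summary` is an assumption made here, and both critical values are copied from the solution rather than computed.

```python
import math


def pooled_t_from_summary(mean1, var1, n1, mean2, var2, n2):
    """Pooled-variance t statistic for H0: mu1 - mu2 = 0 from summary statistics."""
    df = n1 + n2 - 2
    # Pooled sample variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, sp2, df


# Step [1]: F statistic for equality of variances, larger variance hypothesized on top.
f0 = 1.49 / 1.23
F_CRITICAL = 2.91  # F_{12, 10, 0.05} upper critical value, from the solution above

# Step [2]: pooled-variance t-test.
t0, sp2, df = pooled_t_from_summary(7.3, 1.23, 11, 8.9, 1.49, 13)
T_CRITICAL = 2.819  # t_{22, 0.005}, from the solution above

print(f"F0={f0:.2f}  Sp^2={sp2:.2f}  t0={t0:.2f}  df={df}")
print("equal variances plausible" if f0 < F_CRITICAL else "variances differ")
print("reject H0" if abs(t0) > T_CRITICAL else "fail to reject H0")
```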
(b) Derivation of the pooled-variance t-test (2-sided test) using the pivotal quantity approach
Suppose we have two independent random samples from two normal populations: $X_1, X_2, \ldots, X_{n_1} \sim N(\mu_1, \sigma^2)$
and $Y_1, Y_2, \ldots, Y_{n_2} \sim N(\mu_2, \sigma^2)$. Here is a simple outline of the derivation of the test $H_0: \mu_1 - \mu_2 = 0$ versus
$H_a: \mu_1 - \mu_2 \neq 0$ using the pivotal quantity approach.
[1]. We start with the point estimator for the parameter of interest $(\mu_1 - \mu_2)$: $\bar{X} - \bar{Y}$. Its distribution is
$N\left(\mu_1 - \mu_2,\ \sigma^2(1/n_1 + 1/n_2)\right)$, using the mgf for $N(\mu, \sigma^2)$, which is $M(t) = \exp(\mu t + \sigma^2 t^2/2)$, and the independence
properties of the random samples. From this we have
$$Z = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sigma\sqrt{1/n_1 + 1/n_2}} \sim N(0, 1)$$
Unfortunately, Z cannot serve as the pivotal quantity because σ is unknown.
[2]. We next look for a way to get rid of the unknown σ, following a similar approach in the construction of the pooled-variance
t-statistic. We found that
$$W = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{\sigma^2} \sim \chi^2_{n_1 + n_2 - 2}$$
using the mgf for $\chi^2_k$, which is $M(t) = \left(\frac{1}{1 - 2t}\right)^{k/2}$, and the independence properties of the random samples.
[3]. Then we found, from the theorem of sampling from the normal population and the independence properties of the
random samples, that Z and W are independent. Therefore, by the definition of the t-distribution, we have obtained our
pivotal quantity:
$$T = \frac{Z}{\sqrt{W/(n_1 + n_2 - 2)}} = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{S_p\sqrt{1/n_1 + 1/n_2}} \sim t_{n_1 + n_2 - 2}, \quad \text{where } S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$
is the pooled sample variance.
[4]. The rejection region is derived from $P(|T_0| \geq c \mid H_0) = \alpha$, where
$$T_0 = \frac{(\bar{X} - \bar{Y}) - 0}{S_p\sqrt{1/n_1 + 1/n_2}} \overset{H_0}{\sim} t_{n_1 + n_2 - 2}$$
Thus $c = t_{n_1 + n_2 - 2,\ \alpha/2}$. Therefore at the significance level of α, we reject $H_0$ in favor of $H_a$ iff $|T_0| \geq t_{n_1 + n_2 - 2,\ \alpha/2}$.
(c) Derivation of the pooled-variance t-test (2-sided test) using the likelihood ratio test approach
Suppose we have two independent random samples from two normal populations with equal but unknown
variances. We now derive the likelihood ratio test for:
$H_0: \mu_1 = \mu_2$ vs $H_a: \mu_1 \neq \mu_2$
Let $\mu_1 = \mu_2 = \mu$ under $H_0$; then
$$\omega = \{-\infty < \mu_1 = \mu_2 = \mu < +\infty,\ 0 < \sigma^2 < +\infty\}, \quad \Omega = \{-\infty < \mu_1, \mu_2 < +\infty,\ 0 < \sigma^2 < +\infty\}$$
$$L(\omega) = L(\mu, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{\frac{n_1+n_2}{2}} \exp\left[-\frac{1}{2\sigma^2}\left(\sum_{i=1}^{n_1}(x_i - \mu)^2 + \sum_{j=1}^{n_2}(y_j - \mu)^2\right)\right]$$
and there are two parameters.
$$\ln L(\omega) = -\frac{n_1+n_2}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\left(\sum_{i=1}^{n_1}(x_i - \mu)^2 + \sum_{j=1}^{n_2}(y_j - \mu)^2\right)$$
Since it contains two parameters, we take the partial derivatives with respect to $\mu$ and $\sigma^2$ and set them
equal to 0. Then we have:
$$\hat{\mu} = \frac{\sum_{i=1}^{n_1} x_i + \sum_{j=1}^{n_2} y_j}{n_1 + n_2} = \frac{n_1\bar{x} + n_2\bar{y}}{n_1 + n_2}$$
$$\hat{\sigma}^2_{\omega} = \frac{1}{n_1 + n_2}\left[\sum_{i=1}^{n_1}(x_i - \hat{\mu})^2 + \sum_{j=1}^{n_2}(y_j - \hat{\mu})^2\right]$$
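$$L(\Omega) = L(\mu_1, \mu_2, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{\frac{n_1+n_2}{2}} \exp\left[-\frac{1}{2\sigma^2}\left(\sum_{i=1}^{n_1}(x_i - \mu_1)^2 + \sum_{j=1}^{n_2}(y_j - \mu_2)^2\right)\right]$$
and there are three parameters.
$$\ln L(\Omega) = -\frac{n_1+n_2}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\left(\sum_{i=1}^{n_1}(x_i - \mu_1)^2 + \sum_{j=1}^{n_2}(y_j - \mu_2)^2\right)$$
We take the partial derivatives with respect to $\mu_1$, $\mu_2$ and $\sigma^2$ and set them all equal to 0. Then we have:
$$\hat{\mu}_1 = \bar{x}, \quad \hat{\mu}_2 = \bar{y}, \quad \hat{\sigma}^2_{\Omega} = \frac{1}{n_1 + n_2}\left[\sum_{i=1}^{n_1}(x_i - \bar{x})^2 + \sum_{j=1}^{n_2}(y_j - \bar{y})^2\right]$$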
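At this point all the parameters have been estimated. The exponential factors in $L(\hat{\omega})$ and $L(\hat{\Omega})$ both equal
$\exp[-(n_1+n_2)/2]$ and cancel, so after some simplification we have:
$$\lambda = \frac{L(\hat{\omega})}{L(\hat{\Omega})} = \frac{\left(\frac{1}{2\pi\hat{\sigma}^2_{\omega}}\right)^{\frac{n_1+n_2}{2}}}{\left(\frac{1}{2\pi\hat{\sigma}^2_{\Omega}}\right)^{\frac{n_1+n_2}{2}}} = \left[\frac{\hat{\sigma}^2_{\Omega}}{\hat{\sigma}^2_{\omega}}\right]^{\frac{n_1+n_2}{2}}$$
$$= \left[\frac{\sum_{i=1}^{n_1}(x_i - \bar{x})^2 + \sum_{j=1}^{n_2}(y_j - \bar{y})^2}{\sum_{i=1}^{n_1}\left(x_i - \frac{n_1\bar{x} + n_2\bar{y}}{n_1 + n_2}\right)^2 + \sum_{j=1}^{n_2}\left(y_j - \frac{n_1\bar{x} + n_2\bar{y}}{n_1 + n_2}\right)^2}\right]^{\frac{n_1+n_2}{2}} = \left[1 + \frac{t_0^2}{n_1 + n_2 - 2}\right]^{-\frac{n_1+n_2}{2}}$$
where $t_0$ is the test statistic in the pooled-variance t-test. Therefore, $\lambda \leq \lambda^*$ is equivalent to $|t_0| \geq c$. Thus at the
significance level α, we reject the null hypothesis in favor of the alternative when $|t_0| \geq c = t_{n_1+n_2-2,\ \alpha/2}$. This
test is identical to the test we have derived in part (b).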
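The identity between the likelihood ratio and the pooled-variance t statistic in part (c) can be checked numerically. The sketch below is an illustration only: the exam gives summary statistics rather than raw data, so the two samples are simulated with hypothetical parameters, and the function names `lrt_lambda` and `pooled_t` are assumptions made here.

```python
import math
import random
from statistics import fmean


def lrt_lambda(xs, ys):
    """Likelihood ratio: (sigma-hat^2_Omega / sigma-hat^2_omega) ** ((n1 + n2) / 2)."""
    n1, n2 = len(xs), len(ys)
    x_bar, y_bar = fmean(xs), fmean(ys)
    mu_hat = (n1 * x_bar + n2 * y_bar) / (n1 + n2)  # MLE of the common mean under H0
    ss_full = sum((x - x_bar) ** 2 for x in xs) + sum((y - y_bar) ** 2 for y in ys)
    ss_null = sum((x - mu_hat) ** 2 for x in xs) + sum((y - mu_hat) ** 2 for y in ys)
    # The 1/(n1 + n2) factors in the two variance estimates cancel in the ratio.
    return (ss_full / ss_null) ** ((n1 + n2) / 2)


def pooled_t(xs, ys):
    """Pooled-variance t statistic for H0: mu1 = mu2."""
    n1, n2 = len(xs), len(ys)
    x_bar, y_bar = fmean(xs), fmean(ys)
    ss_full = sum((x - x_bar) ** 2 for x in xs) + sum((y - y_bar) ** 2 for y in ys)
    sp2 = ss_full / (n1 + n2 - 2)
    return (x_bar - y_bar) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


# Hypothetical simulated samples (not the exam's data, which is given only in summary form).
random.seed(572)
xs = [random.gauss(7.3, 1.1) for _ in range(11)]
ys = [random.gauss(8.9, 1.2) for _ in range(13)]

lam = lrt_lambda(xs, ys)
t0 = pooled_t(xs, ys)
n_total = len(xs) + len(ys)
lam_from_t = (1 + t0 ** 2 / (n_total - 2)) ** (-n_total / 2)
print(f"lambda={lam:.6g}  from t0: {lam_from_t:.6g}")
```

Because λ is a strictly decreasing function of $|t_0|$, a small λ corresponds to a large $|t_0|$, which is why the two rejection regions coincide.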
6. People at high risk of sudden cardiac death can be identified using the change in a signal-averaged
electrocardiogram before and after prescribed activities. The current method is about 80% accurate. The
method was modified in the hope of improving its accuracy. The new method was tested on 50 people and gave
correct results for 46 of them.
(a) Is this convincing evidence that the new method is more accurate? Please test at α = 0.05.
(b) If the new method actually has 90% accuracy, what power does a sample of 50 have to demonstrate that the
new method is better at α = 0.05?
(c) How many patients should be tested in order for this power to be at least 0.75?
Solution:
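No worked solution is given here. One common way to approach the three parts, assumed for this sketch rather than taken from the exam, is the large-sample (normal approximation) upper-tailed z-test for a proportion with $H_0: p = 0.8$ versus $H_a: p > 0.8$. The function names below are hypothetical, and an exact binomial test would give somewhat different numbers.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution
p0, p1, alpha = 0.80, 0.90, 0.05
n, successes = 50, 46


def z_test_statistic(successes, n, p0):
    """Large-sample z statistic for H0: p = p0."""
    p_hat = successes / n
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)


def power_upper_tailed(n, p0, p1, alpha):
    """Approximate power of the upper-tailed z-test when the true proportion is p1."""
    z_alpha = Z.inv_cdf(1 - alpha)
    # Reject H0 when p-hat exceeds this cutoff.
    cutoff = p0 + z_alpha * math.sqrt(p0 * (1 - p0) / n)
    return 1 - Z.cdf((cutoff - p1) / math.sqrt(p1 * (1 - p1) / n))


def sample_size_for_power(p0, p1, alpha, power):
    """Smallest n whose approximate power is at least the target."""
    z_alpha = Z.inv_cdf(1 - alpha)
    z_beta = Z.inv_cdf(power)
    root_n = (z_alpha * math.sqrt(p0 * (1 - p0)) + z_beta * math.sqrt(p1 * (1 - p1))) / (p1 - p0)
    return math.ceil(root_n ** 2)


z0 = z_test_statistic(successes, n, p0)
power_50 = power_upper_tailed(n, p0, p1, alpha)
n_needed = sample_size_for_power(p0, p1, alpha, 0.75)
print(f"(a) z0={z0:.2f} vs z_alpha={Z.inv_cdf(1 - alpha):.3f}")
print(f"(b) power with n=50: {power_50:.3f}")
print(f"(c) n needed for power >= 0.75: {n_needed}")
```

Under these assumptions the z statistic is about 2.12, which exceeds the 0.05 critical value of 1.645, and the approximate power with 50 patients is a little over one half.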