AMS572.01
Practice Final Exam
Fall, 2013
Name ___________________________________ID ______________________Signature________________________
Instructions: This is a closed-book exam. Anyone who cheats on the exam shall receive a grade of F. Please provide
complete solutions for full credit. The exam runs from 11:15am to 1:45pm. Good luck!
1. The following table gives the amount of additive (x) and the reduction in nitrogen oxides (y) in 7 cars.
Amount of additive (x)             1     2     3     4     5     6     7
Reduction in nitrogen oxides (y)   2.5   3.1   3.8   3.2   3.9   4.4   4.8
(a) Find the least squares regression line.
(b) Test at $\alpha = 0.05$ whether there is a significant linear relationship between these two variables.
(c) What percentage of variation in nitrogen oxide is explained by the amount of additive?
(d) Please write up the entire SAS code necessary to answer questions (a), (b), (c) above.
Solution: This is a simple linear regression problem.
(a) $n = 7$, $\bar{x} = 4$, $\bar{y} = 3.67$
$S_{xy} = \sum xy - n\bar{x}\bar{y} = 112.4 - 7 \times 4 \times 3.67 = 9.6$
$S_{xx} = \sum x^2 - n\bar{x}^2 = 140 - 7 \times 4^2 = 28$
$S_{yy} = \sum y^2 - n\bar{y}^2 = 98.15 - 7 \times 3.67^2 = 3.79$
$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{9.6}{28} = 0.343$
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 3.67 - 0.343 \times 4 = 2.298$
The fitted least squares regression line is:
$\hat{y} = 2.298 + 0.343x$
(b) The estimate of $\sigma$ based on the mean square error is:
$\hat{\sigma} = \sqrt{MSE} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{SST - SSR}{n-2}} = \sqrt{\frac{S_{yy} - \hat{\beta}_1^2 S_{xx}}{n-2}} = \sqrt{\frac{3.79 - 0.343^2 \times 28}{7-2}} = 0.315$
The hypotheses are: $H_0: \beta_1 = 0$ versus $H_a: \beta_1 \neq 0$
Test statistic:
$t_0 = \frac{\hat{\beta}_1 - 0}{SE(\hat{\beta}_1)} = \frac{\hat{\beta}_1}{\hat{\sigma}/\sqrt{S_{xx}}} = \frac{0.343}{0.315/\sqrt{28}} = 5.76 > t_{5,0.025} = 2.571$
Therefore we reject the null hypothesis at $\alpha = 0.05$ and conclude that there is a significant linear
relationship between these two variables.
(c)
$R^2 = \frac{S_{xy}^2}{S_{xx} S_{yy}} = \frac{9.6^2}{28 \times 3.79} = 0.8684$
Therefore we claim that 86.84% of variation in nitrogen oxide is explained by the amount of additive.
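The hand calculations in (a) to (c) can be cross-checked numerically. The following is a minimal sketch in Python (standard library only), not part of the original solution; the variable names are illustrative, and the critical value 2.571 is simply the tabulated $t_{5,0.025}$ quoted in part (b).

```python
# Cross-check of Problem 1 (a)-(c) from the raw data, standard library only.
from math import sqrt

x = [1, 2, 3, 4, 5, 6, 7]
y = [2.5, 3.1, 3.8, 3.2, 3.9, 4.4, 4.8]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Corrected sums of squares and cross-products.
s_xy = sum(a * b for a, b in zip(x, y)) - n * x_bar * y_bar
s_xx = sum(a * a for a in x) - n * x_bar ** 2
s_yy = sum(b * b for b in y) - n * y_bar ** 2

# (a) Least squares estimates.
beta1 = s_xy / s_xx
beta0 = y_bar - beta1 * x_bar

# (b) Estimate of sigma and the t statistic for H0: beta1 = 0.
sse = s_yy - beta1 ** 2 * s_xx
sigma_hat = sqrt(sse / (n - 2))
t0 = beta1 / (sigma_hat / sqrt(s_xx))
T_CRIT = 2.571  # tabulated t_{5, 0.025}, taken from the solution text
reject_h0 = abs(t0) > T_CRIT

# (c) Coefficient of determination.
r_squared = s_xy ** 2 / (s_xx * s_yy)

print(f"slope={beta1:.3f} intercept={beta0:.3f}")
print(f"sigma_hat={sigma_hat:.3f} t0={t0:.2f} reject H0: {reject_h0}")
print(f"R^2={r_squared:.4f}")
```

Because this sketch carries full precision instead of rounding intermediate values, its results differ slightly from the hand calculation (intercept about 2.300, $t_0$ about 5.72, $R^2$ about 0.8675); the conclusions are unchanged.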
(d)
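data nitro_ox;
input x y;
datalines;
1 2.5
2 3.1
3 3.8
4 3.2
5 3.9
6 4.4
7 4.8
;
run;
* The PROC REG output includes the parameter estimates (a), the t test for the slope (b), and R-square (c);
proc reg data = nitro_ox;
model y = x;
run;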
2. Based on interviews of couples seeking divorces, a social worker compiles the following data related to the
period of acquaintanceship before marriage and the duration of marriage:
Acquaintanceship      Duration of marriage
before marriage       ≤ 4 years     > 4 years
Under 0.5 years       11 (10)       8 (9)
0.5 - 1.5 years       28 (28)       24 (24)
Over 1.5 years        21 (22)       19 (18)
(Observed counts, with expected counts under independence in parentheses.)
(a) Perform a test to determine if there is a relationship between the period of acquaintanceship before
marriage and the duration of marriage. Use $\alpha = 0.05$.
(b) Please write up the entire SAS code necessary to answer question (a) above.
Solution: This is a two-way contingency table problem.
(a) We are performing a test for independence (multinomial sampling).
$H_0: \pi_{ij} = \pi_{i\cdot}\,\pi_{\cdot j}$ for all $i, j$
$H_a$: the above is not true
Let
$e_{ij} = \frac{n_{i\cdot}\, n_{\cdot j}}{n_{\cdot\cdot}}$ for all $i, j$
(Note: for simplicity, I rounded the expected values to integers, but in practice one does not need to do so.)
The test statistic is:
$\chi_0^2 = \sum_{i=1}^{3} \sum_{j=1}^{2} \frac{(n_{ij} - e_{ij})^2}{e_{ij}} = \frac{1}{10} + \frac{1}{22} + \frac{1}{9} + \frac{1}{18} = 0.312 < \chi^2_{2,0.05} = 5.991$
We cannot reject the null hypothesis; there is not enough evidence to show any
relationship between the period of acquaintanceship before marriage and the duration of marriage.
(b)
data marriage;
input acquint $ duration $ number;
datalines;
short le4 11
short gt4 8
med le4 28
med gt4 24
long le4 21
long gt4 19
;
run;
proc freq data=marriage;
weight number;
tables acquint*duration / chisq ;
run;
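As a rough cross-check of part (a), the sketch below recomputes the chi-square statistic in Python (standard library only). It is not part of the original solution: the helper name `chi_square` is illustrative, and 5.991 is the tabulated $\chi^2_{2,0.05}$ quoted above. It evaluates the statistic both with the integer-rounded expected counts used in the solution and with the unrounded ones.

```python
# Chi-square test of independence for the 3 x 2 table in Problem 2.
observed = [[11, 8], [28, 24], [21, 19]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected counts under independence: e_ij = n_i. * n_.j / n..
expected = [[r * c / n for c in col_totals] for r in row_totals]
expected_rounded = [[round(e) for e in row] for row in expected]


def chi_square(obs, exp):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(obs, exp)
        for o, e in zip(obs_row, exp_row)
    )


chi2_rounded = chi_square(observed, expected_rounded)  # as in the hand solution
chi2_exact = chi_square(observed, expected)            # no rounding of e_ij
df = (len(observed) - 1) * (len(observed[0]) - 1)
CHI2_CRIT = 5.991  # tabulated chi-square_{2, 0.05}, taken from the solution text

print(f"rounded: {chi2_rounded:.3f}, exact: {chi2_exact:.3f}, df={df}")
print("reject H0:", chi2_exact > CHI2_CRIT)
```

With unrounded expected counts the statistic is smaller (about 0.153 instead of 0.312), which is also the value a chi-square routine working from the raw counts would be expected to report; either way the null hypothesis is not rejected.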
3. The following table records the observed number of births at a hospital in four consecutive quarterly periods.
Quarters           Jan-Mar   Apr-June   July-Sept   Oct-Dec
Number of births   110       57         53          80
(a) It is conjectured that twice as many babies are born during the Jan-Mar quarter as are born in any of
the other three quarters. At $\alpha = 0.05$, test if these data strongly contradict the stated conjecture.
(b) Please write up the entire SAS code necessary to answer question (a) above.
Solution: This is a one-way contingency table problem.
(a) We are performing a Chi-square goodness of fit test.
$H_0: p_{JM} = 0.4,\ p_{AJ} = p_{JS} = p_{OD} = 0.2$
$H_a$: the above is not true
The test statistic is:
$\chi_0^2 = \frac{(110 - 300 \times 0.4)^2}{300 \times 0.4} + \frac{(57 - 300 \times 0.2)^2}{300 \times 0.2} + \frac{(53 - 300 \times 0.2)^2}{300 \times 0.2} + \frac{(80 - 300 \times 0.2)^2}{300 \times 0.2} = 8.47 > \chi^2_{3,0.05} = 7.815$
We reject the null hypothesis and conclude that these data strongly contradict the stated conjecture.
(b)
DATA BIRTH;
INPUT QUARTER $ NUMBER;
DATALINES;
Jan-Mar 110
Apr-Jun 57
Jul-Sep 53
Oct-Dec 80
;
* HYPOTHESIZING A 2:1:1:1 RATIO;
PROC FREQ DATA=BIRTH ORDER=DATA; WEIGHT NUMBER;
TITLE3 'GOODNESS OF FIT ANALYSIS';
TABLES QUARTER / CHISQ NOCUM TESTP=(0.4 0.2 0.2 0.2);
RUN;
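The goodness-of-fit statistic in part (a) can be reproduced with a few lines of Python (standard library only). This is an illustrative sketch, not part of the original solution; the names are hypothetical and 7.815 is the tabulated $\chi^2_{3,0.05}$ quoted above.

```python
# Chi-square goodness-of-fit test for the 2:1:1:1 birth conjecture (Problem 3).
observed = {"Jan-Mar": 110, "Apr-Jun": 57, "Jul-Sep": 53, "Oct-Dec": 80}
null_probs = {"Jan-Mar": 0.4, "Apr-Jun": 0.2, "Jul-Sep": 0.2, "Oct-Dec": 0.2}

n = sum(observed.values())

# Expected count in each quarter under H0 is n * p.
expected = {q: n * p for q, p in null_probs.items()}

chi2 = sum((observed[q] - expected[q]) ** 2 / expected[q] for q in observed)
df = len(observed) - 1
CHI2_CRIT = 7.815  # tabulated chi-square_{3, 0.05}, taken from the solution text

print(f"n={n}, chi2={chi2:.2f}, df={df}, reject H0: {chi2 > CHI2_CRIT}")
```

Most of the statistic comes from the Oct-Dec quarter, where 80 births were observed against 60 expected.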
4. Suppose the National Transportation Safety Board (NTSB) wants to examine the safety of compact cars,
midsize cars, and full-size cars. It collects a sample of three for each of the car types.
Compact   Midsize cars   Full-size cars
643       469            484
655       427            456
702       525            402
(a) Using the hypothetical data provided above, test at $\alpha = 0.05$ whether the mean pressure applied to the
driver's head during a crash test is equal for each type of car. What assumptions are necessary for your
test?
(b) Please write up the entire SAS code necessary to answer question (a) above.
Solution: This is a one-way ANOVA problem with 3 independent samples.
(a) We need to perform an ANOVA F-test. The first assumption is that all three populations are normal. The
second is that all three population variances are unknown but equal.
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_a$: the above is not true
Analysis of Variance
Source      SS          d.f.   MS          F
Car Type    86049.55    2      43024.78    25.17
Error       10254       6      1709
Total       96303.55    8
Since $F_0 = 25.17 > F_{2,6,0.05} = 5.14$, we reject the null hypothesis, and claim that the mean pressures
applied to the driver's head during a crash test are NOT all equal for these three types of car.
(b)
data car;
input type $ pressure;
datalines;
Compact 643
Compact 655
Compact 702
Midsize 469
Midsize 427
Midsize 525
Fulsize 484
Fulsize 456
Fulsize 402
;
run;
proc anova data = car;
class type;
model pressure = type;
run;
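The ANOVA table in part (a) can be rebuilt from the raw data as a check. The sketch below uses Python with the standard library only; it is not part of the original solution, the names are illustrative, and 5.14 is the tabulated $F_{2,6,0.05}$ quoted above.

```python
# One-way ANOVA for Problem 4, computed from the raw crash-test data.
from statistics import mean

groups = {
    "Compact": [643, 655, 702],
    "Midsize": [469, 427, 525],
    "Full-size": [484, 456, 402],
}

all_values = [v for values in groups.values() for v in values]
grand_mean = mean(all_values)

# Between-group (treatment) and within-group (error) sums of squares.
ss_between = sum(
    len(values) * (mean(values) - grand_mean) ** 2 for values in groups.values()
)
ss_within = sum(
    (v - mean(values)) ** 2 for values in groups.values() for v in values
)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f0 = ms_between / ms_within
F_CRIT = 5.14  # tabulated F_{2, 6, 0.05}, taken from the solution text

print(f"SS between={ss_between:.2f} (df={df_between}), MS={ms_between:.2f}")
print(f"SS within ={ss_within:.2f} (df={df_within}), MS={ms_within:.2f}")
print(f"F0={f0:.2f}, reject H0: {f0 > F_CRIT}")
```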
5. The length of time to recovery was recorded for patients randomly assigned and subjected to two different
surgical procedures. The data (recorded in days) are as follows:
                   Procedure 1   Procedure 2
Sample mean        7.3           8.9
Sample variance    1.23          1.49
Sample size        11            13
(a) Test at $\alpha = 0.01$ whether the data present sufficient evidence to indicate a difference between the mean
recovery times for the two surgical procedures. What assumptions are necessary? Test the assumptions
necessary if you can.
(b) Please derive the corresponding general test using the pivotal quantity method. Please derive the pivotal
quantity and its distribution, list the test statistic, and derive the rejection region for a 2-sided test at the
significance level of $\alpha$.
(c) (extra credit) Please derive the general test using the likelihood ratio test method. Prove whether this test is
equivalent to the one derived using the pivotal quantity method in part (b).
Solution:
(a) Inference on two population means. Two small and independent samples.
Procedure 1: $\bar{X} = 7.3$, $S_1^2 = 1.23$, $n_1 = 11$
Procedure 2: $\bar{Y} = 8.9$, $S_2^2 = 1.49$, $n_2 = 13$
[1] Under the normality assumption, we first test whether the two population variances are equal. That is, $H_0: \sigma_1^2 = \sigma_2^2$ versus
$H_a: \sigma_2^2 > \sigma_1^2$. The test statistic is
$F_0 = \frac{S_2^2}{S_1^2} = \frac{1.49}{1.23} = 1.21 < F_{12,10,0.05} = 2.91$ (upper critical value)
We cannot reject $H_0$; it is reasonable to assume that $\sigma_1^2 = \sigma_2^2$.
[2] This is inference on two population means, independent samples. The first assumption is that both
populations are normal. The second is the equal variance assumption, which we have checked in [1].
Now we perform the pooled-variance t-test with hypotheses $H_0: \mu_1 - \mu_2 = 0$ versus $H_a: \mu_1 - \mu_2 \neq 0$
$t_0 = \frac{(\bar{X} - \bar{Y}) - 0}{S_p \sqrt{1/n_1 + 1/n_2}} = \frac{(7.3 - 8.9) - 0}{\sqrt{1.37}\,\sqrt{1/11 + 1/13}} = -3.33$
where $S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} = \frac{10 \times 1.23 + 12 \times 1.49}{22} = 1.37$.
Since $|t_0| = 3.33 > t_{22,0.005} = 2.819$, we reject $H_0$ and conclude that the data present sufficient evidence to
indicate a difference between the mean recovery times for the two surgical procedures at the significance level
of 0.01.
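Both tests in part (a) depend only on the summary statistics, so they are easy to recompute. The following Python sketch (standard library only) is an illustration rather than part of the original solution; the variable names are hypothetical, and the critical values 2.91 and 2.819 are the tabulated values quoted above.

```python
# Problem 5(a): equal-variance F test, then pooled-variance t test,
# both computed from the summary statistics alone.
from math import sqrt

n1, mean1, var1 = 11, 7.3, 1.23  # Procedure 1
n2, mean2, var2 = 13, 8.9, 1.49  # Procedure 2

# [1] F test of H0: sigma1^2 = sigma2^2 (larger sample variance on top).
f0 = var2 / var1
F_CRIT = 2.91  # tabulated F_{12, 10, 0.05}, taken from the solution text
variances_look_equal = f0 < F_CRIT

# [2] Pooled-variance t test of H0: mu1 - mu2 = 0.
df = n1 + n2 - 2
sp_squared = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
t0 = (mean1 - mean2) / sqrt(sp_squared * (1 / n1 + 1 / n2))
T_CRIT = 2.819  # tabulated t_{22, 0.005}, taken from the solution text
reject_h0 = abs(t0) > T_CRIT

print(f"F0={f0:.2f}, equal variances plausible: {variances_look_equal}")
print(f"Sp^2={sp_squared:.2f}, t0={t0:.2f}, df={df}, reject H0: {reject_h0}")
```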
(b) Derivation of the pooled-variance t-test (2-sided test) using the pivotal quantity approach
Suppose we have two independent random samples from two normal populations: $X_1, X_2, \ldots, X_{n_1} \sim N(\mu_1, \sigma^2)$
and $Y_1, Y_2, \ldots, Y_{n_2} \sim N(\mu_2, \sigma^2)$. Here is a simple outline of the derivation of the test of $H_0: \mu_1 - \mu_2 = 0$ versus
$H_a: \mu_1 - \mu_2 \neq 0$ using the pivotal quantity approach.
[1]. We start with the point estimator for the parameter of interest $(\mu_1 - \mu_2)$: $(\bar{X} - \bar{Y})$. Its distribution is
$N\left(\mu_1 - \mu_2,\ \sigma^2 (1/n_1 + 1/n_2)\right)$, using the mgf for $N(\mu, \sigma^2)$, which is $M(t) = \exp(\mu t + \sigma^2 t^2 / 2)$, and the independence
properties of the random samples. From this we have
$Z = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sigma \sqrt{1/n_1 + 1/n_2}} \sim N(0, 1)$
Unfortunately, $Z$ cannot serve as the pivotal quantity because $\sigma$ is unknown.
[2]. We next look for a way to get rid of the unknown $\sigma$, following a similar approach to the construction of the pooled-variance
t-statistic. We found that
$W = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{\sigma^2} \sim \chi^2_{n_1 + n_2 - 2}$
using the mgf for $\chi^2_k$, which is $M(t) = \left(\frac{1}{1 - 2t}\right)^{k/2}$, and the independence properties of the random samples.
[3]. Then we found, from the theorem of sampling from the normal population, and the independence properties of the
random samples, that $Z$ and $W$ are independent, and therefore, by the definition of the t-distribution, we have obtained our
pivotal quantity:
$T = \frac{Z}{\sqrt{W/(n_1 + n_2 - 2)}} = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{S_p \sqrt{1/n_1 + 1/n_2}} \sim t_{n_1 + n_2 - 2}$
where $S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$ is the pooled sample variance.
[4]. The rejection region is derived from $P(|T_0| \geq c \mid H_0) = \alpha$, where
$T_0 = \frac{(\bar{X} - \bar{Y}) - 0}{S_p \sqrt{1/n_1 + 1/n_2}} \sim t_{n_1 + n_2 - 2}$ under $H_0$.
Thus $c = t_{n_1 + n_2 - 2,\ \alpha/2}$. Therefore at the significance level of $\alpha$, we reject $H_0$ in favor of $H_a$ iff $|T_0| \geq t_{n_1 + n_2 - 2,\ \alpha/2}$.
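The key algebraic step in the pivotal quantity argument is that the unknown $\sigma$ cancels when $Z$ is divided by $\sqrt{W/(n_1 + n_2 - 2)}$. The sketch below illustrates this numerically in Python (standard library only). It is not part of the original solution: the population parameters, the seed, and the variable names are arbitrary assumptions chosen only for illustration.

```python
# Numerical illustration that T = Z / sqrt(W / nu) does not depend on sigma
# once it is written in terms of the pooled standard deviation Sp.
import random
from math import sqrt
from statistics import mean, variance

random.seed(572)                  # arbitrary seed, for reproducibility only
mu1, mu2, sigma = 7.0, 9.0, 1.2   # hypothetical "true" parameters
n1, n2 = 11, 13

x = [random.gauss(mu1, sigma) for _ in range(n1)]
y = [random.gauss(mu2, sigma) for _ in range(n2)]

x_bar, y_bar = mean(x), mean(y)
s1_sq, s2_sq = variance(x), variance(y)  # sample variances (divisor n - 1)
nu = n1 + n2 - 2

# Z and W both involve the unknown sigma ...
z = ((x_bar - y_bar) - (mu1 - mu2)) / (sigma * sqrt(1 / n1 + 1 / n2))
w = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / sigma ** 2
t_from_z_and_w = z / sqrt(w / nu)

# ... but their ratio equals the statistic built from Sp alone.
sp = sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / nu)
t_from_sp = ((x_bar - y_bar) - (mu1 - mu2)) / (sp * sqrt(1 / n1 + 1 / n2))

print(f"T via Z and W: {t_from_z_and_w:.6f}")
print(f"T via Sp     : {t_from_sp:.6f}")
```

The two printed values agree up to floating-point error for any choice of `sigma`, which is what makes $T$ usable as a pivotal quantity.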
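(c) Derivation of the pooled-variance t-test (2-sided test) using the likelihood ratio test approach
Suppose we have two independent random samples from two normal populations with equal but unknown
variances. We now derive the likelihood ratio test for:
$H_0: \mu_1 = \mu_2$ vs $H_a: \mu_1 \neq \mu_2$
Under $H_0$, let $\mu_1 = \mu_2 = \mu$. Then the restricted and the full parameter spaces are
$\omega = \{-\infty < \mu_1 = \mu_2 = \mu < +\infty,\ 0 < \sigma^2 < +\infty\}$, $\Omega = \{-\infty < \mu_1, \mu_2 < +\infty,\ 0 < \sigma^2 < +\infty\}$
The likelihood under $\omega$ has two parameters:
$L(\omega) = L(\mu, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{\frac{n_1 + n_2}{2}} \exp\left[-\frac{1}{2\sigma^2}\left(\sum_{i=1}^{n_1}(x_i - \mu)^2 + \sum_{j=1}^{n_2}(y_j - \mu)^2\right)\right]$
$\ln L(\omega) = -\frac{n_1 + n_2}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\left(\sum_{i=1}^{n_1}(x_i - \mu)^2 + \sum_{j=1}^{n_2}(y_j - \mu)^2\right)$
We take the partial derivatives with respect to $\mu$ and $\sigma^2$ and set them equal to 0. Then we have:
$\hat{\mu} = \frac{\sum_{i=1}^{n_1} x_i + \sum_{j=1}^{n_2} y_j}{n_1 + n_2} = \frac{n_1\bar{x} + n_2\bar{y}}{n_1 + n_2}$
$\hat{\sigma}^2_\omega = \frac{1}{n_1 + n_2}\left[\sum_{i=1}^{n_1}(x_i - \hat{\mu})^2 + \sum_{j=1}^{n_2}(y_j - \hat{\mu})^2\right]$
The likelihood under $\Omega$ has three parameters:
$L(\Omega) = L(\mu_1, \mu_2, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{\frac{n_1 + n_2}{2}} \exp\left[-\frac{1}{2\sigma^2}\left(\sum_{i=1}^{n_1}(x_i - \mu_1)^2 + \sum_{j=1}^{n_2}(y_j - \mu_2)^2\right)\right]$
$\ln L(\Omega) = -\frac{n_1 + n_2}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\left(\sum_{i=1}^{n_1}(x_i - \mu_1)^2 + \sum_{j=1}^{n_2}(y_j - \mu_2)^2\right)$
We take the partial derivatives with respect to $\mu_1$, $\mu_2$ and $\sigma^2$ and set them all equal to 0. Then we have:
$\hat{\mu}_1 = \bar{x}$, $\hat{\mu}_2 = \bar{y}$, $\hat{\sigma}^2_\Omega = \frac{1}{n_1 + n_2}\left[\sum_{i=1}^{n_1}(x_i - \bar{x})^2 + \sum_{j=1}^{n_2}(y_j - \bar{y})^2\right]$
This completes the estimation of the parameters. The exponential factors in both maximized likelihoods equal $\exp[-(n_1 + n_2)/2]$ and
cancel, so we have:
$\lambda = \frac{L(\hat{\omega})}{L(\hat{\Omega})} = \frac{\left(\frac{1}{2\pi\hat{\sigma}^2_\omega}\right)^{\frac{n_1 + n_2}{2}}}{\left(\frac{1}{2\pi\hat{\sigma}^2_\Omega}\right)^{\frac{n_1 + n_2}{2}}} = \left[\frac{\hat{\sigma}^2_\Omega}{\hat{\sigma}^2_\omega}\right]^{\frac{n_1 + n_2}{2}}$
$= \left[\frac{\sum_{i=1}^{n_1}(x_i - \bar{x})^2 + \sum_{j=1}^{n_2}(y_j - \bar{y})^2}{\sum_{i=1}^{n_1}\left(x_i - \frac{n_1\bar{x} + n_2\bar{y}}{n_1 + n_2}\right)^2 + \sum_{j=1}^{n_2}\left(y_j - \frac{n_1\bar{x} + n_2\bar{y}}{n_1 + n_2}\right)^2}\right]^{\frac{n_1 + n_2}{2}}$
$= \left[1 + \frac{t_0^2}{n_1 + n_2 - 2}\right]^{-\frac{n_1 + n_2}{2}}$
where $t_0$ is the test statistic in the pooled-variance t-test. The last step uses the identity
$\sum_{i=1}^{n_1}(x_i - \hat{\mu})^2 + \sum_{j=1}^{n_2}(y_j - \hat{\mu})^2 = \sum_{i=1}^{n_1}(x_i - \bar{x})^2 + \sum_{j=1}^{n_2}(y_j - \bar{y})^2 + \frac{n_1 n_2}{n_1 + n_2}(\bar{x} - \bar{y})^2$.
Since $\lambda$ is a decreasing function of $|t_0|$, $\lambda \leq \lambda^*$ is equivalent to $|t_0| \geq c$. Thus at the
significance level $\alpha$, we reject the null hypothesis in favor of the alternative when $|t_0| \geq c = t_{n_1 + n_2 - 2,\ \alpha/2}$. This
test is identical to the test we have derived in part (b).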
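The closed form of the likelihood ratio, $\lambda = [1 + t_0^2/(n_1 + n_2 - 2)]^{-(n_1 + n_2)/2}$, can be checked numerically against the ratio of the two maximized variance estimates. The Python sketch below (standard library only) does this on a small made-up data set; the data values and variable names are hypothetical and are not taken from the exam.

```python
# Numerical check that the likelihood ratio equals
# [1 + t0^2 / (n1 + n2 - 2)] ** (-(n1 + n2) / 2).
from math import sqrt
from statistics import mean

x = [6.1, 7.4, 8.0, 7.2, 6.9]         # hypothetical sample 1
y = [8.8, 9.5, 7.9, 9.1, 10.2, 8.4]   # hypothetical sample 2
n1, n2 = len(x), len(y)

x_bar, y_bar = mean(x), mean(y)
mu_hat = (n1 * x_bar + n2 * y_bar) / (n1 + n2)  # MLE of the common mean under H0

# Residual sums of squares under the full model and under H0.
ss_full = sum((a - x_bar) ** 2 for a in x) + sum((b - y_bar) ** 2 for b in y)
ss_null = sum((a - mu_hat) ** 2 for a in x) + sum((b - mu_hat) ** 2 for b in y)

# Likelihood ratio as a ratio of the two MLEs of sigma^2
# (the common factor 1 / (n1 + n2) cancels).
lambda_direct = (ss_full / ss_null) ** ((n1 + n2) / 2)

# The same quantity expressed through the pooled-variance t statistic.
sp_squared = ss_full / (n1 + n2 - 2)
t0 = (x_bar - y_bar) / sqrt(sp_squared * (1 / n1 + 1 / n2))
lambda_from_t = (1 + t0 ** 2 / (n1 + n2 - 2)) ** (-(n1 + n2) / 2)

print(f"lambda (direct)  = {lambda_direct:.10f}")
print(f"lambda (from t0) = {lambda_from_t:.10f}")
```

Because the two expressions agree, small values of $\lambda$ correspond exactly to large values of $|t_0|$, which is the equivalence claimed above.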
6. People at high risk of sudden cardiac death can be identified using the change in a signal-averaged
electrocardiogram before and after prescribed activities. The current method is about 80% accurate. The
method was modified, hoping to improve its accuracy. The new method was tested on 50 people and gave
correct results on 46 patients.
(a) Is this convincing evidence that the new method is more accurate? Please test at $\alpha = 0.05$.
(b) If the new method actually has 90% accuracy, what power does a sample of 50 have to demonstrate that the
new method is better at $\alpha = 0.05$?
(c) How many patients should be tested in order for this power to be at least 0.75?
Solution: