Homework #6 Solutions

ACMS 30600
46 points: Cold-Vitamin (8), Berkeley (16), Time Series (22)
I. Cold-Vitamin data analysis
1. Find the Odds Ratio and Relative Risk for the data in the table (4 points).
OR = 0.48995
RR = 0.552332
Scoring: 2 points for the odds ratio, 2 for the relative risk; work was required to receive credit for both.
You needed the odds/risk for the exposed group relative to the unexposed group, not the reverse, for full credit.
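For reference, here is one way to check these numbers in R, using the four cell counts as they are loaded in part 2 (the exposed/unexposed labeling is taken from that coding):
n11 <- 17    # vitamin C, cold
n12 <- 122   # vitamin C, no cold
n21 <- 31    # no vitamin C, cold
n22 <- 109   # no vitamin C, no cold
(n11/n12) / (n21/n22)                 # odds ratio = 0.48995
(n11/(n11+n12)) / (n21/(n21+n22))     # relative risk = 0.5523316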
2. Fit the saturated Log linear model for the contingency table. Paste your code at the end.
(2 points)
Code:
vc <- c(1,1,0,0)              # vitamin C indicator
cold <- c(1,0,1,0)            # cold indicator
count <- c(17,122,31,109)     # cell counts
fit1 <- glm(count ~ vc + cold + vc*cold, family = poisson)
summary(fit1)
Saturated model: log(count) = 4.69135 + 0.11267(vc) - 1.25736(cold) - 0.71345(vc*cold); the
fitted counts exactly reproduce the observed table, and the residual deviance is 0. Note that the
interaction coefficient is the log odds ratio: exp(-0.71345) = 0.48995, the value from part 1.
Scoring: 2 points for the correct model with correct code; 1 point deduction if no code; ½ point
deduction if you didn’t define vc, cold, and count.
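A quick way to confirm these claims in R, continuing with fit1 from the code above:
fitted(fit1)                  # 17 122 31 109: reproduces the observed table exactly
deviance(fit1)                # essentially 0
exp(coef(fit1)["vc:cold"])    # 0.48995, the odds ratio from part 1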
3. Implement a chi-square test to determine whether the 2 way interaction term should be
included in the log linear model for Cold-Vitamin data (2 points)
Note: Null Hypothesis is no association (no interaction needed), Alternative Hypothesis is
association (interaction needed).
If we fit the reduced model count ~ vc + cold, the residual deviance is 4.8718 on 1 df. Since the
saturated model has residual deviance 0, this is the chi-square (likelihood ratio) test statistic,
and the corresponding p-value is 0.0273. Because 0.0273 < 0.05, we reject the null hypothesis of
no association and conclude that the interaction term should be kept in the model.
Code: pchisq(4.8718,1,lower.tail=F)
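The same test can be run directly in R, reusing the vectors defined in part 2 (a sketch; fit0 is my name for the reduced model):
fit0 <- glm(count ~ vc + cold, family = poisson)    # reduced (independence) model
summary(fit0)                                       # residual deviance 4.8718 on 1 df, as above
anova(fit0, fit1, test = "Chisq")                   # likelihood ratio test against the saturated model
pchisq(deviance(fit0), df = 1, lower.tail = FALSE)  # p-value, about 0.0273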
II. Berkeley data analysis
Fit a log-linear model with only three main effects. Based on the fitted counts, find the
marginal and conditional associations of Department and Gender. Interpret the result.
Preferred Answer:
# Berkeley data
count <- c(313,512,19,89,207,353,8,17,205,120,391,202)
D <- c(1,1,1,1,2,2,2,2,3,3,3,3)         # department (A = 1, B = 2, C = 3), treated as numeric
Gender <- c(0,0,1,1,0,0,1,1,0,0,1,1)    # 0 = male, 1 = female
Stat <- c(0,1,0,1,0,1,0,1,0,1,0,1)      # 0 = rejected, 1 = accepted
fit1.berkely <- glm(count ~ D + Gender + Stat, family = poisson)
summary(fit1.berkely)
fit1.berkely$fitted.values
1. The model is log(count) = 5.607380 - 0.009237(Department) - 0.856699(Gender) + 0.123309(Stat).
Note: If you swap the base case for gender, the intercept is 4.750681.
And the fitted values (rounded) in each cell are:

           M rejected   M accepted   F rejected   F accepted
Dept A     270          305          115          130
Dept B     267          303          114          128
Dept C     265          300          113          127
2. The marginal association is based on the marginal table, collapsing over Stat:

           M                F
Dept A     270+305 = 575    115+130 = 245
Dept B     267+303 = 570    114+128 = 242
Dept C     265+300 = 565    113+127 = 240

Hence OR1 = 575*242/(245*570) = 0.9964, OR2 = 575*240/(565*245) = 0.9969, and
OR3 = 570*240/(242*565) = 1.0005.
These ORs come from each pair of rows of the marginal table. They are all close to 1, implying
that Department and Gender have no marginal association based on the fitted values.
3. Conditional tables for Stat = Rejected and Stat = Accepted:

Rejected     A      B      C
M            270    267    265
F            115    114    113

Accepted     A      B      C
M            305    303    300
F            130    128    127

From the Rejected table, the ORs are 270*114/(267*115) = 1.002, 270*113/(265*115) = 1.001,
and 267*113/(265*114) = 0.9987.
Similarly, from the Accepted table, the ORs are 305*128/(303*130) = 0.9911,
305*127/(300*130) = 0.9932, and 303*127/(128*300) = 1.0021.
They are all close to 1, which means Department and Gender have no conditional association.
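For students who prefer to let R do the bookkeeping, here is a sketch of how the marginal and conditional odds ratios can be computed from fit1.berkely. The matrix layout assumes the fitted values come back in the same order the data were entered (blocks of M rejected, M accepted, F rejected, F accepted for Depts A, B, C), and the helper function or() is mine:
# Arrange the 12 fitted values into a (Gender/Stat cell) x Department matrix
fv <- matrix(fit1.berkely$fitted.values, nrow = 4,
             dimnames = list(c("M.rej", "M.acc", "F.rej", "F.acc"), c("A", "B", "C")))

# Marginal table: collapse over Stat
marg <- rbind(M = fv["M.rej", ] + fv["M.acc", ],
              F = fv["F.rej", ] + fv["F.acc", ])

# Odds ratio comparing the departments in columns i and j of a 2-row table
or <- function(tab, i, j) (tab[1, i] * tab[2, j]) / (tab[2, i] * tab[1, j])
c(or(marg, 1, 2), or(marg, 1, 3), or(marg, 2, 3))   # marginal ORs, all close to 1

# Conditional tables: Rejected and Accepted separately
rej <- fv[c("M.rej", "F.rej"), ]
acc <- fv[c("M.acc", "F.acc"), ]
c(or(rej, 1, 2), or(rej, 1, 3), or(rej, 2, 3))      # conditional ORs (Rejected), all close to 1
c(or(acc, 1, 2), or(acc, 1, 3), or(acc, 2, 3))      # conditional ORs (Accepted), all close to 1
Computed from the unrounded fitted values, these come out essentially exactly 1; the hand calculations above differ slightly only because the fitted counts were rounded first.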
Scoring Summary:
This was a challenging question for many students, and this is where the vast majority of
points were lost on the homework. Here were the scoring guidelines applied to this question.
2 points for fitting the model in R/providing correct fitted values (if you used approach #2 or #3,
you needed to show the R code to receive full points)
6 points for showing all of the correct collapsed tables (or at least reflecting in the computation
of the odds ratios how the numbers were derived).
6 points for showing all 9 odds ratios, and the work for computing all 9 odds ratios.
2 points for a conclusion of no marginal or conditional association, or for a conclusion that was
consistent with your earlier collapsed tables and odds ratios.
Another Solution:
This approach reflects that Department (with levels A, B, and C) is a categorical variable and
shouldn’t be treated as numeric. Full credit was given for this solution if the fitted values were
used [not the raw counts] and all of the steps discussed above were shown correctly.
> # If D is coded using A, B, C
> count<-c(313,512,19,89,207,353,8,17,205,120,391,202)
> D<-c("a","a","a","a","b","b","b","b","c","c","c","c")
> Gender<-c(0,0,1,1,0,0,1,1,0,0,1,1)
> Stat<-c(0,1,0,1,0,1,0,1,0,1,0,1)
> fit2.berkely<-glm(count~as.factor(D)+Gender+Stat,family=poisson)
> summary(fit2.berkely)
> fit2.berkely$fitted.values
Coefficients:
                Estimate Std. Error  z value Pr(>|z|)
(Intercept)      5.72784    0.04136  138.495  < 2e-16 ***
as.factor(D)b   -0.46679    0.05274   -8.851  < 2e-16 ***
as.factor(D)c   -0.01621    0.04649   -0.349  0.72735
Gender          -0.85670    0.04430  -19.340  < 2e-16 ***
Stat             0.12331    0.04060    3.037  0.00239 **
Fitted Values [can round at this stage; showing so that students with incorrect answers can
understand the approach]. These come from the R output:

           M rejected   M accepted   F rejected   F accepted
Dept A     307.305      347.634      130.4698     147.59
Dept B     192.68       217.97       81.806       92.54
Dept C     302.36426    342.04       128.372      145.2
***For the marginal table, add up M rejected and M accepted for one column, F rejected and F
accepted for the other. For the conditional tables, take one with all of the rejected fitted values,
and one with all of the accepted fitted values.
Note: If you swap the base case for gender, the intercept is 4.87114; that was probably the most
common output for the problem. If you also swap the base case for Stat [rejected] in addition to
gender, the intercept is 4.99445, with everything else unchanged. None of the fitted values are
affected as a result.
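One way to verify that note in R, continuing from fit2.berkely (a sketch; I(1 - Gender) simply flips which gender is the base case):
fit2.swap <- glm(count ~ as.factor(D) + I(1 - Gender) + Stat, family = poisson)
coef(fit2.swap)["(Intercept)"]                        # 4.87114
all.equal(fitted(fit2.swap), fitted(fit2.berkely))    # TRUE: fitted values are unchanged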
Third Approach:
A few students took an approach similar to the first one, but made three separate tables for the
marginal association and six for the conditional associations, looking at Yes/No for Dept A, Dept B,
and Dept C. Although that isn’t the ideal approach, full credit for the problem was given if all
work was correctly shown after that point.
III. Time Series Questions
1.
a. 4 points: 2 points for plot [I don’t believe any students lost credit here]; 2 points for saying
that there was a positive long-term trend.
b. 2 points: The first four moving averages. A graph here isn’t necessary or sufficient for full
credit:
1991: 1254.33
1992: 1273.00
1993: 1297.67
1994: 1288.33
c. 2 points: The first four values of the exponentially smoothed series:
1990: 1283.00
1991: 1268.00
1992: 1261.70
1993: 1284.89
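The raw yearly series is not reprinted in this key, so purely as an illustration of the mechanics, here is a sketch in R. The values of y and the smoothing constant w below were back-solved to be consistent with the answers above (assuming a centered 3-year moving average and S(1990) = Y(1990)); treat them as placeholders, not the official data:
years <- 1990:1995
y <- c(1283, 1233, 1247, 1339, 1307, 1219)   # placeholder series consistent with the answers above
w <- 0.3                                     # placeholder smoothing constant

# Centered 3-year moving average: MA_t = (Y_{t-1} + Y_t + Y_{t+1}) / 3
ma3 <- stats::filter(y, rep(1/3, 3), sides = 2)

# Exponential smoothing: S_1990 = Y_1990, then S_t = w*Y_t + (1-w)*S_{t-1}
es <- Reduce(function(prev, yt) w * yt + (1 - w) * prev, y, accumulate = TRUE)

cbind(year = years, y, ma3 = round(as.numeric(ma3), 2), smoothed = round(es, 2))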
2.
a. 2 points:
Note: An AR(1) model only includes an R(t-1) term. Also, if you included E(Yt), it was necessary
to specify what that consisted of for full credit.
b. 2 points:
*fit 1st order autoregressive model;
proc autoreg data = intrate_data;
model intrate = time / nlag=1;
run;
intercept=11.5888, slope=-0.2705, phi=0.311054
Comments: 0.5 points deducted if had dwProb option instead of nlag=1. Also, many students
had intercept=11.5915, slope=-0.2733 with the same phi value. Full credit was provided in
those instances.
3.
a. 2 points total
For t = 31: 10+2.5(31)+0.64*(-3) = 85.58
For t = 32: 10+2.5(32)+0.64^2*(-3) = 88.77
For t = 33: 10+2.5(33)+0.64^3*(-3) = 91.71
b. 2 points total
For t = 31: 85.58 +/- 2 sqrt(4.3)= (81.43, 89.73)
For t = 32: 88.77 +/- 2 sqrt (4.3*(1+0.64^2))= (83.85, 93.69)
For t = 33: 91.71 +/- 2 sqrt (4.3*(1+0.64^2+0.64^4))= (86.50, 96.92)
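A quick numerical check of both parts in R, using the values given in the problem (intercept 10, slope 2.5, phi = 0.64, residual -3 at the forecast origin t = 30, error variance 4.3); the object names are mine:
b0 <- 10; b1 <- 2.5; phi <- 0.64
e30 <- -3        # residual at the forecast origin t = 30
s2 <- 4.3        # estimated error variance

h <- 1:3                                      # forecast horizons: t = 31, 32, 33
fc <- b0 + b1 * (30 + h) + phi^h * e30        # point forecasts: 85.58, 88.77, 91.71
se <- sqrt(s2 * cumsum(phi^(2 * (h - 1))))    # forecast standard errors
cbind(t = 30 + h, forecast = round(fc, 2),
      lower = round(fc - 2 * se, 2), upper = round(fc + 2 * se, 2))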
4.
1-step ahead forecast (2 points):
11.5888-.2705*24+.311054*1.0327=5.418025
Note: You must add the positive value of phi here, not the negative value reported by SAS.
95% forecast Limit:
5.418025 +/- 2*sqrt(.735)
Note: 0.735 is the MSE from the regression. Some students had other values; one was around
0.78, and since that was shown in the SAS output, I gave credit. Any other values (0.85, 0.88,
etc.) did not receive full credit.
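The same arithmetic in R, plugging in the SAS estimates (the object names are mine; 1.0327 is the final residual used in the forecast above, and 0.735 the regression MSE):
b0 <- 11.5888; b1 <- -0.2705; phi <- 0.311054
e_last <- 1.0327                     # residual at the last observed time point
mse <- 0.735                         # MSE from the regression output

fc <- b0 + b1 * 24 + phi * e_last    # 1-step-ahead forecast: 5.418025
fc + c(-2, 2) * sqrt(mse)            # approximate 95% forecast limits, about (3.70, 7.13)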