Chapter 4

Statistics 9055
Simulated 3 × 3 Contingency Table
Probabilities when rows and columns are independent

                          Columns
                    1           2           3
Rows                𝑝+1 = 0.2   𝑝+2 = 0.3   𝑝+3 = 0.5
1    𝑝1+ = 0.3      0.06        0.09        0.15
2    𝑝2+ = 0.3      0.06        0.09        0.15
3    𝑝3+ = 0.4      0.08        0.12        0.20
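Each cell probability in the table is the product of its row and column marginals, p_ij = p_i+ × p_+j. The following is a minimal sketch of that calculation; the names row_p, col_p and joint are introduced here for illustration only.

```r
# Marginal probabilities taken from the table
row_p <- c(0.3, 0.3, 0.4)   # p1+, p2+, p3+
col_p <- c(0.2, 0.3, 0.5)   # p+1, p+2, p+3

# Under independence, p_ij = p_i+ * p_+j
joint <- outer(row_p, col_p)
dimnames(joint) <- list(row = 1:3, col = 1:3)
print(joint)

# Flattening column by column gives a cell-probability vector
# in the order (row 1, col 1), (row 2, col 1), (row 3, col 1), ...
p <- as.vector(joint)
print(p)
```

Flattening column by column matches the ordering in which the row index cycles fastest, which is the ordering used for the simulated data.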
Simulated Data
> w<-c(1,2,3,1,2,3,1,2,3)
> x<-c(1,1,1,2,2,2,3,3,3)
> p<-c(0.06,0.06,0.08,0.09,0.09,0.12,0.15,0.15,0.20)
> y<-rmultinom(1,1000,p)
> y
      [,1]
 [1,]   64
 [2,]   70
 [3,]   87
 [4,]   85
 [5,]   86
 [6,]  110
 [7,]  151
 [8,]  162
 [9,]  185
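Because rmultinom draws at random, rerunning the commands gives different counts each time. The sketch below fixes a seed so that a draw can be repeated and arranges the counts as a 3 × 3 table. The seed value and the name tab are assumptions made for illustration, and the resulting counts will not match the ones printed in these notes.

```r
set.seed(9055)  # arbitrary seed, chosen only for illustration

p <- c(0.06, 0.06, 0.08, 0.09, 0.09, 0.12, 0.15, 0.15, 0.20)
y <- rmultinom(1, 1000, p)   # 9 x 1 matrix of cell counts

# Rearrange the counts as a 3 x 3 table (rows cycle fastest, as in w)
tab <- matrix(y, nrow = 3, ncol = 3,
              dimnames = list(row = 1:3, col = 1:3))
addmargins(tab)
```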
Simulation Analysis
> glm_sim<-glm(y~factor(x)+factor(w), family=poisson)
> summary(glm_sim)

Call:
glm(formula = y ~ factor(x) + factor(w), family = poisson)

Deviance Residuals:
       1         2         3         4         5         6         7         8         9
-0.28413  -0.03318   0.27917   0.07614  -0.35749   0.25550   0.13067   0.28784  -0.38139

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  4.19419    0.08281  50.646  < 2e-16 ***
factor(x)2   0.24019    0.08991   2.672  0.00755 **
factor(x)3   0.81244    0.08083  10.052  < 2e-16 ***
factor(w)2   0.05827    0.08049   0.724  0.46909
factor(w)3   0.24164    0.07714   3.132  0.00173 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 133.76849  on 8  degrees of freedom
Residual deviance:   0.60402  on 4  degrees of freedom
AIC: 68.956

Number of Fisher Scoring iterations: 3
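The residual deviance of the main-effects model is the likelihood-ratio statistic for independence, and it can be referred to a chi-squared distribution on the residual degrees of freedom. This is a minimal sketch that re-enters the simulated counts printed earlier; the names fit, dev, df and p_value are introduced here for illustration.

```r
# Counts copied from the simulated output (one particular random draw)
y <- c(64, 70, 87, 85, 86, 110, 151, 162, 185)
w <- c(1, 2, 3, 1, 2, 3, 1, 2, 3)   # row index
x <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)   # column index

# Main-effects (independence) log-linear model
fit <- glm(y ~ factor(x) + factor(w), family = poisson)

# Likelihood-ratio goodness-of-fit test of independence
dev     <- deviance(fit)
df      <- df.residual(fit)
p_value <- pchisq(dev, df, lower.tail = FALSE)
print(c(deviance = dev, df = df, p_value = p_value))
```

With a deviance of about 0.604 on 4 degrees of freedom the p-value is roughly 0.96, so there is no evidence against independence, as expected for data simulated from an independence model.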
3 × 3 Contingency Table:
Heights of Husbands and Wives

                    Husband
Wife        Tall   Medium   Short   Totals
Tall          18       20      12       50
Medium        28       51      25      104
Short         14       28       9       51
Totals        60       99      46      205
Data File
> hts<-read.table("heights.txt",header=T)
> attach(hts)
> hts
  count husband   wife
1    18    tall   tall
2    20  medium   tall
3    12   short   tall
4    28    tall medium
5    51  medium medium
6    25   short medium
7    14    tall  short
8    28  medium  short
9     9   short  short
Analysis
> glm_hts<-glm(count~factor(husband)+factor(wife), family=poisson)
> summary(glm_hts)

Call:
glm(formula = count ~ factor(husband) + factor(wife), family = poisson)

Deviance Residuals:
      1        2        3        4        5        6        7        8        9
 0.8490  -0.8699   0.2304  -0.4482   0.1092   0.3404  -0.2424   0.6645  -0.7508

Coefficients:
                     Estimate Std. Error z value Pr(>|z|)
(Intercept)            3.9165     0.1218  32.152  < 2e-16 ***
factor(husband)short  -0.7665     0.1784  -4.295 1.74e-05 ***
factor(husband)tall   -0.5008     0.1636  -3.061  0.00221 **
factor(wife)short     -0.7126     0.1709  -4.168 3.07e-05 ***
factor(wife)tall      -0.7324     0.1721  -4.256 2.08e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 50.5890  on 8  degrees of freedom
Residual deviance:  2.9232  on 4  degrees of freedom
AIC: 56.57

Number of Fisher Scoring iterations: 4
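The same independence hypothesis can be checked in two ways: by referring the residual deviance to a chi-squared distribution, and by Pearson's chi-squared test on the cross-tabulated counts. The sketch below types the counts in directly rather than reading heights.txt, and passes data = hts to glm instead of using attach; both choices, and the names fit, tab and pearson, are assumptions made for illustration.

```r
# Counts entered by hand from the heights table
hts <- data.frame(
  count   = c(18, 20, 12, 28, 51, 25, 14, 28, 9),
  husband = rep(c("tall", "medium", "short"), times = 3),
  wife    = rep(c("tall", "medium", "short"), each = 3)
)

# Independence log-linear model
fit <- glm(count ~ factor(husband) + factor(wife),
           family = poisson, data = hts)

# Likelihood-ratio (deviance) test of independence
dev     <- deviance(fit)
df      <- df.residual(fit)
p_value <- pchisq(dev, df, lower.tail = FALSE)
print(c(deviance = dev, df = df, p_value = p_value))

# Pearson chi-squared test on the 3 x 3 cross-tabulation, as a cross-check
tab     <- xtabs(count ~ wife + husband, data = hts)
pearson <- chisq.test(tab, correct = FALSE)
print(pearson)
```

The deviance of about 2.92 on 4 degrees of freedom corresponds to a p-value of roughly 0.57, so these data give no evidence that the heights of husbands and wives are associated.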
Three-way Table Example:
Chinese Smoking Study

                               Lung Cancer
City         Smoking           Yes      No
Beijing      Smokers           126     100
             Non-smokers        35      61
Shanghai     Smokers           908     688
             Non-smokers       497     807
Shenyang     Smokers           913     747
             Non-smokers       336     598
Nanjing      Smokers           235     172
             Non-smokers        58     121
Harbin       Smokers           402     308
             Non-smokers       121     215
Zhengzhou    Smokers           182     156
             Non-smokers        72      98
Taiyuan      Smokers            60      99
             Non-smokers        11      43
Nanchang     Smokers           104      89
             Non-smokers        21      36
Hypothesis
Are lung cancer and smoking independent
conditional on the city in which a person lives?
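In log-linear terms the hypothesis corresponds to a model with no cancer-by-smoking interaction. Writing A for lung cancer, B for smoking and C for city, and \mu_{ijk} for the expected count in cell (i, j, k), conditional independence of A and B given C can be written as follows (the \lambda notation is an assumption here and is not taken from the slides):

```latex
\log \mu_{ijk} = \lambda + \lambda^{A}_{i} + \lambda^{B}_{j} + \lambda^{C}_{k}
               + \lambda^{AC}_{ik} + \lambda^{BC}_{jk},
\qquad i = 1,2;\; j = 1,2;\; k = 1,\dots,8 .
```

This model has 1 + 1 + 1 + 7 + 7 + 7 = 24 free parameters for the 2 × 2 × 8 = 32 cells, leaving 32 - 24 = 8 residual degrees of freedom.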
Analysis
> survey<-read.table("chinese_smoking_survey.txt",header=T)
> attach(survey)
> head(survey)
  count cancer smoking     city
1   126    yes   smoke  Beijing
2    35    yes nosmoke  Beijing
3   908    yes   smoke Shanghai
4   497    yes nosmoke Shanghai
5   913    yes   smoke Shenyang
6   336    yes nosmoke Shenyang
> y<-count
> a<-factor(cancer)
> b<-factor(smoking)
> c<-factor(city)
> glm_survey<-glm(y~a+b+c+a*c+b*c, family=poisson)
> summary(glm_survey)
Results
Call:
glm(formula = y ~ a + b + c + a * c + b * c, family = poisson)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-5.6161  -2.0466   0.0332   1.8887   5.0372

.
.
.
etc.
.
.

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 7983.67  on 31  degrees of freedom
Residual deviance:  288.27  on  8  degrees of freedom
AIC: 555.23

Number of Fisher Scoring iterations: 4
Nearly Saturated Model
> glm_survey<-glm(y~a+b+c+a*b+a*c+b*c, family=poisson)
> summary(glm_survey)

Call:
glm(formula = y ~ a + b + c + a * b + a * c + b * c, family = poisson)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-0.97966  -0.10261  -0.00002   0.10203   1.05837

.
.
etc.
.
.

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 7983.6714  on 31  degrees of freedom
Residual deviance:    5.1958  on  7  degrees of freedom
AIC: 274.15

Number of Fisher Scoring iterations: 4
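The two fits are nested: the second adds only the cancer-by-smoking interaction, so the drop in deviance (288.27 - 5.1958, about 283 on 1 degree of freedom) tests conditional independence of lung cancer and smoking given city. The sketch below types in the counts from the three-way table rather than reading chinese_smoking_survey.txt, so the row order differs from the data file; the names cond_indep and homog_assoc are introduced here for illustration.

```r
cities <- c("Beijing", "Shanghai", "Shenyang", "Nanjing",
            "Harbin", "Zhengzhou", "Taiyuan", "Nanchang")

# Per city: smokers (cancer yes, no), then non-smokers (cancer yes, no)
count <- c(126, 100,  35,  61,
           908, 688, 497, 807,
           913, 747, 336, 598,
           235, 172,  58, 121,
           402, 308, 121, 215,
           182, 156,  72,  98,
            60,  99,  11,  43,
           104,  89,  21,  36)

survey <- data.frame(
  count   = count,
  cancer  = factor(rep(c("yes", "no"), times = 16)),
  smoking = factor(rep(rep(c("smoke", "nosmoke"), each = 2), times = 8)),
  city    = factor(rep(cities, each = 4))
)

# Conditional independence of cancer and smoking given city
cond_indep <- glm(count ~ cancer * city + smoking * city,
                  family = poisson, data = survey)

# Add the cancer-by-smoking interaction (all two-way interactions)
homog_assoc <- update(cond_indep, . ~ . + cancer:smoking)

# Analysis of deviance: likelihood-ratio test for the added term
print(anova(cond_indep, homog_assoc, test = "Chisq"))

# Goodness of fit of the larger model
pchisq(deviance(homog_assoc), df.residual(homog_assoc), lower.tail = FALSE)
```

A very large drop in deviance for a single degree of freedom indicates strong evidence against conditional independence, while the small residual deviance of the larger model suggests that one common cancer-by-smoking association across cities describes the data well.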
Ordinal Variables: Example Data

(1) A working mother can establish just as warm and secure a relationship with her children as a mother who does not work.
    Strongly Agree / Agree / Disagree / Strongly Disagree

(2) Working women should have paid maternity leave.
    Strongly Agree / Agree / Neither Agree nor Disagree / Disagree / Strongly Disagree

                                        Question 1
Question 2           Strongly Agree   Agree   Disagree   Strongly Disagree   Total
Strongly Agree                   97     102         42                   9     250
Agree                            96     199        102                  18     415
Neither                          22      48         25                   7     102
Disagree                         17      38         36                  10     101
Strongly Disagree                 2       5          7                   2      16
Total                           234     392        212                  46     884
Data in R
> gss<-read.table("generalsocialsurvey.txt",header=T)
> attach(gss)
> gss
     y ques1 ques2 u v
1   97    SA    SA 4 5
2  102     A    SA 3 5
3   42     D    SA 2 5
4    9    SD    SA 1 5
5   96    SA     A 4 4
6  199     A     A 3 4
7  102     D     A 2 4
8   18    SD     A 1 4
9   22    SA     N 4 3
10  48     A     N 3 3
11  25     D     N 2 3
12   7    SD     N 1 3
13  17    SA     D 4 2
14  38     A     D 3 2
15  36     D     D 2 2
16  10    SD     D 1 2
17   2    SA    SD 4 1
18   5     A    SD 3 1
19   7     D    SD 2 1
20   2    SD    SD 1 1
Test for Independence
> glm_gss<-glm(y~factor(ques1)+factor(ques2),family=poisson)
> summary(glm_gss)

Call:
glm(formula = y ~ factor(ques1) + factor(ques2), family = poisson)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.4520  -1.0756  -0.3441   1.0838   3.5406

Coefficients:
                Estimate Std. Error z value Pr(>|z|)
(Intercept)      5.21508    0.06188  84.274  < 2e-16 ***
factor(ques1)D  -0.61468    0.08525  -7.210 5.59e-13 ***
factor(ques1)SA -0.51594    0.08261  -6.245 4.23e-10 ***
factor(ques1)SD -2.14262    0.15585 -13.748  < 2e-16 ***
factor(ques2)D  -1.41316    0.11095 -12.737  < 2e-16 ***
factor(ques2)N  -1.40331    0.11051 -12.698  < 2e-16 ***
factor(ques2)SA -0.50682    0.08006  -6.330 2.44e-10 ***
factor(ques2)SD -3.25569    0.25476 -12.779  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 938.236  on 19  degrees of freedom
Residual deviance:  44.961  on 12  degrees of freedom
AIC: 159.99

Number of Fisher Scoring iterations: 4

> 1-pchisq(44.961,12)
[1] 1.046476e-05
Test for Dependence in the Ordinal Variables - I
> glm_gss<-glm(y~factor(ques1)+factor(ques2)+u*v,family=poisson)
> summary(glm_gss)

Call:
glm(formula = y ~ factor(ques1) + factor(ques2) + u * v, family = poisson)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-1.50466  -0.35880   0.09221   0.49857   1.25805

Coefficients: (2 not defined because of singularities)
                Estimate Std. Error z value Pr(>|z|)
(Intercept)      2.31815    0.49277   4.704 2.55e-06 ***
factor(ques1)D   0.30873    0.17610   1.753  0.07958 .
factor(ques1)SA -1.49532    0.19086  -7.834 4.71e-15 ***
factor(ques1)SD -0.36359    0.32448  -1.121  0.26249
factor(ques2)D  -0.06429    0.24463  -0.263  0.79271
factor(ques2)N  -0.70752    0.16032  -4.413 1.02e-05 ***
factor(ques2)SA -1.24270    0.15202  -8.175 2.96e-16 ***
factor(ques2)SD -1.29799    0.39755  -3.265  0.00109 **
u                     NA         NA      NA       NA
v                     NA         NA      NA       NA
u:v              0.24320    0.04129   5.890 3.85e-09 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 938.2361  on 19  degrees of freedom
Residual deviance:   8.6677  on 11  degrees of freedom
AIC: 125.70

Number of Fisher Scoring iterations: 4
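The fit above adds a single product term in the scores to the independence model, which is usually called the linear-by-linear association model. Writing \mu_{ij} for the expected count when Question 1 is at level i and Question 2 is at level j, with scores u_i and v_j as in the data file, the model can be sketched as follows (the \lambda and \beta notation is an assumption, not taken from the slides):

```latex
\log \mu_{ij} = \lambda + \lambda^{(1)}_{i} + \lambda^{(2)}_{j} + \beta\, u_i v_j ,
\qquad
\log \frac{\mu_{ij}\,\mu_{i'j'}}{\mu_{ij'}\,\mu_{i'j}}
  = \beta\,(u_i - u_{i'})(v_j - v_{j'}) .
```

Since u is a function of ques1 and v is a function of ques2, the main effects of u and v are already spanned by the two factor terms, which is why R reports NA for them. With unit-spaced scores, the odds ratio for any pair of adjacent rows and adjacent columns is exp(β), estimated here as exp(0.24320) ≈ 1.275. The model uses one parameter more than the independence model, so its residual degrees of freedom are 12 - 1 = 11.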
Test for Dependence in the Ordinal Variables - II
> t<-u*v
> glm_gss<-glm(y~factor(ques1)+factor(ques2)+t,family=poisson)
> summary(glm_gss)

Call:
glm(formula = y ~ factor(ques1) + factor(ques2) + t, family = poisson)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-1.50466  -0.35880   0.09221   0.49857   1.25805

Coefficients:
                Estimate Std. Error z value Pr(>|z|)
(Intercept)      2.31815    0.49277   4.704 2.55e-06 ***
factor(ques1)D   0.30873    0.17610   1.753  0.07958 .
factor(ques1)SA -1.49532    0.19086  -7.834 4.71e-15 ***
factor(ques1)SD -0.36359    0.32448  -1.121  0.26249
factor(ques2)D  -0.06429    0.24463  -0.263  0.79271
factor(ques2)N  -0.70752    0.16032  -4.413 1.02e-05 ***
factor(ques2)SA -1.24270    0.15202  -8.175 2.96e-16 ***
factor(ques2)SD -1.29799    0.39755  -3.265  0.00109 **
t                0.24320    0.04129   5.890 3.85e-09 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 938.2361  on 19  degrees of freedom
Residual deviance:   8.6677  on 11  degrees of freedom
AIC: 125.70

Number of Fisher Scoring iterations: 4
Test for Dependence in the Ordinal Variables - III
> glm_gss<-glm(y~factor(ques1)+factor(ques2)+I(u*v),family=poisson)
> summary(glm_gss)

Call:
glm(formula = y ~ factor(ques1) + factor(ques2) + I(u * v), family = poisson)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-1.50466  -0.35880   0.09221   0.49857   1.25805

Coefficients:
                Estimate Std. Error z value Pr(>|z|)
(Intercept)      2.31815    0.49277   4.704 2.55e-06 ***
factor(ques1)D   0.30873    0.17610   1.753  0.07958 .
factor(ques1)SA -1.49532    0.19086  -7.834 4.71e-15 ***
factor(ques1)SD -0.36359    0.32448  -1.121  0.26249
factor(ques2)D  -0.06429    0.24463  -0.263  0.79271
factor(ques2)N  -0.70752    0.16032  -4.413 1.02e-05 ***
factor(ques2)SA -1.24270    0.15202  -8.175 2.96e-16 ***
factor(ques2)SD -1.29799    0.39755  -3.265  0.00109 **
I(u * v)         0.24320    0.04129   5.890 3.85e-09 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 938.2361  on 19  degrees of freedom
Residual deviance:   8.6677  on 11  degrees of freedom
AIC: 125.70

Number of Fisher Scoring iterations: 4
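The independence model and the model with the score product are nested, so the drop in deviance (44.961 - 8.6677, about 36.3 on 1 degree of freedom) gives a likelihood-ratio test for the association term. The sketch below types the counts in directly rather than reading generalsocialsurvey.txt and uses data = gss instead of attach; the names indep, linlin and beta are introduced here for illustration.

```r
# Counts entered by hand, in the same row order as the data file
gss <- data.frame(
  y = c(97, 102,  42,  9,
        96, 199, 102, 18,
        22,  48,  25,  7,
        17,  38,  36, 10,
         2,   5,   7,  2),
  ques1 = rep(c("SA", "A", "D", "SD"), times = 5),
  ques2 = rep(c("SA", "A", "N", "D", "SD"), each = 4),
  u     = rep(4:1, times = 5),   # scores for Question 1
  v     = rep(5:1, each = 4)     # scores for Question 2
)

# Independence model
indep <- glm(y ~ factor(ques1) + factor(ques2),
             family = poisson, data = gss)

# Add the product of the scores as a single extra term
linlin <- update(indep, . ~ . + I(u * v))

# Likelihood-ratio test for the association term (1 df)
print(anova(indep, linlin, test = "Chisq"))

# Estimated odds ratio for adjacent rows and adjacent columns
beta <- unname(coef(linlin)["I(u * v)"])
print(exp(beta))

# Goodness of fit of the model with the score product
pchisq(deviance(linlin), df.residual(linlin), lower.tail = FALSE)
```

Wrapping the product in I() keeps it as one numeric column, which avoids the aliased u and v main effects that the u*v formula produces.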