Goodness of Fit Tests - University of New Haven

Goodness of Fit Tests
Marc H. Mehlman
University of New Haven
Marc Mehlman (University of New Haven)
Goodness of Fit Tests
1 / 26
Table of Contents
1. Goodness of Fit Chi–Squared Test
2. Tests of Independence
3. Chapter #9 R Assignment
Goodness of Fit Chi–Squared Test
Idea of the chi-square test
The chi-square (χ²) test is used when the data are categorical. It measures how different the observed data are from what we would expect if H0 were true.
[Figure: two bar charts of the percent of births on each day of the week (Mon. through Sun.). "Sample composition" shows the observed sample proportions from 1 SRS of 700 births; "Expected composition" shows the expected proportions under H0: p1 = p2 = ... = p7 = 1/7.]
The chi-square distributions
The χ² distributions are a family of distributions that take only non-negative values and are skewed to the right; each member of the family is specified by its degrees of freedom.
Published tables and software give the upper-tail area for critical values of many χ² distributions.
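In R, which these notes use for software output, upper-tail areas and critical values come from the base functions `pchisq` and `qchisq` with `lower.tail = FALSE`. A minimal sketch follows; `upper_tail` and `critical_value` are hypothetical wrapper names introduced only for readability.

```r
# Hypothetical wrappers around base R's pchisq() and qchisq()

# Upper-tail area P(C >= x) for C ~ chi-square(df)
upper_tail <- function(x, df) pchisq(x, df = df, lower.tail = FALSE)

# Critical value x* whose upper-tail area equals p
critical_value <- function(p, df) qchisq(p, df = df, lower.tail = FALSE)

critical_value(0.05, df = 1)  # about 3.84
upper_tail(3.84, df = 1)      # about 0.05

# A chi-square variable never takes negative values,
# so the whole distribution lies above 0
upper_tail(0, df = 4)         # 1
```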
Table D: χ² critical values
Each entry is the value χ²* with upper-tail area p, that is, P(χ² ≥ χ²*) = p for the χ² distribution with the given degrees of freedom df.
Example: for df = 6 and χ² = 15.9, the P-value is between 0.01 and 0.02, because 15.03 < 15.9 < 16.81 in the df = 6 row.

 df    0.25    0.20    0.15    0.10    0.05   0.025    0.02    0.01   0.005  0.0025   0.001  0.0005
  1    1.32    1.64    2.07    2.71    3.84    5.02    5.41    6.63    7.88    9.14   10.83   12.12
  2    2.77    3.22    3.79    4.61    5.99    7.38    7.82    9.21   10.60   11.98   13.82   15.20
  3    4.11    4.64    5.32    6.25    7.81    9.35    9.84   11.34   12.84   14.32   16.27   17.73
  4    5.39    5.99    6.74    7.78    9.49   11.14   11.67   13.28   14.86   16.42   18.47   20.00
  5    6.63    7.29    8.12    9.24   11.07   12.83   13.39   15.09   16.75   18.39   20.51   22.11
  6    7.84    8.56    9.45   10.64   12.59   14.45   15.03   16.81   18.55   20.25   22.46   24.10
  7    9.04    9.80   10.75   12.02   14.07   16.01   16.62   18.48   20.28   22.04   24.32   26.02
  8   10.22   11.03   12.03   13.36   15.51   17.53   18.17   20.09   21.95   23.77   26.12   27.87
  9   11.39   12.24   13.29   14.68   16.92   19.02   19.68   21.67   23.59   25.46   27.88   29.67
 10   12.55   13.44   14.53   15.99   18.31   20.48   21.16   23.21   25.19   27.11   29.59   31.42
 11   13.70   14.63   15.77   17.28   19.68   21.92   22.62   24.72   26.76   28.73   31.26   33.14
 12   14.85   15.81   16.99   18.55   21.03   23.34   24.05   26.22   28.30   30.32   32.91   34.82
 13   15.98   16.98   18.20   19.81   22.36   24.74   25.47   27.69   29.82   31.88   34.53   36.48
 14   17.12   18.15   19.41   21.06   23.68   26.12   26.87   29.14   31.32   33.43   36.12   38.11
 15   18.25   19.31   20.60   22.31   25.00   27.49   28.26   30.58   32.80   34.95   37.70   39.72
 16   19.37   20.47   21.79   23.54   26.30   28.85   29.63   32.00   34.27   36.46   39.25   41.31
 17   20.49   21.61   22.98   24.77   27.59   30.19   31.00   33.41   35.72   37.95   40.79   42.88
 18   21.60   22.76   24.16   25.99   28.87   31.53   32.35   34.81   37.16   39.42   42.31   44.43
 19   22.72   23.90   25.33   27.20   30.14   32.85   33.69   36.19   38.58   40.88   43.82   45.97
 20   23.83   25.04   26.50   28.41   31.41   34.17   35.02   37.57   40.00   42.34   45.31   47.50
 21   24.93   26.17   27.66   29.62   32.67   35.48   36.34   38.93   41.40   43.78   46.80   49.01
 22   26.04   27.30   28.82   30.81   33.92   36.78   37.66   40.29   42.80   45.20   48.27   50.51
 23   27.14   28.43   29.98   32.01   35.17   38.08   38.97   41.64   44.18   46.62   49.73   52.00
 24   28.24   29.55   31.13   33.20   36.42   39.36   40.27   42.98   45.56   48.03   51.18   53.48
 25   29.34   30.68   32.28   34.38   37.65   40.65   41.57   44.31   46.93   49.44   52.62   54.95
 26   30.43   31.79   33.43   35.56   38.89   41.92   42.86   45.64   48.29   50.83   54.05   56.41
 27   31.53   32.91   34.57   36.74   40.11   43.19   44.14   46.96   49.64   52.22   55.48   57.86
 28   32.62   34.03   35.71   37.92   41.34   44.46   45.42   48.28   50.99   53.59   56.89   59.30
 29   33.71   35.14   36.85   39.09   42.56   45.72   46.69   49.59   52.34   54.97   58.30   60.73
 30   34.80   36.25   37.99   40.26   43.77   46.98   47.96   50.89   53.67   56.33   59.70   62.16
 40   45.62   47.27   49.24   51.81   55.76   59.34   60.44   63.69   66.77   69.70   73.40   76.09
 50   56.33   58.16   60.35   63.17   67.50   71.42   72.61   76.15   79.49   82.66   86.66   89.56
 60   66.98   68.97   71.34   74.40   79.08   83.30   84.58   88.38   91.95   95.34   99.61  102.70
 80   88.13   90.41   93.11   96.58  101.90  106.60  108.10  112.30  116.30  120.10  124.80  128.30
100  109.10  111.70  114.70  118.50  124.30  129.60  131.10  135.80  140.20  144.30  149.40  153.20
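Every entry of Table D can be regenerated with R's `qchisq`. This is a sketch; the object names `p`, `dfs` and `table_d` are chosen here for illustration. For the larger df rows the computed values may differ from the printed table in the last digit, since those rows are rounded more coarsely.

```r
# Rebuild Table D: each entry is the value with upper-tail area p
p   <- c(0.25, 0.20, 0.15, 0.10, 0.05, 0.025, 0.02, 0.01,
         0.005, 0.0025, 0.001, 0.0005)
dfs <- c(1:30, 40, 50, 60, 80, 100)

table_d <- outer(dfs, p, function(d, pp) qchisq(pp, df = d, lower.tail = FALSE))
# Label rows by df and columns by p (format() avoids "5e-04"-style labels)
dimnames(table_d) <- list(df = dfs, p = sapply(p, format, scientific = FALSE))

round(table_d["6", ], 2)   # the df = 6 row of the table

# Worked example: df = 6 and chi-square = 15.9
pchisq(15.9, df = 6, lower.tail = FALSE)  # between 0.01 and 0.02
```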
Data for n observations of a categorical variable with k possible outcomes are summarized as observed counts $o_1, o_2, \dots, o_k$ in k cells. Let H0 specify the cell probabilities $p_1, p_2, \dots, p_k$ for the k possible outcomes.
Definition
$o_j \stackrel{\text{def}}{=}$ number observed in cell $j$
$e_j \stackrel{\text{def}}{=} n p_j =$ number expected in cell $j$ under H0
Example
Three species of large fish (A, B, C) that are native to a certain river have been
observed to exist in equal proportions.
A recent survey of 300 large fish found 89 of species A, 120 of species B and 91 of
species C. What are the observed and expected counts?
Solution: $o_1 = 89$, $o_2 = 120$ and $o_3 = 91$. Since each $p_j = 1/3$,
$e_1 = e_2 = e_3 = n p_j = 300 \cdot \frac{1}{3} = 100$.
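The same arithmetic in R, as a small sketch (`obs`, `p0` and `expected` are illustrative names):

```r
# Fish survey: observed counts o_j and H0 cell probabilities p_j
obs <- c(A = 89, B = 120, C = 91)
p0  <- c(A = 1/3, B = 1/3, C = 1/3)

n <- sum(obs)          # 300 fish in total
expected <- n * p0     # e_j = n * p_j
expected               # 100 for each species
```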
Theorem (Chi–Squared Goodness of Fit Test)
The chi-square statistic, which measures how much the observed cell counts differ from the expected cell counts, is
$$x \stackrel{\text{def}}{=} \sum_{j=1}^{k} \frac{(o_j - e_j)^2}{e_j}.$$
Let H0: the cell probabilities are $p_1, \dots, p_k$.
If H0 is true, all expected counts are at least 1, and no more than 20% of the expected counts are less than 5, then the chi-square statistic is approximately $\chi^2(k-1)$. In that case, the p-value of the test of H0 versus HA: not H0 is approximately $P(C \ge x)$, where $C \sim \chi^2(k-1)$ and $x$ is the observed value of the statistic.
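The theorem translates almost line for line into R. The function name `gof_chisq` is a hypothetical helper written for this sketch (R's built-in `chisq.test` does the same job); it also checks the two expected-count conditions before the χ² approximation is trusted.

```r
# Hypothetical helper implementing the goodness of fit chi-square test
gof_chisq <- function(obs, p0) {
  stopifnot(length(obs) == length(p0), abs(sum(p0) - 1) < 1e-8)
  n <- sum(obs)
  e <- n * p0                      # expected counts e_j = n p_j
  x <- sum((obs - e)^2 / e)        # chi-square statistic
  k <- length(obs)
  list(
    statistic = x,
    df = k - 1,
    p_value = pchisq(x, df = k - 1, lower.tail = FALSE),  # P(C >= x)
    # all e_j >= 1 and at most 20% of the e_j below 5
    conditions_ok = all(e >= 1) && mean(e < 5) <= 0.2
  )
}

res <- gof_chisq(c(89, 120, 91), rep(1/3, 3))  # fish survey
res$statistic      # 6.02
res$p_value        # about 0.049
res$conditions_ok  # TRUE
```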
Example
River ecology
Three species of large fish (A, B, C) that are native to a certain river have been
observed to co-exist in equal proportions.
A recent random sample of 300 large fish found 89 of species A, 120 of species
B, and 91 of species C. Do the data provide evidence that the river’s ecosystem
has been upset?
H0: $p_A = p_B = p_C = 1/3$
Ha: H0 is not true
Number of proportions compared: $k = 3$
All the expected counts are $n/k = 300/3 = 100$.
Degrees of freedom: $k - 1 = 3 - 1 = 2$
χ² calculation:
$$\chi^2 = \frac{(89-100)^2}{100} + \frac{(120-100)^2}{100} + \frac{(91-100)^2}{100} = 1.21 + 4.0 + 0.81 = 6.02$$
Example (cont.)
If H0 were true, how likely would it be to find by chance a discrepancy between observed and expected frequencies yielding a χ² value of 6.02 or greater?
From Table D with df = 2, we find 5.99 < χ² < 7.38, so 0.025 < P < 0.05.
Software gives P-value = 0.049.
Using a typical significance level of 5%, we conclude that the results are significant. We have found evidence that the 3 fish populations are not currently equally represented in this ecosystem (P < 0.05).
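The software P-value can be reproduced with R's built-in `chisq.test`, called the same way as in the squash example in these notes. A short sketch; `fish_test` is just an illustrative variable name.

```r
obs <- c(A = 89, B = 120, C = 91)
# chisq.test defaults to equal cell probabilities; stated here explicitly
fish_test <- chisq.test(obs, p = c(1/3, 1/3, 1/3))
fish_test$statistic   # X-squared = 6.02
fish_test$parameter   # df = 2
fish_test$p.value     # about 0.049, inside the Table D range 0.025 < P < 0.05
```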
Example (cont.)
Interpreting the χ² output
The individual terms summed in the χ² statistic are the χ² components. When the test is statistically significant, the largest components indicate which cells differ most from what H0 predicts. You can also compare the observed proportions qualitatively in a graph.
$$\chi^2 = \frac{(89-100)^2}{100} + \frac{(120-100)^2}{100} + \frac{(91-100)^2}{100} = 1.21 + 4.0 + 0.81 = 6.02$$
[Figure: bar chart of the percent of total (0% to 40%) for species A (gumpies), B (sticklebarbs) and C (spotheads).]
The largest χ² component, 4.0, is for species B. The increase in species B contributes the most to significance.
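In R, the χ² components are the squared Pearson residuals that `chisq.test` returns in its `residuals` element, $(o_j - e_j)/\sqrt{e_j}$. A minimal sketch for the fish data (variable names are illustrative):

```r
obs <- c(A = 89, B = 120, C = 91)
fish_test <- chisq.test(obs)          # equal cell probabilities by default

# Pearson residuals are (o_j - e_j) / sqrt(e_j); squaring gives the components
components <- fish_test$residuals^2
round(components, 2)                  # A 1.21, B 4.00, C 0.81
names(which.max(components))          # "B": the largest contribution
```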
Example
Goodness of fit for a genetic model
Under a genetic model of dominant epistasis, a cross of white and
yellow summer squash will yield white, yellow, and green squash with
probabilities 12/16, 3/16 and 1/16 respectively (expected ratios 12:3:1).
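Suppose we observe 155 white, 40 yellow and 10 green squash (n = 205). Are these data consistent with the genetic model?
H0: $p_{\text{white}} = 12/16$; $p_{\text{yellow}} = 3/16$; $p_{\text{green}} = 1/16$
Ha: H0 is not true
We use H0 to compute the expected counts for each squash type: $205 \cdot 12/16 = 153.75$ white, $205 \cdot 3/16 = 38.4375$ yellow and $205 \cdot 1/16 = 12.8125$ green.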
Example (cont.)
We then compute the chi-square statistic:
$$\chi^2 = \frac{(155-153.75)^2}{153.75} + \frac{(40-38.4375)^2}{38.4375} + \frac{(10-12.8125)^2}{12.8125} = 0.69106$$
The χ² components are 0.01016 (white), 0.06352 (yellow) and 0.61738 (green), which sum to 0.69106.
Degrees of freedom = k − 1 = 2, and χ² = 0.691.
Using Table D (df = 2), 0.691 < 2.77, so P > 0.25. Software gives P = 0.708.
This is not significant and we fail to reject H0. The observed data are consistent with a dominant epistatic genetic model (12:3:1). The small observed deviations from the model could have arisen from random sampling alone.
Example (cont.)
> obs=c(155,40,10)
> tprob=c(12/16, 3/16, 1/16)
> chisq.test(obs,p=tprob)
Chi-squared test for given probabilities
data: obs
X-squared = 0.6911, df = 2, p-value = 0.7078
> exp=chisq.test(obs,p=tprob)$expected
> exp
[1] 153.7500 38.4375 12.8125
> (obs-exp)^2/exp
[1] 0.01016260 0.06351626 0.61737805
Tests of Independence
Example
Two-way tables
An experiment has a two-way, or block, design if two categorical
factors are studied with several levels of each factor.
Two-way tables organize data about two categorical variables with any
number of levels/treatments obtained from a two-way, or block, design.
High school students were asked whether they smoke,
and whether their parents smoke:
First factor (rows): parent smoking status.
Second factor (columns): student smoking status.
Example (cont.)
                       student smokes   student doesn't smoke   Total
both parents smoke                400                   1,380   1,780
one parent smokes                 416                   1,823   2,239
neither parent smokes             188                   1,168   1,356
Total                           1,004                   4,371   5,375
Treating the sample as if it were the population, i.e., using empirical (sample) proportions in place of the true probabilities:
$$P(\text{student \& one parent smokes}) = P(\text{being in row \#2 \& column \#1}) = \frac{\text{(2,1) entry}}{\text{grand total}} = \frac{416}{5{,}375} = 0.077$$
$$P(\text{student smokes}) = P(\text{being in column \#1}) = \frac{\text{column \#1 total}}{\text{grand total}} = \frac{1{,}004}{5{,}375} = 0.187$$
$$P(\text{one parent smokes}) = P(\text{being in row \#2}) = \frac{\text{row \#2 total}}{\text{grand total}} = \frac{2{,}239}{5{,}375} = 0.417.$$
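These empirical probabilities can be read off R's `prop.table`, which divides every cell by the grand total. A sketch; the matrix layout follows the table above and the dimension names are illustrative.

```r
# Smoking data: rows = parental smoking, columns = student smoking
obs <- matrix(c(400, 416, 188,       # student smokes
                1380, 1823, 1168),   # student doesn't smoke
              nrow = 3,
              dimnames = list(parents = c("both", "one", "neither"),
                              student = c("smokes", "does not smoke")))

joint <- prop.table(obs)               # every cell divided by the grand total
round(joint["one", "smokes"], 3)       # P(student & one parent smokes): 0.077
round(colSums(joint)[["smokes"]], 3)   # P(student smokes): 0.187
round(rowSums(joint)[["one"]], 3)      # P(one parent smokes): 0.417
```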
Example (cont.)
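Again using empirical proportions in place of the true probabilities:
$$P(\text{student smokes} \mid \text{both parents smoke}) = \frac{\text{(1,1) entry}}{\text{row \#1 total}} = \frac{400}{1{,}780} = 0.225$$
$$P(\text{student smokes} \mid \text{one parent smokes}) = \frac{\text{(2,1) entry}}{\text{row \#2 total}} = \frac{416}{2{,}239} = 0.186$$
$$P(\text{student smokes} \mid \text{neither parent smokes}) = \frac{\text{(3,1) entry}}{\text{row \#3 total}} = \frac{188}{1{,}356} = 0.139.$$
If student smoking were independent of parental smoking, all three conditional probabilities would equal P(student smokes) = 0.187; the chi-square test of independence asks whether differences as large as these could plausibly be due to chance.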
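The row-conditional proportions come from `prop.table` with `margin = 1`, which divides each row by its row total. A sketch using the same illustrative matrix layout as the table above:

```r
obs <- matrix(c(400, 416, 188, 1380, 1823, 1168), nrow = 3,
              dimnames = list(parents = c("both", "one", "neither"),
                              student = c("smokes", "does not smoke")))

cond <- prop.table(obs, margin = 1)    # divide each row by its row total
round(cond[, "smokes"], 3)             # both 0.225, one 0.186, neither 0.139

# Under independence every row would match the overall proportion
round(sum(obs[, "smokes"]) / sum(obs), 3)   # 0.187
```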
Observe: assuming
H0: row variable and column variable are independent,
$$\begin{aligned}
e_{ij} &= (\text{grand total}) \cdot P(\text{being in cell } ij) \\
&= (\text{grand total}) \cdot P(\text{being in row } \#i) \cdot P(\text{being in column } \#j) \\
&= (\text{grand total}) \cdot \frac{\text{row } \#i \text{ total}}{\text{grand total}} \cdot \frac{\text{column } \#j \text{ total}}{\text{grand total}} \\
&= \frac{(\text{row } \#i \text{ total}) \cdot (\text{column } \#j \text{ total})}{\text{grand total}}.
\end{aligned}$$
Example (cont.)
The expected counts of the six cells are:
$$e_{11} = \frac{1{,}780 \cdot 1{,}004}{5{,}375} = 332.49 \qquad e_{12} = \frac{1{,}780 \cdot 4{,}371}{5{,}375} = 1{,}447.51$$
$$e_{21} = \frac{2{,}239 \cdot 1{,}004}{5{,}375} = 418.22 \qquad e_{22} = \frac{2{,}239 \cdot 4{,}371}{5{,}375} = 1{,}820.78$$
$$e_{31} = \frac{1{,}356 \cdot 1{,}004}{5{,}375} = 253.29 \qquad e_{32} = \frac{1{,}356 \cdot 4{,}371}{5{,}375} = 1{,}102.71$$
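The formula $e_{ij} = (\text{row } i \text{ total})(\text{column } j \text{ total})/(\text{grand total})$ is a single `outer` product in R. A sketch that also checks the result against the expected counts `chisq.test` reports (the matrix layout and names are illustrative):

```r
obs <- matrix(c(400, 416, 188, 1380, 1823, 1168), nrow = 3,
              dimnames = list(parents = c("both", "one", "neither"),
                              student = c("smokes", "does not smoke")))

# e_ij = (row i total) * (column j total) / (grand total)
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)
round(expected, 2)

# Agrees with the expected counts that chisq.test() computes internally
all.equal(unname(expected), unname(chisq.test(obs)$expected))   # TRUE
```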
Theorem (Chi–Squared Test for Two–Way Tables)
The chi-square statistic from a two-way r × c table,
$$x \stackrel{\text{def}}{=} \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(o_{ij} - e_{ij})^2}{e_{ij}},$$
measures how much the observed cell counts differ from the expected cell counts computed under
H0: row variable and column variable are independent.
If H0 is true, all expected counts are at least 1, and no more than 20% of the expected counts are less than 5, then the chi-square statistic is approximately $\chi^2((r-1)(c-1))$. In that case, the p-value of the test of H0 versus HA: not H0 is approximately $P(C \ge x)$, where $C \sim \chi^2((r-1)(c-1))$ and $x$ is the observed value of the statistic.
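A from-scratch sketch of this theorem in R; `independence_chisq` is a hypothetical helper name, and the result is cross-checked against `chisq.test` on the smoking data.

```r
# Hypothetical helper: chi-square test of independence for an r x c table
independence_chisq <- function(obs) {
  e <- outer(rowSums(obs), colSums(obs)) / sum(obs)   # expected counts e_ij
  x <- sum((obs - e)^2 / e)                           # double sum over all cells
  dof <- (nrow(obs) - 1) * (ncol(obs) - 1)
  list(
    statistic = x,
    df = dof,
    p_value = pchisq(x, df = dof, lower.tail = FALSE),  # P(C >= x)
    conditions_ok = all(e >= 1) && mean(e < 5) <= 0.2
  )
}

# Parental smoking (rows) by student smoking (columns)
obs <- rbind(c(400, 1380), c(416, 1823), c(188, 1168))
res <- independence_chisq(obs)
res$statistic   # about 37.57
res$df          # 2
res$p_value     # about 7e-09
```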
Example (cont.)
Influence of parental smoking
Here is computer output for a chi-square test performed on the data from a random sample of high school students (rows are parental smoking habits, columns are the students' smoking habits). What does it tell you?
Sample size?
Hypotheses?
Are the data suitable for a χ² test (are the expected-count conditions met)?
Interpretation?
Example (cont.)
> row1=c(400,1380)
> row2=c(416,1823)
> row3=c(188,1168)
> obs = rbind(row1,row2,row3)
> chisq.test(obs)
Pearson's Chi-squared test
data: obs
X-squared = 37.5663, df = 2, p-value = 6.959e-09
> exp=chisq.test(obs)$expected
> exp
         [,1]     [,2]
row1 332.4874 1447.513
row2 418.2244 1820.776
row3 253.2882 1102.712
> (obs-exp)^2/exp
            [,1]       [,2]
row1 13.70862455 3.14881241
row2  0.01183057 0.00271743
row3 16.82884348 3.86551335
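Consider a 2 × 2 two-way table:
              male   female
bad driver     789      823
good driver    563      575
One can test whether being a bad or good driver has nothing to do with gender by either
1. a z test for comparing two proportions, or
2. the chi-square test for independence.
The two approaches are equivalent and yield the same result: the chi-square statistic equals the square of the z statistic, so the two-sided z test and the chi-square test (both without continuity correction) give the same p-value.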
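A minimal R sketch of this equivalence, assuming the table layout above (variable names are illustrative). R's `chisq.test` applies Yates' continuity correction to 2 × 2 tables by default, so `correct = FALSE` is used here to compare against the uncorrected z statistic.

```r
# Driver table: rows are the rating, columns are gender
tab <- matrix(c(789, 563, 823, 575), nrow = 2,
              dimnames = list(rating = c("bad driver", "good driver"),
                              gender = c("male", "female")))

# (1) Two-proportion z test: share of bad drivers among males vs females
x <- tab["bad driver", ]            # bad drivers by gender
n <- colSums(tab)                   # all drivers by gender
p_hat  <- x / n
p_pool <- sum(x) / sum(n)           # pooled proportion under H0
z <- (p_hat[["male"]] - p_hat[["female"]]) /
  sqrt(p_pool * (1 - p_pool) * (1 / n[["male"]] + 1 / n[["female"]]))
p_z <- 2 * pnorm(-abs(z))           # two-sided p-value

# (2) Chi-square test of independence without Yates' continuity correction
chi <- chisq.test(tab, correct = FALSE)

c(z_squared = z^2, x_squared = unname(chi$statistic))   # the same number
c(p_z = p_z, p_chi = chi$p.value)                       # the same p-value
```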
Chapter #9 R Assignment
1. A car expert claims that 30% of all cars in Johnstown are American made, 35% are Japanese made, 20% are Korean made and 15% are European made. Of 156 cars randomly observed in Johnstown, 67 were American, 42 were Japanese, 24 were Korean and 23 were European. Find the p-value of a goodness of fit test comparing what was expected with what was observed.
2. Senie et al. (1981) investigated the relationship between age and frequency of breast self-examination in a sample of women (Senie, R. T., Rosen, P. P., Lesser, M. L., and Kinne, D. W. Breast self-examinations and medical examination relating to breast cancer stage. American Journal of Public Health, 71, 583–590). A summary of the results is presented in the following table:

             Frequency of breast self-examination
Age           Monthly   Occasionally   Never
under 45           91             90      51
45-59             150            200     155
60 and over       109            198     172

From Hand et al., page 307, table 368. Do an independence test to see if age and frequency of breast self-examination are independent.