PSTAT 120C Probability and Statistics - Week 8

Fang-I Chu, Varvara Kulikova
University of California, Santa Barbara
May 22, 2012
Topics for review
Contingency table
  Usage and test form
  Hints for #1, #2, and #4 in hw6
Simpson’s Paradox
  Examples for illustration
χ² goodness of fit tests
Contingency table
Usage
  Type of analysis: count data where the question is whether two
  methods/subjects are independent.
  Used to investigate a dependency (or contingency) between two
  classification criteria.
Test form

  shift    type A   type B   type C   total
  1        a11      a12      a13      r1
  2        a21      a22      a23      r2
  total    c1       c2       c3       n
χ² goodness of fit tests
$H_0$: the classifications are independent vs. $H_a$: the classifications
are not independent
Formula for the expected value: $E_{i,j} = \frac{r_i c_j}{n}$, with $n$
the total count
After computing the expected values, we calculate the χ² statistic
using the formula $X^2 = \sum_{i,j} \frac{(O_{i,j} - E_{i,j})^2}{E_{i,j}}$
Degrees of freedom = (# of rows − 1)(# of columns − 1)
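The three steps above (expected values, statistic, degrees of freedom) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `chi_square_independence` and the small table `toy` are hypothetical and made up for the example, not taken from the homework.

```python
def chi_square_independence(observed):
    """Return (expected, statistic, df) for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    # E_ij = r_i * c_j / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # X^2 = sum over all cells of (O - E)^2 / E
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

    # df = (# of rows - 1)(# of columns - 1)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, statistic, df


# Hypothetical 2 x 3 table of counts, made up purely for illustration.
toy = [[10, 20, 30],
       [20, 20, 20]]
expected, statistic, df = chi_square_independence(toy)
print(expected)                  # [[15.0, 20.0, 25.0], [15.0, 20.0, 25.0]]
print(round(statistic, 2), df)   # 5.33 2
```

The resulting statistic would then be compared with the upper 5% point of the χ² distribution with `df` degrees of freedom.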
#1 in hw1
Every Tuesday afternoon during the school year, a certain
university brought in a speaker to lecture on a topic of current
interest. The lecture organizing committee wants to know which
classes are attending the lectures and how they should target their
publicity. After the fourth lecture of the year, a random sample of
250 students were asked how many of the lectures they had
attended. A breakdown of their responses by class is given in the table.

               number of lectures attended
               0     1     2     3     4
  freshmen     7     13    23    12    15
  sophomores   14    19    20    4     13
  juniors      15    15    17    3     10
  seniors      16    10    12    7     5
#1 continued...
(a) Calculate the expected values for the 20 cells in this table
under the assumption that attending lectures is independent of
year in school.
Hint: table of expected values

               # lectures attended
               0       1       2       3      4       total
  freshmen     14.56   15.96   20.16   7.28   12.04   70
  sophomores   14.56   15.96   20.16   7.28   12.04   70
  juniors      12.48   13.68   17.28   6.24   10.32   60
  seniors      10.4    11.4    14.4    5.2    8.6     50
  total        52      57      72      26     43      250
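As a rough check, the table of expected values can be reproduced from the observed counts of problem #1. The sketch below uses only the counts given in the problem; the variable names are illustrative.

```python
# Observed counts from problem #1 (columns: 0, 1, 2, 3, 4 lectures attended).
observed = [
    [7, 13, 23, 12, 15],   # freshmen
    [14, 19, 20, 4, 13],   # sophomores
    [15, 15, 17, 3, 10],   # juniors
    [16, 10, 12, 7, 5],    # seniors
]

row_totals = [sum(row) for row in observed]          # r_i
col_totals = [sum(col) for col in zip(*observed)]    # c_j
n = sum(row_totals)

# E_ij = r_i * c_j / n, rounded to two decimals for display.
expected = [[round(r * c / n, 2) for c in col_totals] for r in row_totals]

print(row_totals, col_totals, n)
for row in expected:
    print(row)
```

Because freshmen and sophomores have the same row total (70), their rows of expected values coincide.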
#1 continued..
(b) Is the χ² approximation appropriate for this data? If not, what
should be done?
Hint:
  The criterion for the χ² approximation to be appropriate: all
  expected values are greater than 4.
  It is sufficient to check whether the smallest expected value is
  greater than 4.
  For this case, the χ² approximation is appropriate. (Check it!)
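One way to carry out the check without building the whole table: since each expected value is (row total)(column total)/n, the smallest one sits where the smallest row total meets the smallest column total. A short sketch, using the marginal totals of problem #1 and the cutoff quoted in the hint (the constant name `THRESHOLD` is illustrative):

```python
row_totals = [70, 70, 60, 50]        # freshmen, sophomores, juniors, seniors
col_totals = [52, 57, 72, 26, 43]    # 0, 1, 2, 3, 4 lectures attended
n = 250
THRESHOLD = 4                        # cutoff quoted in the hint

# Smallest expected count = (smallest row total)(smallest column total) / n
smallest_expected = min(row_totals) * min(col_totals) / n

print(smallest_expected)               # 5.2 (seniors, 3 lectures)
print(smallest_expected > THRESHOLD)   # True
```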
#1 continued..
(c) Calculate an appropriate χ² statistic and test whether or not
lecture attendance is associated with year in school.
Hint:
  Using the formula $X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$,
  we obtain $X^2 = 18.87$.
  The degrees of freedom are (4 − 1)(5 − 1) = 12.
  Compare $X^2$ with $\chi^2_{12,\,0.05}$ to draw the conclusion.
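The hint can be followed numerically with a short loop over the cells. This is a sketch rather than a required method; the constant `CRITICAL_12_05` is the upper 5% point of the χ² distribution with 12 degrees of freedom as read from a standard table.

```python
# Observed counts from problem #1 (columns: 0, 1, 2, 3, 4 lectures attended).
observed = [
    [7, 13, 23, 12, 15],   # freshmen
    [14, 19, 20, 4, 13],   # sophomores
    [15, 15, 17, 3, 10],   # juniors
    [16, 10, 12, 7, 5],    # seniors
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

statistic = 0.0
for row, r in zip(observed, row_totals):
    for o, c in zip(row, col_totals):
        e = r * c / n                    # expected count under independence
        statistic += (o - e) ** 2 / e    # this cell's contribution to X^2

df = (len(row_totals) - 1) * (len(col_totals) - 1)
CRITICAL_12_05 = 21.026   # chi-square table value, 12 df, upper 5% point

print(round(statistic, 2), df)       # 18.87 12
print(statistic > CRITICAL_12_05)    # False
```

Since the statistic is below the table value, independence would not be rejected at the 5% level.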
#1 continued..
(d) How would you interpret your result?
Hint: from part (c),
  if the result is not significant, we conclude that there is not
  sufficient evidence that lecture attendance is associated with
  year in school.
  if the result is significant, we conclude that there is sufficient
  evidence to believe that lecture attendance is associated with
  year in school.
#2
A study of 627 patients in a Texas hospital recorded whether
or not the patients had Hepatitis C, whether or not they had a
tattoo, and whether they got that tattoo in a tattoo parlor.

                      Hepatitis C   No Hepatitis C
  Tattoo, parlor      18            35
  Tattoo, elsewhere   8             53
  No Tattoo           22            491
#2 continued...
(a) Perform the appropriate χ² test, and interpret your result.
Hint: marginal totals

                      Hepatitis C   No Hepatitis C   total
  Tattoo, parlor      18            35               53
  Tattoo, elsewhere   8             53               61
  No Tattoo           22            491              513
  total               48            579              627
#2 continued...
(a) Perform the appropriate χ² test, and interpret your result.
Hint: table of expected values

                      Hepatitis C   No Hepatitis C   total
  Tattoo, parlor      4.057         48.943           53
  Tattoo, elsewhere   4.67          56.33            61
  No Tattoo           39.27         473.73           513
  total               48            579              627

Using the formula $X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$, we
obtain $X^2 = 62.68$, which is our test statistic. Note that the
degrees of freedom are (3 − 1)(2 − 1) = 2.
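To see where the size of the statistic comes from, it can help to print each cell's contribution $(O - E)^2 / E$ separately. The sketch below does this for the 3 × 2 hepatitis table; `CRITICAL_2_05` is the standard table value for 2 degrees of freedom at the 5% level, and the variable names are illustrative.

```python
labels = ["Tattoo, parlor", "Tattoo, elsewhere", "No Tattoo"]
observed = [[18, 35], [8, 53], [22, 491]]   # (Hep C, no Hep C) per row

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Per-cell contributions (O - E)^2 / E with E = r * c / n.
contributions = [
    [(o - r * c / n) ** 2 / (r * c / n) for o, c in zip(row, col_totals)]
    for row, r in zip(observed, row_totals)
]
statistic = sum(sum(row) for row in contributions)
df = (len(row_totals) - 1) * (len(col_totals) - 1)
CRITICAL_2_05 = 5.991   # chi-square table value, 2 df, upper 5% point

for label, row in zip(labels, contributions):
    print(label, [round(x, 2) for x in row])
print(round(statistic, 2), df, statistic > CRITICAL_2_05)   # 62.68 2 True
```

Most of the statistic comes from the single cell "Tattoo, parlor / Hepatitis C", where 18 cases were observed against about 4 expected.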
#2 continued..
(b) Why would it be wrong to conclude that tattoo parlors are
responsible for Hepatitis C?
Hint:
  The alternative hypothesis doesn’t state that tattoos are
  responsible for Hep C.
  The test performed in (a) does not aim to test whether tattoos
  from parlors and tattoos from elsewhere differ in their impact
  on Hep C.
  Look at part (c): even when we narrow the null hypothesis and the
  conclusion indicates a significant relationship between parlors
  and Hep C, the causal relationship between the two remains to be
  confirmed.
  Note: a lurking variable could account for the significant
  association; for example, IV drug users (shared needles) may be
  more likely to go to tattoo parlors.
#2 continued...
(c) Perform a test that specifically compares the rate of Hep C in
people who went to tattoo parlors to the rest of the population of
people.
Hint: table of expected values

                      Hepatitis C   No Hepatitis C   total
  Tattoo, parlor      4.1           48.9             53
  No parlor tattoo    43.9          530.1            574
  total               48            579              627
#2(c) continued...
Using the formula $X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$, we
obtain $X^2 = 56.67$.
The degrees of freedom are (2 − 1)(2 − 1) = 1.
Compare $X^2$ with $\chi^2_{1,\,0.05}$ to draw the conclusion.
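For a 2 × 2 table the same statistic can also be obtained from the shortcut $X^2 = n(ad - bc)^2 / (r_1 r_2 c_1 c_2)$, which is algebraically equal to the sum over cells. The sketch below collapses the "elsewhere" and "no tattoo" rows and applies the shortcut; the letters `a, b, c, d` for the four cells are introduced here only for illustration, and `CRITICAL_1_05` is the standard table value for 1 degree of freedom.

```python
# Observed counts from the hepatitis table: (Hep C, no Hep C) per row.
parlor = (18, 35)
elsewhere = (8, 53)
no_tattoo = (22, 491)

# Collapse the last two rows into "no parlor tattoo".
a, b = parlor
c = elsewhere[0] + no_tattoo[0]
d = elsewhere[1] + no_tattoo[1]
n = a + b + c + d

# 2 x 2 shortcut: X^2 = n (ad - bc)^2 / (r1 * r2 * c1 * c2)
statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
CRITICAL_1_05 = 3.841   # chi-square table value, 1 df, upper 5% point

print((a, b), (c, d))               # (18, 35) (30, 544)
print(round(statistic, 2))          # 56.67
print(statistic > CRITICAL_1_05)    # True
```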
#4 in hw 5
A random sample of 168 people were asked whether they believe in
the existence of angels and whether they believe that aliens from
other planets have visited the earth.

               Angels   No Angels   Don’t Know   Total
  Aliens       30       7           23           60
  No Aliens    43       22          43           108
  Total        73       29          66           168

We want to test whether or not there is a relationship between
holding these two beliefs.
#4 continued...
(a) Calculate the χ² statistic for this data.
Hint: table of expected values

               Angels   No Angels   Don’t Know   Total
  Aliens       26.07    10.36       23.57        60
  No Aliens    46.93    18.64       42.43        108
  Total        73       29          66           168

Using the formula $X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$, we
obtain $X^2 = 2.64$.
The table has 2 rows and 3 columns, so the degrees of freedom are
(2 − 1)(3 − 1) = 2.
Compare $X^2$ with $\chi^2_{2,\,0.05}$ to draw the conclusion.
#4(a) continued...
Hint: an alternative way is to drop the “Don’t Know” column and
simplify the table as

               Angels   No Angels   Total
  Aliens       30       7           37
  No Aliens    43       22          65
  Total        73       29          102

Then the table of expected values is

               Angels   No Angels   Total
  Aliens       26.48    10.52       37
  No Aliens    46.52    18.48       65
  Total        73       29          102
#4(a) continued...
Using the formula $X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$, we
obtain $X^2 = 2.58$.
The degrees of freedom are (2 − 1)(2 − 1) = 1.
Compare $X^2$ with $\chi^2_{1,\,0.05}$ to draw the conclusion.
Note: the conclusions from the two methods should agree. (Why?)
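Both versions of the angels/aliens test can be run side by side. The sketch below works from the observed counts at full precision (for instance, the Aliens/Angels expected count in the 2 × 3 table is 60 × 73 / 168 ≈ 26.07), so hand calculations with rounded expected values may differ in the last digit. The helper `chi_square` is a hypothetical name, and the critical values are standard χ² table entries at the 5% level.

```python
def chi_square(observed):
    """Return (statistic, df) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = sum(
        (o - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(observed, row_totals)
        for o, c in zip(row, col_totals)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


# Rows: Aliens, No Aliens. Columns: Angels, No Angels, Don't Know.
full = [[30, 7, 23],
        [43, 22, 43]]
reduced = [row[:2] for row in full]   # drop the "Don't Know" column

CRITICAL_05 = {1: 3.841, 2: 5.991}    # chi-square table values, upper 5% point

for name, table in (("2 x 3", full), ("2 x 2", reduced)):
    statistic, df = chi_square(table)
    print(name, round(statistic, 2), df, statistic > CRITICAL_05[df])
# 2 x 3 2.64 2 False
# 2 x 2 2.58 1 False
```

Under these assumptions neither version rejects independence at the 5% level, so the two approaches lead to the same conclusion.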
Simpson’s paradox: example 1
Example 1
While the admission rates of male and female applicants are 51/100
versus 50/100, some people argue that the admission office biased
the admission process by gender. Is this a valid argument? Why?
Solution: We should look at the admission rate by department,

               men       women
  History      1/45      5/55
  Geography    50/55     45/45
  total        51/100    50/100
Simpson’s paradox: example 1

               Men            Women
  History      1/45      <    5/55
  Geography    50/55     <    45/45
  Total        51/100    >    50/100

In general:
Admission rate by department,

                 Men                                                     Women
  Department j   $\frac{a_j}{b_j}$                                  <    $\frac{A_j}{B_j}$      ($j = 1, \dots, J$)
  Total          $\frac{\sum_{i=1}^{J} a_i}{\sum_{i=1}^{J} b_i}$    >    $\frac{\sum_{i=1}^{J} A_i}{\sum_{i=1}^{J} B_i}$
Simpson’s paradox: Admissions example
  When we look at the history department and the geography
  department separately, the admission rate for men is lower than
  that for women in each; however, the total admission rate is
  higher for men than for women!
  This happens because of Simpson’s paradox.
  Slightly more women apply to the department that is much
  more selective.
  Such an argument is invalid because of Simpson’s paradox.
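The reversal can be checked with exact fractions. The sketch below assumes the table entries are admitted/applied counts of 1/45 and 50/55 for men and 5/55 and 45/45 for women, which is the reading consistent with the stated totals 51/100 and 50/100; the helper names `rate` and `pooled` are illustrative.

```python
from fractions import Fraction

# (admitted, applied) per department; assumed reading of the admissions table.
men = {"History": (1, 45), "Geography": (50, 55)}
women = {"History": (5, 55), "Geography": (45, 45)}


def rate(admitted, applied):
    """Admission rate as an exact fraction."""
    return Fraction(admitted, applied)


def pooled(group):
    """Overall admission rate: total admitted over total applied."""
    admitted = sum(a for a, _ in group.values())
    applied = sum(b for _, b in group.values())
    return Fraction(admitted, applied)


for dept in men:
    # Within each department the men's rate is lower than the women's.
    print(dept, rate(*men[dept]) < rate(*women[dept]))    # True

# Pooled over departments the ordering flips.
print("Total", pooled(men), pooled(women), pooled(men) > pooled(women))
```

The flip comes from the mix of applications: 55 of the 100 women applied to History, where very few applicants of either gender were admitted, compared with 45 of the 100 men.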
Simpson’s paradox: Kidney stone example
Testing the effect of kidney stone treatment:

                 Small Stones      Large Stones      Combined
  Treatment A    81/87 = 0.93      192/263 = 0.73    273/350 = 0.78
  Treatment B    234/270 = 0.87    55/80 = 0.69      289/350 = 0.83

Lurking variable: size of stones.
Simpson’s paradox: both small stones and large stones are treated
with more success by treatment A, yet the overall success rate of
treatment B is greater if the sizes of the stones are not considered!
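The role of the lurking variable becomes visible when the combined rate is written as a weighted average of the per-size rates, with weights equal to the share of patients in each size group. A sketch using the counts from the table (the function name `combined_rate` is illustrative):

```python
# (successes, patients) per stone size, from the kidney stone table.
data = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def combined_rate(strata):
    """Pooled success rate as a weighted average of the stratum rates."""
    total = sum(n for _, n in strata.values())
    return sum((n / total) * (s / n) for s, n in strata.values())


for name, strata in data.items():
    total = sum(n for _, n in strata.values())
    for size, (s, n) in strata.items():
        print(f"Treatment {name}, {size}: rate {s / n:.2f}, "
              f"share of patients {n / total:.2f}")
    print(f"Treatment {name}, combined: {combined_rate(strata):.2f}")
```

Treatment A was used mostly on large stones (263 of 350 patients), which are harder to treat under either treatment, while treatment B was used mostly on small stones (270 of 350), and that difference in mix is what pulls A's combined rate below B's.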
Simpson’s Paradox: vector interpretation
Let $a_1, a_2$ be the counts for categories 1 and 2 respectively, and
let $b_1, b_2$ be the totals for each category. Plotting the total on
the horizontal axis and the count on the vertical axis, each category
is represented by a vector, $(b_1, a_1)$ and $(b_2, a_2)$, with slopes
$a_1/b_1$ and $a_2/b_2$. The combined case is represented by the
vector $(b_1 + b_2, a_1 + a_2)$ with slope $(a_1 + a_2)/(b_1 + b_2)$
(see the parallelogram rule).
Simpson’s paradox for the vector case: even if each vector in one
group has a smaller slope than the corresponding vector in the other
group, the sum of the two vectors in the first group can still have a
larger slope than the sum of the two vectors in the second group.
Simpson’s Paradox: vector interpretation - Kidney stone
treatment example
For both small and large kidney stones, the vector for treatment B
has a smaller slope than the vector for treatment A, but for the
combined vectors the relationship is reversed, i.e. the combined
vector for treatment B has a larger slope than the one for A.
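As a worked illustration with the kidney stone counts, write each vector as (patients, successes), so that its slope is the success rate; this ordering is an assumption made here so that slope means successes divided by patients.

```latex
\begin{align*}
\text{Treatment A:}\quad & (87,\,81) + (263,\,192) = (350,\,273),
  & \tfrac{81}{87} \approx 0.93,\quad \tfrac{192}{263} \approx 0.73,\quad \tfrac{273}{350} = 0.78 \\
\text{Treatment B:}\quad & (270,\,234) + (80,\,55) = (350,\,289),
  & \tfrac{234}{270} \approx 0.87,\quad \tfrac{55}{80} \approx 0.69,\quad \tfrac{289}{350} \approx 0.83
\end{align*}
```

Each B vector is flatter than the matching A vector, but the sum for A is dominated by its long, flat large-stone vector $(263, 192)$, while the sum for B is dominated by its long, steep small-stone vector $(270, 234)$, so the summed B vector ends up steeper.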