PSTAT 120C Probability and Statistics - Week 8
Fang-I Chu, Varvara Kulikova
University of California, Santa Barbara
May 22, 2012

Topics for review

- Contingency table
  - Usage and test form
  - Hints for #1, #2, #4 in hw6
- Simpson's Paradox
  - Examples for illustration

$\chi^2$ goodness of fit tests: Contingency table

Usage
- Type of analysis: count data where the concern is the independence of two methods/subjects.
- Used to investigate a dependency (or contingency) between two classification criteria.

Test form

              type
shift      A      B      C      total
1          a11    a12    a13    r1
2          a21    a22    a23    r2
total      c1     c2     c3     n

H0: classifications are independent vs.
Ha: classifications are not independent

- Formula for the expected value: $E_{i,j} = \frac{r_i c_j}{n}$, with $n$ the total count.
- After computing the expected values, we calculate the $\chi^2$ statistic using the formula $X^2 = \sum_{i,j} \frac{(O_{i,j} - E_{i,j})^2}{E_{i,j}}$.
- Degrees of freedom = (# of rows - 1)(# of columns - 1).

#1 in hw1

#1 Every Tuesday afternoon during the school year, a certain university brought in a speaker to lecture on a topic of current interest. The lecture organizing committee wants to know which classes are attending the lectures and how they should target their publicity. After the fourth lecture of the year, a random sample of 250 students were asked how many of the lectures they had attended. A breakdown of their responses by class is given in the table.

              number of lectures attended
              0     1     2     3     4
freshmen      7     13    23    12    15
sophomores    14    19    20    4     13
juniors       15    15    17    3     10
seniors       16    10    12    7     5

#1 continued...

#1 (a) Calculate the expected values for the 20 cells in this table under the assumption that attending lectures is independent of year in school.
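The expected-value formula $E_{i,j} = r_i c_j / n$ can be applied to the table in #1 with a short Python sketch. This is only an illustration: the function name `expected_counts` and the variable names are assumptions made here, not part of the homework.

```python
def expected_counts(observed):
    """Return E[i][j] = (row total i) * (column total j) / n for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Observed counts from problem #1
# rows: freshmen, sophomores, juniors, seniors; columns: 0..4 lectures attended
observed = [
    [7, 13, 23, 12, 15],
    [14, 19, 20, 4, 13],
    [15, 15, 17, 3, 10],
    [16, 10, 12, 7, 5],
]

expected = expected_counts(observed)
for row in expected:
    print([round(e, 2) for e in row])
```

Each row of the printed table should add up to the corresponding row total (70, 70, 60, 50), which is a quick sanity check on a hand calculation.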
Hint: table of expected values

              # lectures attended
              0        1        2        3       4        total
freshmen      14.56    15.96    20.16    7.28    12.04    70
sophomores    14.56    15.96    20.16    7.28    12.04    70
juniors       12.48    13.68    17.28    6.24    10.32    60
seniors       10.40    11.40    14.40    5.20    8.60     50
total         52       57       72       26      43       250

#1 continued..

#1 (b) Is the $\chi^2$ approximation appropriate for this data? If not, what should be done?
Hint:
- The criterion for the $\chi^2$ approximation to be appropriate: all expected values are greater than 4.
- It is sufficient to check whether the smallest expected value is greater than 4.
- In this case, the $\chi^2$ approximation is appropriate. (Check it!)

#1 continued..

#1 (c) Calculate an appropriate $\chi^2$ statistic and test whether or not lecture attendance is associated with year in school.
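The check in (b) and the statistic asked for in (c) can be sketched together in Python. This is a minimal sketch under the slides' own rule (all expected values greater than 4); the function name `independence_summary` is an illustrative choice.

```python
def independence_summary(observed):
    """Return (X2, df, smallest expected count) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    x2 = 0.0
    smallest = float("inf")
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # E_ij = r_i * c_j / n
            smallest = min(smallest, e)
            x2 += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return x2, df, smallest


# Observed counts from problem #1 (freshmen, sophomores, juniors, seniors)
observed = [
    [7, 13, 23, 12, 15],
    [14, 19, 20, 4, 13],
    [15, 15, 17, 3, 10],
    [16, 10, 12, 7, 5],
]

x2, df, smallest = independence_summary(observed)
print(f"smallest expected count = {smallest:.2f}")
print(f"X^2 = {x2:.2f}, df = {df}")
```

The smallest expected count belongs to the seniors who attended 3 lectures, so that single cell decides whether the approximation rule is met.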
Hint:
- Using the formula $X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$, we obtain $X^2 = 18.87$.
- The degrees of freedom are $(4 - 1)(5 - 1) = 12$.
- Compare $X^2$ with $\chi^2_{12,0.05}$ to draw the conclusion.

#1 continued..

#1 (d) How would you interpret your result?
Hint: from part (c),
- if the result is not significant, we conclude that there is not sufficient evidence that lecture attendance is associated with year in school;
- if the result is significant, we conclude that there is sufficient evidence to believe that lecture attendance is associated with year in school.

#2

#2 A study of 627 patients in a Texas hospital recorded whether or not the patients had Hepatitis C, whether or not they had a tattoo, and whether they got that tattoo in a tattoo parlor.
                     Hepatitis C    No Hepatitis C
Tattoo, parlor       18             35
Tattoo, elsewhere    8              53
No Tattoo            22             491

#2 continued...

#2 (a) Perform the appropriate $\chi^2$ test, and interpret your result.
Hint: marginal totals

                     Hepatitis C    No Hepatitis C    total
Tattoo, parlor       18             35                53
Tattoo, elsewhere    8              53                61
No Tattoo            22             491               513
total                48             579               627

Hint: table of expected values

                     Hepatitis C    No Hepatitis C    total
Tattoo, parlor       4.06           48.94             53
Tattoo, elsewhere    4.67           56.33             61
No Tattoo            39.27          473.73            513
total                48             579               627

- Using the formula $X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$, we obtain $X^2 = 62.68$, which is our test statistic.
- Note that our degrees of freedom are $(3 - 1)(2 - 1) = 2$.

#2 continued..

#2 (b) Why would it be wrong to conclude that tattoo parlors are responsible for Hepatitis C?
Hint:
- The alternative hypothesis doesn't state that tattoos are responsible for Hep C.
- The test performed in (a) does not aim to test whether tattoos from a parlor and tattoos from elsewhere differ in their effect on Hep C.
- Look at part (c): even when we narrow our null hypothesis and the conclusion indicates a significant relationship between parlors and Hep C, the causal relationship between the two still remains to be confirmed.
- Note: a lurking variable may exist that makes a causal conclusion invalid, such as IV drug users (shared needles) being more likely to go to tattoo parlors.

#2 continued...

#2 (c) Perform a test that specifically compares the rate of Hep C in people who went to tattoo parlors to the rest of the population of people.
Hint: table of expected values

                     Hepatitis C    No Hepatitis C    total
Tattoo, parlor       4.1            48.9              53
No parlor tattoo     43.9           530.1             574
total                48             579               627
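The two tests in #2 differ only in how the rows are grouped, which a short Python sketch can make concrete. The helper `chi_square_test` is an illustrative name, and the critical values in `CRITICAL_05` are the usual 5% values from a standard $\chi^2$ table, included here as an assumption about which significance level is wanted.

```python
def chi_square_test(observed):
    """Return (X2, df) for a chi-square test of independence on a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    x2 = sum(
        (o - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(observed, row_totals)
        for o, c in zip(row, col_totals)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return x2, df


# Rows: tattoo (parlor), tattoo (elsewhere), no tattoo
# Columns: Hepatitis C, no Hepatitis C
full_table = [[18, 35], [8, 53], [22, 491]]

# Part (c): keep the parlor row, merge the other two rows into "no parlor tattoo"
parlor_table = [full_table[0], [a + b for a, b in zip(full_table[1], full_table[2])]]

# 5% critical values from a standard chi-square table, keyed by degrees of freedom
CRITICAL_05 = {1: 3.841, 2: 5.991}

x2_full, df_full = chi_square_test(full_table)
x2_parlor, df_parlor = chi_square_test(parlor_table)

for name, x2, df in [("(a)", x2_full, df_full), ("(c)", x2_parlor, df_parlor)]:
    print(f"{name}: X^2 = {x2:.2f}, df = {df}, reject H0 at 5%: {x2 > CRITICAL_05[df]}")
```

A rejection here only indicates association between the row and column classifications; as the hints for (b) point out, it says nothing about causation.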
- Using the formula $X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$, we obtain $X^2 = 56.67$.
- The degrees of freedom are $(2 - 1)(2 - 1) = 1$.
- Compare $X^2$ with $\chi^2_{1,0.05}$ to draw the conclusion.

#4 in hw 5

#4 A random sample of 168 people were asked whether they believe in the existence of angels and whether they believe that aliens from other planets have visited the earth.

              Angels    No Angels    Don't Know    Total
Aliens        30        7            23            60
No Aliens     43        22           43            108
Total         73        29           66            168

We want to test whether or not there is a relationship between holding these two beliefs.

#4 continued...

#4 (a) Calculate the $\chi^2$ statistic for this data.
Hint: table of expected values

              Angels    No Angels    Don't Know    Total
Aliens        26.07     10.36        23.57         60
No Aliens     46.93     18.64        42.43         108
Total         73        29           66            168

- Using the formula $X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$, we obtain $X^2 = 2.64$.
- The degrees of freedom are $(2 - 1)(3 - 1) = 2$.
- Compare $X^2$ with $\chi^2_{2,0.05}$ to draw the conclusion.

#4(a) continued...

Hint: an alternative way is to simplify the table (dropping the "Don't Know" column) as

              Angels    No Angels    Total
Aliens        30        7            37
No Aliens     43        22           65
Total         73        29           102

Then the table of expected values is

              Angels    No Angels    Total
Aliens        26.5      10.52        37
No Aliens     46.52     18.5         65
Total         73        29           102
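Both versions of the #4 table can be checked numerically with a small Python sketch. The helper name `chi_square_test` is illustrative, and dropping the "Don't Know" column is taken from the simplified table above. Computed from the observed counts with unrounded expected values, the full table gives $X^2 \approx 2.64$ on 2 degrees of freedom and the simplified table gives $X^2 \approx 2.58$ on 1 degree of freedom, the latter slightly above the slide's 2.57, which comes from rounded expected values.

```python
def chi_square_test(observed):
    """Return (X2, df) for a chi-square test of independence on a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    x2 = 0.0
    for row, r in zip(observed, row_totals):
        for o, c in zip(row, col_totals):
            e = r * c / n  # expected count under independence
            x2 += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return x2, df


# Rows: Aliens, No Aliens; columns: Angels, No Angels, Don't Know
full_table = [[30, 7, 23], [43, 22, 43]]

# Simplified version: drop the last ("Don't Know") column
simple_table = [row[:2] for row in full_table]

x2_full, df_full = chi_square_test(full_table)
x2_simple, df_simple = chi_square_test(simple_table)

print(f"full table:       X^2 = {x2_full:.2f}, df = {df_full}")
print(f"simplified table: X^2 = {x2_simple:.2f}, df = {df_simple}")
```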
- Using the formula $X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$, we obtain $X^2 = 2.57$.
- The degrees of freedom are $(2 - 1)(2 - 1) = 1$.
- Compare $X^2$ with $\chi^2_{1,0.05}$ to draw the conclusion.
- Note: the conclusions from the two methods should agree. (Why?)

Simpson's paradox: example 1

Example 1
While the admission rates of males and females are $\frac{51}{100}$ versus $\frac{50}{100}$, some people argue that the admission office biased the admission process by gender. Is this a valid argument? Why?

Solution: We should look at the admission rate by department.

              Men              Women
History       1/45      <      5/55
Geography     50/55     <      45/45
Total         51/100    >      50/100

In general: admission rate by department,

                Men                                                  Women
Department j    $a_j / b_j$                                   <      $A_j / B_j$      ($j = 1, \dots, J$)
...
Total           $\sum_{i=1}^{J} a_i / \sum_{i=1}^{J} b_i$     >      $\sum_{i=1}^{J} A_i / \sum_{i=1}^{J} B_i$

Simpson's paradox: Admissions example

- When we look at the History department and the Geography department, the admission rates for males are lower than those for females; however, the total admission rate is higher for males than for females!
- This happened due to Simpson's paradox.
- Slightly more women apply to the department that is much more selective.
- Such an argument is invalid because of Simpson's paradox.

Simpson's paradox: Kidney stone example

Testing the effect of kidney stone treatment:

               Small Stones      Large Stones       Combined
Treatment A    81/87 = 0.93      192/263 = 0.73     273/350 = 0.78
Treatment B    234/270 = 0.87    55/80 = 0.69       289/350 = 0.83

- Lurking variable: size of stones.
- Simpson's paradox: both small stones and large stones are treated with more success by treatment A, yet the overall success rate of treatment B is greater if the sizes of the stones are not considered!

Simpson's Paradox: vector interpretation

- Let $a_1, a_2$ be the counts for categories 1 and 2, respectively, and let $b_1, b_2$ be the totals for each category.
- With the total on the horizontal axis and the count on the vertical axis, the categories are represented by the vectors $(b_1, a_1)$ and $(b_2, a_2)$ with slopes $a_1/b_1$ and $a_2/b_2$. The combined case is represented by the vector $(b_1 + b_2, a_1 + a_2)$ with slope $(a_1 + a_2)/(b_1 + b_2)$ (see the parallelogram rule).
- Simpson's paradox for the vector case: even if each vector of one group has a smaller slope than the corresponding vector of the other group, the sum of the first group's two vectors can still have a larger slope than the sum of the other group's two vectors.

Simpson's Paradox: vector interpretation - Kidney stone treatment example

- Evidently, each treatment B vector has a smaller slope than the corresponding treatment A vector (for both small and large kidney stones), but for the combined vectors the relationship is reversed: the combined vector for treatment B has a larger slope than the one for treatment A.
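The vector picture for the kidney stone data can be verified with a few lines of Python. This is a sketch under one assumed convention: each vector is stored as (total, successes), so that its slope is the success rate. The helper names `slope` and `vector_sum` are illustrative.

```python
def slope(vector):
    """Slope of a (total, successes) vector, i.e. the success rate."""
    total, successes = vector
    return successes / total


def vector_sum(vectors):
    """Component-wise sum of vectors (parallelogram rule)."""
    return (sum(v[0] for v in vectors), sum(v[1] for v in vectors))


# (total, successes) for small stones and large stones, from the kidney stone table
treatment_a = [(87, 81), (263, 192)]
treatment_b = [(270, 234), (80, 55)]

# Within each stone size, treatment A has the larger slope
a_wins_each = all(slope(a) > slope(b) for a, b in zip(treatment_a, treatment_b))

# After adding the vectors, treatment B has the larger slope
combined_a = vector_sum(treatment_a)
combined_b = vector_sum(treatment_b)
b_wins_combined = slope(combined_b) > slope(combined_a)

print("A better for each stone size:", a_wins_each)
print("combined A:", combined_a, "slope", round(slope(combined_a), 2))
print("combined B:", combined_b, "slope", round(slope(combined_b), 2))
print("B better when combined:", b_wins_combined)
```

The reversal is possible because the two treatments have very different mixes of stone sizes: treatment A's combined vector is dominated by its long, shallow large-stone vector, while treatment B's is dominated by its long, steep small-stone vector.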