Section 2.5: Data Analysis for Two-Way Tables
Section 9.1: Chi-square test for Two-Way Tables

Learning goals for this chapter:
- Find the joint, marginal, and conditional distributions from a two-way table of counts, by hand and with SPSS.
- Determine from the wording of the story whether the question is asking for a joint, marginal, or conditional percentage/probability.
- Know when two-way tables and the chi-square test are the correct statistical technique for a story.
- Perform a χ² hypothesis test, including: stating the hypotheses, obtaining the test statistic and P-value from SPSS, and writing a conclusion in terms of the story.
- Check the assumptions to see whether it is appropriate to use a χ² test, using the footnote of the SPSS χ² output.

Two-way tables and the chi-square test are used when you are studying the association between 2 categorical variables.

The joint distribution of the 2 categorical variables is found from the cells (the inner squares) of the table:

    joint proportion = cell # / total #

All the joint proportions should add to 1.

The marginal distribution allows us to study 1 variable at a time. You get the marginals just by adding across a row or down a column for the specific variable you are interested in. The marginals are written in the margins of the table (far right and very bottom). The marginals for the row variable should add to 1. The marginals for the column variable should add to 1.

Conditional distribution: If you know one variable for sure (you have "reduced your world"), what are the respective percentages for the other variable? Bar graphs are a good way to display conditional distributions.

Hypothesis testing with 2-way tables

H0: There is no association between the row and column variables in the population.
Ha: There is an association between the row and column variables in the population.

To test the null hypothesis, compare the observed cell counts with the expected cell counts calculated under the assumption that the null hypothesis is true.
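The definitions above can be checked by hand with a few lines of code. The following is a minimal Python sketch, not part of the SPSS workflow, that uses the wine and music counts from the supermarket example in this handout. The variable names are illustrative assumptions, and the expected counts use the standard rule (row total x column total) / n.

```python
# Counts from the wine/music example in this handout.
# Rows = type of music, columns = type of wine purchased.
music = ["None", "French", "Italian"]
wine = ["French", "Italian", "Other"]
counts = [
    [30, 11, 43],  # no music
    [39, 1, 35],   # French music
    [30, 19, 35],  # Italian music
]

n = sum(sum(row) for row in counts)              # total # of bottles
row_totals = [sum(row) for row in counts]        # totals for each music type
col_totals = [sum(col) for col in zip(*counts)]  # totals for each wine type

# Joint distribution: every cell divided by the overall total.
joint = [[cell / n for cell in row] for row in counts]

# Marginal distributions: row or column totals divided by the overall total.
marg_music = [total / n for total in row_totals]
marg_wine = [total / n for total in col_totals]

# Conditional distribution of wine, given that French music was playing:
# "reduce your world" to the French-music row and divide by its total.
i = music.index("French")
cond_wine_given_french_music = [cell / row_totals[i] for cell in counts[i]]

# Expected counts if H0 (no association) is true.
expected = [[r * c / n for c in col_totals] for r in row_totals]

for name, row in zip(music, joint):
    print(name, [round(100 * p, 1) for p in row])
print("Marginal for music (%):", [round(100 * p, 1) for p in marg_music])
print("Marginal for wine (%):", [round(100 * p, 1) for p in marg_wine])
print("Smallest expected count:", round(min(min(row) for row in expected), 2))
```

The percentages printed here should agree with the joint and marginal tables worked out for the wine example, and the smallest expected count is the quantity SPSS reports in the footnote of its chi-square output.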
Test statistic: Chi-Square Test Statistic

$$X^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$

$$\text{Expected count} = \frac{\text{row total} \times \text{column total}}{n},$$

where n = total # of observations for the table.

The X² test statistic has an approximately chi-square distribution. To use the chi-square table, you need the degrees of freedom, (r-1)(c-1), where r is the number of rows and c is the number of columns. Go to Table F in the back of the book.

WE WILL LET SPSS CALCULATE THE TEST STATISTIC AND P-VALUE FOR US. YOU DO NOT NEED TO KNOW HOW TO USE THE TABLE.

The P-value for the chi-square test is:

$$P(\chi^2 \geq X^2)$$

(We'll be using SPSS to do the test.)

The chi-square test becomes more accurate as the cell counts increase and for tables larger than 2x2.

For tables larger than 2x2, use the chi-square test whenever:
- the average of the expected counts is 5 or more and the smallest expected count is 1 or more;
- fewer than 20% of the cells have expected counts of less than 5 (this is what the SPSS footnote reports).

For 2x2 tables, use the chi-square test whenever all 4 expected cell counts are 5 or more.

Example: Market researchers know that background music can influence the mood and purchasing behavior of customers. One study in a supermarket in Northern Ireland compared 3 treatments: no music, French accordion music, and Italian string music. Under each condition, the researchers recorded the number of bottles of French, Italian, and other wine purchased. Here is the 2-way table that summarizes the data in counts (total # of bottles sold = 243):

                      Wine
Music        French   Italian   Other
None             30        11      43
French           39         1      35
Italian          30        19      35

Calculate the joint distribution for music and wine (in % of all bottles):

                      Wine
Music        French   Italian   Other
None           12.3       4.5    17.7
French         16.0       0.4    14.4
Italian        12.3       7.8    14.4

Calculate the marginal distribution for music:

                      Wine
Music        French   Italian   Other   Marg. for music
None           12.3       4.5    17.7              34.6
French         16.0       0.4    14.4              30.9
Italian        12.3       7.8    14.4              34.6

Calculate the marginal distribution for wine:

                         Music
Wine               None   French   Italian   Marg. for wine
French             12.3     16.0      12.3             40.7
Italian             4.5      0.4       7.8             12.8
Other              17.7     14.4      14.4             46.5
Marg. for music    34.6     30.9      34.6            100

Questions (joint, marginal, or conditional?):
1. "What percent of all wine bought was Italian with French music playing in the store?"
2. "Of the Italian wine purchased, what percent was from a store playing French music?"
3. "What percent of wine bought was Italian?"
4. "What percent of the wine purchased from French music-playing stores was French?"
5. "What percent of wine was purchased from a store with no music playing?"

Using SPSS, set up the data so that you have a wine column, a music column, and a purchase column (where you will input the counts from inside the table):

Wine       Music      Purchase
French     None             30
Italian    None             11
Other      None             43
French     French           39
Italian    French            1
Other      French           35
French     Italian          30
Italian    Italian          19
Other      Italian          35

Then go to Data --> Weight Cases. Click "Weight cases by" and then move "purchase" into the "Frequency Variable" box. Click OK.

Do Analyze --> Descriptive Statistics --> Crosstabs. Make sure "observed" is checked. Put "wine" into the "Rows" box and "music" into the "Columns" box. Click OK. You will get:

Type of Wine * Type of Music Crosstabulation
Count
                          Type of Wine
Type of Music    French   Italian   Other   Total
French               39         1      35      75
Italian              30        19      35      84
None                 30        11      43      84
Total                99        31     113     243

Then, if you want the %s for the joint and marginal distributions instead of counts, go back to your data and do Analyze --> Descriptive Statistics --> Crosstabs --> (your rows and columns should still be entered from the previous step) --> Click "Cells" --> Click "Total." Also, un-click "observed" so your table won't also include the counts and be too crowded.
Click "Continue" and then "OK." You will get:

Type of Wine * Type of Music Crosstabulation
% of Total
                          Type of Wine
Type of Music    French   Italian   Other    Total
French            16.0%       .4%   14.4%    30.9%
Italian           12.3%      7.8%   14.4%    34.6%
None              12.3%      4.5%   17.7%    34.6%
Total             40.7%     12.8%   46.5%   100.0%

Is there a relationship in the population between the type of wine purchased and the type of music that is playing? Perform a significance test, and write a short summary of your conclusion.

Hypotheses:

Test statistic:

P-value:

Conclusion in terms of the story:

Was it appropriate to use the chi-square test here? Justify your answer.

To make SPSS do the hypothesis test, go back to Analyze --> Descriptive Statistics --> Crosstabs --> Cells. Then click "Total" to remove its check mark. Also click "Expected" under "Counts." Click Continue. Then click Statistics --> Chi-Square --> Continue --> OK. You will get:

Chi-Square Tests
                        Value     df   Asymp. Sig. (2-sided)
Pearson Chi-Square     18.279a     4                    .001
Likelihood Ratio       21.875      4                    .000
N of Valid Cases          243
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 9.57.

Use the "Pearson Chi-Square" row to get your X² test statistic, and the "Asymp. Sig." column to get the P-value.

Example: Psychological and social factors can influence the survival of patients with serious diseases. One study examined the relationship between survival of patients with coronary heart disease and pet ownership. Each of 92 patients was classified as having a pet or not and by whether they survived for one year. The researchers suspect that having a pet might be connected to the patient status. Here are the data:

                   Pet ownership
Patient Status      No     Yes
Alive               28      50
Dead                11       3
Total               39      53

a) Find the joint and marginal distributions (in probabilities) of patient status and pet ownership.

                     Pet ownership
Patient Status        No       Yes     Marg. for status
Alive               0.304    0.543                0.847
Dead                0.120    0.033                0.153
Marg. for pets      0.424    0.576

b) Assuming a patient is still alive, what is the probability he owns a pet? Is this a joint, marginal, or conditional probability?

c) What is the probability a patient is still alive and owns a pet? Is this a joint, marginal, or conditional probability?

d) What is the probability a patient owns a pet? Is this a joint, marginal, or conditional probability?

e) State the hypotheses for a χ² test of this problem, find the X² test statistic, its degrees of freedom, and the P-value. State your conclusion in terms of the original problem.

Hypotheses:

Test statistic:

P-value:

Conclusion in terms of the story:

Chi-Square Tests
                                 Value      df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                                 (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square               8.851(b)    1        .003
Continuity Correction(a)         7.190       1        .007
Likelihood Ratio                 9.011       1        .003
Fisher's Exact Test                                                  .006         .004
Linear-by-Linear Association     8.755       1        .003
N of Valid Cases                    92


Student Handout for M&Ms/Skittles Activity (Chapter 9: Two Way Distributions)

Part 1: Plain vs. Peanut M&Ms

1. Your data for plain (mine for peanut), in counts:

           Brown   Yellow   Red   Blue   Orange   Green   Total
Plain      ____    ____     ____  ____   ____     ____    ____
Peanut        2       5        0     3      8        4      22
Total      ____    ____     ____  ____   ____     ____    ____

   Overall total number of plain and peanut M&Ms counted: ________

2. Joint Distribution (in white boxes). Divide each count above by the overall total of M&Ms.

                       Brown   Yellow   Red   Blue   Orange   Green   Marginal for flavor
Plain                  ____    ____     ____  ____   ____     ____    ____
Peanut                 ____    ____     ____  ____   ____     ____    ____
Marginal for color     ____    ____     ____  ____   ____     ____    100%

3. Marginal Distributions (above in shading). Add down the columns and across the rows. The bottom numbers should add to 100%, and the right column should add to 100%.

4. Conditional distribution of flavor for green M&Ms (you know the M&M is green, now what is the chance it is ...). The denominator will be the same for both of these calculations. These two percentages should add to 100%.

   Plain: ________     Peanut: ________

5. Bar graph for the conditional distributions above (you will have 2 bars on 1 graph):

6. Conditional distribution of color for plain M&Ms.
   The denominator will be the same for all 6 calculations. All 6 add to 100%.

   Brown: ____   Yellow: ____   Red: ____   Blue: ____   Orange: ____   Green: ____

7. Sketch a bar graph for the conditional distribution of color for plain M&Ms. You will have 6 bars on the graph.

8. Conditional distribution of color for peanut M&Ms. The denominator will be the same for all 6 calculations. All 6 add to 100%.

   Brown: ____   Yellow: ____   Red: ____   Blue: ____   Orange: ____   Green: ____

9. Sketch a bar graph for the conditional distribution of color for peanut M&Ms. You will have 6 bars on the graph. Use the same y-axis scale that you used for the bar graph for plain M&Ms so that you can easily compare your results. How do they compare?

In order to do a hypothesis test, we need a large data set, like one from the whole class.

           Plain   Peanut
Brown        147       69
Yellow       302      110
Red          264       70
Blue         407      162
Orange       330      148
Green        373      123

10. Hypotheses for the M&Ms χ² hypothesis test. Be sure to state whether your conclusion refers to the population or the sample.

11. Test statistic and P-value for the χ² hypothesis test:

12. Conclusion for the χ² hypothesis test (α = 0.01) in terms of the story.

13. Was it appropriate to use the chi-square test here?

Part 2: M&Ms vs. Skittles

Table of counts for the whole class:

            Yellow   Non-yellow   Total
M&Ms           302         1521    1823
Skittles       361         1351    1712
Total          663         2872    3535

14. Hypotheses for the χ² test:

15. Test statistic and P-value:

16. Conclusion (α = 0.01) in terms of the story.

17. Was it appropriate to use the chi-square test here?
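The SPSS chi-square output in this handout can be cross-checked with a short script. The following is a minimal Python sketch using only the standard library; the function names `chi_square_stat` and `chi_square_p_value` are hypothetical helpers written for this illustration, not SPSS features. It is applied to the two examples whose SPSS output appears in this handout (pet ownership and wine/music). The P-value uses the standard recurrence for the upper tail of the chi-square distribution.

```python
from math import erfc, exp, gamma, sqrt


def chi_square_stat(counts):
    """Return (X2, df, expected counts) for a two-way table of counts."""
    n = sum(sum(row) for row in counts)
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    # Expected count = row total x column total / n, as defined in the notes.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    x2 = sum(
        (obs - exp_) ** 2 / exp_
        for obs_row, exp_row in zip(counts, expected)
        for obs, exp_ in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return x2, df, expected


def chi_square_p_value(x2, df):
    """Upper-tail probability P(chi-square with df degrees of freedom >= x2)."""
    z = x2 / 2
    # Closed-form starting values for df = 2 (even case) or df = 1 (odd case).
    if df % 2 == 0:
        a, q = 1.0, exp(-z)
    else:
        a, q = 0.5, erfc(sqrt(z))
    # Each step of the recurrence adds 2 degrees of freedom.
    while a < df / 2:
        q += z ** a * exp(-z) / gamma(a + 1)
        a += 1
    return q


# Pet ownership example: rows = Alive, Dead; columns = No pet, Pet.
pets = [[28, 50], [11, 3]]
x2_pets, df_pets, expected_pets = chi_square_stat(pets)
p_pets = chi_square_p_value(x2_pets, df_pets)
print("Pets: X2 =", round(x2_pets, 3), "df =", df_pets, "P =", round(p_pets, 3))

# Wine/music example: rows = None, French, Italian music.
wine_counts = [[30, 11, 43], [39, 1, 35], [30, 19, 35]]
x2_wine, df_wine, expected_wine = chi_square_stat(wine_counts)
p_wine = chi_square_p_value(x2_wine, df_wine)
print("Wine: X2 =", round(x2_wine, 3), "df =", df_wine, "P =", round(p_wine, 3))
```

The printed statistics should match the "Pearson Chi-Square" rows of the two SPSS tables shown earlier. The returned expected counts can also be inspected to check the assumptions, for example whether all 4 expected counts in a 2x2 table are 5 or more.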