5. Two-way tables

The Practice of Statistics in the Life Sciences, Third Edition
© 2014 W.H. Freeman and Company

Objectives (PSLS Chapter 5)

Two-way tables
  Two-way tables
  Marginal distributions
  Conditional distributions
  Simpson’s paradox

Two-way tables

Two-way tables summarize data about two categorical variables (or factors) collected on the same set of individuals. Each factor can have any number of levels. If the row factor has “r” levels and the column factor has “c” levels, we say that the two-way table is an “r by c” table.

High school students were asked whether they smoke, and whether their parents smoke. The first factor is parent smoking status (columns); the second factor is student smoking status (rows).

                          Both parents   One parent   Neither parent
                          smoke          smokes       smokes
Student smokes            400            416          188
Student does not smoke    1380           1823         1168

Marginal distributions

We can examine each factor in a two-way table separately by studying the row totals and the column totals. They represent the marginal distributions, expressed in counts or percents.

                          Both    One     Neither   Total
Student smokes            400     416     188       1004
Student does not smoke    1380    1823    1168      4371
Total                     1780    2239    1356      5375

The row totals give the marginal distribution for student smoking. The column totals give the marginal distribution for parental smoking.

Computing marginal percents

Marginal percents are marginal counts divided by the table grand total.

                          Both    One     Neither   Percent
Student smokes            400     416     188       18.7%
Student does not smoke    1380    1823    1168      81.3%
Percent                   33.1%   41.7%   25.2%     100%

For example, 1004/5375 = 18.7% of the students smoke, and 1780/5375 = 33.1% of the students have two parents who smoke.

Graphs

The marginal distributions can be displayed on separate bar graphs, typically expressed as percents instead of raw counts. Each graph represents only one of the two variables, ignoring the second one. Each marginal distribution can also be shown in a pie chart.

[Bar graph: percent of students interviewed, by parental smoking status (Both, One, Neither)]
[Bar graph: percent of students interviewed, by student smoking status (Smoker, Nonsmoker)]

Conditional distributions

A conditional distribution is the distribution of one factor for each level of the other factor. A conditional percent is computed using the counts within a single row or a single column.
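As a rough sketch of the arithmetic, the following Python snippet recomputes the marginal percents for the smoking table and, for contrast, the percent of students who smoke within each parental smoking level. The dictionary layout and the names counts, percent, and smoke_given_parent are illustrative choices, not part of the original slides.

```python
# Counts from the smoking survey: counts[student status][parent status].
counts = {
    "smokes":         {"both": 400,  "one": 416,  "neither": 188},
    "does not smoke": {"both": 1380, "one": 1823, "neither": 1168},
}
parent_levels = ["both", "one", "neither"]

# Row totals (student smoking) and column totals (parental smoking).
student_totals = {s: sum(row.values()) for s, row in counts.items()}
parent_totals = {p: sum(counts[s][p] for s in counts) for p in parent_levels}
grand_total = sum(student_totals.values())


def percent(part, whole):
    """Return part/whole as a percent, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Marginal percents: each marginal count divided by the grand total.
student_marginal = {s: percent(n, grand_total) for s, n in student_totals.items()}
parent_marginal = {p: percent(n, grand_total) for p, n in parent_totals.items()}

# Conditional percents: students who smoke within each parental level,
# divided by that level's own total rather than the grand total.
smoke_given_parent = {
    p: percent(counts["smokes"][p], parent_totals[p]) for p in parent_levels
}

print("Marginal, student smoking: ", student_marginal)
print("Marginal, parental smoking:", parent_marginal)
print("Students who smoke, by parental level:", smoke_given_parent)
```

The marginal percents all share the grand total of 5375 as their denominator, whereas each conditional percent uses the total for its own parental smoking level (1780, 2239, or 1356).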
The denominator of a conditional percent is the corresponding row or column total (rather than the table grand total).

Percent of students who smoke when both parents smoke = 400/1780 = 22.5%

Comparing conditional distributions

Comparing conditional distributions helps us describe the “relationship” between the two categorical variables. We can compare the percent of individuals in one level of factor 1 for each level of factor 2. Substantial differences suggest an association between factor 1 and factor 2.

Conditional distribution of student smokers for different parental smoking statuses:

Percent of students who smoke when both parents smoke = 400/1780 = 22.5%
Percent of students who smoke when one parent smokes = 416/2239 = 18.6%
Percent of students who smoke when neither parent smokes = 188/1356 = 13.9%

Graphs

The conditional distributions can be compared graphically by displaying the percents making up one factor, for each level of the other factor.

Conditional distribution of student smoking status for different levels of parental smoking status:

                         Percent who   Percent who    Row total
                         smoke         do not smoke
Both parents smoke       22%           78%            100%
One parent smokes        19%           81%            100%
Neither parent smokes    14%           86%            100%

[Pie charts: percent who smoke and percent who do not smoke, among students with 2, 1, and 0 parents smoking]

Conditional distribution of parental smoking status for different levels of student smoking status:

                          Percent with 2    Percent with 1   Percent with 0    Total
                          parents smoking   parent smoking   parents smoking
Student smokes            40%               41%              19%               100%
Student does not smoke    32%               42%              27%               100%

[Pie charts: percent with 2, 1, and 0 parents smoking, among students who smoke and among students who do not smoke]

A 2013 Gallup survey investigated how phrasing may affect the opinions of American adults regarding physician-assisted suicide. Here are the findings:

                                                            Should be   Should not   No        Number
                                                            allowed     be allowed   opinion   interviewed
Form A: "End the patient's life by some painless means"     70%         27%          3%        719
Form B: "Assist the patient to commit suicide"              51%         45%          4%        816

The value 70% is
A. a marginal value representing the proportion of respondents in favor of physician-assisted suicide.
B. a conditional value representing the proportion of respondents in favor of physician-assisted suicide, given that the question was asked in Form A.

The same findings expressed as counts:

                     Allowed   Not allowed   No opinion   Total
"Painless means"     503       194           22           719
"Commit suicide"     416       367           33           816

Simpson’s paradox

Lurking variables are always a problem for interpretation, but their impact can be even more drastic when dealing with categorical data. An association that holds for all of several groups can reverse direction when the data are combined to form a single group. This reversal is called Simpson’s paradox.

The table below compares the failure rates when removing kidney stones in a sample of patients, using one of two procedures: open surgery and PCNL (a minimally invasive technique).

All stones    Open surgery   PCNL
Success       273            289
Failure       77             61
% failure     22%            17%

Can you think of a possible lurking variable here?
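The failure percents quoted for the two procedures follow directly from the counts. As a small sketch, assuming the combined counts are 273 successes and 77 failures for open surgery and 289 successes and 61 failures for PCNL, the arithmetic can be checked in Python (the names combined and failure_percent are illustrative only):

```python
# Combined counts over all stone sizes (assumed reading of the slide's table).
combined = {
    "Open surgery": {"success": 273, "failure": 77},
    "PCNL":         {"success": 289, "failure": 61},
}


def failure_percent(success, failure):
    """Failures as a percent of all procedures, rounded to one decimal."""
    return round(100 * failure / (success + failure), 1)


failure_rates = {
    procedure: failure_percent(c["success"], c["failure"])
    for procedure, c in combined.items()
}

for procedure, rate in failure_rates.items():
    print(f"{procedure}: {rate}% failure")
```

Both procedures were performed on 350 patients, so on the combined data open surgery shows the higher failure percent.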
The procedures are not chosen randomly by surgeons! In fact, the minimally invasive procedure is most likely used for smaller stones with a good chance of success, whereas open surgery is likely used for more problematic conditions.

Small stones   Open surgery   PCNL
Success        81             234
Failure        6              36
% failure      7%             13%

Large stones   Open surgery   PCNL
Success        192            55
Failure        71             25
% failure      27%            31%

In New York State (excluding New York City), 1,359 white men and 121 black men died from prostate cancer in 1994. Based on how many white and black men lived there in 1994, the prostate cancer mortality rates were as follows:

Death from prostate cancer, all ages

         Yes     No          Total       Rate per 100,000
White    1,359   4,736,887   4,738,246   28.7
Black    121     418,871     418,992     28.9

Cancer mortality rates are similar in both groups. But when the data are broken down by age group, we see that black men had a much higher rate of prostate cancer death than white men.

Death from prostate cancer, under 65 years of age

         Yes     No          Total       Rate per 100,000
White    76      4,177,823   4,177,899   1.8
Black    18      396,899     396,917     4.5

Death from prostate cancer, age 65 and older

         Yes     No          Total       Rate per 100,000
White    1,282   559,075     560,357     228.8
Black    102     21,973      22,075      462.1

What is the source of this example of Simpson’s paradox?
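One way to explore the prostate cancer example is to recompute the rates per 100,000 from the counts and then look at how each group is split between the two age brackets. The sketch below uses the deaths and totals from the tables above; the variable names and the choice to compute the share of men aged 65 and older are illustrative assumptions, offered as a starting point for the closing question rather than as the slides' own answer.

```python
# (deaths, total men) from the tables above.
overall = {"White": (1359, 4738246), "Black": (121, 418992)}
by_age = {
    "Under 65":     {"White": (76, 4177899),  "Black": (18, 396917)},
    "65 and older": {"White": (1282, 560357), "Black": (102, 22075)},
}


def rate_per_100k(deaths, total):
    """Deaths per 100,000 men, rounded to one decimal place."""
    return round(100_000 * deaths / total, 1)


# Rates for all ages combined, then within each age group.
overall_rates = {g: rate_per_100k(*v) for g, v in overall.items()}
age_rates = {
    age: {g: rate_per_100k(*v) for g, v in groups.items()}
    for age, groups in by_age.items()
}

# Percent of each group's men (summed over the two age tables)
# who are aged 65 and older.
older_share = {}
for g in overall:
    young_total = by_age["Under 65"][g][1]
    old_total = by_age["65 and older"][g][1]
    older_share[g] = round(100 * old_total / (young_total + old_total), 1)

print("All ages:", overall_rates)
for age, rates in age_rates.items():
    print(f"{age}:", rates)
print("Percent of men aged 65 and older:", older_share)
```

Within each age group the rate for black men is roughly double the rate for white men, yet the all-ages rates are nearly equal. The age composition of the two groups differs noticeably, which is worth keeping in mind when answering the question above.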