CHAPTER 1 Exploring Data 1.1 Analyzing Categorical Data The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers Analyzing Categorical Data Learning Objectives After this section, you should be able to: CALCULATE and DISPLAY the marginal distribution of a categorical variable from a two-way table CALCULATE and DISPLAY the conditional distribution of a categorical variable for a particular value of the other categorical variable in a two-way table DESCRIBE the association between two categorical variables The Practice of Statistics, 5th Edition 2 Two-Way Tables and Marginal Distributions When a dataset involves two categorical variables, we begin by examining the counts or percents in various categories for one of the variables. A two-way table describes two categorical variables, organizing counts according to a row variable and a column variable. Young adults by gender and chance of getting rich Female Male Total Almost no chance 96 98 194 Some chance, but probably not 426 286 712 A 50-50 chance 696 720 1416 A good chance 663 758 1421 Almost certain 486 597 1083 Total 2367 2459 4826 The Practice of Statistics, 5th Edition What are the variables described by this two-way table? How many young adults were surveyed? 3 Two-Way Tables and Marginal Distributions The marginal distribution of one of the categorical variables in a twoway table of counts is the distribution of values of that variable among all individuals described by the table. Note: Percents are often more informative than counts, especially when comparing groups of different sizes. How to examine a marginal distribution: 1)Use the data in the table to calculate the marginal distribution (in percents) of the row or column totals. 2)Make a graph to display the marginal distribution. The Practice of Statistics, 5th Edition 4 Two-Way Tables and Marginal Distributions Examine the marginal distribution of chance of getting rich. Female Male Total Almost no chance 96 98 194 Some chance, but probably not 426 286 712 A 50-50 chance 696 720 1416 A good chance 663 758 1421 Almost certain 486 597 1083 Total 2367 2459 4826 Chance of being wealthy by age 30 Percent 35 Almost no chance 194/4826 = 4.0% 30 Some chance 712/4826 = 14.8% A 50-50 chance 1416/4826 = 29.3% A good chance 1421/4826 = 29.4% 25 Percent Response Young adults by gender and chance of getting rich 20 15 10 5 0 Almost certain 1083/4826 = 22.4% Almost none Some chance 50-50 chance Good chance Almost certain Survey Response The Practice of Statistics, 5th Edition 5 Relationships Between Categorical Variables A conditional distribution of a variable describes the values of that variable among individuals who have a specific value of another variable. How to examine or compare conditional distributions: 1) Select the row(s) or column(s) of interest. 2) Use the data in the table to calculate the conditional distribution (in percents) of the row(s) or column(s). 3) Make a graph to display the conditional distribution. • Use a side-by-side bar graph or segmented bar graph to compare distributions. The Practice of Statistics, 5th Edition 6 Relationships Between Categorical Variables Calculate the conditional distribution of opinion among males. Examine the relationship between gender and opinion. Young adults by gender and chance of getting rich Female Male Total Almost no chance 96 98 194 Some chance, but probably not 426 286 712 A 50-50 chance 696 720 1416 A good chance 663 758 1421 Almost certain 486 597 1083 Total 2367 2459 Chance of being wealthy by age 4826 30 Response Male Female Almost no chance 98/2459 = 4.0% 96/2367 = 4.1% 90% 286/2459 = 11.6% 426/2367 = 18.0% 70% 720/2459 = 29.3% 696/2367 = 29.4% 758/2459 = 30.8% 663/2367 = 28.0% 597/2459 = 24.3% 486/2367 = 20.5% A 50-50 chance A good chance Almost certain 80% Percent Some chance 100% Almost certain 60% 50% Good chance 40% 30% 50-50 chance 20% 10% Some chance 0% Males Opinion The Practice of Statistics, 5th Edition Females Almost no chance 7 Relationships Between Categorical Variables Making this determination requires formal inference, which will have to wait a few chapters. Chance of being wealthy by age 30 Percent Can we say there is an association between gender and opinion in the population of young adults? 100% 90% 80% 70% 60% 50% 40% 30% 20% 10% 0% Almost certain Good chance 50-50 chance Some chance Males Opinion Females Almost no chance Caution! Even a strong association between two categorical variables can be influenced by other variables lurking in the background. The Practice of Statistics, 5th Edition 8 Cell Phones The Pew Research Center asked a random sample of 2024 adult cell phone owners from the United States which type of cell phone they own: iPhone, Android, or other (including non-smart phones). Here are the results, broken down by age category. • http://pewinternet.org/Reports/2013/Smartphone-Ownership2013.aspx iPhone Android Other Total 18-34 169 214 134 517 35-54 171 189 277 637 55+ 127 100 643 870 Total 467 503 1054 2024 What are the variables described by this two way table? Age and type of phone The Practice of Statistics, 5th Edition 9 Cell Phones (a) Use the cell phone data to calculate the marginal distribution (in percents) of type of cell phone. (b) Make a graph to display the marginal distribution. Describe what you see. Age Type of Phone iPhone Android Other Total 18-34 169 214 134 517 35-54 171 189 277 637 55+ 127 100 643 870 Total 467 503 1054 2024 467 = 23% 2024 503 android: = 25% 2024 1054 other: = 52% 2024 iphone: The percent who own iPhones and Androids are about the same, but more than half of cell phone owners do not own either an iPhone or an Android. The Practice of Statistics, 5th Edition 10 Cell Phones (c) Use the cell phone data to calculate the conditional distribution of cell phone type for the 18- to 34-year-olds. Age Type of Phone iPhone Android Other Total 18-34 169 214 134 517 The Practice of Statistics, 5th Edition 35-54 171 189 277 637 55+ 127 100 643 870 Total 467 503 1054 2024 169 = 32.7% 517 214 android: = 41.4% 517 134 other: = 25.9% 517 iphone: 11 Cell Phones (d) Based on the cell phone data, is there evidence of a difference in the distribution of cell phone ownership for the three age groups? Justify your answer. Age Type of Phone iPhone Android Other Total 18-34 32.7% 41.4% 25.9% 100% 35-54 26.8% 29.7% 43.5% 100% 55+ 14.6% 11.5% 73.9% 100% Based on the sample data, more than half of the 18- to 34-year-olds and 35- to 54-year-olds own an iPhone or an Android, while much less than half of the 55-and-older group own one of these types. Furthermore, younger people tend to prefer the Android to the iPhone, while older people show a slight preference for the iPhone over the Android. The Practice of Statistics, 5th Edition 12 Data Analysis: Making Sense of Data Section Summary In this section, we learned how to… DISPLAY categorical data with a bar graph IDENTIFY what makes some graphs of categorical data deceptive CALCULATE and DISPLAY the marginal distribution of a categorical variable from a two-way table CALCULATE and DISPLAY the conditional distribution of a categorical variable for a particular value of the other categorical variable in a two-way table DESCRIBE the association between two categorical variables The Practice of Statistics, 5th Edition 13 PAGE 22 20-26 EVEN Homework The Practice of Statistics, 5th Edition 14
© Copyright 2024 Paperzz