Aim #95: How do we summarize bivariate data with frequency tables and relative
frequency tables?
5-5-17
Homework: Handout
Do Now: High school students in the United States were invited to complete an
online survey in 2010. More than 1,000 students responded to this survey, which
included a question about each student's favorite sport. Of the completed
surveys, 450 were randomly selected. A rather confusing breakdown of the data by
gender was compiled from the 450 surveys:
• 100 students indicated their favorite sport was soccer. 49 of those students
were females.
• 131 students selected lacrosse as their favorite sport. 71 of those students were
males.
• 75 students selected basketball as their favorite sport. 48 of those students were
females.
• 26 students indicated football as their favorite sport. 25 of those students were
males.
• And finally, 118 students indicated volleyball as their favorite sport. 70 of those
students were females.
1) What is the most popular sport? The least popular?
2) How many more females than males indicated their favorite sport is volleyball?
3) How many more males than females indicated their favorite sport was soccer?
4) How do you think the 450 surveys used in Example 1 might have been selected?
You can assume that there were 1,000 surveys to select from.
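One possible answer to question 4 is a simple random sample drawn without replacement. The sketch below illustrates that idea only; the survey numbering, the seed, and the names `survey_ids`, `rng`, and `selected` are assumptions made for illustration, not details from the survey itself.

```python
import random

# Assumption: the 1,000 completed surveys are numbered 1 through 1000.
survey_ids = range(1, 1001)

# Fixed seed (arbitrary) so the same draw can be repeated in class.
rng = random.Random(2010)

# sample() draws without replacement: no survey is picked twice,
# and every survey has the same chance of being selected.
selected = rng.sample(survey_ids, 450)

print(len(selected))       # number of surveys drawn
print(len(set(selected)))  # number of distinct surveys drawn
```

Both printed values are 450, which confirms that 450 different surveys were chosen.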
The data in the Do Now consist of two responses from each student completing a
survey. The first response indicates a student's gender and the second response
indicates the student's favorite sport. For example, the data collected from one
student were “male” and “soccer.”
The data are bivariate categorical data.
The first step in analyzing the statistical question posed by the students in their
mathematics class is to organize these data in a two-way frequency table. A two-way
frequency table that can be used to organize the categorical data is shown below.
The letters below represent the frequency counts of the cells of the table.
          Soccer   Lacrosse   Basketball   Football   Volleyball   Total
Females   (a)      (b)        (c)          (d)        (e)          (f)
Males     (g)      (h)        (i)          (j)        (k)          (l)
Total     (m)      (n)        (o)          (p)        (q)          (r)
• The highlighted cells are called marginal frequencies. They are located around the
“margins” of the table and represent the totals of the rows or columns of the
table.
• The non-shaded cells within the table are called joint frequencies. Each joint cell
is the frequency count of responses from the two categorical variables located by
the intersection of a row and column.
1) Describe the data that would be counted in (a), (j), (l), (n), and (r).
2) Cell (i) is the number of male students who selected basketball as their
favorite sport. Using the information given in Example 1, what is the value of this
number?
3) Cell (d) is the number of females whose favorite sport is football. Using the
information given in Example 1, what is the value of this number?
4) Complete the table above by determining a frequency count for each cell based
on the summarized data. Place each value next to the appropriate letter.
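A completed table can be checked with a short script. The sketch below assumes every respondent was recorded as either male or female, so the count that is not given for a sport is the sport total minus the given count. The names `given`, `females`, and `males` are illustrative only.

```python
# Totals and the one gender count given for each sport in the Do Now.
# Each entry: sport -> (sport total, gender of the given count, given count)
given = {
    "soccer":     (100, "female", 49),
    "lacrosse":   (131, "male",   71),
    "basketball": (75,  "female", 48),
    "football":   (26,  "male",   25),
    "volleyball": (118, "female", 70),
}

females, males = {}, {}
for sport, (total, gender, count) in given.items():
    other = total - count  # assumption: the other gender makes up the rest
    if gender == "female":
        females[sport], males[sport] = count, other
    else:
        females[sport], males[sport] = other, count

# Marginal frequencies: row totals and the grand total.
female_total = sum(females.values())
male_total = sum(males.values())
grand_total = female_total + male_total

# Joint frequencies, one sport per line, followed by the sport total.
for sport, (total, _, _) in given.items():
    print(sport, females[sport], males[sport], total)
print("total", female_total, male_total, grand_total)
```

The last line printed should show a grand total of 450, matching the number of surveys selected.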
A survey asked the question “How tall are you to the nearest inch?” A second
question on this survey asked, “What sports do you play?”
1) What type of data, numerical or categorical, would be collected from
the first question? From the second question?
Another random sample of 100 surveys was selected. Jill had a copy of the
frequency table that summarized these 100 surveys. Unfortunately, she spilled
part of her lunch on the copy. The following summaries were still readable:
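[Table: partially readable two-way frequency table of gender by favorite sport (soccer, lacrosse, basketball, football, volleyball); cell values not recoverable]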
1) Help Jill recreate the table by determining the frequencies for cells (c), (e), (j),
and (q).
2) Of the cells (c), (e), (j), and (q), which cells represent joint frequencies?
3) Of the cells (c), (e), (j), and (q), which cells represent marginal frequencies?
Consider the two-way frequency table below.
[Table: two-way frequency table of gender by favorite sport (soccer, lacrosse, basketball, football, volleyball); cell values not recoverable]
How many people were represented in this table?
The table below is a relative frequency table.
Relative frequency is found by dividing the frequency count by the total number of
observations.
The relative frequency of females selecting soccer is 49/450. Some other relative
frequencies are given.
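Calculate the remaining relative frequencies in the relative frequency table. Write
each value in the table as a decimal rounded to the nearest thousandth.
[Table: relative frequency table of gender by favorite sport (soccer, lacrosse, basketball, football, volleyball); cell values not recoverable]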
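The division described above can be sketched in a few lines. The counts below follow from the Do Now data under the assumption that every respondent was recorded as male or female; the names `counts`, `joint_rel`, `sport_rel`, and `gender_rel` are illustrative only.

```python
# Joint frequency counts derived from the Do Now data.
counts = {
    "female": {"soccer": 49, "lacrosse": 60, "basketball": 48,
               "football": 1, "volleyball": 70},
    "male":   {"soccer": 51, "lacrosse": 71, "basketball": 27,
               "football": 25, "volleyball": 48},
}

# Total number of observations (all surveys in the table).
total = sum(sum(row.values()) for row in counts.values())

# Joint relative frequency: cell count divided by the total,
# rounded to the nearest thousandth.
joint_rel = {
    gender: {sport: round(n / total, 3) for sport, n in row.items()}
    for gender, row in counts.items()
}

# Marginal relative frequency for each sport (column total / total).
sport_rel = {
    sport: round((counts["female"][sport] + counts["male"][sport]) / total, 3)
    for sport in counts["female"]
}

# Marginal relative frequency for each gender (row total / total).
gender_rel = {
    gender: round(sum(row.values()) / total, 3)
    for gender, row in counts.items()
}

print(joint_rel["female"]["soccer"])  # 49/450 rounds to 0.109
print(sport_rel["lacrosse"])          # 131/450 rounds to 0.291
```

Because each value is rounded separately, the rounded relative frequencies may not add up to exactly 1.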
1. Based on previous work with frequency tables, which cells in this table would
represent the joint relative frequencies?
2. Which cells in the relative frequency table would represent the marginal
relative frequencies?
3. What is the joint relative frequency for females and basketball? Interpret
the meaning of this value.
4. What is the marginal relative frequency for lacrosse? Interpret the meaning of
this value.
5. What is the difference in the joint relative frequencies for males and for females
who chose soccer as their favorite sport?
6. Is there a noticeable difference between the genders and their favorite
sports?
The students looked at both tables in order to determine the most popular sport.
Scott acknowledged that football was probably not the best choice based on
the data. “The data indicate that lacrosse is the most popular sport,” continued
Scott.
Jill, however, still did not agree with Scott that this was a good choice. She argued
that volleyball was a better choice.
1. How do the data support Scott's claim? Why do you think he selected lacrosse as
his favorite sport?
2. How do the data support Jill's claim? Why do you think she selected volleyball as
her favorite sport?
3. Of the two sports, lacrosse and volleyball, select one and justify why you think it is a
better choice based on the data.
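One way to compare the two claims is to look at which sport has the largest count overall and within each gender. The sketch below uses counts derived from the Do Now data, assuming every respondent was recorded as male or female. The names `counts`, `top_overall`, `top_female`, and `top_male` are illustrative, and the result is only one possible reading of the data.

```python
# Joint frequency counts derived from the Do Now data.
counts = {
    "soccer":     {"female": 49, "male": 51},
    "lacrosse":   {"female": 60, "male": 71},
    "basketball": {"female": 48, "male": 27},
    "football":   {"female": 1,  "male": 25},
    "volleyball": {"female": 70, "male": 48},
}

# Sport with the largest marginal frequency (females plus males).
top_overall = max(counts, key=lambda s: sum(counts[s].values()))

# Sport with the largest joint frequency within each gender.
top_female = max(counts, key=lambda s: counts[s]["female"])
top_male = max(counts, key=lambda s: counts[s]["male"])

print(top_overall)  # lacrosse
print(top_female)   # volleyball
print(top_male)     # lacrosse
```

Lacrosse leads overall and among males, while volleyball leads among females, so each student can point to a part of the table that supports the claim.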
Create a relative frequency table based on the frequency table below (use decimals).
Frequency table
Relative frequency table
Use the values given to complete the frequency table.
Create a relative frequency table for the above data.
Sum it up!
• Categorical data are data that take on values that are categories rather than
numbers. Examples include male or female for the categorical variable of gender, or
the five sports for the categorical variable of favorite sport.
• A two-way frequency table is used to summarize bivariate categorical data.
• The number at the intersection of a row and a column in a two-way frequency table
counts the responses that fall in both categories. It is called a joint frequency.
• The total number of responses for each value of a categorical variable in the table
represents the marginal frequency for that value.
• A relative frequency is found by dividing the frequency count by the total number of
observations. It can be expressed as a decimal or a percentage.