Name______________________________
Date_________________
Lesson 11.1: Histograms and Dot Plots
Algebra I
Data collected by means of surveys, observation, or research is often summarized by graphs. Data refers to the
raw information, the information you want to analyze and understand.
Two types of graphs we can use to analyze data are Histograms and Dot Plots.
Histogram Example: Charlie’s Food Factory currently employs 28 workers whose ages are shown below on a
histogram.
(a) How many workers have ages between
19 and 21 years?
(b) How many workers are over 21
years of age?
Dot Plot Example: The following Dot Plot also shows the ages of the workers at Charlie’s Food Factory, but in
a different format.
(a) How many workers are 18
years old?
(b) What is the range of the ages
of workers?
(c) Would you consider this distribution to be uniform, symmetric, skewed right, or skewed left?
(d) What are some differences between a histogram and a dot plot? When would a histogram be more useful
and when would a dot plot be more useful?
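To check a graph with software, the short Python sketch below shows how the two displays are built. The ages in it are made up for illustration (they are not the Charlie's Food Factory data), the interval width of 2 is an assumption, and the function names are our own.

```python
from collections import Counter

# Hypothetical ages, for illustration only (not the worksheet's data).
ages = [17, 18, 18, 19, 19, 19, 20, 21, 21, 23]

def frequency_table(data, start, width):
    """Count how many values fall in each interval [low, low + width - 1]."""
    table = {}
    low = start
    while low <= max(data):
        high = low + width - 1
        table[(low, high)] = sum(1 for value in data if low <= value <= high)
        low += width
    return table

def dot_plot_rows(data):
    """Return one text row per value, with one mark per occurrence."""
    counts = Counter(data)
    return [f"{value}: " + "*" * counts[value]
            for value in range(min(data), max(data) + 1)]

# Histogram: one bar per interval, bar height = frequency.
histogram_counts = frequency_table(ages, start=17, width=2)
for (low, high), count in histogram_counts.items():
    print(f"{low}-{high}: {count}")

# Dot plot: one mark per data value, so individual values stay visible.
for row in dot_plot_rows(ages):
    print(row)
```

Notice that the histogram groups values into intervals, while the dot plot keeps every individual value.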
Exercise #1: The 2006 – 2007 Arlington High School Varsity Boys’ basketball team had an excellent season,
compiling a record of 15 – 5 (15 wins and 5 losses). The total points scored by the team for each of the 20
games are listed below in the order in which the games were played:
76, 55, 76, 64, 46, 91, 65, 46, 45, 53, 56, 53, 57, 67, 62, 64, 67, 52, 58, 62
(a) Complete the frequency table below.
(b) Construct the histogram below.
Exercise #2: A farm is studying the weight of baby chickens (chicks) after 1 week of growth. They find the
weight, in ounces, of 20 chicks. The weights are shown below. Construct a dot plot on the axes given.
2, 1, 3, 4, 2, 2, 3, 1, 5, 3, 4, 4, 5, 6, 3, 8, 5, 4, 6, 3
1. The test grades from 18 students at Bolderdash High are listed below.
72, 86, 95, 75, 100, 85, 87, 100, 81, 84, 78, 94, 96, 80, 100, 98, 96, 91
Complete the frequency table.
Interval      Frequency
71 - 75       ______
76 - 80       ______
81 - 85       ______
86 - 90       ______
91 - 95       ______
96 - 100      ______
Construct the histogram on the grid below.
2. The numbers below represent the number of minutes that it takes for 20 different people to eat breakfast.
0, 5, 5, 0, 7, 8, 8, 6, 0, 0, 9, 5, 7, 6, 8, 3, 0, 5, 9, 10
On the line below, construct your own scale and draw a dot plot to represent the data.
_________________________________________________________________________
3. The heights of a group of 36 people are measured to the nearest inch. The heights range from 47 inches to 76
inches. To construct a histogram showing 10 intervals, the length of each interval should be
A. 2 inches
B. 3 inches
C. 4 inches
D. 5 inches
4. A local marketing company surveyed 30 households to determine how many devices in each household
family members use to watch video (e.g., TVs, tablets, smart phones). The dot plot of the
responses is shown below.
How many households have three devices capable of showing video on them?
(1) 1
(2) 2
(3) 7
(4) 5
More households had 4 devices to watch video on than any other number. Which of the following is closest to
the percent of households that have 4 devices?
(1) 22%
(2) 34%
(3) 27%
(4) 45%
The marketing company would like to claim that the majority of households have either 3 or 4 screens capable
of showing video. Does the information displayed on the dot plot support this claim? Explain your
reasoning.
5. On a recent Precalculus quiz, Mr. Weiler found the following distribution of scores, which are arranged in 5
point intervals (with the exception of the last interval).
How many students scored in the 75 to 79 point range?
(1) 8
(2) 10
(3) 25
(4) 5
Students do not pass the quiz if they receive lower than a 70. How many students did not pass?
(1) 8
(2) 5
(3) 7
(4) 15
How many total students took the quiz?
(1) 25
(2) 104
(3) 56
(4) 93
Twenty-two students scored in the 80 to 84 range on this test. Does the histogram provide us with enough
information to conclude that a student must have scored an 82 on this test? Explain your thinking.
Name__________________________
Date_________________
11.1: Histograms and Dot Plots
Algebra I
1. The Fahrenheit temperature readings on 30 April mornings in Stormville, New York, are shown below.
41°, 58°, 61°, 54°, 49°, 46°, 52°, 58°, 67°, 43°,
47°, 60°, 52°, 58°, 48°, 44°, 59°, 66°, 62°, 55°,
44°, 49°, 62°, 61°, 59°, 54°, 57°, 58°, 63°, 60°
Using the data, complete the frequency table below.
On the grid below, construct and label a frequency histogram based on the table.
2. Ms. Romano is a health coach and nutritionist. Recently, she encouraged Matthew to eat a healthier breakfast
and recommended a cereal with less sugar. There are many different cereals and it seems like the amount of
sugar in each type varies widely. Matthew took a trip to the grocery store and recorded the sugar amount that
each cereal has in one serving.
Label the number line below using numbers to represent the amount of sugar in one serving and then construct a
dot plot.
3. The test scores for 10 students in Ms. Sampson’s homeroom were 61, 67, 81, 83, 87, 88, 89, 90, 98, and 100. Which
frequency table is accurate for this set of data?
1)
2)
3)
4)
4. The students in one social studies class were asked how many brothers and sisters (siblings) they each have.
The dot plot here shows the results.
a. How many of the students have six siblings?
b. How many of the students have no siblings?
c. How many of the students have three or more siblings?
d. Would you consider this dot plot to be uniform, symmetric, skewed left, or skewed right?
5. The accompanying histogram shows the heights of the students in Kyra’s health class.
What is the total number of students in the class?
1) 5
2) 15
3) 16
4) 209
Name_____________________________
Lesson 11.2: Measures of Central Tendency
Date______________
Algebra I
In our day to day activities, we deal with many problems that involve related items of numerical information
called data. Statistics is the study of sets of such numerical data. When we gather numerical data, besides
displaying it, we often want to know a single number that is representative of the data as a whole. We call these
types of numbers measures of central tendency. The three most common measures of central tendency are
mean, median, and mode.
The mean (or average), sometimes represented by $\bar{X}$, of a number set is found by adding all the numbers and
then dividing by the number of numbers in the set. For example, if Matthew’s grades are 80, 92, 85, 91, 95, and
88, the mean is found by using the formula:
$$\bar{X} = \frac{\text{sum}}{\text{number of items}} = \frac{80 + 92 + 85 + 91 + 95 + 88}{6} = \frac{531}{6} = 88.5$$
The median is the middle value when a set of numbers is arranged in order from least to greatest.
If there is an odd number of numbers, the median is the middle number. For the numbers 2, 5, 7, 19, 20
the median is 7.
If there is an even number of numbers, the median is the mean, or average, of the two middle numbers.
For the numbers 3, 4, 6, 7, 8, 9, the median is $\frac{6 + 7}{2} = \frac{13}{2} = 6.5$
The mode is the value that appears the most times. A set of values may have:
No mode (every number appears the same number of times)
One mode (one number appears more times than any other number)
More than one mode (several different numbers are tied for the most times repeated)
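The three measures can be checked with Python's built-in statistics module. The sketch below reuses Matthew's grades and the two median examples from the text. The helper modes is our own name, and its "no mode" rule (all values tied) follows the description above. The three short lists at the end are made up for illustration.

```python
from statistics import mean, median, multimode

grades = [80, 92, 85, 91, 95, 88]      # Matthew's grades from the text
print(mean(grades))                     # 88.5, the same as 531 / 6

print(median([2, 5, 7, 19, 20]))        # 7: odd count, the middle number
print(median([3, 4, 6, 7, 8, 9]))       # 6.5: even count, mean of 6 and 7

def modes(data):
    """Return a list of modes; an empty list means 'no mode'."""
    most_common = multimode(data)
    # If every distinct value is tied, the rule above says there is no mode.
    if len(most_common) > 1 and len(most_common) == len(set(data)):
        return []
    return most_common

# Hypothetical lists, for illustration only.
print(modes([1, 2, 3, 4]))        # no mode
print(modes([1, 2, 2, 3]))        # one mode
print(modes([1, 1, 2, 3, 3]))     # more than one mode
```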
Exercise #1: A survey was taken among 12 people on the number of passwords they currently have to
remember. The results are shown below. State the mean, median, and mode (to the nearest
tenth).
0, 1, 2, 1, 1, 3, 2, 6, 3, 3, 4, 3
Exercise #2: Students in Mr. Okafor’s algebra class were trying to determine if people speed along a certain
section of roadway. They collected speeds of 20 vehicles, as displayed in the table below.
(a) Find the mean, median and mode for this data set.
(b) The speed limit along this part of the highway is 35 mph. Based on your results from part (a), is it fair
to conclude that the average driver speeds on this roadway?
Exercise #3: Determine the mean and median, if possible. If not possible, explain why not.
When conducting a statistical study, it is not always possible to obtain information about every person or
situation to which the study applies. Unlike a census, in which every person is counted, some studies use only
a sample or portion of the items being investigated. Whenever a sample is taken, it is vital that it be fair; in
other words, the sample reflects the overall population.
Exercise #4: To determine which television programs are the most popular in a large city, a poll is conducted
by selecting a sample of people at random and interviewing them. Outside which of the following locations
would the interviewer be most likely to find a fair sample? Explain your choice and why the others are
inappropriate.
(1) A baseball stadium
(2) A concert hall
(3) A grocery store
(4) A comedy club
Exercise #5: Truong is trying to determine the average height of high school male students. Because he is on
the basketball team, he uses the heights of the 14 players on the team, which are given below in inches.
69, 70, 72, 72, 74, 74, 74, 75, 76, 76, 76, 77, 77, 82
(a) Calculate the mean and median for this data set. Round any non-integer answers to the nearest tenth.
(b) Is the data set above a fair sample to use to determine the average height of high school male students?
Explain your answer.
Data sets can have members that are far away from all of the rest of the data set. These elements are called
outliers, which can result in a mean that does not represent the true “average” of a data set.
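The effect of an outlier can be seen in a small Python sketch. Both lists are made up for illustration; the second list is the first with one value replaced by a far-away number.

```python
from statistics import mean, median

# Hypothetical values, for illustration only.
typical = [10, 11, 12, 13, 14]
with_outlier = [10, 11, 12, 13, 90]   # 14 replaced by an outlier

# Without the outlier, the mean and median agree.
print(mean(typical), median(typical))

# With the outlier, the mean is pulled upward but the median stays put.
print(mean(with_outlier), median(with_outlier))
```

Only one value changed, yet the mean more than doubled while the median did not move.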
Exercise #6: In Mr. Petrovic’s Advanced Calculus Course, eight students recently took a test. Their grades
were as follows:
45, 78, 82, 85, 87, 89, 93, 95
(a) Calculate the mean and median of this data set.
(b) What score is an outlier in this data set?
(c) Which value, the mean or the median, is a better measure of how well the average student did on Mr.
Petrovic’s test?
Exercise #7: Which statement is true about the data set 3, 4, 5, 6, 7, 7, 10?
1) mean = mode
2) mean > mode
3) mean = median
4) mean < median
Exercise #8: Mr. Taylor raised all his students’ scores on a recent test by five points. How were the mean and the
range of the scores affected?
1) The mean increased by five and the range
increased by five.
2) The mean increased by five and the range
remained the same.
3) The mean remained the same and the range
increased by five.
4) The mean remained the same and the range
remained the same.
Name______________________________
11.2: Measures of Central Tendency
Date____________________
Algebra I
1. The Student Government at Arlington High School decided to conduct a survey to determine where to go on
a senior field trip. They asked students the following question: “Would you rather go to a sports event or to an
IMAX movie?” At which of the following locations would they most likely get a fair sample?
(1) The gym, after a game
(2) The auditorium after a play
(3) A randomly chosen study hall
(4) At the Nature Club meeting.
2. For the following data set, calculate the mean and median. Any non-integer answers should be rounded to the
nearest tenth.
3, 5, 8, 8, 12, 16, 17, 20, 24
3. For the following data set, calculate the mean and median. Any non-integer answers should be rounded to the
nearest tenth.
5, 5, 9, 10, 13, 16, 18, 20, 22, 22
4. Which of the following is true about the data set {3, 5, 5, 7, 9}?
(1) median > range
(2) median = mean
(3) mean > median
(4) median > mean
5. Which of the following data sets has a median of 7.5?
(1) {6, 7, 8, 9, 10}
(2) {1, 3, 7, 10, 14}
(3) {3, 5, 7, 8, 10, 14}
(4) {2, 7, 9, 11, 14, 17}
6. A survey is taken by an insurance company to determine how many car accidents the average New York
City resident has gotten into in the past 10 years. The company surveyed 20 people who are getting off a train at
a subway station. The following table gives the results of the survey.
(a) Calculate the mean and median number of accidents of this data set. Remember, there are six 0’s in this
data set, eight 1’s, etc.
(b) Are there any outliers in this data set? If so, what data value?
(c) Which number, the mean or the median, better represents the number of accidents an average person in
this survey had over this 10 year period? Explain your answer.
(d) Does this sample fairly represent the average number of accidents a typical New York City resident
would get into over a 10 year period? Why or why not?
Name___________________________
Lesson 11.3: Box Plots, IQR, Standard Deviation & Variance
Date_____________
Algebra 1
Recall that the median of a number set divides the set into two equal parts, or halves. Quartiles are the values
that separate the data into four parts, each containing one-fourth, or 25% of the numbers.
Example: 53, 54, 55, 55, 56, 58, 59, 59, 59, 59, 60, 61, 61, 63, 64, 65, 75, 85, 95, 95
1st quartile = 57
median (2nd quartile) = 59.5
3rd quartile = 64.5
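The quartiles in the example above can be checked with a short Python sketch. It assumes the median-of-halves method: put the data in order, split it at the median, and take the median of each half (when the count is odd, the middle value is left out of both halves). This matches the values shown, but other software may use a different quartile rule and give slightly different answers. The function name quartiles is our own.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]                   # values below the median position
    upper = ordered[len(ordered) - half:]    # values above the median position
    return median(lower), median(ordered), median(upper)

example = [53, 54, 55, 55, 56, 58, 59, 59, 59, 59,
           60, 61, 61, 63, 64, 65, 75, 85, 95, 95]

q1, med, q3 = quartiles(example)
print(q1, med, q3)   # 57.0 59.5 64.5
```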
Exercise #1: Determine the 1st quartile, median, and 3rd quartile of the number set below.
53, 39, 51, 54, 40, 46, 41, 50, 49, 39, 50, 51, 39
The five-number summary consists of the minimum value, 1st quartile, median, 3rd quartile, and maximum
value.
To Calculate the Five-Number Summary on a Graphing Calculator:
Press STAT
Choose 1: Edit
Enter the data into L1
Press STAT then the Right Arrow to Calc and choose 1: 1-Var Stats
Press Enter
Scroll down to find minX, Q1, Med, Q3, maxX
Exercise #2: Use your calculator to determine the five number summary of the data set below.
42, 15, 25, 30, 42, 75, 80, 85, 65, 25, 19, 72, 77, 25
Box Plots are another way to represent data graphically. A Box Plot describes a data set using its quartiles and
min/max values.
Example:
Exercise #3: The test scores from Mrs. Gray’s math class are shown below.
72, 73, 66, 71, 82, 85, 95, 85, 86, 89, 91, 92
Construct a box-and-whisker plot to display these data.
Exercise #4: The number of songs fifteen students have on their MP3 players is:
120, 124, 132, 145, 200, 255, 260, 292, 308, 314, 342, 407, 421, 435, 452
State the values of the minimum, 1st quartile, median, 3rd quartile, and maximum. Using these
values, construct a box-and-whisker plot using an appropriate scale on the line below.
Measures of Central Tendency are good for describing the typical data value in a set. However, they do not
inform us of how much variation there is within the data set. A good way to measure variation within a set is
with Interquartile Range (IQR). The Interquartile Range (IQR) is similar to the Range of a data set. Range
is the difference between the Max and Min whereas the Interquartile Range (IQR) is the difference between
the 3rd Quartile and 1st Quartile.
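As a rough check, the Python sketch below computes both the Range and the IQR for the 20-value quartile example from earlier in this lesson. It assumes the median-of-halves quartile method, and the function name five_number_summary is our own.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    ordered = sorted(data)
    half = len(ordered) // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[len(ordered) - half:])
    return ordered[0], q1, median(ordered), q3, ordered[-1]

data = [53, 54, 55, 55, 56, 58, 59, 59, 59, 59,
        60, 61, 61, 63, 64, 65, 75, 85, 95, 95]

minimum, q1, med, q3, maximum = five_number_summary(data)

data_range = maximum - minimum   # Max minus Min
iqr = q3 - q1                    # 3rd Quartile minus 1st Quartile

print("Range:", data_range)      # 95 - 53 = 42
print("IQR:", iqr)               # 64.5 - 57 = 7.5
```

The Range is stretched by the few high values (75, 85, 95, 95), while the IQR describes only the middle half of the data.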
Exercise #5: The two data sets below each have equal means but differ in the variation within the data set.
Determine the Interquartile Range (IQR) of each data set. The IQR is defined as the difference between the
third quartile value and the first quartile value.
Data Set #1: 3, 3, 4, 4, 5, 5, 6, 6, 7, 8, 8, 9, 9, 10, 10, 11, 11
Data Set #2: 5, 5, 6, 6, 7, 7, 8, 8, 9, 9
The interquartile range gives a good measure of how spread out the data set is. But, the best measure of
variation within a data set is the standard deviation. The actual calculation of standard deviation is complex
and we will not go into it here. We will rely on our calculators for its calculation.
How to find Standard Deviation using the Graphing Calculator:
Following the same steps used to find the five number summary (shown previously in lesson) enter the
data into L1 and choose 1-Var Stats
Standard Deviation is shown by σx
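For the curious, here is a minimal Python sketch of what the calculator is doing. The data set is made up for illustration, and the function names are our own. The population version divides the sum of squared distances from the mean by n; the sample version (Sx on the calculator) divides by n − 1 instead. The results are compared with Python's built-in pstdev and stdev.

```python
from math import sqrt
from statistics import pstdev, stdev

# Hypothetical data, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

def sum_of_squared_deviations(values):
    """Add up the squared distance of each value from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def population_sd(values):
    """Population standard deviation: divide by n."""
    return sqrt(sum_of_squared_deviations(values) / len(values))

def sample_sd(values):
    """Sample standard deviation: divide by n - 1."""
    return sqrt(sum_of_squared_deviations(values) / (len(values) - 1))

print(round(population_sd(data), 1), round(pstdev(data), 1))
print(round(sample_sd(data), 1), round(stdev(data), 1))
```

The sample value is always a little larger than the population value for the same list, because it divides by a smaller number.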
Exercise #6: A farm is studying the weight of baby chickens (chicks) after 1 week of growth. They find the
weight, in ounces, of 20 chicks. The weights are shown below. Calculate the standard deviation for this data set.
Round any non-integer values to the nearest tenth. Include appropriate units in your answers.
2, 1, 3, 4, 2, 2, 3, 1, 5, 3, 4, 4, 5, 6, 3, 8, 5, 4, 6, 3
Exercise #7: A marketing company is trying to determine how much diversity there is in the age of people who
drink different soft drinks. They take a sample of people and ask them which soda they prefer. For the two
sodas, the age of those people who preferred them is given below.
Soda A: 18, 16, 22, 16, 28, 18, 21, 38, 22, 29, 25, 44, 36, 27, 40
Soda B: 25, 22, 18, 30, 27, 19, 22, 28, 25, 19, 23, 29, 26, 18, 20
(a) Explain why standard deviation is a better measure of the diversity in age than the mean.
(b) Which soda appears to have a greater diversity in the age of people who prefer it? How did you decide on
this?
(c) Use your calculator to determine the sample standard deviation, normally given as Sx, for both data sets.
Round your answers to the nearest tenth.
Exercise #8: Johnny wants to determine how many flowers are on each rose bush in a very large flower garden.
He took a sample of the rose bushes and counted the number of flowers in each rose bush that was in his
sample. His results are below.
9, 2, 5, 4, 12, 7, 8, 11, 9, 3, 7, 4, 12, 5, 4, 10, 9, 6, 9, 4
(a) If you wanted to determine the standard deviation of Johnny’s results, should you use population standard
deviation or sample standard deviation? Why?
(b) Determine the correct standard deviation.
Exercise #9: Paul has 20 rose bushes in his garden. He counted the number of flowers on each bush. His
results are below.
9, 2, 5, 4, 12, 7, 8, 11, 9, 3, 7, 4, 12, 5, 4, 10, 9, 6, 9, 4
(a) If you wanted to determine the standard deviation of Paul’s results, should you use population standard
deviation or sample standard deviation? Why?
(b) Determine the correct standard deviation.
Note: If the question does not specifically say “sample” use population standard deviation.
Exercise #10: Which of the following data sets would have a standard deviation (population) closest to zero?
Do this without your calculator. Explain how you arrived at your answer.
(1) {−5, −2, −2, 0, 1, 2, 5}
(2) {5, 8, 10, 16, 20}
(3) {11, 11, 12, 13, 13}
(4) {3, 7, 11, 11, 11, 18}
1. Using the line provided, construct a box plot for the 12 scores below.
26, 32, 19, 65, 57, 16, 28, 42, 40, 21, 38, 10
Determine the number of scores that lie above the 75th percentile.
2. The box plot below represents the math test scores of 20 students.
What percentage of the test scores are less than 72?
1) 25
2) 50
3) 75
4) 100
3.
(a) Calculate the IQR for the points scored each year.
(b) Determine the standard deviation for each year.
(c) Which year has more variation in the number of points scored? Explain your reasoning.
Name__________________________
11.3: Box Plots, IQR, Standard Deviation & Variance
Date_________________
Algebra 1
1. The box-and-whisker plot below represents the ages of 12 people.
What percentage of these people are age 15 or older?
1) 25
2) 35
3) 75
4) 85
2. What is the value of the third quartile shown on the box-and-whisker plot below?
1) 6
2) 8.5
3) 10
4) 12
3. The box-and-whisker plot below represents a set of grades in a college statistics class.
Which interval contains exactly 50% of the grades?
1) 63-88
2) 63-95
3) 75-81
4) 75-88
4. Using the line provided, construct a box plot to describe the data set below. Make sure to create your own
scale and label the line.
5, 7, 3, 6, 4, 6, 6, 5, 8, 9
5. For each of the following data sets, find the interquartile range and the population standard deviation. Show
your calculation for the IQR. Round all non-integer values to the nearest tenth.
(a) 4, 6, 8, 10, 15, 19, 22, 25
(b) 3, 3, 4, 5, 5, 6, 6, 7, 7, 8
6. For the data set shown in the dot plot below, which of the following is closest to its population standard
deviation?
(1) 2.7
(2) 4.2
(3) 3.3
(4) 5.8
7. What is the interquartile range of the data set represented in the box plot shown below?
(1) 24
(2) 14
(3) 8
(4) 12
8. Which of the following best measures the average distance that a data value lies away from the mean?
(1) mean
(2) standard deviation
(3) median
(4) range
9. Which of the following data sets would have the largest standard deviation?
(1) {3, 3, 4, 5, 5}
(2) {72, 73, 74, 75, 76}
(3) {2, 8, 18, 26, 35}
(4) {8, 10, 12, 14, 16}
10. Two surveys were done, each with 30 participants. In the first case (Survey A), the survey was random, in
the second case (Survey B), the survey only included families with at least one teenager. The dot plots of the
results are shown below.
(a) Enter the data into your calculator and use it to calculate the mean number of devices, the interquartile
range, and the standard deviation of both data sets. Round all non-integers to the nearest tenth. Remember, you
will have to enter a given data point more than once. For example, in Survey A, you will need to enter two 0’s, five 1’s, three 2’s, etc.
Survey A Statistics:
Survey B Statistics:
(b) Which of these two survey data sets had the greatest variation in the data? Explain based on the statistics
you found in part (a).
(c) How many of the 30 values in Survey B fall within one standard deviation of the mean? To do this
calculation, add the standard deviation and subtract the standard deviation from the mean and then count the
number of values between the results of this addition and subtraction.
Name_______________________
Lesson 11.4: Scatter Plots & Linear Regression
Date___________________
Algebra 1
Oftentimes, statistical studies are done where data is collected on two variables instead of one in order to
establish whether there is a correlation or relationship between the two variables. This is called a bivariate
data analysis.
A Scatter Plot relates two sets of data on a graph. Both the vertical and horizontal axes show numeric data
amounts.
Example: [Scatter plot titled “Importance of Studying”]
A Scatter Plot can also show 3 different types of Correlation: positive, negative, or no correlation.
Data that appears to have a correlation can be approximated by a line of best fit. We can frequently use the line
of best fit to make predictions. We can also use this line to determine the strength of the correlation or
relationship. When the dots lie close to the line, the correlation is stronger; when they are spread out, the
correlation is weaker.
Example:
Exercise #1: A survey was taken of 10 low and high temperatures, in Fahrenheit, in the month of April to try to
establish a relationship between a day’s low temperature and its high temperature.
(a) Construct a scatter plot of this bivariate
data set on the grid below.
(b) Draw a line of best fit through this data set.
(c) Use your line of best fit to estimate the high temperature for a day in April given that the low temperature
was 42 degrees. Illustrate your answer on your graph.
(d) Would you characterize the relationship between the low and high temperature as a positive correlation or a
negative correlation? Explain.
Frequently, lines of best fit are drawn by hand as you did above. However, we can use our calculators to find
the exact equation of the line of best fit, often referred to as Linear Regression.
Steps to find Linear Regression on Graphing Calculator:
Press STAT and then choose 1: Edit
Enter your independent variable (x) in L1
Enter your dependent variable (y) in L2
Press STAT, then press the Right Arrow to scroll over to Calc
Choose 4: LinReg(ax+b)
Press Enter
Substitute a and b into 𝒚 = 𝒂𝒙 + 𝒃
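To see where a and b come from, the Python sketch below computes them with the standard least-squares formulas. The five data points are made up for illustration (they are not the temperature data), and the function names are our own.

```python
def linear_regression(xs, ys):
    """Return slope a and y-intercept b of the least-squares line y = ax + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x        # the line passes through (mean_x, mean_y)
    return a, b

def predict(a, b, x):
    """Use the line of best fit to estimate y for a given x."""
    return a * x + b

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = linear_regression(xs, ys)
print(f"y = {a:.2f}x + {b:.2f}")   # y = 0.60x + 2.20
print(round(predict(a, b, 6), 1))   # estimate for x = 6
```

A graphing calculator's LinReg(ax+b) should report the same a and b for these five points, up to rounding.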
Exercise #2: A survey was taken of 10 low and high temperatures, in Fahrenheit, in the month of April to try to
establish a relationship between a day’s low temperature and its high temperature.
Use your calculator to find the equation for the line of best fit. Round the slope of the line to the nearest
hundredth and the y-intercept to the nearest integer.
We can also use our calculators to determine the exact strength of a correlation or relationship between two
sets of data, which is called the correlation coefficient. We use the variable r, where −𝟏 ≤ 𝒓 ≤ 𝟏, to represent
the correlation coefficient.
The strength of the correlation depends on r.
The closer r is to 1, the stronger the positive correlation
The closer r is to – 𝟏, the stronger the negative correlation
If r is 0, there is no correlation
Steps to find r using the graphing calculator
Press 2nd and then 0 to get to CATALOG
Scroll down and choose DiagnosticOn, then press Enter twice
Press STAT and then choose 1: Edit
Enter your independent variable (x) in L1
Enter your dependent variable (y) in L2
Press STAT, then press the Right Arrow to scroll over to Calc
Choose 4: LinReg(ax+b)
Press Enter (r is displayed below a and b)
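The value of r can also be computed directly. The Python sketch below uses the standard formula for the correlation coefficient; the data sets are made up for illustration, and the function name is our own.

```python
from math import sqrt

def correlation_coefficient(xs, ys):
    """Return r, a number between -1 and 1, for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, for illustration only.
scattered = correlation_coefficient([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
rising = correlation_coefficient([1, 2, 3], [2, 4, 6])     # points on a rising line
falling = correlation_coefficient([1, 2, 3], [6, 4, 2])    # points on a falling line

print(round(scattered, 2))   # positive, but not a perfect fit
print(round(rising, 2))      # perfect positive correlation
print(round(falling, 2))     # perfect negative correlation
```

Points that lie exactly on a line give r = 1 or r = −1; the more the points spread away from the line, the closer r moves toward 0.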
Exercise #3: A survey was taken of 10 low and high temperatures, in Fahrenheit, in the month of April to try to
establish a relationship between a day’s low temperature and its high temperature.
(a) Determine the correlation coefficient.
(b) Is this a strong correlation? Explain.
Exercise #4: Which of the following data sets has the strongest correlation?
Exercise #5: Given the scatter plot shown below, which of the r-values would most likely represent the
correlation between the two variables? Explain your choice.
Exercise #6: Which of the following scatter plots would have a correlation coefficient closest to −1?
Exercise #7: Generally, the mileage that a car gets changes with the weight of the car. A survey of some cars
with their weights and gas mileages is shown below.
(a) Using the grid provided, construct a scatter plot of this bivariate data.
(b) Find the equation for the line of best fit using your calculator and then graph it on your scatter plot. Round
both coefficients to the nearest tenth.
(c) What is the correlation coefficient of this data?
(d) Does this data have a positive or negative correlation? Explain.
(e) How strong is the correlation? How do you know?
(f) If a car had a weight of 4,300 pounds, what would this model predict as its gas mileage? Round to the
nearest integer. Use appropriate units and make sense of your answer.
(g) If we wanted to purchase a car that got 40 miles to a gallon, what weight of car, to the nearest 100 pounds,
should we purchase?
Thus far we have discussed correlation which is the relationship between two sets of data. Next we will talk
about causation, which is when one set of data causes another set to behave the way it does.
Note: correlation does not imply causation!
Variables can have extremely strong correlations but no causal relationship. This is often the case if there is a
third variable that causes both (known interestingly enough as a lurking variable).
Example: Studies have found a correlation between ice cream sales and murder rates.
However, there is no causal relationship. Can anyone think of what the lurking variable might be?
Exercise #8: In each of the following scenarios, two variables are given that if plotted would have a strong
correlation (a scatterplot where the data falls nearly in a line). Determine if there exists a causal relationship
between the two variables. If so, which variable causes the other?
(a) The high temperature in New York City and the
number of bottles of water sold.
(b) Divorce rate in Maine and the
consumption of margarine.
(c) A person’s weight loss and the number of hours
a person spends in the gym per week.
(d) Number of people who drowned in a
swimming pool and the number of movies
that Nicolas Cage appeared in.
Exercise #9: The table below shows the number of firefighters required to fight a given fire versus the dollar
damage done to the house by the fire.
(a) Are the data positively or negatively correlated? How can you tell?
(b) Does the number of firefighters cause the damage done to the house? If not, what hidden variable is causing
both variables to change?
Name__________________________
11.4: Scatter Plots & Linear Regression
Date____________________
Algebra I
1. A survey was done at Ketcham High School to determine the effect of time spent on studying and grade point
average. The table below shows the results for 10 students randomly selected.
(a) Create a scatter plot for this data set on the grid
provided.
(b) Use your calculator to determine the equation
of the line of best fit, then draw the line.
(c) Determine the expected GPA from studying
for 8 hours per week. Round your answer
to the nearest whole number.
(d) What is the correlation coefficient for the data? Is this a strong correlation? Explain.
(e) Is there a causal relationship between these two variables? If so, which variable causes the other?
2. A survey was done to determine if there was any connection between the price that people pay for their most
expensive car and the current value of their house. The results, for eight participants, are given below.
A computer was used to determine the line of best fit. Its equation was:
𝑦 = 12𝑥 + 33,766
(a) Use the line of best fit to predict the house value of a person whose most expensive car costs $19,500.
(b) Was the prediction in (a) an overestimate or underestimate of the actual house value? Explain.
(c) Is there a positive or negative correlation between these two variables? Explain.
(d) Is there a causal relationship between these two variables? If you answer yes, then determine which variable
causes the other. If you answer no, then explain a third variable that could be causing both.
3. We are now going to revisit our data from the homework yesterday, but with our calculator. A survey was
done at Ketcham High School to determine the effect of time spent on studying and grade point average. The
table below shows the results for 10 students randomly selected.
(a) Enter the data in your calculator and use it to generate the equation for the line of best fit. Round your slope
to the nearest tenth and round your y-intercept to the nearest integer.
(b) According to the linear regression model from part (a), what GPA, to the nearest integer, would result from
studying for 15 hours in a given week? Justify your answer.
(c) A passing average is defined as a 65% or above. Does the model predict a passing average if the student
spends no time studying in a given week? Justify your answer.
(d) For each additional hour that a student studies per week, how many points does the model predict a GPA
will rise?
4. Below there are six scatter plots, six correlation coefficients, and six terms. Match the appropriate r-value
with the scatter plot it most likely corresponds to. Then match the term you think is most appropriate to the r-value as well.
Name____________________
Lesson 11.5: Non-Linear Regression
Date________________
Algebra I
In previous lessons we fit bivariate data sets with lines of best fit which we called linear regression.
Sometimes, though, linear models are not the best choice. We can fit data with all sorts of curves, the most
common of which are linear, exponential, and quadratic. But, there are many other types. Today we are going
to focus on exponential and quadratic regression, but first let’s recall the general shapes of these two types of
functions.
Exercise #1: For each scatterplot shown below, determine if it is best fit with a linear, exponential, or quadratic
function. Draw a curve of best fit depending on your choice.
Our calculators can produce equations for exponentials of best fit and quadratics of best fit, also called
exponential or quadratic regression (along with a lot of other types of curves).
Steps to find exponential and quadratic regression on Graphing Calculator:
Enter independent variable (x) into L1
Enter dependent variable (y) into L2
Press STAT and then the Right Arrow to go over to CALC
Scroll down to find both QuadReg (for quadratic regression) and ExpReg (for exponential regression)
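As a rough idea of how an exponential of best fit can be found, the Python sketch below fits y = a(b)^x by running an ordinary line of best fit on the natural log of the y-values. This is one common approach and is an assumption here; a particular calculator may differ in its details. The data is made up for illustration, and the function name is our own. Quadratic regression is not shown.

```python
from math import exp, log

def exponential_regression(xs, ys):
    """Fit y = a * b**x by least squares on ln(y). All y-values must be positive."""
    logs = [log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_log = sum(logs) / n
    # Ordinary line of best fit for the points (x, ln y).
    slope = (sum((x - mean_x) * (l - mean_log) for x, l in zip(xs, logs))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_log - slope * mean_x
    # ln(y) = intercept + slope * x  means  y = e^intercept * (e^slope)^x
    return exp(intercept), exp(slope)

# Hypothetical data that doubles each step, for illustration only.
xs = [0, 1, 2, 3, 4]
ys = [3, 6, 12, 24, 48]

a, b = exponential_regression(xs, ys)
print(round(a, 2), round(b, 2))   # a is the starting value, b is the growth factor
```

Because this made-up data doubles exactly at each step, the fit recovers a starting value of 3 and a growth factor of 2.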
Exercise #2: Biologists are modeling the spread of flu cases as they spread around a particular city. The total
number of cases, y, was recorded each day, x, after the total first reached 16. The data for the first week is
shown in the table below.
x, days  | 0  | 1  | 3  | 4  | 6  | 7
y, cases | 16 | 18 | 22 | 25 | 33 | 35
(a) Use your calculator to find the exponential regression equation for this data set in the form $y = a(b)^x$.
Round all parameters to the nearest hundredth.
(b) Based on the regression equation, how many total cases of flu will there be after two weeks?
(c) Hospital officials will declare an emergency when the total number of cases exceeds 200. On what day will
they need to declare this emergency?
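For anyone who wants to check a calculator result on a computer, here is a minimal Python sketch of exponential regression using the flu data from the table in Exercise #2. It assumes the fit is found by ordinary least squares on the points (x, ln y), which is a common way to compute an exponential of best fit. The function name exp_reg is our own choice, and a particular calculator may compute or round slightly differently.

```python
import math

def exp_reg(xs, ys):
    """Fit y = a * b**x by least squares on (x, ln y); returns (a, b)."""
    n = len(xs)
    logs = [math.log(y) for y in ys]          # work with ln(y) so the model becomes a line
    x_bar = sum(xs) / n
    l_bar = sum(logs) / n
    slope = sum((x - x_bar) * (l - l_bar) for x, l in zip(xs, logs)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = l_bar - slope * x_bar
    # Undo the logarithm: a = e^intercept, b = e^slope
    return math.exp(intercept), math.exp(slope)

days = [0, 1, 3, 4, 6, 7]            # x values from the Exercise #2 table
cases = [16, 18, 22, 25, 33, 35]     # y values from the Exercise #2 table

a, b = exp_reg(days, cases)
print(f"y = {a:.2f}({b:.2f})^x")
```

Once a and b are known, a prediction for any day x is just a * b**x.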
So, really, regression, as mysterious as it may be, is all about finding the best version of whatever curve we
think fits the data best. In some cases, it is easiest to determine the curve that best fits the data if you actually
graph the data on a scatter plot. We have already discussed how to do this by hand, but there is also a way to do
it on your graphing calculator.
Steps to graph a scatter plot on your Graphing Calculator:
Enter independent variable (x) into L1
Enter dependent variable (y) into L2
Press 2nd and then Y= to get to STAT PLOT
Press ENTER to select 1: Plot1…Off
Select On
Press GRAPH
If the scatter plot does not fit nicely on the screen, adjust the window by pressing ZOOM and then
selecting 9: ZoomStat
Exercise #3: The cost per widget produced by a factory generally drops as more are produced but then starts to
rise again due to overtime costs and wear on the equipment. Quality control engineers recorded data on the cost
per widget compared to the number of widgets produced. Their data is shown below.
(a) Why should a quadratic model be considered for this data set as opposed to linear or exponential? Use your
calculator to create a scatterplot of this data to verify its quadratic nature.
(b) Determine the equation of the quadratic of best fit.
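A quadratic of best fit can also be sketched in Python. The example below is only an illustration: the data points are hypothetical (they lie exactly on y = 0.5x^2 - 4x + 12 and are not the widget data from this exercise), and the function name quad_reg is our own. It solves the least-squares normal equations directly, which is one standard approach and not necessarily what a calculator does internally.

```python
def quad_reg(xs, ys):
    """Least-squares fit of y = a*x**2 + b*x + c; returns (a, b, c)."""
    s = lambda p: sum(x ** p for x in xs)                       # sum of x^p
    t = lambda p: sum((x ** p) * y for x, y in zip(xs, ys))     # sum of x^p * y
    # Augmented matrix of the 3x3 normal equations.
    m = [
        [s(4), s(3), s(2), t(2)],
        [s(3), s(2), s(1), t(1)],
        [s(2), s(1), len(xs), t(0)],
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))

# Hypothetical data: cost drops, bottoms out, then rises again.
xs = [1, 2, 3, 4, 5, 6]
ys = [8.5, 6.0, 4.5, 4.0, 4.5, 6.0]

a, b, c = quad_reg(xs, ys)
print(f"y = {a:.2f}x^2 + {b:.2f}x + {c:.2f}")
```

Because these made-up points sit exactly on a parabola, the fit recovers that parabola. Real data will scatter around the curve instead.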
Name_________________________
11.5: Non-Linear Regression
Date____________________
Algebra I
1. For each scatterplot below, determine the best type of regression from: linear, exponential, or quadratic.
Draw a representative curve (line, exponential, or parabola) through the data.
2. Given the scatterplot below, which of the following equations would best model the data? Explain your
choice.
3. A marketing company is keeping track of the number of hits that a website receives on a daily basis. Their
data for the first two weeks is shown below. A scatterplot of the data is also shown.
(e) Which model will predict faster growth of website hits over time? Explain your answer.
Name_______________________
Lesson 11.6: Residuals
Date__________________
Algebra I
In previous lessons we learned about different types of regression and the correlation coefficient (or
r-value), which measures the predictability of a model. Although the r-value is an excellent measure, it does not
tell us whether a linear model makes sense for a given set of data. Today we will examine what are known as
residuals and residual graphs to determine whether or not a linear model makes sense.
The residual of a data point is defined as the difference between the observed y-value and the predicted
y-value, that is, the observed value minus the predicted value.
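As a quick illustration of the definition, here is a short Python sketch. The line y = 3x + 1 and the data point (2, 9) are hypothetical values chosen only for this example, and the function names are our own.

```python
def predict(x):
    """Predicted y-value from a hypothetical line of best fit, y = 3x + 1."""
    return 3 * x + 1

def residual(observed_y, predicted_y):
    """Residual = observed y-value minus predicted y-value."""
    return observed_y - predicted_y

point = (2, 9)                                   # hypothetical observed data point
r = residual(point[1], predict(point[0]))        # 9 - (3*2 + 1)
print(r)  # 2
```

A positive residual means the observed point lies above the line. A negative residual means it lies below the line.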
Exercise #1: A scatter plot was graphed and the line of best fit was determined to be y = 2x + 7. One data
point on the scatter plot is (3, 15). Calculate the residual for this data point.
Exercise #2: The data given below show the height (in cm) at various ages (in months) for a group of children.
Determine the equation of the line of best fit and use that to complete the table below.
Age (months)    Height (cm)    Prediction (cm)    Residual (cm)
18              76
19              77.1
20              78.1
21              78.3
22              78.8
23              79.4
24              79.9
25              81.3
26              81.1
27              82.0
28              82.6
29              83.5
It is possible to find the residuals and graph them on your graphing calculator.
Steps to find Residuals on Graphing Calculator:
Press STAT and choose 1: Edit
Enter independent variable (x) into L1
Enter dependent variable (y) into L2
Press STAT, use the Right Arrow to go to CALC, and choose 4: LinReg(ax+b) (you may turn
diagnostics on if you also want to see the correlation coefficient)
Press STAT and choose 1: Edit
Scroll over to L3 so that L3 is highlighted (you may have to press the up arrow to highlight L3)
Press 2nd and then STAT in order to get to LIST
Choose 7: RESID and press ENTER
The Residuals should appear in L3
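The same calculation can be sketched in Python using the age and height data from Exercise #2. This is only a minimal sketch: the function name lin_reg is our own, and the values are left unrounded here, so they may differ slightly from a table built with rounded coefficients.

```python
def lin_reg(xs, ys):
    """Least-squares line y = slope*x + intercept; returns (slope, intercept)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar

ages = list(range(18, 30))   # 18 through 29 months
heights = [76, 77.1, 78.1, 78.3, 78.8, 79.4, 79.9, 81.3, 81.1, 82.0, 82.6, 83.5]

slope, intercept = lin_reg(ages, heights)
predictions = [slope * x + intercept for x in ages]
residuals = [y - p for y, p in zip(heights, predictions)]   # observed minus predicted

for x, y, p, r in zip(ages, heights, predictions, residuals):
    print(f"{x:>3} {y:>6.1f} {p:>7.2f} {r:>6.2f}")
```

The residuals of a least-squares line always add up to zero (apart from rounding), which is a handy check on the table.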
Steps to Graph Residuals on Graphing Calculator:
Complete the steps above to determine the residuals
Press 2nd and Y= to get to STAT PLOT
Press ENTER to choose 1: Plot1
Turn the Plot On and scroll down to Ylist. It should say L2; press CLEAR to remove it, then press 2nd and
STAT to get to LIST and choose 7: RESID
Press GRAPH
Press ZOOM and choose 9: ZoomStat to adjust the window
Note: If the residuals graph has a pattern or trend, then a linear model is not a good fit for the data. If the
graph has no trend, then a linear model is a good fit.
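To see what a pattern in the residuals looks like, here is a small Python sketch. The data are hypothetical: the points lie exactly on the curve y = x^2, so a line is the wrong model on purpose. The function name lin_reg is our own.

```python
def lin_reg(xs, ys):
    """Least-squares line y = slope*x + intercept; returns (slope, intercept)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar

xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]            # hypothetical data that is really quadratic

slope, intercept = lin_reg(xs, ys)   # best-fit line is y = 6x - 5
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
print([round(r, 2) for r in residuals])
```

The residuals start positive, dip negative in the middle, and come back up to positive. That U shape is a clear pattern, so the linear model is not a good fit for this data.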
Exercise #3: In order to be a safe driver, there are a lot of things to consider. For example, you have to leave
enough distance between your car and the car in front of you in case you need to stop suddenly. The table
shows the braking distance for a particular car when traveling at different speeds.
Use your calculator to get the residual graph. Based on the graph, is a linear model a good fit for this data?
Exercise #4: A skydiver jumps from an airplane and an attached micro-computer records the time and speed of
the diver for the first 12 seconds of the diver’s freefall. The data is shown in the table below.
(a) Find the equation for the line of best fit for this data set. Round both coefficients to the nearest tenth.
Also determine the correlation coefficient and round it to the nearest hundredth. Based on the correlation
coefficient, characterize the fit as positive or negative and state how strong the fit is.
(b) Graph the line of best fit on your calculator and use the table to help you fill in the following table.
(c) Use your calculator to graph the residuals. Do the residuals show any distinct pattern? What does this tell
us about the data?
Exercise #5: A school district was attempting to correlate the number of hours a student studies in a given week
with their grade point average. They surveyed 8 students and found the following data.
(a) Find the equation for the line of best fit and the associated r-value. Round the linear coefficients to the
nearest tenth and the r-value to the nearest hundredth.
(b) What is the value of the residual associated with the data point (11, 94)?
(c) Produce, using your calculator, the residual graph. Next, sketch the graph in the space provided below. It
does not need to be exact, just an approximation.
(d) Why does this residual plot show a more appropriate linear model than the one in Exercise #2, even though
the r-value is worse?
Name________________________
11.6: Residuals
Date______________________
Algebra I
1. Which of the following residual plots indicates a model that is most appropriate?
2. Which of the following residual plots would indicate the linear model used to produce it was an inappropriate
choice?
3. A set of data is fit with linear regression. The equation for the best fit line was y = 5.2x + 18. If the observed
value when x = 10 was y = 62, then which of the following represents the value of the residual for this data
point?
4. Physics students are performing a lab where they allow a ball to roll down a ramp and record the distance that
it has rolled versus the time it has been rolling. The data for one such experiment are shown below.
(a) Determine the equation for the line of best fit. Round your coefficients to the nearest tenth. Also, determine
the correlation coefficient. Round it to the nearest hundredth.
(b) Using your calculator, produce a scatter plot of this data set along with the line of best fit. Do your best job
to sketch it below.
(c) Calculate the residual for the data point (2.0, 5.6).
(d) Create a graph of the residuals using your calculator. Draw a sketch of the graph below.
(e) Is the linear model appropriate for this data set given the residual plot? Explain below.
Name_________________________
Lesson 11.7: Two-Way Frequency Tables
Date__________________
Algebra I
Thus far we have talked about scatter plots, which are used to represent two sets of quantitative data, such as
distance, time, height, or area, which are expressed with numerical measurements. In this lesson we will
discuss Two-Way Frequency Tables, which are used to represent two sets of categorical or qualitative data,
such as gender, sport, flavor, color, or shape.
Example:
Now let's take a look at the vocabulary used in two-way frequency tables. Entries in the body of the table are
called joint frequencies. The cells which contain the sum of the initial counts by row and by column are called
marginal frequencies. Note that the lower right corner cell (the total of all the counts) is not labeled as a
marginal frequency.
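The vocabulary can be illustrated with a short Python sketch. The counts below are hypothetical (they are not taken from any table in this lesson), and the variable names are our own.

```python
# Hypothetical joint frequencies: rows are gender, columns are vehicle type.
joint = {
    "Male":   {"SUV": 20, "Sports car": 40},
    "Female": {"SUV": 30, "Sports car": 10},
}

# Marginal frequencies for rows: add across each row.
row_totals = {row: sum(cols.values()) for row, cols in joint.items()}

# Marginal frequencies for columns: add down each column.
col_totals = {}
for cols in joint.values():
    for name, count in cols.items():
        col_totals[name] = col_totals.get(name, 0) + count

# The lower right corner cell: the total of all the counts.
grand_total = sum(row_totals.values())

print(row_totals)    # {'Male': 60, 'Female': 40}
print(col_totals)    # {'SUV': 50, 'Sports car': 50}
print(grand_total)   # 100
```

Notice that the row totals and the column totals both add up to the same grand total.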
Exercise #1: The results of a poll of 100 adults about their favorite sport are shown in the two-way table below.
Fill in the missing information.
Exercise #2: As a class, we will fill out the following two-way table based on the hair and eye color of the
people in this classroom.
Data can be converted to relative frequencies in a two-way table so that the results can be easily compared,
especially when numbers are large.
Example: Let’s take another look at the two-way table on gender and vehicles that we saw earlier.
There are three ways to convert this data into relative frequencies.
1. Relative Frequency for whole table
2. Conditional Relative Frequencies for Rows
3. Conditional Relative Frequencies for Columns
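The three conversions can be sketched in Python as follows. The counts are hypothetical and chosen only to keep the arithmetic simple; the variable names are our own.

```python
rows = ["Male", "Female"]
cols = ["SUV", "Sports car"]
counts = [[20, 40],      # hypothetical joint frequencies for Male
          [30, 10]]      # hypothetical joint frequencies for Female

grand_total = sum(sum(r) for r in counts)                        # 100
col_totals = [sum(r[j] for r in counts) for j in range(len(cols))]

# 1. Relative frequency for the whole table: divide each count by the grand total.
whole = [[c / grand_total for c in r] for r in counts]

# 2. Conditional relative frequencies for rows: divide each count by its row total.
by_row = [[c / sum(r) for c in r] for r in counts]

# 3. Conditional relative frequencies for columns: divide each count by its column total.
by_col = [[r[j] / col_totals[j] for j in range(len(cols))] for r in counts]

for name, table in [("whole table", whole), ("by row", by_row), ("by column", by_col)]:
    print(name)
    for label, r in zip(rows, table):
        print(f"  {label:<7}", [round(v, 2) for v in r])
```

In the row version each row adds up to 1, and in the column version each column adds up to 1. In the whole-table version all of the entries together add up to 1.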
Exercise #3: Let’s see if there is a connection between eye color and hair color by using conditional relative
frequencies from an already created two-way table on eye and hair color.
(a) Based on the table above, what is the conditional relative frequency of having green eyes if you have red
hair? (This is equivalent to asking what percent of people with red hair have green eyes.)
(b) Based on the table above, what is the conditional relative frequency of having green eyes if you have black
hair?
(c) Based on the table above, does it appear that having green eyes has a dependency or at least an association
with having red hair? Explain.
(d) Based on the table above, is it more likely that a person with black hair has blue eyes or that a person with
blond hair has brown eyes? Use conditional relative frequencies to support your answer.
Exercise #4: A survey of 52 graduating seniors was conducted to determine if there was a connection between
the gender of the student and whether they were going on to college. Based on this data, what is more likely:
that someone going to college is female or that someone who is female is going to college? These may seem
like the same thing, but are quite different.
Name_________________________
11.7: Two-Way Frequency Tables
Date___________________
Algebra I
1. A survey was done to determine the relationship between gender and subject preference. A total of 56
students were surveyed to determine if they liked math, English, social studies, or science as their favorite
subject. The results were then broken down based on whether the respondent was male or female.
(a) Which of the following is closest to the joint relative frequency of being a male who likes social studies?
(1) 0.42
(2) 0.14
(3) 0.31
(4) 0.56
(b) Which of the following is the marginal relative frequency of liking math?
(1) 18/36
(2) 8/10
(3) 10/18
(4) 18/56
(c) What percent of female students liked English as their favorite subject?
(1) 20%
(2) 16%
(3) 11%
(4) 60%
(d) A person looking at this table concludes that it is more likely that a female student will like social studies
than a male student will like math. Is this correct? Justify your answer.
(e) Is it more likely that a person who likes social studies will be female or that a person who is female will like
social studies? Justify.
2. Demographers are trying to understand the association between where a person lives and how they commute
to work. They survey 100 people in three cities with the results shown below.
(a) Given that a person rides a train to work, what is the conditional relative frequency that they live in New
York?
(1) 0.25
(2) 0.63
(3) 0.49
(4) 0.82
(b) If a person lives in Los Angeles, what is the conditional relative frequency that they drive a car?
(1) 0.42
(2) 0.16
(3) 0.68
(4) 0.51
(c) Which of the following is the marginal relative frequency of walking to work?
(1) 18%
(2) 60%
(3) 25%
(4) 44%
(d) Is a person more likely to ride a train if they live in New York or if they live in Chicago? Justify your
answer.