Chapter 3-12 Notes TPS5e

Monday, February 13: 3.1 Scatterplots
Candy Grab Activity
Read 143–144
Why do we study relationships between two variables?
What is the difference between an explanatory variable and a response variable?
Read 145–149
How do you know which variable to put on which axis of a scatterplot? Where do you start each axis?
What is the easiest way to lose points when making a scatterplot? (xkcd.com/833)
What four characteristics should you consider when interpreting a scatterplot?
The following scatterplot shows the amount of sodium (in
milligrams) and amount of fat (in grams) in salads from
McDonald’s (with no dressing). Describe the relationship
between sodium and fat.
HW #16 p159 (1, 2, 3, 4, 5, 7, 9, 11)
Tuesday, February 14: 3.1 Correlation
Read 150–151
What is the correlation r?
What are some characteristics of the correlation?
Which association is more linear: one with r = 0.50 or one with r = 0.90?
Is correlation a resistant measure of strength?
Read 152–157
How do you calculate correlation?
Here are data on the lengths in centimeters of the femur (a leg bone) and the humerus (a bone in the
upper arm) for the five specimens of archaeopteryx that preserve both bones:
Femur (x):    38  56  59  64  74
Humerus (y):  41  63  70  72  84
(a) Make a scatterplot. Describe what you see.
(b) Find the correlation for these data. Does it agree with your description in part (a)?
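One way to check part (b) by hand is to standardize each variable and average the products of the z-scores. The sketch below assumes the formula r = (1/(n − 1)) Σ z_x z_y; the function names are illustrative, not from the textbook.

```python
from math import sqrt

femur = [38, 56, 59, 64, 74]      # x, in cm
humerus = [41, 63, 70, 72, 84]    # y, in cm

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(x, y):
    """r = (1 / (n - 1)) * sum of (z-score of x) * (z-score of y)."""
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = sample_sd(x), sample_sd(y)
    products = [((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
                for xi, yi in zip(x, y)]
    return sum(products) / (len(x) - 1)

r = correlation(femur, humerus)
print(f"r = {r:.3f}")
```

The result is about 0.994, which matches a strong, positive, linear pattern in the scatterplot.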
What are some additional facts about correlation?
HW #16: page 159 (15–18, 21–24)
Wednesday, February 15: 3.2 Least-Squares Regression Lines
Read 164–168
What is the general form of a regression equation? What is the difference between y and ŷ?
Don’t you hate it when you open a can of soda and some of
the contents spray out of the can? Two AP Statistics
students, Kerry and Danielle, wanted to investigate if tapping
on a can of soda would reduce the amount of soda expelled
after the can has been shaken. For their experiment, they
vigorously shook 40 cans of soda and randomly assigned
each can to be tapped for 0 seconds, 4 seconds, 8 seconds, or
12 seconds. Then, after opening the can and cleaning up the
mess, the students measured the amount of soda left in each
can (in ml). Here is a scatterplot with the least-squares
regression line
Predict the amount remaining for a can that has been tapped 10 seconds.
Predict the amount remaining for a can that has been tapped for 60 seconds. How confident are you in
this prediction?
What is extrapolation? Is it a good idea to extrapolate?
Interpret the slope and y intercept of the regression line.
Read 168–172
What is a residual? How do you interpret a residual?
Calculate and interpret the residual for the can that was tapped for 4 seconds and had 260 ml of soda
remaining.
Here are data on some of McDonald’s beef sandwiches:
x = Carbs (g):  31  33  34  37  40  40  45  37  38
y = Fat (g):     9  12  23  19  26  42  29  24  28
How can we determine the “best” regression line for a set of data?
Is the least-squares regression line resistant to outliers?
Calculate the equation of the least-squares regression line using technology. Make sure to define
variables! Sketch the scatterplot with the graph of the least-squares regression line.
Interpret the slope and y-intercept in context.
Calculate and interpret the residual for the Big Mac, with 45g of carbs and 29g of fat.
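If a calculator is not handy, the least-squares line for the sandwich data can be sketched in a few lines of Python. This assumes the usual formulas, slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope · x̄; the function name is illustrative.

```python
carbs = [31, 33, 34, 37, 40, 40, 45, 37, 38]   # x = carbs (g)
fat = [9, 12, 23, 19, 26, 42, 29, 24, 28]      # y = fat (g)

def least_squares(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

intercept, slope = least_squares(carbs, fat)

# Residual = actual y - predicted y, here for the Big Mac (45 g carbs, 29 g fat).
predicted_big_mac = intercept + slope * 45
residual_big_mac = 29 - predicted_big_mac

print(f"predicted fat = {intercept:.2f} + {slope:.3f}(carbs)")
print(f"Big Mac residual = {residual_big_mac:.2f} g")
```

The slope comes out near 1.71 and the Big Mac residual is negative (roughly −7.9 g), so the Big Mac has less fat than the line predicts for a sandwich with 45 g of carbs.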
HW #17: page 163 (27–32), page 193 (39–42, 45, 46)
Friday, February 17: 3.2 Assessing a Regression Line
Read 172–176
What is a residual plot? What is the purpose of a residual plot?
What do you look for in a residual plot? How can you tell if a linear model is appropriate?
Construct and interpret a residual plot for the Candy Grab data.
Read page 177–181
In the Candy Grab activity, we started with only one variable: number of Starburst grabbed. To predict
how many candies a randomly selected student would grab, we decided it would be easier if we used
another variable to help us make the prediction (hand span) rather than just the mean number of
Starburst. How much better will our predictions be if we use hand span to predict number of Starburst
rather than just the mean number of Starburst? We can answer this question two ways: s and r².
What is the standard deviation of the residuals s? How do you calculate and interpret it?
For the candy grab data, the standard deviation of the residuals is s = _______. Interpret this value.
Instead of comparing the standard deviations, we can also compare the sum of squared residuals using
each model.
How do you interpret r², the coefficient of determination?
How is r² related to r? How is r² related to s?
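The Candy Grab data are not reproduced in these notes, so the sketch below borrows the McDonald’s beef sandwich data from the earlier lesson (an assumption made only for illustration). It compares the sum of squared residuals from two models: predicting the mean for everyone, and predicting with the least-squares line. It assumes s = sqrt(SSE / (n − 2)) and r² = 1 − SSE/SST.

```python
from math import sqrt

carbs = [31, 33, 34, 37, 40, 40, 45, 37, 38]   # x = carbs (g)
fat = [9, 12, 23, 19, 26, 42, 29, 24, 28]      # y = fat (g)
n = len(carbs)

x_bar = sum(carbs) / n
y_bar = sum(fat) / n
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(carbs, fat))
         / sum((x - x_bar) ** 2 for x in carbs))
intercept = y_bar - slope * x_bar

# Model 1: ignore x and predict the mean of y every time.
sst = sum((y - y_bar) ** 2 for y in fat)

# Model 2: predict with the least-squares regression line.
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(carbs, fat))

s = sqrt(sse / (n - 2))      # typical size of a residual
r_squared = 1 - sse / sst    # fraction of the variation in y accounted for by the line

print(f"SST = {sst:.1f}, SSE = {sse:.1f}")
print(f"s = {s:.2f} g, r^2 = {r_squared:.3f}")
```

Both s and r² are built from the same SSE, which is one way to see how they are related: a smaller SSE gives a smaller s and a larger r².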
HW #18: page 194 (47, 49, 51, 56, 57)
Monday, February 20: Interpreting Computer Output, Regression to the Mean
(90 min)
Read 181–182
When Mentos are dropped into a newly opened bottle of Diet Coke, carbon dioxide is released from the
Diet Coke very rapidly, causing the Diet Coke to be expelled from the bottle. Will more Diet Coke be
expelled when there is a larger number of Mentos dropped in the bottle? Two statistics students,
Brittany and Allie, decided to find out. Using 16 ounce (2 cup) bottles of Diet Coke, they dropped either
2, 3, 4, or 5 Mentos into a randomly selected bottle, waited for the fizzing to die down, and measured the
number of cups remaining in the bottle. Then, they subtracted this measurement from the original
amount in the bottle to calculate the amount of Diet Coke expelled (in cups).
[Figures: scatterplot of Amount Expelled (cups) versus Mentos, and residual plot of Residual versus Fitted Value]
Predictor  Coef     SE Coef  T      P
Constant   1.00208  0.04511  22.21  0.000
Mentos     0.07083  0.01228  5.77   0.000
S = 0.0672442   R-Sq = 60.2%   R-Sq(adj)
(a) What is the equation of the least-squares regression line? Define any variables you use.
(b) Interpret the slope of the least-squares regression line.
(c) What is the correlation?
(d) Is a linear model appropriate for this data? Explain.
(e) Would you be willing to use the linear model to predict the amount of Diet Coke expelled when 10
Mentos are used? Explain.
(f) Calculate and interpret the residual for the bottle of Diet Coke that had 2 Mentos and lost 1.25 cups.
(g) Interpret the values of r² and s.
(h) If the amount expelled was measured in ounces instead of cups, how would the values of r² and s be
affected? Explain.
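The arithmetic behind parts (c), (f), and (h) can be sketched directly from the computer output. The only values used are the ones printed in the output; the conversion of 8 ounces per cup follows from the 16-ounce (2 cup) bottles described above.

```python
from math import copysign, sqrt

# Values read from the computer output.
intercept = 1.00208    # Constant
slope = 0.07083        # Mentos
s = 0.0672442          # standard deviation of the residuals (cups)
r_squared = 0.602      # R-Sq = 60.2%

# (c) The correlation is the square root of r^2, with the same sign as the slope.
r = copysign(sqrt(r_squared), slope)

# (f) Residual = actual - predicted for the bottle with 2 Mentos that lost 1.25 cups.
predicted = intercept + slope * 2
residual = 1.25 - predicted

# (h) Changing units from cups to ounces rescales s but leaves r^2 unchanged.
OUNCES_PER_CUP = 8
s_in_ounces = s * OUNCES_PER_CUP
r_squared_in_ounces = r_squared

print(f"r = {r:.3f}")
print(f"predicted = {predicted:.5f} cups, residual = {residual:.5f} cups")
print(f"s = {s_in_ounces:.3f} ounces, r^2 = {r_squared_in_ounces:.3f}")
```

This gives r ≈ 0.776 and a residual of about 0.106 cups, meaning this bottle expelled about 0.106 cups more than the line predicts for 2 Mentos.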
Read 182–185
How can you calculate the equation of the least-squares regression line using summary statistics?
What happens to the predicted value of y for each increase of 1 standard deviation in x?
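A short sketch of the summary-statistics route, assuming the formulas slope = r · (s_y / s_x) and intercept = ȳ − slope · x̄. The archaeopteryx data from the correlation lesson are reused here only to have real numbers to plug in.

```python
from statistics import mean, stdev

femur = [38, 56, 59, 64, 74]      # x, in cm
humerus = [41, 63, 70, 72, 84]    # y, in cm
n = len(femur)

# Summary statistics: means, sample standard deviations, and correlation.
x_bar, y_bar = mean(femur), mean(humerus)
s_x, s_y = stdev(femur), stdev(humerus)
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(femur, humerus)) / (n - 1)

# Least-squares line from the summary statistics alone.
slope = r * s_y / s_x
intercept = y_bar - slope * x_bar

# An increase of 1 standard deviation in x changes the predicted y
# by slope * s_x, which equals r standard deviations of y.
change_per_sd_of_x = slope * s_x

print(f"predicted humerus = {intercept:.2f} + {slope:.4f}(femur)")
print(f"change in predicted y per 1 SD of x = {change_per_sd_of_x:.2f} = r * s_y")
```

Because |r| is at most 1, the predicted y moves by at most one standard deviation of y for each standard deviation of x, which is the idea behind regression to the mean.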
What is regression to the mean?
HW #19 page 195 (59*, 61, 63, 65, 71–78)
*output below for ebook users
Predictor       Coef    SE Coef  T      P
Constant        266.07  52.15    5.10   0.004
Breeding pairs  -6.650  1.736    -3.83  0.012
S = 7.76227   R-Sq = 74.6%   R-Sq(adj) = 69.5%
Tuesday, February 21: Putting it all Together: Regression and Correlation / QUIZ
(90 min)
Read 185–191
Does it matter which variable is x and which is y?
Which of the following has the greatest correlation?
How do outliers affect the correlation, least-squares regression line, and standard deviation of the
residuals? Are all outliers influential?
Here is a scatterplot showing the cost in dollars and the battery
life in hours for a sample of netbooks (small laptop computers).
What effect do the two netbooks that cost $500 have on the
equation of the least-squares regression line, correlation,
standard deviation of the residuals, and r²? Explain.
Here is a scatterplot showing the relationship between the number
of fouls and the number of points scored for NBA players in the
2010-2011 season.
a) Describe the association.
b) Should NBA players commit more fouls if they want to score
more points? Explain.
HW #20 page 202 Chapter 3 Review Exercises
Monday, February 27: Chapter 12 Introduction
Many people believe that students learn better if they sit closer to the front of the classroom. Does sitting
closer cause higher achievement, or do better students simply choose to sit in the front? To investigate,
an AP Statistics teacher randomly assigned students to seat locations in his classroom for a particular
chapter and recorded the test score for each student at the end of the chapter. The explanatory variable in
this experiment is which row the student was assigned (Row 1 is closest to the front and Row 7 is the
farthest away). Do these data provide convincing evidence that sitting closer causes students to get
higher grades?
Row 1: 76, 77, 94, 99
Row 2: 83, 85, 74, 79
Row 3: 90, 88, 68, 78
Row 4: 94, 72, 101, 70, 79
Row 5: 76, 65, 90, 67, 96
Row 6: 88, 79, 90, 83
Row 7: 79, 76, 77, 63
[Figure: scatterplot of Score (60 to 100) versus Row (1 to 7)]
Predictor  Coef     SE Coef  T      P
Constant   85.706   4.239    20.22  0.000
Row        -1.1171  0.9472   -1.18  0.248
S = 10.0673   R-Sq = 4.7%   R-Sq(adj) = 1.3%
1. Describe the association shown in the scatterplot.
2. Using the computer output, determine the equation of the least-squares regression line.
3. Calculate the value of the correlation.
4. Calculate and interpret the residual for the student who sat in Row 1 and scored 76.
5. Interpret the slope of the least-squares regression line.
6. Interpret the standard deviation of the residuals.
7. Interpret the value of r².
8. Explain why it was important to randomly assign the students to seats rather than letting each student
choose his or her own seat.
9. What other variables did the teacher control? What is the purpose of keeping these variables the
same?
10. Does the negative slope provide convincing evidence that sitting closer causes higher achievement,
or is it plausible that the association is due to the chance variation in the random assignment? Let’s
do a simulation to find out!
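One possible version of the simulation in question 10, sketched in Python. It assumes that "chance variation in the random assignment" is mimicked by shuffling the 30 scores among the 30 seats many times and recomputing the slope each time; the seed and the number of trials are arbitrary choices, not part of the original activity.

```python
import random

scores_by_row = {
    1: [76, 77, 94, 99],
    2: [83, 85, 74, 79],
    3: [90, 88, 68, 78],
    4: [94, 72, 101, 70, 79],
    5: [76, 65, 90, 67, 96],
    6: [88, 79, 90, 83],
    7: [79, 76, 77, 63],
}
rows = [row for row, scores in scores_by_row.items() for _ in scores]
scores = [score for row_scores in scores_by_row.values() for score in row_scores]

def slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

observed_slope = slope(rows, scores)

random.seed(12)          # arbitrary seed so the run is repeatable
trials = 5000
shuffled = scores[:]
count_as_extreme = 0
for _ in range(trials):
    random.shuffle(shuffled)                      # re-deal scores to seats at random
    if slope(rows, shuffled) <= observed_slope:   # slope at least as negative as observed
        count_as_extreme += 1

p_value = count_as_extreme / trials
print(f"observed slope = {observed_slope:.4f}")
print(f"proportion of shuffles with a slope this negative or more: {p_value:.3f}")
```

The observed slope matches the −1.1171 in the computer output. If a sizable share of the shuffled slopes are at least that negative, chance alone is a plausible explanation; for comparison, half of the two-sided P-value in the output is 0.124.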
Ask me about the Flash Cards!
Tuesday, February 28: 12.1 Significance Tests for β
Read 742–746
What is the regression model when the conditions for inference are met?
Suppose that the true regression line for the seating chart study is ŷ = 87 – 1x with σ = 10 and that the
regression model is valid. What is the probability that someone sitting in the second row will get at least
90 on the test?
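Under the regression model, scores for students in a given row follow a Normal distribution centered on the true regression line with standard deviation σ. A minimal sketch of the calculation for Row 2, using only the values stated in the question:

```python
from statistics import NormalDist

# True regression line from the question: mean score = 87 - 1(row), with sigma = 10.
row = 2
mean_score = 87 - 1 * row    # mean score for students in Row 2
sigma = 10

# P(score >= 90) for a student in Row 2.
prob_at_least_90 = 1 - NormalDist(mu=mean_score, sigma=sigma).cdf(90)

print(f"mean for row {row} = {mean_score}")
print(f"P(score >= 90) = {prob_at_least_90:.4f}")
```

The z-score is (90 − 85)/10 = 0.5, so the probability is about 0.3085.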
What are the conditions for regression inference? How do we check them?
Read 753–757
What is the standard error of the slope? Is this on the formula sheet? How do you interpret it?
For the seating chart experiment, state and interpret the standard error of the slope.
What is the standardized test statistic for a significance test for the slope? Is this formula on the formula
sheet? What degrees of freedom should you use?
What is the evidence for Ha and the two explanations in the crying and IQ example?
Do customers who stay longer at buffets give larger tips? Charlotte, an AP
statistics student who worked at an Asian buffet, decided to investigate this
question for her second semester project. While she was doing her job as a hostess,
she obtained a random sample of receipts, which included the length of time (in
minutes) the party was in the restaurant and the amount of the tip (in dollars).
Output from a linear regression analysis on these data is shown below.
Predictor       Coef     SE Coef  T     P
Constant        4.535    1.657    2.74  0.021
Time (minutes)  0.03013  0.02448  1.23  0.247
S = 1.77931   R-Sq = 13.2%   R-Sq(adj) = 4.5%
Time (minutes):  23    39    44    55    61    65    67    70    74    85    90    99
Tip (dollars):   5.00  2.75  7.75  5.00  7.00  8.88  9.01  5.00  7.29  7.50  6.00  6.50
(a) What is the equation of the least-squares regression line for predicting the amount of the tip from the
length of the stay? Define any variables you use.
(b) Interpret the slope and y intercept of the least-squares regression line in context.
(c) Do these data provide convincing evidence that customers who stay longer give larger tips?
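The pieces of the test in part (c) can be rebuilt from the raw data. This sketch assumes the usual formulas SE_b = s / sqrt(Σ(x − x̄)²) and t = (b − 0) / SE_b with df = n − 2. The P-value is not computed here; the one-sided value is taken as half of the two-sided P in the output.

```python
from math import sqrt

time = [23, 39, 44, 55, 61, 65, 67, 70, 74, 85, 90, 99]          # minutes
tip = [5.00, 2.75, 7.75, 5.00, 7.00, 8.88, 9.01, 5.00, 7.29,
       7.50, 6.00, 6.50]                                          # dollars
n = len(time)

x_bar = sum(time) / n
y_bar = sum(tip) / n
sxx = sum((x - x_bar) ** 2 for x in time)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(time, tip)) / sxx
intercept = y_bar - slope * x_bar

# Standard deviation of the residuals and standard error of the slope.
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(time, tip))
s = sqrt(sse / (n - 2))
se_slope = s / sqrt(sxx)

# Test statistic for H0: beta = 0 versus Ha: beta > 0.
t_statistic = (slope - 0) / se_slope
df = n - 2

# The output reports a two-sided P of 0.247; the one-sided P is half of that.
one_sided_p = 0.247 / 2

print(f"b = {slope:.5f}, SE_b = {se_slope:.5f}")
print(f"t = {t_statistic:.2f} with df = {df}, one-sided P = {one_sided_p:.4f}")
```

These values agree with the computer output (b = 0.03013, SE = 0.02448, t = 1.23). A one-sided P-value of about 0.12 is larger than common significance levels such as 0.05.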
Can you use your calculator to conduct a test for the slope?
What is p-hacking? http://fivethirtyeight.com/features/science-isnt-broken/
HW #21: page 761 (1, 2, 13, 15)
Wednesday, March 1: Confidence Intervals for β
(half-day)
Read 747–751
What is the formula for constructing a confidence interval for a slope? Where do you get the value of
t*? How many degrees of freedom should you use?
For their second-semester project, two AP Statistics students decided to investigate
the effect of sugar on the life of cut flowers. They went to the local grocery store and
randomly selected 12 carnations. All the carnations seemed equally healthy when they
were selected. When the students got home, they prepared 12 identical vases with
exactly the same amount of water in each vase. They put one tablespoon of sugar in 3
vases, two tablespoons of sugar in 3 vases, and three tablespoons of sugar in 3 vases.
In the remaining 3 vases, they put no sugar. After the vases were prepared and placed
in the same location, the students randomly assigned one flower to each vase and
observed how many hours each flower continued to look fresh. Here are the data and
computer output.
Predictor    Coef     SE Coef  T      P
Constant     181.200  3.635    49.84  0.000
Sugar (tbs)  15.200   1.943    7.82   0.000
S = 7.52596   R-Sq = 86.0%   R-Sq(adj) = 84.5%
Sugar (tbs.):       0    0    0    1    1    1    2    2    2    3    3    3
Freshness (hours):  168  180  192  192  204  204  204  210  210  222  228  234
Construct and interpret a 99% confidence interval for the slope of the true regression line.
Can you use your calculator for the DO step?
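A sketch of the DO step done by hand, assuming the interval b ± t* · SE_b with df = n − 2. The slope and standard error come from the computer output; t* = 3.169 is the table value for 99% confidence with 10 degrees of freedom.

```python
# Values from the computer output.
slope = 15.200       # Sugar (tbs) coefficient
se_slope = 1.943     # SE Coef for Sugar
n = 12               # number of carnations

df = n - 2
t_star = 3.169       # t table: 99% confidence, df = 10

margin_of_error = t_star * se_slope
lower = slope - margin_of_error
upper = slope + margin_of_error

print(f"df = {df}")
print(f"99% CI for the slope: ({lower:.2f}, {upper:.2f}) hours per tablespoon")
```

The interval is about (9.04, 21.36) hours of freshness per additional tablespoon of sugar.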
HW #22: page 759 (3–11 odd, 17, 19–24)
Thursday, March 3: Chapters 3/12.1 Review
A random sample of 11 high schools was selected from Michigan. The percent of students who qualify
for free/reduced lunch and the average SAT math score of each high school in the sample were recorded.
Here are a scatterplot with the least-squares regression line, a residual plot, and some computer output.
Predictor    Coef    SE Coef  T      P
Constant     577.9   12.5     46.16  0.000
Percent FRL  -1.993  0.276    -7.22  0.000
S = 23.3168   R-Sq = 85.29%   R-Sq(adj) = 83.66%
(a) Describe the association in the scatterplot.
(b) What is the equation of the least-squares regression line that describes the relationship between
percent free/reduced lunch and average SAT math score? Define any variables that you use.
(c) Interpret the slope and y intercept of the least-squares regression line.
(d) Calculate and interpret the residual for East Kentwood High School, with 58% FRL and 490 SAT.
(e) By about how much do the actual average SAT math scores typically vary from the values predicted
by the least-squares regression line with x = percent free/reduced lunch?
(f) Interpret the value of r².
(g) Find the correlation.
(h) What effect does the school with 19% FRL and 545 SAT have on the correlation? The standard
deviation of the residuals? The equation of the least-squares regression line? Explain.
(i) Do these data provide convincing evidence that schools with a greater percentage of students that
qualify for free/reduced lunch have smaller average SAT scores?
(j) Construct and interpret a 95% confidence interval for the true slope of the least-squares regression
line relating average SAT score to percent of students that qualify for free/reduced lunch. Is this
interval consistent with your decision in the previous question? Explain.
HW #23: page 795 (1–4), page 797 (1, 3–8, 11)
Monday, March 6: Chapters 3/12.1 Review
HW #24: page 203 Chapter 3 AP Practice Test (1–13)
Tuesday, March 7: Chapters 3/12.1 Test
Wednesday, March 8: 12.2 Transformations to Achieve Linearity—Power Models
Read 765–771
When associations are non-linear, what two approaches can we take to model the associations? Which
approach will we use?
What is a power model? What are some examples of power models?
What does a country’s income per person (measured in gross domestic
product per person, adjusted for purchasing power) say about the
under-5 child mortality rate (per 1000 live births) in that country? Here
are the data for a random sample of 14 countries in 2009 (data from
www.gapminder.org).
(a) Sketch a scatterplot and describe why it would not be appropriate to
compute a least-squares regression line.
Country      Income per person  Under-5 mortality rate
Switzerland  38,004             4.4
Timor-Leste  2476               56.4
Uganda       1202               127.5
Ghana        1383               68.5
Peru         7859               21.3
Cambodia     1831               87.5
Suriname     8199               26.3
Armenia      4523               21.6
Sweden       32,021             2.8
Niger        643                160.3
Serbia       10,005             7.1
Kenya        1494               84
Fiji         4016               17.6
Grenada      8827               14.5
(b) What kind of power model might be appropriate to use for these data?
(c) Use an appropriate power transformation to make the association linear. Sketch the resulting
scatterplot, calculate the equation of the least-squares regression line, and sketch the residual plot.
(d) Predict the child mortality rate for the United States, which has an income per person of 41,256.
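One candidate for parts (b) through (d) is a reciprocal model, y = a + b(1/x). The exponent −1 is an assumption made for this sketch; the notes do not say which power to use. The function name is illustrative.

```python
from math import sqrt

income = [38004, 2476, 1202, 1383, 7859, 1831, 8199, 4523,
          32021, 643, 10005, 1494, 4016, 8827]
mortality = [4.4, 56.4, 127.5, 68.5, 21.3, 87.5, 26.3, 21.6,
             2.8, 160.3, 7.1, 84, 17.6, 14.5]

def fit_line(x, y):
    """Return (intercept, slope, r) for the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope, sxy / sqrt(sxx * syy)

# Power transformation (assumed p = -1): regress mortality on 1/income.
reciprocal_income = [1 / x for x in income]
intercept, slope, r = fit_line(reciprocal_income, mortality)

# Prediction for the United States (income per person 41,256).
us_prediction = intercept + slope * (1 / 41256)

print(f"predicted mortality = {intercept:.2f} + {slope:.0f}(1/income)")
print(f"r = {r:.3f}, US prediction = {us_prediction:.1f} per 1000 live births")
```

A residual plot of the transformed data is still needed to judge whether this transformation really straightens the pattern; a high correlation alone does not settle it.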
HW #25 page 785 (31–34)
**NOTE: On 33 and 34, make sure to use BOTH models.
Friday, March 10: Transforming with Logs: Power and Exponential Models
Read 771–774
Besides using power transformations, how can you linearize an association that follows a power model in
the form y = ax^p?
(e) Refer to the child mortality example. Use an appropriate logarithmic transformation to make the
association linear. Sketch the resulting scatterplot, calculate the equation of the least-squares regression
line, and sketch the residual plot.
(f) Predict the child mortality rate for the United States, which has an income per person of 41,256.
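If y = ax^p, then ln(y) = ln(a) + p·ln(x), so a plot of ln(y) versus ln(x) should be roughly linear. The sketch below fits that line to the child mortality data and back-transforms the prediction with e^(...). The function name is illustrative.

```python
from math import exp, log

income = [38004, 2476, 1202, 1383, 7859, 1831, 8199, 4523,
          32021, 643, 10005, 1494, 4016, 8827]
mortality = [4.4, 56.4, 127.5, 68.5, 21.3, 87.5, 26.3, 21.6,
             2.8, 160.3, 7.1, 84, 17.6, 14.5]

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Log transformation of both variables (natural log).
ln_income = [log(x) for x in income]
ln_mortality = [log(y) for y in mortality]
intercept, slope = fit_line(ln_income, ln_mortality)

# Predict ln(mortality) for the United States, then undo the log.
us_ln_prediction = intercept + slope * log(41256)
us_prediction = exp(us_ln_prediction)

print(f"predicted ln(mortality) = {intercept:.3f} + {slope:.3f} ln(income)")
print(f"US prediction = {us_prediction:.1f} per 1000 live births")
```

The fitted slope estimates the power p in y = ax^p, and e raised to the intercept estimates a.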
Read 774–782
What is an exponential model? What are some examples of exponential models?
How can you linearize a relationship that follows an exponential model?
In the April 1, 2011 edition of the Arizona Daily Star, the following data were
presented about mandatory fees at the University of Arizona.
(a) Letting x = 4 represent 2004-05, graph a scatterplot. Does the relationship look
linear?
Year
Fees
2004-05 89
2005-06 93
2006-07 160
2007-08 213
2008-09 257
2009-10 302
2010-11 623
2011-12 913
(b) Sketch a scatterplot of ln(fees) vs. year, calculate the equation of the least-squares regression line,
and sketch a residual plot.
(c) Could a power model be better? Sketch a scatterplot of ln(fees) vs. ln(year), calculate the equation of
the least-squares regression line, and sketch a residual plot.
(d) Based on your answers to (b) and (c), would an exponential model or a power model be more
appropriate for this data? Explain.
(e) Use the model you chose in part (d) to predict the fees for the 2012–13 school year. Do you expect
this prediction to be too low, too high, or just about right? Explain.
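A sketch of parts (b), (c), and (e) for the fee data, with x = 4 for 2004-05 through x = 11 for 2011-12 as in part (a). It compares r² for the two transformed fits as a rough numerical guide only; the decision in part (d) should rest on the residual plots. The function name is illustrative.

```python
from math import exp, log

year = [4, 5, 6, 7, 8, 9, 10, 11]                  # x = 4 represents 2004-05
fees = [89, 93, 160, 213, 257, 302, 623, 913]

def fit_line(x, y):
    """Return (intercept, slope, r_squared) for the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope, sxy ** 2 / (sxx * syy)

ln_fees = [log(f) for f in fees]
ln_year = [log(x) for x in year]

# (b) Exponential model: ln(fees) versus year.
exp_intercept, exp_slope, exp_r2 = fit_line(year, ln_fees)

# (c) Power model: ln(fees) versus ln(year).
pow_intercept, pow_slope, pow_r2 = fit_line(ln_year, ln_fees)

# (e) Prediction for 2012-13 (x = 12) from the exponential model.
fees_2012_13 = exp(exp_intercept + exp_slope * 12)

print(f"exponential: ln(fees) = {exp_intercept:.3f} + {exp_slope:.4f}(year), r^2 = {exp_r2:.3f}")
print(f"power:       ln(fees) = {pow_intercept:.3f} + {pow_slope:.3f} ln(year), r^2 = {pow_r2:.3f}")
print(f"exponential prediction for 2012-13: {fees_2012_13:.0f}")
```

Printing the residuals from each fit, or plotting them against x, is the natural next step for answering part (d) and for judging whether the 2012–13 prediction is likely to be too low or too high.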
HW #26: page 787 (35–41 odd)
Monday, March 20: Prop 301 Test
(sub)
HW #27 Article and Questions
Tuesday, March 21: Chapter 12.2 Review
A student opened a bag of M&M’s, dumped them out, and ate all the ones
with the M on top. When he finished, he put the remaining 30 M&M’s back
in the bag and repeated the same process over and over until all the M&M’s
were gone. Here is a table showing the number of M&M’s remaining at the
end of each “course”.
(a) Sketch a scatterplot of this data. Does the relationship look like it follows
a linear, exponential, or power model?
Course:           1   2   3   4   5   6   7
M&M’s remaining:  30  13  10  3   2   1   0
(b) A scatterplot of the natural log of the number of M&M’s remaining versus course number is shown
below. The last observation in the table is not included since ln(0) is undefined. Explain why it would
be reasonable to use an exponential model to describe the relationship between the number of M&M’s
remaining and the course number.
[Figure: scatterplot of lnRemaining versus Course]
(c) Minitab output from a linear regression analysis on the transformed data is shown below. Give the
equation of the least-squares regression line, defining any variables you use.
Predictor  Coef      SE Coef  T       P
Constant   4.0593    0.1852   21.92   0.000
Course     -0.68073  0.04755  -14.32  0.000
S = 0.198897   R-Sq = 98.1%   R-Sq(adj) = 97.6%
(d) Use your model from part (c) to predict the original number of M&M’s in the bag.
(e) Calculate a 95% confidence interval for the slope of the least-squares regression line using the
transformed data. Assume all the conditions for inference have been met.
(f) Super-fun bonus question! In theory, the number of M&M’s remaining after each course should be
half of the number left after the previous course. Is your confidence interval in part (e) consistent with
this theory?
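A sketch of the arithmetic for parts (d) through (f), using the Minitab output. It assumes the "original number" corresponds to course 0, uses n = 6 transformed points (so df = 4 and t* = 2.776 from the t table for 95% confidence), and treats "half remaining each course" as a slope of ln(0.5) on the log scale.

```python
from math import exp, log

# Values from the Minitab output for ln(remaining) versus course.
intercept = 4.0593
slope = -0.68073
se_slope = 0.04755
n = 6                    # course 7 is excluded because ln(0) is undefined

# (d) Predicted number at course 0 (assumed to be the original bag).
predicted_original = exp(intercept + slope * 0)

# (e) 95% confidence interval for the slope.
df = n - 2
t_star = 2.776           # t table: 95% confidence, df = 4
margin_of_error = t_star * se_slope
lower = slope - margin_of_error
upper = slope + margin_of_error

# (f) If half remain after each course, ln(remaining) drops by ln(2) per course.
theoretical_slope = log(0.5)
consistent_with_theory = lower <= theoretical_slope <= upper

print(f"predicted original number = {predicted_original:.1f}")
print(f"95% CI for slope: ({lower:.4f}, {upper:.4f})")
print(f"ln(0.5) = {theoretical_slope:.4f}, inside interval: {consistent_with_theory}")
```

The prediction is about 57.9 M&M’s, and the interval of roughly (−0.813, −0.549) contains ln(0.5) ≈ −0.693, so the data are consistent with the halving theory.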
HW #28: page 789 (43, 45, 47–50), page 796 (5, 6)
Wednesday, March 22: Chapter 12.2 Review / FRAPPY
Friday, March 24: Quiz on 12.2