Wednesday, August 11 (131 minutes)

CHAPTER 3 BIVARIATE DATA
Day 0: 3.1 Describing Relationships
READ PAGES 141-149 & COMPLETE THE TWO EXAMPLES AFTER THE CHAPTER 2 TEST
***BE READY TO START DAY 1 ON TUESDAY, OCTOBER 21st***
Read 141-142
Why do we study relationships between two variables?
To make predictions and to explain phenomena
Read 143-144
• On page 143, describe how we used these principles in chapters 1-2.
• Explanatory variables also help predict changes in the response variable.
Alternate Example: Identify the explanatory and response variables
a) amount of rain; weed growth
b) winning percentage of a basketball team; attendance at games
c) resting pulse rate; amount of daily exercise
Read 144-149
A scatterplot is the graph we use to display the relationship between two quantitative variables.
How do you know which variable to put on which axis? Where do you start each axis?
Always plot the explanatory variable, if there is one, on the horizontal axis (the x-axis) of a scatterplot.
As a reminder, we usually call the explanatory variable x and the response variable y. If there is no
explanatory-response distinction, either variable can go on the horizontal axis.
The axes do not need to start at (0,0)! Choose each scale to fit the range of the data.
What is the easiest way to lose points when making a scatterplot?
Not labeling both axes with the variable names (and units) and not marking a scale on each axis.
Alternate Example: Track and Field Day! The table below shows data for 13 students in a statistics class.
Each member of the class ran a 40-yard sprint and then did a long jump (with a running start). Make a
scatterplot of the relationship between sprint time (in seconds) and long jump distance (in inches).
Sprint Time (s)    Long Jump Distance (in)
5.41               171
5.05               184
9.49               48
8.09               151
7.01               90
7.17               65
6.83               94
6.73               78
8.01               71
5.68               130
5.78               173
6.31               143
6.04               141
Day 1: Describing Relationships
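What four characteristics should you consider when interpreting a scatterplot?
• Direction: positive, negative, or neither. This describes the overall trend; it is not an absolute relationship (not every point has to follow it).
• Form: linear or nonlinear. Don't let one or two points sway you! Watch for clusters (for example, two clusters that each show a
positive association can form a negative association overall).
• Strength: how closely the points follow the form
• Outliers: points outside the overall pattern (there is no specific rule for identifying them)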
Does a strong association between two variables indicate a cause-and-effect relationship?
No! Association does not imply causation. For example:
• AP Calculus exams and global warming
• Ice cream sales and drowning deaths
• (Page 148) Other possible explanations for the manatee deaths: changes in the food supply, or a growing manatee population
with the death rate staying the same.
Alternate Example: The following scatterplot shows the amount of
carbs (in grams) and amount of fat (in grams) of beef sandwiches
from McDonald's.
(a) Describe the relationship between carbs and fat.
(b) Is there a cause-and-effect relationship between carbs and fat?
Explain.
Read 149-150: Using technology to create scatterplots
• How to use different marking symbols
HW #1: page 158 (1, 5, 7, 9, 11)
Day 2: 3.1 Correlation
Just as two distributions can have the same shape and center but different spreads, two associations can have
the same direction and form but very different strengths.
Read 150-151
What is the correlation r?
Correlation measures direction and strength of a LINEAR association between two quantitative
variables. It does not assess form!
What are some characteristics of the correlation?
• –1 ≤ r ≤ 1
• r < 0 means a negative association; r > 0 means a positive association
• r close to 0 means a weak linear association
• r close to –1 or 1 means a strong linear association
Can you determine the form of a relationship using only the correlation?
No – You should always look at a graph of the data as well.
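Is correlation a resistant measure of strength?
No. r is not resistant:
• An unusual point that follows the overall pattern tends to move r closer to –1 or 1 (a stronger association).
• An outlier that falls outside the pattern tends to move r toward 0 (a weaker association).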
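To see that r is not resistant, you can recompute it after adding a single point. Below is a Python sketch using the Track and Field data from Day 0. The two added points, (10.5 s, 15 in) and (5.0 s, 60 in), are hypothetical. They were chosen only to illustrate the two cases above: one point follows the downward pattern at an extreme x, and the other breaks the pattern.

```python
from math import sqrt

# Track and Field data from Day 0: sprint time (s) and long jump distance (in)
sprint = [5.41, 5.05, 9.49, 8.09, 7.01, 7.17, 6.83, 6.73, 8.01, 5.68, 5.78, 6.31, 6.04]
jump = [171, 184, 48, 151, 90, 65, 94, 78, 71, 130, 173, 143, 141]


def correlation(xs, ys):
    """Pearson correlation r computed from sums of deviations from the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r_original = correlation(sprint, jump)

# Hypothetical point that FOLLOWS the pattern: very slow sprint, very short jump
r_in_pattern = correlation(sprint + [10.5], jump + [15])

# Hypothetical point OUTSIDE the pattern: fast sprint, very short jump
r_out_of_pattern = correlation(sprint + [5.0], jump + [60])

print(f"original r:             {r_original:.3f}")
print(f"with in-pattern point:  {r_in_pattern:.3f}")      # closer to -1 (stronger)
print(f"with off-pattern point: {r_out_of_pattern:.3f}")  # closer to 0 (weaker)
```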
Read 152-155
Do you need to know the formula for correlation? NO! You should use your calculator to calculate r.
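You won't compute r by hand, but it helps to see what the calculator is doing. One standard form of the formula is r = (1/(n – 1)) Σ z_x·z_y. In words, r is roughly the average product of the standardized values, where the z-scores use the sample standard deviations. Here is a minimal Python sketch applied to the Track and Field data:

```python
from statistics import mean, stdev

# Track and Field data from Day 0: sprint time (s) and long jump distance (in)
sprint = [5.41, 5.05, 9.49, 8.09, 7.01, 7.17, 6.83, 6.73, 8.01, 5.68, 5.78, 6.31, 6.04]
jump = [171, 184, 48, 151, 90, 65, 94, 78, 71, 130, 173, 143, 141]


def correlation_from_z(xs, ys):
    """r = (1/(n-1)) * sum of z_x * z_y, using sample standard deviations."""
    n = len(xs)
    x_bar, s_x = mean(xs), stdev(xs)
    y_bar, s_y = mean(ys), stdev(ys)
    z_products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)]
    return sum(z_products) / (n - 1)


r = correlation_from_z(sprint, jump)
print(round(r, 3))
```

The value of r is negative, which matches the scatterplot: longer sprint times go with shorter jumps.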
Read 155-156
What are some additional facts about correlation?
Correlation makes no distinction between the explanatory and response variables
Because r uses the standardized values of the observations, r does not change when we change the units
of x, y, or both.
The correlation r itself has no unit of measure. It is just a number.
Correlation does not describe a curved relationship.
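The first two facts above can be checked numerically. This Python sketch uses the Track and Field data again. It swaps the roles of x and y, then converts the units (seconds to minutes, and inches to centimeters using 1 in = 2.54 cm). In every case, r stays the same.

```python
from math import isclose, sqrt

# Track and Field data from Day 0
sprint_s = [5.41, 5.05, 9.49, 8.09, 7.01, 7.17, 6.83, 6.73, 8.01, 5.68, 5.78, 6.31, 6.04]
jump_in = [171, 184, 48, 151, 90, 65, 94, 78, 71, 130, 173, 143, 141]


def correlation(xs, ys):
    """Pearson correlation r computed from sums of deviations from the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r_original = correlation(sprint_s, jump_in)

# Swapping explanatory and response variables does not change r
r_swapped = correlation(jump_in, sprint_s)

# Changing units (seconds -> minutes, inches -> centimeters) does not change r
sprint_min = [t / 60 for t in sprint_s]
jump_cm = [d * 2.54 for d in jump_in]
r_new_units = correlation(sprint_min, jump_cm)

print(isclose(r_original, r_swapped), isclose(r_original, r_new_units))
```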
HW #2: page 160 (15-18, 21, 27-32)
Day 3: 3.2 Least-Squares Regression
Read 164-167
What is the general form of a regression equation? What is the difference between y and ŷ?
• ŷ = a + bx
• ŷ (read “y hat”) is the predicted value of the response variable for a given value of the explanatory
variable x.
• b is the slope, the amount by which y is predicted to change when x increases by one unit.
• a is the y-intercept, the predicted value of y when x = 0.
• y is the actual (observed) value of the response variable for a given value of the explanatory variable x.
• This form is on the formula sheet for the AP exam.
Alternate Example: Used Hondas
The following scatterplot shows the number of
miles driven and the advertised
price for 11 used Honda CR-Vs
from the 2002-2006 model years. The
regression line shown on the scatterplot is
ŷ = 18773 – 0.08618x, where x = miles driven
and ŷ = predicted price (in dollars).
• Note that the y-intercept doesn't appear
on the graph. Why?
a) Interpret the slope and y-intercept of the regression line.
Miles Driven    Price ($)
22000           17998
29000           16450
35000           14998
39000           13998
45000           14599
49000           14988
55000           13599
56000           14599
69000           11998
70000           14450
86000           10998
b) Predict the price of a used CR-V with 50,000 miles.
c) Predict the price of a used CR-V with 250,000 miles. How confident are you in this prediction?
What is extrapolation? Is it a good idea to extrapolate?
Extrapolation is the use of a regression line for prediction far outside the interval of values of the
explanatory variable x used to obtain the line. Such predictions are often not accurate and therefore it
is not a good idea to extrapolate.
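Technology gives the least-squares line directly. You can also apply the standard formulas b = Σ(x – x̄)(y – ȳ) / Σ(x – x̄)² and a = ȳ – b·x̄ yourself. This Python sketch fits the Used Hondas data, which should reproduce ŷ = 18773 – 0.08618x up to rounding. It then makes the predictions from parts (b) and (c). The 250,000-mile prediction comes out negative, which shows why extrapolation is risky.

```python
# Used Honda CR-V data: miles driven (x) and advertised price in dollars (y)
miles = [22000, 29000, 35000, 39000, 45000, 49000, 55000, 56000, 69000, 70000, 86000]
price = [17998, 16450, 14998, 13998, 14599, 14988, 13599, 14599, 11998, 14450, 10998]


def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


a, b = least_squares(miles, price)
print(f"y-hat = {a:.0f} + ({b:.5f})x")

# 50,000 miles lies inside the observed range (22,000 to 86,000 miles)
print(round(a + b * 50000))

# 250,000 miles is far outside the observed range: extrapolation
print(round(a + b * 250000))  # a negative "price", which is impossible
```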
Alternate Example: Using the Track and Field data from
earlier, the equation of the least-squares regression line is
ŷ = 305 – 27.6x, where ŷ = predicted long jump distance (in)
and x = sprint time (s).
a) Interpret the slope.
b) Does it make sense to interpret the y-intercept? Explain.
Read 168-171
What is a residual? How do you interpret a residual?
• Residuals are the vertical deviations of the points from the regression line. They represent the “leftover” variation in the
response variable after fitting the regression line.
• Residual = Actual – Predicted (AP), that is, residual = y – ŷ
• Connection with standard deviation and z-scores
• Interpretation must give both the size and the direction of the error (the line overpredicts/underpredicts the actual y value by this much).
Calculate and interpret the residual for the Honda CR-V with 86,000 miles and an asking price of $10,998.
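Here is a minimal Python sketch of the residual calculation, using the regression line given in the Used Hondas example. The function name predicted_price is only for illustration.

```python
# Regression line from the Used Hondas example: y-hat = 18773 - 0.08618x
def predicted_price(miles):
    """Predicted asking price (dollars) for a CR-V with the given mileage."""
    return 18773 - 0.08618 * miles


actual = 10998
predicted = predicted_price(86000)
residual = actual - predicted  # residual = actual - predicted

print(round(predicted, 2), round(residual, 2))
```

A negative residual means the line overpredicts the asking price of this CR-V.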
How can we determine the “best” regression line for a set of data?
The line that makes the sum of the squared residuals as small as possible (the least-squares regression line, LSRL).
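Two facts behind this answer can be checked numerically. First, nudging the slope or intercept of the least-squares line can only increase the sum of squared residuals. Second, any line through (x̄, ȳ) has residuals that add to zero, so "smallest sum of residuals" alone cannot pick out one line. This Python sketch uses the Track and Field data:

```python
# Track and Field data from Day 0
sprint = [5.41, 5.05, 9.49, 8.09, 7.01, 7.17, 6.83, 6.73, 8.01, 5.68, 5.78, 6.31, 6.04]
jump = [171, 184, 48, 151, 90, 65, 94, 78, 71, 130, 173, 143, 141]


def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def residuals(xs, ys, a, b):
    """Residual = actual - predicted for each point."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def sum_of_squares(xs, ys, a, b):
    """Sum of squared residuals for the line y-hat = a + b*x."""
    return sum(r ** 2 for r in residuals(xs, ys, a, b))


a, b = least_squares(sprint, jump)
best = sum_of_squares(sprint, jump, a, b)

# Nudging the intercept or the slope makes the sum of squared residuals larger
for da, db in [(5, 0), (-5, 0), (0, 1), (0, -1)]:
    print(sum_of_squares(sprint, jump, a + da, b + db) > best)

# A flat line at y-bar also has residuals summing to (about) 0, yet fits much worse
y_bar = sum(jump) / len(jump)
print(abs(sum(residuals(sprint, jump, y_bar, 0))) < 1e-9)
print(sum_of_squares(sprint, jump, y_bar, 0) > best)
```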
Is the least-squares regression line resistant to outliers?
• Absolutely not. The LSRL is not resistant; it can be strongly influenced by outliers.
Alternate Example: McDonald's Beef Sandwiches.
Carbs (g)    Fat (g)
31           9
33           12
34           23
37           19
40           26
40           42
45           29
37           24
38           28
(a) Calculate the equation of the least-squares regression line using technology. Make sure to define variables!
Sketch the scatterplot with the graph of the least-squares regression line.
(b) Interpret the slope and y-intercept in context.
(c) Calculate and interpret the residual for the Big Mac, with 45g of carbs and 29g of fat.
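Any technology that fits a least-squares line should give the same answer. This Python sketch uses the standard formulas b = Σ(x – x̄)(y – ȳ)/Σ(x – x̄)² and a = ȳ – b·x̄, with x = carbs (g) and ŷ = predicted fat (g):

```python
# McDonald's beef sandwich data
carbs = [31, 33, 34, 37, 40, 40, 45, 37, 38]
fat = [9, 12, 23, 19, 26, 42, 29, 24, 28]


def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


a, b = least_squares(carbs, fat)
print(f"predicted fat = {a:.2f} + {b:.3f}(carbs)")

# Big Mac: 45 g of carbs, 29 g of fat
big_mac_predicted = a + b * 45
big_mac_residual = 29 - big_mac_predicted  # actual - predicted
print(round(big_mac_residual, 2))
```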
HW #3: page 191 (35, 37, 39, 41, 45)
Day 4: 3.2 More about Residuals
Read 174-178
What is a residual plot? What is the purpose of a residual plot?
A residual plot is a scatterplot of the residuals against the explanatory variable x
The purpose of a residual plot is to investigate FORM!
It helps us assess how well a regression line fits the data.
What two things do you look for in a residual plot? How can you tell if a linear model is appropriate?
• Leftover patterns
• Size of the residuals
• A linear model is appropriate if the residual plot shows no leftover pattern: the residuals scatter randomly in a horizontal
“tube” around the residual = 0 line.
Construct and interpret a residual plot for the Honda CR-V data.
What is the standard deviation of the residuals? How do you calculate and interpret it?
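s measures the typical size of the prediction errors: how far “off” the predicted values are from the actual values, on average.
s = √(Σ residual² / (n – 2)). Note that 1-Var Stats on the residual list divides by n – 1 instead of n – 2, so its Sx is slightly smaller than s.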
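This Python sketch compares s, computed with n – 2, against a 1-Var Stats style sample standard deviation of the residuals (statistics.stdev, which divides by n – 1), using the Honda CR-V data:

```python
from math import sqrt
from statistics import stdev

# Used Honda CR-V data: miles driven and advertised price (dollars)
miles = [22000, 29000, 35000, 39000, 45000, 49000, 55000, 56000, 69000, 70000, 86000]
price = [17998, 16450, 14998, 13998, 14599, 14988, 13599, 14599, 11998, 14450, 10998]


def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


a, b = least_squares(miles, price)
resid = [y - (a + b * x) for x, y in zip(miles, price)]
n = len(resid)

# Standard deviation of the residuals: divide the sum of squared residuals by n - 2
s = sqrt(sum(r ** 2 for r in resid) / (n - 2))

# What 1-Var Stats reports (Sx) divides by n - 1, so it is a bit smaller than s
sx_one_var = stdev(resid)

print(f"s = {s:.1f} dollars, 1-Var Stats Sx = {sx_one_var:.1f} dollars")
```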
Calculate and interpret the standard deviation of the residuals for the Honda CR-V data.
Suppose that you see a used Honda CR-V for sale. Predict the asking price for this CR-V.
Read 179-181
What is the coefficient of determination r²? How do you calculate and interpret r²?
r² is another numerical quantity that tells us how well the least-squares regression line predicts values
of the response variable y.
Interpret: ____% (or proportion) of the variation in (response variable) is explained by the linear model
relating (response variable) to (explanatory variable).
How is r² related to r? How is r² related to s?
• r² is literally r times r (the square of the correlation).
• r² and s both measure how well the least-squares regression line models the data (how much scatter
there is around the least-squares regression line).
• s is measured in the units of the response variable; r² is a proportion between 0 and 1 with no units.
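Here is a quick numerical check of the first bullet, using the McDonald's beef sandwich data. The sketch computes r² as 1 – (sum of squared residuals)/(sum of squared deviations of y from ȳ), which is the "proportion of variation explained" idea. This gives the same number as squaring r.

```python
from math import sqrt

# McDonald's beef sandwich data: carbs (g) and fat (g)
carbs = [31, 33, 34, 37, 40, 40, 45, 37, 38]
fat = [9, 12, 23, 19, 26, 42, 29, 24, 28]

n = len(carbs)
x_bar = sum(carbs) / n
y_bar = sum(fat) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(carbs, fat))
sxx = sum((x - x_bar) ** 2 for x in carbs)
syy = sum((y - y_bar) ** 2 for y in fat)

# Least-squares line
b = sxy / sxx
a = y_bar - b * x_bar

# r-squared as the proportion of variation in fat explained by the line
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(carbs, fat))  # leftover variation
r_squared = 1 - sse / syy  # syy is the total variation of fat around its mean

# Correlation computed separately, then squared
r = sxy / sqrt(sxx * syy)
print(round(r_squared, 3), round(r ** 2, 3))
```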
Using the McDonald's beef sandwich data:
a) Construct and interpret a residual plot.
b) Calculate and interpret the standard deviation of the residuals.
c) Calculate and interpret r².
HW #4 page 191 (53, 55, 57, 61)
Day 5: Chapter 3 Review (so far)
Using the top ten money winners from the 2009 LPGA Tour, we can
investigate the relationship between average driving distance and driving
accuracy using a scatterplot. Here we will use average driving distance
(in yards) as the explanatory variable and driving accuracy (proportion of
drives that land in the fairway) as the response variable.
Player              Average Driving Distance (yd)   Driving Accuracy
Jiyai Shin          246.8                           0.824
Lorena Ochoa        265.2                           0.718
Ai Miyazato         254.3                           0.757
Cristie Kerr        263.7                           0.714
Na Yeon Choi        255.5                           0.733
Suzann Pettersen    268.1                           0.660
Yani Tseng          269.2                           0.654
In-Kyung Kim        249.3                           0.748
Paula Creamer       248.6                           0.811
Anna Nordqvist      245.7                           0.770

1. Explain why average driving distance should be the explanatory
variable.
2. Draw a scatterplot for this association and discuss the noticeable
features.
3. Calculate the equation of the least-squares regression line and graph it
on the scatterplot.
4. Interpret the slope and y-intercept in the context of the problem.
5. Calculate and interpret the value of the correlation coefficient.
6. If the distance were measured in feet instead of yards, how would the correlation change? Explain.
7. Calculate and interpret the residual for Lorena Ochoa.
8. Sketch the residual plot. What information does this provide?
9. Calculate and interpret the value of s in the context of the problem.
10. Calculate and interpret the value of r² in the context of the problem.
11. If you were to use the percentage of shots that land in the fairway instead of the proportion of shots that
land in the fairway, how would the values of r² and s change?
12. Predict the driving accuracy for an LPGA golfer with an average driving distance of 300 yards. How
confident are you in your prediction? Explain.
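Questions 6 and 11 ask what happens when units change. One way to check your reasoning is to refit the line with rescaled data. This Python sketch uses distance in feet (3 × yards) and accuracy as a percentage (100 × proportion), then compares r, r², and s before and after the change.

```python
from math import isclose, sqrt

# Top ten 2009 LPGA money winners
distance_yd = [246.8, 265.2, 254.3, 263.7, 255.5, 268.1, 269.2, 249.3, 248.6, 245.7]
accuracy = [0.824, 0.718, 0.757, 0.714, 0.733, 0.660, 0.654, 0.748, 0.811, 0.770]


def summary(xs, ys):
    """Return (r, r_squared, s) for the least-squares line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    r = sxy / sqrt(sxx * syy)
    return r, r ** 2, sqrt(sse / (n - 2))


r_yd, rsq_prop, s_prop = summary(distance_yd, accuracy)
r_ft, _, _ = summary([3 * d for d in distance_yd], accuracy)
_, rsq_pct, s_pct = summary(distance_yd, [100 * p for p in accuracy])

print(isclose(r_yd, r_ft))            # r does not depend on the units of x
print(isclose(rsq_prop, rsq_pct))     # r-squared does not depend on the units of y
print(isclose(s_pct, 100 * s_prop))   # s is in the units of y, so it scales by 100
```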
HW #6 page 193 (54, 56, 58, 59, 60)
Day 6: 3.2 Computer Output/ Regression Wisdom
Read 181-183
Alternate Example: Mentos and Diet Coke
When Mentos are dropped into a newly opened bottle of Diet Coke, carbon dioxide is released from the Diet
Coke very rapidly, causing the Diet Coke to be expelled from the bottle. Will more Diet Coke be expelled
when there is a larger number of Mentos dropped in the bottle? Two statistics students, Brittany and Allie,
decided to find out. Using 16-ounce (2-cup) bottles of Diet Coke, they dropped either 2, 3, 4, or 5 Mentos into a
randomly selected bottle, waited for the fizzing to die down, and measured the number of cups remaining in the
bottle. Then, they subtracted this measurement from the original amount in the bottle to calculate the amount of
Diet Coke expelled (in cups). Output from a regression analysis is shown below.
[Graphs: scatterplot of Amount Expelled (cups) vs. Mentos, and residual plot of Residual vs. Fitted Value]
Predictor    Coef       SE Coef    T        P
Constant     1.00208    0.04511    22.21    0.000
Mentos       0.07083    0.01228    5.77     0.000
S = 0.0672442    R-Sq = 60.2%    R-Sq(adj) = 58.4%
(a) What is the equation of the least-squares
regression line? Define any variables you use.
(b) Interpret the slope of the least-squares regression line.
(c) What is the correlation?
(d) Is a linear model appropriate for this data? Explain.
(e) Calculate and interpret the residual for the bottle of Diet Coke that had 2 Mentos and lost 1.25 cups.
(f) Interpret the values of r² and s.
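Everything needed for parts (c) and (e) can be read from the computer output. In this Python sketch, the coefficients and R-Sq are copied from the output. Because the slope is positive, r is the positive square root of R-Sq.

```python
from math import sqrt

# Values read from the regression output for the Mentos and Diet Coke data
intercept = 1.00208  # Constant
slope = 0.07083      # coefficient of Mentos
r_sq = 0.602         # R-Sq = 60.2%


def predicted_expelled(mentos):
    """Predicted amount of Diet Coke expelled (cups) for a number of Mentos."""
    return intercept + slope * mentos


residual = 1.25 - predicted_expelled(2)  # actual - predicted
r = sqrt(r_sq)  # positive root, since the slope is positive
print(round(residual, 3), round(r, 3))
```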
Read 183-188
Does it matter which variable is x and which is y?
• Not for correlation, yes for regression.
What should you always do before calculating the correlation or least-squares regression line?
• Look at the scatterplot!!!!!
How do outliers affect the correlation, least-squares regression line, and standard deviation of the residuals?
Are all outliers influential?
Effect of OUTLIERS on:   CORRELATION   |   LSRL   |   s
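Not all outliers are influential. Points with extreme x values tend to have the most pull on the least-squares line. This Python sketch uses the Track and Field data and adds one of two hypothetical points, then compares the slopes. The first point has an unusual y value but an x value near the mean sprint time. It barely changes the slope, though it still pulls the line up and increases s. The second point has an extreme x value and does not follow the pattern.

```python
# Track and Field data from Day 0
sprint = [5.41, 5.05, 9.49, 8.09, 7.01, 7.17, 6.83, 6.73, 8.01, 5.68, 5.78, 6.31, 6.04]
jump = [171, 184, 48, 151, 90, 65, 94, 78, 71, 130, 173, 143, 141]


def lsrl_slope(xs, ys):
    """Least-squares slope b = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


b_original = lsrl_slope(sprint, jump)

# Hypothetical outlier in y, with x close to the mean sprint time (about 6.74 s)
b_y_outlier = lsrl_slope(sprint + [6.74], jump + [250])

# Hypothetical point with an extreme x that does not follow the pattern
b_high_leverage = lsrl_slope(sprint + [11.0], jump + [200])

print(round(b_original, 1), round(b_y_outlier, 1), round(b_high_leverage, 1))
```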
Here is a scatterplot showing the cost in dollars and the battery
life in hours for a sample of netbooks (small laptop
computers). What effect do the two netbooks that cost $500
have on the equation of the least-squares regression line,
correlation, standard deviation of the residuals, and r²?
Explain.
Here is a scatterplot showing the relationship between the number
of fouls and the number of points scored for NBA players in the
2010-2011 season.
a) Describe the association.
b) Should NBA players commit more fouls if they want to score more points? Explain.
HW #7 page 194 (63, 65, 71-78)
Day 7: 3.2 Regression to the Mean
Read 172-174
How can you calculate the equation of the least-squares regression line using summary statistics?
These are on the formula sheet!
What happens to the predicted value of y for each increase of 1 standard deviation in x?
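The formula-sheet versions are b = r(s_y/s_x) and a = ȳ – b·x̄. This Python sketch computes the Track and Field line from summary statistics alone, which should agree with ŷ = 305 – 27.6x from earlier. It then checks that moving x up by one standard deviation changes ŷ by r standard deviations of y. Since |r| < 1, the predicted y is fewer standard deviations from its mean than x is; this is the idea behind regression to the mean.

```python
from math import isclose
from statistics import mean, stdev

# Track and Field data from Day 0
sprint = [5.41, 5.05, 9.49, 8.09, 7.01, 7.17, 6.83, 6.73, 8.01, 5.68, 5.78, 6.31, 6.04]
jump = [171, 184, 48, 151, 90, 65, 94, 78, 71, 130, 173, 143, 141]

# Summary statistics
n = len(sprint)
x_bar, s_x = mean(sprint), stdev(sprint)
y_bar, s_y = mean(jump), stdev(jump)
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(sprint, jump)) / (n - 1)

# Slope and intercept from summary statistics (formula-sheet form)
b = r * s_y / s_x
a = y_bar - b * x_bar


def predict(x):
    """Predicted long jump distance (in) for a sprint time x (s)."""
    return a + b * x


# Moving x up one standard deviation changes y-hat by r standard deviations of y
change_in_sd_units = (predict(x_bar + s_x) - predict(x_bar)) / s_y
print(round(a), round(b, 1), isclose(change_in_sd_units, r))
```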
HW #8 page 198 Chapter 3 Review Exercises
Day 8: Review & QUIZ
Day 9: Review & FRAPPY
MONDAY, NOVEMBER 3rd: CHAPTER 3 TEST
TUESDAY, NOVEMBER 4th: Cumulative Review
WEDNESDAY, NOVEMBER 5th: TEST (CHAPTERS 1-4) No Retest Available