Section 7.2: Determining the Four Characteristics of an Association

SECTION 7.2
MATH 24: PRESTATISTICS
Objectives
o Determine the shape of an association
o Determine the strength of an association
o Compute and interpret the linear correlation coefficient
o Determine whether there are any outliers and what to do with them
o Determine the four characteristics of an association in the correct order
o Explain why strong (or weak) association does not guarantee causation
Objective 1: Determine the Shape of an Association
In general, if the points of a scatterplot lie close to (or directly on) a line, we
say the variables are linearly associated and that there is a linear
association. If the points of a scatterplot lie close to (or directly on) a curve
that is not a line, we say there is a nonlinear association. If no line or curve
comes close to all the points of a scatterplot, we say there is no association.
Example: Given the following scatterplots, determine whether there is a
linear association, a nonlinear association, or no association.
(a) ___________________
(b) _________________
(c) ___________________
(d) _________________
Objective 2: Determine the Strength of an Association
If a curve or line directly passes through all of the points of a scatterplot, we say that there is an exact
association with respect to the curve or line. If a curve or line comes quite close to all the points, we say there
is a strong association with respect to the curve or line. If a curve or line comes somewhat close to all the
points, we say there is a weak association with respect to the curve or line.
Example: Determine whether the following associations are exact, strong, or weak.
Ledesma
Objective 3: Compute and Interpret the Linear Correlation Coefficient
Just as there is a way to measure the spread of a distribution for a single numerical variable by using the
standard deviation, there is a way to measure the strength of a linear association between two
numerical variables. To measure the strength of an association, we will use the linear correlation
coefficient, which we represent with the letter r. (Since it is difficult to calculate r by hand, we will use
our calculators to calculate this value for us.)
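For readers who want to see what the calculator is doing, here is a minimal Python sketch of the standard formula for r (the sum of the products of the deviations, divided by the square root of the product of the sums of squared deviations). This handout does not state the formula, so the function name `pearson_r` and the sample data are assumptions made for illustration only.

```python
import math

def pearson_r(xs, ys):
    """Linear correlation coefficient r for paired lists xs and ys."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, made up for illustration only
hours = [1, 2, 3, 4, 5]
score = [52, 60, 61, 70, 77]
print(round(pearson_r(hours, score), 3))
```

The function refuses a constant variable because the formula would divide by zero in that case.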
Properties of the Linear Correlation Coefficient
Assume r is the linear correlation coefficient for the association between two numerical variables.
Then,
o The values of r are between _____ and _____, inclusive.
o If r is ________________, then the variables are _______________ associated.
o If r is ________________, then the variables are _______________ associated.
o If r = _____ , then there is no linear association.
o The larger the value of |r|, the stronger the linear association will be.
   ▪ Two variables with a linear correlation coefficient of 0.98 will have a stronger linear association
     than two variables with a linear correlation coefficient of 0.56.
   ▪ Two variables with a linear correlation coefficient of -0.98 will have a stronger linear association
     than two variables with a linear correlation coefficient of -0.56.
o If r = ______ , then the points lie exactly on a line and there is an exact, positive, linear
  association.
o If r = ______ , then the points lie exactly on a line and there is an exact, negative, linear
  association.
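The comparison by |r| described above can be sketched in a few lines of Python. The function name `stronger_linear_association` is hypothetical; the first two calls use the coefficients from the bullets above, and the third is an extra case added for illustration.

```python
def stronger_linear_association(r1, r2):
    """Return whichever coefficient indicates the stronger linear association."""
    # Strength depends on the size of r (its absolute value), not on its sign
    return r1 if abs(r1) >= abs(r2) else r2

print(stronger_linear_association(0.98, 0.56))    # 0.98
print(stronger_linear_association(-0.98, -0.56))  # -0.98
print(stronger_linear_association(-0.98, 0.56))   # -0.98
```

The last call shows that a negative coefficient can indicate a stronger linear association than a positive one.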
Example: Match the given linear correlation coefficients with the given scatterplots. Then, describe the
strength of linear association for each scatterplot.
π‘Ÿ=0
2
π‘Ÿ = 0.6
π‘Ÿ = βˆ’1
π‘Ÿ = βˆ’0.9
π‘Ÿ=1
π‘Ÿ = 0.9
PROCEDURE: TO FIND THE LINEAR CORRELATION COEFFICIENT USING THE TI83/TI84,
1. ENTER THE VALUES OF YOUR EXPLANATORY VARIABLE IN L1 AND ENTER THE VALUES OF YOUR RESPONSE VARIABLE
IN L2.
2. PRESS |STAT| > |CALC| THEN SELECT 4:LINREG(ax+b).
o IF YOU HAVE A TI83, THEN PRESS |ENTER|.
o IF YOU HAVE A TI84, THE LEFT SCREEN WILL SHOW. FOR XLIST, TYPE IN L1 AND FOR YLIST, TYPE IN L2 AND
KEEP FREQLIST AND STORE REGEQ BLANK. THEN, GO TO CALCULATE AND PRESS |ENTER|.
[Calculator screenshot: the output value r is the linear correlation coefficient]
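For readers without a TI83/TI84, the same quantities can be computed in Python. The sketch below is not the calculator's own code; it uses the standard least-squares formulas for the line y = ax + b and the standard formula for r. The function name `linreg` and the data are assumptions for illustration, with the lists named L1 and L2 only to mirror the procedure above.

```python
import math

def linreg(xs, ys):
    """Return (a, b, r) for the least-squares line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    a = sxy / sxx                    # slope
    b = mean_y - a * mean_x          # y-intercept
    r = sxy / math.sqrt(sxx * syy)   # linear correlation coefficient
    return a, b, r

# Hypothetical data, made up for illustration only
L1 = [1, 2, 3, 4, 5]               # explanatory variable
L2 = [2.1, 3.9, 6.2, 7.8, 10.1]    # response variable
a, b, r = linreg(L1, L2)
print(f"a = {a:.3f}, b = {b:.3f}, r = {r:.3f}")
```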
Objective 4: Determine the Four Characteristics of an Association in the Correct Order
We determine the four characteristics of an association in the following order:
1. Identify all outliers.
o For outliers that stem from errors in measurement or recording, we would correct the errors if
  possible. If the errors cannot be corrected, we would remove the outliers.
o For other outliers, determine whether they should be analyzed in a separate study.
2. Determine the shape of the association (linear, nonlinear, or none).
3. Determine the strength of the association (exact, strong, or weak).
o If the association is linear, then base the strength of the association on the scatterplot
AND the linear correlation coefficient.
o If the association is nonlinear, then base the strength of the association on the
scatterplot only.
4. Determine the direction (positive, negative, or neither).
Objective 5: Explain Why a Strong (or Weak) Association Does Not Mean Causation
The scatterplot to the left compares the U.S. per-person consumption of margarine (in
pounds) and the divorce rate in Maine. If we were to calculate the linear correlation
coefficient, we would find that r = 0.99, which means there is a strong, positive, linear
association between the amount of margarine consumption and the divorce rate in
Maine. Does this mean that the amount of margarine consumed causes the divorce
rate in Maine to also increase? Of course not! All we can say is that there is an
association: as the amount of margarine consumption increases, it so happens that
the divorce rate in Maine also increases.
For another example, it turns out that as ice cream consumption increases, drowning deaths increase. In other
words, there is a positive association between ice cream consumption and drowning deaths. However, ice cream
consumption does not cause drowning deaths. So why is there a positive association? In this case, there is a lurking
variable: temperature. During the hot summer months, ice cream consumption increases, and drowning
deaths increase too, because people are more likely to go swimming when it is hot out.
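A small simulation can make the lurking-variable idea concrete. In the Python sketch below, every number is invented for illustration (it is not real ice cream or drowning data). Both variables are generated from temperature alone, and neither formula uses the other variable, yet the two still come out strongly positively associated.

```python
import random
import statistics

def pearson_r(xs, ys):
    """Linear correlation coefficient: average product of deviations over the product of SDs."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.pstdev(xs), statistics.pstdev(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (sx * sy)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic data: the coefficients and noise levels are assumptions.
temperature = [rng.uniform(50, 100) for _ in range(200)]        # lurking variable
ice_cream = [2.0 * t + rng.gauss(0, 10) for t in temperature]   # depends only on temperature
drownings = [0.1 * t + rng.gauss(0, 0.5) for t in temperature]  # depends only on temperature

r = pearson_r(ice_cream, drownings)
print(f"r between ice cream and drownings: {r:.2f}")
```

Because temperature drives both lists, r is large even though ice cream never appears in the formula for drownings.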
Example: The heights and weights of the 60 players picked in the 2014 draft for NBA basketball teams are
described by the scatterplot below.
(a) Describe the four characteristics of the association. You do not have to compute r.
(b) What does the positive association mean in this situation?
(c) The correlation coefficient is 0.72. Does this support your analysis of the strength of the association in part
(a)? Explain.
(d) Estimate the weight of the 2014 draft pick Alec Brown, who is 85 inches tall.
(e) There are 4 players with a height of 74 inches. Explain why there are only three such points.
(f) If a player had a late growth spurt, would that guarantee he would gain weight?
Example: The figure below displays a scatterplot that compares the percentage of adults who exercise with the
percentage of adults who are obese for each of the 50 states, Puerto Rico, and the District of Columbia.
(a) Explain why the red dot in the scatterplot might be considered an outlier. What does this mean in this
situation?
(b) In part (a), you analyzed a possible outlier. Describe the other three characteristics of an association. You do
not have to compute r.
Example: The governor of a certain state says parents should exercise more to set a good example for their
teenagers. The percentages of parents who exercise and the percentages of teenagers who exercise are compared
by the scatterplot below for the 40 states in which the data were available. Do the scatterplot and r support the
governor's assumption that a change in parents' exercise habits will lead to a change in their teenagers' exercise
habits?
Example: The women's 500-meter speed skating times are given below for various years. Let t be the number of
years since 1970 and let w be the winning time.
(a) Construct a scatterplot.
(b) Is there a linear association, a nonlinear association, or no association?
(c) Compute r. On the basis of the scatterplot and r, determine the strength of the association.
(d) Is the association positive, negative, or neither? What does that mean in this situation?