Chapters 10 and 12
Part III: Correlation and Regression

Context
• Chapter 10: Regression
• Chapter 12: The Regression Line
In the last chapter, we learned about the SD line: it goes through the point of averages and has slope SDy/SDx or −SDy/SDx, depending on the sign of r. Now we will discuss another line: the regression line.

Predicting a value
The histogram below shows the heights of 1078 men. We pick one man at random and have to guess his height. What is our best guess?
[Figure: histogram of the heights of 1078 men; density per unit against height (inches)]
As the histogram approximately follows the normal curve, our best guess is the average.
In regression, we have two correlated variables. If we know the value of one variable, we can use this knowledge to make better predictions about the value of the other variable.

The regression line
Definition
The regression line for y on x estimates the average value of y corresponding to each value of x. Associated with an increase of 1 SD in x, there is an increase of only r SDs in y, on average.
[Figure: scatter diagram of son's height (inches) against father's height (inches)]
The regression method
Definition
This way of using the correlation coefficient to estimate the average value of y for each value of x is called the regression method, and the resulting value of y is called the regression estimate.

Example 1: the height of father and son
[Figure: scatter diagram of son's height (inches) against father's height (inches)]
The figure shows the heights of 1078 pairs of fathers and sons. The summary statistics are
• average height of father ≈ 68 in (Avgx)
• SD of father's height ≈ 2.7 in (SDx)
• average height of son ≈ 69 in (Avgy)
• SD of son's height ≈ 2.8 in (SDy)
• r = 0.50
The vertical strip represents fathers who are around 1 SDx above average in height (SD line in red, regression line in blue).
[Figure: scatter diagram of son's height (inches) against father's height (inches), with the vertical strip]
Histogram showing the heights of sons whose fathers are around 1 SDx above average in height (value of SD line in red, value of regression line in blue).
[Figure: histogram of son's height (inches), density per unit]
The average height of these sons is only a fraction of 1 SDy above the overall average height of sons.
This is where the correlation coefficient r = 0.5 comes in. Associated with an increase of 1 SDx in height of fathers, there is an increase of only 0.5 SDy in height of sons, on average.
Specifically, take fathers who are 1 SDx above average:
average of fathers + 1 × SDx = 68 in + 1 × 2.7 in = 70.7 in
The average height of their sons will be approximately
average of sons + r × 1 × SDy = 69 in + 0.5 × 1 × 2.8 in = 70.4 in
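A minimal Python sketch of the regression method for Example 1, using only the summary statistics above. The function name `regression_estimate` is our own illustrative choice, not something from the slides; for comparison the sketch also computes the value the SD line would give (Avgy + 1 × SDy).

```python
def regression_estimate(x, avg_x, sd_x, avg_y, sd_y, r):
    """Regression estimate of the average y for a given x."""
    z_x = (x - avg_x) / sd_x      # x in standard units
    z_y = r * z_x                 # only r SDs in y per SD in x
    return avg_y + z_y * sd_y     # back to original units


# Summary statistics from Example 1 (heights in inches)
AVG_X, SD_X = 68.0, 2.7   # fathers
AVG_Y, SD_Y = 69.0, 2.8   # sons
R = 0.5

father = AVG_X + 1 * SD_X          # a father 1 SD above average
sd_line = AVG_Y + 1 * SD_Y         # what the SD line would predict
estimate = regression_estimate(father, AVG_X, SD_X, AVG_Y, SD_Y, R)
print(f"father {father:.1f} in -> SD line {sd_line:.1f} in, regression {estimate:.1f} in")
```

The regression estimate (70.4 in) sits below the SD-line value (71.8 in), because r = 0.5 shrinks the 1 SD step in x to half an SD in y.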
The regression line goes through the point of averages: fathers of average height should also have sons of average height.
All the points with coordinates (father's height, estimate for son's height) fall on the regression line.

Example 2: math SAT scores and first-year GPAs
• SAT score: average = 550, SD = 80
• first-year GPA: average = 2.6, SD = 0.6
• r = 0.4
• The scatter diagram is football-shaped.
A student is chosen at random. Predict his/her first-year GPA.
Solution: our best guess is the average GPA, 2.6.
A student is chosen at random and has SAT score 650. Predict her/his first-year GPA.
Solution: this student is (650 − 550)/80 = 1.25 SDs above average on the SAT. So the regression estimate for her GPA is
2.6 + 0.4 × 1.25 × 0.6 = 2.9
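Both questions of Example 2 can be sketched as one small Python function. The name `predict_gpa` and the optional-argument design are illustrative choices, not from the slides: with no SAT score it falls back on the average GPA, and with a score it returns the regression estimate.

```python
from typing import Optional

# Example 2 summary statistics (SAT score -> first-year GPA)
SAT_AVG, SAT_SD = 550, 80
GPA_AVG, GPA_SD = 2.6, 0.6
R = 0.4


def predict_gpa(sat: Optional[float] = None) -> float:
    """Best guess for a student's first-year GPA.

    Without an SAT score, the best guess is the average GPA.
    With one, use the regression estimate: r SDs of GPA per SD of SAT.
    """
    if sat is None:
        return GPA_AVG
    z_sat = (sat - SAT_AVG) / SAT_SD      # SDs above average on the SAT
    return GPA_AVG + R * z_sat * GPA_SD


print(predict_gpa())               # no information: the average
print(round(predict_gpa(650), 2))  # SAT score 650: regression estimate
```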
Graph of averages
Definition
The graph of averages shows the average y-value for each given x-value.
How do we do this?
1. Start with the original data.
2. Round each of the father's heights to the nearest inch.
3. For each (rounded) value of the father's height, find the average of all the corresponding sons' heights.
4. Plot the regression line, a smoothed version of the graph of averages.
[Figure: scatter diagram of son's height (inches) against father's height (inches), with the graph of averages]
The regression line is a smoothed version of this graph. If the graph of averages follows a straight line, that line is the regression line.

How do we find the regression line?
The regression line is the line that goes through the point of the averages and has slope
slope = r × SDy / SDx.
A point (x, y) on the line will thus always fulfill
value of y in standard units = r × value of x in standard units,
or
zy = r × zx.
Like any other line, the regression line also has a standard equation
y = slope × x + intercept.
The standard equation for the regression line is
y = slope × x + intercept.
We know that the slope of the regression line is
slope = r × SDy / SDx.
We also know that the regression line goes through the point of the averages. That is, the point (Avgx, Avgy) is on the regression line. Therefore,
intercept = Avgy − slope × Avgx.
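The two formulas above translate directly into a short Python sketch; the helper name `regression_line` is hypothetical. Applied to the father-son statistics of Example 1, the line passes through the point of averages and reproduces the earlier regression estimate of 70.4 in for a father of 70.7 in.

```python
def regression_line(avg_x, sd_x, avg_y, sd_y, r):
    """Return (slope, intercept) of the regression line for y on x."""
    slope = r * sd_y / sd_x
    intercept = avg_y - slope * avg_x   # the line passes through (Avgx, Avgy)
    return slope, intercept


# Example 1 summary statistics (inches)
slope, intercept = regression_line(avg_x=68, sd_x=2.7, avg_y=69, sd_y=2.8, r=0.5)
print(f"y = {slope:.3f} x + {intercept:.2f}")
print(f"estimate at x = 70.7: {slope * 70.7 + intercept:.1f}")
```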
How do we find a regression estimate?
We now have two methods, which both give the same regression estimate for y for a given value x. (Assume we also have Avgx, Avgy, SDx, SDy, and r.)
Regression method 1
Step 1: Convert x to standard units zx.
Step 2: Compute zy = r × zx.
Step 3: Convert zy back to original units y.
Regression method 2
Step 1: Find the slope of the regression line.
Step 2: Find the intercept of the regression line.
Step 3: Find y = slope × x + intercept.

Example 3
HANES study: height and weight of 988 men age 18-24
• Height: average = 70 inches, SD = 3 inches
• Weight: average = 162 pounds, SD = 30 pounds
• Correlation coefficient r = 0.47
Estimate the average weight of men that are 73 inches tall.
Let's first look at the solution if we use regression method 1. Here, height is the x value and weight is the y value.
Solution:
Step 1: Convert x to standard units zx.
zx = (x − Avgx) / SDx = (73 − 70) / 3 = 1
Step 2: Compute zy = zx × r.
zy = 1 × 0.47 = 0.47
Step 3: Convert zy back to original units.
y = zy × SDy + Avgy = 0.47 × 30 + 162 = 176.1 ≈ 176 pounds

Let's then look at the solution if we use regression method 2:
Again, height is the x value and weight is the y value.
Solution:
Step 1: Find the slope of the regression line.
slope = r × SDy / SDx = 0.47 × 30 / 3 = 4.7
Step 2: Find the intercept of the regression line.
intercept = Avgy − slope × Avgx = 162 − 4.7 × 70 = −167
Step 3: Find y = slope × x + intercept.
y = 4.7 × 73 − 167 = 176.1 ≈ 176 pounds
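As a check that the two regression methods agree, the Python sketch below redoes Example 3 both ways with the summary statistics from the slide (variable names are our own).

```python
# Example 3 (HANES): x = height (inches), y = weight (pounds)
AVG_X, SD_X = 70, 3
AVG_Y, SD_Y = 162, 30
R = 0.47
x = 73

# Regression method 1: standard units -> multiply by r -> back to pounds
z_x = (x - AVG_X) / SD_X
z_y = R * z_x
y_method1 = z_y * SD_Y + AVG_Y

# Regression method 2: slope and intercept of the regression line
slope = R * SD_Y / SD_X                 # pounds per inch
intercept = AVG_Y - slope * AVG_X       # line passes through (Avgx, Avgy)
y_method2 = slope * x + intercept

print(f"slope = {slope:.1f}, intercept = {intercept:.0f}")
print(f"method 1: {y_method1:.1f}, method 2: {y_method2:.1f}")
```

Algebraically the methods are the same line written two ways, so any difference between the two printed values is only floating-point rounding.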
Example 4: percentile (ranks)
SAT scores and first-year GPA:
• SAT score: average = 550, SD = 80
• first-year GPA: average = 2.6, SD = 0.6
• r = 0.4 and the scatter diagram is football-shaped
A student is chosen at random, and is at the 90th percentile of the SAT scores. Predict his/her percentile rank on the first-year GPA.
What do we do now?

Regression for percentiles and percentile ranks
If we are interested in percentiles and percentile ranks, we must change our regression method: we are not given x but its percentile rank, and we want not y but its percentile rank.
It still holds that
zy = r × zx.
Further, we have learned that we can use the normal table to get zx from the percentile rank, or vice versa.
We can thus use the following method:
Regression method 1, for percentiles and percentile rank
Step 1: Find zx using the normal table.
Step 2: Compute zy = zx × r.
Step 3: Convert zy to percentile rank using the normal table.
Solution:
Step 1: Find zx using the normal table.
90th percentile rank ⇒ 10% of the area is to the right of zx
⇒ 80% of the area is between −zx and zx
⇒ normal table says: 80.64% of the area is between −1.3 and 1.3
⇒ zx ≈ 1.3
Step 2: Compute zy = zx × r.
zy = 1.3 × 0.4 = 0.52
Step 3: Convert zy to percentile rank using the normal table (rounding zy = 0.52 to 0.5).
By normal table: 38.29% of the area is between −0.5 and 0.5
⇒ 50% − 0.5 × 38.29% ≈ 30.86% of the area is left of −0.5
⇒ 30.86% + 38.29% = 69.15% of the area is left of zy = 0.5
⇒ We predict that the student is at the 69th percentile rank on the first-year GPA.

Regression method 1, an overview
[Figure: diagram of the steps: percentile of x → zx → zy = r × zx → percentile rank of y]
Comments:
• Note that we did not use information about average and SD!
• We only used the normal table and r, because the whole problem is worked in standard units.
• We can use the normal table because the scatter diagram is football-shaped.
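The three steps can be sketched in Python, with `statistics.NormalDist` from the standard library standing in for the normal table (a tooling assumption, not part of the slides). The helper name `predict_percentile_rank` is hypothetical. As the comments above note, only r is needed, not the averages or SDs.

```python
from statistics import NormalDist


def predict_percentile_rank(x_percentile: float, r: float) -> float:
    """Regression method 1 for percentile ranks (football-shaped scatter assumed).

    x_percentile and the returned value are percentile ranks in percent (0-100).
    """
    std_normal = NormalDist()                      # mean 0, SD 1: the "normal table"
    z_x = std_normal.inv_cdf(x_percentile / 100)   # Step 1: percentile rank -> z_x
    z_y = r * z_x                                  # Step 2
    return 100 * std_normal.cdf(z_y)               # Step 3: z_y -> percentile rank


rank = predict_percentile_rank(90, r=0.4)
print(f"predicted percentile rank: {rank:.1f}")
```

With unrounded z-values the result is about 69.6, in line with the 69th percentile rank found above; the small gap comes from rounding zx to 1.3 and zy to 0.5 when reading the table.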
Two regression lines
For each scatter diagram, two regression lines can be drawn: one for predicting y from x, and another one for predicting x from y.
[Figure: scatter diagram of son's height (inches) against father's height (inches)]

Regression to the mean
If r = 1, we would predict y = x in standard units, that is, zy = zx.
If r = 0, we would predict y = Avgy.
If r is between 0 and 1, we predict something in between, and the regression method tells us precisely what.

When not to use the regression line
If there is a non-linear association between the two variables, the regression line smooths away too much. Then it is better to use the graph of averages.
The regression fallacy
Preschool program for boosting children's IQs
• Children are tested when they enter (pre-test)
• Children are tested when they leave (post-test)
Results:
• Pre-test: average = 100, SD = 15
• Post-test: average = 100, SD = 15
So it seems the program didn't have much effect.
A closer look at the data showed:
• Children who were below average on the pre-test had an average gain of 5 IQ points
• Children who were above average on the pre-test had an average loss of about 5 IQ points
What is going on? Actually, nothing but chance error. The basic equation is
observed test score = true score + chance error
Suppose the chance error is as likely to be negative as positive. Assume too that the distribution of the scores follows the normal curve, with an average of 100 and an SD of 15.
[Figure: normal curve of test scores; density against test score, 60 to 140]
Now consider a child who scored 130 on the first test. There are two explanations:
• true score below 130, with a positive chance error
• true score above 130, with a negative chance error
The first explanation is more likely, because there are more children with a true score somewhat below 130 than children with a true score somewhat above 130.
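A small Monte Carlo sketch in Python can make the chance-error argument concrete. It assumes a model the slides do not specify: true scores are normal with average 100 and SD 12, and each test adds an independent chance error with SD 9, so observed scores have SD 15 (12² + 9² = 15²). Under these assumptions, children who scored about 130 on the first test have a lower average true score, and hence a lower average retest score; the below-average group moves up.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

# ASSUMED model (not given in the slides):
#   true score ~ normal(100, 12), chance error ~ normal(0, 9), independent,
#   so observed scores have SD sqrt(12**2 + 9**2) = 15.
N = 100_000
true = [random.gauss(100, 12) for _ in range(N)]
test1 = [t + random.gauss(0, 9) for t in true]   # first test
test2 = [t + random.gauss(0, 9) for t in true]   # retest with a fresh chance error


def summary(group):
    """Average first score, true score and retest score for a list of indices."""
    return (statistics.fmean(test1[i] for i in group),
            statistics.fmean(true[i] for i in group),
            statistics.fmean(test2[i] for i in group))


near_130 = [i for i in range(N) if 125 <= test1[i] <= 135]
top = [i for i in range(N) if test1[i] > 100]
bottom = [i for i in range(N) if test1[i] < 100]

for name, group in [("about 130", near_130), ("above average", top), ("below average", bottom)]:
    first, true_avg, retest = summary(group)
    print(f"{name:>13}: first {first:6.1f}  true {true_avg:6.1f}  retest {retest:6.1f}")
```

Nothing in the simulation changes the children between tests; the drop for the top group and the gain for the bottom group come entirely from chance error.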
If someone scores above (below) average on the first test, the true score is likely to be a bit lower (higher) than the observed score. If this person takes the test again, the second score is likely to be a little bit lower (higher) than the first.
Definition
In a test-retest situation, the bottom group on the first test will on average show some improvement on the second test, and the top group will fall back. This is the regression effect.
Definition
Thinking that the regression effect must be due to something important, not just chance error, is called the regression fallacy.

When not to use the regression line
Warning
The regression line can be used to make predictions for individuals. But if you have to extrapolate far from the data, or to a different group of subjects, be careful.
Example 5: The Olympic Games 2156