Predicting from correlations

Phil 12: Logic and Decision Making
Fall 2010
UC San Diego
10/25/2010
Monday, October 25, 2010
Review
• Correlations: relations between variables
- May or may not be causal
- Enable prediction of the value of one variable from the value of another
• To test correlational (and causal) claims, we need to make testable predictions
- Operationally “define” terms
- Construct validity
Construct Validity
• Does the way you operationalize a variable really capture that variable?
- Does a ruler (grains of barley) really measure height?
- Does an intelligence test measure intelligence?
- Does a word-list test measure memory?
- Does the Body Mass Index (BMI) really measure whether your body weight is healthy?
Clicker question 1
If someone wanted to object that operationally defining
fitness in terms of how much a person can bench press
lacks construct validity, a good strategy would be to:
A. Forget it; this is a fine operational definition
B. Find a counterexample: an individual who is fit but can’t bench press very much
C. Find a counterexample: an individual who is not fit but can bench press a lot
D. Show that how much a person can bench press is not a good measure of fitness
Operational definitions are not definitions
• An operational definition provides one way to measure a variable
- There will typically be alternatives
- The alternatives may not always agree
• Even when construct validity is high, the operational definition does not provide necessary and sufficient conditions for the term
- So a single counterexample is not problematic
Relating Score Variables
• Same items measured on two score variables
• Is there any systematic relation between the score on one variable and the score on another?
Clicker question 2
• Same items measured on two score variables
• Is there any systematic relation between the score on one variable and the score on another?

Participant  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15
Spelling     15  14  15  12  6   4   8   9   9   12  18  13  10  10  11
Math         12  17  17  12  8   5   10  9   8   14  16  14  10  13  15

A. Yes
B. No
C. Not sure
Relating Score Variables
• Often it is difficult to determine whether there is a regular pattern just by looking at the scores (i.e., by eyeballing the data)
- It is important to graph or diagram the data
Scatterplots
[Figure: scatterplot of the 15 participants, with spelling scores (0 to 20) on the x-axis and math scores (0 to 20) on the y-axis]
Scatterplots - 2
[Figure: four example scatterplots, labeled No correlation, Negative correlation, Positive correlation, and Nonlinear correlation]
Measuring correlation
• Karl Pearson developed a measure of correlation, known as Pearson’s Product Moment Correlation (r)
- r ranges from -1.0 (perfect negative correlation) through 0 (no correlation) to 1.0 (perfect positive correlation)
Pearson Correlation Coefficient
• For the spelling and math scores in the table above, Pearson’s Product Moment Correlation r = .857
Notable features:
• Positive value: as the spelling score increases, the math score tends to increase as well
• Very high: the correlation is strong (as opposed to moderate or weak)
Therefore: there is a strong positive correlation between spelling scores and math scores
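To check the value above, here is a minimal Python sketch that computes Pearson’s r from its definition. It divides the sum of cross-products of deviations from the means by the square root of the product of the sums of squared deviations. The sketch uses the 15 spelling and math scores from the table. The function name `pearson_r` is an illustrative choice and does not come from the course materials.

```python
from math import sqrt

# Scores for participants 1-15, read from the spelling/math table
spelling = [15, 14, 15, 12, 6, 4, 8, 9, 9, 12, 18, 13, 10, 10, 11]
math_scores = [12, 17, 17, 12, 8, 5, 10, 9, 8, 14, 16, 14, 10, 13, 15]


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson_r(spelling, math_scores)
print(f"r = {r:.3f}")  # prints "r = 0.857"
```

Swapping the two arguments gives the same value. r does not depend on which variable is treated as the predictor.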
Correlation Coefficients
• Height and weight are positively correlated
- In this graph, Pearson r = .67
- The data contain two subgroups, men (blue •) and women (red •), which may exhibit different correlations
- For females (red) only, r = .47
- For males (blue) only, r = .68
How much does the correlation account for?
• Correlations are typically not perfect (r = 1 or r = -1)
- Evaluate the correlation in terms of how much of the variance in one variable is accounted for by the variance in another
• $\text{variance} = \sum (X - \text{mean})^2 / N$
• The amount of variance of Y (the variable whose value is being predicted) that is accounted for equals: variance explained / total variance
- This turns out to be the square of the Pearson coefficient: $r^2$
• This means, for variables X and Y:
- For r = .80, 64% of the variance of Y is explained by the variance of X
- For r = .30, 9% of the variance of Y is explained by the variance of X
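The claim that variance explained divided by total variance equals $r^2$ can be checked numerically. This sketch takes the spelling scores as X and the math scores as Y, from the table earlier in this lecture. It computes variance as $\sum (X - \text{mean})^2 / N$ and fits a least-squares line. It then treats the variance of the line’s predictions as the variance explained. Helper names such as `variance` and `covariance` are hypothetical.

```python
from math import sqrt

spelling = [15, 14, 15, 12, 6, 4, 8, 9, 9, 12, 18, 13, 10, 10, 11]      # X (predictor)
math_scores = [12, 17, 17, 12, 8, 5, 10, 9, 8, 14, 16, 14, 10, 13, 15]  # Y (predicted)


def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def covariance(xs, ys):
    """Mean cross-product of deviations from the two means."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


# Least-squares line y = a + b*x
b = covariance(spelling, math_scores) / variance(spelling)
a = mean(math_scores) - b * mean(spelling)
predicted = [a + b * x for x in spelling]

# Variance explained (variance of the predictions) over total variance of Y
explained_share = variance(predicted) / variance(math_scores)
r = covariance(spelling, math_scores) / sqrt(variance(spelling) * variance(math_scores))

print(f"variance explained / total variance = {explained_share:.3f}")
print(f"r squared = {r ** 2:.3f}")
```

Both printed values come out to about 0.735, which is .857 squared. The equality holds for any least-squares line with an intercept, not just for these data.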
Variance Accounted for
[Figure: four scatterplots. r = .75 and r = -.75 both give $r^2 = .56$; r = .55 and r = -.55 both give $r^2 = .30$]
Variance accounted for - 2
• Height only partially accounts for weight
- For females, r = .47, so $r^2 = .22$
- For males, r = .68, so $r^2 = .46$
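The proportion of variance accounted for is $r^2$, so the sign of r drops out: r = .75 and r = -.75 account for the same share of variance. Here is a minimal Python sketch that reproduces the $r^2$ values quoted in this section. The helper `variance_accounted_for` is a hypothetical name.

```python
def variance_accounted_for(r):
    """Proportion of the variance in Y accounted for by X, given Pearson's r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("Pearson's r must lie between -1 and 1")
    return r ** 2


# Values of r quoted in this section
for r in [0.80, 0.30, 0.75, -0.75, 0.55, -0.55, 0.47, 0.68]:
    print(f"r = {r:+.2f}  ->  r^2 = {variance_accounted_for(r):.2f}")
```

Each printed line matches the rounded figures above. For example, r = +0.47 gives 0.22 and r = +0.68 gives 0.46.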
Clicker question 3
For the correlation between the average speed a person
drives and gas mileage, r = -.90.
This indicates:
A. higher average speed is a strong predictor of higher gas mileage
B. higher average speed is a weak predictor of higher gas mileage
C. higher average speed is a strong predictor of lower gas mileage
D. higher average speed is a weak predictor of lower gas mileage
Clicker question 4
For the correlation between the average speed a
person drives and gas mileage, r = -.90.
How much of the variance in gas mileage can be
accounted for by average speed?
A. 90%
B. 19%
C. 81%
D. Cannot tell from the information given
Prediction
• A major reason to be interested in correlation
- If two variables are correlated, we can use an item’s value on one variable to predict its value on another
• Employment prediction: predicting future job performance from years of experience
• Actuarial prediction: predicting how long someone will live from how often they skydive
• Risk assessment: predicting how much risk an activity poses from its values on other variables
• Prediction employs the regression line
Regression line
• Start with a scatterplot of data points
• Find the line that allows the best prediction of the criterion variable (the one to be predicted) from the predictor variable
[Figure: scatterplot with the criterion variable on the y-axis and the predictor variable on the x-axis; the red regression line minimizes the sum of the squared distances marked by the blue lines]
Equation for the Regression Line
y = a + bx
y = predicted or criterion variable
x = predictor variable
a = y-intercept (regression constant)
b = slope (regression coefficient)
Note: the regression coefficient is not the same as the Pearson coefficient r
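Here is a minimal least-squares sketch for y = a + bx. It takes the spelling scores as the predictor x and the math scores as the criterion y, from the earlier table. It uses the standard formulas b = S_xy / S_xx and a = mean(y) - b·mean(x). `fit_line` and `sse` are hypothetical helper names.

```python
spelling = [15, 14, 15, 12, 6, 4, 8, 9, 9, 12, 18, 13, 10, 10, 11]      # x (predictor)
math_scores = [12, 17, 17, 12, 8, 5, 10, 9, 8, 14, 16, 14, 10, 13, 15]  # y (criterion)


def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope (regression coefficient)
    a = mean_y - b * mean_x  # y-intercept (regression constant)
    return a, b


def sse(xs, ys, a, b):
    """Sum of squared prediction errors for the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


a, b = fit_line(spelling, math_scores)
print(f"math = {a:.2f} + {b:.3f} * spelling")
```

For these data b ≈ 0.842, while r = .857, which illustrates the note above. The two coincide only when both variables have the same standard deviation, since b = r · (s_y / s_x). You can use the `sse` helper to confirm that shifting or tilting the fitted line increases the sum of squared errors.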
Understanding the Regression Line
• Assume the regression line equation between
the variables mpg (y) and weight (x) of several
car models is
- mpg = 62.85 - 0.011*weight
- The regression constant, 62.85, represents the projected mpg of a (hypothetical) car weighing 0 lb
- The regression coefficient, -0.011, means mpg is expected to decrease by 1.1 mpg for every additional 100 lb of car weight
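You can check the interpretation of the two coefficients by plugging weights into the example equation. The function name `predict_mpg` and the sample weights are illustrative assumptions, not data from the slide.

```python
def predict_mpg(weight_lb):
    """Predicted mpg from the example regression line mpg = 62.85 - 0.011 * weight."""
    a = 62.85   # regression constant (y-intercept)
    b = -0.011  # regression coefficient (slope, mpg per lb)
    return a + b * weight_lb


for weight in [0, 3000, 3100]:
    print(f"{weight:>5} lb -> {predict_mpg(weight):.2f} mpg")

# Adding 100 lb changes the prediction by 100 * b = -1.1 mpg
print(f"change per 100 lb: {predict_mpg(3100) - predict_mpg(3000):.2f} mpg")
```

A 3,000 lb car is predicted to get 29.85 mpg, and each extra 100 lb lowers the prediction by 1.1 mpg. The 0 lb case returns the regression constant itself, an extrapolation well outside any real car weight.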