Predicting from correlations
Phil 12: Logic and Decision Making
Fall 2010, UC San Diego
10/25/2010 (Monday, October 25, 2010)

Review
• Correlations: relations between variables
  - May or may not be causal
  - Enable prediction of the value of one variable from the value of another
• To test correlational (and causal) claims, we need to make predictions that are testable
  - Operationally “define” terms
  - Construct validity

Construct Validity
• Does the way you operationalize a variable really capture that variable?
  - Does a ruler (grains of barley) really measure height?
  - Does an intelligence test measure intelligence?
  - Does a word-list test measure memory?
  - Does the Body Mass Index (BMI) really measure the health of your body weight?

Clicker question 1
If someone wanted to object that operationally defining fitness in terms of how much a person can bench press lacks construct validity, a good strategy would be to:
A. Forget it: this is a fine operational definition
B. Find a counterexample: an individual who is fit but can’t bench press very much
C. Find a counterexample: an individual who is not fit but can bench press a lot
D. Show that how much a person can bench press is not a good measure of fitness

Operational definitions are not definitions
• An operational definition provides one way to measure a variable
  - There will typically be alternatives
  - The alternatives may not always agree
• Even when construct validity is high, the operational definition does not provide necessary and sufficient conditions for the term
  - So a single counterexample is not problematic

Relating Score Variables
• Same items measured on two score variables
• Is there any systematic relation between the score on one variable and the score on another?
Clicker question 2
• Same items measured on two score variables
• Is there any systematic relation between the score on one variable and the score on another?

Participant  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
Spelling    15  14  15  12   6   4   8   9   9  12  18  13  10  10  11
Math        12  17  17  12   8   5  10   9   8  14  16  14  10  13  15

A. Yes
B. No
C. Not sure

Relating Score Variables
• Often it is difficult to determine whether there is a regular pattern just by looking at the scores (i.e., eyeballing the data)
• It is important to graph or diagram the data

Scatterplots
[Figure: scatterplot of math scores (y-axis, 0 to 20) against spelling scores (x-axis, 0 to 20) for the 15 participants above]

Scatterplots - 2
[Figure: example scatterplots showing no correlation, negative correlation, positive correlation, and nonlinear correlation]

Measuring correlation
• Karl Pearson developed a measure of correlation known as Pearson’s Product Moment Correlation (r)
  - -1.0: perfect negative correlation
  - 0: no correlation
  - 1.0: perfect positive correlation

Pearson Correlation Coefficient
• For the spelling and math scores above, Pearson’s Product Moment Correlation is r = .857
• Notable features:
  - Positive value: as the spelling score increases, the math score tends to increase as well
  - Very high: the correlation is strong (as opposed to moderate or weak)
• Therefore: there is a strong positive correlation between spelling scores and math scores

Correlation Coefficients
• Height and weight are positively correlated
  - In this graph, Pearson r = .67
• The graph contains two subgroups, men (blue) and women (red), which may exhibit different correlations
  - For females (red) only, r = .47
  - For males (blue) only, r = .68

How much does the correlation account for?
• Correlations are typically not perfect (r = 1 or r = -1)
  - Evaluate a correlation in terms of how much of the variance in one variable is accounted for by the variance in the other
• Variance: $\sigma^2 = \sum (X - \bar{X})^2 / N$
• Amount of variance of Y (the variable whose value is being predicted) accounted for = variance explained / total variance
  - This turns out to be the square of the Pearson coefficient, r²
• This means, for variables X and Y:
  - For r = .80, 64% of the variance of Y is explained by the variance of X
  - For r = .30, 9% of the variance of Y is explained by the variance of X

Variance Accounted for
[Figure: four scatterplots with r = .75 and r = -.75 (r² = .56 for both), and r = .55 and r = -.55 (r² = .30 for both)]

Variance accounted for - 2
• Height only partially accounts for weight
  - For females, r = .47, so r² = .22
  - For males, r = .68, so r² = .46

Clicker question 3
For the correlation between the average speed a person drives and gas mileage, r = -.90. This indicates:
A. higher average speed is a strong predictor of higher gas mileage
B. higher average speed is a weak predictor of higher gas mileage
C. higher average speed is a strong predictor of lower gas mileage
D. higher average speed is a weak predictor of lower gas mileage

Clicker question 4
For the correlation between the average speed a person drives and gas mileage, r = -.90.
How much of the variance in gas mileage can be accounted for by average speed?
A. 90%
B. 19%
C. 81%
D. Cannot tell from the information given

Prediction
• A major reason to be interested in correlation:
  - If two variables are correlated, we can use an item’s value on one variable to predict its value on the other
• Employment prediction: predicting future job performance from years of experience
• Actuarial prediction: predicting how long someone will live from how often they skydive
• Risk assessment: predicting how much risk an activity poses from its values on other variables
• Prediction employs the regression line

Regression line
• Start with a scatterplot of the data points
• Find the line that allows the best prediction of the criterion variable (the one to be predicted) from the predictor variable
[Figure: scatterplot with the criterion variable on the y-axis and the predictor variable on the x-axis; the red regression line is the one that minimizes the sum of the squared distances shown by the blue lines]

Equation for Regression line
y = a + bx
• y = predicted or criterion variable
• x = predictor variable
• a = y-intercept (regression constant)
• b = slope (regression coefficient)
Note: the regression coefficient is not the same as the Pearson coefficient r

Understanding the Regression Line
• Assume the regression line equation relating mpg (y) to weight in lb (x) for several car models is
  - mpg = 62.85 - 0.011 × weight
• The regression constant, 62.85, represents the projected mpg of a car weighing 0 lb
• The slope means mpg is expected to decrease by 1.1 mpg for every additional 100 lb of car weight (0.011 × 100 = 1.1)
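The lecture reports r, r², and a regression equation without showing how they are computed. Below is a minimal Python sketch, assuming the standard textbook formulas, that reproduces the computations: Pearson’s r, the share of variance accounted for (checked against r²), the least-squares regression line, and the slide’s mpg equation. The score lists are the 15 spelling/math pairs from the table on the Pearson slides; with these values the script gives r ≈ .857, matching the slide. All function names (`pearson_r`, `fit_line`, `predict_mpg`, etc.) are illustrative, not taken from the lecture.

```python
import math

# Spelling and math scores for participants 1-15, from the lecture's table
# (index i holds participant i + 1).
spelling = [15, 14, 15, 12, 6, 4, 8, 9, 9, 12, 18, 13, 10, 10, 11]
math_scores = [12, 17, 17, 12, 8, 5, 10, 9, 8, 14, 16, 14, 10, 13, 15]


def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Population variance: sum of squared deviations from the mean, divided by N."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # sum of cross-products
    ss_x = sum((x - mx) ** 2 for x in xs)                  # sum of squares for X
    ss_y = sum((y - my) ** 2 for y in ys)                  # sum of squares for Y
    return sp / math.sqrt(ss_x * ss_y)


def fit_line(xs, ys):
    """Least-squares line y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    b = sp / ss_x      # slope (regression coefficient)
    a = my - b * mx    # y-intercept (regression constant)
    return a, b


r = pearson_r(spelling, math_scores)
a, b = fit_line(spelling, math_scores)

# Variance of Y accounted for = variance of the predicted values / total variance of Y.
predicted = [a + b * x for x in spelling]
explained_share = variance(predicted) / variance(math_scores)

print(f"r = {r:.3f}")  # r = 0.857
print(f"r^2 = {r * r:.3f}, explained/total = {explained_share:.3f}")
print(f"regression line: math = {a:.2f} + {b:.2f} * spelling")
# The slope b is not r: b = r * (SD of Y / SD of X).
print(f"b = {b:.3f} vs. r = {r:.3f}")


def predict_mpg(weight_lb):
    """Prediction from the slide's example line: mpg = 62.85 - 0.011 * weight."""
    return 62.85 - 0.011 * weight_lb


print(f"change per extra 100 lb: {predict_mpg(3100) - predict_mpg(3000):.1f} mpg")
```

For least squares the slope equals r × (SD of Y / SD of X), so b and r coincide only when both variables have the same spread; that is why the regression coefficient and the Pearson coefficient differ for these scores.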