EDF 6472 Introduction to Data Analysis in Education
Assignments Due October 15, 2012 – Solutions

Hinkle, et al.

2.

Student   Social events per week   Depression score
1         0                        15
2         2                        3
3         2                        12
4         1                        11
5         3                        5
6         1                        8
7         2                        15
8         0                        13
9         3                        2
10        3                        4
11        4                        2
12        1                        8
13        1                        10
14        1                        12
15        2                        8

We are given the following information:

$$n = 15, \quad \sum X = 26, \quad \sum Y = 128, \quad \sum X^2 = 64, \quad \sum Y^2 = 1378, \quad \sum XY = 166$$

a. A scatterplot of the data looks like the plot below.

[Figure: scatterplot titled "Social events per week by depression score"; x-axis: Social events per week; y-axis: Depression score]

We can find the regression equation for predicting depression scores from the number of social events using the following formulas:

$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2} = \frac{15(166) - (26)(128)}{15(64) - (26)^2} = \frac{2490 - 3328}{960 - 676} = \frac{-838}{284} = -2.95$$

and

$$a = \frac{\sum Y - b\sum X}{n} = \frac{128 - (-2.95)(26)}{15} = \frac{128 - (-76.7)}{15} = \frac{128 + 76.7}{15} = \frac{204.7}{15} = 13.65$$

Therefore the regression equation is $\hat{Y} = -2.95X + 13.65$.

d. The predicted depression score for a student who has attended three social events is

$$\hat{Y} = -2.95(3) + 13.65 = -8.85 + 13.65 = 4.80$$

e. The standard error of estimate is found, using formula 6.11, by

$$s_{Y \cdot X} = s_Y \sqrt{1 - r^2}\,\sqrt{\frac{n - 1}{n - 2}}$$

Using the raw score formula for the standard deviation (equation 3.12 in our text), we can find the standard deviation of the depression scores:

$$s_Y = \sqrt{\frac{\sum Y_i^2 - \frac{\left(\sum Y_i\right)^2}{n}}{n - 1}} = \sqrt{\frac{1378 - \frac{(128)^2}{15}}{15 - 1}} = \sqrt{\frac{1378 - \frac{16384}{15}}{14}} = \sqrt{\frac{1378 - 1092.27}{14}} = \sqrt{\frac{285.73}{14}} = \sqrt{20.41} = 4.52$$

The Pearson product-moment correlation for these data is found using the formula

$$r_{XY} = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}} = \frac{15(166) - (26)(128)}{\sqrt{\left[15(64) - (26)^2\right]\left[15(1378) - (128)^2\right]}}$$

$$= \frac{2490 - 3328}{\sqrt{(960 - 676)(20670 - 16384)}} = \frac{-838}{\sqrt{(284)(4286)}} = \frac{-838}{\sqrt{1217224}} = \frac{-838}{1103.28} = -.76$$

So,

$$s_{Y \cdot X} = 4.52\sqrt{1 - (-.76)^2}\,\sqrt{\frac{15 - 1}{15 - 2}} = 4.52\sqrt{1 - .58}\,\sqrt{\frac{14}{13}} = 4.52\sqrt{.42}\,\sqrt{1.08} = 4.52(.65)(1.04) = 3.06$$

Using formula 6.12 we find

$$s_{Y \cdot X} = s_Y\sqrt{1 - r^2} = 4.52\sqrt{1 - (-.76)^2} = 4.52\sqrt{1 - .58} = 4.52\sqrt{.42} = 4.52(.65) = 2.94$$

7. The dean of students at Southeastern State University is interested in the relationship between students’ grades and their part-time work.
Data from the 25 students who hold part-time jobs were collected on number of hours worked per week (X) and last semester’s grade point average (Y).

Student   Hours worked   GPA
1         17             2.9
2         7              3.2
3         10             2.5
4         32             1.9
5         20             3.0
6         22             2.1
7         15             2.4
8         12             3.3
9         19             2.7
10        13             3.1
11        26             2.3
12        23             2.7
13        25             3.3
14        19             3.1
15        23             2.5
16        27             2.4
17        18             3.2
18        10             3.5
19        32             2.2
20        18             3.0
21        22             2.5
22        15             3.3
23        16             3.1
24        19             2.3
25        22             2.7

We are also told that

$$n = 25, \quad \sum X = 482, \quad \sum X^2 = 10276, \quad \sum XY = 1290.70, \quad \sum Y = 69.20, \quad \sum Y^2 = 196.22$$

a. Plot the data in a scatterplot.

[Figure: scatterplot; x-axis: hrs_worked; y-axis: GPA]

b. Determine the regression equation for predicting GPA (Y) from hours worked (X).

The regression formula is in the form $\hat{Y} = bX + a$. The value of the slope of the line (b) is found using the equation

$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2} = \frac{25(1290.7) - (482)(69.2)}{25(10276) - 232324} = \frac{-1086.90}{24576} = -0.044$$

The value of the Y-intercept of the line (a) is found using the formula

$$a = \frac{\sum Y - b\sum X}{n} = \frac{69.2 - (-0.044)(482)}{25} = \frac{90.408}{25} = 3.616$$

So, the regression equation for finding GPA from the number of hours worked is $\hat{Y} = -0.044X + 3.616$.

c. Draw the regression line on the scatterplot.

We know that when X equals 0, the predicted value of Y is equal to a (the Y-intercept), by definition. This is one point on the line. We can find the second point by assigning any value to X and calculating the predicted Y. Let us use X = 10, so the predicted value of Y is (-.044)(10) + 3.616 = (-.44) + 3.616 = 3.176. This is the second point on the line. Now we can just connect these points and continue the line out beyond them.

[Figure: scatterplot with the regression line drawn; x-axis: hrs_worked; y-axis: GPA]

d. Predict the GPA for a student who worked 25 hours.

Since the regression equation is $\hat{Y} = -0.044X + 3.616$, for a student who worked 25 hours per week, the predicted GPA would be

$$\widehat{\text{GPA}} = -0.044(25) + 3.616 = -1.1 + 3.616 = 2.516$$

e. The standard error of estimate is found, using formula 6.11, by

$$s_{Y \cdot X} = s_Y \sqrt{1 - r^2}\,\sqrt{\frac{n - 1}{n - 2}}$$

Using the raw score formula for the standard deviation (equation 3.12 in our text), we can find the standard deviation of the GPAs:

$$s_Y = \sqrt{\frac{\sum Y_i^2 - \frac{\left(\sum Y_i\right)^2}{n}}{n - 1}} = \sqrt{\frac{196.22 - \frac{(69.20)^2}{25}}{25 - 1}} = \sqrt{\frac{196.22 - \frac{4788.64}{25}}{24}} = \sqrt{\frac{196.22 - 191.55}{24}} = \sqrt{\frac{4.67}{24}} = \sqrt{.20} = .45$$

The Pearson product-moment correlation for these data is found using the formula

$$r_{XY} = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}} = \frac{25(1290.70) - (482)(69.20)}{\sqrt{\left[25(10276) - (482)^2\right]\left[25(196.22) - (69.20)^2\right]}}$$

$$= \frac{32267.50 - 33354.4}{\sqrt{(256900 - 232324)(4905.50 - 4788.64)}} = \frac{-1086.90}{\sqrt{(24576)(116.86)}} = \frac{-1086.90}{\sqrt{2871951.36}} = \frac{-1086.90}{1694.68} = -.64$$

So, using formula 6.11,

$$s_{Y \cdot X} = .45\sqrt{1 - (-.64)^2}\,\sqrt{\frac{25 - 1}{25 - 2}} = .45\sqrt{1 - .41}\,\sqrt{\frac{24}{23}} = .45\sqrt{.59}\,\sqrt{1.04} = .45(.77)(1.02) = .353$$

And using formula 6.12,

$$s_{Y \cdot X} = s_Y\sqrt{1 - r^2} = .45\sqrt{1 - (-.64)^2} = .45\sqrt{1 - .41} = .45\sqrt{.59} = .45(.77) = .346$$

Green, et al.

Lesson 33 – Betsy is interested in determining whether the number of publications by a professor can be predicted from work ethic. She has access to a sample of 50 social science professors who were teaching at the same university for a 10-year period. Betsy has collected data on the number of publications each professor has (num_pubs). She also has scores that reflect professors’ work ethic (work_eth). These scores range from 1 to 50, with 50 indicating a very strong work ethic.

5. Conduct a bivariate linear regression to evaluate Betsy’s research question. From the output identify the following:

a. Significance test to assess the predictability of number of publications from work ethic.

First, be sure to open the file Lesson 33 Exercise File 2. You should see the Data View window. Now, to run the bivariate regression, click the Analyze menu at the top of the Data View screen and then choose the Regression item on the menu. A submenu appears.

Click on the Linear choice at the top of the submenu to open the Linear Regression dialog box. Now, since we want to predict the number of publications, it is our dependent variable.
Select this variable in the window on the left in the Linear Regression dialog box and move it into the Dependent box using the right arrow. The work ethic score is the predictor variable, which makes it the independent variable. Select it and move it into the Independent(s) window in the dialog box. The dialog box should now show num_pubs in the Dependent box and work_eth in the Independent(s) box.

Now click on the OK button to obtain the output. The table below shows the results of a significance test for the prediction of the number of publications from the work ethic. Specifically, it is an analysis of variance that tests the null hypothesis that the correlation between the two variables is zero.

ANOVA (b)

Model 1       Sum of Squares   df   Mean Square   F        Sig.
Regression    1922.444         1    1922.444      21.032   .000 (a)
Residual      4387.556         48   91.407
Total         6310.000         49

a. Predictors: (Constant), work_eth Work Ethic
b. Dependent Variable: num_pubs Number of publications

Note that the significance (Sig.) is reported as .000, which is less than .05 or .01. This tells us that, if the null hypothesis (a correlation of zero) were true for the population that this sample of professors came from, there would be less than a .001 chance of obtaining an F this large. So, we reject the null hypothesis and conclude that work ethic predicts the number of publications better than chance in the population of interest.

b. Regression equation.

The table below gives the values of b and a in the regression equation.

Coefficients (a)

Model 1               B        Std. Error   Beta   t        Sig.
(Constant)            -2.963   2.823               -1.050   .299
work_eth Work Ethic   .450     .098         .552   4.586    .000

B and Std. Error are the Unstandardized Coefficients; Beta is the Standardized Coefficient.
a. Dependent Variable: num_pubs Number of publications

The table shows us that the value of b (SPSS puts it in a column labeled B) for the predictor variable work_eth is .450 and that the value of a (SPSS calls it the Constant) is -2.963. So the regression equation for predicting number of publications from work ethic is $\hat{Y} = .450X + (-2.963)$.

c. Correlation between number of publications and work ethic.
The correlation between the independent and dependent variables is shown in the table below.

Model Summary

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .552 (a)   .305       .290                9.561

a. Predictors: (Constant), work_eth Work Ethic

The correlation between the two variables is found in the cell labeled R (it is actually a simple bivariate correlation and probably should be labeled r, but SPSS must do its thing). A correlation of .552 tells us that 30.5% ($r^2$) of the variance in the number of publications can be predicted by the work ethic score.

6. Create a scatterplot of the predicted and residual scores, using the steps described in Using SPSS Graphs to Display Results (on page 279). What does this graph tell you about your analysis?

Using the directions we get the following graph.

[Figure: scatterplot, "Dependent Variable: Number of publications"; x-axis: Regression Standardized Predicted Value; y-axis: Regression Studentized Residual]

Note that for low predicted values the points cluster closely. At higher predicted values they are much more spread out. This may indicate a violation of the assumption of homoscedasticity (that is, the residuals may be heteroscedastic).
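
The hand calculations in Hinkle problems 2 and 7 and the SPSS output in Green Lesson 33 can be cross-checked with a short script. The following is a minimal Python sketch, not part of the original solutions; the function name `regression_from_sums`, the variable names, and the dictionary keys are hypothetical and chosen only for illustration. It applies the same raw-score formulas used in the hand calculations, but carries full precision through every step, so its results can differ slightly in the last digit from values obtained by rounding intermediate results.

```python
import math


def regression_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Return slope, intercept, r, s_Y and both standard errors of estimate,
    computed from the raw-score sums (hypothetical helper, for illustration)."""
    cross = n * sum_xy - sum_x * sum_y      # numerator shared by b and r
    x_term = n * sum_x2 - sum_x ** 2        # n*sum(X^2) - (sum X)^2
    y_term = n * sum_y2 - sum_y ** 2        # n*sum(Y^2) - (sum Y)^2
    b = cross / x_term
    a = (sum_y - b * sum_x) / n
    r = cross / math.sqrt(x_term * y_term)
    s_y = math.sqrt((sum_y2 - sum_y ** 2 / n) / (n - 1))
    see_612 = s_y * math.sqrt(1 - r ** 2)                 # formula 6.12
    see_611 = see_612 * math.sqrt((n - 1) / (n - 2))      # formula 6.11
    return {"b": b, "a": a, "r": r, "s_y": s_y,
            "see_611": see_611, "see_612": see_612}


# Sums given in Hinkle problem 2 (social events, depression score)
p2 = regression_from_sums(15, 26, 128, 64, 1378, 166)
# Sums given in Hinkle problem 7 (hours worked, GPA)
p7 = regression_from_sums(25, 482, 69.20, 10276, 196.22, 1290.70)

# Green Lesson 33: values copied from the SPSS ANOVA table
ss_reg, ss_res, ss_total = 1922.444, 4387.556, 6310.000
df_reg, df_res = 1, 48
ms_res = ss_res / df_res
f_ratio = (ss_reg / df_reg) / ms_res    # F = MS regression / MS residual
r_square = ss_reg / ss_total            # R Square
r_value = math.sqrt(r_square)           # R (the slope is positive, so r > 0)
std_error_est = math.sqrt(ms_res)       # Std. Error of the Estimate

for label, result in (("Problem 2", p2), ("Problem 7", p7)):
    print(label, {key: round(value, 3) for key, value in result.items()})
print("Lesson 33", round(f_ratio, 3), round(r_square, 3),
      round(r_value, 3), round(std_error_est, 3))
```

Carried at full precision, formula 6.11 gives about 3.05 for problem 2 and about .346 for problem 7, compared with 3.06 and .353 from the rounded hand calculations. The Lesson 33 quantities recomputed from the ANOVA table agree with the F, R, R Square, and Std. Error of the Estimate values that SPSS reports.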