EDF 6472

Introduction to Data Analysis in Education
Assignments Due October 15, 2012 – Solutions
Hinkle, et al.
2.
Student   Social events   Depression   Student   Social events   Depression
          per week        score                  per week        score
1         0               15           9         3               2
2         2               3            10        3               4
3         2               12           11        4               2
4         1               11           12        1               8
5         3               5            13        1               10
6         1               8            14        1               12
7         2               15           15        2               8
8         0               13
We are given the following information:
$$n = 15, \quad \sum X = 26, \quad \sum X^2 = 64, \quad \sum Y = 128, \quad \sum Y^2 = 1378, \quad \sum XY = 166$$
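As a quick check, the summary values above can be recomputed from the raw scores in the table. The following Python sketch does this; the variable names are illustrative and do not come from the text.

```python
# Raw data for the 15 students in Problem 2, in student order (1 through 15).
events = [0, 2, 2, 1, 3, 1, 2, 0, 3, 3, 4, 1, 1, 1, 2]             # X: social events per week
depression = [15, 3, 12, 11, 5, 8, 15, 13, 2, 4, 2, 8, 10, 12, 8]  # Y: depression score

n = len(events)
sum_x = sum(events)
sum_y = sum(depression)
sum_x2 = sum(x * x for x in events)
sum_y2 = sum(y * y for y in depression)
sum_xy = sum(x * y for x, y in zip(events, depression))

print(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy)  # prints: 15 26 128 64 1378 166
```

All six values agree with the information given in the problem.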
a. A scatterplot of the data looks like the plot below.
[Figure: scatterplot titled "Social events per week by depression score"; x-axis: Social events per week; y-axis: Depression score]
We can find the regression equation for predicting depression scores from the number of
social events using the following formulas:
$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2} = \frac{15(166) - 26(128)}{15(64) - 26^2} = \frac{2490 - 3328}{960 - 676} = \frac{-838}{284} = -2.95$$
and
$$a = \frac{\sum Y - b\sum X}{n} = \frac{128 - (-2.95)(26)}{15} = \frac{128 - (-76.7)}{15} = \frac{128 + 76.7}{15} = \frac{204.7}{15} = 13.65$$
Therefore the regression equation is $\hat{Y} = -2.95X + 13.65$.
d. The predicted depression score for a student who has attended three social events is
$$\hat{Y} = -2.95(3) + 13.65 = -8.85 + 13.65 = 4.80$$
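The slope, intercept, and prediction can also be checked with a few lines of Python. This is only a sketch: the helper name `fit_line` is hypothetical, and it carries the unrounded slope into the intercept, so the results are rounded only at the end.

```python
# Summary values given for Problem 2.
n, sum_x, sum_y, sum_x2, sum_xy = 15, 26, 128, 64, 166

def fit_line(n, sum_x, sum_y, sum_x2, sum_xy):
    """Return (slope b, intercept a) computed from the raw-score sums."""
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return b, a

b, a = fit_line(n, sum_x, sum_y, sum_x2, sum_xy)
predicted = b * 3 + a  # predicted depression score for X = 3 social events

print(round(b, 2), round(a, 2), round(predicted, 2))
```

Rounded to two decimals, these agree with the hand calculation (-2.95, 13.65, and 4.80).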
e. The standard error of estimate is found, using formula 6.11, by
$$s_{Y \cdot X} = s_Y \sqrt{\left(1 - r^2\right)\frac{n-1}{n-2}}$$
Using the raw score formula for the standard deviation (equation 3.12 in our text) we
can find the standard deviation of the depression scores using the formula
$$s_Y = \sqrt{\frac{\sum Y_i^2 - \left(\sum Y_i\right)^2 / n}{n-1}} = \sqrt{\frac{1378 - 128^2/15}{15-1}} = \sqrt{\frac{1378 - 16384/15}{14}}$$
$$= \sqrt{\frac{1378 - 1092.27}{14}} = \sqrt{\frac{285.73}{14}} = \sqrt{20.41} = 4.52$$
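The Pearson's Product-Moment Correlation for these data is found using the formula
$$r_{XY} = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}} = \frac{15(166) - 26(128)}{\sqrt{\left[15(64) - 26^2\right]\left[15(1378) - 128^2\right]}}$$
$$= \frac{2490 - 3328}{\sqrt{(960 - 676)(20670 - 16384)}} = \frac{-838}{\sqrt{(284)(4286)}} = \frac{-838}{\sqrt{1217224}} = \frac{-838}{1103.28} = -.76$$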
So,
$$s_{Y \cdot X} = s_Y \sqrt{\left(1 - r^2\right)\frac{n-1}{n-2}} = 4.52\sqrt{\left(1 - (-.76)^2\right)\frac{15-1}{15-2}} = 4.52\sqrt{(1 - .58)\frac{14}{13}}$$
$$= 4.52\sqrt{.42}\,\sqrt{1.08} = 4.52(.65)(1.04) = 3.06$$
Using Formula 6.12 we find
$$s_{Y \cdot X} = s_Y \sqrt{1 - r^2} = 4.52\sqrt{1 - (-.76)^2} = 4.52\sqrt{1 - .58} = 4.52\sqrt{.42} = 4.52(.65) = 2.94$$
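The standard deviation, the correlation, and both versions of the standard error of estimate can be recomputed from the summary values for this problem. The sketch below uses only the Python standard library; the variable names `se_611` and `se_612` are illustrative labels for the two textbook formulas. Because it does not round intermediate results, the last digit can differ slightly from the hand calculation.

```python
import math

# Summary values given for Problem 2.
n, sum_x, sum_y, sum_x2, sum_y2, sum_xy = 15, 26, 128, 64, 1378, 166

# Standard deviation of Y (raw-score formula).
s_y = math.sqrt((sum_y2 - sum_y ** 2 / n) / (n - 1))

# Pearson correlation (raw-score formula).
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

# Standard error of estimate with (formula 6.11) and without (formula 6.12)
# the (n - 1)/(n - 2) correction.
se_611 = s_y * math.sqrt((1 - r ** 2) * (n - 1) / (n - 2))
se_612 = s_y * math.sqrt(1 - r ** 2)

print(round(s_y, 2), round(r, 2), round(se_611, 2), round(se_612, 2))
```

With unrounded intermediates, formula 6.11 comes out near 3.05 rather than 3.06; the small gap is rounding in the hand calculation, not a different method.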
7. The dean of students at Southeastern State University is interested in the relationship
between students’ grades and their part-time work. Data from the 25 students who
hold part-time jobs were collected on number of hours worked per week (X) and last
semester’s grade point average (Y).
Student   Hours worked   GPA   Student   Hours worked   GPA
1         17             2.9   14        19             3.1
2         7              3.2   15        23             2.5
3         10             2.5   16        27             2.4
4         32             1.9   17        18             3.2
5         20             3.0   18        10             3.5
6         22             2.1   19        32             2.2
7         15             2.4   20        18             3.0
8         12             3.3   21        22             2.5
9         19             2.7   22        15             3.3
10        13             3.1   23        16             3.1
11        26             2.3   24        19             2.3
12        23             2.7   25        22             2.7
13        25             3.3
We are also told that
$$n = 25, \quad \sum X = 482, \quad \sum X^2 = 10276, \quad \sum XY = 1290.70, \quad \sum Y = 69.20, \quad \sum Y^2 = 196.22$$
a. Plot the data in a scatterplot.
[Figure: scatterplot of GPA (y-axis) against hrs_worked (x-axis)]
b. Determine the regression equation for predicting GPA (Y) from hours worked (X).

The regression formula is in the form $\hat{Y} = bX + a$. The value of the slope of the
line (b) is found using the equation
$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2} = \frac{25(1290.7) - 482(69.2)}{25(10276) - 232324} = \frac{-1086.90}{24576} = -0.044$$
The value of the Y-intercept of the line (a) is found using the formula
$$a = \frac{\sum Y - b\sum X}{n} = \frac{69.2 - (-0.044 \times 482)}{25} = \frac{90.408}{25} = 3.616$$
So, the regression equation for finding GPA from the number of hours worked is
$$\hat{Y} = -0.044X + 3.616$$
c. Draw the regression line on the scatterplot.
We know that when X equals 0, the predicted value of Y is equal to a (the Y-intercept),
by definition. This is one point on the line. We can find the second point by assigning
any value to X and calculating the predicted Y. Let us use X=10, so the predicted value
of Y is (-.044)10 + 3.616 = (-.44) + 3.616 = 3.176. This is the second point on the
line. Now we can just connect these points and continue the line out beyond them.
[Figure: scatterplot of GPA (y-axis) against hrs_worked (x-axis) with the regression line drawn]
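The line drawn in part c rests on the slope and intercept from part b. As a cross-check, the sketch below recomputes both from the raw data using the deviation-score form of the slope, b = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)², which is algebraically equivalent to the raw-score formula used above. The variable names are illustrative.

```python
from statistics import fmean

# Raw data for the 25 students in Problem 7, in student order (1 through 25).
hours = [17, 7, 10, 32, 20, 22, 15, 12, 19, 13, 26, 23, 25,
         19, 23, 27, 18, 10, 32, 18, 22, 15, 16, 19, 22]
gpa = [2.9, 3.2, 2.5, 1.9, 3.0, 2.1, 2.4, 3.3, 2.7, 3.1, 2.3, 2.7, 3.3,
       3.1, 2.5, 2.4, 3.2, 3.5, 2.2, 3.0, 2.5, 3.3, 3.1, 2.3, 2.7]

mean_x, mean_y = fmean(hours), fmean(gpa)

# Sum of cross-products and sum of squares, both taken about the means.
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, gpa))
ss_x = sum((x - mean_x) ** 2 for x in hours)

b = sp_xy / ss_x          # slope
a = mean_y - b * mean_x   # intercept

print(round(b, 3), round(a, 3))
```

The slope rounds to -0.044 as before. The intercept comes out near 3.621 instead of 3.616, because this sketch uses the unrounded slope while the hand calculation rounds b to -0.044 before computing a.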
d. Predict the GPA for a student who worked 25 hours.

Since the regression equation is $\hat{Y} = -0.044X + 3.616$, for a student who worked 25
hours per week, the predicted GPA would be
$$\widehat{GPA} = -0.044(25) + 3.616 = -1.1 + 3.616 = 2.516$$
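The same equation gives both the two points used to draw the line and the prediction for 25 hours. A minimal sketch follows; the function name `predict_gpa` is hypothetical, and the rounded coefficients from the hand calculation are used as defaults.

```python
def predict_gpa(hours_worked, b=-0.044, a=3.616):
    """Predicted GPA from the rounded regression equation Y-hat = bX + a."""
    return b * hours_worked + a

# X = 0 and X = 10 are the two points used to draw the line;
# X = 25 is the prediction asked for in part d.
for x in (0, 10, 25):
    print(x, round(predict_gpa(x), 3))
```

This reproduces 3.616, 3.176, and 2.516 for X = 0, 10, and 25.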
e. The standard error of estimate is found, using formula 6.11, by
$$s_{Y \cdot X} = s_Y \sqrt{\left(1 - r^2\right)\frac{n-1}{n-2}}$$
Using the raw score formula for the standard deviation (equation 3.12 in our text) we
can find the standard deviation of the grade point averages using the formula
$$s_Y = \sqrt{\frac{\sum Y_i^2 - \left(\sum Y_i\right)^2 / n}{n-1}} = \sqrt{\frac{196.22 - 69.20^2/25}{25-1}} = \sqrt{\frac{196.22 - 4788.64/25}{24}}$$
$$= \sqrt{\frac{196.22 - 191.55}{24}} = \sqrt{\frac{4.67}{24}} = \sqrt{.20} = .45$$
The Pearson's Product-Moment Correlation for these data is found using the formula
$$r_{XY} = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}} = \frac{25(1290.70) - 482(69.20)}{\sqrt{\left[25(10276) - 482^2\right]\left[25(196.22) - 69.20^2\right]}}$$
$$= \frac{32267.50 - 33354.4}{\sqrt{(256900 - 232324)(4905.50 - 4788.64)}} = \frac{-1086.90}{\sqrt{(24576)(116.86)}} = \frac{-1086.90}{\sqrt{2871951.36}} = \frac{-1086.90}{1694.68} = -.64$$
So, using formula 6.11,
$$s_{Y \cdot X} = s_Y \sqrt{\left(1 - r^2\right)\frac{n-1}{n-2}} = .45\sqrt{\left(1 - (-.64)^2\right)\frac{25-1}{25-2}} = .45\sqrt{(1 - .41)\frac{24}{23}}$$
$$= .45\sqrt{.59}\,\sqrt{1.04} = .45(.77)(1.02) = .353$$
And using formula 6.12,
$$s_{Y \cdot X} = s_Y \sqrt{1 - r^2} = .45\sqrt{1 - (-.64)^2} = .45\sqrt{1 - .41} = .45\sqrt{.59} = .45(.77) = .346$$
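The two formulas differ only by the factor $\sqrt{(n-1)/(n-2)}$. The short derivation below sketches why. The symbols $SS_Y$ (sum of squared deviations of Y from its mean) and $SS_{res}$ (residual sum of squares) are notation assumed here for illustration, not taken from the text.

```latex
s_Y^2 = \frac{SS_Y}{n-1}, \qquad SS_{res} = \left(1 - r^2\right) SS_Y

s_{Y \cdot X}
  = s_Y \sqrt{\left(1 - r^2\right)\frac{n-1}{n-2}}
  = \sqrt{\frac{SS_Y}{n-1}\,\left(1 - r^2\right)\,\frac{n-1}{n-2}}
  = \sqrt{\frac{SS_{res}}{n-2}}
```

So formula 6.11 divides the residual sum of squares by n - 2, while formula 6.12 in effect divides it by n - 1. With n = 25 the factor is about 1.02, which is why the two results (.353 and .346) are so close.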




Green, et al.
Lesson 33 – Betsy is interested in determining whether the number of publications by a
professor can be predicted from work ethic. She has access to a sample of 50 social
science professors who were teaching at the same university for a 10-year period. Betsy
has collected data on the number of publications each professor has (num_pubs). She
also has scores that reflect professors’ work ethic (work_eth). These scores range from 1
to 50, with 50 indicating a very strong work ethic.
5. Conduct a bivariate linear regression to evaluate Betsy’s research question. From the
output identify the following:
a. Significance test to assess the predictability of number of publications from work
ethic.
First be sure to bring the file Lesson 33 Exercise File 2 into your computer’s
memory. You should see the Data View window that looks like this.
Now, to run the bivariate regression, click the Analyze menu on top of the Data
View screen and then choose the Regression item on the menu. You will see the
screen shown on the next page.
Click on the Linear choice on the top of the submenu and we will see the dialog box
below.
Now, since we want to predict the number of publications, it is our dependent variable.
Select this variable in the window on the left in the Linear Regression dialog box and
move it into the Dependent box using the right arrow. The work ethic is the predictor
variable and this makes it the independent variable. Select it and move it into the
Independent(s) window in the dialog box. The dialog box should look like the one on the
next page.
Now click on the OK button to obtain the output.
The table below shows the results of a significance test for the prediction of the
number of publications from the work ethic. Specifically, it is an analysis of variance
that tests the null hypothesis that the correlation between the two variables is zero.
ANOVA(b)
Model 1       Sum of Squares   df   Mean Square   F        Sig.
Regression    1922.444         1    1922.444      21.032   .000(a)
Residual      4387.556         48   91.407
Total         6310.000         49
a. Predictors: (Constant), work_eth Work Ethic
b. Dependent Variable: num_pubs Number of publications
Note that the significance (Sig.) is reported as .000, that is, p < .001, which is below
both the .05 and .01 levels. This tells us that if the null hypothesis (a population
correlation of zero) were true for the population that this sample of professors came
from, there would be less than a .001 chance of obtaining an F this large. So, we reject
the null hypothesis and conclude that work ethic predicts the number of publications
better than chance in the population of interest.
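The F ratio in the ANOVA table can be reproduced from the sums of squares and degrees of freedom that SPSS reports. A minimal sketch, with illustrative variable names:

```python
# Values copied from the ANOVA table.
ss_regression, df_regression = 1922.444, 1
ss_residual, df_residual = 4387.556, 48

# Each mean square is a sum of squares divided by its degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F is the ratio of the regression mean square to the residual mean square.
f_ratio = ms_regression / ms_residual

# The total sum of squares is the sum of the two parts.
ss_total = ss_regression + ss_residual

print(round(ms_residual, 3), round(f_ratio, 3), round(ss_total, 3))
```

The residual mean square (91.407), F (21.032), and total sum of squares (6310.000) match the table.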
b. Regression equation.
The table below gives the values of b and a in the regression equation.
Coefficients(a)
                      Unstandardized Coefficients   Standardized Coefficients
Model 1               B         Std. Error          Beta                        t        Sig.
(Constant)            -2.963    2.823                                           -1.050   .299
work_eth Work Ethic   .450      .098                .552                        4.586    .000
a. Dependent Variable: num_pubs Number of publications
The table shows us that the value of b (SPSS puts it in a column labeled B) for the
predictor variable work_eth is .450 and that the value of a (SPSS calls it the Constant)
is -2.963. So the regression equation for predicting number of publications from work
ethic is $\hat{Y} = .450X + (-2.963)$.
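Each t value in the Coefficients table is the coefficient divided by its standard error, and the equation can be used to predict publications for any work ethic score. The sketch below shows both. The function name `predict_pubs` and the work ethic score of 30 are hypothetical, chosen only for illustration.

```python
# Values copied from the Coefficients table.
b, se_b = 0.450, 0.098    # slope for work_eth and its standard error
a, se_a = -2.963, 2.823   # constant (intercept) and its standard error

# t statistic for each coefficient: estimate divided by its standard error.
t_slope = b / se_b
t_constant = a / se_a

def predict_pubs(work_eth):
    """Predicted number of publications for a given work ethic score."""
    return b * work_eth + a

# A work ethic score of 30 is a made-up input, not a value from the data file.
print(round(t_slope, 2), round(t_constant, 2), round(predict_pubs(30), 3))
```

The t for the constant matches the table (-1.05). The t for the slope comes out near 4.59 rather than 4.586 because the table prints B and its standard error to only three decimals.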
c. Correlation between number of publications and work ethic.
The correlation between the independent and dependent variables is shown in the table
below.
Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .552(a)   .305       .290                9.561
a. Predictors: (Constant), work_eth Work Ethic
The correlation between the two variables is found in the cell labeled R (it is actually a
simple bivariate correlation and probably should be labeled r, but SPSS must do its
thing). A correlation of .552 tells us that 30.5% (r²) of the variance in the number of
publications can be predicted by the work ethic score.
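The Model Summary values follow directly from the sums of squares in the ANOVA table. The sketch below recomputes them, assuming n = 50 professors and one predictor; the variable names are illustrative.

```python
import math

# Sums of squares copied from the ANOVA table.
ss_regression, ss_residual, ss_total = 1922.444, 4387.556, 6310.000
n, k = 50, 1  # sample size and number of predictors

# R Square is the proportion of total variation accounted for by the regression.
r_square = ss_regression / ss_total

# With one predictor, R is the absolute value of the bivariate r;
# r itself is positive here because the slope (.450) is positive.
r = math.sqrt(r_square)

adjusted_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)
std_error_estimate = math.sqrt(ss_residual / (n - k - 1))

print(round(r, 3), round(r_square, 3),
      round(adjusted_r_square, 3), round(std_error_estimate, 3))
```

The results agree with the table: R = .552, R Square = .305, Adjusted R Square = .290, and a standard error of the estimate of 9.561.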
6. Create a scatterplot of the predicted and residual scores, using the steps described
in Using SPSS Graphs to Display Results (on page 279). What does this graph tell
you about your analysis?
Using the directions we get the following graph.
[Figure: scatterplot; Dependent Variable: Number of publications; x-axis: Regression Standardized Predicted Value; y-axis: Regression Studentized Residual]
Note that for low predicted values the points cluster closely. At higher values they vary
much more. This may indicate a violation of the assumption of homoscedasticity; that is,
the residuals may be heteroscedastic.