Comparing Group Means using Regression

Lecture 11 - Qualitative and Quantitative IVs
What’s special about mixing Qualitative and Quantitative IVs?
Nothing
What Procedure should I use – REGRESSION (or its R equivalent) or GLM (or R equivalent)?
If you’re using SPSS, either REGRESSION or GLM can be used to perform all of the analyses to be
described here.
Use the procedure that is easiest for you.
There are two procedures in Rcmdr – the Linear Regression procedure and the Linear Model procedure.
They’re roughly analogous to SPSS’s REGRESSION and GLM procedures.
Simplest Example – One qualitative and one quantitative factor
Suppose that two groups are given training on how to perform a job. One of the groups has been
given the old training method; it is Group 0. The other group has been given a recently purchased
new training program; it is Group 1. Suppose also that a test of cognitive ability, an ability that is
probably related to job performance, has been given to each person. The dependent variable is final
performance after completion of the training program. That final performance depends on both the type of
training and on cognitive ability. In the following, cognitive ability is X and final performance is
Y. The data are as follows . . .
CA   Group   Perf (Y)
54   0       45
49   0       60
66   0       68
45   0       43
65   0       61
47   0       29
50   0       46
39   0       49
53   0       59
29   0       41
52   0       58
38   0       42
32   0       30
24   0       30
58   0       54
49   1       54
62   1       77
46   1       67
35   1       50
50   1       58
55   1       47
55   1       74
44   1       47
44   1       62
56   1       71
56   1       54
55   1       67
67   1       68
37   1       58
52   1       54
REGRESSION Menu Sequence
Note that since in this example the Qualitative factor has only
two levels, we do not have to create group coding variables.
(Whew!)
The REGRESSION analysis – same old stuff – a two-predictor analysis.
Regression
[DataSet0] G:\MDBT\P513\P513L07B-QualQuant\Training program and cognitive ability data.sav
Variables Entered/Removed(a)

Model   Variables Entered   Variables Removed   Method
1       CA, Group(b)        .                   Enter

a. Dependent Variable: Y
b. All requested variables entered.

Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .781a   .610       .581                8.210

a. Predictors: (Constant), CA, Group
ANOVA(a)

Model 1      Sum of Squares   df   Mean Square   F        Sig.
Regression   2844.774         2    1422.387      21.102   .000b
Residual     1819.926         27   67.405
Total        4664.700         29

a. Dependent Variable: Y
b. Predictors: (Constant), CA, Group
Coefficients(a)

              Unstandardized Coefficients   Standardized Coefficients
Model 1       B        Std. Error           Beta                        t       Sig.
(Constant)    14.644   7.095                                            2.064   .049
Group         9.946    3.057                .399                        3.253   .003
CA            .707     .145                 .598                        4.877   .000

a. Dependent Variable: Y
Verbal Interpretation of B Coefficients
BGroup = 9.946: Among persons of equal cognitive ability, a 1-point change in GROUP, i.e., moving
from Group 0 to Group 1, is associated with a 9.946-point increase in performance. The difference is significant.
BCogability = .707: Among persons in the same group, a 1-point increase in cognitive ability
is associated with a .707-point increase in performance. The relationship is statistically significant.
Predicted Y = 14.644 + .707*X + 9.946*GROUP.
When one of the variables is a grouping variable, analysts often write separate equations for each group.
For people in Group 1, Predicted Y = 14.644 + .707*X + 9.946*1 = 24.590 + .707*X
For people in Group 0, Predicted Y = 14.644 + .707*X + 9.946*0 = 14.644 + .707*X
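To make the two equations concrete, here is a minimal R sketch (using the same trainingprograms data frame that appears in the Rcmdr runs later in this handout; the CA = 50 value is just an arbitrary illustration):

fit <- lm(y ~ ca + group, data = trainingprograms)
coef(fit)   # (Intercept) 14.644,  ca 0.707,  group 9.946

# Predicted performance at CA = 50 for each group; the two
# predictions differ by the Group coefficient, 9.946.
predict(fit, newdata = data.frame(ca = 50, group = c(0, 1)))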
The same analysis using GLM
Why Group is in the Covariate(s) field
You may recall that I told you to put quantitative variables in the Covariate(s) field in GLM.
Group, however, is a qualitative variable. Why did I not put it in the Fixed Factor(s) field?
The answer is that since Group has only two values, it can be put in EITHER the Fixed Factor(s) field or
the Covariate(s) field. The results will be the same, although they may be formatted slightly
differently. I chose the Covariate(s) field because the formatting of the output is slightly easier to
understand.
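The same equivalence can be checked in R: for a two-valued 0/1 variable, entering it as a numeric predictor or as a factor gives the same fit; only the labeling of the group term differs. A minimal sketch, assuming the trainingprograms data frame used later:

fit.numeric <- lm(y ~ group + ca, data = trainingprograms)
fit.factor  <- lm(y ~ factor(group) + ca, data = trainingprograms)

# Identical fitted values, R-squared, and tests; only the
# coefficient labels differ.
all.equal(fitted(fit.numeric), fitted(fit.factor))   # TRUE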
I chose my preferred list of options.
The GLM Output
Note that because I put Group in the Covariate(s) field, GLM does not know that there are two groups of participants. It thinks there is just one group.

Descriptive Statistics
Dependent Variable: Y

Mean    Std. Deviation   N
54.10   12.683           30
Tests of Between-Subjects Effects
Dependent Variable: Y

Source            Type III SS   df   Mean Square   F        Sig.   Partial Eta Sq.   Noncent. Param.   Observed Power(b)
Corrected Model   2844.774a     2    1422.387      21.102   .000   .610              42.204            1.000
Intercept         287.107       1    287.107       4.259    .049   .136              4.259             .512
Group             713.443       1    713.443       10.584   .003   .282              10.584            .880
CA                1603.141      1    1603.141      23.784   .000   .468              23.784            .997
Error             1819.926      27   67.405
Total             92469.000     30
Corrected Total   4664.700      29

a. R Squared = .610 (Adjusted R Squared = .581)
b. Computed using alpha = .05

Parameter Estimates
Dependent Variable: Y

Parameter   B        Std. Error   t       Sig.   95% CI Lower   95% CI Upper   Partial Eta Sq.   Noncent. Param.   Observed Power(a)
Intercept   14.644   7.095        2.064   .049   .085           29.202         .136              2.064             .512
Group       9.946    3.057        3.253   .003   3.673          16.219         .282              3.253             .880
CA          .707     .145         4.877   .000   .409           1.004          .468              4.877             .997

a. Computed using alpha = .05
The same output, but this time with Group in the Fixed Factor(s) field
Between-Subjects Factors

Group   N
0       15
1       15

Descriptive Statistics
Dependent Variable: Y

Group   Mean    Std. Deviation   N
0       47.67   12.251           15
1       60.53   9.716            15
Total   54.10   12.683           30
Tests of Between-Subjects Effects
Dependent Variable: Y

Source            Type III SS   df   Mean Square   F        Sig.   Partial Eta Sq.   Noncent. Param.   Observed Power(b)
Corrected Model   2844.774a     2    1422.387      21.102   .000   .610              42.204            1.000
Intercept         496.499       1    496.499       7.366    .011   .214              7.366             .744
CA                1603.141      1    1603.141      23.784   .000   .468              23.784            .997
Group             713.443       1    713.443       10.584   .003   .282              10.584            .880
Error             1819.926      27   67.405
Total             92469.000     30
Corrected Total   4664.700      29

a. R Squared = .610 (Adjusted R Squared = .581)
b. Computed using alpha = .05

Parameter Estimates
Dependent Variable: Y

Parameter   B        Std. Error   t        Sig.   95% CI Lower   95% CI Upper   Partial Eta Sq.   Noncent. Param.   Observed Power(b)
Intercept   24.590   7.669        3.206    .003   8.854          40.325         .276              3.206             .871
CA          .707     .145         4.877    .000   .409           1.004          .468              4.877             .997
[Group=0]   -9.946   3.057        -3.253   .003   -16.219        -3.673         .282              3.253             .880
[Group=1]   0a       .            .        .      .              .              .                 .                 .

a. This parameter is set to zero because it is redundant.
b. Computed using alpha = .05
The essential results are the same. The output has been formatted to acknowledge that there are
two groups.
The same Analysis in Rcmdr
R → Rcmdr → Import data → from SPSS file . . .

Note that if you're going to be doing only regression, you should uncheck the "Convert value labels . . ." box.

Statistics → Fit Models → Linear Regression
These are the commands that Rcmdr gave to R:

> RegModel.1 <- lm(y~ca+group, data=trainingprograms)
> summary(RegModel.1)

Call:
lm(formula = y ~ ca + group, data = trainingprograms)

Residuals:
     Min       1Q   Median       3Q      Max
-18.8551  -4.9045   0.4651   6.7782  10.7317
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  14.6438     7.0954   2.064  0.04876 *
ca            0.7066     0.1449   4.877 4.24e-05 ***
group         9.9460     3.0571   3.253  0.00306 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.21 on 27 degrees of freedom
Multiple R-squared: 0.6099,
Adjusted R-squared: 0.581
F-statistic: 21.1 on 2 and 27 DF, p-value: 3.031e-06
The regression parameters are identical to those obtained in SPSS. Any difference would be due to
printing format.
There is another Rcmdr procedure that will yield the same results.
Statistics → Fit Models → Linear Model
Double-click each variable’s name in order to put it into the field. Don’t worry about the stuff we
haven’t covered yet. We’ll get to some of it in the Advanced SPSS course.
> LinearModel.2 <- lm(y ~ group + ca, data=trainingprograms)
> summary(LinearModel.2)
Call:
lm(formula = y ~ group + ca, data = trainingprograms)
Residuals:
     Min       1Q   Median       3Q      Max
-18.8551  -4.9045   0.4651   6.7782  10.7317
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  14.6438     7.0954   2.064  0.04876 *
group         9.9460     3.0571   3.253  0.00306 **
ca            0.7066     0.1449   4.877 4.24e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.21 on 27 degrees of freedom
Multiple R-squared: 0.6099,
Adjusted R-squared: 0.581
F-statistic: 21.1 on 2 and 27 DF, p-value: 3.031e-06
Note that the output of this different command is IDENTICAL to that of the command above.
The command above performed only linear REGRESSION.
This command is analogous to GLM – it can include grouping variables. In this analysis, since Group
has only two values, I treated it as a quantitative variable.
Visualizing the Data – A Scatterplot which contains markers for group membership –
Start here on 4/11/17.
A useful way of visualizing the relationship of a dependent variable to both a continuous independent
variable and a dichotomous independent variable is with a plot of the dependent variable vs. the
continuous independent variable, using different symbols for the two groups represented by the
dichotomous IV.
In this case, we would plot Y vs. CA with different symbols used for Group 0 and Group 1.
The following represents such a plot . . .
This plot allows you to see . . .
1) The relationship of Y to X – within each group (circled by a green ellipse) and overall (see the red
ellipse, which highlights the overall relationship).
2) The relationship of Y to Group – compare the heights of the two green ellipses.
Relationship of the Visual Aid to the Coefficients Table – The Group variable
Regression formula is Predicted Y = 14.644 + 9.946*Group + 0.707*X
So, for Group 1: Predicted Y = 14.644 + 9.946*1 + .707*X = 24.590 + 0.707*X
And for Group 0: Predicted Y = 14.644 + 9.946*0 + .707*X = 14.644 + 0.707*X
I’ve drawn the regression for Group 1 and that for Group 0 below. Note that the within-group
regressions have the same slope.
So the only difference in the two within-group regressions is the Y-intercept.
[Figure: the two parallel within-group regression lines. G1: Y-hat = 24.590 + .707X; G0: Y-hat = 14.644 + .707X. The vertical gap between the lines is the partial regression coefficient for Group (9.946).]
The difference in intercepts of the two lines represents the partial regression coefficient for
Group. That difference is 9.946.
Recall from the definition of a partial regression coefficient that the Group B value is the expected
change in Y when Group changes by 1 unit (going from 0 to 1 or vice versa) among persons with the
same X value.
The concept “among persons with the same X value” can be illustrated in the figure with a
rectangular sliver over the scatterplot. So within the rectangle are persons satisfying the condition that
they’re equal on Cogability.
Relationship of the Visual Aid to the Coefficients Table – The Cogability (quantitative) IV
Within each group, Predicted Y = Intercept + .707*Cognitive Ability.
The common slope of the two lines represents the partial regression coefficient for the
quantitative predictor.
G1: Y-hat = 24.590 + .707X
G0: Y-hat = 14.644 + .707X
The B value for the quantitative IV (0.707) is the common slope of the lines.
Recall from the definition of a partial regression coefficient that the Cogability B is the expected change
in Y when Cogability changes by 1 unit among persons with the same Group value.
So each line represents the concept people equal on Group.
The Y vs. X Graph with group-specific lines through each group of points.
When you ask SPSS to draw lines for each group, the lines that SPSS draws will have slopes specific to
each group. They won't be set equal to the common value in the Coefficients table.
Later we'll see that the extent to which the group-specific lines are not parallel reflects the extent
to which the two variables interact.
More on that in the chapter on Moderation, but a quick preview in R is sketched below.
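As a hedged preview of that chapter (not part of the analysis above): in R, adding a product term to the model lets the slopes differ, and the test of the product term is the test of the interaction.

# y ~ ca * group expands to ca + group + ca:group.
# The ca:group coefficient is the difference between the two
# within-group slopes; its t-test asks whether the lines are parallel.
fit.mod <- lm(y ~ ca * group, data = trainingprograms)
summary(fit.mod)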
Regression with One 3-Category Qualitative and One Quantitative Independent
Variable
(Data are RandomizedBlocks in P595B HA folder. N is 240, too large to allow the data to be displayed
here.)
A company is considering switching from the current training program to one of two alternative
programs. (Tell Tom story here.)
The current program has been in effect for many years. Two alternative programs have recently
become available.
The first is a video/computer-based method, involving presenting key information on DVDs and
then using a stand-alone computer program to present drill-and-practice on the material presented.
The 2nd is a completely web-based training method, in which the information is presented over the
web and interactive drill-and-practice is also presented over the web. Before the company decides to
make any kind of switch, it must determine whether there are any significant differences in learning
after having gone through the three programs.
Two hundred forty participants are randomly assigned to one of three groups of trainees with 80
participants in each group. One group receives the current training, the 2nd group receives the
video/computer training, and the third receives the web-based training.
A measure of cognitive ability (X), known to predict performance after training, is taken prior to
beginning training.
So the question we wish to answer is: Are there significant differences between the group means
when controlling for differences in cognitive ability?
The data are as follows. X is the cognitive ability measure. Y is the performance after training.
TRTMENT is the training program: 1=standard; 2=video/computer, and 3=web-based.
So, for this problem, we’re comparing TRTMENT means controlling for cognitive ability.
Descriptive Statistics

TRTMENT                 CA (X)   PERF (Y)
1       N               80       80
        Mean            49.68    110.06
        Median          50.50    111.00
        Std. Deviation  10.45    15.51
2       N               80       80
        Mean            49.54    111.95
        Median          50.00    114.00
        Std. Deviation  9.65     13.96
3       N               80       80
        Mean            49.21    117.94
        Median          51.00    120.00
        Std. Deviation  10.91    15.28
Total   N               240      240
        Mean            49.47    113.32
        Median          50.00    114.00
        Std. Deviation  10.31    15.25
Note that the differences between groups in Cognitive Ability
are small.
You might wonder why we bother to control for it.
It turns out that controlling for a variable that is related to the
dependent variable even though it is unrelated to the other
independent variables increases our power to detect
differences associated with the other independent variables.
So it pays to include a covariate, as long as that covariate is
related to the dependent variable.
Recall, the questions we’re asking are the following:
1) Among persons equal in cognitive ability (X), are there mean differences between the groups in
performance after training? This is the main question.
2) Among persons within the same group, is there a relationship between performance after
training and cognitive ability? This is a question that we assume will be answered positively,
otherwise we wouldn’t control for cognitive ability. But we will check it anyway.
The data prior to creation of group-coding variables
The data originally consist of just 4 columns – the ID column, the Y column, the X column,
and the TRTMENT column.
The rows are in 1 – 2 – 3 order of TRTMENT. This is not often the case, but was done here as part
of the randomization procedure.
Creation of Group Coding Variables
To analyze the problem using the REGRESSION procedure, we must create group coding variables for
the TRTMENT variable. Which method should we use??
Since one of the groups is a natural control group, we’ll use dummy coding, using TRTMENT=1 as
the reference group. So the coding will be
TRTMENT   DCODE1   DCODE2
1         0        0        <==== Reference group
2         1        0
3         0        1
So, after creating group coding variables and using them to represent the groups, the data editor
contains DCODE1 and DCODE2 as new columns alongside ID, Y, X, and TRTMENT.
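If you were creating the same codes in R instead of the SPSS data editor, a minimal sketch (assuming the data frame is named rb and has the trtment column; the name rb is hypothetical) would be:

# Dummy codes with TRTMENT = 1 as the reference group.
rb$dcode1 <- as.numeric(rb$trtment == 2)   # 1 for video/computer, else 0
rb$dcode2 <- as.numeric(rb$trtment == 3)   # 1 for web-based, else 0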
Testing the two hypotheses . . .
1. The significance of differences in mean performance between groups, controlling for X.
The first test is assessed by computing the increase in R2 due to addition of the group coding
variables to the equation.
Since there is a set of IVs representing the groups, we have to use the techniques discussed previously
for sets of independent variables (the GRE tests in the previous example).
We do a two-step regression.
First, we regress Y onto just X. Then we add the group coding variables to the equation.
Specifically, first, we enter X
Second, we enter the two group-coding variables representing TRTMENT.
We assess the significance of R2 change in Step 2.
2. The significance of the relationship of Y to X, controlling for group differences.
Since X is a single variable, we can assess its significance by simply examining its t value (and its p-value) in the Coefficients box.
That t assesses the significance of X controlling for all the other variables, specifically, the group coding
variables.
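For reference, both tests can be reproduced in R with a two-step comparison of nested models; this is a sketch assuming the same hypothetical rb data frame with columns y, x, dcode1, and dcode2. The F printed by anova() for the model comparison is the test of the R-squared change in Step 2.

# Step 1: cognitive ability only.  Step 2: add the set of group codes.
m1 <- lm(y ~ x, data = rb)
m2 <- lm(y ~ x + dcode1 + dcode2, data = rb)

anova(m1, m2)   # F test of the R-squared change due to the group codes
summary(m2)     # t tests of x, dcode1, and dcode2 individually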
Regression
A two-step regression analysis is conducted because one of the tests involves a set of independent
variables.
Step 1: Enter continuous predictor, X.
Step 2: Add the set of group-coding variables, DCODE1, DCODE2.
[DataSet1] F:\MdbT\P595B\HAs\RandomizedBlocks.sav
Descriptive Statistics

         Mean     Std. Deviation   N
x        49.48    10.307           240
y        113.32   15.247           240
dcode1   .33      .472             240
dcode2   .33      .472             240

Group      DCODE1   DCODE2
Standard   0        0
Video      1        0
Web        0        1
Correlations (Pearson)

         x       y       dcode1   dcode2
x        1.000   .430    .004     -.018
y        .430    1.000   -.064    .215
dcode1   .004    -.064   1.000    -.500
dcode2   -.018   .215    -.500    1.000
Variables Entered/Removed(b)

Model   Variables Entered    Variables Removed   Method
1       x(a)                 .                   Enter
2       dcode1, dcode2(a)    .                   Enter

a. All requested variables entered.
b. Dependent Variable: y
Model Summary

                                                        Change Statistics
Model   R       R Square   Adj. R Square   Std. Error   R Sq. Change   F Change   df1   df2   Sig. F Change
1       .430a   .185       .181            13.798       .185           53.847     1     238   .000
2       .487b   .237       .227            13.404       .052           8.092      2     236   .000

a. Predictors: (Constant), x
b. Predictors: (Constant), x, dcode1, dcode2

Note – Model 1 row: significance of the increase in R2 (from 0) when X (cognitive ability) was added to the equation, p < .001.
Note – Model 2 row: significance of the change in R2 when the group-coding variables were added to the equation, p < .001.
ANOVA(c)

Model 1      Sum of Squares   df    Mean Square   F        Sig.
Regression   10250.965        1     10250.965     53.847   .000a
Residual     45308.968        238   190.374
Total        55559.933        239

Model 2      Sum of Squares   df    Mean Square   F        Sig.
Regression   13158.565        3     4386.188      24.413   .000b
Residual     42401.369        236   179.667
Total        55559.933        239

a. Predictors: (Constant), x
b. Predictors: (Constant), x, dcode1, dcode2
c. Dependent Variable: y

Note: Performance is related to the whole collection of IVs in each model. For Model 1, X is the only
predictor. For Model 2, X plus the two group coding variables make up the collection of IVs.
The significance of each individual variable can be obtained from the Coefficients table below.
We want the significance of X, the continuous predictor, for the 2nd hypothesis.
We may also want the significance of the dummy coding variables.
Coefficients(a)

Model 1      B        Std. Error   Beta   t        Sig.   Zero-order   Partial   Part
(Constant)   81.880   4.376               18.712   .000
x            .635     .087         .430   7.338    .000   .430         .430      .430

Model 2      B        Std. Error   Beta   t        Sig.   Zero-order   Partial   Part
(Constant)   78.182   4.440               17.609   .000
x            .642     .084         .434   7.628    .000   .430         .445      .434
dcode1       1.976    2.119        .061   .932     .352   -.064        .061      .053
dcode2       8.172    2.120        .253   3.855    .000   .215         .243      .219

a. Dependent Variable: y

Note – the x row: there was a significant relationship of Y to X among persons in the same group.
Note – the dcode2 row: the mean of Group 3 was significantly larger than the mean of the control group among people equal on X.

Excluded Variables(b)

Model 1   Beta In   t        Sig.   Partial Correlation   Tolerance
dcode1    -.065a    -1.117   .265   -.072                 1.000
dcode2    .223a     3.914    .000   .246                  1.000

a. Predictors in the Model: (Constant), x
b. Dependent Variable: y

(I've never used this table.)
So, our conclusion is that there ARE significant differences between the group post-training means after
controlling for cognitive ability. The t-test in the Coefficients Box tells us that only Treatment 3
(dcode2) is significantly different from the control or standard method.
The visual aid: a plot of Y vs. X with different plotting symbols for each group.
(Regression lines with slopes specific to each group were added because they're so easy to get. Note,
however, that the analysis assumes that the regression lines all have the same slope.)
1) Note that the Treatment 3 line – the green one – is above the Treatment 2 line (the red one), which is
(usually) above the Treatment 1 line (the dotted one). This hints at differences in average
performance between groups.
2) Note the generally positive relationship of Y to X, overall and within each group. It looks like the
dependent variable is related to cognitive ability, which makes cognitive ability a good covariate.
The analysis of the same data with GLM (much easier)
Start here on 4/12/16.
The data, again.
Specifying the analysis . . .
Putting the name of a variable in the Fixed Factor(s) field tells GLM that the variable needs group
coding variables. GLM will automatically create them.
Don't put the name of a quantitative variable in the Fixed Factor(s) field. GLM will create many, many
group coding variables, then perhaps terminate with an error message.
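R's lm() does the same bookkeeping when a predictor is declared a factor – it creates the group coding variables automatically. A minimal sketch, again assuming the hypothetical rb data frame:

# factor(trtment) makes R build the dummy codes, as GLM does
# for a variable in the Fixed Factor(s) field.
fit <- lm(y ~ x + factor(trtment), data = rb)
anova(fit)     # with x entered first, the factor(trtment) line tests
               # group differences controlling for x
summary(fit)   # coefficients for trtment 2 and 3 vs. the reference group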
Plots . . .
Post hocs . . .
Alas, Post Hocs are not available when you have a continuous covariate.
So we can't, for example, use Post Hocs to discover which pairs of means are
significantly different from each other.
This is a problem with GLM – or perhaps it's simply a problem with mathematical
statisticians who are too lazy to develop post hoc tests for designs with covariates.
Options . . .
Results . . .
Univariate Analysis of Variance
[DataSet1] G:\MdbT\P595B\HAs\RandomizedBlocks.sav
Between-Subjects Factors

trtment   N
1         80
2         80
3         80

Descriptive Statistics
Dependent Variable: y

trtment   Mean     Std. Deviation   N
1         110.06   15.514           80
2         111.95   13.965           80
3         117.94   15.276           80
Total     113.32   15.247           240

Good news – using GLM allows you to get group means and standard deviations without having to use a different procedure.
Levene's Test of Equality of Error Variances(a)
Dependent Variable: y

F      df1   df2   Sig.
.064   2     237   .938

Tests the null hypothesis that the error variance of the dependent variable is equal across groups.
a. Design: Intercept + x + trtment

This compares variances of the dependent variable across groups. We passed the test – the data
suggest that the null hypothesis of equal population variances should be retained.
Tests of Between-Subjects Effects
Dependent Variable: y

Source            Type III SS    df    Mean Square   F         Sig.   Partial Eta Sq.   Noncent. Param.   Observed Power(b)
Corrected Model   13158.565a     3     4386.188      24.413    .000   .237              73.239            1.000
Intercept         66125.624      1     66125.624     368.046   .000   .609              368.046           1.000
x                 10453.806      1     10453.806     58.184    .000   .198              58.184            1.000
trtment           2907.600       2     1453.800      8.092     .000   .064              16.183            .956
Error             42401.369      236   179.667
Total             3137320.000    240
Corrected Total   55559.933      239

a. R Squared = .237 (Adjusted R Squared = .227)
b. Computed using alpha = .05
Corrected Model: Same information as in the REGRESSION ANOVA box.

Intercept: Test of the hypothesis that the population Y-intercept ("Constant" in REGRESSION output)
is zero. I've dimmed them to remind you that they're technical stuff that has nothing to do with the
hypotheses.

X: Test of the hypothesis that in the population, controlling for trtment, the slope relating the DV to X
is zero. Conclusion: Among persons equal on trtment (i.e., in the same group) there is a significant
relationship of Y to X (cognitive ability).

Trtment: Test of the hypothesis that the population means of the 3 conditions are equal when
controlling for differences in X. Conclusion: Among persons equal on X (cognitive ability) there are
significant differences in the means of the three groups.
The plot of group means for each TRTMENT condition.
New Topic - Creating Scale Scores – May be covered at end of Sem.
(Not covered on 3/31/15. Not covered on 4/12/16. Not covered on 4/11/17.)
Questions like "Does Conscientiousness predict Test Performance?" are answered by computing scale
scores.
A scale score is computed from a collection of conscientiousness items. This scale score represents
Conscientiousness.
A scale score may be computed from the items of a measure of performance. That scale score would
represent Performance.
Finally, the correlation coefficient between the two scale scores is computed.
Procedure for computing a scale score
Data: Biderman, Nguyen, & Sebren, 2008.
GET FILE='G:\MdbR\1Sebren\SebrenDataFiles\SebrenCombined070726NOMISS2EQ1.sav'.
The data typically are entered in the order in which they appear on questionnaire data sheets.
In this case, 50 columns contain the responses to the 50-item IPIP Big Five exactly as they appear on the
questionnaire.
The next 20 or so columns contain the reverse coded responses to the negatively-worded items. I
created them using the SPSS RECODE command.
1. Reverse score the negatively-worded items.
q2 q4 q6 q8 q10 q12 q14 q16 q18 q20 q22 q24 q26 q28 q29 q30 q32 q34 q36 q38 q39 q44 q46 q49
Here’s syntax to perform the recode:
SAVE FILE IMMEDIATELY BEFORE WHAT FOLLOWS.
recode q2 q4 q6 q8 q10 q12 q14 q16 q18 q20 q22 q24 q26 q28 q29 q30 q32 q34 q36 q38 q39 q44 q46 q49
  (1=7)(2=6)(3=5)(4=4)(5=3)(6=2)(7=1) into
  q2r q4r q6r q8r q10r q12r q14r q16r q18r q20r q22r q24r q26r q28r q29r q30r q32r q34r q36r q38r q39r q44r q46r q49r.
SAVE FILE UNDER A DIFFERENT NAME IMMEDIATELY AFTER THIS.
You don’t need to do this using syntax. It can be done using pull-down menus or by hand. But it must
be done.
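The same recode can be sketched in R: on a 1-to-7 response scale, the reversed score is simply 8 minus the original. (This assumes the items live in a data frame named df; the name is hypothetical.)

# Reverse-score the negatively-worded items: 1 -> 7, 2 -> 6, ..., 7 -> 1.
rev.items <- c("q2","q4","q6","q8","q10","q12","q14","q16","q18","q20",
               "q22","q24","q26","q28","q29","q30","q32","q34","q36",
               "q38","q39","q44","q46","q49")
for (v in rev.items) df[[paste0(v, "r")]] <- 8 - df[[v]]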
2. Define Missing Values. Tell SPSS if specific values are to be treated as missing.
This is very important. A fairly recent thesis student lost several days because the student
created scale scores without declaring missing values. MISSING VALUES MUST BE DECLARED
FOR ALL ITEMS.
3. Determine which items belong to which scale.
The IPIP items are distributed as follows: E A C S O E A C S O E A C S O . . .
That is, the 1st, 6th, 11th, 16th, 21st, 26th, 31st, 36th, 41st, and 46th items are E items.
The 2nd, 7th, 12th, 17th, etc. are A items. And so forth.
4. Compute Scale scores.
4a. In syntax
To compute the E scale score by manual arithmetic,
E = (q1 + q6r + q11 + q16r + q21 + q26 + q31 + q36r + q41 + q46r) / 10.
In syntax, that would be
compute e = (q1+q6r+q11+q16r+q21+q26+q31+q36r+q41+q46r)/10.
If it’s computed this way, the result for any case with a missing value will be treated as missing.
It can also be computed as
compute e = mean(q1,q6r,q11,q16r,q21,q26,q31,q36r,q41,q46r).
If it’s computed this way, if a response is missing, the mean will be taken across the remaining
nonmissing items.
So, after all negatively worded items have been recoded, the syntax to compute all of the Big Five scale
scores would be

compute e = mean(q1,q6r,q11,q16r,q21,q26r,q31,q36r,q41,q46r).
compute a = mean(q2r,q7,q12r,q17,q22r,q27,q32r,q37,q42,q47).
compute c = mean(q3,q8r,q13,q18r,q23,q28r,q33,q38r,q43,q48).
compute s = mean(q4r,q9,q14r,q19,q24r,q29r,q34r,q39r,q44r,q49r).
compute o = mean(q5,q10r,q15,q20r,q25,q30r,q35,q40,q45,q50).
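An equivalent sketch in R (same hypothetical df, after reverse-scoring): rowMeans() with na.rm = TRUE behaves like SPSS's mean() function, while na.rm = FALSE behaves like the manual arithmetic, making the scale score missing whenever any item is missing.

e.items <- c("q1","q6r","q11","q16r","q21","q26r","q31","q36r","q41","q46r")

# Like SPSS mean(): average whatever items are nonmissing.
df$e <- rowMeans(df[, e.items], na.rm = TRUE)

# Like the manual arithmetic: any missing item -> missing scale score.
# df$e <- rowMeans(df[, e.items], na.rm = FALSE)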
Cut this page and the previous page out and paste it on your wall for when you analyze your thesis
data.
4b. Computing a scale score using the TRANSFORM menu . .
Repeat the above for each scale, substituting appropriate item names.
5. Run FREQUENCIES on scale scores.
[Histograms of the scale scores: negatively skewed.]
6. Run Reliabilities of each scale
Reliability Statistics

Scale   Cronbach's Alpha   Alpha Based on Standardized Items   N of Items
e       .792               .794                                10
a       .833               .832                                10
c       .799               .799                                10
s       .825               .836                                10
o       .848               .849                                10
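If you're curious what's behind those numbers, here is a small, self-contained R sketch of coefficient alpha, k/(k-1) * (1 - sum of item variances / variance of the summated scale). It's an illustration of the formula, not the SPSS RELIABILITY procedure.

# Cronbach's alpha for a data frame of items (complete cases only).
cronbach.alpha <- function(items) {
  items <- na.omit(items)
  k <- ncol(items)
  item.vars <- sum(apply(items, 2, var))   # sum of the item variances
  scale.var <- var(rowSums(items))         # variance of the summated scale
  (k / (k - 1)) * (1 - item.vars / scale.var)
}

# cronbach.alpha(df[, e.items])   # should land near .79 for the E scale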
7. Compute correlations between scale scores.
Correlations (Pearson; N = 135 for all)

       hext    hagr    hcon    hsta    hopn
hext   1.000   .254    .155    .194    .241
hagr   .254    1.000   .421    .224    .426
hcon   .155    .421    1.000   .277    .238
hsta   .194    .224    .277    1.000   .226
hopn   .241    .426    .238    .226    1.000

Sig. (2-tailed): hext-hagr .003, hext-hcon .074, hext-hsta .024, hext-hopn .005;
hagr-hcon .000, hagr-hsta .009, hagr-hopn .000; hcon-hsta .001, hcon-hopn .005; hsta-hopn .008.

(hcon is the C summated scale score from the IPIP 50-item Big Five scale; hext, hagr, hsta, and hopn are the corresponding E, A, S, and O scores.)
The mean of the correlations between scale scores is
(.254+.155+.194+.241+.421+.224+.426+.277+.238+.226)/10 = .266.
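That average is easy to script in R, given the five scale scores in a data frame (the name scores is hypothetical):

cmat <- cor(scores)           # correlation matrix of the five scale scores
mean(cmat[lower.tri(cmat)])   # mean of the 10 unique off-diagonal correlations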
Wait!! Aren't the Big Five dimensions supposed to be independent dimensions of personality? If so,
why are they generally positively correlated?
This question is leading to a ton of research right now. Key phrases: higher-order factors of the Big
Five; general factor of personality.
8. Compute correlations of scale scores with variables your theory says they should correlate
with.
INCLUDE SCATTERPLOTS TO CHECK FOR NONLINEARITY.
Correlations (N = 135)

       hcon    test
hcon   1.000   .086
test   .086    1.000

Sig. (2-tailed) for the hcon-test correlation: .320.
(hcon is the C summated scale score from the IPIP 50-item Big Five scale.)
In this case, the correlation is not significant. After two years of thinking about it, we hit upon the idea
that perhaps the correlation was suppressed by method bias. That turned out to be a viable hypothesis.