Lecture 11_Qualitative and Quantitative IVs

Visualizing Bivariate Regression When 1 IV Is a Dichotomy and 1 Is Continuous

Suppose that two groups are given training on how to perform a job. One group has been given the old training method; it is Group 0. The other group has been given a recently purchased new training program; it is Group 1. Suppose also that a test of cognitive ability, probably an ability that is related to the job, has been given to each person. The dependent variable is the final performance after completion of the training program. That final performance depends on the type of training and probably also on cognitive ability. In the following, cognitive ability is X and final performance is Y.

The data are as follows . . .

CogAbility  GROUP  Performance
54          0      45
49          0      60
66          0      68
45          0      43
65          0      61
47          0      29
50          0      46
39          0      49
53          0      59
29          0      41
52          0      58
38          0      42
32          0      30
24          0      30
58          0      54
49          1      54
62          1      77
46          1      67
35          1      50
50          1      58
55          1      47
55          1      74
44          1      47
44          1      62
56          1      71
56          1      54
55          1      67
67          1      68
37          1      58
52          1      54

Number of cases read: 30    Number of cases listed: 30

Lecture 11_Qualitative And Quantitative IVs - 1    7/28/2017

Visualizing the Data: A Two-Group Scatterplot

A useful way of visualizing the relationship of a dependent variable to a continuous independent variable and a dichotomous independent variable is a plot of the dependent variable vs. the continuous independent variable, using different symbols for the two groups represented by the dichotomous IV. In this case, we would plot final performance vs. initial ability (Y vs. X) with different symbols used for Group 0 and Group 1.

The following represents such a plot . . .

This plot allows you to see both the relationship of Y to X (generally increasing) and the relationship of Y to Group (larger Y's for Group 1). It also lets you compare the relationship of Y to X within Group 0 with the relationship of Y to X within Group 1. In this case the relationships are similar: generally increasing.
Later in the course, when we study nonlinear relationships, we'll see examples in which the Y-X relationship varies from one group to the other.

Note that this plot would not work if GROUP had many values, say 20+ values. That's because there would be 20 different plotting symbols, and it would be impossible to discern relationships of Y to X with 20+ different symbols strewn about the graph. So this only works when the qualitative independent variable has just a few values.

The regression analysis

Note that since the qualitative factor has only two levels, we do not have to create group coding variables.

Regression

Variables Entered/Removed(b)
Model  Variables Entered  Variables Removed  Method
1      X, GROUP(a)        .                  Enter
a. All requested variables entered.
b. Dependent Variable: Y

Model Summary
Model  R        R Square  Adjusted R Square  Std. Error of the Estimate
1      .781(a)  .610      .581               8.21003
a. Predictors: (Constant), X, GROUP

ANOVA(b)
Model 1     Sum of Squares  df  Mean Square  F       Sig.
Regression  2844.774        2   1422.387     21.102  .000(a)
Residual    1819.926        27  67.405
Total       4664.700        29
a. Predictors: (Constant), X, GROUP
b. Dependent Variable: Y

Coefficients(a)
Model 1     B       Std. Error  Beta  t      Sig.
(Constant)  14.644  7.095             2.064  .049
GROUP       9.946   3.057       .399  3.253  .003
X           .707    .145        .598  4.877  .000
a. Dependent Variable: Y

Verbal Interpretation of B Coefficients

BX = .707: Among persons in the same group, a 1-point increase in cognitive ability is associated with a .707 increase in performance. The relationship is statistically significant.

BGroup = 9.946: Among persons of equal cognitive ability, a 1-point change in GROUP, i.e., moving from Group 0 to Group 1, is associated with a 9.946 increase in performance. The difference is significant.

Predicted Y = 14.644 + .707*X + 9.946*GROUP.
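The B coefficients reported above can be checked outside SPSS. What follows is a minimal Python sketch, not part of the original SPSS analysis. It assumes the 30 cases in the data listing are paired row by row in the order shown, and the function name fit_two_group_regression is only illustrative. With one 0/1 predictor and one continuous predictor, the least-squares slope for X equals the pooled within-group slope, and the GROUP coefficient equals the difference between the group means of Y after adjusting for the difference between the group means of X.

```python
# Sketch: reproduce the B coefficients of Y = b0 + bX*X + bG*GROUP
# from the 30 listed cases (row-by-row pairing is an assumption).

X = [54, 49, 66, 45, 65, 47, 50, 39, 53, 29, 52, 38, 32, 24, 58,
     49, 62, 46, 35, 50, 55, 55, 44, 44, 56, 56, 55, 67, 37, 52]
GROUP = [0] * 15 + [1] * 15
Y = [45, 60, 68, 43, 61, 29, 46, 49, 59, 41, 58, 42, 30, 30, 54,
     54, 77, 67, 50, 58, 47, 74, 47, 62, 71, 54, 67, 68, 58, 54]


def mean(values):
    return sum(values) / len(values)


def fit_two_group_regression(x, group, y):
    """Least-squares fit of y on x and a 0/1 group indicator.

    Returns (intercept, b_x, b_group).
    """
    x0 = [xi for xi, g in zip(x, group) if g == 0]
    y0 = [yi for yi, g in zip(y, group) if g == 0]
    x1 = [xi for xi, g in zip(x, group) if g == 1]
    y1 = [yi for yi, g in zip(y, group) if g == 1]

    # Pool the within-group cross-products and sums of squares.
    sxy = 0.0
    sxx = 0.0
    for xs, ys in ((x0, y0), (x1, y1)):
        mx, my = mean(xs), mean(ys)
        sxy += sum((xi - mx) * (yi - my) for xi, yi in zip(xs, ys))
        sxx += sum((xi - mx) ** 2 for xi in xs)

    b_x = sxy / sxx                                   # common slope
    b_group = (mean(y1) - mean(y0)) - b_x * (mean(x1) - mean(x0))
    intercept = mean(y0) - b_x * mean(x0)             # Group 0 intercept
    return intercept, b_x, b_group


intercept, b_x, b_group = fit_two_group_regression(X, GROUP, Y)
print(round(intercept, 3), round(b_x, 3), round(b_group, 3))
# prints 14.644 0.707 9.946
```

The Group 1 intercept is intercept + b_group, which is why the GROUP coefficient shows up as the vertical distance between two parallel lines.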
When one of the variables is a grouping variable, analysts often write separate equations for each group.

For people in Group 1, Predicted Y = 14.644 + .707*X + 9.946*1 = 24.590 + .707*X
For people in Group 0, Predicted Y = 14.644 + .707*X + 9.946*0 = 14.644 + .707*X

Showing the regression parameters on the graph

The separate group regression lines are drawn on the graph.

Group 1: Predicted Y = 24.590 + .707*X
Group 0: Predicted Y = 14.644 + .707*X

[Figure: scatterplot of Y vs. X with separate symbols for GROUP (TP=1 and TP=0) and two parallel regression lines, Y-hat = 24.590 + .707X for Group 1 and Y-hat = 14.644 + .707X for Group 0.]

This plot precisely represents the results of the MR analysis. Specifically . . .

1. The common slope represents the partial regression coefficient for X, .707. It's the expected change in Y when X increases by 1 among persons in the same group. As shown above, there are two lines, one for Group 0 and the other for Group 1. Both have the same slope: 0.707.

2. The difference in heights of the two lines represents the partial regression coefficient for Group. It's the expected change in Y when Group changes by 1 unit (going from 0 to 1 or vice versa) among persons with the same X value.

More on the partial regression coefficient for the dichotomous variable, Group in this case.

[Figure: the same scatterplot (Y vs. Cognitive Ability, INITABIL) with the vertical distance between the two lines labeled "Partial regression coefficient for the Group (9.946)."]

The partial regression coefficient for the dichotomous predictor (Group) is the difference in height of the two lines. Among persons equal on X (that is, among persons at a particular position on the X-axis), a one-unit difference in group (going from one group to the other) is associated with a 9.946 difference in Y.

Think of the people equal on X as those within a thin vertical slice of the scatterplot, as illustrated here.
[Figure: the same scatterplot (Y vs. Cognitive Ability, INITABIL) with a thin vertical slice marked "A bunch of people with the same X value."]

The Y vs. X Graph with group-specific lines through each group of points.

[Figure: scatterplot of Y vs. X with separate symbols for GROUP (1.00 and .00) and a separately fitted regression line through each group's points.]

This plot is different from the previous one in that the lines through each group's points have been allowed to have slopes appropriate to the group through which they're drawn. In the previous graph, the slopes of the lines were constrained to be equal to 0.707, the partial regression slope for X.

The regression analysis conducted for these data assumed equal slopes in the two populations. The graph here provides a visual test of that assumption. To my eyes the two slopes are pretty nearly the same, but not precisely parallel.

There is a formal test of whether the slopes differ. If that test is significant, then a more complicated analysis must be performed. More on that in the lecture on moderation and in the lecture on analysis of covariance.

Regression with One 3-Category Qualitative and One Quantitative Independent Variable

(Data are RandomizedBlocks in P595B HA folder. N is 240, too large to allow the data to be displayed here.)

A company is considering switching from the current training program to one of two alternative programs. The current program has been in effect for many years. Two alternative programs have recently become available. The first is a video/computer-based method, which presents key information on DVDs and then uses a stand-alone computer program to present drill-and-practice on the material. The 2nd is a completely web-based training method, in which the information is presented over the web and interactive drill-and-practice is also presented over the web.
Before the company decides to make any kind of switch, it must determine whether there are any significant differences in learning after having gone through the three programs.

Two hundred forty participants are randomly assigned to one of three groups of trainees, with 80 participants in each group. One group receives the current training, the 2nd group receives the video/computer training, and the third receives the web-based training.

A measure of cognitive ability (X), known to predict performance after training, is taken prior to beginning training.

The data are as follows. X is the cognitive ability measure. Y is the performance after training. TRTMENT is the training program: 1=standard, 2=video/computer, and 3=web-based.

So, for this problem, we're comparing TRTMENT means controlling for cognitive ability.

Summarize

Descriptive Statistics
TRTMENT  Statistic       X      Y
1        N               80     80
         Mean            49.68  110.06
         Median          50.50  111.00
         Std. Deviation  10.45  15.51
2        N               80     80
         Mean            49.54  111.95
         Median          50.00  114.00
         Std. Deviation  9.65   13.96
3        N               80     80
         Mean            49.21  117.94
         Median          51.00  120.00
         Std. Deviation  10.91  15.28
Total    N               240    240
         Mean            49.47  113.32
         Median          50.00  114.00
         Std. Deviation  10.31  15.25

The questions we're asking are the following:

1) Among persons equal in cognitive ability (X), are there mean differences between the groups in performance after training? This is the main question.

2) Among persons within the same group, is there a relationship between performance after training and cognitive ability? This is a question that we assume will be answered positively; otherwise we wouldn't control for cognitive ability. But we will check it anyway.

The graphical representation is a plot of Y vs. X with different plotting symbols for each group. (Individual group regression lines were added because they're so easy to get.)
1) Note that the Treatment 3 line (the green one) is above the Treatment 2 line (the red one), which is (usually) above the Treatment 1 line (the dotted one). This hints at differences in average performance between groups.

2) Note the generally positive relationship of Y to X, overall and within each group. It looks like the dependent variable is related to cognitive ability.

The analysis using the Regression Procedure

The data originally consist of just 3 columns: the Y column, the TRTMENT column, and the X column.

Creation of Group Coding Variables

To analyze the problem using the REGRESSION procedure, we must create group coding variables for the TRTMENT variable. Since one of the groups is a natural control group, we'll use dummy coding, using TRTMENT=1 as the reference group. So the coding will be

TRTMENT  DCODE1  DCODE2
1        0       0
2        1       0
3        0       1

So, after creating group coding variables and using them to represent the groups, the data editor might look like the following . . .

The hypotheses being tested.

1. We want to assess the significance of differences in mean performance between groups, controlling for X.

This first test is assessed by computing the increase in R2 due to the addition of the group coding variables to the equation. We do a regression without the group coding variables, then add the group coding variables, as a set, to the equation. Specifically, first, we enter X. Second, we enter the two group-coding variables representing TRTMENT. We assess the significance of the R2 change in Step 2.

2. We want to assess the significance of the relationship of Y to X, controlling for group differences.

Since X is a single variable, we can assess its significance by simply examining the t value (and its p-value) in the Coefficients box.

Regression

A two-step regression analysis is conducted.
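The dummy coding scheme and the R2-change test just described can be sketched in a few lines of Python. This is an illustration, not the SPSS procedure itself: the function names dummy_code and f_change are hypothetical, and the sums of squares plugged in are the ones SPSS reports in the ANOVA table of the output that follows (regression SS for the X-only model, then regression and residual SS for the model with X and both group-coding variables).

```python
def dummy_code(trtment, reference=1, levels=(1, 2, 3)):
    """Return one 0/1 indicator per non-reference level of TRTMENT."""
    if trtment not in levels:
        raise ValueError(f"unknown TRTMENT value: {trtment!r}")
    return tuple(int(trtment == level) for level in levels if level != reference)


def f_change(ss_reg_reduced, ss_reg_full, ss_res_full, df_added, df_res_full):
    """F statistic for the increase in regression SS (equivalently, in R2)
    when a set of predictors is added to the equation."""
    ms_added = (ss_reg_full - ss_reg_reduced) / df_added
    ms_residual = ss_res_full / df_res_full
    return ms_added / ms_residual


# (DCODE1, DCODE2) for each TRTMENT value, with TRTMENT=1 as the reference
codes = {t: dummy_code(t) for t in (1, 2, 3)}
print(codes)  # {1: (0, 0), 2: (1, 0), 3: (0, 1)}

# Step 2 adds 2 group-coding variables; the full model has 236 residual df
f = f_change(10250.965, 13158.565, 42401.369, df_added=2, df_res_full=236)
print(round(f, 3))  # 8.092
```

The second value matches the F Change that SPSS reports for Model 2, which is the test of the group differences controlling for X.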
Two steps are needed because one of the tests involves a set of independent variables.

Step 1: Enter the continuous predictor, X.
Step 2: Add the set of group-coding variables, DCODE1, DCODE2.

[DataSet1] F:\MdbT\P595B\HAs\RandomizedBlocks.sav

Descriptive Statistics
        Mean    Std. Deviation  N
x       49.48   10.307          240
y       113.32  15.247          240
dcode1  .33     .472            240
dcode2  .33     .472            240

Group     DCODE1  DCODE2
Standard  0       0
Video     1       0
Web       0       1

Correlations (Pearson Correlation)
        x      y      dcode1  dcode2
x       1.000  .430   .004    -.018
y       .430   1.000  -.064   .215
dcode1  .004   -.064  1.000   -.500
dcode2  -.018  .215   -.500   1.000

Variables Entered/Removed(b)
Model  Variables Entered  Variables Removed  Method
1      x(a)               .                  Enter
2      dcode1, dcode2(a)  .                  Enter
a. All requested variables entered.
b. Dependent Variable: y

Model Summary
                                       Std. Error of  Change Statistics
Model  R        R Square  Adj. R Sq.  the Estimate   R Sq. Change  F Change  df1  df2  Sig. F Change
1      .430(a)  .185      .181        13.798         .185          53.847    1    238  .000
2      .487(b)  .237      .227        13.404         .052          8.092     2    236  .000
a. Predictors: (Constant), x
b. Predictors: (Constant), x, dcode1, dcode2

Model 1 row: Significance of the increase in R2 (from 0) when X (cognitive ability) was added to the equation, p < .001.
Model 2 row: Significance of the increase in R2 when the group-coding variables were added to the equation, p < .001.

ANOVA(c)
Model           Sum of Squares  df   Mean Square  F       Sig.
1  Regression   10250.965       1    10250.965    53.847  .000(a)
   Residual     45308.968       238  190.374
   Total        55559.933       239
2  Regression   13158.565       3    4386.188     24.413  .000(b)
   Residual     42401.369       236  179.667
   Total        55559.933       239
a. Predictors: (Constant), x
b. Predictors: (Constant), x, dcode1, dcode2
c. Dependent Variable: y

Performance is related to the whole collection of IVs in each model. For Model 1, X is the only predictor. For Model 2, X and the two group coding variables make up the collection of IVs.

The significance of each individual variable can be obtained from the Coefficients table below. You'll probably want the significance of X, the continuous predictor. You may also want the significance of the dummy coding variables.

Coefficients(a)
                                                         Correlations
Model          B       Std. Error  Beta  t       Sig.   Zero-order  Partial  Part
1  (Constant)  81.880  4.376             18.712  .000
   x           .635    .087        .430  7.338   .000   .430        .430     .430
2  (Constant)  78.182  4.440             17.609  .000
   x           .642    .084        .434  7.628   .000   .430        .445     .434
   dcode1      1.976   2.119       .061  .932    .352   -.064       .061     .053
   dcode2      8.172   2.120       .253  3.855   .000   .215        .243     .219
a. Dependent Variable: y

dcode2 row: The mean of Group 3 was significantly larger than the mean of the standard group among people of equal X.

Excluded Variables(b)
Model      Beta In   t       Sig.  Partial Correlation  Collinearity Statistics: Tolerance
1  dcode1  -.065(a)  -1.117  .265  -.072                1.000
   dcode2  .223(a)   3.914   .000  .246                 1.000
a. Predictors in the Model: (Constant), x
b. Dependent Variable: y

I've never used this table.

The analysis with GLM

The data, again.

Specifying the analysis . . .

Putting the name of a variable in the Fixed Factor(s) field tells GLM that the variable needs group coding variables. GLM will automatically create them.

Don't put the name of a quantitative variable in the Fixed Factor(s) field. GLM will create many, many group coding variables, then perhaps terminate with an error message.

Plots . . .

Post hocs . . .

Alas, Post Hocs are not available when you have a continuous Covariate. So, we can't, for example, compare the group means with the means of the control group.
This is a problem with the use of GLM and one of the reasons it may pay to know how to do this analysis using the REGRESSION procedure.

Options . . .

Results . . .

Univariate Analysis of Variance

[DataSet1] G:\MdbT\P595B\HAs\RandomizedBlocks.sav

Between-Subjects Factors
trtment  N
1        80
2        80
3        80

Descriptive Statistics
Dependent Variable: y
trtment  Mean    Std. Deviation  N
1        110.06  15.514          80
2        111.95  13.965          80
3        117.94  15.276          80
Total    113.32  15.247          240

Levene's Test of Equality of Error Variances(a)
Dependent Variable: y
F     df1  df2  Sig.
.064  2    237  .938
Tests the null hypothesis that the error variance of the dependent variable is equal across groups.
a. Design: Intercept + x + trtment

Tests of Between-Subjects Effects
Dependent Variable: y
                 Type III Sum                                      Partial Eta  Noncent.   Observed
Source           of Squares    df   Mean Square  F        Sig.  Squared      Parameter  Power(b)
Corrected Model  13158.565(a)  3    4386.188     24.413   .000  .237         73.239     1.000
Intercept        66125.624     1    66125.624    368.046  .000  .609         368.046    1.000
x                10453.806     1    10453.806    58.184   .000  .198         58.184     1.000
trtment          2907.600      2    1453.800     8.092    .000  .064         16.183     .956
Error            42401.369     236  179.667
Total            3137320.000   240
Corrected Total  55559.933     239
a. R Squared = .237 (Adjusted R Squared = .227)
b. Computed using alpha = .05

Corrected Model: Same information as in the REGRESSION ANOVA box.

Intercept: Test of the hypothesis that the population Y-intercept (Constant in REGRESSION output) is zero.

X: Test of the hypothesis that in the population, controlling for trtment, the slope relating the DV to X is zero.
Conclusion: Among persons equal on trtment (i.e., in the same group) there is a significant relationship of Y to X (cognitive ability).

Trtment: Test of the hypothesis that the population means of the 3 conditions are equal when controlling for differences in X.
Conclusion: Among persons equal on X (cognitive ability) there are significant differences in the means of the three groups.

New Topic - Creating Scale Scores (start here on 4/9/13)

Questions like "Does Conscientiousness predict Test Performance?" are answered by computing scale scores.

A scale score is computed from a collection of conscientiousness items. This scale score represents Conscientiousness.

A scale score may be computed from the items of a measure of performance. That scale score would represent Performance.

Finally, the correlation coefficient between the two scale scores is computed.

Procedure for computing a scale score

Data: Biderman, Nguyen, & Sebren, 2008.

    GET FILE='G:\MdbR\1Sebren\SebrenDataFiles\SebrenCombined070726NOMISS2EQ1.sav'.

The data typically are entered in the order in which they appear on questionnaire data sheets. In this case, 50 columns contain the responses to the 50-item IPIP Big Five exactly as they appear on the questionnaire.

The next 20 or so columns contain the reverse-coded responses to the negatively-worded items. I created them using the SPSS RECODE command.

1. Reverse score the negatively-worded items.

q2 q4 q6 q8 q10 q12 q14 q16 q18 q20 q22 q24 q26 q28 q29 q30 q32 q34 q36 q38 q39 q44 q46 q49

Here's syntax to perform the recode:

    recode q2 q4 q6 q8 q10 q12 q14 q16 q18 q20 q22 q24 q26 q28 q29 q30
           q32 q34 q36 q38 q39 q44 q46 q49
           (1=7)(2=6)(3=5)(4=4)(5=3)(6=2)(7=1)
      into q2r q4r q6r q8r q10r q12r q14r q16r q18r q20r q22r q24r q26r q28r q29r q30r
           q32r q34r q36r q38r q39r q44r q46r q49r.

You don't need to do this using syntax. It can be done using pull-down menus or by hand. But it must be done.

2. Define Missing Values

Tell SPSS if specific values are to be treated as missing. This is very important.
A fairly recent thesis student lost several days because the student created scale scores without declaring missing values.

MISSING VALUES MUST BE DECLARED FOR ALL ITEMS.

3. Determine which items belong to which scale.

The IPIP items are distributed as follows: E A C S O E A C S O E A C S O . . .

That is, the 1st, 6th, 11th, 16th, 21st, 26th, 31st, 36th, 41st, and 46th items are E items. The 2nd, 7th, 12th, 17th, etc. are A. And so forth.

4. Compute scale scores.

4a. In syntax

To compute the E scale score by manual arithmetic:

E = (q1 + q6r + q11 + q16r + q21 + q26r + q31 + q36r + q41 + q46r) / 10.

In syntax, that would be

    compute e = (q1+q6r+q11+q16r+q21+q26r+q31+q36r+q41+q46r)/10.

If it's computed this way, the result for any case with a missing value will be treated as missing.

It can also be computed as

    compute e = mean(q1,q6r,q11,q16r,q21,q26r,q31,q36r,q41,q46r).

If it's computed this way and a response is missing, the mean will be taken across the remaining nonmissing items.

So, after all negatively worded items have been recoded, the syntax to compute all of the Big Five scale scores would be

    compute e = mean(q1,q6r,q11,q16r,q21,q26r,q31,q36r,q41,q46r).
    compute a = mean(q2r,q7,q12r,q17,q22r,q27,q32r,q37,q42,q47).
    compute c = mean(q3,q8r,q13,q18r,q23,q28r,q33,q38r,q43,q48).
    compute s = mean(q4r,q9,q14r,q19,q24r,q29r,q34r,q39r,q44r,q49r).
    compute o = mean(q5,q10r,q15,q20r,q25,q30r,q35,q40,q45,q50).

Cut this page and the previous page out and paste them on your wall for when you analyze your thesis data.

4b. Computing a scale score using the TRANSFORM menu . . .

Repeat the above for each scale, substituting appropriate item names.

5. Run FREQUENCIES on scale scores.

Negatively skewed.

6. Run Reliabilities of each scale

Reliability Statistics
Scale  Cronbach's Alpha  Cronbach's Alpha Based on Standardized Items  N of Items
e      .792              .794                                          10
a      .833              .832                                          10
c      .799              .799                                          10
s      .825              .836                                          10
o      .848              .849                                          10

7. Compute correlations between scale scores.

Correlations
(hcon: C summated scale scores from IPIP 50-item Big Five scale)
                            hext   hagr   hcon   hsta   hopn
hext  Pearson Correlation   1.000  .254   .155   .194   .241
      Sig. (2-tailed)              .003   .074   .024   .005
      N                     135    135    135    135    135
hagr  Pearson Correlation   .254   1.000  .421   .224   .426
      Sig. (2-tailed)       .003          .000   .009   .000
      N                     135    135    135    135    135
hcon  Pearson Correlation   .155   .421   1.000  .277   .238
      Sig. (2-tailed)       .074   .000          .001   .005
      N                     135    135    135    135    135
hsta  Pearson Correlation   .194   .224   .277   1.000  .226
      Sig. (2-tailed)       .024   .009   .001          .008
      N                     135    135    135    135    135
hopn  Pearson Correlation   .241   .426   .238   .226   1.000
      Sig. (2-tailed)       .005   .000   .005   .008
      N                     135    135    135    135    135

The mean of the correlations between scale scores is
(.254+.155+.194+.241+.421+.224+.426+.277+.238+.226)/10 = .2656, or roughly .27.

Wait!! Aren't the Big Five dimensions supposed to be independent dimensions of personality? If so, why are they generally positively correlated?

This question is leading to a ton of research right now. Key phrases: higher order factors of the Big Five; general factor of personality.

8. Compute correlations of scale scores with variables your theory says they should correlate with.
Correlations
(hcon: C summated scale scores from IPIP 50-item Big Five scale)
                            hcon   test
hcon  Pearson Correlation   1.000  .086
      Sig. (2-tailed)              .320
      N                     135    135
test  Pearson Correlation   .086   1.000
      Sig. (2-tailed)       .320
      N                     135    135

In this case, the correlation is not significant.

After two years of thinking about it, we hit upon the idea that perhaps the correlation was suppressed by method bias. That turned out to be a viable hypothesis.
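Steps 1, 2, and 4 of the scale-score procedure can also be sketched outside SPSS. The Python below is a minimal illustration, not the syntax used for the thesis data. It assumes a 7-point response scale (as in the RECODE step), represents a declared-missing response as None, and uses made-up responses and hypothetical function names. It mirrors the two COMPUTE variants: the sum divided by the number of items, which is missing whenever any item is missing, and MEAN, which averages the nonmissing items.

```python
# Item names for the E scale; a trailing "r" marks a reverse-scored item.
E_ITEMS = ["q1", "q6r", "q11", "q16r", "q21", "q26r", "q31", "q36r", "q41", "q46r"]


def reverse_score(response, scale_max=7):
    """Map 1..scale_max onto scale_max..1; a missing response stays missing."""
    if response is None:
        return None
    return scale_max + 1 - response


def scale_score_strict(responses):
    """Like (q1 + ... + q46r) / 10: missing if any item is missing."""
    if any(r is None for r in responses):
        return None
    return sum(responses) / len(responses)


def scale_score_mean(responses):
    """Like MEAN(q1, ..., q46r): average of the nonmissing items."""
    present = [r for r in responses if r is not None]
    if not present:
        return None
    return sum(present) / len(present)


# Made-up raw responses for one person; q36 was left blank (missing).
raw = {"q1": 6, "q6": 2, "q11": 5, "q16": 3, "q21": 7,
       "q26": 1, "q31": 6, "q36": None, "q41": 6, "q46": 2}

# Build the scored items, reverse scoring those whose name ends in "r".
scored = {}
for name in E_ITEMS:
    if name.endswith("r"):
        scored[name] = reverse_score(raw[name[:-1]])
    else:
        scored[name] = raw[name]

responses = [scored[name] for name in E_ITEMS]
print(scale_score_strict(responses))  # None
print(scale_score_mean(responses))    # 6.0
```

The strict version drops this person from any analysis of E, while the MEAN version keeps the person and bases the score on nine items, which is the trade-off to weigh when choosing between the two.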