Example: Quadratic Linear Model

September 2008 Dr. Robert Gebotys

Gebotys and Roberts (1989) were interested in examining the effect of one variable, 'age', on the "seriousness rating of the crime" (y); however, they wanted to fit a quadratic curve to the data. The variables examined in this example are 'age' and age squared (i.e., 'agesq'). The model is:

$E(y \mid x) = B_0 + B_1\,\mathrm{age} + B_2\,\mathrm{age}^2$

Complete the following steps in SPSS Linear Regression in order to follow the output explanation later in this example (section "Quadratic Linear Model SPSS Output Explanation").

1. Pull up the "Crime" data set (or enter the data from the data table at the end of this example).

Before proceeding to any analysis, the new variable 'agesq' needs to be created. To create 'agesq' do the following:

1. Click Transform on the main menu bar.

2. Click Compute on the Transform menu. You should now see the Compute Variable dialogue box.

3. Type 'agesq' in the Target Variable text box, located at the top left corner of the Compute Variable dialogue box. This specifies the new variable that you are creating, 'agesq'.

4. To specify the numeric expression for 'agesq' (that is, the square of 'age'), click 'age' in the variable source list and click the arrow button to the right of the variable source list box. Then click the '**' button on the lower right-hand side of the onscreen keypad, followed by the '2' button on the keypad. As a result of these clicks, the expression "age**2" will be entered into the Numeric Expression box.

5. Click the 'OK' command pushbutton of the dialogue box. You will immediately see that a new variable, 'agesq', has been entered into the Data Editor.

   Note that the variable 'agesq' is set to two decimal places. If you would like to reset this variable to zero decimal places, to be consistent with the other variables, do the following:
   (i) Double-click the 'agesq' variable label to open the "Define label" box;
   (ii) Find the section called 'Change settings' and click the Type button;
   (iii) Change decimal places from "2" to "0";
   (iv) Click the Continue button; and finally
   (v) Click the 'OK' button.
   Your 'agesq' variable should now contain no decimal places.

6. At this stage, you may save the data matrix on a diskette, for example, under the file name "crimeage2.sav".

Fitting a Quadratic Regression Curve on the Scatterplot

1. Start by creating a scatterplot for the set of data on age and crime seriousness.

2. To fit a quadratic regression curve, start by clicking on the scatterplot as it appears in the Results pane of the SPSS Viewer window so that a thin black line surrounds the scatterplot. Next, click Edit on the menu bar and select SPSS Chart Object from the Edit menu, followed by clicking Open on the submenu. This series of clicks will open an SPSS Chart Editor window. The scatterplot you have produced will appear in the Chart Editor window.

3. Click Chart on the Chart Editor window menu bar, followed by clicking 'Options...' in the Chart menu. When the 'Scatterplot Options' dialogue box is activated, click the check box to the left of Total in the 'Fit Line' box. Then click the 'Fit Options...' button in the 'Fit Line' box. This will open a 'Scatterplot Options: Fit Line' dialogue box.

4. Click the 'Quadratic regression' box. Click the 'Continue' pushbutton. When the 'Scatterplot Options' dialogue box reappears, click the 'OK' command pushbutton. After a few seconds, you should see that a curve has been fitted to the scatterplot. A copy of the scatterplot is reproduced below. At this stage you can edit, save, and print the scatterplot.
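For readers who want to check the Compute step outside SPSS, the following is a minimal Python sketch of what the expression "age**2" produces. It is not part of the SPSS procedure; the list names are illustrative, and the age values are taken from the data table at the end of this example.

```python
# Age values copied from the data table at the end of this example.
age = [20, 25, 26, 25, 30, 34, 40, 40, 40, 80]

# Equivalent of the SPSS numeric expression age**2 (the new variable 'agesq').
agesq = [a ** 2 for a in age]

# List the two columns side by side, with zero decimal places.
for a, a2 in zip(age, agesq):
    print(f"{a:4d} {a2:6d}")
```

The values printed in the second column should agree with the 'agesq' column that SPSS adds to the Data Editor.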
[Figure: Scatterplot of "Serious" vs "Age" with quadratic line fit; vertical axis SERIOUS, horizontal axis AGE.]

Specifying the Regression Procedure for a Polynomial of Degree 2

Follow the regression procedure in "Example: Linear Model with Normal Error". The one exception is that both 'age' and 'agesq' should be defined as the independent variables in this model. This is done by clicking both 'age' and 'agesq' in the variable source list and then clicking the arrow button to the left of the 'Independent(s):' text box. Both variables should now be listed in that box.

After completing the procedures as specified in "Example: Linear Model with Normal Error" (i.e., designating the 'Statistics', 'Plots' and 'Save' selections for this analysis), click the 'OK' command pushbutton in the Linear Regression dialogue box. This will instruct SPSS to produce a set of output similar to that discussed in the output explanation section.

Alternate Method to Perform Linear Regression

This is performed by utilizing the Run command in an SPSS Syntax window. The selections you have made in the Linear Regression dialogue boxes are actually commands for the regression procedure. You can click the Paste command pushbutton in the Linear Regression dialogue box to paste this underlying command syntax into a Syntax window (i.e., a Syntax window will be opened when Paste is clicked, and the selections that you have made are reproduced in this window as command syntax, the SPSS programming language).

There is another way of specifying and running the regression procedure for fitting the quadratic model. If you have already saved the syntax commands in a file (e.g., crime.sps), as outlined in "Example: Linear Model with Normal Error", you can specify and run the regression procedure by following the steps described below:

1. Insert the diskette containing the relevant file (e.g., crime.sps) into drive A.

2. Click File on the main menu bar.

3. Click Open in the File menu to open the Open File dialogue box.

4. Check whether the current drive (i.e., the "Look in" text box) is drive A. If not, change it to drive A by clicking on the arrow to the right of the text box and selecting "3.5 Floppy [A:]".

5. Check whether the File type text box contains "SPSS syntax files (*.sps)". If not, change it by clicking on the arrow to the right of the text box and selecting that option with a single click.

6. The relevant file (e.g., crime.sps) should now be listed in the large text box in the centre of the Open File dialogue box. Select the file with a single click and then open it by clicking the Open command pushbutton. This will open a window titled "crime - SPSS Syntax Editor".

Sometimes the only 'problem' is the solution to the 'problem'... That's it! Focus on entering the syntax correctly... then there is no 'problem', only 'solutions' or, in SPSS lingo, 'output'.

7. Go to the end of the line "/METHOD=ENTER age" and click once to move your cursor to that location. Add a space after 'age' and then type 'agesq'. The line should now read "/METHOD=ENTER age agesq". If you want to save or print this syntax window, you may do so now.

8. Click 'Run' on the Syntax Editor menu bar, and then click All on the Run menu. This will instruct the computer to perform the required regression procedure.

Quadratic Linear Model SPSS Output Explanation

The REGRESSION command fits linear models by least squares. The DEPENDENT subcommand tells SPSS which variable is the y variable. The METHOD subcommand tells SPSS which variables are in the model; its ENTER keyword lists the x variables. In this case we have asked SPSS to fit the model
$E(y \mid x) = B_0 + B_1 x + B_2 x^2$

The output, using SPSS windows or syntax, is interpreted as follows:

Variables Entered/Removed (b)

  Model   Variables Entered   Variables Removed   Method
  1       AGESQ, AGE (a)      .                   Enter

  a. All requested variables entered.
  b. Dependent Variable: SERIOUS

Model Summary (b)

  Model   R          R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
  1       .988 (a)   .976       .969                3.74                         2.081

  a. Predictors: (Constant), AGESQ, AGE
  b. Dependent Variable: SERIOUS

ANOVA (b)

  Model 1      Sum of Squares   df   Mean Square   F         Sig.
  Regression   3896.297         2    1948.149      139.434   .000 (a)
  Residual     97.803           7    13.972
  Total        3994.100         9

  a. Predictors: (Constant), AGESQ, AGE
  b. Dependent Variable: SERIOUS

Coefficients (a)

               Unstandardized           Standardized                     95% Confidence Interval for B
  Model 1      B           Std. Error   Beta           t       Sig.     Lower Bound   Upper Bound
  (Constant)   20.683      8.823                       2.344   .052     -.180         41.545
  AGE          -8.49E-02   .410         -.069          -.207   .842     -1.055        .885
  AGESQ        1.262E-02   .004         1.055          3.174   .016     .003          .022

  a. Dependent Variable: SERIOUS

In order to determine whether the model is adequate, we examine the ANOVA table. Note the degrees of freedom and the F statistic. F = 139.434, which has an F distribution with 2 (number of parameters [3] - intercept B0 [1] = 3 - 1 = 2) and 7 (number of observations - number of parameters = 10 - 3 = 7) degrees of freedom. We reject

  H0: B1 = B2 = 0
  Ha: at least one of B1, B2 is not 0

with a p-value less than .0001 (the Sig. value on the output, printed as .000).

The Regression row refers to the model and the Residual row refers to the error component. The mean square of the residual is equal to $s^2$, our estimate of $\sigma^2$; note that s is also printed in the Std. Error of the Estimate column of the Model Summary.

  $s^2 = \mathrm{MSE} = 13.972$
  $s = 3.74$

The Model Summary also prints $R^2$ (R Square), where:

  $R^2 = \mathrm{SSM}/\mathrm{SST} = 3896/3994 = .97551$

In other words, 97.551% of the variance in seriousness is accounted for by the model ('age', 'agesq').
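The arithmetic linking the ANOVA table to the Model Summary can be checked by hand. The short Python sketch below is not part of the SPSS procedure and its variable names are illustrative. It starts only from the two sums of squares and the sample size printed in the output and recomputes the degrees of freedom, F, s, R Square and Adjusted R Square.

```python
import math

# Values copied from the ANOVA table in the SPSS output.
ss_regression = 3896.297
ss_residual = 97.803
n = 10   # number of observations
k = 2    # number of predictors (age, agesq), excluding the intercept

ss_total = ss_regression + ss_residual

# Degrees of freedom: parameters minus intercept, and n minus parameters.
df_regression = k
df_residual = n - (k + 1)

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual      # s squared, the estimate of sigma squared

f_stat = ms_regression / ms_residual
s = math.sqrt(ms_residual)
r_squared = ss_regression / ss_total
adj_r_squared = 1 - ms_residual / (ss_total / (n - 1))

print(f"df = ({df_regression}, {df_residual})")
print(f"F = {f_stat:.3f}, s = {s:.2f}")
print(f"R Square = {r_squared:.5f}, Adjusted R Square = {adj_r_squared:.3f}")
```

Because the inputs are the rounded values shown in the output, the recomputed figures may differ from the printed ones in the last decimal place.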
In the Coefficients section, the Model column lists '(Constant)', 'age' and 'agesq'; these are the terms associated with the parameters B0, B1, and B2 in the model. The column labelled B gives the least squares estimates (b0 = 20.683, b1 = -.085, b2 = .0126) of B0, B1, and B2. The estimated equation is therefore

  $E(y \mid x) = 20.683 - .085x + .0126x^2$

The Std. Error column gives the standard error of each estimate, for example

  s(b0) = 8.823
  s(b1) = .410
  s(b2) = .004

The t column gives the corresponding t statistic for testing the hypotheses

  H0: B1 = 0   Ha: B1 ≠ 0   t = b1/s(b1) = -.207
  H0: B2 = 0   Ha: B2 ≠ 0   t = b2/s(b2) = 3.174

For B1 the t statistic has the value -.207, and for B2 the t statistic has the value 3.174. The Sig. column gives the observed level of significance (p-value) for each test. In this case we have p = .842 for B1 (not significant, therefore we cannot reject H0) and p = .016 for B2 (significant, therefore we reject H0; there is strong evidence against H0). Both tests have 7 degrees of freedom.

Casewise Diagnostics gives a Std. Residual column whose standardized values all lie within ±3 standard deviations, which is reasonable. The Durbin-Watson statistic is about 2, which indicates no first-order autocorrelation in the residuals. The leverage (LEVER) and Cook's distance (COOK D) values for the 10th observation are relatively large (h10 = .8977, D = 405.070), indicating that this is an influential observation. If we compare h10 to 2 × meanh = .4, where meanh = .2 (found in the Residuals Statistics table), our suspicion that the 10th observation (a person 80 years old with a high seriousness rating) is influential is confirmed. Notice that the residual for this observation is small, so the observation is not an outlier.

Casewise Diagnostics (a)

  Case Number   Std. Residual   SERIOUS   Predicted Value   Residual
  1             -.812           21        24.04             -3.04
  2             .414            28        26.45             1.55
  3             -.003           27        27.01             -1.08E-02
  4             -.121           26        26.45             -.45
  5             .937            33        29.50             3.50
  6             .965            36        32.39             3.61
  7             -1.736          31        37.49             -6.49
  8             -.666           35        37.49             -2.49
  9             .939            41        37.49             3.51
  10            .082            95        94.69             .31

  a. Dependent Variable: SERIOUS

Residuals Statistics (a)

                                      Minimum   Maximum   Mean        Std. Deviation   N
  Predicted Value                     24.04     94.69     37.30       20.81            10
  Std. Predicted Value                -.638     2.758     .000        1.000            10
  Standard Error of Predicted Value   1.28      3.73      1.93        .72              10
  Adjusted Predicted Value            -35.46    39.73     24.58       21.72            10
  Residual                            -6.49     3.61      -7.11E-16   3.30             10
  Std. Residual                       -1.736    .965      .000        .882             10
  Stud. Residual                      -2.013    1.690     .127        1.156            10
  Deleted Residual                    -8.73     130.46    12.72       41.60            10
  Stud. Deleted Residual              -2.872    2.034     .077        1.400            10
  Mahal. Distance                     .148      8.079     1.800       2.357            10
  Cook's Distance                     .000      405.070   40.618      128.055          10
  Centered Leverage Value             .016      .898      .200        .262             10

  a. Dependent Variable: SERIOUS

Look, we got studentized and other residual statistics without asking for them. Bonus!

The histogram of residuals looks reasonable, although with 10 observations this is difficult to judge.

[Figure: Histogram of the regression standardized residuals; dependent variable SERIOUS; Std. Dev = .88, Mean = 0.00, N = 10.]

The probability plot has improved from the previous "Example: Linear Model with Normal Error", in the sense that the residuals more closely approximate a normal distribution. The large bulge present in the normal probability plot of residuals in "Example: Linear Model with Normal Error" is no longer present in the polynomial of degree 2 model.

[Figure: Normal P-P plot of the regression standardized residuals; dependent variable SERIOUS; Expected Cum Prob vs Observed Cum Prob.]

The plot of predicted y vs e displays a reasonable band shape as well.

[Figure: Scatterplot of regression standardized residual vs regression standardized predicted value; dependent variable SERIOUS.]

Yes! A reasonable band shape for the residuals and predicted values.

Although the polynomial model has a higher R2 than the line, we do not know whether this improvement is statistically significant. In a future example, we will learn how to compare nested models such as the polynomial model discussed in this example and the linear model of the previous example, utilizing the ANOVA technique.

The data table following the analysis is shown below. Compare the additional variables saved in the data matrix with the syntax command for this analysis.

Data table (January 2000, Dr. Robert Gebotys)

  age   serious   agesq   pre_1    res_1    zpr_1    zre_1    coo_1     lev_1   lmci_1   umci_1    lici_1   uici_1
  20    21        400     24.035   -3.035   -0.638   -0.812   0.315     0.344   18.148   29.923    13.416   34.655
  25    28        625     26.452   1.548    -0.521   0.414    0.016     0.084   22.658   30.246    16.833   36.070
  26    27        676     27.011   -0.011   -0.495   -0.003   0.000     0.058   23.496   30.526    17.499   36.523
  25    26        625     26.452   -0.452   -0.521   -0.121   0.001     0.084   22.658   30.246    16.833   36.070
  30    33        900     29.499   3.501    -0.375   0.937    0.044     0.016   26.483   32.516    20.160   38.839
  34    36        1156    32.392   3.608    -0.236   0.965    0.062     0.046   29.010   35.774    22.928   41.856
  40    31        1600    37.488   -6.488   0.009    -1.736   0.466     0.156   33.013   41.964    27.581   47.395
  40    35        1600    37.488   -2.488   0.009    -0.666   0.068     0.156   33.013   41.964    27.581   47.395
  40    41        1600    37.488   3.512    0.009    0.939    0.136     0.156   33.013   41.964    27.581   47.395
  80    95        6400    94.694   0.306    2.758    0.082    405.070   0.898   85.866   103.522   82.201   107.186
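The fit itself can be reproduced from the 'age' and 'serious' columns of the data table. The following is a minimal pure-Python sketch and not the method SPSS uses internally. It solves the least squares normal equations with a small hand-written Gaussian elimination routine (the helper name `solve` is hypothetical), then recomputes the residual sum of squares and the leverage of the 10th observation. The centered leverage is taken as the hat value minus 1/n, which is assumed here to correspond to the lev_1 column.

```python
# Data copied from the 'age' and 'serious' columns of the data table.
age = [20, 25, 26, 25, 30, 34, 40, 40, 40, 80]
serious = [21, 28, 27, 26, 33, 36, 31, 35, 41, 95]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


# Design matrix with columns 1, age, age squared.
X = [[1.0, a, a ** 2] for a in age]
p = 3
xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
xty = [sum(row[i] * y for row, y in zip(X, serious)) for i in range(p)]

# Least squares estimates from the normal equations (X'X) b = X'y.
b0, b1, b2 = solve(xtx, xty)

fitted = [b0 + b1 * a + b2 * a ** 2 for a in age]
residuals = [y - f for y, f in zip(serious, fitted)]
sse = sum(e ** 2 for e in residuals)

# Hat value of the 10th observation: h = x' (X'X)^-1 x.
x10 = X[9]
h10 = sum(xi * zi for xi, zi in zip(x10, solve(xtx, x10)))
centered_h10 = h10 - 1 / len(age)

print(f"b0 = {b0:.3f}, b1 = {b1:.4f}, b2 = {b2:.5f}")
print(f"SSE = {sse:.3f}, centered leverage of case 10 = {centered_h10:.3f}")
```

The printed estimates can be compared with the B column of the Coefficients table, SSE with the Residual sum of squares in the ANOVA table, and the leverage with the lev_1 entry for the last row of the data table.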