Example: Quadratic Linear Model

Gebotys and Roberts (1989) were interested in examining the effect of one variable, 'age', on the seriousness rating of the crime (y); however, they wanted to fit a quadratic curve to the data. The variables examined in this example are 'age' and 'age squared' (i.e., 'agesq'). The model is:

E(y|x) = B0 + B1(age) + B2(age²)

Complete the following steps in Linear Regression in SPSS in order to follow the output explanation given later in this document.

1. Pull up the "Crime" data set (or enter the data listed at the end of this document; a syntax sketch for entering it follows below).
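If you would rather enter the data by hand, the sketch below reads the ten observations using SPSS syntax. It is offered only as a convenience; the variable names 'age' and 'serious' and the values are taken from the data table at the end of this document.

* Read the age and seriousness ratings for the ten respondents.
DATA LIST FREE / age serious.
BEGIN DATA
20 21
25 28
26 27
25 26
30 33
34 36
40 31
40 35
40 41
80 95
END DATA.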
Before proceeding to any analysis, the new variable 'agesq' needs to be created. To create 'agesq', do the following:

1. Click Transform on the main menu bar.

2. Click Compute on the Transform menu. The Compute Variable dialogue box will open.
3. Type 'agesq' in the Target Variable text box, located at the top left corner of the Compute Variable dialogue box. This specifies the name of the new variable you are creating, 'agesq'.

4. To specify the numeric expression for 'agesq' (that is, the squaring of 'age'), click 'age' in the variable source list and click the arrow button to the right of the variable source list box. Then click the '**' button on the lower right-hand side of the on-screen keypad, followed by the '2' button. As a result of these clicks, the expression "age**2" will be entered into the Numeric Expression box.

5. Click the 'OK' command pushbutton of the dialogue box. You will immediately see that a new variable, 'agesq', has been added to the Data Editor. (Note that 'agesq' is displayed with two decimal places. If you would like to re-set this variable to zero decimal places, to be congruent with the other variables, do the following: (i) double-click the 'agesq' variable label to open the Define Variable box; (ii) in the section called 'Change Settings', click the Type button; (iii) change Decimal Places from '2' to '0'; (iv) click Continue; and finally (v) click the 'OK' button. Your 'agesq' variable should now contain no decimal places.)
6. At this stage, you may save the data matrix on a diskette, for example, under the file name "crimeage2.sav". An equivalent syntax sketch follows.
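Equivalently, 'agesq' can be created, formatted, and the file saved with a few lines of syntax. This is only a sketch; the FORMATS line is the optional zero-decimal step, and the file name in SAVE OUTFILE is the example name used above (adjust the path to suit your system).

* Create 'agesq' as the square of 'age'.
COMPUTE agesq = age**2.
EXECUTE.
* Optional: display 'agesq' with zero decimal places.
FORMATS agesq (F8.0).
* Save the data matrix, e.g. under the file name used above.
SAVE OUTFILE='crimeage2.sav'.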
Fitting a Quadratic Regression Curve on the Scatterplot
1. Start by creating a scatterplot for the set of data on age and crime seriousness.

2. To fit a quadratic regression curve, start by clicking on the scatterplot as it appears in the Results pane of the SPSS Viewer window so that a thin black line surrounds the scatterplot. Next, click Edit on the menu bar and select SPSS Chart Object from the Edit menu, followed by clicking Open on the submenu. This series of clicks will open an SPSS Chart Editor window. The scatterplot you have produced will appear in the Chart Editor window.
3. Click Chart on the Chart Editor window menu bar, followed by clicking 'Options…' on the Chart menu. When the 'Scatterplot Options' dialogue box is activated, click the check box to the left of Total in the 'Fit Line' area. Then click the 'Fit Options…' button in the 'Fit Line' area. This will open the 'Scatterplot Options: Fit Line' dialogue box.
4. Click the 'Quadratic regression' box, then click the 'Continue' pushbutton. When the 'Scatterplot Options' dialogue box reappears, click the 'OK' command pushbutton. After a few seconds, you should see that a curve has been fit to the scatterplot. A copy of the scatterplot is reproduced below. At this stage you can edit, save, and print the scatterplot.
Scatterplot of "Serious" vs "Age"
Quadratic Line Fit
100
80
SERIOUS
60
40
20
10
20
30
40
50
60
70
80
90
AGE
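If you prefer syntax to editing the chart interactively, a quadratic fit line can also be produced with the CURVEFIT procedure (Analyze > Regression > Curve Estimation). This is an alternative route, not the Chart Editor steps described above, and the subcommands shown are a sketch:

* Fit and plot a quadratic curve of 'serious' on 'age'.
CURVEFIT
  /VARIABLES=serious WITH age
  /CONSTANT
  /MODEL=QUADRATIC
  /PLOT FIT.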
Specifying the Regression Procedure for a Polynomial of Degree 2
Follow the Regression procedure in "Example: The Linear Model with Normal Error". The one exception is that both 'age' and 'agesq' should be defined as the independent variables in this model. This is done by clicking both 'age' and 'agesq' in the variable source list and then clicking the arrow button to the left of the 'Independent(s):' text box. Both variables should now appear in the Independent(s) list of the Linear Regression dialogue box.
After completing the procedures as specified in "Example: Linear Model with Normal Error" (i.e., designate the 'Statistics', 'Plots' and 'Save' selections for this analysis), click the 'OK' command pushbutton in the Linear Regression dialogue box. This will instruct SPSS to produce a set of output similar to that discussed in the next section.
Alternate Method to Perform Linear Regression
This is performed by utilizing the Run command in an SPSS Syntax window. That is, the selections you have made in the Linear Regression dialogue boxes are actually commands for the regression procedure. You can click the Paste command pushbutton in the Linear Regression dialogue box to paste this underlying command syntax into a Syntax window (i.e., a Syntax window opens when the Paste command is clicked, and the selections that you have made are reproduced in this window in the format of command syntax, the SPSS programming language).
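For reference, the pasted command for this example might look roughly like the sketch below. The exact subcommands depend on the 'Statistics', 'Plots' and 'Save' selections made, so treat this as an illustration rather than the exact syntax; the /SAVE keywords shown are the ones that would produce the saved variables (pre_1, res_1, zpr_1, zre_1, coo_1, lev_1, lmci_1, umci_1, lici_1, uici_1) appearing in the data table at the end of this document.

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS CI R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT serious
  /METHOD=ENTER age agesq
  /SCATTERPLOT=(*ZRESID ,*ZPRED)
  /RESIDUALS DURBIN HISTOGRAM(ZRESID) NORMPROB(ZRESID)
  /CASEWISE PLOT(ZRESID) ALL
  /SAVE PRED RESID ZPRED ZRESID COOK LEVER MCIN ICIN.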
There is another way of specifying and running the
regression procedure for fitting the quadratic model. If
you have already saved the syntax commands in a file
(e.g., crime.sps), as outlined in “Example: Linear Model
with Normal Error”, you can specify and run the
regression procedure by following the steps described
below:
1. Insert the diskette containing the relevant file (e.g., crime.sps) into drive A.

2. Click File on the main menu bar.

3. Click Open on the File menu to activate and open the Open File dialogue box.

4. Check whether the current drive (i.e., the "Look in" text box) is drive A. If not, change it to drive A by clicking on the arrow to the right of the text box and selecting "3.5 Floppy [A:]".
5. Check whether the File type text box contains "SPSS syntax files (*.sps)". If not, change it by clicking on the arrow to the right of the text box and selecting that option with a single click.

6. The relevant file (e.g., crime.sps) should now be listed in the large text box in the centre of the Open File dialogue box. Select the file with a single click and then open it by clicking the Open command pushbutton. This will open a window titled "crime – SPSS Syntax Editor".
Sometimes the only 'problem' is the solution to the 'problem'… that's it! Focus on entering the syntax correctly… then there is no 'problem', only 'solutions', or in SPSS lingo, 'output'.
7. Go to the end of the line "/METHOD=ENTER age" and click once to move your cursor to that location. Add a space after 'age' and then type 'agesq'. Your command syntax should now contain the edited line shown in the sketch below. If you want to save or print this syntax window, you may do so now.
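Depending on what was saved in crime.sps, the core of the edited command should look like this minimal sketch (only the METHOD line changes; any other subcommands in your file remain exactly as they were):

REGRESSION
  /DEPENDENT serious
  /METHOD=ENTER age agesq.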
8. Click Run on the Syntax Editor menu bar, and then click All on the Run menu. This will instruct the computer to perform the required regression procedure.
Quadratic Linear Model
SPSS Output Explanation

The REGRESSION command fits linear models by least squares. The METHOD subcommand tells SPSS which variables are in the model. The DEPENDENT subcommand tells SPSS which is the y variable. The ENTER keyword defines the x variables. In this case we have asked SPSS to fit the model:

E(y|x) = B0 + B1x + B2x²

The output, whether produced through the SPSS windows or through syntax, is interpreted as follows:
Variables Entered/Removed(b)

Model   Variables Entered   Variables Removed   Method
1       AGESQ, AGE(a)       .                   Enter

a. All requested variables entered.
b. Dependent Variable: SERIOUS
Model Summary(b)

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .988(a)   .976       .969                3.74                         2.081

a. Predictors: (Constant), AGESQ, AGE
b. Dependent Variable: SERIOUS
ANOVA(b)

Model 1       Sum of Squares   df   Mean Square   F         Sig.
Regression    3896.297         2    1948.149      139.434   .000(a)
Residual      97.803           7    13.972
Total         3994.100         9

a. Predictors: (Constant), AGESQ, AGE
b. Dependent Variable: SERIOUS
Coefficients(a)

            Unstandardized Coefficients          Standardized Coefficients
Model 1     B           Std. Error               Beta                        t       Sig.   95% CI for B (Lower Bound, Upper Bound)
(Constant)  20.683      8.823                                                2.344   .052   (-.180, 41.545)
AGE         -8.49E-02   .410                     -.069                       -.207   .842   (-1.055, .885)
AGESQ       1.262E-02   .004                     1.055                       3.174   .016   (.003, .022)

a. Dependent Variable: SERIOUS
In order to determine whether the model is adequate, we examine the ANOVA table. Note the degrees of freedom and the F-statistic value:

F = 139.434

which has an F distribution with 2 (number of parameters [3] minus the intercept B0 [1]: 3 − 1 = 2) and 7 (number of observations minus number of parameters: 10 − 3 = 7) degrees of freedom. We reject

Ho: B1 = B2 = 0
in favour of
Ha: at least one of B1, B2 is not 0

with a p-value less than .0001 (the Sig. value of .000 in the output). The REGRESSION row refers to the model and the RESIDUAL row refers to the error component. The mean square of the residual is equal to s², our estimate of sigma²; note that s is also printed in the Std. Error of the Estimate column of the Model Summary.

s² = MSE = 13.972
s = 3.74
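As a quick arithmetic check (added here), the F statistic is the ratio of the two mean squares in the ANOVA table:

F = MS(Regression) / MS(Residual) = 1948.149 / 13.972 ≈ 139.43

which matches the printed value of 139.434 up to rounding.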
In the same area of the output we also have R², printed as R Square, where:

R² = SSM/SST = 3896/3994 = .97551

In other words, 97.551% of the variance in seriousness is accounted for by the model ('age', 'agesq').
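The Adjusted R Square of .969 in the Model Summary follows from R² by the usual small-sample adjustment (a check added here; n = 10 observations and k = 2 predictors):

Adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1) = 1 − (.02449)(9/7) ≈ .969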
In the Coefficients section, the Model column lists 'constant', 'age', and 'agesq'; these correspond to the parameters B0, B1, and B2 in the model. The column labeled B gives the least squares estimates (b0 = 20.683, b1 = −.085, b2 = .0126) of B0, B1, and B2. The fitted equation is therefore

E(y|x) = 20.683 − .085x + .0126x²
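As a rough check of the fitted equation (added here, using the rounded coefficients from the table), the predicted seriousness for the 80-year-old respondent is

20.683 − .0849(80) + .01262(80²) = 20.683 − 6.79 + 80.77 ≈ 94.66

which agrees, up to rounding of the displayed coefficients, with the Predicted Value of 94.69 reported for case 10 in the Casewise Diagnostics table.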
The Std. Error column gives the standard error for each of the parameter estimates, for example

s(b0) = 8.823
s(b1) = .410
s(b2) = .004

The t column gives the corresponding t statistic for testing the hypotheses

H0: B1 = 0 versus Ha: B1 ≠ 0, t = b1/s(b1) = −.207
H0: B2 = 0 versus Ha: B2 ≠ 0, t = b2/s(b2) = 3.174

The column Sig. gives the observed significance level (p-value) for each test. In this case we have p = .842 for B1, which is not significant, so we cannot reject H0; and p = .016 for B2, which is significant, so there is strong evidence against H0 and we reject it. Both tests have 7 degrees of freedom.
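The 95% confidence interval for B2 printed in the Coefficients table can be reproduced (a check added here) from the estimate, its standard error, and the t critical value with 7 degrees of freedom, t(.975, 7) ≈ 2.365:

.0126 ± 2.365(.004) ≈ (.003, .022)

matching the Lower and Upper Bounds shown in the output.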
The Casewise Diagnostics table gives a Std. Residual column with standardized values between ±3 standard deviations, which is reasonable. The Durbin-Watson statistic is about 2, which indicates approximately zero autocorrelation in the residuals. The leverage (LEVER) and Cook's distance (COOK) values for the 10th observation are relatively large (h10 = .8977, D = 405.069), indicating that this is an influential observation. If we compare h10 to 2 × (mean h) = 2(.2) = .4, where mean h = .2 is the mean Centered Leverage Value found in the Residuals Statistics table, our suspicion that the 10th observation (an 80-year-old person with a high seriousness rating) is influential is confirmed. Notice that the residual for this observation is small; the observation is therefore not an outlier.
Casewise Diagnostics(a)

Case Number   Std. Residual   SERIOUS   Predicted Value   Residual
1             -.812           21        24.04             -3.04
2             .414            28        26.45             1.55
3             -.003           27        27.01             -1.08E-02
4             -.121           26        26.45             -.45
5             .937            33        29.50             3.50
6             .965            36        32.39             3.61
7             -1.736          31        37.49             -6.49
8             -.666           35        37.49             -2.49
9             .939            41        37.49             3.51
10            .082            95        94.69             .31

a. Dependent Variable: SERIOUS
Residuals Statistics(a)

                                     Minimum   Maximum   Mean        Std. Deviation   N
Predicted Value                      24.04     94.69     37.30       20.81            10
Std. Predicted Value                 -.638     2.758     .000        1.000            10
Standard Error of Predicted Value    1.28      3.73      1.93        .72              10
Adjusted Predicted Value             -35.46    39.73     24.58       21.72            10
Residual                             -6.49     3.61      -7.11E-16   3.30             10
Std. Residual                        -1.736    .965      .000        .882             10
Stud. Residual                       -2.013    1.690     .127        1.156            10
Deleted Residual                     -8.73     130.46    12.72       41.60            10
Stud. Deleted Residual               -2.872    2.034     .077        1.400            10
Mahal. Distance                      .148      8.079     1.800       2.357            10
Cook's Distance                      .000      405.070   40.618      128.055          10
Centered Leverage Value              .016      .898      .200        .262             10

a. Dependent Variable: SERIOUS
Look, we got studentized and other residual statistics without asking for them. Bonus!
The histogram of residuals looks reasonable, although
with 10 observations, this is difficult to judge.
[Figure: Histogram of regression standardized residuals, Dependent Variable: SERIOUS. Std. Dev = .88, Mean = 0.00, N = 10.]
The probability plot has improved from the previous "Example: Linear Model with Normal Error", in the sense that the residuals more closely approximate a normal distribution. The large bulge present in the normal probability plot of residuals in the "Example: Linear Model with Normal Error" is no longer present in the polynomial of degree 2 model.
[Figure: Normal P-P plot of regression standardized residual, Dependent Variable: SERIOUS. Expected Cum Prob vs Observed Cum Prob.]
The plot of predicted values versus residuals also displays a reasonable band shape.
[Figure: Scatterplot of regression standardized residuals vs regression standardized predicted values, Dependent Variable: SERIOUS.]
Yes! A reasonable band shape for the residuals and predicted values.
In a future example, we will learn how to compare the polynomial model discussed in this example with the linear model from the previous example, utilizing the ANOVA technique for nested models. Although the polynomial model has a higher R² than the line, we do not yet know whether this improvement is statistically significant.

The data table produced by the analysis follows.

Now let's see how the additional variables in the data matrix compare to the syntax commands for this analysis.
age   serious   agesq   pre_1    res_1    zpr_1    zre_1    coo_1     lev_1
20    21        400     24.035   -3.035   -0.638   -0.812   0.315     0.344
25    28        625     26.452   1.548    -0.521   0.414    0.016     0.084
26    27        676     27.011   -0.011   -0.495   -0.003   0.000     0.058
25    26        625     26.452   -0.452   -0.521   -0.121   0.001     0.084
30    33        900     29.499   3.501    -0.375   0.937    0.044     0.016
34    36        1156    32.392   3.608    -0.236   0.965    0.062     0.046
40    31        1600    37.488   -6.488   0.009    -1.736   0.466     0.156
40    35        1600    37.488   -2.488   0.009    -0.666   0.068     0.156
40    41        1600    37.488   3.512    0.009    0.939    0.136     0.156
80    95        6400    94.694   0.306    2.758    0.082    405.070   0.898

lmci_1   umci_1    lici_1   uici_1
18.148   29.923    13.416   34.655
22.658   30.246    16.833   36.070
23.496   30.526    17.499   36.523
22.658   30.246    16.833   36.070
26.483   32.516    20.160   38.839
29.010   35.774    22.928   41.856
33.013   41.964    27.581   47.395
33.013   41.964    27.581   47.395
33.013   41.964    27.581   47.395
85.866   103.522   82.201   107.186