Missing Values Analysis with IBM SPSS

Click Analyze, Missing Values Analysis. The corresponding syntax is:
MVA VARIABLES=Gender Ideal Statoph Nucoph SATM Year Eye
  /MAXCAT=25
  /CATEGORICAL=Eye
  /TTEST PROB PERCENT=5
  /MPATTERN
  /EM(TOLERANCE=0.001 CONVERGENCE=0.0001 ITERATIONS=25).
[DataSet] C:\Users\Vati\Documents\StatData\IntroQ\IntroQ.sav
See IntroQ Questionnaire for a description of the questions asked.
Univariate Statistics
                                            Missing         No. of Extremes (a)
Variable     N       Mean   Std. Deviation  Count  Percent    Low    High
Gender      667      1.26        .438          0      .0        0      0
Ideal       662     70.32       3.854          5      .7        8      0
Statoph     658      6.29       2.315          9     1.3       11      0
Nucoph      665     58.04      22.423          2      .3       26      0
SATM        529    505.29      96.080        138    20.7        1      2
Year        667   1997.52       8.621          0      .0        0      0
Eye         666                                1      .1
a. Number of cases outside the range (Q1 - 1.5*IQR, Q3 + 1.5*IQR).
Here we see that almost 21% of the cases are missing data on the SATM variable.
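The Percent column is just the missing count divided by the total number of cases (667 here). As a quick check, here is a minimal Python sketch using the counts from the table above. The names `n_cases`, `missing_counts`, and `percent_missing` are mine, not anything SPSS produces.

```python
# Missing counts copied from the Univariate Statistics table.
n_cases = 667
missing_counts = {
    "Gender": 0, "Ideal": 5, "Statoph": 9, "Nucoph": 2,
    "SATM": 138, "Year": 0, "Eye": 1,
}


def percent_missing(count, total):
    """Return the percentage of cases that are missing."""
    return 100.0 * count / total


percents = {name: percent_missing(count, n_cases)
            for name, count in missing_counts.items()}

for name, pct in percents.items():
    print(f"{name:8s} {pct:5.1f}")
```

For SATM this gives 138 / 667, or about 20.7%.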
Summary of Estimated Means
              Gender    Ideal   Statoph   Nucoph     SATM      Year
All Values      1.26    70.32      6.29    58.04   505.29   1997.52
EM              1.26    70.32      6.29    58.02   504.60   1997.51
Please read David Howell’s document on the Expectation-Maximization algorithm. The table above and the one below show the results of SPSS’ EM procedure. The algorithm leads to an estimated mean of 504.60 and an estimated standard deviation of 95.77 for SATM, not much different from the observed mean (505.29) and standard deviation (96.08) for those cases on which we do have data.
Summary of Estimated Standard Deviations
              Gender    Ideal   Statoph   Nucoph     SATM      Year
All Values      .438    3.854     2.315   22.423   96.080     8.621
EM              .437    3.851     2.308   22.431   95.770     8.620
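To give a feel for what EM is doing, here is a minimal Python sketch for the simplest possible case: two variables, with x complete and y missing for some cases, assumed to be bivariate normal. This is not SPSS' implementation, which works with all of the variables at once. The function name `em_bivariate`, the toy data, and the stopping rule are my assumptions for illustration only.

```python
def em_bivariate(x, y, tol=1e-10, max_iter=10000):
    """EM estimates of mean(y), var(y), and cov(x, y); y[i] is None if missing.

    Assumes x is complete and (x, y) is bivariate normal. Variances are
    maximum-likelihood estimates (divide by n, not n - 1).
    """
    n = len(x)
    mu_x = sum(x) / n
    var_x = sum((xi - mu_x) ** 2 for xi in x) / n

    # Starting values come from the complete cases only.
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    mu_y = sum(yi for _, yi in obs) / len(obs)
    var_y = sum((yi - mu_y) ** 2 for _, yi in obs) / len(obs)
    mean_x_obs = sum(xi for xi, _ in obs) / len(obs)
    cov = sum((xi - mean_x_obs) * (yi - mu_y) for xi, yi in obs) / len(obs)

    for _ in range(max_iter):
        # E step: expected y and y^2 for each case, given current estimates.
        beta = cov / var_x
        resid_var = var_y - cov ** 2 / var_x
        s_y = s_yy = s_xy = 0.0
        for xi, yi in zip(x, y):
            if yi is None:
                ey = mu_y + beta * (xi - mu_x)
                eyy = ey ** 2 + resid_var
            else:
                ey, eyy = yi, yi ** 2
            s_y += ey
            s_yy += eyy
            s_xy += xi * ey
        # M step: re-estimate the parameters from the expected sums.
        new_mu_y = s_y / n
        new_var_y = s_yy / n - new_mu_y ** 2
        new_cov = s_xy / n - mu_x * new_mu_y
        change = max(abs(new_mu_y - mu_y), abs(new_var_y - var_y),
                     abs(new_cov - cov))
        mu_y, var_y, cov = new_mu_y, new_var_y, new_cov
        if change < tol:
            break
    return mu_y, var_y, cov


# Made-up toy data: y is missing for the three cases with the largest x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.5, 3.9, 6.2, 7.8, 10.1, None, None, None]
em_mean, em_var, em_cov = em_bivariate(x, y)
print(em_mean, em_var ** 0.5)
```

In this toy example the mean of the observed y scores is 5.9, but the EM estimate is about 9.07, because y is missing for the cases with large x and x predicts y.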
Separate Variance t Tests (a)
                        Gender    Ideal   Statoph   Nucoph     SATM      Year
SATM   t                   1.5       .3      -2.7      -.1        .      -1.8
       df                228.9    209.3     221.6    222.7        .     234.1
       P(2-tail)          .132     .787      .007     .952        .      .066
       # Present           529      527       525      527      529       529
       # Missing           138      135       133      138        0       138
       Mean(Present)      1.27    70.34      6.17    58.01   505.29   1997.23
       Mean(Missing)      1.21    70.24      6.75    58.14        .   1998.65
For each quantitative variable, pairs of groups are formed by indicator variables (present, missing).
a. Indicator variables with less than 5% missing are not displayed.
These t tests compare the group of cases with data on SATM to the group of cases without data on SATM. Notice that those who did not answer the SATM question scored significantly higher on the Statophobia item (M = 6.75) than did those who did answer the SATM question (M = 6.17), p = .007. There is also a hint (p = .066) that the frequency of failure to answer the SATM question has increased over the years: the mean year was 1998.65 for those missing SATM and 1997.23 for those with SATM.
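A separate variance (Welch) t test of this sort can be sketched in a few lines of Python. The scores below are made up for illustration, and the helper names `split_by_indicator` and `welch_t` are mine. The sketch returns t and the Welch-Satterthwaite df but does not compute the p value.

```python
from math import sqrt
from statistics import mean, variance


def split_by_indicator(scores, indicator):
    """Split scores into (present, missing) groups by whether indicator is None."""
    present = [s for s, ind in zip(scores, indicator)
               if s is not None and ind is not None]
    missing = [s for s, ind in zip(scores, indicator)
               if s is not None and ind is None]
    return present, missing


def welch_t(group1, group2):
    """Return (t, df) for a separate variance t test on two groups."""
    n1, n2 = len(group1), len(group2)
    v1 = variance(group1) / n1  # squared standard error of mean 1
    v2 = variance(group2) / n2  # squared standard error of mean 2
    t = (mean(group1) - mean(group2)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical data: Statophobia scores, with SATM missing for the last five.
statoph = [1, 2, 3, 4, 5, 2, 4, 6, 8, 10]
satm = [500, 520, 480, 610, 450, None, None, None, None, None]

present, missing = split_by_indicator(statoph, satm)
t, df = welch_t(present, missing)
print(round(t, 2), round(df, 1))
```

As in the SPSS table, a negative t means that the group missing SATM has the higher mean.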
I have cut out most of the table below, but left in enough to show you how SPSS groups cases by the pattern of missing values. The most frequent pattern was missing data on SATM but not on any other variable. Cases 626 through 629 and case 631 were missing data on SATM and Statophobia. Case 630 (and others) was missing data only on Statophobia, and so on.
Missing Patterns (cases with missing values)
Missing and Extreme Value Patterns (a)
Variable columns, in sorted order: Gender, Year, Eye, Nucoph, Ideal, Statoph, SATM

Case   # Missing   % Missing   Markers
 660       1          14.3     S
 661       1          14.3     S
 662       1          14.3     S
 626       2          28.6     S  S
 627       2          28.6     S  S
 628       2          28.6     S  S
 629       2          28.6     S  S
 631       2          28.6     S  S
 630       1          14.3     S
 632       1          14.3     S
 633       1          14.3     S
 -
 194       2          28.6     S  S
 665       1          14.3     S
 558       1          14.3
  78       1          14.3     S
 444       1          14.3     S
 221       2          28.6     S  S
 552       2          28.6     S  S
 311       2          28.6     S  S  S
 -
- indicates an extreme low value, while + indicates an extreme high value. The range used is (Q1 - 1.5*IQR, Q3 + 1.5*IQR).
a. Cases and variables are sorted on missing patterns.
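Grouping cases by their pattern of missing values is easy to sketch in Python. The cases below are made up, and `missing_pattern` is a hypothetical helper of mine, not an SPSS function. Note that the % Missing column is just the number of variables missing divided by the 7 variables analyzed, so one missing value is 14.3% and two are 28.6%.

```python
from collections import Counter

VARIABLES = ["Gender", "Year", "Eye", "Nucoph", "Ideal", "Statoph", "SATM"]


def missing_pattern(case):
    """Return the tuple of variable names on which this case is missing."""
    return tuple(name for name in VARIABLES if case.get(name) is None)


# Made-up cases for illustration; None marks a missing value.
base = {"Gender": 1, "Year": 1995, "Eye": 2, "Nucoph": 50,
        "Ideal": 70, "Statoph": 5, "SATM": 500}
cases = [
    dict(base, SATM=None),
    dict(base, SATM=None),
    dict(base, Statoph=None, SATM=None),
    dict(base, Statoph=None),
    dict(base),
]

pattern_counts = Counter(missing_pattern(case) for case in cases)

for pattern, count in pattern_counts.most_common():
    if pattern:  # skip the complete cases
        pct = round(100 * len(pattern) / len(VARIABLES), 1)
        print(count, len(pattern), pct, pattern)
```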
EM Estimated Statistics
Here we have estimated means, covariances, and correlation coefficients. Little’s MCAR test tests the null hypothesis that the missing data are Missing Completely At Random. Since it is significant (chi-square = 61.477, df = 32, p = .001), we conclude that the data are NOT missing completely at random. The majority opinion is that EM estimates are not trustworthy when the data are not missing completely at random.
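If you want to verify the significance level that SPSS reports for Little's test, the upper-tail chi-square probability has a simple closed form when the df is even, as it is here (32). The following is a minimal Python sketch; the function name `chi2_sf_even_df` is mine, and the shortcut does not work for odd df.

```python
from math import exp


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    term = 1.0   # (x/2)^k / k! for k = 0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return exp(-half) * total


# Chi-square and df from Little's MCAR test in the SPSS output.
p = chi2_sf_even_df(61.477, 32)
print(round(p, 4))
```

The result rounds to the .001 that SPSS reports.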
EM Means (a)
Gender    Ideal   Statoph   Nucoph     SATM      Year
  1.26    70.32      6.29    58.02   504.60   1997.51
a. Little's MCAR test: Chi-Square = 61.477, DF = 32, Sig. = .001
EM Covariances (a)
            Gender     Ideal   Statoph    Nucoph       SATM     Year
Gender        .191
Ideal        -.949    14.833
Statoph      -.146      .698     5.326
Nucoph       -.815     8.702     1.749   503.155
SATM         2.480   -18.295   -73.667    63.812   9171.847
Year          .032     -.483    -3.288     1.114    259.774   74.301
a. Little's MCAR test: Chi-Square = 61.477, DF = 32, Sig. = .001
EM Correlations (a)
            Gender     Ideal   Statoph    Nucoph       SATM     Year
Gender           1
Ideal        -.563         1
Statoph      -.145      .079         1
Nucoph       -.083      .101      .034         1
SATM          .059     -.050     -.333      .030          1
Year          .008     -.015     -.165      .006       .315        1
a. Little's MCAR test: Chi-Square = 61.477, DF = 32, Sig. = .001
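The EM correlations are just the EM covariances rescaled: each covariance is divided by the product of the two standard deviations (the square roots of the diagonal elements). A minimal Python sketch, using the covariances from the SPSS output above; the names `COV_LOWER` and `correlation` are mine.

```python
from math import sqrt

NAMES = ["Gender", "Ideal", "Statoph", "Nucoph", "SATM", "Year"]

# Lower triangle of the EM covariance matrix, copied from the SPSS output.
COV_LOWER = [
    [0.191],
    [-0.949, 14.833],
    [-0.146, 0.698, 5.326],
    [-0.815, 8.702, 1.749, 503.155],
    [2.480, -18.295, -73.667, 63.812, 9171.847],
    [0.032, -0.483, -3.288, 1.114, 259.774, 74.301],
]


def correlation(i, j):
    """Correlation between variables i and j, computed from the covariances."""
    if i < j:
        i, j = j, i  # only the lower triangle is stored
    return COV_LOWER[i][j] / sqrt(COV_LOWER[i][i] * COV_LOWER[j][j])


for i, row_name in enumerate(NAMES):
    cells = " ".join(f"{correlation(i, j):6.3f}" for j in range(i + 1))
    print(row_name.ljust(8), cells)
```

The values should agree with the SPSS correlation table to within the rounding of the printed covariances.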
ECU Users: Curiously, the SPSS (20) installation provided for ECU faculty to use on campus
does not contain the missing values and multiple imputation modules, but that provided for use off
campus does. Go figure.

Karl L. Wuensch, December, 2012
