Converting an SPSS Data File to Mplus
by Paul F. Tremblay
September 2013
Types of Data Files
There are two types of ASCII data files to consider: delimited (free) format and fixed-width format.
A delimited file separates the values with some form of delimiter such as a tab, space, or comma.
Although SPSS data are typically saved as an SPSS data file with a .sav extension, it is also possible
to save them as a delimited ASCII file that can be read by other programs. An example of a delimited
file involving four subjects (one per line) and five variables is presented below. In this example, the
values are separated by one space.
1 2 3 4 5
12 5 167 1 14
1 23 4 12 6
40 21 23 6 19
A fixed-width file contains the values in specific, defined fields. Assume, for example, that someone
had typed the above data in a series of three-column fields, so that the first variable was in columns 1
to 3, the next in columns 4 to 6, and so on. The same values as shown above would appear as:
  1  2  3  4  5
 12  5167  1 14
  1 23  4 12  6
 40 21 23  6 19
Note that the entries are right justified: each value ends in the last column of its field, so the values
line up on the right-hand side, and it isn’t even necessary to have a space between the variables (see
the 5 and the 167 on the second line). So, as an example, for the four subjects above, the values of the
second variable would be 2, 5, 23, and 21. The values for the third variable would be 3, 167, 4, and 23.
(As a minor point, although the example had all the variables in three-column fields, it is not necessary
to have identical field widths. What is important is that the values be right justified in their fields.)
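The way a fixed-width record is read can be sketched in a few lines of Python. This is only an illustration, not part of SPSS or Mplus: the function name parse_fixed_width is hypothetical, and the record is the second subject from the example, assumed to be laid out in five three-column fields.

```python
def parse_fixed_width(line, widths):
    """Cut one fixed-width record into integers using the given field widths."""
    values = []
    start = 0
    for width in widths:
        field = line[start:start + width]
        values.append(int(field))  # int() ignores the padding spaces
        start += width
    return values

# Second subject from the example: five right-justified three-column fields.
record = " 12  5167  1 14"
fields = parse_fixed_width(record, [3, 3, 3, 3, 3])
print(fields)          # [12, 5, 167, 1, 14]

# Reading the same line as space-delimited runs 5 and 167 together.
print(record.split())  # ['12', '5167', '1', '14']
```

The second print shows why the format matters: treated as delimited data, the line yields four values instead of five.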
Converting Data File from SPSS to Mplus using a Delimited Format
Make sure that all your values are numeric. For example, if you had a variable Gender coded as ‘M’ or ‘F’,
you should recode it with numbers such as 1 and 0. Note also that in SPSS, variable names can be longer
than eight characters. In Mplus the limit is eight characters, and Mplus will simply truncate variable
names that are too long.
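Both checks can be done by eye in SPSS, but a short Python sketch may make them concrete. The helper names and the sample variable names are hypothetical, and the limit of eight characters is simply the one stated above.

```python
MPLUS_NAME_LIMIT = 8  # limit stated in the text above

def long_names(names, limit=MPLUS_NAME_LIMIT):
    """Return the variable names that exceed the length limit."""
    return [name for name in names if len(name) > limit]

def is_numeric(value):
    """True if float() can parse the text."""
    try:
        float(value)
        return True
    except ValueError:
        return False

names = ["gender", "satisfaction", "v1"]          # hypothetical variable names
print(long_names(names))                          # ['satisfaction']
print([is_numeric(v) for v in ["1", "0", "M"]])   # [True, True, False]
```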
We start with a data file in SPSS that has some missing values that have not been coded in any special
way. For example, a participant might have missed a question, and therefore a blank space rather than a
value is assigned to that data point. These are referred to as “system missing” values, unlike other
missing value codes that you could have assigned yourself. In this latter case, a researcher may have
typed in the data manually and decided to put in a special code of -9 when a participant was missing a
data point. The researcher would then need to define -9 as the missing value code in SPSS. This can be
done the following way in SPSS:
However, in our example below, we will recode the system missing values into the code -9 to ensure that
our missing values are interpreted correctly in the translation to Mplus.
The syntax in SPSS is:
RECODE
v1 v2 v3
(SYSMIS=-9) (else=copy).
EXECUTE.
Now we can see that the blank spaces are coded with a -9 in the screen below.
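For readers who prefer to see the logic of the RECODE command spelled out, here is a minimal Python sketch of the same idea. It is an analogy only: None stands in for a system-missing value, the function name recode_sysmis is hypothetical, and the data are made up.

```python
MISSING_CODE = -9  # the code chosen in the text

def recode_sysmis(rows, code=MISSING_CODE):
    """Replace None (standing in for system missing) with the code; copy everything else."""
    return [[code if value is None else value for value in row] for row in rows]

rows = [[12, None, 3], [None, 5, 7]]  # hypothetical data
print(recode_sysmis(rows))            # [[12, -9, 3], [-9, 5, 7]]
```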
I now save the SPSS file as a tab delimited file. The file name has a .dat extension (in my example the full
name is test.dat). You should probably remove the checkmark (see below) so that your variable names
do not appear in the data file. Otherwise you will need to remove these labels from your data file (see
below).
If you open this file in Notepad, it looks like the following screen. Because I left the checkmark in the
previous screen, my variable names appear with the data and need to be removed. (You could cut these
variable names and paste them into the Mplus syntax file in the statement listing your variables. Note
that you will need to shorten variable names longer than 8 characters, or Mplus will truncate them.)
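If the file is large, removing the header line by hand is tedious. The following Python sketch shows one way to separate the names from the data and reuse them in the names statement. The function name split_header and the file contents are hypothetical; it assumes the first line holds tab-separated variable names.

```python
def split_header(text):
    """Separate the first line (variable names) from the data lines of a tab-delimited export."""
    lines = text.splitlines()
    names = lines[0].split("\t")
    data_lines = lines[1:]
    return names, data_lines

exported = "v1\tv2\tv3\n12\t-9\t3\n-9\t5\t7\n"  # hypothetical file contents
names, data_lines = split_header(exported)
names_statement = "names are " + " ".join(names) + ";"
print(names_statement)        # names are v1 v2 v3;
print("\n".join(data_lines))  # the data without the header line
```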
The syntax file for Mplus is presented below.
TITLE: First example of setting up data file
DATA:
file is test.dat;
!listwise = on;
VARIABLE:
names are v1 v2 v3;
missing all (-9);
usevariables = v1;
ANALYSIS:
type = basic;
Note that we have defined the missing value code with the statement missing all (-9);. If you forget
this statement, Mplus will read -9 as a valid data value. You can use more than one missing value code
if needed; see the Mplus manual for examples. The statement analysis: type = basic; will give you
descriptive statistics without testing any model. Note also the statement usevariables = v1;. Here I
will get information only for v1. I will bring in the other two variables in a subsequent example. The
Mplus output follows:
*** WARNING
  Data set contains cases with missing on all variables.
  These cases were not included in the analysis.
  Number of cases with missing on all variables:  1
   1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS

First example of using different data file formats

SUMMARY OF ANALYSIS

Number of groups                                                 1
Number of observations                                           9

Number of dependent variables                                    1
Number of independent variables                                  0
Number of continuous latent variables                            0

Observed dependent variables

  Continuous
   V1

Estimator                                                       ML
Information matrix                                        OBSERVED
Maximum number of iterations                                  1000
Convergence criterion                                    0.500D-04
Maximum number of steepest descent iterations                   20
Maximum number of iterations for H1                           2000
Convergence criterion for H1                             0.100D-03

Input data file(s)
  test1.txt

Input data format  FREE

SUMMARY OF DATA

     Number of missing data patterns             1

SUMMARY OF MISSING DATA PATTERNS

     MISSING DATA PATTERNS (x = not missing)

           1
 V1        x

     MISSING DATA PATTERN FREQUENCIES

    Pattern   Frequency
          1           9

COVARIANCE COVERAGE OF DATA

Minimum covariance coverage value   0.100

     PROPORTION OF DATA PRESENT

           Covariance Coverage
              V1
              ________
 V1             1.000

RESULTS FOR BASIC ANALYSIS

     ESTIMATED SAMPLE STATISTICS

           Means
              V1
              ________
      1        17.667

           Covariances
              V1
              ________
 V1           208.889

           Correlations
              V1
              ________
 V1             1.000
Note in the above output that Mplus handles missing values using maximum likelihood estimation. In the
above example, however, nothing is estimated for the missing value because there are no other variables
in the analysis from which to “borrow information”. The one case that is missing on v1 is dropped (see
the warning), so the statistics for v1 are estimated on nine cases and will correspond to descriptive
statistics in SPSS.
If we bring in the other two variables, v2 and v3, however, missing values will be handled using the
information from all variables. You can see in the output below that Mplus reports 10 cases for the
analysis (the pattern frequencies sum to 10; i.e., no cases are dropped). Mplus also shows the patterns
of missing data. The statistics for the three variables will now be different from statistics calculated
with listwise or pairwise deletion. If you want a listwise analysis, you can specify a listwise
statement as indicated above (by removing the exclamation mark in the syntax statement,
!listwise = on;).
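One detail may be worth keeping in mind when comparing the variance with SPSS. Assuming, as is usual for maximum likelihood, that the Mplus estimate divides by n while SPSS divides by n - 1, the two values differ by a factor of n / (n - 1). A minimal Python sketch of that conversion, under that assumption and using the numbers from the output above:

```python
n = 9                  # cases used for v1 in the output above
ml_variance = 208.889  # variance of v1 reported by Mplus

# Assumption: the ML estimate divides by n; the usual sample variance divides by n - 1.
sample_variance = ml_variance * n / (n - 1)
print(round(sample_variance, 3))  # 235.0
```

Under this assumption the mean is unaffected; only variances and covariances are rescaled.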
SUMMARY OF DATA

     Number of missing data patterns             4

SUMMARY OF MISSING DATA PATTERNS

     MISSING DATA PATTERNS (x = not missing)

           1  2  3  4
 V1        x  x     x
 V2        x  x
 V3        x     x  x

     MISSING DATA PATTERN FREQUENCIES

    Pattern   Frequency     Pattern   Frequency     Pattern   Frequency
          1           6           3           1
          2           2           4           1

COVARIANCE COVERAGE OF DATA

Minimum covariance coverage value   0.100

     PROPORTION OF DATA PRESENT

           Covariance Coverage
              V1            V2            V3
              ________      ________      ________
 V1             0.900
 V2             0.800         0.800
 V3             0.700         0.600         0.800

RESULTS FOR BASIC ANALYSIS

     ESTIMATED SAMPLE STATISTICS

           Means
              V1            V2            V3
              ________      ________      ________
      1        15.158       129.911         4.149

           Covariances
              V1            V2            V3
              ________      ________      ________
 V1           258.612
 V2         -1335.259     30544.906
 V3            11.306       -91.992         1.076

           Correlations
              V1            V2            V3
              ________      ________      ________
 V1             1.000
 V2            -0.475         1.000
 V3             0.678        -0.507         1.000
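The covariance coverage values can be reproduced from the missing data patterns and their frequencies. The Python sketch below is an illustration only. It assumes that the four patterns are, in order, V1 V2 V3 observed (6 cases), V1 V2 observed (2 cases), V3 only (1 case), and V1 V3 observed (1 case); this reading is inferred from the output rather than stated in it.

```python
# Assumed reading of the pattern table: variables observed in each pattern, with its frequency.
patterns = [
    ({"V1", "V2", "V3"}, 6),
    ({"V1", "V2"}, 2),
    ({"V3"}, 1),
    ({"V1", "V3"}, 1),
]
total = sum(freq for _, freq in patterns)  # 10 cases

def coverage(a, b):
    """Proportion of cases in which both a and b are observed."""
    present = sum(freq for observed, freq in patterns if a in observed and b in observed)
    return present / total

pairs = [("V1", "V1"), ("V2", "V1"), ("V2", "V2"),
         ("V3", "V1"), ("V3", "V2"), ("V3", "V3")]
for a, b in pairs:
    print(a, b, f"{coverage(a, b):.3f}")
```

The six printed proportions are 0.900, 0.800, 0.800, 0.700, 0.600, and 0.800, which match the coverage values in the output.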