Converting an SPSS Data File to Mplus
by Paul F. Tremblay
September 2013

Types of Data Files

There are two types of ASCII data files to consider: delimited (free) format and fixed width format.

A delimited file separates the values with a delimiter such as a tab, space, or comma. An example of a delimited file with four subjects (one per line) and five variables is presented below. In this example, the values are separated by one space. Although data in SPSS are typically saved as an SPSS data file with a .SAV extension, it is also possible to save them as a delimited ASCII file that can be read by other programs.

    1 2 3 4 5
    12 5 167 1 14
    1 23 4 12 6
    40 21 23 6 19

A fixed width file contains the values in specific defined fields. Assume, for example, that an individual had typed the above data in a series of three-column fields so that the first variable was in columns 1 to 3, the next in columns 4 to 6, etc. The same values as shown above would appear as:

      1  2  3  4  5
     12  5167  1 14
      1 23  4 12  6
     40 21 23  6 19

Note that the entries are right justified: each value ends in the last column of its field, and it isn’t even necessary to have a space between the variables (see 5 and 167 on the second line). The values line up on the right-hand side. (As a minor point, although the example has all the variables in three-column fields, the field widths need not be identical. What is important is that the values be right justified in their fields. So, for the four subjects above, the values of the second variable are 2, 5, 23, 21, and the values of the third variable are 3, 167, 4, 23.)

Converting a Data File from SPSS to Mplus using a Delimited Format

Make sure that all your values are numeric. For example, if you had a variable Gender coded as ‘M’ or ‘F’, you should recode it with numbers such as 1 and 0.
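The recoding of a string variable into numbers would normally be done inside SPSS. Purely as an illustration of the idea, here is a minimal Python sketch; Python is not part of the SPSS or Mplus workflow, and the names GENDER_CODES and recode, as well as the example data, are hypothetical.

```python
# Hypothetical coding scheme for a string variable; adapt the codes to your data.
GENDER_CODES = {"M": 1, "F": 0}

def recode(values, codes):
    """Return numeric codes for string values, failing loudly on unknown codes."""
    recoded = []
    for value in values:
        key = value.strip().upper()  # tolerate stray spaces and lower case
        if key not in codes:
            raise ValueError(f"no numeric code defined for {value!r}")
        recoded.append(codes[key])
    return recoded

gender = ["M", "F", "f", "M"]  # made-up example data
gender_numeric = recode(gender, GENDER_CODES)
print(gender_numeric)  # [1, 0, 0, 1]
```

Raising an error on an unknown code, rather than passing it through, is a deliberate choice in this sketch: a stray non-numeric value would otherwise surface only later, when Mplus reads the file.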
Note also that in SPSS, variable names can be longer than eight characters. In Mplus the limit is eight characters, and Mplus will simply truncate variable names that are too long.

We start with a data file in SPSS that has some missing values that have not been coded in any special way. For example, a participant might have missed a question, and therefore a blank rather than a value is assigned to that data point. These are referred to as “system missing” values, unlike missing value codes that you assign yourself. In the latter case, a researcher may have typed in the data manually and entered a special code of -9 whenever a participant was missing a data point. The researcher would then need to define -9 as a missing value code for those variables in SPSS.

In our example, however, we will recode the system missing values into the code -9 to ensure that missing values are interpreted correctly in the translation to Mplus. The syntax in SPSS is:

    RECODE v1 v2 v3 (SYSMIS=-9) (else=copy).
    EXECUTE.

After this syntax is run, the cells that were blank are coded as -9.

I now save the SPSS file as a tab delimited file. The file name has a .dat extension (in my example the full name is test.dat). You should probably remove the checkmark for writing variable names when you save, so that your variable names do not appear in the data file. Otherwise you will need to remove these names from your data file, as described next.

You can inspect the saved file by opening it in Notepad. Because I left the checkmark on when saving, my variable names appear with the data and need to be removed. (You could cut these variable names and paste them into the Mplus syntax file in the statement listing your variables. Note that you will need to shorten variable names longer than 8 characters or Mplus will truncate them.)

The syntax file for Mplus is presented below.
    TITLE:    First example of setting up data file
    DATA:     file is test.dat;
              !listwise = on;
    VARIABLE: names are v1 v2 v3;
              missing are all (-9);
              usevariables = v1;
    ANALYSIS: type = basic;

Note that we have defined the missing value code with the statement missing are all (-9);. If you forget this statement, Mplus will treat -9 as a valid data value and your results will be wrong. You can use more than one missing value code if needed; see the Mplus manual for examples. The analysis: type = basic; statement gives you descriptive statistics without testing any model. Note also the statement usevariables = v1;. Here I will get information only for v1. I will bring in the other two variables in a subsequent example.

The Mplus output follows:

    *** WARNING
      Data set contains cases with missing on all variables.
      These cases were not included in the analysis.
      Number of cases with missing on all variables:  1
       1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS

    First example of using different data file formats

    SUMMARY OF ANALYSIS

    Number of groups                                   1
    Number of observations                             9

    Number of dependent variables                      1
    Number of independent variables                    0
    Number of continuous latent variables              0

    Observed dependent variables

      Continuous
       V1

    Estimator                                         ML
    Information matrix                          OBSERVED
    Maximum number of iterations                    1000
    Convergence criterion                      0.500D-04
    Maximum number of steepest descent iterations     20
    Maximum number of iterations for H1             2000
    Convergence criterion for H1               0.100D-03

    Input data file(s)
      test1.txt

    Input data format  FREE

    SUMMARY OF DATA

         Number of missing data patterns             1

    SUMMARY OF MISSING DATA PATTERNS

         MISSING DATA PATTERNS (x = not missing)

               1
     V1        x

         MISSING DATA PATTERN FREQUENCIES

        Pattern   Frequency
              1           9

    COVARIANCE COVERAGE OF DATA

    Minimum covariance coverage value   0.100

         PROPORTION OF DATA PRESENT

               Covariance Coverage
                  V1
                  ________
     V1             1.000

    RESULTS FOR BASIC ANALYSIS

         ESTIMATED SAMPLE STATISTICS

               Means
                  V1
                  ________
          1        17.667

               Covariances
                  V1
                  ________
     V1           208.889

               Correlations
                  V1
                  ________
     V1             1.000

Note in the above output that Mplus estimates
missing values using maximum likelihood. In the above example, however, it does not estimate missing values because there are no other variables in the analysis from which to “borrow information”. So the statistics for v1 are estimated on nine cases and correspond to the descriptive statistics in SPSS.

If we bring in the other two variables, v2 and v3, however, missing values will be estimated with the information from all variables. You can see in the output below that Mplus reports 10 cases for the analysis (i.e., no cases are dropped). Mplus also shows the patterns of missing data. The statistics for the three variables will now differ from statistics calculated with listwise or pairwise deletion. If you want a listwise analysis, you can specify a listwise statement as indicated above (by removing the exclamation mark in the syntax statement !listwise = on;).

    SUMMARY OF DATA

         Number of missing data patterns             4

    SUMMARY OF MISSING DATA PATTERNS

         MISSING DATA PATTERNS (x = not missing)

               1  2  3  4
     V1        x  x     x
     V2        x  x
     V3        x     x  x

         MISSING DATA PATTERN FREQUENCIES

        Pattern   Frequency     Pattern   Frequency
              1           6           3           1
              2           2           4           1

    COVARIANCE COVERAGE OF DATA

    Minimum covariance coverage value   0.100

         PROPORTION OF DATA PRESENT

               Covariance Coverage
                  V1            V2            V3
                  ________      ________      ________
     V1             0.900
     V2             0.800         0.800
     V3             0.700         0.600         0.800

    RESULTS FOR BASIC ANALYSIS

         ESTIMATED SAMPLE STATISTICS

               Means
                  V1            V2            V3
                  ________      ________      ________
          1        15.158       129.911         4.149

               Covariances
                  V1            V2            V3
                  ________      ________      ________
     V1           258.612
     V2         -1335.259     30544.906
     V3            11.306       -91.992         1.076

               Correlations
                  V1            V2            V3
                  ________      ________      ________
     V1             1.000
     V2            -0.475         1.000
     V3             0.678        -0.507         1.000
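The checks recommended in this handout before running Mplus (no row of variable names left in the data file, numeric values only, variable names of at most eight characters) can also be automated. The following is a minimal sketch in Python, which is not part of the SPSS or Mplus workflow; the function names is_number and check_mplus_data and the sample file content are hypothetical.

```python
import csv
import io

def is_number(text):
    """True if the field can be read as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False

def check_mplus_data(lines, names, max_len=8):
    """Return a list of problems found in tab-delimited rows and variable names."""
    problems = []
    for name in names:
        if len(name) > max_len:
            problems.append(f"name '{name}' exceeds {max_len} characters")
    for lineno, row in enumerate(csv.reader(lines, delimiter="\t"), start=1):
        if len(row) != len(names):
            problems.append(
                f"line {lineno}: expected {len(names)} fields, found {len(row)}"
            )
        # Blank fields (uncoded missing values) are also reported as non-numeric.
        bad = [field for field in row if not is_number(field)]
        if bad:
            problems.append(f"line {lineno}: non-numeric values {bad}")
    return problems

# Made-up file content in which the row of variable names was left in.
sample = "v1\tv2\tv3\n12\t5\t-9\n-9\t23\t4\n"
problems = check_mplus_data(io.StringIO(sample), ["v1", "v2", "v3"])
for problem in problems:
    print(problem)
```

To check a real file, the same function can be given an open file object, for example one obtained with open("test.dat", newline=""), together with the names listed in the Mplus VARIABLE statement.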