November 2010
Dave Atkins, Ph.D.

Overview for Analyzing Multilevel Data Using the HLM Software

The following provides an outline of some of the important steps in analyzing multilevel data. I am only highlighting steps here that are particular to HLM. Thus, prior to running any model, thorough descriptive statistics should be run and variable types should be considered (i.e., binary, ordinal, nominal, count, continuous). To a certain extent, good data analysis is good data analysis. Thus, this is not meant to be an exhaustive list of all the steps in analyzing multilevel data, but it does provide some specific direction about critical steps and how to use SPSS and HLM to do them.

Transforming Data From Wide to Long

One of the first notable differences with multilevel modeling is how the data are structured or arranged prior to analysis. Oftentimes, our data are entered with a single row of data per individual. If we have repeated measures on spouses, we might have a separate column for each dependent variable for each spouse at each time point.

Using the "Couple Tx 2005 wide" data: in this example, each couple receives one line of data, with "id" indicating an ID variable for the couple. The Dyadic Adjustment Scale (DAS) was measured at 4 time points, and each spouse has 4 columns of data: hdas1 - hdas4 and wdas1 - wdas4.

Transforming wide data to long data

• Prior to fitting multilevel models, we need to change the structure of our data from one row per couple to one row per individual time-point.
• With the example data, that means each individual will have 4 rows in the transformed data, and each couple will have 8 rows of data.
• We will use the SPSS command VARSTOCASES.
• With longitudinal data on couples (and other "three level" data), we will run the command twice: 1) stack repeated measures into gender-specific columns, and 2) stack gender-specific columns into single columns.
• Example using "Couple Tx 2005 wide":

*** First stack time points into gender-specific columns .
VARSTOCASES
 /MAKE mdas FROM hdas1 TO hdas4
 /MAKE wdas FROM wdas1 TO wdas4
 /INDEX = assess
 /KEEP = ALL .

• Comments:
  o Each "MAKE" command will create a new variable of "stacked" data based on the variables listed after FROM.
  o If the variables to be stacked are adjacent to one another, the keyword "TO" can be used to specify the range; otherwise, each individual variable name should be included.
  o The INDEX variable will count the number of variables being stacked - four in the present example.
  o KEEP specifies variables that are not repeated but should be kept in the dataset; each value of these variables will be identical across rows of repeated measures.

Next, we will use the dataset just created to stack the gender-specific columns:

*** Using data from above, next stack gender-specific columns .
VARSTOCASES
 /MAKE das FROM mdas wdas
 /INDEX = gender
 /KEEP = ALL .

• Comments:
  o In the present example, only men's and women's DAS scores get stacked, but if we had gender-specific variables such as age or work status or neuroticism, these would be stacked here as well.
  o Our INDEX variable is gender.
  o We will KEEP all other variables.

Finally, we will make a few changes to our two INDEX variables:

*** Change assess and gender variables to 0-3 and 0-1 coding .
COMPUTE assess = assess - 1 .
COMPUTE gender = gender - 1 .
EXECUTE .

*** Assign value labels .
VALUE LABELS gender 0 'Male' 1 'Female' .

*** Sort data by grouping variables .
SORT CASES BY id (A) gender (A) assess (A) .
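As a quick sanity check on the restructuring (my own aside, not in the original handout; the variable names follow the example above, and AGGREGATE with MODE=ADDVARIABLES assumes a reasonably recent SPSS version), you can count the rows per couple - each couple should now have 8 rows (2 spouses x 4 assessments):

*** Optional check: count rows per couple after restructuring .
AGGREGATE
 /OUTFILE = * MODE = ADDVARIABLES
 /BREAK = id
 /nrows = N .
FREQUENCIES VARIABLES = nrows .

Values of nrows other than 8 flag couples whose repeated measures were dropped or missing during the restructuring.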
• HLM software requires a unique ID variable at each level of the data. id is a unique couple-level ID, but we also need a unique individual-level ID. We just need a variable that has unique values for each individual, which we can create via the following:

*** Create unique individual ID variable .
COMPUTE ind.id = id*10 + gender .
EXECUTE .

After re-arranging the columns a bit, we can see the nested structure in the resulting data: there are four assessment points on each individual, and each individual belongs to one couple. The structure of the data is now correct for the HLM software.

Final Considerations Before Reading Data into HLM Software

The HLM software has no facilities for data manipulation. Within the HLM software, predictors may be centered and cross-level interactions may be specified, but everything else must be done prior to HLM. Thus, consider whether:

1. All categorical predictors are appropriately coded (i.e., dummy or effects coding).
2. Are there any variable transformations needed (e.g., log-transformations, polynomial terms)?
3. Are there any interactions between variables at the same level that will be needed?

All of these items need to be addressed prior to reading data into HLM.

Reading Data into HLM Software

I will provide an overview of reading data into HLM and fitting basic models, but another resource is a tutorial developed at the University of Texas:

http://ssc.utexas.edu/software/software-tutorials#HLM

Note that it covers HLM v5, but as far as I can tell, it is largely applicable to version 6 as well.

Historically, one of the most frustrating parts of using the HLM software has been getting the data into HLM. Version 6 seems to be a bit better, but at times, it still can be a bit mysterious.

• The HLM manual typically describes organizing data into multiple files - one for each level of the data; however, you do not need to do this (and I know of no other software with this requirement). You can use the same file for each level, such as the one that we created with the SPSS syntax above.
• To begin the process, go to /File/Make new MDM file/Stat package input
  o You will have to select what program you will use:
    HLM2: Standard two-level model
    HLM3: Standard three-level model
    HMLM: Two-level model for intensive longitudinal data (typically)
    HMLM2: Three-level model for intensive longitudinal data
    HCM2: Cross-classified models
  o Select HLM3 for a three-level model (or another appropriate program depending on your data), and you will see the dialogue box for constructing the MDM file.
• You will need to provide:
  o NOTE: HLM should read SPSS v17.0 and lower. If you have SPSS v18.0 (i.e., PASW v18.0), try reading the data with input file type set to "SPSS", but if that does not work, try setting it to "Anything else".
  o A name for the MDM file (that will hold the data in a special format for the HLM program)
  o A name for the MDM template file that holds the instructions that you enter through this interface
  o Files for level-1, level-2, and level-3 data
    NOTE: The dialogue shown is for two-level data, but it is virtually identical for three-level data.
    NOTE: These can be the same file.
  o When you select each file, you will also need to select the variables to be included.
  o Are there any missing data?
    If so, when should they be deleted? (I usually select "making mdm".)
  o After all selections are made, click: Make MDM
  o If the data were successfully read in, click on "Check Stats"
    Check that the correct number of observations and/or groups are reported at each level.
    Check the range of variables.
  o If the data were not read in successfully:
    Check the HLM manual about ID variables.
    Are the data sorted? (If they are sorted and you still have problems, try sorting by the level-2 ID.)

Plotting Multilevel Data

Prior to fitting any models, it is always a good idea to do thorough descriptive analyses of your data, including numerical summaries and definitely graphs. This work could be done in SPSS, but as of version 6, HLM has basic plotting functions, which are reasonably good. HLM offers two broad categories of plotting functions: 1) plots of the raw data, and 2) plots based on a fitted model.

Before we fit an HLM to longitudinal data, we would like to know:

• What does the change across time look like? Linear? Nonlinear?
• How similar are partners' change patterns?
• What do growth curves look like when grouped by level-2 or level-3 predictors?

We can answer these questions via: /File/Graph Data/line plots, scatter plots

• You will need to input the following:
  o X-axis: With longitudinal data, this is our time variable.
  o Y-axis: Our dependent variable.
  o Number of groups: Depending on how many groups we have, we might want to plot only a subset of the data. Selecting a random probability (i.e., a random subset of groups) is a pretty good idea, or, if you are patient, plot all the data.
  o Z-focus: Do we want plots by a level-2 or level-3 predictor?
  o Grouping: Data grouped by individuals (level-2) or couples (level-3). Typically, we will group by couples.
  o Type of plot: For plotting each couple's data, choose "Line/marker plot". For plotting multiple couples' data by a level-2 or level-3 predictor, choose either "Line plot" or "Scatter plot".
  o Pagination: One plot per group or multiple groups in the same plot?

Fitting Models Using HLM

The first step in setting up a model is to identify the dependent variable (DV) among the Level-1 variables. HLM automatically sets up basic equations for Levels 2 and 3. By clicking on the "Mixed" button in the lower right corner, HLM will present the mixed model equation, in which the Level 2 and 3 equations are substituted into the Level 1 equation. This mixed model shows that currently our model only includes a single, fixed-effect intercept (i.e., γ000) and three random-effects: random intercepts at Levels 2 and 3, and the Level 1 residual error.

Singer and Willett in their book, Applied Longitudinal Data Analysis, call this model the "unconditional means model" (see pp. 92-97). This model can be useful if we want to know how much variability in the DV exists at each level of our data.

Clicking on any covariate brings up a dialogue box inquiring whether we want to add the variable: 1) uncentered, 2) grand-mean centered, or 3) group-mean centered. If the value zero on the scale of your covariate is a meaningful value, the covariate should typically be entered uncentered. For "assess," zero is the start of therapy, which is meaningful, and we add it uncentered. At this point, additional equations are included in both the Level 2 and 3 models, though notice that the mixed model shows that only a single fixed-effect has been added.
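For reference, here is the model at this point written out in the same notation that HLM's text output uses below (this is simply a restatement of the equations HLM displays, not an addition to the model):

Level-1:   Y   = P0 + P1*(ASSESS) + E
Level-2:   P0  = B00 + R0
           P1  = B10
Level-3:   B00 = G000 + U00
           B10 = G100

Substituting the Level-2 and Level-3 equations into the Level-1 equation gives the mixed model that HLM shows:

Y = G000 + G100*(ASSESS) + U00 + R0 + E

That is, two fixed effects (the intercept G000 and the assess slope G100) and three random terms (the couple-level intercept U00, the individual-level intercept R0, and the level-1 error E).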
Note that additional random-effects are "greyed out" in our equations - they have not yet been included, but could be by clicking on them.

Selecting the "Basic Settings" menu brings up a dialogue box where we can provide a name for the output file that HLM will produce, as well as boxes to request residual files for each level of the data, which I have selected.

Next, click "Run Analysis" at the top and then "Run model as shown" when asked. After the program stops iterating, open the output by going to "File/View output". Below is the output, with selected comments.

Program:   HLM 6 Hierarchical Linear and Nonlinear Modeling
Authors:   Stephen Raudenbush, Tony Bryk, & Richard Congdon
Publisher: Scientific Software International, Inc. (c) 2000
           [email protected]
           www.ssicentral.com
-------------------------------------------------------------------------------
Module:    HLM3.EXE (6.08.29257.1)
Date:      3 March 2010, Wednesday
Time:      16:16:11
-------------------------------------------------------------------------------

SPECIFICATIONS FOR THIS HLM3 RUN

Problem Title: no title

The data source for this run  = Z:\Documents\Lectures and Workshops\Bodenmann 2010 workshop\atkinsjfp.mdm
The command file for this run = C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp\whlmtemp.hlm
Output file name              = Z:\Documents\Lectures and Workshops\Bodenmann 2010 workshop\hlm3.txt
The maximum number of level-1 units = 1072
The maximum number of level-2 units = 268
The maximum number of level-3 units = 134

NOTE: It is always a good idea to make sure that the number of observations and groups match up with what they should be.

The maximum number of iterations = 100
Method of estimation: full maximum likelihood

The outcome variable is DAS

The model specified for the fixed effects was:
----------------------------------------------------
     Level-1               Level-2             Level-3
   Coefficients           Predictors          Predictors
 --------------------   -----------------   -----------------
     INTRCPT1, P0          INTRCPT2, B00       INTRCPT3, G000
 #  ASSESS slope, P1     # INTRCPT2, B10       INTRCPT3, G100

'#' - The residual parameter variance for the parameter has been set to zero.

Summary of the model specified (in equation format)
----------------------------------------------------
Level-1 Model
    Y = P0 + P1*(ASSESS) + E

Level-2 Model
    P0 = B00 + R0
    P1 = B10

Level-3 Model
    B00 = G000 + U00
    B10 = G100

NOTE: These equations come directly from what was specified in the HLM GUI (i.e., graphical user interface) and will directly relate to the output shown below. Sadly, some of the terms for the random-effects below are different than what is shown above.

For starting values, data from 1072 level-1 and 268 level-2 records were used

Iterations stopped due to small change in likelihood function

******* ITERATION 7 *******

Sigma_squared = 93.62851

NOTE: This is the estimate of the level-1 error variance ("E" in the equation above).

Standard Error of Sigma_squared = 4.66977

Tau(pi)
 INTRCPT1,P0     71.91814

NOTE: This is the estimate of the level-2 random intercept variance ("R0" above, or the variance of the P0 coefficient).

Tau(pi) (as correlations)
 INTRCPT1,P0  1.000

Standard Errors of Tau(pi)
 INTRCPT1,P0     11.70420

----------------------------------------------------
Random level-1 coefficient   Reliability estimate
----------------------------------------------------
INTRCPT1, P0                        0.754
----------------------------------------------------

NOTE: Reliability relates to how much variability there is between groups compared to total variability (i.e., between-group variance + error variance). If the error variance is large, there will be poor reliability, which affects our power.
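As a rough check on this (my own aside, not part of the HLM output): with n = 4 assessments per person, the standard formula for the reliability of a random intercept can be applied to the components above,

    reliability(P0) = τ_π / (τ_π + σ²/n) = 71.918 / (71.918 + 93.629/4) ≈ 0.754,

which matches the reliability estimate HLM reports.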
Tau(beta)
 INTRCPT1
 INTRCPT2,B00     97.82677

NOTE: This is the estimate of the level-3 random intercept variance ("U00" in the equation above).

Tau(beta) (as correlations)
 INTRCPT1/INTRCPT2,B00  1.000

Standard Errors of Tau(beta)
 INTRCPT1
 INTRCPT2,B00     18.70386

----------------------------------------------------
Random level-2 coefficient   Reliability estimate
----------------------------------------------------
INTRCPT1/INTRCPT2, B00              0.672
----------------------------------------------------

The value of the likelihood function at iteration 7 = -4.217125E+003

The outcome variable is DAS

Final estimation of fixed effects:
----------------------------------------------------------------------------
                                           Standard               Approx.
   Fixed Effect           Coefficient      Error       T-ratio     d.f.    P-value
----------------------------------------------------------------------------
For      INTRCPT1, P0
   For   INTRCPT2, B00
         INTRCPT3, G000     83.958879      1.114878     75.308      133      0.000
For   ASSESS slope, P1
   For   INTRCPT2, B10
         INTRCPT3, G100      3.359910      0.264333     12.711     1070      0.000
----------------------------------------------------------------------------

NOTE: These are estimates of the fixed-effects - the overall intercept and slope for our sample. HLM reports these directly mapping onto the hierarchical equations - so P0 is connected to B00, which is in turn connected to G000. Although this is a bit "wordy," it does follow the equations closely.

The outcome variable is DAS

Final estimation of fixed effects
(with robust standard errors)
----------------------------------------------------------------------------
                                           Standard               Approx.
   Fixed Effect           Coefficient      Error       T-ratio     d.f.    P-value
----------------------------------------------------------------------------
For      INTRCPT1, P0
   For   INTRCPT2, B00
         INTRCPT3, G000     83.958879      0.885803     94.783      133      0.000
For   ASSESS slope, P1
   For   INTRCPT2, B10
         INTRCPT3, G100      3.359910      0.405681      8.282     1070      0.000
----------------------------------------------------------------------------

NOTE: The coefficients are identical between these two tables, but the standard errors here are "robust" in the sense that they are valid even when the data have unequal variances (in the statistics literature, sometimes called "heteroscedasticity consistent"). Raudenbush and Bryk note that when the standard errors are notably different between the two tables, this could indicate model mis-specification. What might not be right about our current model?

Final estimation of level-1 and level-2 variance components:
------------------------------------------------------------------------------
Random Effect            Standard      Variance      df    Chi-square   P-value
                         Deviation     Component
------------------------------------------------------------------------------
INTRCPT1,        R0        8.48046      71.91814     134    545.71355     0.000
level-1,         E         9.67618      93.62851
------------------------------------------------------------------------------

Final estimation of level-3 variance components:
------------------------------------------------------------------------------
Random Effect            Standard      Variance      df    Chi-square   P-value
                         Deviation     Component
------------------------------------------------------------------------------
INTRCPT1/INTRCPT2, U00     9.89074      97.82677     133    409.03277     0.000
------------------------------------------------------------------------------

NOTE: These are simply the variance components (again), but including a significance test.
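As an aside (my own back-of-the-envelope calculation, not part of the handout or the HLM output), the three variance components can be used to describe where the variability in DAS lies, after accounting for the fixed linear effect of assess:

    within individuals over time:    93.63 / (93.63 + 71.92 + 97.83) ≈ 0.36
    between individuals, within couples:  71.92 / 263.37 ≈ 0.27
    between couples:                 97.83 / 263.37 ≈ 0.37

In the unconditional means model (i.e., without assess), the analogous proportions are the intraclass correlations that Singer and Willett describe.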
Singer and Willett note that these chi-square tests of the variance components can be badly biased with small sample sizes. A preferred method of testing variance components is to use the deviance, reported below.

Statistics for current covariance components model
--------------------------------------------------
Deviance                       = 8434.249432
Number of estimated parameters = 5

Testing Assumptions

All statistical models have assumptions, and testing assumptions almost always involves examining the residuals. With HLM, we have several types of residuals, depending on how many levels are in our data and how many random-effects are in our model. HLM will create residual files to be opened in another statistical package (e.g., SPSS) - one per level of your data. To request residual files, click on Basic Settings and select the appropriate buttons.

• NOTE: If you want each residual file (i.e., level-1, level-2, and level-3), you do need to click on each button.
• After running the model, you will find three files called resfil1.sav, resfil2.sav, and resfil3.sav (or, better yet, give them names that are more meaningful).
• The level-1 file contains:
  o L1resid: The residual of each observed data point from the fitted value of the HLM analysis
  o Fitval: The fitted value
  o Sigma: The square-root of the level-1 variance
• The level-2 and level-3 files contain:
  o Njk: Number of data points per individual
  o Empirical Bayes residuals of intercepts and other random-effects parameters
    Realize that these are the residuals and not the subject-specific coefficients.
  o Subject-specific coefficients (each name begins with "ec"), which combine the fixed-effects with the random-effects
  o Posterior variances of the random-effects
  o NOTE: If you only run a two-level model, there are a few additional variables that can be used to assess the model, but they are not critical.

Using the output found in the residual files, we can examine and assess the assumptions of the HLM model:

• Assessing normality of all error terms (i.e., level-1 residuals and empirical Bayes residuals at levels 2 and 3)
  o Create histograms or (better choice) QQ plots
• Equal variances along the regression plane
  o Scatter plot of level-1 residuals on fitted values
• Correct specification of the time variable
  o Scatter plot (or boxplots) of level-1 residuals on the time variable
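Here is a hedged sketch of what the level-1 checks might look like in SPSS syntax (my own illustration, not in the original handout). The file and variable names follow the residual-file description above; whether the assess variable is carried into resfil1.sav may depend on your HLM settings - if it is not, merge it back in by the ID variables:

*** Sketch: examine level-1 residuals from the HLM residual file .
GET FILE = 'resfil1.sav' .

*** Normality of level-1 residuals (normal Q-Q plot) .
EXAMINE VARIABLES = l1resid
 /PLOT = NPPLOT
 /STATISTICS = NONE .

*** Equal variances: residuals against fitted values .
GRAPH /SCATTERPLOT(BIVAR) = fitval WITH l1resid .

*** Correct specification of time: residuals against assess .
GRAPH /SCATTERPLOT(BIVAR) = assess WITH l1resid .

The level-2 and level-3 empirical Bayes residuals from resfil2.sav and resfil3.sav can be checked for normality in the same way.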
Additional Options/Tools within HLM

Deviance Tests for Variance Components

Although HLM provides tests for variance components near the end of the output, these can be biased. A better way to test for the necessity of variance components is to use a deviance test. The deviance of the model is an overall summary of model fit that we can use to compare models, but it does not have an absolute interpretation. That is, telling someone that the deviance for your model is 1,209.98 means nothing. To use deviance tests to examine additional random-effects, we will need the value of the deviance statistic from the smaller model (i.e., the model without the additional random-effects). For example, from our previous results:

Statistics for current covariance components model
--------------------------------------------------
Deviance                       = 8434.249432
Number of estimated parameters = 5

Next, we add a random slope for "assess" at level-3. Before running the model, we go to /Other Settings/Hypothesis Testing. Here we find a dialogue box for entering a deviance statistic and number of parameters. Enter the values for the smaller model (Deviance = 8434.249432 with 5 parameters). Now run the model.

At the very end of the output, we will find our deviance test, comparing the smaller model (without the random slope for assess) to the larger model:

Statistics for current covariance components model
--------------------------------------------------
Deviance                       = 8293.124830
Number of estimated parameters = 7

Model comparison test
----------------------------------
Chi-square statistic         = 141.11517
Number of degrees of freedom = 2
P-value                      = 0.000

Note that there is a new deviance value and a new number of estimated parameters (why 7?). We also have a model comparison, where the chi-square statistic is simply the difference in deviances between the two models, and it is tested against a chi-square distribution with degrees of freedom equal to the difference in the number of parameters. The p-value is highly significant, indicating that the fit is significantly improved by adding the random slope of assess.
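To answer the "why 7?" question (my own accounting, based on the model as specified rather than anything HLM prints): under full maximum likelihood, the smaller model has 5 estimated parameters - 2 fixed effects (G000 and G100) plus 3 variance components (sigma-squared, the level-2 intercept variance, and the level-3 intercept variance). Adding a random slope for assess at level 3 adds 2 parameters - the slope variance and its covariance with the level-3 intercept - for a total of 7. The test therefore has

    chi-square ≈ 8434.25 - 8293.12 ≈ 141,  with df = 7 - 5 = 2,

which agrees with the model comparison test above.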