Overview for Analyzing Multilevel Data Using the HLM Software

November 2010
Dave Atkins, Ph.D.
The following provides an outline of some of the important steps in analyzing multilevel data. I
am only highlighting steps here that are particular to HLM. Thus, prior to running any model,
thorough descriptive statistics should be run and variable types should be considered (i.e., binary,
ordinal, nominal, count, continuous). To a certain extent, good data analysis is good data
analysis. Hence, this is not meant to be an exhaustive list of all the steps in analyzing multilevel
data but does provide some specific direction about critical steps and how to use SPSS and HLM
to do them.
Transforming Data From Wide to Long
One of the first notable differences with multilevel modeling is how the data are structured or
arranged prior to analysis. Oftentimes, our data are entered with a single row of data per
individual. If we have repeated measures on spouses, we might have a separate column for each
dependent variable for each spouse at each time point. Using the “Couple Tx 2005 wide” data:
In this example, each couple receives one line of data with “id” indicating an ID variable for the
couple. The Dyadic Adjustment Scale (DAS) was measured at 4 time points, and each spouse
has 4 columns of data: hdas1 – hdas4 and wdas1 – wdas4.
Transforming wide data to long data
• Prior to fitting multilevel models, we need to change the structure of our data from one
row per couple to one row per individual time-point.
• With the example data, that means each individual will have 4 rows in the transformed
data, and each couple will have 8 rows of data.
• We will use the SPSS command VARSTOCASES.
• With longitudinal data on couples (and other “three level” data), we will run the
command twice: 1) stack repeated-measures into gender-specific columns, and 2) stack
gender-specific columns into single columns.
• Example using “Couple Tx 2005 wide”:
*** First stack time points into gender-specific columns.
VARSTOCASES
/MAKE mdas FROM hdas1 TO hdas4
/MAKE wdas FROM wdas1 TO wdas4
/INDEX = assess
/KEEP = ALL .
• Comments:
o Each “MAKE” command will create a new variable of “stacked” data based on
the variables listed after FROM.
o If variables to be stacked are adjacent to one another, the keyword “TO” can be
used to specify the range; otherwise, each individual variable name should be
included.
o The INDEX variable numbers the stacked variables – 1 to 4 in the present
example – so each row records which assessment it came from.
o KEEP specifies variables that are not repeated but should be kept in the dataset;
each value of these variables will be identical across rows of repeated measures.
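As a side note on the TO keyword: if the repeated measures were not adjacent in the file, each column would be listed explicitly. A minimal sketch, equivalent to the command above (same variables, just listed by name):
*** Equivalent syntax listing each column instead of using TO.
VARSTOCASES
/MAKE mdas FROM hdas1 hdas2 hdas3 hdas4
/MAKE wdas FROM wdas1 wdas2 wdas3 wdas4
/INDEX = assess
/KEEP = ALL .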
Next, we will use the dataset just created to stack the gender-specific columns:
*** Using data from above, next stack gender-specific columns.
VARSTOCASES
/MAKE das FROM mdas wdas
/INDEX = gender
/KEEP = ALL .
• Comments:
o In the present example, only men's and women's DAS scores get stacked, but if we
had gender-specific variables such as age or work status or neuroticism, these
would have been stacked here as well.
o Our INDEX variable is gender.
o We will KEEP all other variables.
Finally, we will make a few changes to our two INDEX variables:
*** Change assess and gender variables to 0-3 and 0-1 coding .
COMPUTE assess = assess - 1 .
COMPUTE gender = gender - 1 .
EXECUTE .
*** Assign value labels .
VALUE LABELS gender 0 'Male' 1 'Female' .
*** Sort data by grouping variables .
SORT CASES BY
id (A) gender (A) assess (A) .
• HLM software requires a unique ID variable at each level of the data. id is a unique
couple-level ID, but we need a unique individual-level ID. We just need a variable that
has unique values for each individual, which we can do via the following:
*** Create unique individual ID variable .
COMPUTE ind.id = id*10 + gender .
EXECUTE .
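As a quick sanity check (a minimal sketch, assuming the long file created above is the active dataset), every value of ind.id should occur exactly 4 times – once per assessment:
*** Count rows per individual; n_rows should equal 4 for everyone.
AGGREGATE
/OUTFILE = * MODE = ADDVARIABLES
/BREAK = ind.id
/n_rows = N .
FREQUENCIES VARIABLES = n_rows .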
After re-arranging the columns a bit, the resulting data look like:
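(A schematic of the long structure for the first couple – the id value and layout here are illustrative, and the DAS scores are omitted:)
id  ind.id  gender  assess  das
 1      10       0       0  ...
 1      10       0       1  ...
 1      10       0       2  ...
 1      10       0       3  ...
 1      11       1       0  ...
 1      11       1       1  ...
 1      11       1       2  ...
 1      11       1       3  ...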
With the data re-structured this way, we can see the nested structure: there are four assessment
points on each individual and each individual belongs to one couple. The structure of the data is
now correct for the HLM software.
Final Considerations Before Reading Data into HLM software
The HLM software has no facilities for data manipulation. Within the HLM software, predictors
may be centered and cross-level interactions may be specified, but everything else must be done
prior to HLM. Thus, consider whether:
1. Are all categorical predictors appropriately coded (i.e., dummy or effects coding)?
2. Are any variable transformations needed (e.g., log-transformations, polynomial
terms)?
3. Are any interactions between variables at the same level needed?
All of these items need to be addressed prior to reading data into HLM.
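For example, a minimal SPSS sketch of these three kinds of preparation (income and stress are hypothetical variable names, not part of the example dataset; assess is the time variable from above):
*** Hypothetical pre-HLM preparation; income and stress are illustrative names.
*** Log-transformation of a skewed predictor.
COMPUTE log_income = LN(income + 1) .
*** Polynomial (quadratic) term for the time variable.
COMPUTE assess2 = assess**2 .
*** Interaction between two variables at the same level (here, level-1).
COMPUTE stress_x_assess = stress * assess .
EXECUTE .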
Reading Data into HLM Software
I will provide an overview of reading data into HLM and fitting basic models, but another
resource is a tutorial developed at the University of Texas:
http://ssc.utexas.edu/software/software-tutorials#HLM
Note that it covers HLM v5, but as far as I can tell, it would be largely applicable to version 6 as
well.
Historically, one of the most frustrating parts of using the HLM software has been getting the
data into HLM. Version 6 seems to be a bit better, but at times, it still can be a bit mysterious.
• The HLM manual typically describes organizing data into multiple files – one for each
level of the data; however, you do not need to do this (and I know of no other software
with this requirement). You can use the same file for each level, such as the one that we
created with the SPSS syntax above.
• To begin the process, go to /File/Make new MDM file/Stat package input
o You will have to select what program you will use:
 HLM2: Standard two-level model
 HLM3: Standard three-level model
 HMLM: Two-level model for intensive longitudinal data (typically)
 HMLM2: Three-level model for intensive longitudinal data
 HCM2: Cross-classified models
o Select HLM3 for a three-level model (or, other appropriate program depending on
your data), and you will see:
• You will need to provide:
o NOTE: HLM should read SPSS v17.0 and lower. If you have SPSS v18.0 (i.e.,
PASW v18.0), try reading the data with input file type as “SPSS”, but if that
does not work, try setting it to “Anything else”
o A name for the MDM file (that will hold the data in a special format for the
HLM program)
o A name for MDM template file that holds the instructions that you enter
through this interface
o Files for level-1, level-2, and level-3 data
 NOTE: The one above is for two-level data, but is virtually identical
for three-level data
 NOTE: These can be the same file
 When you select each file, you will also need to select the variables to
be included:
o Are there any missing data? If so, when should they be deleted? (I usually
select “making mdm”)
o After all selections are made, click: Make MDM
o If the data were successfully read in, click on “Check Stats”
 Check that the correct number of data and/or groups are reported at
each level
 Check the range of variables
o If data were not read in successfully:
 Check the HLM manual about ID variables
 Is the data sorted? (if it is sorted and you still have problems, try
sorting by the level 2 ID)
Plotting Multilevel Data
Prior to fitting any models, it is always a good idea to do thorough descriptive analyses of your
data, including numerical summaries and definitely graphs. This work could be done in SPSS,
but as of version 6, HLM has basic plotting functions, which are reasonably good.
HLM offers two broad categories of plotting functions: 1) plots of the raw data, and 2) plots
based on a fitted model. Before we fit an HLM to longitudinal data, we would like to know:
• What does the change across time look like? Linear? Nonlinear?
• How similar are partners’ change patterns?
• What do growth curves look like when grouped by level-2 or level-3 predictors?
We can answer these questions via: /File/Graph Data/line plots, scatter plots
• You will need to input the following:
o X-axis: With longitudinal data, this is our time variable.
o Y-axis: Our dependent variable
o Number of groups: Depending on how many groups we have, we might want to
only plot a subset of the data. Selecting a random subsample (by specifying a
probability) is a good idea, or if you are patient, plot all the data.
o Z-focus: Do we want plots by a level-2 or level-3 predictor?
o Grouping: Data grouped by individuals (level-2) or couples (level-3). Typically,
we will group by couples.
o Type of plot: For plotting each couple’s data, choose “Line/marker plot”. For
plotting multiple couples’ data by a level-2 or level-3 predictor, choose either
“Line plot” or “Scatter plot”.
o Pagination: One plot per group or multiple groups in same plot?
Fitting Models Using HLM
The first step in setting up a model is to
identify the dependent variable (DV)
among the Level-1 variables. HLM
automatically sets up basic equations for
Levels 2 and 3. By clicking on the
“Mixed” button in the lower right
corner, HLM will present the mixed
model equation, in which the Level 2
and 3 equations are substituted into the
Level 1 equation. This mixed model
shows that currently our model only
includes a single, fixed-effect intercept
(i.e., γ000) and three random-effects:
random intercepts at Levels 2 and 3, and
the Level 1 residual error.
Singer and Willett in their book, Applied Longitudinal Data Analysis, call this model the
“unconditional means model” (see pp. 92-97). This model can be useful if we want to know
how much variability in the DV exists at each level of our data.
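As a sketch in the notation HLM uses below (this is not part of the HLM output), the unconditional means model is:
Level-1 Model:  Y = P0 + E          (Var(E) = Sigma_squared)
Level-2 Model:  P0 = B00 + R0       (Var(R0) = Tau(pi))
Level-3 Model:  B00 = G000 + U00    (Var(U00) = Tau(beta))
The proportion of total variance at each level (the intraclass correlations) is then:
\[
\rho_{\text{individual}} = \frac{\hat{\tau}_{\pi}}{\hat{\sigma}^{2} + \hat{\tau}_{\pi} + \hat{\tau}_{\beta}},
\qquad
\rho_{\text{couple}} = \frac{\hat{\tau}_{\beta}}{\hat{\sigma}^{2} + \hat{\tau}_{\pi} + \hat{\tau}_{\beta}}
\]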
Clicking on any covariate brings up a
dialogue box inquiring whether we want to
add the variable: 1) uncentered, 2) grand-mean
centered, or 3) group-mean centered.
If the value zero on the scale of your
covariate is a meaningful value, the
covariate should typically be entered
uncentered. For “assess,” zero is the start of
therapy, which is meaningful, and we add
it uncentered.
At this point, additional equations are
included in both Level 2 and 3 models,
though notice that the mixed model shows
that only a single fixed-effect has been
added. Note that additional random-effects
are “greyed out” in our equations – they
have not yet been included, but could be
by clicking on them.
Selecting the “Basic Settings” menu brings
up a dialogue box, where we can provide a name for the output file that HLM will produce, as
well as boxes to request residual files for each level of the data, which I have selected.
Next, click “Run Analysis” at top and then “Run model as shown” when asked. After the
program stops iterating, open the output by going to “File/View output”. Below is the output,
with selected comments.
Program:    HLM 6 Hierarchical Linear and Nonlinear Modeling
Authors:    Stephen Raudenbush, Tony Bryk, & Richard Congdon
Publisher:  Scientific Software International, Inc. (c) 2000
            [email protected]
            www.ssicentral.com
-------------------------------------------------------------------------------
Module:     HLM3.EXE (6.08.29257.1)
Date:       3 March 2010, Wednesday
Time:       16:16:11
-------------------------------------------------------------------------------

SPECIFICATIONS FOR THIS HLM3 RUN
Problem Title: no title
The data source for this run   = Z:\Documents\Lectures and Workshops\Bodenmann 2010 workshop\atkinsjfp.mdm
The command file for this run  = C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp\whlmtemp.hlm
Output file name               = Z:\Documents\Lectures and Workshops\Bodenmann 2010 workshop\hlm3.txt

The maximum number of level-1 units = 1072
The maximum number of level-2 units = 268
The maximum number of level-3 units = 134
NOTE: It is always a good idea to make sure that the number of observations and groups
match up with what they should be. Here they do: 134 couples × 2 spouses = 268
individuals, and 268 individuals × 4 assessments = 1072 level-1 units.
The maximum number of iterations = 100
Method of estimation: full maximum likelihood
The outcome variable is DAS
The model specified for the fixed effects was:
----------------------------------------------------
   Level-1              Level-2             Level-3
   Coefficients         Predictors          Predictors
   ------------------   -----------------   -----------------
      INTRCPT1, P0        INTRCPT2, B00       INTRCPT3, G000
 #   ASSESS slope, P1   # INTRCPT2, B10       INTRCPT3, G100

'#' - The residual parameter variance for the parameter has been set to zero
Summary of the model specified (in equation format)
---------------------------------------------------
Level-1 Model
Y = P0 + P1*(ASSESS) + E
Level-2 Model
P0 = B00 + R0
P1 = B10
Level-3 Model
B00 = G000 + U00
B10 = G100
NOTE: These equations come directly from what was specified in the HLM GUI (i.e.,
graphical user interface) and will directly relate to the output shown below. Sadly,
some of the terms for random-effects below are different than what is shown above.
For starting values, data from 1072 level-1 and 268 level-2 records were used
Iterations stopped due to small change in likelihood function
******* ITERATION 7 *******
Sigma_squared =      93.62851

NOTE: This is the estimate of the level-1 error variance (“E” in the equation above).

Standard Error of Sigma_squared =       4.66977

Tau(pi)
 INTRCPT1,P0     71.91814

NOTE: This is the estimate of the level-2 random intercept variance (“R0” above, or
the variance of the P0 coefficient).
Tau(pi) (as correlations)
INTRCPT1,P0 1.000
Standard Errors of Tau(pi)
 INTRCPT1,P0     11.70420

----------------------------------------------------
Random level-1 coefficient   Reliability estimate
----------------------------------------------------
 INTRCPT1, P0                       0.754
----------------------------------------------------

NOTE: Reliability relates to how much variability there is between groups compared to
total variability (i.e., between group variance + error variance). If the error
variance is large, there will be poor reliability, which affects our power.
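As a sketch of where this value comes from – the standard formula for the reliability of a randomly varying level-1 intercept, applied to the estimates above with n = 4 observations per individual:
\[
\lambda = \frac{\hat{\tau}_{\pi}}{\hat{\tau}_{\pi} + \hat{\sigma}^{2}/n}
        = \frac{71.918}{71.918 + 93.629/4} \approx 0.754
\]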
Tau(beta)
 INTRCPT1
 INTRCPT2,B00     97.82677

NOTE: This is the estimate of the level-3 random intercept variance (“U00” in the
equation above).

Tau(beta) (as correlations)
 INTRCPT1/INTRCPT2,B00 1.000

Standard Errors of Tau(beta)
 INTRCPT1
 INTRCPT2,B00     18.70386

----------------------------------------------------
Random level-2 coefficient     Reliability estimate
----------------------------------------------------
 INTRCPT1/INTRCPT2, B00               0.672
----------------------------------------------------

The value of the likelihood function at iteration 7 = -4.217125E+003
The outcome variable is DAS
Final estimation of fixed effects:
----------------------------------------------------------------------------
                                     Standard             Approx.
   Fixed Effect        Coefficient   Error      T-ratio    d.f.     P-value
----------------------------------------------------------------------------
For       INTRCPT1, P0
   For INTRCPT2, B00
      INTRCPT3, G000     83.958879   1.114878    75.308     133      0.000
For  ASSESS slope, P1
   For INTRCPT2, B10
      INTRCPT3, G100      3.359910   0.264333    12.711    1070      0.000
----------------------------------------------------------------------------

NOTE: These are estimates of the fixed-effects – the overall intercept and slope for
our sample. HLM reports these directly mapping onto the hierarchical equations – so
P0 is connected to B00, which is in turn connected to G000. Although this is a bit
“wordy,” it does follow the equations closely.
The outcome variable is DAS

Final estimation of fixed effects
(with robust standard errors)
----------------------------------------------------------------------------
                                     Standard             Approx.
   Fixed Effect        Coefficient   Error      T-ratio    d.f.     P-value
----------------------------------------------------------------------------
For       INTRCPT1, P0
   For INTRCPT2, B00
      INTRCPT3, G000     83.958879   0.885803    94.783     133      0.000
For  ASSESS slope, P1
   For INTRCPT2, B10
      INTRCPT3, G100      3.359910   0.405681     8.282    1070      0.000
----------------------------------------------------------------------------

NOTE: The coefficients are identical between these two tables, but the standard errors
here are “robust” in the sense that they remain valid even when the data have unequal
variances (in the statistics literature, sometimes called “heteroscedasticity-consistent”).
Raudenbush and Bryk note that when the standard errors are notably different between the
two tables, this could indicate model mis-specification. What might not be right about
our current model?
Final estimation of level-1 and level-2 variance components:
------------------------------------------------------------------------------
Random Effect       Standard    Variance
                    Deviation   Component     df    Chi-square    P-value
------------------------------------------------------------------------------
 INTRCPT1,    R0     8.48046    71.91814     134    545.71355      0.000
 level-1,     E      9.67618    93.62851
------------------------------------------------------------------------------

Final estimation of level-3 variance components:
------------------------------------------------------------------------------
Random Effect             Standard    Variance
                          Deviation   Component    df    Chi-square    P-value
------------------------------------------------------------------------------
 INTRCPT1/INTRCPT2, U00    9.89074    97.82677    133    409.03277      0.000
------------------------------------------------------------------------------

NOTE: These are simply the variance components (again), but including a significance
test. Singer and Willett note that these tests can be badly biased with small sample
sizes. A preferred method of testing variance components is to use the deviance,
reported below.
Statistics for current covariance components model
--------------------------------------------------
Deviance                       = 8434.249432
Number of estimated parameters = 5
Testing Assumptions
All statistical models have assumptions, and testing assumptions almost always involves
examining the residuals. With HLM, we have several types of residuals depending on how many
levels are in our data and how many random-effects are in our model. HLM will create residual
files to be opened in another statistical package (e.g., SPSS) – one per level of your data. To
request residual files, click on Basic Settings and select the appropriate buttons:
• NOTE: If you want each residual file (i.e., level-1, level-2, and level-3), you do need
to click on each button.
• After running the model, you will find three files called: resfil1.sav, resfil2.sav, and
resfil3.sav (or, better yet, give them names that are more meaningful).
• The level-1 file contains:
o L1resid: The residual of each observed data point from the fitted value of the
HLM analysis
o Fitval: The fitted value
o Sigma: The square-root of the level-1 variance
• The level-2 and level-3 files contain:
o Njk: Number of data points per individual
o Empirical Bayes residuals of intercepts and other random-effects parameters
 Realize that these are the residuals and not the subject-specific
coefficients.
o Subject-specific coefficients (each name begins with “ec”), which combine
the fixed-effects with the random-effects
o Posterior variances of the random-effects
o NOTE: If you only run a two-level model, there are a few additional variables
that can be used to assess the model, but they are not critical.
Using the output found in the residual files, we can examine and assess the assumptions of the
HLM model.
• Assessing normality of all error terms (i.e., level-1 residuals and empirical Bayes
residuals at levels 2 and 3)
o Create histograms or (better choice) QQ plots (see the SPSS sketch below)
• Equal variances along the regression plane
o Scatter plot of level-1 residuals on fitted values
• Correct specification of the time variable
o Scatter plot (or boxplots) of level-1 residuals on the time variable
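A minimal SPSS sketch of the first two checks, assuming HLM’s default level-1 residual file name and the variable names described earlier (l1resid, fitval); adjust the file path to wherever HLM wrote the file:
*** Open the level-1 residual file written by HLM.
GET FILE = 'resfil1.sav' .
*** QQ plot to assess normality of level-1 residuals.
PPLOT
/VARIABLES = l1resid
/TYPE = Q-Q
/DIST = NORMAL .
*** Level-1 residuals vs. fitted values to assess equal variances.
GRAPH
/SCATTERPLOT(BIVAR) = fitval WITH l1resid .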
Additional Options/Tools within HLM
Deviance Tests for Variance Components
Although HLM provides tests for variance components near the end of the output, these can be
biased. A better way to test for the necessity of variance components is to use a deviance test.
The deviance of the model is an overall summary of the model fit that we can use to compare
models, but it does not have an absolute interpretation. That is, telling someone that the
deviance for your model is 1,209.98 means nothing.
To use deviance tests to examine additional random-effects, we will need the deviance
statistic and number of parameters from the smaller model (i.e., the model without the
additional random-effects). For example, from our previous results:
Statistics for current covariance components model
--------------------------------------------------
Deviance                       = 8434.249432
Number of estimated parameters = 5
Next, we add a random slope for “assess”
at level-3. Before running the model, we
go to /Other Settings/Hypothesis Testing.
Here we find a dialogue box for entering a
deviance statistic and number of
parameters. Enter the values for the
smaller model, as shown at right.
Now run the model.
At the very end of the output, we will find
our deviance test, comparing the smaller model (without random slope for assess) to the larger
model:
Statistics for current covariance components model
--------------------------------------------------
Deviance                       = 8293.124830
Number of estimated parameters = 7

Model comparison test
----------------------------------
Chi-square statistic         =  141.11517
Number of degrees of freedom =          2
P-value                      =      0.000
Note that there is a new deviance value and new number of estimated parameters (why 7?). We
also have a model comparison, where the chi-square statistic is simply the difference in
deviances between the two models, and it is tested against a chi-square distribution with degrees
of freedom equal to the difference in parameters. The p-value is highly significant, indicating
that the fit is significantly improved by adding the random slope of assess.
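As a sketch of the arithmetic – and the answer to “why 7?”: adding the random slope at level 3 adds two parameters to the smaller model’s five, a slope variance and an intercept-slope covariance:
\[
\chi^{2} = D_{\text{smaller}} - D_{\text{larger}} = 8434.25 - 8293.12 \approx 141.1,
\qquad
df = p_{\text{larger}} - p_{\text{smaller}} = 7 - 5 = 2
\]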