New Statistical Models and Techniques for Analyzing EMA Data

Runze Li
The Methodology Center
The Pennsylvania State University
Runze Li (Penn State)
New Statistical Models for EMA Data
1 / 20
Ecological Momentary Assessment (EMA) Data
The EMA method was pioneered in the study of tobacco use (Stone &
Shiffman, 1994)
Typical data collection procedures:
∘ Provide every participant with an electronic data-collection device:
palm computer, smart phone ...
∘ During the study, randomly prompt participants a few times every day
and present them with a number of questions at each prompt.
∘ Participants record their answers on the device as soon as possible.
⋄ Ecological: data are not collected in a lab.
⋄ Momentary: data are collected in the moment, without relying on recall.
Examples
Example 1: Using hand-held computers, EMA data were collected about 5
to 8 times per day over a period of about 50 days in a smoking cessation
study conducted by Saul Shiffman. Each assessment had 50 questions. The
study included 302 subjects.
Example 2: Using smart phones, EMA data were collected once in the morning
and once in the afternoon during a four-week period (two weeks pre-quit and
two weeks post-quit) in a smoking cessation study conducted by the Center for
Tobacco Research & Intervention, Univ. of Wisconsin (led by Michael
Fiore, Tim Baker, Megan Piper). A number of questions were asked each
time. The study included more than 1500 subjects.
Intensive Longitudinal Data (ILD)
Intensive longitudinal data: There are many subjects, and observations
are collected from each subject at many observation times. The
observation times may be subject-dependent or well-scheduled.
Traditional longitudinal data: There are many subjects, and observations
are collected from each subject at only a few waves. The observation
times are typically well-scheduled.
Examples of ILD:
(a) EMA data;
(b) data automatically collected by devices such as blood glucose
meters, heart rate monitors, and blood pressure monitors;
(c) daily data collected by a web-based survey lasting two semesters;
(d) data collected in mature longitudinal studies, such as HIV studies
in which data were collected semiannually for more than 15 years.
Intensive longitudinal data
[Figure: "Intensive Longitudinal Data". Horizontal axis: Times (0 to 10); vertical axis: Subject (0 to 6).]
Traditional longitudinal data
[Figure: "Traditional Longitudinal Data". Horizontal axis: Times (0 to 10); vertical axis: Subject (0 to 6).]
Important characteristics and features of ILD
(F1) Complex data structure: data collected at irregular
(subject-dependent) time points
⇒ occasions are nested within individuals
⇒ data have a multilevel structure
(F2) Covariates: time-varying (such as negative affect) and time-invariant
(such as gender)
Models for traditional longitudinal data, such as hierarchical linear
models, multilevel models, and mixed-effects models, can be used to fit
data with (F1) and (F2) under the following model assumptions:
(a) each effect is a constant, linear, or quadratic function of time;
(b) the error variance is constant.
If EMA data are collected merely to reduce recall bias, traditional
models are fine.
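For concreteness, a traditional model satisfying assumptions (a) and (b) could be written as below. This is only a sketch in illustrative notation that does not appear on the slides: y_{ij} is the response of subject i at occasion j, t_{ij} the observation time, x_{ij} a time-varying covariate, z_i a time-invariant covariate, and b_{0i} a random intercept.

```latex
y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 t_{ij}^2
       + \beta_3 x_{ij} + \beta_4 z_i + b_{0i} + \varepsilon_{ij},
\qquad b_{0i} \sim N(0, \tau^2), \quad \varepsilon_{ij} \sim N(0, \sigma^2).
```

Here the time trend is at most quadratic, the covariate effects \beta_3 and \beta_4 are constant, and \sigma^2 does not depend on time.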
Time-Varying Effect Models (TVEM)
If EMA data are also collected to explore fine features, traditional
models are not flexible enough.
From our empirical analysis, models for EMA should have the following
two features:
(F3) The structure of the error process changes over time. The error
variance may dramatically change over time, making the classic linear
modeling assumption of a constant variance term untenable.
(F4) Effects vary across time.
For example, the relationship between mood and daily smoking may
change over time during the post-quit period.
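One informal way to look for (F3) is to compare the residual variance across time windows. The sketch below does this on simulated residuals; the window edges, the simulated standard deviations, and the helper name `variance_by_window` are illustrative assumptions, not part of the original analysis.

```python
import random
from statistics import pvariance

def variance_by_window(times, residuals, edges):
    """Return the residual variance within each window [edges[k], edges[k+1])."""
    out = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        window = [r for t, r in zip(times, residuals) if lo <= t < hi]
        out.append(pvariance(window) if len(window) > 1 else float("nan"))
    return out

# Simulated residuals (an assumption, not study data): the error SD is 1
# before day 0 (the quit day) and 2 afterwards.
rng = random.Random(0)
times = [rng.uniform(-14, 14) for _ in range(2000)]
residuals = [rng.gauss(0, 1.0 if t < 0 else 2.0) for t in times]

edges = [-14, -7, 0, 7, 14]
variances = variance_by_window(times, residuals, edges)
for lo, hi, v in zip(edges[:-1], edges[1:], variances):
    print(f"days [{lo:>3}, {hi:>3}): variance = {v:.2f}")
```

If the variances differ markedly between windows, the constant-variance assumption of a traditional model is doubtful.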
A Demo
∙ Data: collected by Saul Shiffman.
∘ We limited ourselves to the subset of data collected by random prompts
∘ Y: score of urge to smoke, treated as a continuous variable
∘ X: score of negative affect (NA)
∘ Data are aligned by quit day.
∙ Time-varying Effect Model:
$\text{Urge}_t = \beta_0(t) + \beta_1(t)\,\text{NA}_t + \text{error}_t$
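To make the model above concrete, here is a minimal sketch that estimates the coefficient functions beta0(t) and beta1(t) at a few time points by kernel-weighted least squares. The kernel smoother is swapped in for illustration and is not claimed to be the estimator used in the demo or in the SAS macro. The data are simulated, and the function name `tvem_kernel_fit`, the bandwidth, and the "true" coefficient functions are all assumptions.

```python
import math
import random

def tvem_kernel_fit(times, x, y, t0, bandwidth):
    """Estimate (beta0(t0), beta1(t0)) by kernel-weighted least squares.

    Observations close to t0 get large weights, so the fitted intercept and
    slope describe the relationship between x and y around time t0.
    """
    s_w = s_x = s_xx = s_y = s_xy = 0.0
    for t, xi, yi in zip(times, x, y):
        u = (t - t0) / bandwidth
        w = math.exp(-0.5 * u * u)  # Gaussian kernel weight
        s_w += w
        s_x += w * xi
        s_xx += w * xi * xi
        s_y += w * yi
        s_xy += w * xi * yi
    # Solve the 2x2 weighted normal equations in closed form.
    det = s_w * s_xx - s_x * s_x
    beta1 = (s_w * s_xy - s_x * s_y) / det
    beta0 = (s_y - beta1 * s_x) / s_w
    return beta0, beta1

def true_beta0(t):
    # Hypothetical intercept: flat before quit day, decreasing afterwards.
    return 5.0 if t < 0 else 5.0 - 0.2 * t

def true_beta1(t):
    # Hypothetical NA effect: larger after quit day.
    return 0.5 if t < 0 else 1.0

# Simulated data (not study data); time is measured in days from quit day.
rng = random.Random(1)
n = 3000
times = [rng.uniform(-14, 14) for _ in range(n)]
na = [rng.gauss(0, 1) for _ in range(n)]
urge = [true_beta0(t) + true_beta1(t) * a + rng.gauss(0, 1)
        for t, a in zip(times, na)]

estimates = {t0: tvem_kernel_fit(times, na, urge, t0, bandwidth=2.0)
             for t0 in (-7, 0, 7)}
for t0, (b0, b1) in estimates.items():
    print(f"day {t0:>3}: beta0 = {b0:.2f}, beta1 = {b1:.2f}")
```

Repeating the fit over a fine grid of time points traces out the estimated coefficient curves. A kernel estimate smooths over any abrupt change, so a jump at quit day appears as a steep but continuous rise.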
Estimated Time-Varying Intercept
Time-Varying Intercept
∘ The intercept function gives the overall trend when the mood of
smokers is at its average level.
∘ The overall trend remains almost constant during the pre-quit period.
∘ The overall trend shows that urge to smoke decreases after quit day.
∙ Take home message:
∘ It is difficult to specify a model for the intercept:
it is not constant, not linear, and not quadratic.
∘ Ordinary linear regression is not appropriate here.
Estimated Time-Varying Effect of Negative Affect
Effect of negative affect
∘ constant during the pre-quit period
∘ significant jump at the quit day
∙ Consistent with Baker’s theory: negative affect has a stronger effect on
urge to smoke in the post-quit period than in the pre-quit period.
Good news for you
How to implement the TVEM for ILD?
Time-varying coefficient models can be estimated in SAS.
The SAS %TVEM macro is available.
One simply plugs in the data, and the SAS macro automatically
produces the plots.
See Stephanie’s presentation and Xiaoyu Liu’s poster.
The SAS macro works for a continuous response as well as for binary and
count responses, including zero-inflated count responses.
See Sara Vasilenko’s poster and Michael Yang’s poster.
In many situations, we found that some effects are indeed constant, while
some effects change over time.
A flexible model for ILD should allow (F1)-(F3) and
(F5) Some effects vary across time and some effects are
time-invariant.
Good news:
(a) The SAS %TVEM macro can be used directly to fit models with both
time-varying effects and time-invariant effects.
(b) A final model is selected by data-driven model selection criteria (e.g.
AIC and BIC).
⇒ TVEM can be used as a model diagnostic tool to examine whether a
traditional model fits the data well.
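A model with feature (F5) can be sketched as follows. The notation is illustrative and does not appear on the slides: x_{1,ij} is a covariate with a time-varying effect and x_{2,ij} a covariate with a time-invariant effect. The standard definitions of AIC and BIC are given for reference, where \hat{L} is the maximized likelihood and k the number of parameters. Whether n counts observations or subjects, and how k is counted for a smoothed fit, are choices the slides do not specify.

```latex
y_{ij} = \beta_0(t_{ij}) + \beta_1(t_{ij})\, x_{1,ij} + \beta_2\, x_{2,ij} + \varepsilon_{ij}

\mathrm{AIC} = -2 \log \hat{L} + 2k,
\qquad
\mathrm{BIC} = -2 \log \hat{L} + k \log n
```

Under these criteria, a coefficient is kept time-invariant when letting it vary with time does not improve the fit enough to offset the penalty for the extra parameters.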
Functional Hierarchical Linear Models
A flexible model for ILD should allow (F1)-(F5) and
(F6) Effects vary between individuals.
For example, the relationship between mood and daily smoking may vary
across individuals, since some people smoke while having fun with their
friends but others smoke because they are lonely or frustrated.
The functional hierarchical linear model (HLM) allows both the fixed and
the individual effects in an HLM to vary over time. It has all features (F1)-(F6).
Functional HLMs can be estimated in SAS.
The SAS %FHLMLLR macro is available.
One simply plugs in the data, and the SAS macro automatically
produces the plots.
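The slides do not give the formula, but one way to write a functional HLM with a single time-varying covariate is sketched below. The notation is an assumption made for illustration: \beta_0(t) and \beta_1(t) are population-level coefficient functions, and b_{0i}(t) and b_{1i}(t) are subject-specific random functions with mean zero.

```latex
y_{ij} = \beta_0(t_{ij}) + \beta_1(t_{ij})\, x_{ij}
       + b_{0i}(t_{ij}) + b_{1i}(t_{ij})\, x_{ij} + \varepsilon_{ij}
```

Setting b_{0i}(t) and b_{1i}(t) to zero recovers a TVEM, and making every coefficient function constant in t recovers an ordinary HLM with a random intercept and a random slope.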
Mixture of TVEMs
Motivation: For some smokers, mood may have a strong effect on urge to
smoke, while it may have only a weak effect for other smokers. If there
are qualitatively different groups in a population, then this should be taken
into account in the model in order to avoid glossing over important
relationships.
(F7) The population may be a combination of several discrete latent
subpopulations.
An appropriate model with features (F1)-(F5) for ILD with (F7) is a
mixture of TVEMs.
A mixture of TVEMs is similar to mixture trajectory analysis, but is
more flexible in how the effects are specified.
We are developing a SAS macro for estimation of mixtures of TVEMs.
See John Dziak’s poster.
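As a sketch of what (F7) means for the model, suppose each subject i belongs to one of K latent classes, with class label C_i. The notation below is illustrative and not taken from the slides: \pi_c is the class proportion and each class has its own coefficient functions.

```latex
P(C_i = c) = \pi_c, \qquad \sum_{c=1}^{K} \pi_c = 1,

y_{ij} \mid (C_i = c) \;=\; \beta_{0c}(t_{ij}) + \beta_{1c}(t_{ij})\, x_{ij} + \varepsilon_{ij},
\qquad \varepsilon_{ij} \sim N(0, \sigma_c^2)
```

With K = 1 this reduces to a single TVEM. With K > 1, smokers for whom mood has a strong effect and smokers for whom it has a weak effect can fall into different classes.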
Joint modeling
Scientific questions to address: e.g., what is the impact of negative affect
on “time to the first lapse (or relapse)”?
Response: Time to the first lapse. More generally, time to a discrete event.
Often referred to as “survival” data.
Covariates: longitudinal processes, such as negative affect.
(F8) Time-to-event response with longitudinal covariate processes
Challenges: (a) the time to a discrete event can be censored; and (b) the
covariates are longitudinal processes rather than a multidimensional vector.
A joint model with time-varying effects is appropriate for such cases.
Extensions: distal outcomes with longitudinal covariate processes.
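Challenge (a) can be illustrated with the Kaplan-Meier estimator, a standard tool for censored time-to-event data. It is not the joint model itself and ignores the longitudinal covariates. The sketch below uses made-up follow-up times, and the function and variable names are illustrative assumptions.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival probability)] at each distinct event time.

    durations: follow-up time for each subject (e.g. days since quit day)
    observed:  True if the lapse was seen, False if the subject was censored
    """
    event_times = sorted({d for d, e in zip(durations, observed) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Subjects still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Lapses observed exactly at time t.
        events = sum(1 for d, e in zip(durations, observed) if e and d == t)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve

# Made-up data (not from either study): days to first lapse, where False
# marks a subject who left the study before any lapse was observed.
days = [2, 3, 3, 5, 8, 8, 12, 14]
lapsed = [True, True, False, True, True, False, True, False]

curve = kaplan_meier(days, lapsed)
for t, s in curve:
    print(f"day {t:>2}: P(no lapse yet) = {s:.3f}")
```

Censored subjects count as at risk until they leave the study but never as lapses. This is how the estimator uses their partial information instead of discarding it.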
Remarks
Through collaborations with Saul Shiffman and Megan Piper, WE have
developed several classes of models that are useful for analyzing ILD.
WE means
Stephanie Lanza
John Dziak, Mariya Shiyko, Xianming Tan, Sara Vasilenko
Esra Kurum, Junyi Lin, Xiaoyu Liu
Acknowledgments: Our work has been supported by P50-DA10075 and
R21 DA024260.
Thank you very much