New Statistical Models and Techniques for Analyzing EMA Data

Runze Li
The Methodology Center
The Pennsylvania State University

Ecological Momentary Assessment (EMA) Data

The EMA method was pioneered in the study of tobacco use (Stone & Shiffman, 1994).
Typical data collection procedures:
∘ Provide every participant with an electronic data-collection device: palm computer, smart phone, ...
∘ During the study, randomly prompt participants a few times every day and present them with a number of questions at every prompt.
∘ Participants record their answers on the device as soon as possible.
⋄ Ecological: data are not collected in the lab.
⋄ Momentary: data are collected without relying on recall.

Examples

Example 1: Using hand-held computers, EMA data were collected about 5 to 8 times per day over a period of about 50 days in a smoking cessation study conducted by Saul Shiffman. Each assessment had 50 questions, and the study had 302 subjects.

Example 2: Using smart phones, EMA data were collected once in the morning and once in the afternoon during a four-week period (two weeks pre-quit and two weeks post-quit) in a smoking cessation study conducted by the Center for Tobacco Research & Intervention, Univ. of Wisconsin (led by Michael Fiore, Tim Baker, and Megan Piper). A number of questions were asked each time. The study had more than 1500 subjects.

Intensive Longitudinal Data (ILD)

Intensive longitudinal data: There are many subjects in ILD. Observations are collected from each subject at many observation times. The observation times may be subject-dependent or well-scheduled.

Traditional longitudinal data: There are many subjects in the study. Observations are collected from each subject at a few waves. The observation times are typically well-scheduled.
Examples of ILD:
(a) EMA data;
(b) data collected automatically by devices such as blood glucose meters, heart rate monitors, and blood pressure monitors;
(c) daily data collected by a web-based survey lasting two semesters;
(d) data collected in mature longitudinal studies, such as HIV studies in which data were collected semi-annually for more than 15 years.

[Figure: Intensive Longitudinal Data. Axes: Times (horizontal) and Subject (vertical).]

[Figure: Traditional Longitudinal Data. Axes: Times (horizontal) and Subject (vertical).]

Important characteristics and features of ILD

(F1) Complex data structure: data are collected at irregular (subject-dependent) time points ⇒ occasions are nested within individuals ⇒ data have a multilevel structure.
(F2) Covariates: time-varying (such as negative affect) and time-invariant (such as gender).

Models for traditional longitudinal data, such as hierarchical linear models, multilevel models, and mixed effects models, can be used to fit data with (F1) and (F2) under the following model assumptions:
(a) effects are constant, linear, or quadratic functions of time;
(b) the error variance is constant.

If EMA data are collected merely to reduce recall bias, traditional models are fine.

Time-Varying Coefficient Models (TVEM)

If EMA data are also collected to explore fine features, traditional models are not flexible enough. From our empirical analysis, models for EMA data should have the following two features:
(F3) The structure of the error process changes over time. The error variance may change dramatically over time, making the classic linear modeling assumption of a constant error variance untenable.
(F4) Effects vary across time.
For example, the relationship between mood and daily smoking may change over time during the post-quit period.

A Demo

∙ Data: collected by Saul Shiffman.
∘ We limited ourselves to the subset of data collected by random prompts.
∘ Y: score of urge to smoke, treated as a continuous variable.
∘ X: score of negative affect (NA).
∘ Data are aligned by quit day.
∙ Time-varying effect model:
$\mathrm{Urge}_t = \beta_0(t) + \beta_1(t)\,\mathrm{NA}_t + \mathrm{error}_t$

[Figure: Estimated Time-Varying Intercept]

Time-Varying Intercept

∘ The intercept function gives the overall trend in urge to smoke when the smokers' mood is at its average level.
∘ The overall trend remains almost constant during the pre-quit period.
∘ The overall trend shows that urge to smoke decreases after quit day.
∙ Take-home message:
∘ It is difficult to specify a parametric form for the intercept: it is not constant, not linear, and not quadratic.
∘ Ordinary linear regression is not appropriate here.

[Figure: Estimated Time-Varying Effect of Negative Affect]

Effect of negative affect

∘ Constant during the pre-quit period.
∘ Significant jump at the quit period.
∙ Consistent with Baker’s theory: negative affect has a stronger effect on urge to smoke in the post-quit period than in the pre-quit period.

Good news for you

How to implement the TVEM for ILD?
Estimation of the time-varying coefficient model can be done in SAS. The SAS %TVEM macro is available.
One may simply plug in the data, and the SAS macro will automatically produce the plots. See Stephanie’s presentation and Xiaoyu Liu’s poster.
The SAS macro works for a continuous response as well as for binary and count responses, including zero-inflated count responses. See Sara Vasilenko’s poster and Michael Yang’s poster.

In many situations, we found that some effects are indeed constant, while other effects change over time. A flexible model for ILD should allow (F1)-(F3) and
(F5) Some effects vary across time and some effects are time-invariant.

Good news:
(a) The SAS %TVEM macro can be used directly to fit models with both time-varying and time-invariant effects.
(b) A final model is selected by data-driven model selection criteria (e.g., AIC and BIC).
⇒ TVEM can be used as a model diagnostic tool to examine whether a traditional model fits the data well.

Functional Hierarchical Linear Models

A flexible model for ILD should allow (F1)-(F5) and
(F6) Effects vary between individuals.
For example, the relationship between mood and daily smoking may vary across individuals, since some people smoke while having fun with their friends, but others smoke because they are lonely or frustrated.

The functional hierarchical linear model (HLM) allows both the fixed and the individual effects in the HLM to vary over time. It has all features (F1)-(F6).
Estimation of the functional HLM can be done in SAS. The SAS %FHLMLLR macro is available. One may simply plug in the data, and the SAS macro will automatically produce the plots.

Mixture of TVEMs

Motivation: For some smokers, mood may have a strong effect on urge to smoke, while for other smokers it may have only a weak effect.
If there are qualitatively different groups in a population, this should be taken into account in the model to avoid glossing over important relationships.
(F7) The population may be a combination of several discrete latent subpopulations.
An appropriate model with features (F1)-(F5) for ILD with (F7) is a mixture of TVEMs. A mixture of TVEMs is similar to the mixture of trajectory analysis, but is more flexible in how the effects are specified.
We are developing a SAS macro for estimation of mixtures of TVEMs. See John Dziak’s poster.

Joint modeling

Scientific questions to address: e.g., what is the impact of negative affect on “time to the first lapse (or relapse)”?
Response: time to the first lapse. More generally, time to a discrete event. Often referred to as “survival” data.
Covariates: longitudinal processes, such as negative affect.
(F8) Time-to-event response with longitudinal covariate processes.
Challenges:
(a) time to a discrete event can be censored; and
(b) covariates are longitudinal processes rather than a multidimensional vector.
A joint model with time-varying effects is appropriate for such cases.
Extensions to distal outcomes with longitudinal covariate processes.

Remarks

Through collaborations with Saul Shiffman and Megan Piper, we have developed several classes of models that are useful for analyzing ILD.
“We” means Stephanie Lanza, John Dziak, Mariya Shiyko, Xianming Tan, Sara Vasilenko, Esra Kurum, Junyi Lin, and Xiaoyu Liu.
Acknowledgments: Our work has been supported by P50-DA10075 and R21 DA024260.

Thank you very much
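
Supplementary sketch: the time-varying effect model from the demo, with urge to smoke regressed on negative affect through coefficient functions β0(t) and β1(t), can be illustrated outside SAS with a few lines of Python. The block below is only a minimal sketch under stated assumptions. It uses kernel-weighted least squares at each time point, which is a simple stand-in and not necessarily the algorithm used by the %TVEM macro. The function name `fit_tvem`, the bandwidth, and the simulated data are hypothetical, and the sketch ignores the within-subject correlation that real ILD would have.

```python
import numpy as np

def fit_tvem(t, x, y, grid, bandwidth):
    """Kernel-weighted least squares estimate of beta0(t) and beta1(t).

    For each time t0 in `grid`, observations are weighted by a Gaussian
    kernel centered at t0, and a weighted regression of y on (1, x) is fit.
    Returns an array of shape (len(grid), 2): columns are beta0, beta1.
    """
    design = np.column_stack([np.ones_like(x), x])
    estimates = np.empty((len(grid), 2))
    for i, t0 in enumerate(grid):
        w = np.exp(-0.5 * ((t - t0) / bandwidth) ** 2)  # kernel weights
        sw = np.sqrt(w)
        # Weighted least squares via ordinary least squares on scaled rows.
        coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
        estimates[i] = coef
    return estimates

# Simulated data (hypothetical, NOT the Shiffman data): time is measured in
# days relative to quit day, and both coefficient functions change at quit day.
rng = np.random.default_rng(0)
n = 4000
t = rng.uniform(-14, 14, n)
na = rng.normal(0.0, 1.0, n)                    # centered negative affect
beta0 = np.where(t < 0, 5.0, 5.0 - 0.15 * t)    # flat pre-quit, declining after
beta1 = np.where(t < 0, 0.5, 1.5)               # jump in the NA effect at quit
urge = beta0 + beta1 * na + rng.normal(0.0, 1.0, n)

grid = np.array([-10.0, -5.0, 5.0, 10.0])
est = fit_tvem(t, na, urge, grid, bandwidth=1.5)
for t0, (b0, b1) in zip(grid, est):
    print(f"day {t0:+.0f}: intercept {b0:.2f}, NA effect {b1:.2f}")
```

A smaller bandwidth follows sharp changes such as the jump at quit day more closely but gives noisier curves; in practice the amount of smoothing would be chosen by a data-driven criterion rather than fixed by hand as it is here.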