Generalizing Experimental Study Results to Target Populations

Elizabeth Stuart
Johns Hopkins Bloomberg School of Public Health
Departments of Mental Health, Biostatistics,
and Health Policy and Management
[email protected]
www.biostat.jhsph.edu/~estuart
Funding thanks to NSF DRL-1335843, IES R305D150003
February 26, 2016
Outline
1. Introduction, context, and framework
2. The setting and overview of approaches
3. Reweighting approaches
4. Conclusions
1. Introduction, context, and framework
Making research results relevant: A range of policy or practice questions
A given district or school may go to the What Works Clearinghouse to see whether a new reading intervention is “evidence-based” and would be helpful for them
The state of Maryland may be deciding whether to recommend the new program for all schools or districts in the state, or only for all “struggling” schools
Medicare may be deciding whether to approve payment for a new treatment for back pain
Should a broad public health media campaign be started around not switching car seats to forward-facing until a child is 12 months old?
From individual to population effects
All of these reflect a “population” average treatment effect
e.g., across individuals in a population, does this intervention work “on average”?
This population could be fairly narrow or quite broad
There may be underlying treatment effect heterogeneity, e.g., stronger effects for some individuals
There is lots of interest in tailoring treatments to individuals; that is not my focus today
But the policy questions that motivate today’s talk call for an overall average effect (formalized below)
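In potential-outcomes notation (a standard formulation added here for concreteness; the slide states this only in words), the contrast is between the population and the in-sample average treatment effect:

```latex
% Population average treatment effect: the policy-relevant estimand,
% averaging over the target population
\mathrm{PATE} = E\left[\, Y(1) - Y(0) \,\right]

% Sample average treatment effect: what the trial identifies directly,
% where S = 1 indicates participation in the trial
\mathrm{SATE} = E\left[\, Y(1) - Y(0) \mid S = 1 \,\right]
```

The two coincide only if treatment effects do not vary with the factors that drive trial participation.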
At this point, relatively little attention has been paid to how well results from a given study might carry over to a relevant target population
This talk discusses recent work aimed at getting people to start thinking about these issues, while taking advantage of recent advances in study quality and data
How much do we need to worry about external validity?
There is lots of evidence that the people or groups that participate in trials differ from general populations
This will cause bias if the factors that differ also moderate treatment effects
Districts that participate in rigorous educational evaluations are much larger than typical districts in the US (Stuart et al., under review)
People who participate in trials of drug abuse treatment have higher education levels than those in drug abuse treatment nationwide (Susukida et al., in press)
There are increasing worries about the lack of minority representation in clinical trials
And these differences can lead to external validity bias (Bell et al., in press)
2. The setting and overview of approaches
The setting
Assume we have one randomized trial, already conducted
And also covariate data on some target population of interest (we do not have treatment or outcome data for the population)
The question: How can we use these data to estimate the effects of the intervention in the target population?
Note: The focus here is on assessing and enhancing external validity with respect to the characteristics of trial and population subjects
There are lots of other threats to external validity as well: scale-up problems, implementation, different settings, . . . (see Cook, 2014)
Analysis approaches for estimating population effects
Meta-analysis: Useful when multiple studies are available, but does not necessarily give population estimates
Cross-design synthesis: Explicitly combines experimental and non-experimental effect estimates (Pressler & Kaizar, 2013)
Model-based approaches: Model the outcome in the trial, then use that model to predict outcomes in the population (e.g., BART; Kern et al., 2016)
Post-stratification: Estimate separate effects by stratum, then combine them using population proportions (see the sketch after this list)
Reweighting: Like a smoothed version of post-stratification (Cole & Stuart, 2009; O’Muircheartaigh & Hedges, 2014)
(Of course design options exist too, e.g., aiming to enroll representative (or “balanced”) samples (Royall!))
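As a toy illustration of the post-stratification idea mentioned above, the sketch below combines stratum-specific trial effects using population stratum shares; every name and number here is invented for the example:

```python
# Hypothetical stratum-specific effect estimates from the trial,
# and hypothetical stratum shares in the target population.
trial_effects = {"age<30": -0.10, "age30-39": 0.60, "age40+": 0.35}
pop_shares = {"age<30": 0.45, "age30-39": 0.30, "age40+": 0.25}

# Post-stratified estimate: a population-share-weighted average of
# the stratum-specific trial effects.
pate_hat = sum(trial_effects[s] * pop_shares[s] for s in trial_effects)
print(f"Post-stratified effect estimate: {pate_hat:.2f}")  # about 0.22
```

Reweighting generalizes this: instead of a handful of discrete strata, each trial subject gets a smooth, covariate-based weight, as described in the next section.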
3. Reweighting approaches
Case study: The ACTG Trial
Examined highly active antiretroviral therapy (HAART) for HIV, compared to standard combination therapy
577 US HIV+ adults were randomized to treatment and 579 to control
33/577 and 63/579 reached the endpoint (AIDS or death) during the 52-week follow-up
Intent-to-treat analysis: hazard ratio of 0.51 (95% CI: 0.33, 0.77)
Cole & Stuart (2010)
The target population
We don’t necessarily care only about the people in the trial
What would the effects of the treatment be if it were implemented nationwide?
US estimates of the number of people infected with HIV in 2006 (CDC, 2008)
HIV incidence was estimated using a statistical approach with adjustment for testing frequency, and extrapolated to the US
We have the joint distribution of sex, race, and age group for the newly infected individuals
Inverse probability of selection weighting
Weight the trial subjects up to the population
Each subject i in the trial receives weight $w_i = 1 / P(S_i = 1 \mid X_i)$
(the inverse of their probability of being in the trial)
Use those weights when calculating means or running regressions
Related to inverse probability of treatment weighting and to Horvitz-Thompson estimation in surveys
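A minimal sketch of this two-step procedure in Python, assuming stacked trial and population records; the column names, the logistic selection model, and the simulated outcome are all illustrative assumptions, not part of the original analysis:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical stacked data: trial rows (in_trial=1) carry treatment and
# outcome; target-population rows (in_trial=0) carry covariates only.
n_trial, n_pop = 500, 5000
trial = pd.DataFrame({"age": rng.normal(40, 8, n_trial),      # trial skews older
                      "male": rng.binomial(1, 0.7, n_trial),  # and more male
                      "in_trial": 1})
pop = pd.DataFrame({"age": rng.normal(33, 10, n_pop),
                    "male": rng.binomial(1, 0.5, n_pop),
                    "in_trial": 0})
stacked = pd.concat([trial, pop], ignore_index=True)

# Step 1: estimate P(S_i = 1 | X_i) by logistic regression on the stack.
covs = ["age", "male"]
sel = LogisticRegression(max_iter=1000).fit(stacked[covs], stacked["in_trial"])

# Step 2: each trial subject gets weight 1 / P(S_i = 1 | X_i).
trial["w"] = 1.0 / sel.predict_proba(trial[covs])[:, 1]

# Step 3: use the weights in the outcome analysis. Here, a weighted
# difference in means on simulated trial outcomes with an age-moderated
# treatment effect (so that weighting changes the answer).
trial["treated"] = rng.binomial(1, 0.5, n_trial)
trial["y"] = (0.5 * trial["treated"] * (trial["age"] > 35)
              + rng.normal(0, 1, n_trial))

def wmean(g: pd.DataFrame) -> float:
    return np.average(g["y"], weights=g["w"])

effect = wmean(trial[trial["treated"] == 1]) - wmean(trial[trial["treated"] == 0])
print(f"IPSW effect estimate: {effect:.2f}")
```

In practice one would also check covariate overlap and inspect the distribution of the estimated weights, since a few very large weights can dominate the estimate.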
Standard assumptions
The experiment was randomized
“Sample ignorability for treatment effects”: selection into the trial is independent of treatment impacts, given the observed covariates
For the same value of the observed covariates, impacts are the same in the trial and in the population
No unmeasured variables related to both selection into the trial and treatment effects (for a sensitivity analysis, see Nguyen et al., under review)
“Overlap”: all individuals in the population have a non-zero probability of participating in the trial
Analogous to strong ignorability/unconfoundedness of treatment assignment in non-experimental studies; both assumptions are written formally below
(If the outcome under control is observed in the population, a slightly different assumption can be used)
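Written formally (a standard way to state the two assumptions above, with S indicating trial participation and X the observed covariates):

```latex
% Sample ignorability for treatment effects: trial participation is
% independent of the individual-level impact, given the covariates
Y(1) - Y(0) \;\perp\!\!\!\perp\; S \mid X

% Overlap: every covariate profile in the target population has a
% non-zero probability of appearing in the trial
P(S = 1 \mid X = x) > 0 \quad \text{for all } x \text{ in the target population}
```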
Effect heterogeneity and predictors of participation
People in the trial were more likely to be:
Older (not 13-29)
Male
White or Hispanic
Those characteristics also moderated effects in the trial:
Detrimental effects for young people
Largest effects for those aged 30-39
Larger effects for males than for females
Larger effects for Blacks than for Whites or Hispanics
Estimated population effects
                          Hazard ratio    95% CI
Crude trial results           0.51        0.33, 0.77
Age weighted                  0.68        0.39, 1.17
Sex weighted                  0.53        0.34, 0.82
Race weighted                 0.46        0.29, 0.72
Age-sex-race weighted         0.57        0.33, 1.00
CIs are wider for the weighted results
Effects are generally somewhat attenuated, except when weighting only by race
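One way weighted hazard ratios like these can be computed is with a weighted Cox model. The sketch below uses the lifelines library, whose CoxPHFitter.fit accepts a weights_col argument; the data and column names are hypothetical, and this illustrates the general technique rather than reproducing the exact Cole & Stuart (2010) analysis:

```python
import pandas as pd
from lifelines import CoxPHFitter

# Hypothetical trial data: follow-up time in weeks, an AIDS/death event
# indicator, treatment arm, and an inverse-probability-of-selection weight.
df = pd.DataFrame({
    "weeks":   [52, 34, 52, 20, 52, 45, 52, 28],
    "event":   [0, 1, 0, 1, 0, 1, 0, 1],
    "treated": [1, 1, 0, 0, 1, 0, 1, 0],
    "w":       [1.8, 2.3, 1.1, 3.0, 1.5, 2.2, 1.9, 2.7],
})

cph = CoxPHFitter()
# robust=True requests a sandwich variance, a common choice when the
# weights are estimated rather than known.
cph.fit(df, duration_col="weeks", event_col="event",
        weights_col="w", robust=True)
print(cph.summary["exp(coef)"])  # weighted hazard ratio for "treated"
```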
Placebo checks
The weighting can also be used as a diagnostic
The weighted control group mean should match the population outcome mean if the control conditions are the same (a “placebo check”)
In the HAART case, if we had mortality information for the population, we could check whether the weighted mortality rate in the control group matched the population mortality rate (assuming no treatment in the population)
If the placebo check fails, it may indicate unobserved differences between the groups
Hartman et al., 2013; Stuart et al., 2011
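A minimal sketch of such a placebo check, assuming an untreated outcome rate for the population is available from surveillance data; all numbers are hypothetical:

```python
import numpy as np

# Hypothetical trial control group: an outcome indicator (e.g., death
# during follow-up) and estimated inverse-probability-of-selection weights.
control_y = np.array([0, 1, 0, 0, 1, 0, 0, 1])
control_w = np.array([1.8, 2.3, 1.1, 3.0, 1.5, 2.2, 1.9, 2.7])

weighted_control_mean = np.average(control_y, weights=control_w)
population_mean = 0.35  # hypothetical untreated outcome rate in the population

# A large gap is a red flag for unmeasured differences between the
# trial sample and the target population.
gap = weighted_control_mean - population_mean
print(f"weighted control mean = {weighted_control_mean:.2f}, "
      f"population mean = {population_mean:.2f}, gap = {gap:+.2f}")
```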
4. Conclusions
Everyone wants to assume that study results generalize
But very few statistical methods exist
At this point, there is lots of “hand waving” and there are mostly qualitative statements
We need more statistical methods to quantify and improve external validity
For both study design and study analysis
What do we need to assess and enhance external validity?
Information on the factors that influence treatment effect heterogeneity
Information on the factors that influence participation in rigorous evaluations
Data on all of these factors in both the trial and the population (not very helpful if these factors are not observed in the population)
Methods that allow for the differences between trial and population on these factors; these are coming along
Data a primary limiting factor
Right now we have very little information on the factors that influence effects or participation in trials
It is sometimes hard to find population data
Trial data are also often not publicly available
It is even harder to find population data with the same measures as the trial of interest
Stuart & Rhodes (under review): It was hard to find appropriate population data, and even then, of over 400 measures in each dataset, only about 7 were comparable
Conclusions
We can’t necessarily assume that average effects seen in a trial carry over directly to a target population
Methods allow us to adjust for differences in observed characteristics between the trial sample and the population, in order to estimate population treatment effects
But the results are only as good as the data available!
And remember . . .
“With better data, fewer assumptions are needed.”
- Rubin (2005, p. 324)
“You can’t fix by analysis what you bungled by design.”
- Light, Singer and Willett (1990, p. v)
“Real world relationships are invariably more complicated than those we
can represent in mathematically tractable models.”
- Royall and Pfeffermann (1981, p. 16)
References, with thanks to all my co-authors
Bell, S.H., Olsen, R.B., Orr, L.L., and Stuart, E.A. (in press). Estimates of external validity bias when impact evaluations select sites non-randomly. Forthcoming in Educational Evaluation and Policy Analysis.
Cole, S.R. and Stuart, E.A. (2010). Generalizing evidence from randomized clinical trials to target populations: The ACTG-320 trial. American Journal of Epidemiology 172: 107-115.
Imai, K., King, G., and Stuart, E.A. (2008). Misunderstandings between experimentalists and observationalists about causal inference. Journal of the Royal Statistical Society, Series A 171: 481-502.
Kern, H.L., Stuart, E.A., Hill, J., and Green, D.P. (2016). Assessing methods for generalizing experimental impact estimates to target populations. Journal of Research on Educational Effectiveness.
Olsen, R., Bell, S., Orr, L., and Stuart, E.A. (2013). External validity in policy evaluations that choose sites purposively. Journal of Policy Analysis and Management 32(1): 107-121.
Stuart, E.A., Cole, S.R., Bradshaw, C.P., and Leaf, P.J. (2011). The use of propensity scores to assess the generalizability of results from randomized trials. Journal of the Royal Statistical Society, Series A 174(2): 369-386.
Stuart, E.A., Bradshaw, C.P., and Leaf, P.J. (2015). Assessing the generalizability of randomized trial results to target populations. Prevention Science 16(3): 475-485.
Susukida, R., Crum, R., Stuart, E.A., and Mojtabai, R. (in press). Assessing sample representativeness in randomized control trials: Application to the National Institute of Drug Abuse Clinical Trials Network. Forthcoming in Addiction.