Can we establish cause-and-effect
relationships in large healthcare databases?
Lawrence McCandless
Associate Professor
[email protected]
Faculty of Health Sciences, Simon Fraser University
Spring 2016
Example of large healthcare database study
Antidepressants and Suicide: Causal Link?
• The Problem: Confounding variables
• What is causing suicide: SSRIs, depression, or both?
• Were the authors able to control for confounders?
Outline
1. An Example of Confounding: Beta-blocker Therapy in
Heart Failure Patients
2. The Problem of Confounding when inferring causation in
large healthcare database studies
3. The assumption of no unmeasured confounders
4. The role of Bayesian statistics
A Motivating Example:
The Effectiveness of Beta-Blocker Therapy in Heart Failure
Patients.
Example of Confounding: Beta Blocker Therapy in
Heart Failure Patients
I present a re-analysis of healthcare administrative data
described by McCandless (2007).
A retrospective cohort study examined the relationship
between beta-blocker therapy and mortality in n = 6969
heart failure patients who were followed for 2 years between
1999 and 2001.
Among 6969 patients, 1295 were treated with beta blockers
(5674 were untreated).
All participants were followed for up to 2 years, and a total of
1755 (25%) died.
Observational study: No randomization
McCandless, Gustafson & Levy (2007) Bayesian sensitivity analysis for unmeasured confounding in
observational studies. Stat Med.
Example of Confounding: Beta Blocker Therapy in
Heart Failure Patients
Here is a simple multiple logistic regression model:
Let Y = 1, 0 denote a dichotomous outcome variable for death
by the end of follow-up.
Let X = 1, 0 denote a dichotomous exposure variable for
beta-blocker treatment.
Let C denote a k = 12 vector of measured confounders (all
dichotomous), including sex, age, and 8 disease indicator
variables (e.g. cancer, heart disease).
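As an illustration only (a minimal sketch: the data below are simulated, with one hypothetical binary confounder standing in for C and a true treatment odds ratio of 0.72 assumed by construction; this is not the heart-failure cohort), a logistic regression of this form can be fit by Newton-Raphson:

```python
import numpy as np

# Simulated data (hypothetical; not the study's data)
rng = np.random.default_rng(1)
n = 20000
C = rng.binomial(1, 0.5, n)                    # one measured binary confounder
X = rng.binomial(1, 0.2 + 0.3 * C)             # treatment depends on C
true_b = np.array([-1.0, np.log(0.72), 0.8])   # intercept, treatment, confounder
Z = np.column_stack([np.ones(n), X, C])
Y = rng.binomial(1, 1 / (1 + np.exp(-Z @ true_b)))

# Newton-Raphson for the MLE of Logit[P(Y=1|X,C)] = b0 + bX*X + bC*C
beta = np.zeros(3)
for _ in range(25):
    mu = 1 / (1 + np.exp(-Z @ beta))             # fitted probabilities
    grad = Z.T @ (Y - mu)                        # score vector
    hess = Z.T @ (Z * (mu * (1 - mu))[:, None])  # observed information
    beta += np.linalg.solve(hess, grad)

print("adjusted OR for treatment:", np.exp(beta[1]))
```

With 20,000 simulated patients the estimate lands close to the assumed 0.72; the inverse of `hess` would give the standard errors behind the confidence intervals on the next slide.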
Logistic Regression Analysis Results
Adjusted odds ratios for the association between treatment and mortality

Covariate                  Odds Ratio (95% confidence interval)
Beta blocker               0.72 (0.62-0.86)
Female sex                 0.75 (0.67-0.84)
Age
  <65                      1.00
  65-74                    1.43 (1.14-1.80)
  75-84                    2.20 (1.81-2.71)
  85+                      3.29 (2.67-4.08)
Comorbid conditions
  Cerebrovascular Dis      1.37 (0.71-2.60)
  COPD                     1.07 (0.82-1.39)
  Hyponatremia             1.10 (0.81-1.49)
  ...
Example of Confounding: Beta Blocker Therapy in
Heart Failure Patients
The crude odds ratio for the X − Y association was 0.63 with
95% CI (0.55, 0.74).
The question is: Was there adequate adjustment for
confounding?
Are there additional unmeasured confounders that were not
recorded in the health care administrative data?
Healthy patients were more likely to get treated with beta
blockers than unhealthy patients.
Additional important confounding variables: Smoking; Physical
activity; ...
Causal Inference in Large Healthcare Databases
The problems in the beta blocker data are typical of database
studies.
Large healthcare databases of electronic medical records have
advantages for epidemiology research:
• Very inexpensive compared to prospective cohort studies
• Extremely high power
• Free of errors from survey instruments (e.g. interviewer and
respondent reporting biases)
• Data capture entire populations because of the publicly
funded healthcare system
Causal Inference in Large Healthcare Databases
However, observational studies are not randomized controlled
trials, in which study participants are randomly allocated to
receive treatment or placebo.
In observational studies, treated patients may differ
systematically from control patients due to unmeasured
confounding variables.
For example, treated patients may be healthier than control
patients.
Then correlation does not imply causation.
There is a mixing of the effect of the treatment with the effect of
the unmeasured confounders.
What is Causal Inference?
Causal inference is the process of establishing
cause-and-effect relationships from data that were not collected
from an experiment (e.g. randomized controlled trial).
The causal effect of a treatment is defined as the contrast
between two counterfactual outcomes:
1. The outcome that was observed in a treated individual,
versus
2. The outcome that would have been observed in the same
individual if they had not been treated.
These are called potential outcomes, or counterfactuals
because they are “contrary to fact”.
Hernán & Robins (2006) J Epidemiol Community Health; Maldonado & Greenland (2002) Int J Epidemiol
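A toy numerical version of this idea (hypothetical probabilities, not from any study): suppose treatment X has no causal effect on death Y, but a binary health indicator U makes treatment more likely and death more likely. Computing the risks exactly shows how the crude contrast misleads, while standardizing both groups to the same distribution of U recovers the null:

```python
# Hypothetical probabilities: X has NO causal effect on Y, but an
# unmeasured U drives both treatment and outcome.
p_u = 0.5                       # P(U=1)
p_x_given_u = {0: 0.2, 1: 0.8}  # U=1 patients are treated more often
p_y_given_u = {0: 0.1, 1: 0.4}  # outcome depends on U only, not on X

# P(U=u | X=x) by Bayes' rule
def p_u_given_x(u, x):
    num = (p_x_given_u[u] if x == 1 else 1 - p_x_given_u[u]) * (p_u if u else 1 - p_u)
    den = sum((p_x_given_u[v] if x == 1 else 1 - p_x_given_u[v]) * (p_u if v else 1 - p_u)
              for v in (0, 1))
    return num / den

# Crude risks mix the U strata differently in treated and untreated
risk_crude = {x: sum(p_y_given_u[u] * p_u_given_x(u, x) for u in (0, 1)) for x in (0, 1)}
# Standardized risks weight both groups by the same P(U=u)
risk_std = {x: sum(p_y_given_u[u] * (p_u if u else 1 - p_u) for u in (0, 1)) for x in (0, 1)}

print(risk_crude[1] - risk_crude[0])  # ~0.18: a spurious association
print(risk_std[1] - risk_std[0])      # 0.0: the true (null) causal effect
```

The crude risk difference of about 0.18 is entirely confounding: the treated group simply contains more high-risk (U = 1) patients.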
Inferring causation in large healthcare databases
Causal inference in observational studies requires the
assumption of no unmeasured confounders.
Assumption:
The risk of death in group 1 (Treatment) would have been
the same as the risk of death in group 2 (Control) had the
treated subjects not received treatment
(and vice versa).
This is also called the assumption of ignorable treatment
assignment or exchangeability, and it is written as
(Y1 , Y0 ) ⊥⊥ X |C.
Also requires the SUTVA (stable unit treatment value) assumption
Hernán & Robins (2006) J Epidemiol Community Health; Maldonado & Greenland (2002) Int J Epidemiol
Inferring causation in large healthcare databases
To reduce confounding from measured variables, we can use
several methods:
• Regression adjustment (e.g. Logistic Regression)
• Propensity scores: Adjustment, matching, inverse
probability of treatment weighting and marginal structural
models
• Other methods: Restriction, standardization, g formula,
double robust methods, instrumental variables, disease
risk scores
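To sketch one of these methods, here is a minimal worked example of inverse probability of treatment weighting, using hypothetical counts and a single binary confounder U standing in for the covariates C; by construction the true treatment effect is null:

```python
# Hypothetical cell counts: (u, x) -> (n_patients, n_deaths),
# chosen so that mortality depends on U only (null treatment effect)
cells = {
    (1, 1): (400, 160),  # U=1 mostly treated, 40% mortality
    (1, 0): (100, 40),
    (0, 1): (100, 10),   # U=0 mostly untreated, 10% mortality
    (0, 0): (400, 40),
}

def risk(x, weighted):
    num = den = 0.0
    for (u, xx), (n, d) in cells.items():
        if xx != x:
            continue
        n_u = sum(nn for (uu, _), (nn, _) in cells.items() if uu == u)
        ps = cells[(u, 1)][0] / n_u  # propensity score P(X=1 | U=u)
        w = 1.0 / (ps if x == 1 else 1 - ps) if weighted else 1.0
        num += d * w                 # weighted deaths
        den += n * w                 # weighted person count
    return num / den

crude_rd = risk(1, False) - risk(0, False)
iptw_rd = risk(1, True) - risk(0, True)
print(crude_rd)  # ~0.18: confounded
print(iptw_rd)   # ~0.0: weighting removes the confounding by U
```

Weighting each patient by the inverse probability of the treatment actually received creates a pseudo-population in which U no longer predicts treatment, so the weighted contrast recovers the null.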
Inferring causation in large healthcare databases
A fundamental problem when analyzing large healthcare
databases is the assumption of no unmeasured confounders.
Large healthcare databases are frequently missing information
on important clinical variables:
• Smoking
• Body mass index
• Severity of underlying disease
• Indications for treatment
Therefore correlation may not imply causation.
Example: Antidepressants and risk of suicide
Inferring causation in large healthcare databases
Rothman (2012) Epidemiology: An introduction
Inferring causation in large healthcare databases
Szklo & Nieto. (2015) Epidemiology, Beyond the Basics
Dealing with unmeasured confounders
• Possible solutions?
=⇒ Speculate about the characteristics of the unmeasured
confounder and then study the resulting inferences.
=⇒ This forms the basis of sensitivity analysis and bias
analysis.
Greenland, S. (2005) Multiple-bias modelling for analysis of
observational data. Journal of the Royal Statistical Society Ser A
168:267-306.
What is Sensitivity Analysis?
Sensitivity analysis for unmeasured confounding
1. Expand the model relating treatment and outcome to
include extra parameters that model confounding from
unmeasured variables (these are called bias parameters).
2. “Plug in” plausible values for bias parameters taken from
the literature.
3. Repeat the data analysis and verify that study conclusions
are robust to different choices of bias parameters.
• There are many papers on this topic (see Rosenbaum,
Rubin, Greenland, Robins). See Schneeweiss for
examples in pharmacoepidemiology.
Sensitivity Analysis for Unmeasured Confounding
Suppose There is a Single Binary U.
Instead of this model
Logit[P(Y = 1|X , C)] = β0 + βX X + βC C
We use this model
Logit[P(Y = 1|X , C, U)] = β0 + βX X + βC C + βU U
Logit[P(U = 1|X , C)] = γ0 + γX X
This is also known as a latent class model, or a mixture model
with 2 components.
Sensitivity Analysis for Unmeasured Confounding
Suppose There is a Single Binary U.
The model is indexed by 3 bias parameters:
βU : log odds ratio for the association between U and Y
γX : log odds ratio for the association between U and X
→ expit(γ0 ) and expit(γ0 + γX ) are the prevalences of U given
X = 0 and X = 1.
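Step 2 of the recipe ("plug in" plausible bias parameters) can be sketched with a simple external-adjustment formula (in the spirit of Schlesselman's classical correction for a binary U; a rough approximation for intuition, not the Lin et al. or Bayesian analysis reported on the next slides, and the numbers below are hypothetical):

```python
import math

def adjusted_or(or_obs, beta_u, p1, p0):
    """Divide the observed OR by the bias factor implied by a binary U.

    beta_u : log odds ratio for the U-Y association (exp(beta_u) = OR_UY)
    p1, p0 : prevalence of U among treated (X=1) and untreated (X=0),
             i.e. expit(gamma0 + gammaX) and expit(gamma0)
    """
    or_uy = math.exp(beta_u)
    bias = (p1 * (or_uy - 1) + 1) / (p0 * (or_uy - 1) + 1)
    return or_obs / bias

# Hypothetical scenario: harmful U (OR_UY = e^1), more common among treated
print(adjusted_or(0.72, 1.0, 0.6, 0.4))  # ~ 0.60
```

Repeating the calculation over a grid of (beta_u, p1, p0) values is exactly the tabular sensitivity analysis shown next.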
Sensitivity Analysis Results
Odds ratios for the relationship between beta blockers and mortality∗

Association between          Association between U and Mortality
U and X                 Protective                          No Effect          Harmful
                        βU = −2           βU = −1           βU = 0             βU = 1            βU = 2
                        (OR = 1/9)        (OR = 1/3)        (OR = 1)           (OR = 3)          (OR = 9)
No Relation
γX = 0 (OR = 1)         0.70 (0.58-0.83)  0.71 (0.60-0.84)  0.72 (0.62-0.86)   0.71 (0.60-0.84)  0.70 (0.58-0.83)
Increase
γX = 1 (OR = 3)         1.15 (0.96-1.39)  0.96 (0.81-1.14)  0.72 (0.62-0.86)   0.56 (0.47-0.66)  0.48 (0.40-0.57)
Large Increase
γX = 2 (OR = 9)         1.49 (1.15-1.78)  1.10 (0.93-1.31)  0.72 (0.62-0.86)   0.50 (0.43-0.60)  0.42 (0.35-0.49)

∗ Adjusted for measured and unmeasured confounders.
Note: The original odds ratio was 0.72 (0.62-0.86). Sensitivity analysis using the method of Lin et al. (1998) Biometrics.
How does Bayesian Statistics Fit in Here?
What is Bayesian Statistics?
Bayesian inference is an approach to statistics where scientific
evidence is summarized using probability distributions
Rather than declaring a result to be significant or
non-significant, the Bayesian approach quantifies the probability
that a relationship is true.
Thus Bayesian methods give better representations of
uncertainty in complex data.
What is Bayesian Statistics?
Some of the advantages of Bayesian statistics:
Results are easier to interpret (e.g. compared to p-values and
confidence intervals)
Bayesian statistics easily accommodate complex models (e.g.
multilevel models, missing data, latent variables)
We can incorporate prior information using the prior
distribution. Bayesian approaches are well suited to knowledge
synthesis and combining datasets.
What is Bayesian Statistics?
Prior distribution: Summary of what we know about the
population quantity before having looked at the data.
Posterior distribution: Summary of what we know about the
population quantity after having seen the data.
Illustration of Bayesian statistics in diagnostic
testing
PPV = (sensitivity × prevalence) /
[sensitivity × prevalence + (1 − specificity) × (1 − prevalence)]
Equivalently,
P(D=1|T=1) = P(T=1|D=1)P(D=1) /
[P(T=1|D=1)P(D=1) + P(T=1|D=0)P(D=0)]
where
P(T = 1|D = 1) = Sensitivity, P(T = 0|D = 0) = Specificity
P(D = 1|T = 1) = Positive predictive value
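A quick numerical sketch of this calculation (the test characteristics below are hypothetical):

```python
def ppv(sens, spec, prev):
    # P(D=1|T=1) = P(T=1|D=1)P(D=1) / [P(T=1|D=1)P(D=1) + P(T=1|D=0)P(D=0)]
    return sens * prev / (sens * prev + (1 - spec) * (1 - prev))

# Even a good test yields a low PPV when the disease is rare:
# the prior (prevalence) dominates the posterior (PPV)
print(ppv(0.90, 0.95, 0.01))  # ~ 0.15
```

This is Bayesian updating in miniature: the prevalence plays the role of the prior, and the PPV is the posterior probability of disease given a positive test.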
Challenges with Bayesian Modelling
There are serious challenges with the practical implementation
of Bayesian models in large databases.
• Custom computer code must be developed.
• Trainees and personnel require advanced training in
biostatistics (e.g. Bayesian methods and epidemiology)
• Bayesian computation can be very challenging (e.g. due to
nonidentifiability and difficulties with understanding the
behavior of complex models).
However... some good news:
New software for Bayesian Analysis
Recently, Andrew Gelman and collaborators (20+) have
developed the software STAN.
Gelman et al. (2015) Stan Reference Manual 2.7.0
STAN Bayesian software
STAN is generic software for doing Bayesian calculations.
It is based on adaptive Hamiltonian Monte Carlo, and it is a
black-box procedure that requires little input from the user.
Stan is a probabilistic programming language, which uses a
computer program to represent probability distributions by
generating data.
Stan has a 500-page manual and dozens of distributions and
models, with extensive support across platforms (C++, R, Python).
Advantages of STAN software
In the RSTAN package in R, use the model code:
data {
  int<lower=0> n;
  int<lower=0, upper=1> y[n]; // Death
  int<lower=0, upper=1> x[n]; // Treatment
}
parameters {
  real beta0;
  real betaX;
}
model {
  for (i in 1:n)
    y[i] ~ bernoulli_logit(beta0 + betaX * x[i]);
}
Or use the function increment_log_prob() and then STAN
does all the work.
STAN discovers the structure of the model and returns a posterior sample.
Advantages of STAN software
15,000 iterations, 10 chains, 1 hour on a new computer.
             mean  se_mean    sd     2.5%      25%      50%      75%    97.5%  n_eff  Rhat
betaX       -0.44     0.03  0.37    -1.14    -0.70    -0.44    -0.18     0.28    136  1.07
beta0       -1.81     0.09  0.82    -3.29    -2.53    -1.63    -1.15    -0.47     79  1.13
beta[1]     -0.27     0.00  0.06    -0.41    -0.32    -0.27    -0.23    -0.15   1190  1.01
beta[2]      0.35     0.01  0.12     0.12     0.27     0.35     0.44     0.60    490  1.02
beta[3]      0.88     0.01  0.12     0.67     0.80     0.88     0.96     1.11    430  1.02
beta[4]      1.40     0.01  0.13     1.16     1.31     1.40     1.49     1.67    418  1.02
beta[5]      0.24     0.01  0.37    -0.50    -0.01     0.25     0.49     0.96    990  1.01
beta[6]      0.08     0.00  0.15    -0.21    -0.01     0.09     0.18     0.38   1119  1.01
beta[7]      0.19     0.00  0.17    -0.16     0.07     0.19     0.30     0.52   1184  1.01
beta[8]      0.76     0.00  0.12     0.53     0.67     0.75     0.83     0.98   1119  1.01
beta[9]      0.38     0.01  0.37    -0.37     0.14     0.39     0.63     1.08    952  1.01
beta[10]     0.60     0.01  0.29     0.02     0.41     0.61     0.80     1.17   1113  1.00
beta[11]     1.46     0.01  0.22     1.05     1.32     1.46     1.61     1.89    794  1.02
beta[12]     0.11     0.01  0.31    -0.51    -0.10     0.12     0.33     0.71    929  1.01
betaU       -0.15     0.17  1.53    -1.98    -1.66    -0.66     1.56     1.97     79  1.12
gammaX      -0.07     0.07  1.20    -1.91    -1.17    -0.11     0.99     1.90    267  1.03
gamma0      -0.06     0.04  0.91    -1.76    -0.70    -0.06     0.58     1.74    663  1.01
lp__     -3763.58     0.07  3.04 -3770.38 -3765.41 -3763.25 -3761.40 -3758.62   1744  1.01
Back to beta-blocker data example...
Bayesian Sensitivity Analysis for Unmeasured
Confounding
We assign prior distributions to model parameters:
The bias parameters
βU , γ0 , γX ∼ Uniform(−2, 2)
to model uniform beliefs about the magnitude of unmeasured
confounding.
For the remaining model parameters,
p(β0 , βX , βC ) ∝ 1,
which are improper flat priors and very uninformative.
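As a sanity check on what the Uniform(−2, 2) prior on the log-odds-ratio bias parameters implies on the odds-ratio scale:

```python
import math

# Uniform(-2, 2) on beta_U (a log odds ratio) lets the unmeasured
# confounder's association with mortality range over exp(-2)..exp(2)
lo, hi = math.exp(-2.0), math.exp(2.0)
print(round(lo, 2), round(hi, 2))  # 0.14 7.39
```

So the prior allows anything from a strongly protective to a strongly harmful unmeasured confounder, which is why the Bayesian interval for the beta-blocker effect widens in the next table.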
Results
Odds Ratios Adjusted for Measured and Unmeasured Confounders

                           Odds Ratio (95% interval estimate)
Covariate                  NAIVE analysis∗      Bayesian analysis∗∗
                                                Unif(-2,2) Prior
Beta blocker               0.72 (0.62-0.86)     0.72 (0.45-1.15)
Female sex                 0.75 (0.67-0.84)     0.74 (0.65-0.84)
Age
  <65                      1.00                 1.00
  65-74                    1.43 (1.14-1.80)     1.41 (1.13-1.79)
  75-84                    2.20 (1.81-2.71)     2.22 (1.77-2.74)
  85+                      3.29 (2.67-4.08)     3.37 (2.64-4.29)
Comorbid conditions
  Cerebrovascular Dis      1.37 (0.71-2.60)     1.35 (0.70-2.63)
  COPD                     1.07 (0.82-1.39)     1.07 (0.81-1.42)
  Hyponatremia             1.10 (0.81-1.49)     1.11 (0.81-1.42)
  ...                      ...                  ...

∗ NAIVE: Adjusted for C. ∗∗ Bayesian: Adjusted for C and U.
Ongoing Research
Here is the posterior distribution of the bias parameters
(βU , γ0 , γX )
Dotted lines indicate the prior distributions.
The prior and posterior distributions for the bias parameters
are different.
This illustrates that Bayesian analysis using STAN software can
give a unique perspective on bias and causality.
Can we infer causation in Large Healthcare
Databases?
• Conventional statistical methods are inadequate because
they do not model all the uncertainties.
• In massive datasets, confidence intervals and p-values are
crushed toward zero, suggesting that everything is significant.
• We can do better, and we can build better statistical tools
that capture uncertainty.
• Bayesian methods are an important new tool for causal
inference in large healthcare databases and epidemiology
research.
Can we infer causation in Large Healthcare
Databases?
• Bayesian models can easily accommodate complex
models: 1) multilevel models, 2) missing data, 3)
unobserved variables such as an individual’s disease
status.
• The use of prior probability distributions represents a
powerful approach for incorporating information from
previous studies.
• Posterior probabilities are easier to interpret (e.g. compared
to p-values)
• Recent developments in Markov chain Monte Carlo
methodology facilitate the implementation of Bayesian
analyses of complex data sets.
References
McCandless LC, Gustafson P, Levy AR. (2007) Bayesian
sensitivity analysis for unmeasured confounding in
observational studies. Statistics in Medicine. 26:2331–47
Gustafson P (2014). Bayesian inference in partially identified
models: Is the shape of the posterior distribution useful?
Electronic Journal of Statistics 8, 476-496.
Gustafson P (2015) Bayesian inference for partially identified
models. Exploring the limits of limited data. CRC Press
McCandless LC, Gustafson P, Levy AR, Richardson S. (2012)
Hierarchical priors for bias parameters in Bayesian sensitivity
analysis for unmeasured confounding. Statistics in Medicine
31:383-96.
Greenland S (2005) Multiple-bias modelling for analysis of
observational data. J Royal Stat Soc Ser A 168:267-306.