Observational Studies
Based on Rosenbaum (2002)
David Madigan
Rosenbaum, P.R. (2002). Observational Studies (2nd edition). Springer
Introduction
•An empirical study in which:
“The objective is to elucidate cause-and-effect
relationships in which it is not feasible to use
controlled experimentation”
•Examples:
•smoking and heart disease
•aspirin and mortality
•vitamin C and cancer survival
•cocaine and birthweight
•DES and vaginal cancer
•diet and mortality
Asthma Study
• Have data on 2,000 kids
• What is the effect of tobacco experimentation on asthma?
[Figure: variables in the study and their levels]
Sex: Male, Female
Ethnicity: African American, Asian, Hispanic, Other, White
Smoking at Home: Yes, No
Tobacco Experimentation: Yes, No
Asthma Self-Diagnosis: Yes, No
Asthma ISAAC: Yes, No
Cameron and Pauling Vitamin C
•Gave Vitamin C to 100 terminally ill cancer patients
•For each patient found 10 controls matched for age, gender,
cancer site, and tumor type
•Vitamin C patients survived four times longer than controls
•Later randomized study found no effect of vitamin C
•Turns out the control group was formed from patients
already dead…
LESSONS:
- observational studies are tricky
- randomized study is the gold standard
why?
Why does randomization work?
•The two groups are comparable at baseline
•Could do a better job manually matching patients on 18
characteristics listed, but no guarantees for other
characteristics
•Randomization did a good job without being told what the
18 characteristics were
•Chance assignment could create some imbalances but the
statistical methods account for this properly
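A small simulation can make the point about unlisted characteristics concrete. This is only a sketch under assumed inputs: the covariate `hidden` is synthetic standard-normal data, and the sample size and seed are arbitrary choices, not values from any study discussed here.

```python
import random
import statistics

random.seed(0)  # arbitrary seed so the sketch is reproducible

n = 10_000
# Hypothetical baseline characteristic that the investigator never recorded.
hidden = [random.gauss(0.0, 1.0) for _ in range(n)]

# Chance assignment: shuffle the unit indices and treat the first half.
units = list(range(n))
random.shuffle(units)
treated = units[: n // 2]
control = units[n // 2 :]

mean_treated = statistics.fmean(hidden[j] for j in treated)
mean_control = statistics.fmean(hidden[j] for j in control)
imbalance = mean_treated - mean_control

print(f"difference in means of the unrecorded covariate: {imbalance:.4f}")
```

The assignment never looks at `hidden`, yet the two groups end up with nearly equal means; any remaining imbalance is due to chance alone.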
The Hypothesis of No Treatment Effect
• In a randomized experiment, can test this hypothesis
essentially without making any assumptions at all
• “no effect” formally means for each patient the outcome
would have been the same regardless of treatment
assignment
• Test statistic, e.g., proportion(D|TT) - proportion(D|PCI)
[Figure: four patients with outcomes D, D, L, L; the six possible assignments of two patients to TT and two to PCI, with the observed assignment marked. P = 1/6]
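The four-patient illustration (outcomes D, D, L, L, two patients on TT and two on PCI) can be checked by enumerating every possible assignment. This is a minimal sketch: it assumes the observed assignment gave TT to the two patients with outcome D and uses a one-sided comparison; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

# Outcomes for the four patients, labelled "D" or "L" as on the slide.
outcomes = ["D", "D", "L", "L"]
# ASSUMPTION: the observed assignment gave TT to patients 0 and 1.
observed_tt = (0, 1)

def statistic(tt_units):
    """proportion(D | TT) - proportion(D | PCI) for one assignment."""
    tt = [outcomes[j] for j in tt_units]
    pci = [outcomes[j] for j in range(len(outcomes)) if j not in tt_units]
    return Fraction(tt.count("D"), len(tt)) - Fraction(pci.count("D"), len(pci))

# Under "no effect" the outcomes are fixed; only the assignment is random,
# and each way of choosing two TT patients is equally likely.
assignments = list(combinations(range(len(outcomes)), 2))
t_obs = statistic(observed_tt)
as_extreme = [a for a in assignments if statistic(a) >= t_obs]
p_value = Fraction(len(as_extreme), len(assignments))

print(p_value)  # 1/6
```

Only the randomization itself is used to get the P-value; no model for the outcomes is needed.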
Estimates, etc.
• Note: the probability distribution needed for
the test is known, not assumed or modeled
• Randomized experiment provides unbiased
estimator of the average treatment effect
• Internal versus external validity
• Confidence intervals by inverting tests
• Partially ordered outcomes, censoring,
multivariate outcomes, etc.
Overt Bias in Observational Studies
“An observational study is biased if treatment and
control groups differ prior to treatment in ways
that matter for the outcome under study”
Overt bias: a bias that can be seen in the data
Hidden bias: involves factors not in the data
Can adjust for overt bias…
Overt Bias

$x_j$: covariate vector
M units, j = 1,…,M
$Z_j$: treatment (assume binary, 0 or 1); $p_j = \Pr(Z_j = 1)$, unknown
$$\Pr(Z_1 = z_1, \ldots, Z_M = z_M) = \prod_{j=1}^{M} p_j^{z_j} (1 - p_j)^{1 - z_j}$$
An OS is free of hidden bias if the $p_j$'s are known to depend only on the $x_j$'s (i.e., $p_j = \lambda(x_j)$, with $\lambda$ unknown)
(so two units with the same x have the same probability of getting the treatment)
Stratifying on x
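• Suppose can group units into strata with identical x's, stratum s having $n_s$ units that share treatment probability $\lambda_s$. Then:
$$\Pr(Z = z) = \prod_{s=1}^{S} \lambda_s^{m_s} (1 - \lambda_s)^{n_s - m_s}$$
• Conditional on the $m_s = \sum_i z_{si}$, all Z's are equally likely…just like in a uniform randomized experiment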
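The "equally likely" claim can be verified numerically for a single stratum. The values below (a stratum of 4 units, 2 treated, shared treatment probability 0.3) are hypothetical and chosen only for illustration.

```python
from itertools import product
from math import comb, isclose

lam = 0.3   # hypothetical treatment probability shared by the stratum
n_s = 4     # hypothetical stratum size
m_s = 2     # number of treated units in the stratum

def prob(z):
    """Unconditional probability of assignment z within the stratum."""
    m = sum(z)
    return lam**m * (1 - lam) ** (len(z) - m)

# All assignments in the stratum with exactly m_s treated units.
matching = [z for z in product([0, 1], repeat=n_s) if sum(z) == m_s]
total = sum(prob(z) for z in matching)
conditional = [prob(z) / total for z in matching]

print(conditional)
```

Every conditional probability equals 1 / C(4, 2), whatever value `lam` takes, which is why the unknown treatment probability drops out of the analysis.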
Stratifying on the Propensity Score
• Obviously exact matching not always possible
• Idea: form strata comprising units with the same $\lambda$'s (i.e., could have $x_{si} \neq x_{sj}$ but $\lambda(x_{si}) = \lambda(x_{sj})$)
• Problem: don't know the $\lambda$'s
• Solution: estimate them (logistic regression,
SVM, decision tree, etc.)
• Form strata containing units with “similar”
probability of treatment
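The estimate-then-stratify recipe can be sketched as follows. Everything here is an assumption made for illustration: the data are synthetic with a single covariate, the logistic regression is fitted by plain gradient ascent (a stand-in for any standard fitting routine), and the choice of five equal-size strata is a convention, not something the slide specifies.

```python
import math
import random

random.seed(1)  # arbitrary seed for reproducibility

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

# Synthetic data: one covariate x; treatment is more likely for larger x.
n = 500
x = [random.gauss(0.0, 1.0) for _ in range(n)]
z = [1 if random.random() < sigmoid(0.5 + x_j) else 0 for x_j in x]

# Fit logit Pr(Z = 1 | x) = a + b * x by gradient ascent on the
# mean log-likelihood.
a, b = 0.0, 0.0
step = 0.5
for _ in range(500):
    resid = [z_j - sigmoid(a + b * x_j) for x_j, z_j in zip(x, z)]
    a += step * sum(resid) / n
    b += step * sum(r * x_j for r, x_j in zip(resid, x)) / n

scores = [sigmoid(a + b * x_j) for x_j in x]  # estimated propensity scores

# Five strata of equal size, ordered by estimated score.
order = sorted(range(n), key=lambda j: scores[j])
n_strata = 5
stratum = [0] * n
for position, j in enumerate(order):
    stratum[j] = position * n_strata // n

for s in range(n_strata):
    members = [j for j in range(n) if stratum[j] == s]
    n_treated = sum(z[j] for j in members)
    print(f"stratum {s}: {len(members)} units, {n_treated} treated")
```

Within each stratum the units have similar estimated probabilities of treatment, so treated and control units can be compared stratum by stratum.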
Matched Analysis
Using a model with 29 covariates to predict VHA use, we were able to
obtain an accuracy of 88 percent (receiver-operating-characteristic curve
0.88) and to match 2265 (91.1 percent) of the VHA patients to Medicare
patients. Before matching, 16 of the 29 covariates had a standardized
difference larger than 10 percent, whereas after matching, all
standardized differences were less than 5 percent.
Conclusions: VHA patients had more coexisting conditions than
Medicare patients. Nevertheless, we found no significant
difference in mortality between VHA and Medicare patients, a
result that suggests a similar quality of care for acute myocardial
infarction.
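The standardized differences quoted in the matched analysis are computed one covariate at a time. The excerpt does not define the measure, so this sketch assumes a commonly used definition: the difference in group means as a percentage of the pooled standard deviation. The ages are hypothetical, not data from the VHA/Medicare study.

```python
import math
import statistics

def standardized_difference(treated, control):
    """Difference in means as a percentage of the pooled standard deviation.

    ASSUMPTION: pooled SD = sqrt((var_treated + var_control) / 2).
    """
    pooled_sd = math.sqrt(
        (statistics.variance(treated) + statistics.variance(control)) / 2
    )
    return 100 * (statistics.fmean(treated) - statistics.fmean(control)) / pooled_sd

# Hypothetical ages for two small groups.
vha_ages = [62, 70, 68, 75, 66, 71]
medicare_ages = [72, 78, 69, 80, 74, 77]
print(round(standardized_difference(vha_ages, medicare_ages), 1))
```

A value whose magnitude exceeds 10 would count as imbalanced under the threshold mentioned in the matched analysis.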
What about hidden bias?
• Sensitivity analysis!
• Consider two units j and k with the same x.
hidden bias $\Rightarrow$ they may not have the same p
• Consider this inequality:
$$\frac{1}{\Gamma} \le \frac{p_j (1 - p_k)}{p_k (1 - p_j)} \le \Gamma$$
• Sensitivity analysis will consider various $\Gamma$'s
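As a worked illustration of what a bound on the odds ratio $p_j (1 - p_k) / (p_k (1 - p_j))$ means (the numbers are chosen only for illustration): take a bound of 2 and $p_k = 1/2$, so the odds of treatment for unit k are 1. The odds for unit j must then lie between 1/2 and 2:

```latex
\frac{1}{2} \le \frac{p_j}{1 - p_j} \le 2
\quad\Longleftrightarrow\quad
\frac{1}{3} \le p_j \le \frac{2}{3}
```

So two units that look identical on x could differ in their chance of treatment by as much as 1/3 versus 2/3.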
An equivalent latent variable model
$$\log\left(\frac{p_j}{1 - p_j}\right) = \kappa(x_j) + \gamma u_j, \qquad 0 \le u_j \le 1, \quad \gamma \ge 0$$
for two units j and k with the same x:
$$\frac{p_j (1 - p_k)}{p_k (1 - p_j)} = \exp\{\gamma (u_j - u_k)\}$$
where $u_j - u_k$ is between -1 and 1, so
$$\exp(-\gamma) \le \frac{p_j (1 - p_k)}{p_k (1 - p_j)} \le \exp(\gamma)$$
so the model implies the previous inequality with $\Gamma = \exp(\gamma)$
(implication goes the other way too)
Matched Pairs
• Strata of size 2, one gets the treatment, one doesn’t
$$\Pr(Z = z) = \prod_{s=1}^{S} \left[\frac{\exp(\gamma u_{s1})}{\exp(\gamma u_{s1}) + \exp(\gamma u_{s2})}\right]^{z_{s1}} \left[\frac{\exp(\gamma u_{s2})}{\exp(\gamma u_{s1}) + \exp(\gamma u_{s2})}\right]^{z_{s2}}$$
• If $\gamma = 0$, every unit has the same chance of treatment
• Standard test statistic for matched pairs is the Wilcoxon signed rank statistic:
$$T = t(Z, r) = \sum_{s=1}^{S} d_s \sum_{i=1}^{2} c_{si} Z_{si}$$
where $d_s$ = rank of $|r_{s1} - r_{s2}|$, $c_{s1} = 1$ if $r_{s1} > r_{s2}$ and 0 otherwise, and $c_{s2} = 1$ if $r_{s2} > r_{s1}$ and 0 otherwise
T = sum of the ranks for pairs in which treated unit > control unit
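A minimal sketch of computing this statistic from paired responses. The function name and the data are hypothetical, and the sketch assumes no tied or zero differences so that the ranks are unambiguous.

```python
def signed_rank_statistic(treated, control):
    """Sum of the ranks of |difference| over pairs where treated > control."""
    diffs = [t - c for t, c in zip(treated, control)]
    # Rank the absolute differences from 1 (smallest) upward.
    # ASSUMPTION: no ties and no zero differences, to keep the sketch short.
    order = sorted(range(len(diffs)), key=lambda s: abs(diffs[s]))
    d = [0] * len(diffs)
    for rank, s in enumerate(order, start=1):
        d[s] = rank
    return sum(d_s for d_s, diff in zip(d, diffs) if diff > 0)

# Hypothetical responses for five matched pairs.
treated_r = [12.0, 9.5, 14.0, 8.0, 11.0]
control_r = [10.0, 10.0, 9.0, 7.0, 7.5]
T = signed_rank_statistic(treated_r, control_r)
print(T)  # 14
```

Here the only pair in which the control unit has the larger response also has the smallest absolute difference (rank 1), so T is 15 - 1 = 14.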
More on Matched Pairs
$$T = t(Z, r) = \sum_{s=1}^{S} d_s \sum_{i=1}^{2} c_{si} Z_{si}$$
• No hidden bias => know the null distribution of T because the sth pair contributes $d_s$ with prob ½ and 0 with prob ½
• with hidden bias, the sth pair contributes $d_s$ with prob:
$$p_s = \frac{c_{s1} \exp(\gamma u_{s1}) + c_{s2} \exp(\gamma u_{s2})}{\exp(\gamma u_{s1}) + \exp(\gamma u_{s2})}$$
and zero with prob $1 - p_s$
• so null distribution of T is unknown…
Even More on Matched Pairs
• easy to see that:
$$p_s^- = \frac{1}{1 + \Gamma} \le p_s \le \frac{\Gamma}{1 + \Gamma} = p_s^+$$
• The P-value we are after is $\Pr(T \ge T_{obs})$
• Lower bound on P-value: $\Pr(T^- \ge T_{obs})$ where $T^-$ is the sum of S quantities, the sth one being $d_s$ with prob $p_s^-$ and 0 otherwise
• Upper bound likewise using $p_s^+$
• This directly provides bounds on P-values for fixed $\Gamma$
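The bounds can be computed exactly for a small number of pairs by building the distribution of the sum one pair at a time. This is a sketch under stated assumptions: the ranks and observed statistic are hypothetical, there are no tied pairs, and the function names are illustrative. A large study would normally use a normal approximation instead.

```python
def upper_tail(ranks, p, t_obs):
    """Pr(sum >= t_obs) when pair s adds ranks[s] with probability p, else 0."""
    dist = {0: 1.0}  # exact distribution of the running sum
    for d_s in ranks:
        new = {}
        for total, prob in dist.items():
            new[total] = new.get(total, 0.0) + prob * (1 - p)
            new[total + d_s] = new.get(total + d_s, 0.0) + prob * p
        dist = new
    return sum(prob for total, prob in dist.items() if total >= t_obs)

def p_value_bounds(ranks, t_obs, Gamma):
    """Lower and upper bounds on Pr(T >= t_obs) for a given Gamma >= 1."""
    p_minus = 1 / (1 + Gamma)
    p_plus = Gamma / (1 + Gamma)
    return upper_tail(ranks, p_minus, t_obs), upper_tail(ranks, p_plus, t_obs)

# Hypothetical example: 10 pairs with ranks 1..10 and observed T = 50.
ranks = list(range(1, 11))
for Gamma in (1, 2, 3):
    low, high = p_value_bounds(ranks, 50, Gamma)
    print(f"Gamma={Gamma}: {low:.4f} <= P <= {high:.4f}")
```

At Gamma = 1 the two bounds coincide with the usual randomization P-value; as Gamma grows the interval widens, and the value at which the upper bound crosses the significance level measures how sensitive the conclusion is to hidden bias.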
Smoking & Lung Cancer Example
• Hammond (1964) paired 36,975 heavy smokers to non-smokers. Matched on age, race, plus 16 other factors

$\Gamma$    Minimum P-value    Maximum P-value
1           < 0.0001           < 0.0001
2           < 0.0001           < 0.0001
3           < 0.0001           < 0.0001
4           < 0.0001           0.0036
5           < 0.0001           0.03
6           < 0.0001           0.1
Asthma Study
• Need a $\Gamma$ of three to make the effect of tobacco experimentation on asthma become non-significant