Slides

A Constrained Instrumental Variable Approach,
and its Application to Mendelian
Randomization with Pleiotropy
Lai Jiang (Lady Davis Institute & McGill),
Karim Oualkacha (UQAM),
Celia Greenwood (Lady Davis Institute & McGill)
Montreal Causal Inference Workshop, July 2016
Outline
1. The problem
2. Solutions and estimators
3. Simulations – simple example
4. Simulations – an example borrowed from reality
5. Finally …
The Problem
Instrumental Variables
We are interested in the causal effect of X (phenotype) on Y (disease)
U: unobserved variable that may confound the association between X and Y
G: a variable (genotype) we want to use as an instrumental variable
[Figure: DAG over G, X, Y and the unobserved confounder U]
Core Conditions for Instrumental Variable analysis
[Figure: DAG over G, X, Y and the unobserved confounder U]
1) G independent of U
2) G not marginally independent of X
3) G and Y are independent conditional on X and U
Potential violations of assumptions in MR
• Linkage disequilibrium (Condition 3 or Condition 1)
• Genetic heterogeneity (weak IV)
• Population stratification (Condition 3)
[Figure: three DAG panels, one per violation, over the nodes G1, G2, G3, P, U, X, Y]
Pleiotropy
A SNP (G) is associated with another intermediate phenotype (Z, other than X)
which also has an effect on the disease Y.
[Figure: two DAG panels, (a) and (b), over G, Z, U, X, Y]
Condition (3) – Y and G independent given X and U – is violated in (a).
Condition (1) – G and U independent – is violated in (b).
(Original) Problem
Estimators
Causal effect estimation:
IV assumptions satisfied
One SNP
[Figure: DAG over G, X, Y, U with path coefficients δ, β, η, ρ]
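A minimal worked form of the one-SNP estimator, assuming a linear model in which δ is the G → X coefficient, η and ρ are the effects of U on X and Y, and β is the X → Y effect. The assignment of δ, η and ρ to edges is an assumption read off the figure labels. Under the three core conditions, the standard ratio (Wald) estimator recovers β:

```latex
X = \delta G + \eta U + \varepsilon_X, \qquad
Y = \beta X + \rho U + \varepsilon_Y, \qquad
G \perp (U, \varepsilon_X, \varepsilon_Y)
\\[4pt]
\frac{\operatorname{Cov}(G, Y)}{\operatorname{Cov}(G, X)}
  = \frac{\beta\,\delta\,\operatorname{Var}(G)}{\delta\,\operatorname{Var}(G)}
  = \beta
```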
In the presence of pleiotropy
[Figure: DAG over G, Z, U, X, Y with path coefficients ω, δ, ξ, η, β, ρ]
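A sketch of why pleiotropy biases the single-SNP ratio estimator, assuming a linear model in which ω is the G → Z coefficient, ξ is the Z → Y effect, δ is the G → X coefficient and β is the X → Y effect. These edge assignments are assumptions suggested by the figure labels, and G is taken to be independent of U and of all error terms:

```latex
Z = \omega G + \varepsilon_Z, \qquad
X = \delta G + \eta U + \varepsilon_X, \qquad
Y = \beta X + \xi Z + \rho U + \varepsilon_Y
\\[4pt]
\frac{\operatorname{Cov}(G, Y)}{\operatorname{Cov}(G, X)}
  = \frac{(\beta\,\delta + \xi\,\omega)\,\operatorname{Var}(G)}{\delta\,\operatorname{Var}(G)}
  = \beta + \frac{\xi\,\omega}{\delta}
```

Under these assumptions the bias term ξω/δ vanishes only when ξ = 0 or ω = 0, and it grows as the instrument weakens (small δ).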
In the presence of pleiotropy
[Figure: DAG over G, Z, U, X, Y]
Potential solutions when there is pleiotropy
1. Naïve method (TSLS using all SNPs) {Biased} [Baum 2003]
2. Adjust G for Z, use G_res as IV {Unbiased}
3. Limited Information Maximum Likelihood (LIML) {Biased} [Burgess 2013]
4. Stepwise selection methods: select a subset of G {Inconsistent} [Bai 2008]
5. Inverse probability weights [Cole 2008]
6. Constrained instrumental variables
1. Naïve method in practice (#1, #2)
1. Select SNPs strongly associated with the phenotype of interest X
2. Remove SNPs strongly associated with pleiotropic phenotypes Z
3. Remove weak SNPs to avoid overfitting & bias
4. If the resulting instrumental variable G still has a strong pleiotropic effect, consider regressing G on Z and using G_res instead of G (i.e. Method #2)
5. TSLS model fitting using G (or G_res) -> X -> Y
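The last two steps can be sketched in a few lines. The following is a minimal illustration, not the implementation used for the talk: the function names `tsls` and `residualize`, the simulated effect sizes, and the sample size are all hypothetical choices, and Z is simulated as independent of U.

```python
import numpy as np

def tsls(G, X, Y):
    """Two-stage least squares: instruments G (n x p), exposure X, outcome Y."""
    Gc = G - G.mean(axis=0)
    Xc = X - X.mean()
    Yc = Y - Y.mean()
    # Stage 1: fitted exposure from the regression of X on G
    coef, *_ = np.linalg.lstsq(Gc, Xc, rcond=None)
    X_hat = Gc @ coef
    # Stage 2: slope of Y on the fitted exposure
    return float(X_hat @ Yc / (X_hat @ Xc))

def residualize(G, Z):
    """Return G_res: residuals of each column of G regressed on Z (with intercept)."""
    Z1 = np.column_stack([np.ones(len(G)), Z])
    coef, *_ = np.linalg.lstsq(Z1, G, rcond=None)
    return G - Z1 @ coef

# Hypothetical simulation: 5 SNPs affect X, the first 2 also affect Z
rng = np.random.default_rng(0)
n, p = 5000, 5
beta, xi = 1.0, 1.0                      # assumed X -> Y and Z -> Y effects
G = rng.binomial(2, 0.3, size=(n, p)).astype(float)
U = rng.normal(size=n)                   # unobserved confounder
Z = 0.5 * G[:, :2].sum(axis=1) + rng.normal(size=n)
X = 0.5 * G.sum(axis=1) + U + rng.normal(size=n)
Y = beta * X + xi * Z + U + rng.normal(size=n)

beta_naive = tsls(G, X, Y)                   # method #1
beta_gres = tsls(residualize(G, Z), X, Y)    # method #2
print(f"naive TSLS: {beta_naive:.3f}, G_res TSLS: {beta_gres:.3f}")
```

In this setup the naïve estimate absorbs part of the Z → Y pathway, while the fitted exposure built from G_res is orthogonal to Z in the sample, so the pleiotropic term drops out of the second stage.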
3. LIML
Limited Information Maximum Likelihood uses instruments to rectify bias in a regression Y ~ X, arising when X is correlated with the residuals
  • i.e. pleiotropy leads to correlation between X and Z
LIML takes into account the covariances of the errors
More stable under weak instrumental variable bias
  • Z is not used
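For reference, LIML with a single exposure and no covariates can be written as a k-class estimator. The sketch below uses that textbook form under the assumption of centered data and valid instruments; the function name `liml` and the simulated data are hypothetical, and this is not necessarily the implementation compared in the talk.

```python
import numpy as np

def liml(G, X, Y):
    """LIML as a k-class estimator for one exposure X and instruments G (n x p)."""
    Gc = G - G.mean(axis=0)
    W = np.column_stack([Y - Y.mean(), X - X.mean()])
    # Residuals of [Y, X] after projecting onto the instruments (M_G W)
    coef, *_ = np.linalg.lstsq(Gc, W, rcond=None)
    W_res = W - Gc @ coef
    # kappa: smallest eigenvalue of (W' M_G W)^{-1} (W' W)
    A = np.linalg.solve(W_res.T @ W_res, W.T @ W)
    kappa = float(np.min(np.linalg.eigvals(A).real))
    y, x = W[:, 0], W[:, 1]
    y_res, x_res = W_res[:, 0], W_res[:, 1]
    # k-class form: (x'(I - kappa M_G) x)^{-1} x'(I - kappa M_G) y
    num = x @ y - kappa * (x_res @ y_res)
    den = x @ x - kappa * (x_res @ x_res)
    return float(num / den), kappa

# Hypothetical data with valid instruments (no pleiotropy)
rng = np.random.default_rng(0)
n, p = 5000, 5
G = rng.binomial(2, 0.3, size=(n, p)).astype(float)
U = rng.normal(size=n)
X = 0.5 * G.sum(axis=1) + U + rng.normal(size=n)
Y = 1.0 * X + U + rng.normal(size=n)

beta_liml, kappa = liml(G, X, Y)
print(f"LIML estimate: {beta_liml:.3f}, kappa: {kappa:.4f}")
```

Setting kappa to 1 in the same formula gives TSLS, which is one way to see how the two estimators differ only in how strongly they correct for the first-stage residuals.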
4. Stepwise Selection Methods.
Implementation details are crucial here
Step-by-step methods (such as (Bai, 2008)):
1. Specific criteria (significance thresholds) for selection and deletion?
2. How to summarize the information in the final set S? PCA? PLS? Etc…
3. Stop strategy?
4. Starting S (previously published sources)?
5. Backward? Forward? Bidirectional?
[Figure: flowchart. Select new features into S with respect to X | G; delete features in S with respect to Z | G; G: construct a new IV based on a subset S]
5. Inverse Probability Weight Adjusted Regression (IPWAR)
Constrained Instrumental Variable Method
Constrained Instrumental Variable
[Figure: DAG over G, Z, U, X, Y]
Constrained Instrumental Variable
• When $p < n$ there is an exact solution
• In fact there are $p - k$ orthogonal solutions
• $G_{res} = (I_n - P_Z)\,G$, the projection of $G$ onto the subspace orthogonal to the columns of $Z$
  • We search for a direction in $\mathrm{Span}(G)$, $G^*$, most informative for $X$, orthogonal to $Z$
  • $G^*$ belongs to $\mathrm{Span}(G_{res})$
  • Complement is not informative for $X$
  • $\mathrm{Span}(G_{res}) = \mathrm{Span}(G^*) + \mathrm{Span}(\text{complement of } G^* \text{ in } G_{res} \text{ subspace})$
• Hence, CIV has identical performance to straightforward $G_{res}$
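One way to picture the search for a direction in Span(G) that is orthogonal to Z and most informative for X is the following sketch. It assumes that k is the number of columns of Z, that "orthogonal to Z" means Z'Gc = 0 for the weight vector c, and that "most informative" means the least-squares fit of X within the feasible subspace; the function name `civ_direction` and the simulated data are hypothetical.

```python
import numpy as np

def civ_direction(G, X, Z, tol=1e-10):
    """Weights c and unit-norm direction G* = G c with Z'G c = 0, fitted to X."""
    Gc = G - G.mean(axis=0)
    Xc = X - X.mean()
    Zc = Z - Z.mean(axis=0)
    # Feasible weights lie in the null space of the k x p matrix Z'G
    M = Zc.T @ Gc
    _, s, Vt = np.linalg.svd(M, full_matrices=True)
    rank = int(np.sum(s > tol * s.max()))
    N = Vt[rank:].T                      # p x (p - rank) orthonormal basis
    # Least-squares fit of X on the feasible directions G N
    a, *_ = np.linalg.lstsq(Gc @ N, Xc, rcond=None)
    c = N @ a
    g_star = Gc @ c
    scale = np.linalg.norm(g_star)
    return c / scale, g_star / scale

# Hypothetical data: p = 10 SNPs, k = 2 pleiotropic phenotypes
rng = np.random.default_rng(1)
n, p = 200, 10
G = rng.binomial(2, 0.3, size=(n, p)).astype(float)
Z = 0.5 * G[:, :2] + rng.normal(size=(n, 2))
X = 0.5 * G[:, :5].sum(axis=1) + rng.normal(size=n)

c, g_star = civ_direction(G, X, Z)
Zc = Z - Z.mean(axis=0)
print("max |Z' G*| =", np.abs(Zc.T @ g_star).max())
```

With Z'G of full row rank, the null-space basis has p − k columns, which matches the count of orthogonal solutions stated above.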
Constrained Instrumental Variable
What about when $p > n$?
Penalized Constrained Instrumental Variable
Penalization
L0 penalty
LASSO penalty
Non-convex Penalties
…...
Which penalty?
• LASSO (L1) penalty does not always yield a sparse solution automatically for this problem.
• Neither do non-convex penalties (e.g. SCAD)
L0 penalty
• L0 penalty: $\|u\|_0 = \sum_j |u_j|^0 < \lambda$ directly enforces sparsity
• However, greedy search for optimal $u$, $\lambda$ is NP-hard in computational complexity
• Alternatively, consider smoothed approximate L0 penalties as $\sigma \to 0$:
  $f_\sigma(x) = \exp\left(-\frac{x^2}{2\sigma^2}\right) \to \begin{cases} 1, & x = 0 \\ 0, & x \neq 0 \end{cases}$
  So $\|u\|_0 \approx p - \sum_j f_\sigma(u_j)$
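The approximation can be checked numerically. This is a small sketch in which the function name `smoothed_l0` and the example vector are illustrative choices only:

```python
import numpy as np

def smoothed_l0(u, sigma):
    """Smoothed L0 norm: p - sum_j exp(-u_j^2 / (2 sigma^2))."""
    u = np.asarray(u, dtype=float)
    return float(u.size - np.sum(np.exp(-u**2 / (2.0 * sigma**2))))

u = np.array([0.0, 0.0, 0.5, -1.2, 0.0, 2.0])   # three nonzero entries
for sigma in (1.0, 0.1, 0.01):
    print(f"sigma = {sigma}: smoothed L0 = {smoothed_l0(u, sigma):.4f}")
```

For large σ the penalty underestimates the number of nonzero entries, and it approaches the exact count as σ shrinks, which is why σ is decreased gradually rather than set to a tiny value at the start.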
Implementation
Two methods:
1. If $p < n$, linear algebra solution
2. If $p > n$, we propose a numerical solution for penalized CIV
   Start from an initial guess $u$, and an initial penalty $\sigma = \max |u|$
   Iteratively decrease $\sigma \to \sigma_{min}$ along a chosen sequence
   Cross validation to choose $\lambda$ for each $\sigma$ based on MSE of $(Y \mid G, u)$
3. Bootstrap for standard errors
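The bootstrap step can be sketched generically. The example below assumes a nonparametric bootstrap over individuals with a percentile interval, which may differ from the scheme used for the talk; `ratio_estimate` is a simple single-instrument stand-in for the CIV estimator, and all names and simulated values are hypothetical.

```python
import numpy as np

def ratio_estimate(g, x, y):
    """Stand-in estimator: single-instrument ratio cov(g, y) / cov(g, x)."""
    gc = g - g.mean()
    return float(gc @ (y - y.mean()) / (gc @ (x - x.mean())))

def bootstrap_se(estimator, g, x, y, n_boot=500, seed=0):
    """Resample individuals with replacement; return SE and a 95% percentile CI."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)     # bootstrap sample of row indices
        draws[b] = estimator(g[idx], x[idx], y[idx])
    lo, hi = np.percentile(draws, [2.5, 97.5])
    return float(draws.std(ddof=1)), (float(lo), float(hi))

# Hypothetical data with one valid instrument
rng = np.random.default_rng(2)
n = 1000
g = rng.binomial(2, 0.3, size=n).astype(float)
u = rng.normal(size=n)
x = 0.8 * g + u + rng.normal(size=n)
y = 1.0 * x + u + rng.normal(size=n)

est = ratio_estimate(g, x, y)
se, (lo, hi) = bootstrap_se(ratio_estimate, g, x, y)
print(f"estimate {est:.3f}, bootstrap SE {se:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
```

Any estimator with the same `(g, x, y)` signature can be passed in place of the stand-in, so the instrument construction is repeated inside every resample.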
Simulations
Methods compared
• Naive TSLS
• Adjusting each G for Z, then TSLS
• LIML
• Stepwise Selection Methods (forward, backward)
• Inverse Probability Weight Adjusted Regression (IPWAR)
• CCA/ sparseCCA
• CIV (Constrained Instrumental Variable)
• Smoothed CIV
Bias as a function of 𝜉
• N = 200
• 20 SNPs
• 5 SNPs associated with X. Among these 5: 2 SNPs also associated with Z
• β: effect X → Y
[Figure: bias as a function of ξ, with panels for Conditional TSLS / G residuals; CCA, CCA_LASSO; LIML; Backward / Forward; IPWAR; CIV / Smoothed CIV]
Bias and standard error
Method        Estimate (ξ=1)   Mean square error (ξ=1)   Estimate (ξ=5)   Mean square error (ξ=5)
G residual    1.09             0.01                      1.08             0.01
IPWAR         2.00             1.21                      15.30            2.20
CIV           1.09             0.01                      1.09             0.02
CIV smooth    1.01             0.01                      1.05             0.01
SNPs selected over simulations
Method         SNP1   SNP2   SNP3   SNP4   SNP5   Others
CIV            0.76   0.76   1.00   1.0    1.0    ~0.2
Smoothed CIV   0.00   0.00   0.29   0.62   0.29   0.0
Note: Applied hard threshold: 0.1*max(estimate)
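The thresholding rule in the note can be sketched as follows. This assumes that the threshold is applied to absolute weights and that a SNP counts as selected when it meets or exceeds the threshold; both are assumptions, as are the function names and the toy weights.

```python
import numpy as np

def select_snps(weights, frac=0.1):
    """Boolean mask of SNPs whose |weight| is at least frac * max |weight|."""
    w = np.abs(np.asarray(weights, dtype=float))
    return w >= frac * w.max()

def selection_frequency(weight_matrix, frac=0.1):
    """Proportion of simulation replicates in which each SNP is selected."""
    masks = np.array([select_snps(w, frac) for w in weight_matrix])
    return masks.mean(axis=0)

# Toy weights from two hypothetical simulation replicates (4 SNPs each)
weight_matrix = [
    [0.05, -0.5, 1.0, 0.09],
    [0.30,  0.0, 1.0, 0.02],
]
freq = selection_frequency(weight_matrix)
print(freq)
```

Averaging the masks over replicates gives per-SNP selection proportions of the kind reported in the table.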
[Figure: SNPs selected, with panels for CIV and Smoothed CIV]
Contrived Example
LDL-HDL-CHD
HDL/ LDL Simulation
Method        Bias      Bootstrap variance   Lower_CI   Upper_CI
Naive         -0.090    0.025                -0.145     -0.034
G residuals    0.004    0.027                -0.050      0.059
LIML          -0.112    0.027                -0.169     -0.066
IPWAR          0.066    0.025                 0.011      0.121
Backward      -0.008    0.042                -0.094      0.079
Forward        0.006    0.055                -0.097      0.109
CCA_LASSO     -0.090    0.030                -0.147     -0.033
CCA           -0.090    0.028                -0.147     -0.032
CIV            0.002    0.031                -0.052      0.057
TRUE          -0.0002   0.031                -0.067      0.066
Finally…
Comments
• The smoothed CIV selects SNPs associated with X and not Z in a non-greedy search for the optimal subset
  • G_res, CCA etc. select SNPs differently and will keep overlapping SNPs
• For $p > n$ or $p$ large
  • CIV will work
  • G_residuals will not work or will become unstable
• $p > n$ is a very unlikely Mendelian randomization scenario
Questions
• Other applications of the smoothed L0 penalty?
• We create one instrument
  • What about creating several instruments through the same kinds of projections (p − k)?
• Links to
  • Targeted learning
  • Multiple invalid instruments (averages of weak instruments)
(New) problem
• Computationally efficient variable selection subject to linear constraints
Thank you!
Acknowledgements:
CIHR
Ludmer Centre for Neuroinformatics and Mental Health
Brent Richards & his group