slide - KOCSEA

IDENTIFYING CAUSAL GENES AND
DYSREGULATED PATHWAYS IN
COMPLEX DISEASES
YOO-AH KIM
NIH / NLM / NCBI
Nov. 6th, 2010
Complex Diseases

Associated with the effects of multiple genes
• As opposed to single-gene diseases
The combination of genomic alterations may vary strongly among different patients
• While dysregulating the same components, thus often leading to the same disease phenotype
• Difficult to study and treat
Cancer, heart disease, diabetes, etc.
Copy Number Variations
Two copies of each gene are generally assumed to be present in a genome
• Genomic regions may be deleted or duplicated, causing copy number variations (CNVs)
• Some CNVs are associated with susceptibility or resistance to diseases such as cancer
[Figure: Copy number variations in 158 glioblastoma patients]
Identifying Genomic Causes in
Complex Diseases
Identify genotypic causes in individual patients as well as dysregulated pathways
Systems biology approach
Genome-wide search
Graph theoretic algorithms
• Circuit flow
• Set cover
158 Glioblastoma multiforme patients
• Glioblastoma multiforme (GBM): the most common and most aggressive type of primary brain tumor in humans

Expression as Quantitative Trait
Genotype: copy number variations
Phenotype: gene expression
eQTL (expression Quantitative Trait Loci) analysis
• Putative causal gene/loci and putative target gene
• While we assume that the genetic variation is the cause and expression change is the effect, we don’t know molecular pathways behind the relation
Method Outline
A. Target gene selection
B. eQTL
• Find association between expression and copy number
C. Circuit flow algorithm
• Molecular interactions: protein-protein, TF-DNA, phosphorylation events
• Candidate causal genes
D. Causal gene selection
• Weighted multiset cover
[Figure: method overview. Inputs: gene expression (target genes g1, g2, g3, ..., gm by cases) and tag loci (tag SNPs s1, s2, s3, s4, ..., sn by cases). Output: causal genes by cases.]
Target Gene Selection

Select a representative set of disease genes
• Filter differentially expressed genes for each case
• Multi-set cover
[Figure: gene expression of Gene 1, Gene 2, Gene 3, ... across controls and disease cases]
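The per-case filtering step can be sketched as follows. This is only a minimal illustration: the slides do not state the criterion for "differentially expressed", so the z-score rule against controls, the cutoff of 2, the function name, and the toy data are all assumptions.

```python
from statistics import mean, stdev

def differentially_expressed(controls, cases, z_cutoff=2.0):
    """Return {case index: set of genes} flagged as differentially expressed.

    controls, cases: dict gene -> list of expression values (one per sample).
    A gene is flagged in a case when its value lies more than z_cutoff
    control standard deviations from the control mean (assumed rule).
    """
    n_cases = len(next(iter(cases.values())))
    flagged = {i: set() for i in range(n_cases)}
    for gene, ctrl_values in controls.items():
        mu, sd = mean(ctrl_values), stdev(ctrl_values)
        if sd == 0:
            continue  # no variation among controls, z-score undefined
        for i, value in enumerate(cases[gene]):
            if abs(value - mu) / sd > z_cutoff:
                flagged[i].add(gene)
    return flagged

# Hypothetical toy data: two genes, four controls, two disease cases.
controls = {"gene1": [1.0, 1.2, 0.8, 1.1], "gene2": [5.0, 5.1, 4.9, 5.0]}
cases = {"gene1": [3.0, 1.0], "gene2": [5.0, 2.0]}
de = differentially_expressed(controls, cases)
```

In this toy input, `gene1` is flagged only in the first case and `gene2` only in the second, which gives the per-case gene sets that a subsequent multi-set cover would choose from.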
eQTL

Associations between the expression of target genes and copy number variations of genomic loci
• Linear regression
• For every pair of tag loci and target genes
[Figure: tag loci by cases and target genes by cases matrices]
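The pairwise regression described above can be sketched in plain Python. This is a hedged illustration, not the analysis actually run: the slides do not say how significance was assessed, so the correlation threshold `min_abs_r`, the function names, and the data are assumptions.

```python
from math import sqrt

def linear_fit(x, y):
    """Least-squares fit y = slope * x + intercept; also returns Pearson r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0, my, 0.0  # constant input, no association can be estimated
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / sqrt(sxx * syy)

def eqtl_associations(copy_number, expression, min_abs_r=0.9):
    """Regress every target gene's expression on every tag locus's copy number.

    copy_number: dict locus -> copy number per case
    expression:  dict gene  -> expression per case (same case order)
    Keeps pairs with |r| >= min_abs_r (an assumed, arbitrary threshold).
    """
    hits = []
    for locus, cn in copy_number.items():
        for gene, expr in expression.items():
            slope, _, r = linear_fit(cn, expr)
            if abs(r) >= min_abs_r:
                hits.append((locus, gene, slope, r))
    return hits

# Hypothetical data for five cases.
copy_number = {"locus1": [2, 2, 3, 4, 1], "locus2": [2, 3, 2, 2, 2]}
expression = {"geneA": [1.0, 1.1, 1.9, 3.0, 0.2],
              "geneB": [0.5, 0.4, 0.6, 0.5, 0.5]}
hits = eqtl_associations(copy_number, expression)
```

Only the `locus1`/`geneA` pair passes the threshold here. On real data the number of locus-gene pairs is large, so some form of multiple-testing control would normally replace the fixed cutoff.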
Finding Candidate Causal Genes
Which candidate genes (C1 to C5) in the regions of genotypic variation affect the target genes?
Interaction network built from
• protein-protein interactions
• phosphorylation events
• transcription factor interactions
[Figure: genotypic variations, candidate genes C1 to C5, interaction network, and target genes, shown as stacked layers]
Resistance(u, v) is set to be inversely proportional to
(|corr(expr(u), expr(D))| + |corr(expr(v), expr(D))|)/2
[Figure: current flow (+) through the interaction network between candidate genes C1 to C5 and target genes, with an edge (u, v) highlighted]
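The resistance rule can be written out as a short sketch. It assumes that D denotes the target (disease) gene, that the proportionality constant is 1, and that a small `eps` guards against division by zero; none of these details are given on the slide, and the gene names are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sqrt(sxx * syy)

def edge_resistance(expr, u, v, target, eps=1e-9):
    """Resistance of edge (u, v): inverse of the mean absolute correlation
    of u and v with the target gene (constant of proportionality assumed 1)."""
    conductance = (abs(pearson(expr[u], expr[target])) +
                   abs(pearson(expr[v], expr[target]))) / 2
    return 1.0 / max(conductance, eps)

# Hypothetical expression profiles over four cases.
expr = {
    "D": [1, 2, 3, 4],    # target gene
    "u": [2, 4, 6, 8],    # perfectly correlated with D
    "v": [4, 3, 2, 1],    # perfectly anti-correlated with D
    "w": [1, -1, -1, 1],  # uncorrelated with D
}
r_uv = edge_resistance(expr, "u", "v", "D")
r_uw = edge_resistance(expr, "u", "w", "D")
```

Edges whose endpoints both track the target gene's expression get low resistance, so current preferentially flows through them; the edge to the uncorrelated gene `w` gets twice the resistance of the edge between `u` and `v`.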
Compute the amount of current entering each causal gene by solving a system of linear equations
[Figure: current flow (+ to -) through the interaction network between target genes and candidate genes C1 to C5]
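A minimal sketch of the linear system follows, under stated assumptions: the target gene is held at potential 1, every candidate causal gene is grounded at potential 0, and Kirchhoff's current law is solved for the remaining nodes. The slides do not spell out the boundary conditions, so this wiring, the function names, and the toy network are illustrative only.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def current_into_causal_genes(resistance, target, causal_genes):
    """resistance: dict {(u, v): value} of undirected edges in a connected network.
    Returns dict causal gene -> total current entering it."""
    nodes = sorted({n for edge in resistance for n in edge})
    fixed = {target: 1.0, **{c: 0.0 for c in causal_genes}}
    free = [n for n in nodes if n not in fixed]
    index = {n: i for i, n in enumerate(free)}
    a = [[0.0] * len(free) for _ in free]
    b = [0.0] * len(free)
    # Kirchhoff's current law at each free node: sum of g * (V_p - V_q) = 0.
    for (u, v), r in resistance.items():
        g = 1.0 / r
        for p, q in ((u, v), (v, u)):
            if p in index:
                a[index[p]][index[p]] += g
                if q in index:
                    a[index[p]][index[q]] -= g
                else:
                    b[index[p]] += g * fixed[q]
    potential = dict(fixed)
    potential.update(zip(free, solve_linear(a, b)))
    current = {c: 0.0 for c in causal_genes}
    for (u, v), r in resistance.items():
        for p, q in ((u, v), (v, u)):
            if q in current:
                current[q] += (potential[p] - potential[q]) / r
    return current

# Hypothetical network: target T -- x, then x -- C1 and x -- C2.
resistance = {("T", "x"): 1.0, ("x", "C1"): 1.0, ("x", "C2"): 2.0}
current = current_into_causal_genes(resistance, "T", ["C1", "C2"])
```

In this toy network the intermediate node settles at potential 0.4, so C1 receives current 0.4 and C2 receives 0.2. For genome-scale networks a sparse solver would be the practical choice; the dense elimination here only keeps the sketch dependency-free.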
Final Causal Gene Selection
A putative causal gene explains a disease case if
• its corresponding tag locus has a copy number alteration
• its affected target genes (i.e., genes sending a significant amount of current to the causal gene) are differentially expressed in the disease case
[Figure: causal genes by cases matrix]
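The two conditions can be expressed as a small set computation. This sketch assumes a case counts as explained when at least `min_targets` of the gene's affected targets are differentially expressed in it; the slides do not say whether all or only some targets are required, and the data structures and names are hypothetical.

```python
def explained_cases(gene, tag_locus, altered_cases, affected_targets,
                    diff_expressed, min_targets=1):
    """Return the set of disease cases that `gene` explains.

    altered_cases:    tag locus -> set of cases with a copy number alteration
    affected_targets: causal gene -> target genes sending it significant current
    diff_expressed:   case -> set of genes differentially expressed in that case
    min_targets:      assumed threshold on differentially expressed targets
    """
    targets = affected_targets.get(gene, set())
    explained = set()
    for case in altered_cases.get(tag_locus, set()):
        hits = targets & diff_expressed.get(case, set())
        if len(hits) >= min_targets:
            explained.add(case)
    return explained

# Hypothetical toy input.
altered_cases = {"locusA": {"p1", "p2"}}
affected_targets = {"geneA": {"t1", "t2"}}
diff_expressed = {"p1": {"t1"}, "p2": {"t3"}, "p3": {"t1", "t2"}}
covered = explained_cases("geneA", "locusA", altered_cases,
                          affected_targets, diff_expressed)
```

Here only case `p1` is explained: `p2` has the alteration but no affected target is differentially expressed, and `p3` has the expression change but no alteration at the tag locus.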
Final Causal Gene Selection

Find a smallest set of genes covering (almost) all cases at least k’ times
• Minimum weighted multi-set cover
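Minimum weighted multi-set cover is NP-hard in general, so a greedy approximation is a common way to approach it. The sketch below is one such greedy heuristic, not necessarily the procedure used in this work: the cost function, the `max_uncovered` allowance for "(almost) all cases", and all names are assumptions.

```python
def greedy_multiset_cover(covers, cost, k=2, max_uncovered=0):
    """Greedily pick genes until all but max_uncovered cases are covered k times.

    covers: gene -> set of cases the gene explains
    cost:   gene -> positive cost (lower cost means more preferred)
    """
    cases = set().union(*covers.values())
    need = {c: k for c in cases}  # remaining coverage demand per case
    remaining = dict(covers)
    chosen = []
    while sum(1 for n in need.values() if n > 0) > max_uncovered and remaining:
        def gain(g):
            # Still-needed coverage contributed per unit cost.
            return sum(1 for c in remaining[g] if need[c] > 0) / cost[g]
        best = max(remaining, key=gain)
        if gain(best) == 0:
            break  # no remaining gene helps; full k-fold coverage is impossible
        for c in remaining.pop(best):
            if need[c] > 0:
                need[c] -= 1
        chosen.append(best)
    return chosen

# Hypothetical toy input: four cases, each to be covered twice.
covers = {
    "g1": {"p1", "p2", "p3"},
    "g2": {"p1", "p2"},
    "g3": {"p3", "p4"},
    "g4": {"p1", "p2", "p3", "p4"},
}
cost = {"g1": 1.0, "g2": 1.0, "g3": 1.0, "g4": 1.5}
chosen = greedy_multiset_cover(covers, cost, k=2)
```

With these inputs the heuristic picks `g1`, then `g4`, then `g3`, after which every case is covered twice. Each gene is used at most once, which matches selecting a set of genes rather than a multiset of genes.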
Dysregulated Pathways

Causal paths between a target and a causal gene
• A maximum current path
[Figure: interaction network with candidate genes C1 to C5 and a highlighted path]
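One simple reading of "a maximum current path" is a greedy walk: start at the causal gene and repeatedly step back along the incoming edge that carries the most current until the target gene is reached. The slides do not define the term precisely, so this backward greedy rule, the function name, and the example currents are assumptions.

```python
def max_current_path(current, target, causal_gene):
    """Trace a path from target to causal_gene along high-current edges.

    current: dict {(u, v): amount} meaning `amount` of current flows from u to v.
    Walks backward from the causal gene, always taking the incoming edge with
    the largest current. Returns None if the target cannot be reached.
    """
    incoming = {}
    for (u, v), amount in current.items():
        if amount > 0:
            incoming.setdefault(v, []).append((amount, u))
    path, node = [causal_gene], causal_gene
    while node != target:
        # The length guard protects against malformed input containing cycles.
        if node not in incoming or len(path) > len(current):
            return None
        _, node = max(incoming[node])
        path.append(node)
    return path[::-1]

# Hypothetical edge currents from target T through a, b to causal genes C1, C2.
current = {
    ("T", "a"): 0.6, ("T", "b"): 0.4,
    ("a", "C1"): 0.5, ("a", "b"): 0.1,
    ("b", "C1"): 0.2, ("b", "C2"): 0.3,
}
path_c1 = max_current_path(current, "T", "C1")
path_c2 = max_current_path(current, "T", "C2")
```

Walking backward rather than forward avoids ending at a different causal gene than the one of interest, since every node carrying current can be traced back to the target.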
Selected Causal Genes
Step                    Number of Genes    Overlap with GBM genes
Step B: eQTL            16056              0.56 (75)
Step C: Circuit flow    701                0.045 (10)
Step D: Set cover       128                4.7 × 10^-4 (6)
Results
701 candidate causal genes from the circuit flow algorithm (Step C)
128 causal genes from set cover (Step D)
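The slides do not say how the overlap between selected genes and known GBM genes was scored. One common choice for this kind of enrichment is a hypergeometric tail probability, sketched below; the function name and the toy numbers are hypothetical and are not taken from the study.

```python
from math import comb

def hypergeom_enrichment_p(universe, known, selected, overlap):
    """P(X >= overlap) when `selected` genes are drawn without replacement
    from `universe` genes, of which `known` are disease-related."""
    upper = min(known, selected)
    total = comb(universe, selected)
    tail = sum(comb(known, i) * comb(universe - known, selected - i)
               for i in range(overlap, upper + 1))
    return tail / total

# Toy numbers only: 20 genes, 5 known disease genes, 4 selected, 2 overlapping.
p = hypergeom_enrichment_p(universe=20, known=5, selected=4, overlap=2)
```

A small value means the selected set contains more known disease genes than random selection would typically give, which is the pattern one would hope to see strengthen from Step B to Step D.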
Causal Genes

The selected causal gene set includes many known cancer
implicated genes
Pathway                  P-value   Genes
Glioma                   0.008     PRKCA, EGFR, AKT1, CDKN2A, CAMK2G, TP53, RB1, PTEN
Cell cycle               0.028     MCM7, CDKN2A, CDC2, TP53, ORC5L, RB1, ATR, BUB3, CUL1
p53 signaling pathway    0.030     CDKN2A, CDC2, TP53, ATR, FAS, THBS1, PTEN
Proteasome               0.026     PSMA1, PSMC6, PSMB1, PSMC3, PSMA5, PSMA4
Functional analysis using DAVID
BSOSC Review, November 2008
PTEN as causal gene
[Figure: causal paths for PTEN. Legend: TF-DNA and protein-protein interactions; TF, kinase, and causal gene nodes; fold change scale (- 0 +)]
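EGFR as causal and target gene
[Figure: causal paths with EGFR as causal gene and with EGFR as target gene. Legend: phosphorylation, TF-DNA, and protein-protein interactions; TF, kinase, and causal gene nodes; fold change scale (- 0 +)]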
Conclusion

A novel computational method to simultaneously identify causal genes and dysregulated pathways
• Circuit flow algorithm
• Multi-set cover
Augmentation of eQTL evidence with interaction information resulted in a very powerful approach
• Uncovers potential causal genes as well as intermediate nodes on molecular pathways
Our method can be applied to any disease system where genetic variations play a fundamental causal role
Acknowledgements
Teresa M. Przytycka
Stefan Wuchty
Other group members
• Dong Yeon Cho
• Yang Huang
• Damian Wojtowicz
• Jie Zheng
Our Method

Integrate several types of data
• Gene expression
• Copy number variations
• Molecular interactions
Methods and Results

Method
• Model the expression change of disease genes as a function of genomic alterations
• Translate the propagation of information from a potential causal gene to a disease gene as the flow of electric current through a network of molecular interactions
• Multi-set cover: select the most prominent genes
Validated our approach by testing the enrichment of selected causal genes with known GBM/glioma-related genes
[Figure: circuit (+ to -) linking tag SNP sn, causal genes, and disease gene gm]