IDENTIFYING CAUSAL GENES AND DYSREGULATED PATHWAYS IN COMPLEX DISEASES
YOO-AH KIM
NIH / NLM / NCBI
Nov. 6th, 2010

Complex Diseases
• Associated with the effects of multiple genes, as opposed to single-gene diseases
• The combination of genomic alterations may vary strongly among patients while dysregulating the same components, thus often leading to the same disease phenotype
• Difficult to study and treat
• Examples: cancer, heart disease, diabetes

Copy Number Variations
• Two copies of each gene are generally assumed to be present in a genome
• Genomic regions may be deleted or duplicated, causing copy number variations (CNVs)
• Some CNVs are associated with susceptibility or resistance to diseases such as cancer
[Figure: copy number variations in 158 glioblastoma patients]

Identifying Genomic Causes in Complex Diseases
• Identify genotypic causes in individual patients as well as dysregulated pathways
• Systems biology approach
• Genome-wide search
• Graph-theoretic algorithms: circuit flow and set cover
• Data: 158 glioblastoma multiforme patients
Glioblastoma multiforme (GBM) is the most common and most aggressive type of primary brain tumor in humans.

Expression as Quantitative Trait
• Genotype: copy number variations
• Phenotype: gene expression
• eQTL (expression quantitative trait loci) analysis relates a putative causal gene/locus to a putative target gene
• We assume that the genetic variation is the cause and the expression change is the effect, but we do not know the molecular pathways behind the relation

Method Outline
[Figure: A. Target gene selection: differentially expressed target genes per case. B. eQTL: find associations between the expression of target genes and the copy number of tag loci. C. Circuit flow algorithm on molecular interactions (protein-protein, TF-DNA, phosphorylation events) to find candidate causal genes. D. Causal gene selection by weighted multi-set cover.]
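The eQTL step in this outline treats copy number as the genotype and expression as the phenotype. It fits a linear regression for every pair of tag locus and target gene across cases. Below is a minimal pure-Python sketch of that pairwise scan. It assumes copy number and expression values are already aligned case by case. It keeps pairs whose correlation passes a hypothetical cutoff `min_abs_r`; a real analysis would more likely use regression p-values with multiple-testing correction. The names `linear_fit` and `eqtl_pairs` are illustrative and do not come from the original work.

```python
import math
from itertools import product


def linear_fit(x, y):
    """Ordinary least squares fit y = a + b*x; returns (a, b, r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        # No variation in copy number or expression: no measurable association.
        return my, 0.0, 0.0
    b = sxy / sxx
    a = my - b * mx
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r


def eqtl_pairs(copy_number, expression, min_abs_r=0.5):
    """Regress every target gene's expression on every tag locus's copy number.

    copy_number: dict tag locus -> list of copy-number values, one per case
    expression:  dict target gene -> list of expression values, same case order
    min_abs_r:   hypothetical cutoff on |Pearson r| for reporting a pair
    Returns (locus, gene, slope, r) tuples for the associated pairs.
    """
    hits = []
    for locus, gene in product(copy_number, expression):
        _, slope, r = linear_fit(copy_number[locus], expression[gene])
        if abs(r) >= min_abs_r:
            hits.append((locus, gene, slope, r))
    return hits
```

For example, `eqtl_pairs({"L1": [1, 2, 2, 3, 4]}, {"G1": [0.9, 2.1, 1.9, 3.2, 3.9]})` reports the single pair `("L1", "G1")` with a positive slope.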
Target Gene Selection
• Select a representative set of disease genes
• Filter differentially expressed genes for each case (disease cases compared with controls)
• Multi-set cover
[Figure: gene expression (Gene 1, Gene 2, Gene 3, ...) across controls and disease cases]

eQTL
• Associations between the expression of target genes and the copy number variations of genomic loci
• Linear regression for every pair of tag locus and target gene, across cases

Finding Candidate Causal Genes
[Figure: genotypic variations, candidate genes C1 to C5, and target genes, connected through an interaction network]
• Interaction network: protein-protein interactions, phosphorylation events, and transcription factor interactions
• Current flow: the resistance of an edge (u, v) is set to be inversely proportional to (|corr(expr(u), expr(D))| + |corr(expr(v), expr(D))|) / 2, where D is the target gene
• Compute the amount of current entering each causal gene, with current flowing from the target gene (+) to the candidate causal genes (-), by solving a system of linear equations
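The resistance definition and the linear system described above can be made concrete with a small sketch. It rests on several hypothetical assumptions:

- The proportionality constant is 1, so each edge's conductance (1 / resistance) equals the mean absolute correlation of its endpoints with the target gene D.
- D is held at potential 1 and every candidate causal gene at potential 0.
- Every intermediate node connects to one of these fixed nodes. Otherwise the system is singular.

Kirchhoff's current law at each free node gives the linear equations, which are solved here by Gaussian elimination. All function names are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation; 0.0 when either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def conductances(edges, expr, target):
    """Conductance of (u, v) = (|corr(u, D)| + |corr(v, D)|) / 2, i.e. 1 / resistance
    under the assumed proportionality constant of 1. Stored in both directions."""
    cond = {}
    for u, v in edges:
        g = (abs(pearson(expr[u], expr[target])) + abs(pearson(expr[v], expr[target]))) / 2
        cond[(u, v)] = cond[(v, u)] = g
    return cond


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def current_into_causal(nodes, cond, target, causal):
    """Fix V(target) = 1 and V(c) = 0 for each candidate causal gene, solve
    Kirchhoff's current law for the other nodes, and return the current
    entering each causal gene."""
    fixed = {target: 1.0, **{c: 0.0 for c in causal}}
    free = [n for n in nodes if n not in fixed]
    idx = {n: i for i, n in enumerate(free)}
    nbrs = {n: [] for n in nodes}
    for (u, v), g in cond.items():
        nbrs[u].append((v, g))
    a = [[0.0] * len(free) for _ in free]
    b = [0.0] * len(free)
    for n in free:
        i = idx[n]
        for m, g in nbrs[n]:
            a[i][i] += g                  # sum of conductances at node n
            if m in idx:
                a[i][idx[m]] -= g         # unknown neighbor potential
            else:
                b[i] += g * fixed[m]      # known (fixed) neighbor potential
    volts = dict(fixed)
    volts.update(zip(free, solve(a, b)))
    return {c: sum(g * (volts[m] - volts[c]) for m, g in nbrs[c]) for c in causal}


if __name__ == "__main__":
    # Toy network: target gene D -> intermediate X -> candidate causal genes C1, C2.
    expr = {"D": [1, 2, 3], "X": [2, 4, 6], "C1": [3, 2, 1], "C2": [1, 2, 3]}
    edges = [("D", "X"), ("X", "C1"), ("X", "C2")]
    cond = conductances(edges, expr, target="D")
    print(current_into_causal(list(expr), cond, "D", ["C1", "C2"]))
```

In the toy network every node is perfectly (anti-)correlated with D, so all conductances are 1. The current leaving D therefore splits evenly between C1 and C2.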
Final Causal Gene Selection
A putative causal gene explains a disease case if
• its corresponding tag locus has a copy number alteration, and
• its affected target genes (i.e., genes sending a significant amount of current to the causal gene) are differentially expressed in the disease case
[Figure: causal genes vs. cases, with a WEIGHT per causal gene]
• Find a smallest set of genes covering (almost) all cases at least k' times: a minimum weighted multi-set cover

Dysregulated Pathways
• Causal paths between a target gene and a causal gene: a maximum current path
[Figure: maximum current paths from target genes to causal genes C1 to C5]

Selected Causal Genes
Step                    Number of genes   Overlap with GBM genes: p-value (number of genes)
Step B: eQTL            16056             0.56 (75)
Step C: Circuit flow    701               0.045 (10)
Step D: Set cover       128               4.7 × 10^-4 (6)

Results
• 701 candidate causal genes from the circuit flow algorithm (Step C)
• 128 causal genes from set cover (Step D)

Causal Genes
The selected causal gene set includes many known cancer-implicated genes (functional analysis using DAVID):
Pathway                  P-value   Genes
Glioma                   0.008     PRKCA, EGFR, AKT1, CDKN2A, CAMK2G, TP53, RB1, PTEN
Cell cycle               0.028     MCM7, CDKN2A, CDC2, TP53, ORC5L, RB1, ATR, BUB3, CUL1
p53 signaling pathway    0.030     CDKN2A, CDC2, TP53, ATR, FAS, THBS1, PTEN
Proteasome               0.026     PSMA1, PSMC6, PSMB1, PSMC3, PSMA5, PSMA4
BSOSC Review, November 2008

PTEN as Causal Gene
[Figure: causal paths with PTEN as causal gene; edge types TF-DNA and protein-protein; node types TF and kinase; nodes colored by fold change (-, 0, +); causal genes highlighted]
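The slides do not say how the minimum weighted multi-set cover in the final causal gene selection is solved. The sketch below uses the standard greedy heuristic as a stand-in. It repeatedly picks the gene that helps the most still-unsatisfied cases per unit weight. It stops once the required fraction of cases is each covered at least k times (the k' above). The gene weights, the `min_fraction` parameter standing in for "(almost) all cases", and the name `greedy_weighted_multicover` are assumptions for illustration.

```python
import math


def greedy_weighted_multicover(covers, weights, cases, k, min_fraction=1.0):
    """Greedy heuristic for minimum weighted multi-set cover.

    covers:       dict gene -> set of cases the gene explains
    weights:      dict gene -> positive cost of selecting the gene (hypothetical)
    cases:        iterable of all disease cases
    k:            each case should be explained by at least k selected genes
    min_fraction: stop once this fraction of cases is covered k times
    Returns the selected genes in the order they were chosen.
    """
    cases = list(cases)
    need = {c: k for c in cases}                 # remaining demand per case
    goal = math.ceil(min_fraction * len(cases))  # cases that must be satisfied
    remaining = dict(covers)
    chosen = []

    def gain(gene):
        # Number of still-unsatisfied cases this gene would help cover.
        return sum(1 for c in remaining[gene] if need.get(c, 0) > 0)

    while remaining and sum(need[c] == 0 for c in cases) < goal:
        best = max(remaining, key=lambda g: gain(g) / weights[g])
        if gain(best) == 0:
            break  # no remaining gene helps; the demand cannot be met
        for c in remaining.pop(best):
            if need.get(c, 0) > 0:
                need[c] -= 1
        chosen.append(best)
    return chosen
```

Each gene can be selected at most once, so a case covered k times is explained by k distinct genes. Greedy selection does not guarantee a minimum-weight cover, but it is the usual approximation for set cover problems.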
EGFR as Causal and Target Gene
[Figure: causal paths with EGFR as both causal and target gene; edge types phosphorylation, TF-DNA, and protein-protein; node types TF and kinase; nodes colored by fold change (-, 0, +); causal genes highlighted]

Conclusion
• A novel computational method to simultaneously identify causal genes and dysregulated pathways, using a circuit flow algorithm and a multi-set cover
• Augmenting eQTL evidence with interaction information yields a powerful approach that uncovers potential causal genes as well as intermediate nodes on molecular pathways
• Our method can be applied to any disease system where genetic variations play a fundamental causal role

Acknowledgements
Teresa M. Przytycka
Stefan Wuchty
Other group members: Dong Yeon Cho, Yang Huang, Damian Wojtowicz, Jie Zheng

Our Method
Integrates several types of data:
• Gene expression
• Copy number variations
• Molecular interactions

Methods and Results
• Model the expression change of disease genes as a function of genomic alterations
• Translate the propagation of information from a potential causal gene to a disease gene into the flow of electric current through a network of molecular interactions
• Multi-set cover: select the most prominent causal genes
• Validated our approach by testing the enrichment of the selected causal genes for known GBM/glioma-related genes
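The validation step tests whether the selected causal genes are enriched for known GBM/glioma-related genes. The slides do not name the statistical test. This sketch assumes a one-sided hypergeometric test, a common choice for gene-set overlap. `enrichment_pvalue` and its parameters are hypothetical names, and no numbers from the study are plugged in.

```python
from math import comb


def enrichment_pvalue(universe_size, known_size, selected_size, overlap):
    """One-sided hypergeometric P(X >= overlap).

    Probability that selected_size genes drawn at random from universe_size
    genes contain at least `overlap` of the known_size disease-related genes.
    """
    total = comb(universe_size, selected_size)
    upper = min(known_size, selected_size)
    tail = sum(
        comb(known_size, i) * comb(universe_size - known_size, selected_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total
```

`math.comb` works on exact Python integers, so the binomial coefficients stay exact even for genome-scale universes. Only the final division produces a float.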