Extensions of CCA and PLS to unravel relationships between two data sets S. Déjean(1) - I. González(2) - K-A. Lê Cao(3) 1. Institut de Mathématiques de Toulouse, UMR 5219 Université de Toulouse et CNRS [email protected] 2. Plateforme Biopuces, Genopôle Toulouse Midi-Pyrénées Institut National des Sciences Appliquées [email protected] 3. ARC Centre of Excellence in Bioinformatics Institute for Molecular Bioscience, University of Queensland, Australia [email protected] Conference 2009 – Rennes, July 8-10 History Once upon a time in Toulouse, a city in South West of France, two groups of scientists lived nearly together without talking to each other. But one day, they decided to do so and to work together. They had Ph.D students, wrote articles and built R packages... n 1 ∑X n i =1 i « Stat » DNA RNA ATGCC TACCAGT −1 X ' X X 'Y « Bio » ATGCC n 1 ∑X n i =1 i J. Stat. Soft. ofw UseR Conference 2009, Rennes, July 8-10 SAGMB J. Biol. Syst. BMC Bioinformatics CCA integrOmics S. Déjean, I. González, K-A Lê Cao – integrOmics 2 / 10 Why integrOmics ? Gen-omics Transcript-omics Prote-omics Biological system Each « -omics » data set can be studied separately, but ● A great part of relevant information can be extracted from joint analysis of 2 or several datasets, so ● ⇒ Integrate omics data project in short : Metabol-omics integrOmics math.univ-toulouse.fr/biostat Lipid-omics Methodology Applications Software ...-omics UseR Conference 2009, Rennes, July 8-10 S. Déjean, I. González, K-A Lê Cao – integrOmics 3 / 10 Methodology Methodology Two ways to deal with the 'large p - small n' problem in the classical framework provided by Canonical Correlation Analysis and Partial Least Squares regression. n p q X Y CCA / regularization ● ● ● Maximize correlation between linear combination of variables in X and Y Requires inversion of XX' and YY' Regularization of CCA −1 PLS / selection ● ● Maximize covariance between linear combination of variables in X and Y Selection obtained through Lasso penalization of loading vectors −1 XX ' ⇒ XX ' X I n UseR Conference 2009, Rennes, July 8-10 S. Déjean, I. González, K-A Lê Cao – integrOmics 4 / 10 Applications Applications E. Yergeau, S.A. Schoondermark-Stolk, E.L. Brodie, S. Déjean, T.Z. DeSantis, O. Gonçalves, Y.M. Piceno, G.L. Andersen and G.A. Kowalchuk (2008). Environmental microarray analyses of Antarctic soil microbial communities. The International Society for Microbial Ecology Journal, 3, 340-351 ● ● ● S. Combes, I. González, S. Déjean, A. Baccini, N. Jehl, H. Juin, L. Cauquil, B. Gabinaud, F. Lebas, C. Larzul (2008). Relationships between sensory and physicochemical measurements in meat of rabbit from three different breeding systems using canonical correlation analysis. Meat science, 80(3), 835-841 I. González, S. Déjean, P.G.P. Martin, O. Gonçalves, P. Besse, A. Baccini (2009). Highlighting Relationships Between Heterogeneous Biological Data Through Graphical Displays Based On Regularized Canonical Correlation Analysis. Journal of Biological Systems, 17(2), 173-199 ● UseR Conference 2009, Rennes, July 8-10 K. A. Lê Cao, D. Rossouw, C. Robert-Granié, P. Besse (2008). A sparse PLS for variable selection when integrating Omics data, Statistical Applications in Genetics and Molecular Biology, 7(1), article 35 S. Déjean, I. González, K-A Lê Cao – integrOmics 5 / 10 IntegrOmics R package UseR Conference 2009, Rennes, July 8-10 S. Déjean, I. González, K-A Lê Cao – integrOmics Software 6 / 10 Using integrOmics ● From X and Y two matrices ● Preliminary view of the correlations matrices Software R> imgCor(X, Y, type = "separate") ● Classical CCA R> res.rcc = rcc(X, Y) ● Regularized CCA R> res.rcc = rcc(X, Y, 0.05, 0.01) ● PLS R> res.pls = pls(X, Y) ● Sparse PLS R> res.pls = spls(X, Y, mode=c("regression", "canonical"), + keep.X=c(10, 10, 10), keep.Y=c(10, 10, 10)) UseR Conference 2009, Rennes, July 8-10 S. Déjean, I. González, K-A Lê Cao – integrOmics 7 / 10 Graphical display R> plotVar(res.rcc, R> plotIndiv(res.rcc), + X.label=T,Y.label=T) + ind.names=nutrimouse$diet 1.5 1.0 fish C18.2n.6 C20.2n.6 fish fish C22.4n.6 1.0 LDLr 0.5 GSTpi2 SPI1.1 C20.5n.3 C22.6n.3 C22.5n.3 C18.0 apoA.I CYP4A10 HMGCoAred cHMGCoAS CYP3A11 C20.3n.9 PLTP FAS PMDCI CYP27a1 GK CPT2 AOX BSEP THIOL BIEN GSTa Lpin2 C16.0 HPNCL 0.0 0.5 Comp 1 Variables plot UseR Conference 2009, Rennes, July 8-10 sun lin lin -1.5 -0.5 lin lin lin sun sun sun ref coc refrefref ref lin ref sun sun sun sun ref ref coc -1.0 -1.0 fish -1.0 C14.0 C18.1n.7 C16.1n.7 fish fish lin lin fish -0.5 SR.BI C20.3n.3 FAT C18.3n.6 UCP2 MCAD C18.1n.9 CYP2c29 Dimension 2 C20.3n.6 C16.1n.9 -0.5 0.0 Ntcp ACOTH apoC3 fish C22.5n.6 0.0 0.5 C20.1n.9 C20.4n.6 Comp 2 Software 1.0 coc WT PPAR coccoc coc coc coc -2 -1 0 1 Dimension 1 Individuals plot S. Déjean, I. González, K-A Lê Cao – integrOmics 8 / 10 Future work Software 11 13 5 12 2 Provide new graphical display using graphs ● 6 3 4 9 8 1 14 7 0 10 p1 p2 p4 ● Methodologies to deal with more than 2 data sets 20 10 Functional statistics to deal with metabolomics or proteomics data 0 ● 30 40 n p3 0 UseR Conference 2009, Rennes, July 8-10 50 100 150 200 S. Déjean, I. González, K-A Lê Cao – integrOmics 9 / 10 Summary Ph.D K-A. Lê Cao (P. Besse, C. Robert-Granié) Ph.D I. González First steps of collaborations between biologists and statisticians in Toulouse around Omics technologies 2003 (A. Baccini, J.R. Léon) CCA ofw integrOmics First article « state of the art » 2004 UseR Conference 2009, Rennes, July 8-10 2005 Conference Rennes 2006 2007 2008 2009 S. Déjean, I. González, K-A Lê Cao – integrOmics 10 / 10
© Copyright 2025 Paperzz