Computational Approaches to Decipher Composition and Regulation of Complexes by Large-scale Analysis of Mass Spectrometry (MS) Data

Morteza Chalabi (1), Veit Schwämmle (1), Fabio Vandin (2), Ole N. Jensen (1)
1: University of Southern Denmark
2: University of Padova, Italy
10-Dec-2015

Motivation
Detailed knowledge about protein complexes is necessary to understand almost all biochemical, signaling and functional processes in the cell. Competitive binding, protein interactions and posttranslational modifications control the activity of complexes and coordinate their assembly. Distinguishing stable core members from selectively aggregating proteins is necessary to identify the subunits that control a complex's behavior and composition.

We studied a large collection of protein expression profiles, measured by mass spectrometry, across more than 150 different body tissues and cell lines. We designed a novel statistical score model, CompSig, which:
- can predict whether a list of proteins works as a complex;
- can predict whether individual proteins are part of a complex;
- relies only on data available on the web, curated and federated by several research institutions and organizations;
- uses advanced statistical modeling techniques.

Results
- CompSig was built using the proteomicsDB [1] database, which contains human proteins and the body tissues or cell lines in which they are expressed.
- CompSig was tested and evaluated using both CORUM [2], which contains known complexes, and non-CORUM complexes.
- CompSig successfully predicted many of the CORUM complexes.
- CompSig successfully predicted several forms of the CNOT complex (also known as Ccr4-NOT) [3].

1. Wilhelm, M., et al. (2014). Mass-spectrometry-based draft of the human proteome. Nature
2. Ruepp, A., et al. (2010). CORUM: the comprehensive resource of mammalian protein complexes-2009. Nucleic Acids Research
3. Collart, M. A., et al. (2012). The Ccr4–Not complex. Gene

Similar Projects
Complex composition has been studied by various protein-protein interaction network-based approaches such as graph clustering and community detection. However, these network-based approaches suffer from the incompleteness of the interactome, even in widely studied organisms such as humans and mice, and therefore many protein complexes are still not identified or not well characterized.
- Cerami, E., et al. (2012). The cBio Cancer Genomics Portal: An Open Platform for Exploring Multidimensional Cancer Genomics Data. Cancer Discovery
- Gerster, S., et al. (2014). Statistical Approach to Protein Quantification. Molecular & Cellular Proteomics
- Havugimana, Pierre C., et al. (2012). A Census of Human Soluble Protein Complexes. Cell
- Kikugawa, S., et al. (2012). PCDq: human protein complex database with quality index which summarizes different levels of evidences of protein complexes predicted from H-Invitational protein-protein interactions integrative dataset. BMC Systems Biology
- Ori, A., et al. (2016). Spatiotemporal variation of mammalian protein complex stoichiometries. Genome Biol

CompSig
Given a list of proteins (ListP), CompSig assigns each protein a p-value.
- To calculate p-values, it uses a protein-protein interaction network (more than 17k nodes) in which two proteins are connected if they are coexpressed in at least two tissues. Edges are weighted accordingly (WeiT).
- Using the graph's adjacency matrix, thousands of random complexes are generated.
- For each random complex C, one correlation Corr(C, Pi) is calculated for each protein Pi in it, so there are as many correlations as proteins in the complex.
  Corr(C, Pi): correlation between protein Pi's expression values and complex C's average expression value across WeiT tissues.
- Using the density curve of these correlations, |ListP| empirical p-values are calculated.
- P-values are adjusted using a False Discovery Rate procedure [1].
- A protein is part of the complex if it is significant at a given test level α.
- A complex is significant if all of its members, or a fraction of them, are significant.
- Fisher's combined p-value is calculated for the |ListP| p-values.

1. "[...] and powerful approach to multiple testing". Journal of the Royal Statistical Society

Conclusion
- Detailed knowledge about protein complexes is necessary to understand almost all biochemical, signaling and functional processes in the cell.
- Complex composition has been studied by various protein-protein interaction network-based approaches such as graph clustering and community detection.
- We designed a novel statistical score model, CompSig, that can predict whether a list of proteins works as a complex.
- CompSig successfully predicted many CORUM and non-CORUM complexes.
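
The CompSig scoring steps listed on this poster (correlation of each protein with the complex average, empirical p-values against random complexes, False Discovery Rate adjustment, Fisher's combined p-value) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It rests on several assumptions that do not come from the poster: Pearson correlation is used for Corr(C, Pi); the empirical p-value is a simple rank-based fraction rather than a fitted density curve; the FDR adjustment is assumed to be Benjamini-Hochberg; random complexes are drawn uniformly from a toy background instead of from the network's adjacency matrix; and all function names and data are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    denom = math.sqrt(sxx * syy)
    return sxy / denom if denom > 0 else 0.0


def member_correlations(profiles):
    """Corr(C, Pi): each protein's profile vs. the complex's average profile."""
    n_tissues = len(profiles[0])
    mean_profile = [sum(p[t] for p in profiles) / len(profiles)
                    for t in range(n_tissues)]
    return [pearson(p, mean_profile) for p in profiles]


def empirical_p_values(observed, null):
    """Share of null correlations >= each observed one (+1 avoids p = 0).

    Assumption: a rank-based estimate stands in for the density curve.
    """
    return [(1 + sum(1 for r in null if r >= obs)) / (1 + len(null))
            for obs in observed]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def fisher_combined(p_values):
    """Fisher's method: X = -2 * sum(ln p) follows chi-square with 2k df.

    For even degrees of freedom the survival function has a closed form,
    so no statistics library is needed.
    """
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


if __name__ == "__main__":
    random.seed(0)
    n_tissues = 20
    # Hypothetical background proteome: unrelated random expression profiles.
    background = [[random.gauss(0, 1) for _ in range(n_tissues)]
                  for _ in range(200)]
    # Hypothetical candidate complex: members share one signal plus noise.
    signal = [random.gauss(0, 1) for _ in range(n_tissues)]
    candidate = [[s + random.gauss(0, 0.3) for s in signal] for _ in range(4)]

    # Null distribution from random "complexes" of the same size.
    null = []
    for _ in range(1000):
        null.extend(member_correlations(random.sample(background, 4)))

    p = empirical_p_values(member_correlations(candidate), null)
    print("adjusted p-values:", benjamini_hochberg(p))
    print("Fisher combined p-value:", fisher_combined(p))
```

In this sketch a protein would be called a member when its adjusted p-value falls below the chosen level α, and the combined p-value gives a single score for the whole list.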