Morteza Chalabi

Computational Approaches to Decipher Composition and Regulation of Complexes by Large-scale Analysis of Mass Spectrometry (MS) Data
Morteza Chalabi¹, Veit Schwämmle¹, Fabio Vandin², Ole N. Jensen¹
[email protected]
1: University of Southern Denmark
2: University of Padova, Italy
Motivation
- Detailed knowledge about protein complexes is necessary to understand almost all biochemical, signaling and functional processes in the cell. Competitive binding, protein interactions and post-translational modifications control the activity of complexes and coordinate their assembly. Distinguishing stable core members from selectively aggregating proteins is necessary to identify the subunits that control a complex's behavior and composition.
- We studied a large collection of protein expression profiles across more than 150 different body tissues and cell lines measured by mass spectrometry. We designed a novel statistical scoring model, CompSig, which:
- Can predict whether a list of proteins works as a complex
- Can predict whether individual proteins are part of a complex
- Relies only on publicly available web data, curated and federated by several research institutions and organizations
- Uses advanced statistical modeling techniques
10-Dec-2015
…and powerful approach to multiple testing". Journal of the Royal Statistical Society
Results
- CompSig was built on the ProteomicsDB¹ database, which lists human proteins and the body tissues or cell lines in which they are expressed
- CompSig was tested and evaluated on both CORUM² complexes (a curated resource of known mammalian complexes) and non-CORUM complexes
- CompSig successfully predicted many of the CORUM complexes
- CompSig successfully predicted several forms of the CNOT complex (also known as Ccr4-NOT)³
Similar Projects
- Complex composition has been studied by various protein-protein interaction network-based approaches such as graph clustering and community detection. However, these approaches suffer from the incompleteness of the interactome, even in widely studied organisms such as humans and mice; therefore, many protein complexes remain unidentified or poorly characterized. Related work includes:
- Cerami, E., et al. (2012). The cBio Cancer Genomics Portal: An Open Platform for Exploring Multidimensional Cancer Genomics Data. Cancer Discovery
- Gerster, S., et al. (2014). Statistical Approach to Protein Quantification. Molecular & Cellular Proteomics
- Havugimana, P. C., et al. (2012). A Census of Human Soluble Protein Complexes. Cell
- Kikugawa, S., et al. (2012). PCDq: human protein complex database with quality index which summarizes different levels of evidences of protein complexes predicted from H-Invitational protein-protein interactions integrative dataset. BMC Systems Biology
- Ori, A., et al. (2016). Spatiotemporal variation of mammalian protein complex stoichiometries. Genome Biology
CompSig
- Given a list of proteins (ListP), CompSig assigns each protein a p-value
- To calculate p-values, it uses a protein-protein interaction network (more than 17k nodes) in which two proteins are connected if they are coexpressed in at least two tissues. Edges are weighted accordingly (WeiT)
- Thousands of random complexes are generated from the graph's adjacency matrix
- For each random complex C, one correlation Corr(C, P) is calculated per member protein P
- Corr(C, P_i): the correlation between the expression values of protein P_i and the average expression values of complex C across the WeiT tissues
- From the density curve of these correlations, |ListP| empirical p-values are calculated, one for each protein in ListP
- P-values are adjusted using a False Discovery Rate procedure¹
- A protein is part of the complex if it is significant at test level α
- A complex is significant if all (or a given fraction) of its members are significant
- Fisher's combined p-value is calculated from the |ListP| p-values
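The null model in the steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: expression data is a hypothetical dict mapping each protein to one value per tissue (0 meaning not detected); random complexes are grown as connected subgraphs by repeatedly adding a random neighbor, since the poster does not say how they are sampled; Corr(C, P) is computed over all tissues rather than only the WeiT tissues; and a rank-based empirical p-value stands in for the density-curve estimate. All function names are hypothetical.

```python
import math
import random

# Sketch of CompSig's null model under stated assumptions; all names are hypothetical.


def build_network(expr, min_shared=2):
    """Connect proteins that are both expressed (> 0) in at least `min_shared`
    tissues. Each edge stores the set of shared tissues; its size is the weight."""
    proteins = sorted(expr)
    adj = {p: {} for p in proteins}
    for i, p in enumerate(proteins):
        for q in proteins[i + 1:]:
            shared = {t for t, (a, b) in enumerate(zip(expr[p], expr[q]))
                      if a > 0 and b > 0}
            if len(shared) >= min_shared:
                adj[p][q] = shared
                adj[q][p] = shared
    return adj


def random_complex(adj, size, rng, max_tries=1000):
    """Grow a connected random subgraph with `size` members.
    ASSUMPTION: the poster does not specify the sampling scheme."""
    nodes = sorted(adj)
    for _ in range(max_tries):
        seed_node = rng.choice(nodes)
        members = {seed_node}
        frontier = set(adj[seed_node])
        while len(members) < size and frontier:
            nxt = rng.choice(sorted(frontier))
            members.add(nxt)
            frontier |= set(adj[nxt])
            frontier -= members
        if len(members) == size:
            return sorted(members)
    raise ValueError("no connected subgraph of the requested size found")


def pearson(x, y):
    """Pearson correlation; returns 0.0 for a constant profile."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def complex_correlations(expr, members):
    """Corr(C, P) for each member P: correlation of P's profile with the
    complex's average profile. ASSUMPTION: uses all tissues, not only WeiT."""
    n_tissues = len(expr[members[0]])
    avg = [sum(expr[p][t] for p in members) / len(members)
           for t in range(n_tissues)]
    return {p: pearson(expr[p], avg) for p in members}


def empirical_pvalue(null, observed):
    """Upper-tail empirical p-value with a +1 correction (used here in place
    of the poster's density-curve estimate)."""
    exceed = sum(1 for r in null if r >= observed)
    return (1 + exceed) / (1 + len(null))


def compsig_pvalues(expr, list_p, n_random=2000, seed=0):
    """One empirical p-value per protein in list_p, against random complexes
    of the same size drawn from the coexpression network."""
    rng = random.Random(seed)
    adj = build_network(expr)
    null = []
    for _ in range(n_random):
        members = random_complex(adj, len(list_p), rng)
        null.extend(complex_correlations(expr, members).values())
    observed = complex_correlations(expr, list_p)
    return {p: empirical_pvalue(null, observed[p]) for p in list_p}


if __name__ == "__main__":
    # Synthetic demo: 5 co-regulated proteins plus 25 unrelated ones, 12 tissues.
    rng = random.Random(1)
    pattern = [rng.uniform(1, 10) for _ in range(12)]
    expr = {f"C{i}": [v * rng.uniform(0.9, 1.1) for v in pattern] for i in range(5)}
    expr.update({f"R{i}": [rng.uniform(0.1, 10) for _ in range(12)] for i in range(25)})
    print(compsig_pvalues(expr, [f"C{i}" for i in range(5)], n_random=500))
```

In the synthetic demo, the five co-regulated proteins should receive small p-values, because random complexes of the same size rarely reach such high member-to-average correlations. The +1 correction keeps every empirical p-value strictly positive, which matters if the values are later combined on a log scale.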
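The final steps of the CompSig procedure (FDR adjustment, member and complex calls, Fisher's method) can be sketched with the standard library alone. Assumptions: the FDR procedure is taken to be Benjamini-Hochberg, which the truncated reference fragment ("…and powerful approach to multiple testing", Journal of the Royal Statistical Society) appears to cite; Fisher's method is applied to the unadjusted p-values; `alpha`, `min_fraction` and the function names are hypothetical. The chi-square tail with 2k degrees of freedom is evaluated in closed form.

```python
import math

# Sketch of the FDR adjustment, membership calls and Fisher's method.
# ASSUMPTIONS: Benjamini-Hochberg as the FDR procedure; Fisher's method on
# unadjusted p-values; `alpha` and `min_fraction` are hypothetical names.


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity and a cap at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def fisher_combined(pvals):
    """Fisher's method: X = -2 * sum(ln p) ~ chi-square with 2k df under H0.
    For even df the upper tail is exp(-x/2) * sum_{j<k} (x/2)^j / j!."""
    k = len(pvals)
    half_x = -sum(math.log(p) for p in pvals)
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_x / j
        total += term
    return math.exp(-half_x) * total


def assess_complex(pvals, alpha=0.05, min_fraction=1.0):
    """pvals: dict protein -> empirical p-value.
    Returns (significant members, complex call, Fisher combined p-value)."""
    names = list(pvals)
    raw = [pvals[p] for p in names]
    adjusted = dict(zip(names, benjamini_hochberg(raw)))
    significant = {p for p in names if adjusted[p] <= alpha}
    is_complex = len(significant) >= min_fraction * len(names)
    return significant, is_complex, fisher_combined(raw)


if __name__ == "__main__":
    example = {"A": 0.01, "B": 0.04, "C": 0.03, "D": 0.20}
    print(assess_complex(example, alpha=0.05))
```

For the example input only protein A survives the adjustment at α = 0.05, so the complex is not called with `min_fraction=1.0`, while the Fisher combined p-value is still small. Fisher's method assumes independent p-values; members of the same complex are correlated by construction, so the combined value is best read as a summary score rather than an exact significance level.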
Conclusion
- Detailed knowledge about protein complexes is necessary to understand almost all biochemical, signaling and functional processes in the cell
- Complex composition has been studied by various protein-protein interaction network-based approaches such as graph clustering and community detection
- We designed a novel statistical scoring model, CompSig, that can predict whether a list of proteins works as a complex
- CompSig successfully predicted many CORUM and non-CORUM complexes
1. Wilhelm, M., et al. (2014). Mass-spectrometry-based draft of the human proteome. Nature
2. Ruepp, A., et al. (2010). CORUM: the comprehensive resource of mammalian protein complexes-2009. Nucleic Acids Research
3. Collart, M. A., et al. (2012). The Ccr4–Not complex. Gene