Metabolic phenotypic analysis uncovers reduced
proliferation associated with oxidative stress in
progressed breast cancer
Supplementary Material
ROBUSTNESS ANALYSIS
There are two main parameters that can affect MPA results: α and β (Materials and
Methods). The sensitivity of MPA to the setting of these parameters was tested by
randomly choosing a representative subset of the clinical samples from [1]. The
subset contained 30 samples, five from each subtype and grade. MPA profiles were
computed for these samples with different α (0.3, 0.35, 0.4, 0.45, 0.55, 0.6, 0.65) and
β (0.5, 0.6, 0.7, 0.8, 0.95, 0.99) settings. We computed the Spearman correlation
between the different profiles obtained for (1) each sample (Supplementary Figure
1a,c), and (2) each metabolic process (Supplementary Figure 1b,d), where an MPA
profile of a process consists of the MPA scores it obtained across the different
samples. The average correlation coefficients obtained per sample and per process are
0.9936 ± 0.0044 and 0.8586 ± 0.0928, respectively, when varying α, and
0.9068 ± 0.0986 and 0.9105 ± 0.1048 when varying β, testifying to the method's
considerable robustness.
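The per-sample robustness measure can be illustrated with a minimal sketch. It assumes that the reported value is the average Spearman correlation over all pairs of parameter settings, which the text does not state explicitly; the profile values and the names `profiles_by_alpha` and `mean_pairwise_spearman` are hypothetical.

```python
from itertools import combinations


def ranks(values):
    """Average 1-based ranks; tied values share the mean of their positions."""
    s = sorted(values)
    return [(s.index(v) + 1 + len(s) - s[::-1].index(v)) / 2 for v in values]


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)  # undefined (division by zero) for constant profiles


def mean_pairwise_spearman(profiles):
    """Average Spearman correlation over all pairs of profiles."""
    pairs = list(combinations(profiles, 2))
    return sum(spearman(a, b) for a, b in pairs) / len(pairs)


# Hypothetical MPA profiles of one sample (four metabolic processes),
# each computed under a different alpha setting.
profiles_by_alpha = [
    [0.91, 0.40, 0.75, 0.62],
    [0.89, 0.42, 0.77, 0.60],
    [0.93, 0.38, 0.61, 0.70],
]
print(mean_pairwise_spearman(profiles_by_alpha))
```

The per-process measure follows the same pattern, with each profile holding the scores of one process across the samples instead of the scores of one sample across the processes.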
Supplementary Figure 1. Robustness analysis. The Spearman correlation between the MPA
profiles obtained with varying α (a-b), and β (c-d) parameters. The correlations were
computed per sample (a, c), and per metabolic process (b, d).
SMALL-SCALE EXAMPLE
Supplementary Figure 2. MPA example. An example of MPA computation based on a
metabolic model and gene-expression or protein abundance measurements. Circular nodes
represent metabolites, whereas diamond nodes represent enzymes. For enzymes, gray, red
and green represent normal, significantly high and significantly low expression of the
enzyme or enzyme-encoding genes, respectively. Similarly, for reactions, those that are
predicted as active, inactive or undetermined are colored red, green, or gray, respectively.
Solid edges represent metabolic reactions. Broken edges associate enzymes with the
reactions they catalyze. The optimal solution obtained for a given sample with (B, D) or
without (A, C) the additional constraint to activate the metabolic process of M9 production
is shown. The score of the solution is the sum of reactions that are consistent with their
expression state, which is what we aim to maximize. The MPA score of a given sample and a
metabolic process is then the ratio between the optimal scores obtained with and without
the process activity. As each sample has its unique expression profile, different MPA scores
are obtained: The M9 production MPA score of sample 1 is smaller than that of sample 2, as
its expression is less consistent with M9 production (i.e., enzymes E5 and E7 are lowly
expressed).
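The ratio described in the caption can be written out as a short sketch. The optimal scores used here are hypothetical and are not read off the figure; in practice they would come from solving the optimization problem twice, once with and once without the constraint that activates the process.

```python
def mpa_score(score_with_process, score_without_process):
    """Ratio between the optimal consistency scores obtained with and
    without forcing the metabolic process to be active."""
    if score_without_process <= 0:
        raise ValueError("the unconstrained optimum must be positive")
    return score_with_process / score_without_process


# Hypothetical optimal scores (number of reactions consistent with expression).
sample_1 = mpa_score(score_with_process=5, score_without_process=7)
sample_2 = mpa_score(score_with_process=7, score_without_process=7)
print(sample_1, sample_2)
```

Because adding a constraint to a maximization problem cannot increase its optimum, the score is at most 1, and it equals 1 when activating the process costs nothing in terms of consistency with the expression data.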
MPA VALIDATION
MPA captures known metabolic differences across different human tissues
To evaluate whether MPA can correctly capture metabolic differences, it was applied
to deduce the metabolic phenotypes of three different human tissues: the liver, muscle
and adipose tissue. The metabolic capabilities of these tissues are quite different and
some of them are relatively well-known. For each tissue, MPA was applied to analyze
a few dozen gene expression profiles [2, 3], generating MPA profiles that span an
array of relevant metabolic processes (Supplementary Table 2).
The liver is the main metabolic organ of the human body. It is the central site of
various metabolic processes, including the biosynthesis of key metabolites,
carbohydrate metabolism, and the urea cycle [4-6]. Indeed, according to MPA, all
metabolic processes are potentially more active in the liver compared to the adipose
tissue. Typical liver metabolic processes, such as the urea cycle, have a significantly higher
potential to be active in the liver compared to the muscle and adipose tissue
(Wilcoxon p-value of 9.134e-10 and 1.012e-07, respectively). Interestingly, despite
the role of adipocytes in fat storage, fatty acid synthesis is significantly more active in
the liver, both according to the literature [6, 7] and based on our results (p-values <
2.074e-05). Compared to the muscle, 86% of the metabolic processes are significantly
more active in the liver (p-value <0.05). Among those that are more active in the
muscle, we found the uptake of several amino acids, including tyrosine, proline and
asparagine (p-values < 0.01). Consistent with the high energetic demands of muscle
tissue, its ATP production is higher than that of the liver (p-value 5.51e-08).
Subsequently, we compared the MPA profiles of the muscle to those of the adipose
tissue. 18 out of the 20 metabolic processes related to amino acids were predicted to
be more active in the muscle compared to the adipose tissue (11 of which with
p-values < 0.05); on the other hand, the metabolism of lipids was predicted to be
significantly more active in adipocytes (with p-values of 6.51e-03 and 2.38e-07, for
acetyl-CoA carboxylase-α (ACC) and ATP citrate lyase (ACL), respectively), as one
would expect based on the literature [8-10]. Furthermore, the results are consistent
with the biomass content of these tissues, as the main component of the muscle is
proteins, while the main component of the adipose tissue is lipids [11]. Thus, the
metabolic differences identified by MPA are consistent with the known metabolic
phenotypes of the three tissues that were analyzed. A detailed account of the results
obtained in this inter-tissue comparison is provided in Supplementary Table 2.
MPA correctly estimates lipid production in breast cancer (BC) clinical
samples
To examine the ability of MPA to predict metabolite levels in the tissue, we utilized a
dataset of lipid measurements and gene expression profiling of 110 BC tumors [12].
We assigned each sample an MPA lipid-score based on its capacity to activate ACC,
the main reaction in the process of de-novo lipogenesis, per unit of biomass. We
recorded how many of the measured lipids were correlated to the MPA lipid-scores.
Out of the 551 measured lipids, 225 were significantly (p-value < 0.05) positively
correlated to the MPA-lipid-scores (Supplementary Figure 3a). 189 of the 240
triglycerides (TG), and 36 out of the 331 membrane-lipids show a significant positive
correlation to the MPA lipid-scores. To evaluate the significance of the results, we
computed the empirical p-value of MPA's performance by randomly shuffling the
samples, obtaining p-values of 6.40e-03, 2.80e-03, and 0.06, for predicting all the
lipids, TGs, and membrane-lipids, respectively (Supplementary Figure 3b). The gene
expression levels of the main lipogenesis enzymes, including ACC (Supplementary Figure
3), ACL, and fatty acid synthase (FAS), were not significantly correlated to the lipid
levels (Supplementary Table 3).
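The shuffling-based significance estimate can be sketched as follows. This is a simplified illustration rather than the exact procedure used: a fixed correlation cutoff (`rho_cutoff`, hypothetical) stands in for the p-value < 0.05 criterion, the add-one correction is an assumption, and all scores and lipid levels are made up.

```python
import random


def ranks(values):
    """Average 1-based ranks; tied values share the mean of their positions."""
    s = sorted(values)
    return [(s.index(v) + 1 + len(s) - s[::-1].index(v)) / 2 for v in values]


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def count_correlated(scores, lipid_levels, rho_cutoff):
    """Number of lipids whose correlation with the scores exceeds the cutoff."""
    return sum(spearman(scores, levels) > rho_cutoff for levels in lipid_levels)


def empirical_p_value(scores, lipid_levels, rho_cutoff=0.8, n_shuffles=1000, seed=0):
    """Fraction of shuffles that perform at least as well as the real scores."""
    rng = random.Random(seed)
    observed = count_correlated(scores, lipid_levels, rho_cutoff)
    shuffled = list(scores)
    at_least_as_good = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # breaks the pairing between samples and scores
        if count_correlated(shuffled, lipid_levels, rho_cutoff) >= observed:
            at_least_as_good += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (at_least_as_good + 1) / (n_shuffles + 1)


# Hypothetical lipid-scores of six samples and levels of three lipids.
lipid_scores = [0.2, 0.5, 0.9, 1.1, 1.4, 1.8]
lipid_levels = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [1.0, 3.0, 2.0, 4.0, 6.0, 5.0],
    [5.0, 1.0, 6.0, 2.0, 4.0, 3.0],
]
observed, p_value = empirical_p_value(lipid_scores, lipid_levels)
print(observed, p_value)
```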
Supplementary Figure 3. Prediction of lipid content by MPA. (a) The percentage of lipids
that were significantly positively correlated (p-value<0.05) to the MPA lipid-scores (blue) or
to the gene expression of ACC (gray). (b) The empirical p-values, denoting the significance of
the obtained results, in a minus log10 scale. Similar results were obtained when considering
the gene expression levels of FAS and ACL (Supplementary Table 3).
Gene expression based analysis of the metabolic pathways related to a key
membrane-lipid
To understand why the membrane-lipids were less correlated to the MPA lipid-scores,
we further analyzed a key membrane-lipid that has previously been linked to de-novo
lipogenesis [12]: phosphatidylcholine (PC) (14:0,16:0). However, the level of PC
(14:0,16:0) was correlated to neither the MPA scores nor the gene expression of
de-novo lipogenesis. We therefore computed the correlation between the expression of
the different metabolic reactions and the level of PC (14:0,16:0). Among the reactions
whose expression is positively correlated to PC (14:0,16:0) following FDR
correction, we do not find de-novo lipogenesis reactions such as FAS and ACC, or choline
kinase. Instead, we find reactions that are related to fatty-acid elongation and
fatty-acid oxidation. The full list is given in Supplementary Table 4. According to this
analysis, metabolic pathways other than de-novo lipogenesis may be more indicative of
the levels of membrane-lipids such as PC (14:0,16:0).
Supplementary Figure 4. Intracellular ROS levels are reduced in metastatic BC cells. The
distribution of superoxide levels in metastatic (blue) and non-metastatic (red) cell-lines.
SUPPLEMENTARY METHODS
MPA metabolite prediction
To compute the level of a given metabolite, m, MPA computes two scores:
(1) $S_m$: the MPA score that denotes the capacity to produce the metabolite m.
(2) $S_{biomass}$: the MPA score that denotes the capacity to produce biomass.
The metabolite level is estimated to be $S_m / S_{biomass}$, that is, its approximated production
rate normalized by its dilution due to cell proliferation. The production of lipids is
assessed by the capacity to activate ACC, the main reaction in the process of de-novo
lipogenesis.
Prognosis prediction
The data utilized for prognosis prediction includes two datasets of genome-wide gene
expression measurements taken from BC patients and their clinical characteristics [1,
13]. The data of [1] includes additional information regarding the treatment that had
been given to the patient, and five-year survival. The prognosis of ER+ patients
is predicted via the SVM library LIBSVM [14]. The prediction process
includes 14 prediction tasks. In each task we predict either five-year or metastasis-free
survival of patients of the same grade, dataset, and treatment group. The samples are
randomly divided into test and train sets 100 times. Each time 2/3 of the samples are
used for training and the remaining 1/3 for testing. Prediction accuracy is measured by
the mean and standard deviation of the areas under the Receiver Operating
Characteristic (ROC) curves (AUCs) obtained.
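The evaluation protocol (100 random 2/3 vs. 1/3 splits, summarized by the mean and standard deviation of the AUC) can be sketched as below. The classifier is a deliberately simple centroid-based stand-in, not the SVM used in the analysis, and the data are synthetic. Redrawing splits in which one class is missing is an assumption made here so that the AUC is always defined.

```python
import random
import statistics


def auc(labels, scores):
    """AUC as the probability that a positive sample outscores a negative
    one, with ties counted as one half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def centroid_scorer(train, test_features):
    """Stand-in classifier (NOT an SVM): projects each test sample onto the
    difference between the two class centroids of the training set."""
    dims = range(len(train[0][0]))

    def centroid(label):
        rows = [f for f, l in train if l == label]
        return [sum(r[d] for r in rows) / len(rows) for d in dims]

    c0, c1 = centroid(0), centroid(1)
    return [sum((c1[d] - c0[d]) * f[d] for d in dims) for f in test_features]


def repeated_holdout_auc(samples, scorer, n_repeats=100, train_fraction=2 / 3, seed=0):
    """Mean and standard deviation of the test AUC over random splits."""
    rng = random.Random(seed)
    n_train = round(len(samples) * train_fraction)
    aucs = []
    while len(aucs) < n_repeats:
        shuffled = samples[:]
        rng.shuffle(shuffled)
        train, test = shuffled[:n_train], shuffled[n_train:]
        # Assumption: redraw splits in which either set lacks one of the classes.
        if len({l for _, l in train}) < 2 or len({l for _, l in test}) < 2:
            continue
        scores = scorer(train, [f for f, _ in test])
        aucs.append(auc([l for _, l in test], scores))
    return statistics.mean(aucs), statistics.stdev(aucs)


# Synthetic data: 15 samples per class, two features each.
data_rng = random.Random(1)
samples = [([data_rng.gauss(2.0 * label, 1.0), data_rng.gauss(2.0 * label, 1.0)], label)
           for label in (0, 1) for _ in range(15)]
mean_auc, sd_auc = repeated_holdout_auc(samples, centroid_scorer)
print(f"AUC = {mean_auc:.3f} +/- {sd_auc:.3f}")
```

Any function with the same `(train, test_features)` interface, for instance a wrapper around an SVM library, could be passed in place of `centroid_scorer`.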
The prediction process, as described above, was performed with different types of
features: the MPA profiles consisting of MPA scores of 57 metabolic processes
(Supplementary Table 1) or only of de-novo lipogenesis, the gene expression of all
genes or only of selected groups of genes, and metabolic pathway enrichment
features. The latter were obtained by assigning each metabolic pathway defined by
[15] a value of 1, -1 or 0. A value of 1 or -1 denotes that the pathway is significantly
enriched (hyper-geometric p-value<1e-02) with highly or lowly expressed reactions,
respectively. The selected groups of genes include: lipogenic genes that encode the
enzymes ACC, ACL, and FAS, genes included in the BC prognostic signature
identified by [13], and metabolic genes.
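The pathway enrichment features can be illustrated with a small sketch based on the hypergeometric tail probability. The reaction sets are hypothetical, and the rule for a pathway that is enriched in both directions (the smaller p-value wins) is an assumption, as the text does not cover that case.

```python
from math import comb


def hypergeom_tail(k, marked, drawn, total):
    """P(X >= k) when `drawn` reactions are taken without replacement from
    `total` reactions, `marked` of which carry the expression state of interest."""
    upper = min(marked, drawn)
    favourable = sum(comb(marked, i) * comb(total - marked, drawn - i)
                     for i in range(k, upper + 1))
    return favourable / comb(total, drawn)


def pathway_feature(pathway, r_high, r_low, n_reactions, p_cutoff=1e-2):
    """Return 1, -1 or 0 for enrichment with highly expressed reactions,
    lowly expressed reactions, or neither."""
    p_high = hypergeom_tail(len(pathway & r_high), len(r_high), len(pathway), n_reactions)
    p_low = hypergeom_tail(len(pathway & r_low), len(r_low), len(pathway), n_reactions)
    if p_high < p_cutoff and p_high <= p_low:
        return 1
    if p_low < p_cutoff:
        return -1
    return 0


# Hypothetical model with 40 reactions, 8 highly and 10 lowly expressed.
reactions = [f"R{i}" for i in range(40)]
r_high, r_low = set(reactions[:8]), set(reactions[30:])
pathways = {
    "A": set(reactions[0:6]),    # consists of highly expressed reactions only
    "B": set(reactions[32:38]),  # consists of lowly expressed reactions only
    "C": set(reactions[10:16]),  # no highly or lowly expressed reactions
}
features = {name: pathway_feature(p, r_high, r_low, len(reactions))
            for name, p in pathways.items()}
print(features)
```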
Protein abundance prediction based on gene expression data
The gene expression of HeLa cells [16] was utilized to compute the MPA score of
each of the metabolic reactions in the model. As previously explained, the gene
expression is first discretized and mapped to the metabolic reactions. Hence the input
includes two sets of reactions:
(1) $R_{high}$: highly expressed reactions
(2) $R_{low}$: lowly expressed reactions
According to MPA, the metabolic reactions can then be divided as follows:
(3) $R_{active} = \{ i \mid \mathrm{HeLaMPA}(i) \geq \mu \}$
(4) $R_{inactive} = \{ i \mid \mathrm{HeLaMPA}(i) < \mu \}$
where HeLaMPA(i) is the MPA score of reaction i based on the gene expression of the
HeLa cells, and µ is the median of these scores. Reactions for which MPA's
prediction is inconsistent with the gene expression are:
(5) $PTUR = \{ i \mid i \in R_{active} \text{ and } i \notin R_{high} \}$
(6) $PTDR = \{ i \mid i \in R_{inactive} \text{ and } i \notin R_{low} \}$
where PTUR and PTDR consist of reactions that are predicted to be
post-transcriptionally up or down regulated, respectively. To test these predictions we
utilized the protein abundance. The protein abundance of the reactions in the PTUR
set was compared to the protein abundance of the reactions that could have been
predicted as PTUR (i.e., $\neg R_{high}$, the reactions that are not highly expressed) via a Wilcoxon
rank-sum test. Similarly, the protein abundance of the reactions in the PTDR set was
compared to that of the reactions that could have been predicted as PTDR (i.e.,
$\neg R_{low}$, the reactions that are not lowly expressed).
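The construction of the PTUR and PTDR sets can be sketched with plain set operations. The sketch assumes that reactions scoring at or above the median are treated as active and the rest as inactive, that PTUR holds the predicted-active reactions outside the highly expressed set, and that PTDR holds the predicted-inactive reactions outside the lowly expressed set. Reaction names and scores are hypothetical.

```python
from statistics import median


def split_by_mpa(mpa_scores):
    """Split reactions into active (score >= median) and inactive (score < median)."""
    mu = median(mpa_scores.values())
    active = {r for r, s in mpa_scores.items() if s >= mu}
    inactive = set(mpa_scores) - active
    return active, inactive


def post_transcriptional_sets(mpa_scores, r_high, r_low):
    """Reactions whose predicted activity is not explained by their expression state."""
    active, inactive = split_by_mpa(mpa_scores)
    ptur = active - r_high   # predicted active although not highly expressed
    ptdr = inactive - r_low  # predicted inactive although not lowly expressed
    return ptur, ptdr


# Hypothetical MPA scores and discretized expression states.
mpa_scores = {"R1": 0.9, "R2": 0.8, "R3": 0.5, "R4": 0.3, "R5": 0.1}
r_high, r_low = {"R1"}, {"R5"}
ptur, ptdr = post_transcriptional_sets(mpa_scores, r_high, r_low)
print(sorted(ptur), sorted(ptdr))
```

The protein abundances of the reactions in `ptur` would then be compared, by a rank-sum test, against those of all reactions outside `r_high`, and analogously for `ptdr` and `r_low`.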
Due to the discretization of the gene expression prior to its utilization via MPA, some
reactions are denoted as moderately expressed, and therefore are not included in the
objective function. These reactions can also have varying expression states that are
not given as input for MPA. However, MPA predicts their activity nonetheless. Like
the PTUR and PTDR sets, these predictions do not arise directly from the gene
expression, and therefore can be utilized to assess MPA's performance. To this end we
compared (via a one-sided Wilcoxon rank-sum test) the protein abundance of the
moderately expressed reactions that have been predicted to be active (i.e., $i \in R_{active}$)
to those that have been predicted to be inactive (i.e., $i \notin R_{active}$).
MPA input: Protein vs. gene expression
MPA can be applied either by utilizing gene expression or protein abundance data. If
both are available, it is advisable to either give precedence to the proteomic data or
combine the two into one confidence measure, as protein abundance measurements are
considered to be more proximal to the corresponding enzymatic activity.
Sampling-based stoichiometric analysis
Sampling-based stoichiometric analysis is a technique by which different feasible flux
distributions are sampled under a given condition. The obtained flux distributions
make it possible to compute the correlation between the flux rates of metabolic reactions, and
the differences between the flux rates of a reaction of interest under different
conditions. A sampling-based stoichiometric analysis of the generic human metabolic
model was applied to sample 2000 flux distributions [17]. A thousand of them were
sampled with the constraint to uptake glutamine, and a thousand with the constraint to
secrete glutamine. Analysis of variance (ANOVA) was performed to compare the flux
through phosphoglycerate dehydrogenase (PHGDH) in the two sets of samples, and
Spearman correlation between the PHGDH and glutamine exchange fluxes was
computed.
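The comparison of the PHGDH flux between the two sets of sampled flux distributions can be illustrated by the one-way ANOVA F statistic. This sketch covers only the statistic: the sampling itself and the conversion of F into a p-value are not reproduced, and the flux values are hypothetical.

```python
def one_way_anova_f(*groups):
    """F statistic of a one-way ANOVA: between-group mean square divided
    by within-group mean square."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical PHGDH flux values sampled under the two glutamine constraints.
flux_with_uptake = [1.0, 2.0, 3.0]
flux_with_secretion = [4.0, 5.0, 6.0]
f_statistic = one_way_anova_f(flux_with_uptake, flux_with_secretion)
print(f_statistic)
```

A large F value indicates that the difference between the group means is large relative to the variation within each group.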
REFERENCES
1. Chang, H.Y., et al., Robustness, scalability, and integration of a wound-response gene
expression signature in predicting breast cancer survival. Proceedings of the National
Academy of Sciences of the United States of America, 2005. 102(10): p. 3738-3743.
2. Lee, Y.H., et al., Microarray profiling of isolated abdominal subcutaneous adipocytes
from obese vs non-obese Pima Indians: increased expression of inflammation-related
genes. Diabetologia, 2005. 48(9): p. 1776-1783.
3. Wu, X., et al., The effect of insulin on expression of genes and biochemical pathways
in human skeletal muscle. Endocrine, 2007. 32(3): p. 356-356.
4. Gille, C., et al., HepatoNet1: a comprehensive metabolic reconstruction of the human
hepatocyte for the analysis of liver physiology. Mol Syst Biol, 2010. 6.
5. Gebhardt, R., Metabolic zonation of the liver: regulation and implications for liver
function. Pharmacol Ther, 1992. 53: p. 275-354.
6. Mathews, C., K.v. Holde, and K. Ahern, Biochemistry. 2000: Addison-Wesley
Publishing Company.
7. Berg, J.M., J.L. Tymoczko, and L. Stryer, Biochemistry. 2002, New York: W H
Freeman.
8. Tipton, K.D. and R.R. Wolfe, Exercise, protein metabolism, and muscle growth. Int J
Sport Nutr Exerc Metab, 2001. 11(1): p. 109-32.
9. Tipton, K.D., et al., Postexercise net protein synthesis in human muscle from orally
administered amino acids. American Journal of Physiology - Endocrinology And
Metabolism, 1999. 276(4): p. E628-E634.
10. Borsheim, E., A. Aarsland, and R.R. Wolfe, Effect of an amino acid, protein, and
carbohydrate mixture on net muscle protein balance after resistance exercise. Int J
Sport Nutr Exerc Metab, 2004. 14(3): p. 255-71.
11. Bordbar, A., et al., A multi-tissue type genome-scale metabolic network for analysis
of whole-body systems physiology. BMC Systems Biology, 2011. 5(1): p. 180.
12. Hilvo, M., et al., Novel Theranostic Opportunities Offered by Characterization of
Altered Membrane Lipid Metabolism in Breast Cancer Progression. Cancer Research,
2011. 71(9): p. 3236-3245.
13. van 't Veer, L.J., et al., Gene expression profiling predicts clinical outcome of breast
cancer. Nature, 2002. 415(6871): p. 530-6.
14. Chang, C.-C. and C.-J. Lin, LIBSVM: A library for support vector machines. ACM Trans.
Intell. Syst. Technol., 2011. 2(3): p. 1-27.
15. Duarte, N.C., et al., Global reconstruction of the human metabolic network based on
genomic and bibliomic data. Proc Natl Acad Sci U S A, 2007. 104(6): p. 1777-82.
16. Nagaraj, N., et al., Deep proteome and transcriptome mapping of a human cancer
cell line. Mol Syst Biol, 2011. 7.
17. Becker, S.A., et al., Quantitative prediction of cellular metabolism with
constraint-based models: the COBRA Toolbox. Nat. Protocols, 2007. 2(3): p. 727-738.