Determining minimum set of driver nodes in protein-protein interaction
networks: Additional file 1
Xiao-Fei Zhang, Le Ou-Yang, Yuan Zhu, Meng-Yun Wu, and Dao-Qing Dai
Contents
1 Supplementary Table
2 Supplementary Figure
1 Supplementary Table
Table S1: Significance of the difference between degree populations of predicted driver proteins and non-driver proteins
                  intlinprog                lp_solve
Dataset       MDS        CC-MDS        MDS         CC-MDS
combined      4.9E-10    0             1.6E-131    0
binary        2.3E-15    0             6.7E-183    0
co-complex    8.3E-05    1.3E-134      4.2E-64     1.3E-134
Table S2: Significance of the difference between betweenness populations of predicted driver proteins and non-driver proteins
                  intlinprog                lp_solve
Dataset       MDS        CC-MDS        MDS         CC-MDS
combined      1.6E-25    0             1.1E-235    0
binary        1.3E-30    0             4.6E-314    0
co-complex    1.1E-15    8.5E-248      9.4E-130    8.5E-248
Table S3: Significance of the difference between populations of the number of annotated protein complexes of predicted
driver proteins and non-driver proteins
                  intlinprog                lp_solve
Dataset       MDS        CC-MDS        MDS         CC-MDS
combined      3.5E-03    2.3E-06       5.2E-05     2.2E-06
binary        2.3E-02    6.1E-06       5.1E-04     5.9E-06
co-complex    1.5E-02    5.6E-05       5.4E-04     5.6E-05
Table S4: Significance of the difference between populations of the number of GO annotations of predicted driver
proteins and non-driver proteins
                            intlinprog                lp_solve
Dataset       Ontology  MDS        CC-MDS        MDS        CC-MDS
combined      BP        2.9E-06    1.4E-33       6.3E-17    2.5E-33
              CC        7.8E-11    1.7E-44       6.7E-26    8.7E-44
              MF        1.5E-15    3.2E-78       8.1E-50    6.4E-78
binary        BP        3.1E-07    5.6E-31       2.9E-20    9.8E-31
              CC        1.3E-14    4.9E-43       1.8E-28    1.2E-42
              MF        1.8E-16    7.1E-72       3.8E-48    1.4E-71
co-complex    BP        1.9E-05    3.2E-28       7.8E-20    3.2E-28
              CC        4.4E-07    1.7E-31       2.0E-21    1.7E-31
              MF        5.0E-13    5.5E-75       1.1E-43    5.5E-72
2 Supplementary Figure
[Figure S1 image: panel (A), number of driver proteins (y-axis, 1392 to 1406) against γ (x-axis, 0 to 1), with the marked value γ = 0.15; panel (B), overlap rate (scale 0.4 to 1) for γ = 0, 0.25, 0.5, 0.75, 1.]
Figure S1: Effect of parameter γ on the resulting CC-MDS proteins for the intlinprog method in the binary network. (A) Effect
of parameter γ on the number of predicted driver proteins. The x-axis shows the value of γ; the y-axis shows the number of
driver proteins determined using the CC-MDS model; the red circle marks the value of γ we chose. (B) Overlap rate between
the sets of driver proteins obtained using different values of γ.
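The overlap rate in panel (B) is not defined in this file. As a minimal sketch, assuming it is the Jaccard index between two driver sets, the pairwise comparison across values of γ could be computed as follows. The function names and the toy protein identifiers are hypothetical and are not taken from the paper.

```python
def overlap_rate(a, b):
    """Jaccard index |a & b| / |a | b|; an ASSUMED definition of 'overlap rate'."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def overlap_matrix(driver_sets):
    """Map every pair of gamma values to the overlap rate of their driver sets."""
    gammas = sorted(driver_sets)
    return {
        (g1, g2): overlap_rate(driver_sets[g1], driver_sets[g2])
        for g1 in gammas
        for g2 in gammas
    }


# Hypothetical toy driver sets keyed by gamma; not data from the paper.
toy_sets = {
    0.0: {"P1", "P2", "P3", "P4"},
    0.5: {"P1", "P2", "P3", "P5"},
    1.0: {"P1", "P2", "P6", "P7"},
}
matrix = overlap_matrix(toy_sets)
```

Other definitions, such as the intersection size divided by the smaller set size, would change the values but not the structure of the comparison.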
[Figure S2 image: panel (A), number of driver proteins (y-axis, 545 to 552) against γ (x-axis, 0 to 1), with the marked value γ = 0.05; panel (B), overlap rate (scale 0.4 to 1) for γ = 0, 0.25, 0.5, 0.75, 1.]
Figure S2: Effect of parameter γ on the resulting CC-MDS proteins for the intlinprog method in the co-complex network. (A)
Effect of parameter γ on the number of predicted driver proteins. The x-axis shows the value of γ; the y-axis shows the
number of driver proteins determined using the CC-MDS model; the red circle marks the value of γ we chose. (B) Overlap
rate between the sets of driver proteins obtained using different values of γ.
[Figure S3 image: box plots in panels (A) to (C); y-axis: degree (log scale, 10^0 to 10^2); groups: MDS, non-MDS, CC-MDS, non-CC-MDS.]
Figure S3: Degree distributions of predicted driver and non-driver proteins. The degree distributions of predicted driver and
non-driver proteins are represented by box plots (line = median). (A) combined network; (B) binary network; (C) co-complex
network.
[Figure S4 image: box plots in panels (A) to (C); log-scale axis from 10^-6 to 10^0.]
Figure S4: Betweenness distributions of predicted driver and non-driver proteins. The betweenness distributions of predicted
driver and non-driver proteins are represented by box plots (line = median). (A) combined network; (B) binary network; (C)
co-complex network.
[Figure S5 image: line plots in panels (A) to (C); x-axis: number of deleted proteins; y-axis: number of connected components; curves: MDS-intlinprog, CC-MDS-intlinprog, MDS-lp-solve, CC-MDS-lp-solve.]
Figure S5: Vulnerability to attack against predicted driver proteins quantified using the number of connected components.
Starting with the most connected proteins, the proteins are successively deleted and the number of connected components
after each deletion is calculated. There is one curve for each set of predicted driver proteins that shows the number of
connected components as a function of the number of deleted proteins. (A) combined network; (B) binary network; (C)
co-complex network.
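The deletion procedure described in this caption can be sketched in a few lines. This is only an illustration under stated assumptions: proteins are ranked once by their degree in the original network (the caption does not say whether degrees are recomputed after each deletion), isolated proteins count as components, and the toy network and all names are hypothetical.

```python
from collections import deque


def count_components(adj, removed):
    """Count connected components among the nodes not in `removed` (BFS)."""
    seen = set(removed)
    components = 0
    for start in adj:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
    return components


def attack_curve(adj, drivers):
    """Delete drivers by descending ORIGINAL degree (assumption) and record
    the number of connected components after each deletion."""
    order = sorted(drivers, key=lambda n: (-len(adj[n]), n))
    removed = set()
    curve = [count_components(adj, removed)]  # value before any deletion
    for node in order:
        removed.add(node)
        curve.append(count_components(adj, removed))
    return curve


# Hypothetical toy network: a star centred on "a" plus the edge d-e.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("d", "e")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

curve = attack_curve(adj, drivers={"a", "d"})
```

Running one such curve per driver set (for example the MDS and CC-MDS sets from each solver) would give the four curves per panel.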
[Figure S6 image: line plots in panels (A) to (C); x-axis: number of deleted proteins; y-axis: largest connected component (fraction of nodes); curves: MDS-intlinprog, CC-MDS-intlinprog, MDS-lp-solve, CC-MDS-lp-solve.]
Figure S6: Vulnerability to attack against predicted driver proteins quantified using the largest connected component. Starting
with the most connected proteins, the proteins are successively deleted and the size of the largest connected component after
each deletion is calculated. There is one curve for each set of predicted driver proteins that shows the fraction of nodes in the
largest connected component as a function of the number of deleted proteins. (A) combined network; (B) binary network;
(C) co-complex network.
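As a rough sketch of this measure, the fraction below is taken relative to the number of nodes in the original network, which is an assumption since the caption does not state the denominator. Proteins are again ranked once by their original degree (also an assumption), and the toy network and function names are hypothetical.

```python
def largest_component_fraction(adj, removed):
    """Largest surviving component size divided by the ORIGINAL node count
    (the choice of denominator is an assumption)."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best / len(adj)


def lcc_curve(adj, drivers):
    """Delete drivers by descending original degree and record the fraction
    of nodes in the largest connected component after each deletion."""
    order = sorted(drivers, key=lambda n: (-len(adj[n]), n))
    removed = set()
    curve = [largest_component_fraction(adj, removed)]
    for node in order:
        removed.add(node)
        curve.append(largest_component_fraction(adj, removed))
    return curve


# Hypothetical toy network: path a-b-c-d-e-f with an extra edge b-g.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f"), ("b", "g")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

lcc = lcc_curve(adj, drivers={"b", "d"})
```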
[Figure S7 image: box plots in panels (A) to (C); y-axis: number of complexes (log scale, 10^0 to 10^2); groups: MDS, non-MDS, CC-MDS, non-CC-MDS.]
Figure S7: Distributions of the number of associated complexes of predicted driver and non-driver proteins. The distributions
of the number of associated protein complexes of predicted driver and non-driver proteins are represented by box plots (line
= median). (A) combined network; (B) binary network; (C) co-complex network.
[Figure S8 image: box plots in three panels; y-axis: number of annotations (log scale, 10^0 to 10^2).]
Figure S8: Distributions of the number of associated GO annotations of predicted driver and non-driver proteins in the
combined network. The distributions of the number of associated GO annotations of predicted driver and non-driver proteins
are represented by box plots (line = median). (A) biological process; (B) cellular component; (C) molecular function.
[Figure S9 image: box plots in panels (A) to (C); log-scale y-axis, 10^0 to 10^2; groups: MDS, non-MDS, CC-MDS, non-CC-MDS, shown separately for intlinprog and lp_solve.]
Figure S9: Distributions of the number of associated GO annotations of predicted driver and non-driver proteins in the
binary network. The distributions of the number of associated GO annotations of predicted driver and non-driver proteins are
represented by box plots (line = median). (A) biological process; (B) cellular component; (C) molecular function.
[Figure S10 image: box plots in panels (A) to (C); y-axis: number of annotations (log scale, 10^0 to 10^2); groups: MDS, non-MDS, CC-MDS, non-CC-MDS, shown separately for intlinprog and lp_solve.]
Figure S10: Distributions of the number of associated GO annotations of predicted driver and non-driver proteins in the
co-complex network. The distributions of the number of associated GO annotations of predicted driver and non-driver proteins
are represented by box plots (line = median). (A) biological process; (B) cellular component; (C) molecular function.
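[Figure S11 image: three-set Venn diagrams. Set sizes and region counts as labelled in the panels:]
                            (A)      (B)      (C)
CC-MDS (set size)           1407     1393     546
DS-DC (set size)            1536     1525     618
DS-GDC (set size)           1534     1532     617
CC-MDS only                 33       33       10
CC-MDS and DS-DC only       18       21       9
CC-MDS and DS-GDC only      17       12       4
All three sets              1339     1327     523
DS-DC only                  33       26       9
DS-DC and DS-GDC only       146      151      77
DS-GDC only                 32       32       13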
Figure S11: Overlap of the three sets of driver proteins produced by CC-MDS, DS-DC and DS-GDC algorithms applied on
the three networks considered. (A) combined network; (B) binary network; (C) co-complex network.
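A three-set overlap of this kind splits into seven disjoint regions. The sketch below computes those regions from three sets and recovers the set sizes from region counts as a consistency check. The function names are hypothetical, and the assignment of the panel (A) counts to regions is inferred from the figure labels rather than stated in the source.

```python
def venn_regions(a, b, c):
    """Sizes of the seven disjoint regions of a three-set Venn diagram."""
    a, b, c = set(a), set(b), set(c)
    return {
        "a_only": len(a - b - c),
        "b_only": len(b - a - c),
        "c_only": len(c - a - b),
        "a_b_only": len((a & b) - c),
        "a_c_only": len((a & c) - b),
        "b_c_only": len((b & c) - a),
        "all": len(a & b & c),
    }


def set_sizes(regions):
    """Recover (|a|, |b|, |c|) from the seven region counts."""
    r = regions
    return (
        r["a_only"] + r["a_b_only"] + r["a_c_only"] + r["all"],
        r["b_only"] + r["a_b_only"] + r["b_c_only"] + r["all"],
        r["c_only"] + r["a_c_only"] + r["b_c_only"] + r["all"],
    )


# Panel (A) counts with a = CC-MDS, b = DS-DC, c = DS-GDC.
# The mapping of counts to regions is an inference from the figure labels.
panel_a = {
    "a_only": 33, "b_only": 33, "c_only": 32,
    "a_b_only": 18, "a_c_only": 17, "b_c_only": 146, "all": 1339,
}
sizes_a = set_sizes(panel_a)  # (1407, 1536, 1534), the sizes labelled in panel (A)

# Hypothetical toy sets to exercise venn_regions.
toy = venn_regions({1, 2, 3, 4}, {3, 4, 5}, {4, 5, 6})
```

The same check can be applied to the other panels to confirm that the region counts add up to the labelled set sizes.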