Supplementary Information for
Genomic analysis reveals a tight link between transcription factor dynamics and regulatory network architecture
Raja Jothi1,+,*, S. Balaji2,a,+, Arthur Wuster3, Joshua A Grochow4, Jörg Gsponer3, Teresa M Przytycka2,
L. Aravind2 and M. Madan Babu3,*
1 Biostatistics Branch, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, North Carolina 27709, USA
2 National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
3 MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 0QH, UK
4 Department of Computer Science, University of Chicago, Chicago, IL 60637, USA
+ These authors contributed equally to this work
* Correspondence: RJ ([email protected]) or MMB ([email protected])
a Present address: Center for Cancer Systems Biology and Department of Cancer Biology, Dana-Farber Cancer Institute, and Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA
Table of Contents
Vertex sort algorithm
Leaf-removal algorithm
BFS-level algorithm
Topological (linear) vs temporal ordering
PHO2’s position in the hierarchy
Supplementary Figure S1
Supplementary Figure S2
Supplementary Figure S3
Supplementary Figure S4
Supplementary Figure S5
Supplementary Table S1
References
Vertex sort algorithm
Given a directed network represented by the graph G(V, E), where V is the set of vertices (nodes) and E is the set of directed edges, the following linear-time algorithm (i.e., Θ(V + E) time) topologically sorts (orders) the nodes in the network to infer the hierarchical structure of the network. The running time of the algorithm is dominated by the depth-first search (DFS) procedure, which takes Θ(V + E) time.
VERTEX-SORT(G)
1  S ← STRONGLY-CONNECTED-COMPONENTS(G)
2  for each strongly connected component C in S
3      do COLLAPSE-SCC(G, C)
4  ITERATIVE-LEAF-REMOVAL(G)          ► applied to a copy of the collapsed G, since leaf removal deletes nodes
5  ITERATIVE-LEAF-REMOVAL(G^T)        ► G^T is the same as G with the direction of edges reversed
6  number-of-layers ← l               ► l as computed in step 4 or 5
7  for each node u in the collapsed G
8      do lower-bound[u] ← layer[u] as computed in step 4
9         upper-bound[u] ← number-of-layers − layer[u] + 1, with layer[u] as computed in step 5
10 for each node u in the collapsed G
11     do if lower-bound[u] = upper-bound[u]      ► u's level in the hierarchy is uniquely determined
12           then hierarchical-layer[u] ← lower-bound[u]
13           else hierarchical-layer[u] ← lower-bound[u] to upper-bound[u], inclusive
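To illustrate steps 4 to 13, here is a Python sketch. It assumes the SCCs have already been collapsed, so the input is a DAG given as a node list plus (source, target) edge pairs. The function names are our own. The sketch uses a leaf removal that decrements out-degrees, which is one way to reach the linear running time stated above. It then converts the top-down layer obtained on G^T into a bottom-up upper bound.

```python
from collections import defaultdict


def leaf_removal_layers(nodes, edges):
    """Peel all current leaves (nodes with no outgoing edge) in each round.

    Returns (layer, number_of_layers); layer 1 holds the first leaves peeled.
    Expects a DAG, e.g. the graph left after collapsing every SCC.
    """
    out_deg = {u: 0 for u in nodes}
    preds = defaultdict(list)
    for u, v in edges:
        out_deg[u] += 1
        preds[v].append(u)
    layer, l = {}, 0
    current = [u for u in nodes if out_deg[u] == 0]
    while current:
        l += 1
        upcoming = []
        for u in current:
            layer[u] = l
            for p in preds[u]:          # removing u deletes the edge p -> u
                out_deg[p] -= 1
                if out_deg[p] == 0:     # p becomes a leaf in the next round
                    upcoming.append(p)
        current = upcoming
    if len(layer) != len(out_deg):
        raise ValueError("graph contains a cycle; collapse SCCs first")
    return layer, l


def vertex_sort_bounds(nodes, edges):
    """Steps 4-13 of VERTEX-SORT: (lower, upper) level bounds for every node."""
    lower, n_layers = leaf_removal_layers(nodes, edges)
    reversed_edges = [(v, u) for u, v in edges]          # G^T
    top_down, _ = leaf_removal_layers(nodes, reversed_edges)
    # Convert the top-down layer from G^T into a level counted from the bottom.
    return {u: (lower[u], n_layers - top_down[u] + 1) for u in nodes}


bounds = vertex_sort_bounds(["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("d", "c")])
for node, (lo, hi) in sorted(bounds.items()):
    print(node, lo if lo == hi else f"{lo}-{hi}")
```

In this toy network, d regulates only the bottom node c. Its bounds are therefore (2, 3): it can sit at either of those levels without violating the topological order. The positions of a, b and c are fixed.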
STRONGLY-CONNECTED-COMPONENTS(G)
1  call DFS(G) to compute finishing times finish[u] for each node u
2  compute G^T, the transpose of G (reverse the direction of every edge in G)
3  call DFS(G^T), but in the main loop of DFS, consider the nodes in order of decreasing finish[u] (as computed in line 1)
4  output the vertices of each tree in the depth-first forest of line 3 as a separate strongly connected component
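The following Python sketch implements STRONGLY-CONNECTED-COMPONENTS. It assumes the network is given as a node list and (source, target) edge pairs. The sketch replaces the recursive DFS with an explicit stack, our choice to avoid Python's recursion limit on large networks. Otherwise it follows the two passes above.

```python
def strongly_connected_components(nodes, edges):
    succ = {u: [] for u in nodes}
    pred = {u: [] for u in nodes}
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    # Pass 1 (line 1): DFS on G, appending nodes in order of finishing time.
    visited, order = set(), []
    for root in nodes:
        if root in visited:
            continue
        visited.add(root)
        stack = [(root, iter(succ[root]))]
        while stack:
            u, it = stack[-1]
            for v in it:
                if v not in visited:
                    visited.add(v)
                    stack.append((v, iter(succ[v])))
                    break
            else:                       # all edges of u explored: u is finished
                stack.pop()
                order.append(u)

    # Pass 2 (lines 2-4): traverse G^T from roots in decreasing finish time;
    # the nodes reached from each root form one SCC.
    assigned, components = set(), []
    for root in reversed(order):
        if root in assigned:
            continue
        assigned.add(root)
        comp, stack = [], [root]
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in pred[u]:           # edges of G^T
                if v not in assigned:
                    assigned.add(v)
                    stack.append(v)
        components.append(comp)
    return components


comps = strongly_connected_components(
    [1, 2, 3, 4, 5], [(1, 2), (2, 3), (3, 1), (3, 4), (4, 5), (5, 4)]
)
print(comps)
```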
DFS(G)
1  for each node u in V[G]
2      do color[u] ← WHITE
3  time ← 0
4  for each node u in V[G]
5      do if color[u] = WHITE
6            then DFS-VISIT(u)
DFS-VISIT(u)
1  color[u] ← GRAY                              ► White node u has just been discovered
2  discovery[u] ← time ← time + 1
3  for each node v to which u has a directed edge   ► Explore edge (u→v)
4      do if color[v] = WHITE
5            then DFS-VISIT(v)
6  color[u] ← BLACK
7  finish[u] ← time ← time + 1
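Here DFS and DFS-VISIT are translated directly into Python. This is a sketch, and the adjacency-dict input format is an assumption. The code records the discovery and finishing times used in line 1 of STRONGLY-CONNECTED-COMPONENTS. The recursion mirrors the pseudocode. For very deep networks, Python's default recursion limit (about 1000 frames) may call for an iterative version instead.

```python
def dfs(nodes, succ):
    """Return (discovery, finish) timestamps, following DFS and DFS-VISIT."""
    color = {u: "WHITE" for u in nodes}
    discovery, finish = {}, {}
    time = 0

    def dfs_visit(u):
        nonlocal time
        color[u] = "GRAY"              # white node u has just been discovered
        time += 1
        discovery[u] = time
        for v in succ.get(u, []):      # explore edge (u -> v)
            if color[v] == "WHITE":
                dfs_visit(v)
        color[u] = "BLACK"             # u is finished
        time += 1
        finish[u] = time

    for u in nodes:
        if color[u] == "WHITE":
            dfs_visit(u)
    return discovery, finish


discovery, finish = dfs(["a", "b", "c", "d"], {"a": ["b", "c"], "b": ["c"], "d": ["a"]})
print(discovery, finish)
```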
COLLAPSE-SCC(G, C)
1  V[G] ← V[G] ∪ {s}                           ► s is a new node that represents C
2  for each node u ≠ s in G \ C                ► u is in G, but not in C
3      do if there exists an edge from u to at least one node in C
4            then E[G] ← E[G] ∪ {(u→s)}
5         if there exists an edge from at least one node in C to u
6            then E[G] ← E[G] ∪ {(s→u)}
7  for each node u in C
8      do remove u and all edges incident on u (from G)
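The Python sketch below applies COLLAPSE-SCC to every component at once. It rests on two assumptions. First, each collapsed node s is represented by the frozenset of its members. Second, `components` must cover every node, with singleton components for nodes outside cycles, as the output of STRONGLY-CONNECTED-COMPONENTS does.

```python
def collapse_sccs(edges, components):
    """Return (nodes, edges) of the graph with each SCC collapsed into one node."""
    rep = {}
    for comp in components:
        s = frozenset(comp)            # new node s representing component C
        for u in comp:
            rep[u] = s
    collapsed_nodes = set(rep.values())
    # Keep an edge s -> t only between different components; the set removes
    # duplicates when several members of C share a neighbour outside C.
    collapsed_edges = {(rep[u], rep[v]) for u, v in edges if rep[u] != rep[v]}
    return collapsed_nodes, collapsed_edges


components = [[1, 2, 3], [4, 5], [6]]
edges = [(1, 2), (2, 3), (3, 1), (2, 4), (3, 4), (4, 5), (5, 4), (5, 6)]
nodes, dag_edges = collapse_sccs(edges, components)
print(len(nodes), len(dag_edges))
```

The edges 2→4 and 3→4 both become the single edge from the node representing {1, 2, 3} to the node representing {4, 5}, matching lines 3 and 4 of COLLAPSE-SCC.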
ITERATIVE-LEAF-REMOVAL(G)
1  l ← 0
2  while V[G] ≠ ø
3      do l ← l + 1
4         REMOVE-LEAVES(G, l)
REMOVE-LEAVES(G, l)
1  L ← ø                                 ► Set of leaves
2  for each node u in V[G]
3      do if u has no outgoing edge      ► u is a leaf node
4            then L ← L ∪ {u}
5  for each leaf node u in L
6      do layer[u] ← l
7         remove u and all incoming edges to u (from G)
Leaf-removal algorithm
The leaf-removal algorithm is a bottom-up iterative procedure that, because of its simplicity, is a commonly used approach for network decomposition. In each iteration, all leaf nodes and the edges incident on them are peeled off (removed from) the network. The algorithm stops when the network is fully decomposed or when there are no more nodes to peel. The pseudo-code for the leaf-removal algorithm (ITERATIVE-LEAF-REMOVAL) is given in the section above. This approach has been used to infer hierarchical structures in biological networks (Balazsi et al, 2005; Ma et al, 2004a; Ma et al, 2004b).
Because the procedure needs at least one leaf node to peel at every iteration until the network is fully decomposed, it requires the input network to be acyclic. In a network with cycles, peeling stops once only the nodes on or upstream of a cycle remain. Also, the bottom-up procedure may incorrectly fix the hierarchical levels of certain nodes whose position in the network is not well defined (see Box 1 in the main text).
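The sketch below is in Python, and the names are ours. Like the pseudocode, it peels all current leaves in each round. A literal reading of ITERATIVE-LEAF-REMOVAL would loop forever on a cyclic network. This version instead stops when no leaf is left and reports the nodes it could not peel, which makes the acyclicity requirement concrete.

```python
def leaf_removal(nodes, edges):
    """Peel all current leaves each round. Returns (layer, unpeeled_nodes)."""
    out_deg = {u: 0 for u in nodes}
    preds = {u: [] for u in nodes}
    for u, v in edges:
        out_deg[u] += 1
        preds[v].append(u)
    layer, l = {}, 0
    leaves = [u for u in nodes if out_deg[u] == 0]
    while leaves:                      # stops when no leaf is left to peel
        l += 1
        next_leaves = []
        for u in leaves:
            layer[u] = l
            for p in preds[u]:         # removing u deletes the edge p -> u
                out_deg[p] -= 1
                if out_deg[p] == 0:
                    next_leaves.append(p)
        leaves = next_leaves
    unpeeled = set(nodes) - set(layer)
    return layer, unpeeled


# Acyclic: every node is peeled.
dag_layer, dag_left = leaf_removal(
    list("uvwxyz"),
    [("u", "v"), ("v", "w"), ("w", "x"), ("x", "y"), ("y", "z"), ("u", "z")],
)
# Cyclic (b <-> c): only d is peeled; a, b and c are stuck on or above the cycle.
cyc_layer, cyc_left = leaf_removal(
    list("abcd"), [("a", "b"), ("b", "c"), ("c", "b"), ("c", "d")]
)
print(dag_layer, dag_left)
print(cyc_layer, cyc_left)
```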
BFS-level algorithm
The BFS-level algorithm (Yu and Gerstein, 2006), like the leaf-removal algorithm, is a bottom-up procedure. It assigns each node in the network a hierarchical level equal to the number of edges on the shortest path from that node to its closest leaf node, plus one. Because levels are determined relative to leaf nodes, the method is inherently bottom-up. Unlike the leaf-removal algorithm, the BFS-level algorithm can be applied to networks containing cycles.
However, it can be shown that the hierarchical structure inferred by the BFS-level algorithm is not necessarily correct. To outline the proof, consider a network in which node u has two different directed paths to leaf node z: u→v→w→x→y→z and u→z. The BFS-level algorithm will assign leaf node z to level 1, nodes u and y to level 2, node x to level 3, node w to level 4, and node v to level 5. Topologically, u occupies a higher position than nodes v, w, and x in the network. Yet the BFS-level algorithm outputs an incorrect topological ordering of the nodes, i.e., v(5)→w(4)→x(3)→u, y(2)→z(1). The vertex sort algorithm and the leaf-removal algorithm identify the correct topological order in this case.
The hierarchy inferred by the BFS-level algorithm for the yeast regulatory network is presented in Fig S1A. An instance for which the BFS-level algorithm infers an inaccurate hierarchy is given in Fig S1B. The underlying design problem highlighted in these examples also means that the hierarchy does not remain consistent as the network grows. For instance, let u and v be any two nodes inferred to belong to levels i and j ≤ i in the original network, respectively. Now suppose the network grows through the addition of new nodes and/or edges. The BFS-level algorithm does not guarantee that the new levels i* and j* of nodes u and v will satisfy j* ≤ i*.
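The argument above can be checked with a short Python sketch of BFS-level ranking. We formulate it as a multi-source breadth-first search from all leaves over reversed edges. This formulation is our own, but it yields the same levels as running BFS from each leaf and keeping the minimum. The sketch reproduces the u→v→w→x→y→z, u→z example. It also shows a growth case in which adding a single leaf drops a regulator below its own target.

```python
from collections import deque


def bfs_levels(nodes, edges):
    """Level = 1 + number of edges on the shortest path to the closest leaf."""
    succ = {u: [] for u in nodes}
    pred = {u: [] for u in nodes}
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
    level = {u: None for u in nodes}   # None: no path to any leaf
    queue = deque()
    for u in nodes:
        if not succ[u]:                # leaf node: no outgoing edge
            level[u] = 1
            queue.append(u)
    while queue:                       # BFS over reversed edges (G^T)
        u = queue.popleft()
        for p in pred[u]:
            if level[p] is None:       # first visit gives the shortest distance
                level[p] = level[u] + 1
                queue.append(p)
    return level


# Counterexample from the text: u regulates v, yet v is ranked above u.
chain = [("u", "v"), ("v", "w"), ("w", "x"), ("x", "y"), ("y", "z"), ("u", "z")]
example = bfs_levels(list("uvwxyz"), chain)
print(example)

# Growth: a -> b -> c -> d places a above b; adding the leaf e (a -> e) reverses that.
before = bfs_levels(list("abcd"), [("a", "b"), ("b", "c"), ("c", "d")])
after = bfs_levels(list("abcde"), [("a", "b"), ("b", "c"), ("c", "d"), ("a", "e")])
print(before["a"], before["b"], "->", after["a"], after["b"])
```

In the growth case, a falls from level 4 to level 2 while its target b stays at level 3. This is an instance of j* > i* after the network grows.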
BFS-LEVEL-ALGORITHM(G)
1  L ← ø                                    ► Set of leaves
2  for each node s in V[G]
3      do layer[s] ← ∞                      ► initialized once, so each BFS below can only lower a node's level
4         if s has no outgoing edge         ► s is a leaf node
5            then L ← L ∪ {s}
6  for each leaf node s in L
7      do BFS(G^T, s)                       ► G^T is the same as G with the direction of edges reversed
BFS(G, s)
1  layer[s] ← 1
2  Q ← {s}                                  ► Q is a "queue" data structure
3  while Q ≠ ø
4      do u ← head[Q]                       ► u is the element at the head of the queue Q
5         for each node v to which u has a directed edge
6             do if layer[v] > layer[u] + 1
7                   then layer[v] ← layer[u] + 1
8                        ENQUEUE(Q, v)      ► Insert v at the tail of the queue Q
9         DEQUEUE(Q)                        ► Remove the element at the head of the queue Q
Topological (linear) vs temporal ordering
The information available in a static network structure alone may not be sufficient to resolve with certainty the ordering of nodes within the hierarchy. In other words, because the static network allows only a linear topological ordering and not a temporal ordering, some ambiguities cannot be resolved. The following example illustrates this.
In Fig 2, assume that node 8 is not regulated by node 6, i.e., the edge from node 6 to node 8 is removed. The vertex sort algorithm will still place nodes 7 and 8 at the same level because they form an SCC. One could argue that node 7 should be placed at a higher level than node 8, because node 8 will be activated only after node 7 has been activated by node 6. However, the network depicted in Fig 2 is a static structure that captures regulatory events occurring at different time points, possibly with different combinatorial logic (AND, OR, etc.) operating on a node. There are therefore two possible temporal orders of activation: (a) node 8 is activated only after node 7 is activated, and (b) node 7 is activated only after it receives signals from both nodes 6 and 8 (note that, once the edge from node 6 is removed, node 8 has no incoming edge other than from node 7). Given the static network structure alone, these two possibilities cannot be distinguished with certainty; ambiguities such as these cannot be resolved unless temporal data are available.
PHO2’s position in the hierarchy
PHO2 is regulated by ABF1, which is regulated by MBP1 (MBP1→ABF1→PHO2). MBP1 and ABF1 occupy the top levels 7 and 6 of the hierarchy, respectively (Fig 3C). Downstream, PHO2 regulates the transcription factors ACA1, CRZ1, and YPR196W (all in the bottom layer) and tens of other target genes. This regulatory pathway through PHO2 does not involve any core TFs. PHO2 is therefore a component by itself, parallel to the core. This makes its position in the three-layer hierarchy unclear: PHO2 could in principle be placed in either the core or the bottom layer while still maintaining the linear (topological) ordering of TFs.
[Figure S1 image. Panel A, "BFS-level algorithm on Yeast transcription network": TFs arranged in Levels 1 to 4, with TFs of zero in-degree indicated; legend: ◄ essential genes, * regulatory hubs. Panel B, "An instance of inaccurate hierarchy as inferred by the BFS-level algorithm": a subnetwork including OAF1, UME6 and ARG81, with the BFS-level assigned to each TF shown alongside.]
Supplementary Figure S1. (A) Hierarchy inferred by the BFS-level algorithm on the yeast regulatory network. TFs colored in red, green, and blue correspond to those in the top, core, and bottom layers, respectively, of the hierarchy inferred using the vertex sort algorithm (Fig 3B in the main text). TFs colored in black are neither regulated by nor regulate other TFs in the network. TFs colored in green are part of a strongly connected component (SCC) and should therefore occupy the same level in any hierarchy. Nevertheless, they are dispersed across different levels of the hierarchy inferred by the BFS-level algorithm. Likewise, TFs with zero in-degree (those not regulated by any other TF in the network) should all occupy the top level of any hierarchy, yet they are dispersed across different levels of the hierarchy inferred by the BFS-level algorithm. (B) An instance of an inaccurate hierarchy inferred by the BFS-level algorithm. The actual hierarchy is displayed as a graph, with the hierarchical levels of TFs inferred by the BFS-level algorithm shown alongside. OAF1 and UME6 are above ARG81 in the actual hierarchy. However, the BFS-level algorithm places ARG81 above OAF1 and UME6 in the hierarchy it infers, which is incorrect.
[Figure S2 image: frequency distribution (y-axis, Frequency, 0.00 to 0.12) of the number of TFs in the largest SCC (x-axis, 55 to 87) for random networks ("Random"), with the observed value for the yeast network marked ("Observed").]
Supplementary Figure S2. The number of TFs in the largest SCC of the yeast transcription regulatory network is significantly smaller (p < 2.8 × 10^-3) than expected in random networks of the same size and degree distribution.
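A p-value of this kind is usually estimated empirically from randomized networks. The Python sketch below makes two assumptions that this caption does not specify: the null model is a degree-preserving edge-swap randomization, and the p-value is a simple one-sided empirical fraction. The function names, swap count and number of randomizations are illustrative choices, not the settings used for Fig S2. For brevity, the largest SCC is found as the intersection of forward and backward reachability sets.

```python
import random


def reachable(start, adj):
    """All nodes reachable from start by following adj."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def largest_scc_size(nodes, edges):
    succ = {u: set() for u in nodes}
    pred = {u: set() for u in nodes}
    for u, v in edges:
        succ[u].add(v)
        pred[v].add(u)
    best, done = 0, set()
    for u in nodes:
        if u in done:
            continue
        comp = reachable(u, succ) & reachable(u, pred)   # SCC containing u
        done |= comp
        best = max(best, len(comp))
    return best


def degree_preserving_shuffle(edges, n_swaps, rng):
    """Swap targets of edge pairs (a->b, c->d) -> (a->d, c->b), keeping in/out degrees.
    Swaps that would create self-loops or duplicate edges are skipped."""
    edges = list(edges)
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.randrange(len(edges)), rng.randrange(len(edges))
        (a, b), (c, d) = edges[i], edges[j]
        if a == d or c == b or (a, d) in present or (c, b) in present:
            continue
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def empirical_p_value(nodes, edges, n_random=1000, seed=0):
    """Fraction of randomized networks whose largest SCC is no larger than observed."""
    rng = random.Random(seed)
    observed = largest_scc_size(nodes, edges)
    hits = sum(
        largest_scc_size(nodes, degree_preserving_shuffle(edges, 10 * len(edges), rng)) <= observed
        for _ in range(n_random)
    )
    return observed, hits / n_random
```

Many studies use (hits + 1) / (n_random + 1) instead, so that the estimate can never be exactly zero. Which convention applies here is not stated.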
Supplementary Figure S3. Percentages of TFs in each layer whose deletion is either lethal or has a moderate to strong negative effect on fitness. There is a strong positive correlation between the fitness effect of deleting a TF and how high the TF lies in the hierarchy, regardless of the major growth condition. YPD – 1% Bacto-peptone, 2% yeast extract and 2% glucose; YPDGE – 0.1% glucose, 3% glycerol and 2% ethanol; YPG – 3% glycerol; YPE – 2% ethanol; YPL – 2% lactate (see Methods for definitions of the fitness categories).
[Figure S4 image: distributions of tRNA adaptation index (y-axis) for TFs in each layer.]
Supplementary Figure S4. Distribution of tRNA adaptation index values for TFs in each of the three layers of the inferred hierarchy. There are no statistically significant differences between the distributions.
[Figure S5 image. (A) Protein abundance, y-axis: protein abundance [mol/cell]. (B) Protein degradation, y-axis: protein half-life [min]. (C) Transcriptional noise, y-axis: % TFs containing TATA box (0 to 24). (D) Protein noise, axis: protein noise [distance from median CV]. Layers compared: Top, Core, Bottom. Annotation: p < 0.065.]
Supplementary Figure S5. Dynamic properties of TFs in relation to the hierarchy inferred by the BFS-level algorithm. Distributions of values for TFs in each of the three layers of the inferred hierarchy for (A) protein abundance (copies/cell), (B) protein half-life (min), and (D) noise in protein abundance (variability in protein levels across a cell population). (C) Percentages of TFs in each of the three hierarchical layers that contain a TATA box; the expected percentage (22%) is shown as a dashed line. The x-axis in (D) denotes protein noise, measured as the distance from the median coefficient of variation of all proteins (DM; see Methods).
Supplementary Table S1. Overlap of TFs in the corresponding layers of the hierarchies inferred by the vertex sort and BFS-level algorithms.
N = 158                               Top      Core     Bottom    Unclassified
Vertex sort, n1                       25       64       59        10
BFS-level, n2                         31       79       48        -
Overlap, x                            9        45       39        -
Obs/Exp, (x*N)/(n1*n2)                1.8      1.4      2.2       -
p-value (hypergeometric distr.)       0.029    2.2e-5   5.5e-14   -
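The Obs/Exp ratios and p-values in Table S1 can be recomputed from the counts in each column. The Python sketch below assumes the p-value is the one-sided upper tail P(X ≥ x) of the hypergeometric distribution; the table names only the distribution. The sketch should reproduce the Obs/Exp ratios 1.8, 1.4 and 2.2 exactly. The p-values may differ slightly if a different tail convention was used.

```python
from math import comb


def hypergeom_upper_tail(x, N, n1, n2):
    """P(X >= x): overlap of a random n2-subset of N items with a fixed n1-subset."""
    k_max = min(n1, n2)
    hits = sum(comb(n1, k) * comb(N - n1, n2 - k) for k in range(x, k_max + 1))
    return hits / comb(N, n2)


N = 158
columns = {"Top": (25, 31, 9), "Core": (64, 79, 45), "Bottom": (59, 48, 39)}  # (n1, n2, x)
results = {}
for layer, (n1, n2, x) in columns.items():
    obs_exp = x * N / (n1 * n2)
    p_value = hypergeom_upper_tail(x, N, n1, n2)
    results[layer] = (obs_exp, p_value)
    print(f"{layer}: Obs/Exp = {obs_exp:.1f}, p = {p_value:.2g}")
```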
References
Balazsi G, Barabasi AL, Oltvai ZN (2005) Topological units of environmental signal processing in the transcriptional
regulatory network of Escherichia coli. Proc Natl Acad Sci U S A 102: 7841-7846.
Ma HW, Buer J, Zeng AP (2004a) Hierarchical structure and modules in the Escherichia coli transcriptional regulatory
network revealed by a new top-down approach. BMC Bioinformatics 5: 199.
Ma HW, Kumar B, Ditges U, Gunzer F, Buer J, Zeng AP (2004b) An extended transcriptional regulatory network of
Escherichia coli and analysis of its hierarchical structure and network motifs. Nucleic Acids Res 32: 6643-6649.
Yu H, Gerstein M (2006) Genomic analysis of the hierarchical structure of regulatory networks. Proc Natl Acad Sci U S
A 103: 14724-14731.