Supplementary Information for

Genomic analysis reveals a tight link between transcription factor dynamics and regulatory network architecture

Raja Jothi1,+,*, S. Balaji2,a,+, Arthur Wuster3, Joshua A Grochow4, Jörg Gsponer3, Teresa M Przytycka2, L. Aravind2 and M. Madan Babu3,*

1 Biostatistics Branch, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, North Carolina 27709, USA
2 National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
3 MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 0QH, UK
4 Department of Computer Science, University of Chicago, Chicago, IL 60637, USA
+ These authors contributed equally to this work
* Correspondence: RJ ([email protected]) or MMB ([email protected])
a Present address: Center for Cancer Systems Biology and Department of Cancer Biology, Dana-Farber Cancer Institute, and Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA

Table of Contents

Vertex sort algorithm
Leaf-removal algorithm
BFS-level algorithm
Topological (linear) vs temporal ordering
PHO2's position in the hierarchy
Supplementary Figure S1
Supplementary Figure S2
Supplementary Figure S3
Supplementary Figure S4
Supplementary Figure S5
Supplementary Table S1
References

Vertex sort algorithm

Given a directed network represented by a graph G(V, E), where V is the set of vertices (nodes) and E is the set of directed edges, the following linear-time (i.e., Θ(V+E)-time) algorithm topologically sorts (orders) the nodes to infer the hierarchical structure of the network. The running time of the algorithm is dominated by the depth-first search (DFS) procedure, which takes Θ(V+E) time.
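As a rough illustration of the vertex sort procedure, the following Python sketch collapses strongly connected components, runs leaf removal on the collapsed graph and on its transpose, and reports a (lower, upper) layer pair for each node. It is not the authors' implementation. The function names, the edge-list input format, and in particular the conversion of top-down layers to the bottom-up scale (number of layers + 1 - layer) are assumptions made for this example.

```python
from collections import defaultdict


def strongly_connected_components(nodes, edges):
    """Two-pass DFS (Kosaraju style). Returns {node: representative of its SCC}."""
    succ, pred = defaultdict(set), defaultdict(set)
    for u, v in edges:
        succ[u].add(v)
        pred[v].add(u)

    def dfs(start, adj, seen, out):
        # Iterative DFS; appends nodes to `out` in order of finishing time.
        seen.add(start)
        stack = [(start, iter(adj[start]))]
        while stack:
            node, it = stack[-1]
            for nxt in it:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(adj[nxt])))
                    break
            else:
                stack.pop()
                out.append(node)

    order, seen = [], set()
    for u in nodes:
        if u not in seen:
            dfs(u, succ, seen, order)
    comp, seen = {}, set()
    for u in reversed(order):  # decreasing finishing time, on the transpose
        if u not in seen:
            members = []
            dfs(u, pred, seen, members)
            rep = min(members)  # assumption: node labels are comparable
            for m in members:
                comp[m] = rep
    return comp


def iterative_leaf_removal(nodes, edges):
    """Peel nodes with no outgoing edge. Returns {node: layer}, layers from 1."""
    succ = {u: set() for u in nodes}
    for u, v in edges:
        succ[u].add(v)
    layer, level = {}, 0
    while succ:
        level += 1
        leaves = {u for u, out in succ.items() if not out}
        if not leaves:
            raise ValueError("graph contains a cycle")
        for u in leaves:
            layer[u] = level
            del succ[u]
        for out in succ.values():
            out -= leaves
    return layer


def vertex_sort(nodes, edges):
    """Returns {node: (lower_bound, upper_bound)} on the bottom-up layer scale."""
    comp = strongly_connected_components(nodes, edges)
    dag_nodes = set(comp.values())
    dag_edges = {(comp[u], comp[v]) for u, v in edges if comp[u] != comp[v]}
    bottom_up = iterative_leaf_removal(dag_nodes, dag_edges)
    top_down = iterative_leaf_removal(dag_nodes, {(v, u) for u, v in dag_edges})
    n_layers = max(bottom_up.values())
    # Assumption: flip the transposed-graph layers onto the bottom-up scale.
    return {u: (bottom_up[comp[u]], n_layers + 1 - top_down[comp[u]]) for u in nodes}


# Hypothetical toy network: a and b form a cycle; e regulates only the leaf d.
example = vertex_sort(
    ["a", "b", "c", "d", "e"],
    [("a", "b"), ("b", "a"), ("b", "c"), ("c", "d"), ("e", "d")],
)
print(example)
```

In this toy network, a and b share a layer because they belong to one strongly connected component, while e receives the range (2, 3): it can sit in either layer without violating the topological order, which is the kind of ambiguity the lower and upper bounds are meant to expose.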
VERTEX-SORT(G)
1   S ← STRONGLY-CONNECTED-COMPONENTS(G)
2   for each strongly connected component C in S
3       do COLLAPSE-SCC(G, C)
4   ITERATIVE-LEAF-REMOVAL(G)
5   ITERATIVE-LEAF-REMOVAL(G^T)          ► G^T is the same as G with the direction of edges reversed
6   number-of-layers ← l                 ► l as computed in lines 4 or 5
7   for each node u in G
8       do lower-bound[u] ← layer[u] as computed in step 4
9          upper-bound[u] ← layer[u] as computed in step 5
10  for each node u in G
11      do if lower-bound[u] = upper-bound[u]          ► u is not part of the hierarchy
12            then hierarchical-layer[u] ← lower-bound[u]
13            else hierarchical-layer[u] ← lower-bound[u] to upper-bound[u], inclusive

STRONGLY-CONNECTED-COMPONENTS(G)
1   call DFS(G) to compute finishing times finish[u] for each node u
2   compute the transpose of G, i.e., G^T (reverse the direction of edges in G)
3   call DFS(G^T), but in the main loop of DFS, consider the nodes in order of decreasing finish[u] (as computed in line 1)
4   output the vertices of each tree in the depth-first forest of step 3 as a separate strongly connected component

DFS(G)
1   for each node u in V[G]
2       do color[u] ← WHITE
3   time ← 0
4   for each node u in V[G]
5       do if color[u] = WHITE
6             then DFS-VISIT(u)

DFS-VISIT(u)
1   color[u] ← GRAY                      ► White node u has just been discovered
2   discovery[u] ← time ← time + 1
3   for each node v to which u has a directed edge     ► Explore edge (u→v)
4       do if color[v] = WHITE
5             then DFS-VISIT(v)
6   color[u] ← BLACK
7   finish[u] ← time ← time + 1

COLLAPSE-SCC(G, C)
1   V[G] ← V[G] ∪ {s}                    ► s is a new node that replaces C
2   for each node u ≠ s in G\C           ► u is in G, but not in C
3       do if there exists an edge from u to at least one node in C
4             then E[G] ← E[G] ∪ {(u→s)}
5          if there exists an edge from at least one node in C to u
6             then E[G] ← E[G] ∪ {(s→u)}
7   for each node u in C
8       do remove u and all edges incident on u (from G)

ITERATIVE-LEAF-REMOVAL(G)
1   l ← 0
2   while V[G] ≠ ø
3       do l ← l + 1
4          REMOVE-LEAVES(G, l)

REMOVE-LEAVES(G, l)
1   L ← ø                                ► Set of leaves
2   for each node u in V[G]
3       do if u has no outgoing edge     ► u is a leaf node
4             then L ← L ∪ {u}
5   for each leaf node u in L
6       do layer[u] ← l
7          remove u and all incoming edges to u (from G)

Leaf-removal algorithm

The leaf-removal algorithm is a bottom-up iterative procedure which, because of its simplicity, is one of the commonly used approaches for network decomposition. In each iteration, all the leaf nodes and the edges incident on them are peeled off (removed from) the network. The algorithm stops when the network is fully decomposed or when there are no more nodes to peel. The pseudo-code for the leaf-removal algorithm is given in the section above. This approach has been used to infer hierarchical structures in biological networks (Balazsi et al, 2005; Ma et al, 2004a; Ma et al, 2004b). Because its design requires at least one leaf node to peel at every iterative step (until the network is fully decomposed), the procedure requires that the input network be acyclic. In addition, the bottom-up procedure may incorrectly fix the hierarchical levels of certain nodes whose position in the network is not well defined (see Box 1 in the main text).

BFS-level algorithm

The BFS-level algorithm (Yu and Gerstein, 2006), like the leaf-removal algorithm, is a bottom-up procedure: each node in the network is assigned a hierarchical level equal to the number of edges to the closest leaf node plus one. The method's reliance on leaf nodes as the reference for determining the hierarchical levels of nodes makes it a bottom-up procedure. Unlike the leaf-removal algorithm, the BFS-level algorithm can be applied to networks containing cycles. However, it can be proved mathematically that the hierarchical structure inferred by the BFS-level algorithm is not necessarily correct. To outline the proof, consider a network in which node u has two different directed paths to leaf node z: u→v→w→x→y→z and u→z.
The BFS-level algorithm will assign leaf node z to level 1, nodes u and y to level 2, node x to level 3, node w to level 4, and node v to level 5. Although u topologically occupies a higher position than nodes v, w, and x in the network, the BFS-level algorithm outputs an incorrect topological ordering of the nodes, i.e., v(5)→w(4)→x(3)→u, y(2)→z(1). The vertex sort algorithm and the leaf-removal algorithm will identify the correct topological order in this case. The hierarchy inferred by the BFS-level algorithm for the yeast regulatory network is presented in Fig S1A. An instance for which the BFS-level algorithm infers an inaccurate hierarchy is given in Fig S1B.

The underlying design problem highlighted in these examples also prevents the BFS-level algorithm from being scalable. For instance, let u and v be any two nodes inferred to belong to levels i and j ≤ i, respectively, in the original network, and suppose that the network later grows through the addition of new nodes and/or edges. The BFS-level algorithm does not guarantee that the new levels i* and j* of nodes u and v will satisfy j* ≤ i*.
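The six-node counterexample (u→v→w→x→y→z plus u→z) can be checked with a short Python sketch. The two functions below are minimal, hypothetical implementations written for this illustration: `bfs_levels` assigns each node one plus its shortest distance to any leaf, and `leaf_removal_layers` peels leaves iteratively. The helper `violated_edges` is likewise an assumption of this example; it lists regulatory edges whose source is not placed strictly above its target.

```python
from collections import deque


def bfs_levels(nodes, edges):
    """Level = 1 + number of edges on the shortest directed path to any leaf."""
    pred = {u: [] for u in nodes}
    has_out = set()
    for u, v in edges:
        pred[v].append(u)
        has_out.add(u)
    level = {u: 1 for u in nodes if u not in has_out}  # leaves sit at level 1
    queue = deque(level)
    while queue:  # multi-source BFS over reversed edges
        u = queue.popleft()
        for p in pred[u]:
            if p not in level:
                level[p] = level[u] + 1
                queue.append(p)
    return level


def leaf_removal_layers(nodes, edges):
    """Iteratively peel nodes without outgoing edges (acyclic input assumed)."""
    succ = {u: set() for u in nodes}
    for u, v in edges:
        succ[u].add(v)
    layer, current = {}, 0
    while succ:
        current += 1
        leaves = {u for u, out in succ.items() if not out}
        if not leaves:
            raise ValueError("graph contains a cycle")
        for u in leaves:
            layer[u] = current
            del succ[u]
        for out in succ.values():
            out -= leaves
    return layer


def violated_edges(levels, edges):
    """Edges a→b for which a is not placed strictly above b."""
    return [(a, b) for a, b in edges if levels[a] <= levels[b]]


nodes = list("uvwxyz")
edges = [("u", "v"), ("v", "w"), ("w", "x"), ("x", "y"), ("y", "z"), ("u", "z")]

bfs = bfs_levels(nodes, edges)
peel = leaf_removal_layers(nodes, edges)
print(bfs)                          # u and y share level 2, v is at level 5
print(violated_edges(bfs, edges))   # [('u', 'v')]
print(violated_edges(peel, edges))  # []
```

The BFS-level assignment places u at level 2 even though u regulates v at level 5, whereas leaf removal places u at layer 6, above every node it can reach.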
BFS-LEVEL-ALGORITHM(G)
1   L ← ø                                ► Set of leaves
2   for each node s in V[G]
3       do layer[s] ← ∞
4          if s has no outgoing edge     ► s is a leaf node
5             then L ← L ∪ {s}
6   for each leaf node s in L
7       do BFS(G^T, s)                   ► G^T is the same as G with the direction of edges reversed

BFS(G, s)
1   layer[s] ← 1
2   Q ← {s}                              ► Q is a "queue" data structure
3   while Q ≠ ø
4       do u ← head[Q]                   ► u is the element at the head of the queue Q
5          for each node v to which u has a directed edge
6              do if layer[v] > layer[u] + 1
7                    then layer[v] ← layer[u] + 1
8                         ENQUEUE(Q, v)  ► Insert v at the tail of the queue Q
9          DEQUEUE(Q)                    ► Remove the element at the head of the queue Q

Topological (linear) vs temporal ordering

The information available in the form of a static network structure alone may not be sufficient to resolve with certainty the possible ordering of nodes within the hierarchy. In other words, because the static network only allows a linear topological ordering and not a temporal ordering, some ambiguities cannot be resolved. This is illustrated by the following example. In Fig 2, assume that node 8 is not regulated by node 6, i.e., the edge from node 6 to node 8 is removed. The vertex sort algorithm will still place nodes 7 and 8 at the same level because they form an SCC. One can argue that node 7 needs to be placed at a higher level than node 8 because node 8 will be activated only after node 7 is activated by node 6. It needs to be emphasized that the network depicted in Fig 2 is a static structure capturing regulatory events that occur at different time points, with possibly different combinatorial logic (AND, OR, etc.) operating on a node. This means that there are two possible temporal orders of activation: (a) node 8 is activated only after node 7 is activated, and (b) node 7 is activated only after it receives signals from both 6 and 8 (note that 8 has no in-degree).
Given the static network structure alone, it is almost impossible to resolve these two possibilities with certainty. In other words, ambiguities such as these cannot be resolved unless temporal data are made available.

PHO2's position in the hierarchy

PHO2 is regulated by ABF1, which is regulated by MBP1 (MBP1→ABF1→PHO2). MBP1 and ABF1 occupy the top levels 7 and 6 in the hierarchy, respectively (Fig 3C). Downstream, PHO2 regulates the transcription factors ACA1, CRZ1, and YPR196W (all occupying the bottom layer), and tens of other target genes. This particular regulatory pathway, which goes through PHO2, does not involve core TFs. This leaves PHO2 as a component by itself, parallel to the core, which makes PHO2's position in the hierarchy, as defined by the 3 layers, unclear: one could theoretically position PHO2 in the core or the bottom layer and still maintain the linear/topological ordering of TFs.

[Figure S1, panel A: BFS-level algorithm on the yeast transcription network]

Level 4: ARG81 MAC1 NDT80

Level 3: ADR1 GLN3 OAF1 AFT2* GTS1 PDC2◄ ARG80 HMS1 PIP2 ASH1 INO4 RPN4 AZF1 IXR1 SPT23 CAT8 LEU3 TOS8* CUP9 MAL33 TYE7 DAL81 MCM1◄ YAP1 DAL82 MIG1 FKH2* MIG2

Level 2: ABF1*◄ FHL1*◄ HAL9 MBP1* PHD1* ROX1 STP1 XBP1 ACE2 FKH1 HAP1 MET31 PHO2 RPH1 STP2 YAP3 AFT1* FLO8* HAP4 MET32 PHO4 RTG1 SUM1 YAP5* ARO80 FZF1 HCM1* MET4◄ PLM2* RTG3 SUT1 YAP6* ARR1 GAL4 HMLALPHA2 MGA1* PUT3 SIP4 SWI4* YAP7* CBF1* GAT1 HMRA1 MSN2* RAP1*◄ SKN7* SWI5 YDR026C CHA4 GAT3 HMRA2 MSN4* REB1*◄ SKO1 TEC1* YHP1 CIN5* GCN4* HMS2 NRG1* RGT1 SMP1 TOS4* YOX1* DAL80 GZF3 HSF1*◄ OTU1 RIM101 SOK2* UGA3 ZAP1 DAT1 HAC1 INO2 PDR1 RME1 STE12* UME6*

Level 1: ACA1 IME1 PDR8 STB5 YER130C BAS1 LYS14 PPR1 STP4 YER184C CAD1 MAL13 RDR1 THI2 YJL206C CRZ1 MATALPHA1 RDS1 UPC2 YKR064W CST6 MET28 RGM1 URC2 YML081W CUP2 MIG3 RLM1 USV1 YPR196W ECM22 MOT3 SFL1 WAR1 YRM1 EDS1 MSN1 SFP1 YBL054W YRR1 GCR1◄ MSS11 SRD1 YDR049W HMLALPHA1 PDR3 STB4 YDR266C

Legend: ◄ essential genes; * regulatory hubs; TFs with zero in-degree are also marked in the figure.

[Figure S1, panel B: An instance of inaccurate hierarchy as inferred by the BFS-level algorithm. Graph of the regulatory relationships among OAF1, UME6, GAT1, NDT80, ARG81, AFT2, CUP9, AFT1, YAP6, RDS1, ACA1, IME1, XBP1, YER130C, RGM1, STB4 and YML081W, annotated with their BFS-level assignments (Levels 1 to 4).]

Supplementary Figure S1. (A) Hierarchy inferred by the BFS-level algorithm on the yeast regulatory network. TFs colored in red, green, and blue correspond to those in the top, core and bottom layers of the hierarchy inferred using the vertex sort algorithm (Fig 3B in the main text). TFs colored in black are not regulated by, and do not regulate, any other TF in the network. TFs colored in green, which are part of a strongly connected component (SCC) and should rightfully occupy the same level in any hierarchy, are dispersed across different levels of the hierarchy inferred by the BFS-level algorithm. TFs with zero in-degree (those that are not regulated by any other TF in the network), which should all rightfully occupy the top level of any hierarchy, are likewise dispersed across different levels. (B) An instance of inaccurate hierarchy as inferred by the BFS-level algorithm. The actual hierarchy is displayed as a graph, with the hierarchical levels of TFs as inferred by the BFS-level algorithm shown by the side. OAF1 and UME6 are clearly above ARG81 in the actual hierarchy. However, the BFS-level algorithm reports ARG81 to be above OAF1 and UME6 in the hierarchy it infers, which is incorrect.

[Figure S2: histogram of the number of TFs in the largest SCC (x-axis, 55 to 87) against frequency (y-axis, 0.00 to 0.12) for random networks, with the observed value marked.]

Supplementary Figure S2. The number of TFs in the largest SCC of the yeast transcription regulatory network is significantly smaller (p < 2.8×10^-3) than what one would expect to see in a random network of the same size and degree distribution.

Supplementary Figure S3. Percentages of TFs, in each layer, whose deletion is either lethal or has a moderate to strong negative effect.
There is a strong positive correlation between TF fitness and how high the TF is in the hierarchy, regardless of the major growth condition. YPD: 1% Bacto-peptone, 2% yeast extract and 2% glucose; YPDGE: 0.1% glucose, 3% glycerol and 2% ethanol; YPG: 3% glycerol; YPE: 2% ethanol; YPL: 2% lactate (see Methods for the definition of fitness categories).

[Figure S4: distributions of the tRNA adaptation index.]

Supplementary Figure S4. Distribution of tRNA adaptation index values for TFs in each of the three layers of the inferred hierarchy. No statistically significant differences exist between the distributions.

[Figure S5 panels: (A) Protein abundance [mol/cell]; (B) Protein degradation, protein half-life [min]; (C) Transcriptional noise, % TFs containing TATA box; (D) Protein noise [distance from median CV]. Layers shown: Top, Core, Bottom. An annotation of p < 0.065 appears in the figure.]

Supplementary Figure S5. Dynamic properties of TFs in relation to the hierarchy inferred by the BFS-level algorithm. Distribution of TF values in each of the three layers of the inferred hierarchy for (A) protein abundance (copies/cell), (B) protein half-life (min), and (D) noise in protein abundance (variability in protein levels in a cell population). (C) Percentages of TFs in each of the three hierarchical layers containing a TATA box. The expected percentage (22%) is shown as a dashed line. The x-axis in (D) denotes protein noise measured as the distance from the median coefficient of variation of all proteins (DM; see Methods).

Supplementary Table S1. Overlap of TFs in the corresponding layers of hierarchies inferred by the vertex sort and the BFS-level algorithms.

N = 158                        Top      Core     Bottom    Unclassified
Vertex sort (n1)               25       64       59        10
BFS-level (n2)                 31       79       48        -
Overlap (x)                    9        45       39        -
Obs/Exp, (x*N)/(n1*n2)         1.8      1.4      2.2       -
p-value (hypergeometric)       0.029    2.2e-5   5.5e-14   -

References

Balazsi G, Barabasi AL, Oltvai ZN (2005) Topological units of environmental signal processing in the transcriptional regulatory network of Escherichia coli. Proc Natl Acad Sci U S A 102: 7841-7846.

Ma HW, Buer J, Zeng AP (2004a) Hierarchical structure and modules in the Escherichia coli transcriptional regulatory network revealed by a new top-down approach. BMC Bioinformatics 5: 199.

Ma HW, Kumar B, Ditges U, Gunzer F, Buer J, Zeng AP (2004b) An extended transcriptional regulatory network of Escherichia coli and analysis of its hierarchical structure and network motifs. Nucleic Acids Res 32: 6643-6649.

Yu H, Gerstein M (2006) Genomic analysis of the hierarchical structure of regulatory networks. Proc Natl Acad Sci U S A 103: 14724-14731.