Network building algorithms

1 General considerations
1.1 Terminology
1.2 Pre-filters, algorithms and parameters
2 Network building steps
2.1 Pre-filtering step
2.2 Transformation step
2.3 Building requested network
3 The Network Options window
3.1 Structure of the window
3.2 The textual and graphical Help screens
3.3 Upper pane
3.4 Pre-filters pane
3.5 Standard options pane (selecting an algorithm)
3.6 Advanced options pane
3.7 Network objects pane
4 Pre-filters
4.1 Choose a tissue (according to the Applied Biosystems’ taxonomy)
4.2 Choose a subcellular localization
4.3 Choose species
4.4 Choose orthologs
4.5 Choose object types
4.6 Choose interaction types
5 Pre-filters on the “Advanced options” pane
5.1 Filter interactions by confidence level
5.2 Use unspecified reactions
5.3 Use indirect interactions
5.4 Use binding interactions
6 Network building algorithms (“Standard Options” pane)
6.1 Direct interactions
6.2 Auto Expand
6.3 Shortest paths
6.4 Self regulation
6.5 Expand by one interaction
6.6 Manual expand
6.7 Analyze network
6.8 Transcription regulation
6.9 Analyze network (transcription factors / receptors)
7 Advanced options and their use in the algorithms
7.1 Discard objects
7.2 Number of steps in the path
7.3 Edges in Shortest Paths
7.4 Use interaction weights
7.5 Assign weights to interactions (“Edges weights”)
7.6 Show non-connected root nodes
7.7 Number of nodes in a fragment
7.8 Add complementary objects
7.9 Group networks
8 Showing high-throughput experimental data on networks

1 General considerations

1.1 Terminology

It should be recalled that the term “network” is used in our technical and user documents in two different senses: (1) the “full network”, i.e.
the graph comprising all the objects and their interactions that are currently present in the MetaCore database; (2) any “specific network”, built only of those objects and interactions that either have been specified by the user in an “initial list of objects”, or relate in some way to those initial objects, directly or indirectly, according to user-specified criteria. Therefore every network in the second sense (“specific network”) is a sub-graph of the network in the first sense, i.e. of the “full network”.

In this technical document, the term “network” is sometimes used in a third, “intermediate” sense (1a): the “pre-filtered network”, that is, a graph consisting of all those objects and their interactions that have been selected from the MetaCore database according to user-specified “pre-filters”. “Networks” in the plural always means “specific networks” built according to user-defined criteria.

1.2 Pre-filters, algorithms and parameters

User-defined criteria for building specific networks are of two different kinds:
a) Criteria concerning the types of objects/interactions to be selected from the database irrespective of their relations to the initial objects, that is, criteria that can be verified by examining each database object or interaction individually, without even knowing the list of initial objects. Such criteria are called pre-filters.
b) Criteria concerning the relations between the objects/interactions being considered for inclusion in a specific network and the objects from a user-specified initial list. Such relations may be direct, i.e. found in the database tables for a given initial object, or indirect, i.e. multi-step paths leading to the initial object via one or more “intermediate” objects. These criteria are set by the user when specifying which algorithm should be used for network building.
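To make the distinction concrete, a pre-filter can be thought of as a predicate evaluated on one database object or interaction at a time. The following Python sketch is purely illustrative: the Node and Edge classes, the tissue_filter and prefilter functions and the example data are hypothetical and do not describe MetaCore's actual implementation. It combines a node predicate modeled on the tissue pre-filter (section 4.1) with an edge predicate, and applies the rejection rules of section 4: rejecting a node rejects its edges, and nodes left without edges are dropped unless the user asks to keep them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    # Hypothetical tissue annotation; an empty set means "localization unspecified".
    tissues: frozenset = frozenset()


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    kind: str  # hypothetical interaction-type label


def tissue_filter(selected):
    """Node predicate in the spirit of section 4.1: pass if annotated with at least
    one selected tissue. Unannotated nodes never pass while the filter is on."""
    selected = frozenset(selected)
    return lambda node: bool(node.tissues & selected)


def prefilter(nodes, edges, node_ok, edge_ok, keep_orphans=False):
    """Apply per-element predicates; no knowledge of the root list is needed."""
    kept = {n.name: n for n in nodes if node_ok(n)}
    # An edge survives only if it passes its own predicate and both ends survive.
    kept_edges = [e for e in edges
                  if edge_ok(e) and e.source in kept and e.target in kept]
    if not keep_orphans:
        # Drop nodes left without any surviving edge (orphans).
        linked = {e.source for e in kept_edges} | {e.target for e in kept_edges}
        kept = {name: n for name, n in kept.items() if name in linked}
    return kept, kept_edges


nodes = [Node("A", frozenset({"liver"})), Node("B", frozenset({"liver", "lung"})),
         Node("C"), Node("D", frozenset({"lung"}))]
edges = [Edge("A", "B", "binding"), Edge("B", "C", "binding"),
         Edge("D", "A", "transcription regulation")]
kept, kept_edges = prefilter(nodes, edges, tissue_filter({"liver", "lung"}),
                             lambda e: e.kind == "binding")
print(sorted(kept))  # ['A', 'B']
```

Here C is rejected because it has no tissue annotation, the edge B-C falls with it, and D passes the tissue test but becomes an orphan once its only edge fails the edge predicate; with keep_orphans=True it would be returned to the network.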
After specifying an algorithm, the user can modify some of its characteristics by specifying advanced options pertaining to the algorithm (each of those options has a predefined default setting).

2 Network building steps

Building any network according to user-specified criteria comprises the following technical steps (which are not visible to the user):
1) Pre-filtering, which consists of loading from the database all those objects and interactions that pass the user-specified pre-filters.
2) Transformation of the loaded “pre-filtered network”.
3) Building a specific network (or networks) by applying the user-specified algorithm and advanced options.

2.1 Pre-filtering step

During the pre-filtering step, all objects and interactions that satisfy all the pre-filters currently specified by the user are loaded from the MetaCore database (pre-filters are described in full detail in the subsequent sections). During this process, information on individual objects and interactions is assembled from a large number of different database tables and kept together. In this way, a pre-filtered full network is dynamically constructed, which is still too large to be displayed in any way.

2.2 Transformation step

Some local transformations are then applied to the pre-filtered network built in the previous step. They are needed to calculate shortest paths more accurately and to select appropriate elements when building the specific network requested by the user. At present, only two such transformation procedures are implemented. The first one splits certain types of interactions so that they can be considered in both directions (see section 5.4). The second one concerns nodes representing groups of objects (see hereafter).
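The group-node split introduced here, and described in detail just below, can be sketched as an edge-list rewrite. In this hypothetical Python sketch, edges are (source, target, length) triples, the paired nodes are named by appending 1 and 2 to the group name as in the figure, and every member-to-group edge is assumed to be a membership arrow; none of this reflects MetaCore's internal data structures.

```python
def split_group(edges, group, members):
    """Rewrite a (source, target, length) edge list so that `group` becomes G1/G2."""
    g_in, g_out = f"{group}1", f"{group}2"  # naming follows the figure (assumption)
    members = set(members)
    result = []
    for source, target, length in edges:
        if target == group and source in members:
            result.append((source, g_out, 0.0))     # membership arrow e_i, now into G2
        elif target == group:
            result.append((source, g_in, length))   # incoming interaction, now into G1
        elif source == group:
            result.append((g_out, target, length))  # outgoing interaction, now from G2
        else:
            result.append((source, target, length))
    # New zero-length membership arrows f_i from G1 to every member.
    result.extend((g_in, member, 0.0) for member in sorted(members))
    return result


edges = [("X", "G", 1.0), ("G", "Y", 1.0), ("E1", "G", 1.0), ("E2", "G", 1.0)]
for edge in split_group(edges, "G", ["E1", "E2"]):
    print(edge)
```

The path X-G1-E1-G2-Y then has the same total length (2) as the original path X-G-Y, and since G1 has no direct edge to G2, every path through the group has to pass through one specific member.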
Splitting a node that represents a group of objects:

[Figure: fragment of a pre-filtered network around the group node G, before (left) and after (right) the transformation]

This drawing schematically shows a fragment of a pre-filtered network around a node representing a group of objects, before the transformation (on the left) and after the transformation (on the right). The node G represents a group of two or more similar objects E1, E2… (only two are shown in the drawing). Arrows e1, e2,… go from those member objects to the group. Arrows a, a1,…, b, b1,… denote interactions of any object in the group with some other objects; they are assumed to be valid for every member of the group.

On the right, the same network fragment is shown after the transformation. The group G is now represented by paired nodes G1 and G2 (always 2 nodes, irrespective of the number of group members). The node G1 receives all the incoming interaction arrows shown on the left figure, while all the outgoing interaction arrows of the left figure now start from the node G2. The old membership arrows e1, e2… now enter the node G2, while the node G1 starts a new set of outgoing membership arrows f1, f2… towards every group member.

The new modified network is used in the following way. Suppose we want to consider the path a1-G-b1 as the path applied to one specific member E1 of the group G. Using the “transformed” path, we will instead consider a more detailed path a1-G1-f1-E1-e1-G2-b1. The membership arrows f1 and e1 are considered as having zero length, so they do not artificially increase the total length of the path. Such a “node split” may become necessary, in particular, when an interaction arrow, say a1, is valid only for some of the group members (e.g. for E1 but not for E2).

2.3 Building requested network

A specific network is then built using only those objects and interactions that have passed through the pre-filters in Step 1 (and have been modified whenever necessary in Step 2).
Conceptually, this process can be thought of as delimiting a sub-graph within the potentially very large pre-filtered graph that has already been built in Step 1. The various network building algorithms (described in more detail in Section 6) apply different methods and criteria when searching for the most relevant objects/interactions, sorting them out, finding shortest paths between nodes, etc. Typically, the network is built starting from the initial user-specified list of root objects by gradually adding their close or more distant neighbors. Whether every root object remains in the resulting network at the end of this process depends on the algorithm and options selected by the user.

3 The Network Options window

3.1 Structure of the window

The Network Options window has 5 panes: the upper pane (always open) and four other panes that can be individually folded or unfolded by the user. When this window is called (e.g. by clicking “Build network for selected objects” in the upper-right corner of the “Selected network objects” pane in the “Signaling networks” window), it opens with the “Standard Options” pane unfolded and the other three panes folded:

The “Standard options” pane is in fact the one where the user selects a specific network building algorithm.
Note: The “Pre-filters” and “Advanced options” panes also contain some presets; see the appropriate sections below.

Clicking the red “Build network” button launches the user-selected (or preset) algorithm with the user-selected (or preset) options.
Note: Each of the four algorithms on the upper two lines of the “Standard options” pane builds a series of networks, not a single one.

3.2 The textual and graphical Help screens

The “Network options” window, like other MetaCore windows, has a “Help” button in its upper-left corner. Clicking this button opens a new window with a general explanation of the various network building methods.
In addition, many of the “Standard options” and “Advanced options” bear a question mark; clicking this mark opens a new window with contextual help in purely graphical form, with no textual explanations. The same screen opens in every case, scrolled to the appropriate schematic image. Some of those schematic images are reproduced hereafter to illustrate the various network algorithms. All those schemes use the same legend:

“Root nodes” denote “objects from the initial list” (user-defined).

3.3 Upper pane

The first line asks the user whether to build the requested network in a new window (default):

The second line specifies the semantic type of the network to build. The default stands for a merged (metabolic + regulatory) network:

The third line asks whether or not to use canonical pathways, informally called noodles. The default is to use noodles. Canonical pathways are linear sequences of interactions that have a specific biological meaning and are manually curated by biologists. The MetaCore database currently contains descriptions of more than 50 000 canonical pathways. Checking each and every noodle for its relevance to the network under construction takes a long time, especially when the initial list of root objects is relatively long; for this reason, a warning appears next to the button:

A noodle is a directed sequence of network nodes and the edges between them. On the network being built, a noodle is emphasized by coloring the appropriate edges. Most of the network building algorithms consider every noodle as a “shortcut” between its two extremities, whose total length is always taken to be 1.

3.4 Pre-filters pane

Only part of the available pre-filters are accessible on this pane (see section 4 for full details). The others are accessible on the Advanced options pane (see section 3.6). At start, the pane is folded (not visible). By default, no filters are active.
When you activate any of them (by clicking Select), a pop-up window opens presenting a list of options to select from.

3.5 Standard options pane (selecting an algorithm)

The four algorithms proposed on the upper two lines of this pane build a series of networks rather than a single one (see their descriptions in sections 6.7 – 6.9):

Each of the remaining algorithms builds exactly one network (see their descriptions in sections 6.1 – 6.6):

Note: The upper group of four algorithms (“Analyze…”) is in some cases inactive and not displayed, for example when this window is reached by pressing the “Rebuild” button on the screen of an already built network.

3.6 Advanced options pane

This pane contains some additional pre-filters (described in detail in Section 5) and also a list of additional options (see Section 7). Pre-filters on this pane differ in behavior from those of the “Pre-filters” pane in the following important respect: the edges (interactions) filtered out by any of these pre-filters are not considered by the network building algorithms, but may optionally be added to the network once it has been built (the “Show but not use” column):

All 4 of these pre-filters may be activated with any of the network building algorithms.

The set of additional options that appears on the “Advanced options” pane at any given moment depends on the network algorithm. For every algorithm selected on the “Standard options” pane, these options set specific conditions for executing that algorithm; options that do not affect the current algorithm are not displayed on the pane. The full list of those options is given hereafter (this list never appears in full on the pane):

All these options are discussed in detail in Section 7.

3.7 Network objects pane

This pane allows the user to specify the desired position of objects on the paths being built by the currently selected algorithm.
There are 4 columns of checkboxes. Using the first 3 of them, the user can specify individually for every object listed in the “Name” column whether the paths that start from that object (respectively, pass through it, or arrive at it) should be considered when building the desired network. These conditions are logically OR-ed, so that every path that satisfies at least one of the checked conditions is included. In the figure below, for example, the checkboxes mark for inclusion those paths that start from “PREG1”, or go to “PRPK”, or pass through “PIG11”, or start from “PP2C”:

The fourth column is “Avoid”; a box checked in this column signifies that the corresponding object must not appear on any path in the network being built. Boxes checked in this column are OR-ed among themselves, and their combination is then appended with “AND NOT” to the OR-ed combination of all the boxes checked in the first three columns.

Presently, “Shortest paths” (see section 6.3) is the only algorithm in which all four columns are active and may be used as described above. All other algorithms offer only the through and avoid columns, while from and to are inactive. In those other algorithms, the through column simply signifies that every path that otherwise satisfies all other conditions of the algorithm must contain at least one of the checked objects.

By default, the pane shows the initial list of root objects; the through column is fully checked, and the other columns are fully unchecked. The user can add more objects by clicking “Add network objects” (the “Search network objects” pop-up window opens). The three buttons on top of each column are: check all (left button), uncheck all (right button), and check/uncheck all (central toggle button).
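The selection rule above amounts to a Boolean predicate on a candidate path. A minimal Python sketch follows; the function name and its parameters are hypothetical (start and end are used because from is a Python keyword), and counting a path's end points as objects the path “passes through” is an assumption the text does not settle.

```python
def path_allowed(path, start=(), through=(), end=(), avoid=()):
    """Return True if `path` (a sequence of object names) satisfies
    (starts in From OR ends in To OR contains a Through object)
    AND NOT (contains an Avoid object)."""
    start, through, end, avoid = map(set, (start, through, end, avoid))
    wanted = (path[0] in start
              or path[-1] in end
              or any(obj in through for obj in path))
    return wanted and not any(obj in avoid for obj in path)


# The example from the figure: from PREG1 or PP2C, to PRPK, through PIG11.
rule = dict(start={"PREG1", "PP2C"}, through={"PIG11"}, end={"PRPK"})
print(path_allowed(["PREG1", "X", "Y"], **rule))               # True
print(path_allowed(["X", "Y", "Z"], **rule))                   # False
print(path_allowed(["PREG1", "X", "Y"], avoid={"X"}, **rule))  # False
```

For the algorithms other than Shortest paths, only the through and avoid arguments would be supplied.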
4 Pre-filters

Pre-filters operate at an early stage of loading objects and interactions from the MetaCore database, by “rejecting” those objects and interactions that should not even be considered as potential nodes and edges of the requested network(s). Rejecting a node naturally implies rejecting all edges that start or end at that node. If, after a number of nodes/edges have been rejected, some non-rejected node becomes isolated (an orphan node), that node is also rejected, though only conditionally: there is a user option to return isolated nodes to the network.

At present, 5 pre-filters are defined on the “Pre-filters” pane; they are described in this section. Four more pre-filters are accessible on the “Advanced options” pane (see section 5).

Pre-filters from the “Pre-filters” pane:

By default (at start) all 5 pre-filters are switched off. Every filter may be individually switched on by clicking its check-box. This activates the corresponding “Select” button; clicking it opens a pop-up with a list in which one or more options, or even all of them, may be selected. Initially, all the options are unchecked in the list of every pre-filter except “subcellular localization”, whose list is fully checked.

4.1 Choose a tissue (according to the Applied Biosystems’ taxonomy)

Only those objects are loaded from the database that are explicitly marked in the database tables as localized in at least one of the tissues selected by the user from the pop-up list of tissues. Other (non-selected) localizations of the same object do not matter.
Note: When this filter is enabled, an object for which no localization at all is specified in the database will not pass, even if the whole list of tissues is checked. To load such objects with unspecified localization, this pre-filter should be disabled.
Note: The Applied Biosystems’ list of tissues comprises about 30 entries.
GeneGo biologists use (not systematically) a much more detailed system of approximately 1000 different tissue specifications. These data, however, are stored in the MetaCore database without ever being shown to users. In contrast, the entire Applied Biosystems’ gene/protein localization table has been imported into MetaCore and is explored by this pre-filter. The two tissue classification systems therefore exist in parallel.

4.2 Choose a subcellular localization

This pre-filter operates similarly to the previous one and uses the standard taxonomy of subcellular localizations. By default, the whole list of localizations is selected.

4.3 Choose species

This pre-filter offers the options Homo sapiens, Mus musculus, and Rattus norvegicus. The choice made here restricts the pre-filtered network to only those objects linked to genes of the chosen species.

4.4 Choose orthologs

For this pre-filter, the base organism is Human by default. The exact meaning of this pre-filter is: load from the database those Human objects that have orthologs in the organism that has been selected. For example, if the user has selected Mouse, then Human objects having Mouse orthologs will be accepted and all other objects will be rejected.

4.5 Choose object types

To select the types of objects to be accepted by this pre-filter, the same GeneGo list of object types is offered as the one illustrated on the “Network Legend”, which opens when clicking the Legend button in the network window.

4.6 Choose interaction types

Similar to the previous pre-filter; interaction types are selected from the GeneGo list illustrated on the Network Legend.

5 Pre-filters on the “Advanced options” pane

These pre-filters apply to the edges of the network, not to its nodes; they are also presented to the user differently from the pre-filters of the previous section.
There are three columns of radio buttons. The button in the first column, when selected, sets the strictest filtering condition (it is set by default). If the user selects Show but not use instead, the interactions rejected by the filter will nevertheless be loaded, but will not be taken into account when running the specified algorithm. These interactions are added to the network once it has been built (provided that both ends of such an interaction belong to that network). If the Use option is selected instead, those low-priority interactions will not only be displayed on the network, but will also be used by the network building algorithm.

Hereafter those pre-filters are described in more detail, including the additional button in the last line. Illustrative schemes are taken from the graphical help pop-up that opens when clicking the question mark in the corresponding line.

5.1 Filter interactions by confidence level

By default, only those interactions that have been manually curated are loaded from the database; they are considered the most reliable ones. The least reliable (low-trust) data are those obtained by automated parsing of English texts (natural language processing, NLP).
Note: There are about 10 different “trust levels” that are set by GeneGo biologists when annotating articles. Strictly speaking, those 10 trust levels are not fully ordered; there is only a rough idea of whether a given trust level is low, medium or high. They are primarily used by the biologists themselves, e.g. when creating or updating a map. In contrast, they are presently NOT USED by the MetaCore network building algorithms. What MetaCore does use (in the “Filter interactions by confidence level” pre-filter of the “Advanced Options” pane) is just one of those 10 trust levels, the one that stands for “Natural Language Processing” (NLP), that is, annotations created by AUTOMATIC parsers of textual input.
All the other 9 levels are “manually curated”, that is, set by biologists in the course of MANUAL annotation. By default, NLP-created interactions are filtered out, but the user has the option to include them as well.

5.2 Use unspecified reactions

By default, interactions whose type and mechanism are both marked in the database as unspecified are not loaded. The user can switch to using these interactions for network building (see scheme below), or just to showing them on the network once built.

5.3 Use indirect interactions

By default, indirect interactions are not loaded from the database. The user can switch to using these interactions for network building (there is no illustrative scheme for this option on the graphical help page), or just to showing them on the network once built. Indirect interactions are interactions marked in the database as having the mechanism “influence on expression” or “unspecified”.

5.4 Use binding interactions

The exact meaning of this option is: use interactions with the binding mechanism even if their type is not specified (marked as unspecified), or is specified as complex formation (see scheme below). By default, such interactions are not used. The option of using them is split here into two sub-options: normal, in which such an interaction is used only in the direction specified in the database; and bidirectional, in which it is considered as defined in both directions.

In the latter case (“bidirectional”), after the pre-filtering step, a supplementary operation is performed during the network transformation step (see section 2.2). It consists in splitting bidirectional edges into a pair of unidirectional ones. Namely, edges corresponding to binding interactions of the above-mentioned types, which are in essence non-directional (their “main” direction is specified in the database as a pure formality), are considered during network building as a pair of edges in opposite directions.
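The three radio-button settings and the bidirectional treatment of binding edges can be summarized in a short Python sketch. The Mode enumeration, the (source, target, kind) edge triples and the function names are hypothetical, and the kind label merely stands in for the database's type and mechanism fields.

```python
from enum import Enum


class Mode(Enum):
    FILTER = 1        # default: rejected edges are not loaded at all
    SHOW_NOT_USE = 2  # loaded, ignored by the algorithm, added to the finished network
    USE = 3           # loaded and used by the network building algorithm


def partition_edges(edges, rejected, mode):
    """Split edges into (usable by the algorithm, kept for display only)."""
    usable, display_only = [], []
    for edge in edges:
        if not rejected(edge) or mode is Mode.USE:
            usable.append(edge)
        elif mode is Mode.SHOW_NOT_USE:
            display_only.append(edge)
        # Mode.FILTER: the rejected edge is simply dropped.
    return usable, display_only


def add_display_edges(network_nodes, display_only):
    """Show-but-not-use edges are added only if both ends are in the built network."""
    return [e for e in display_only if e[0] in network_nodes and e[1] in network_nodes]


def split_bidirectional(edges):
    """Add a reverse copy of every binding edge, flagged hidden=True:
    usable by the algorithms, never drawn on the network."""
    out = [(s, t, kind, False) for s, t, kind in edges]
    out += [(t, s, kind, True) for s, t, kind in edges if kind == "binding"]
    return out


def is_unspecified(edge):
    return edge[2] == "unspecified"


edges = [("A", "B", "binding"), ("B", "C", "unspecified"),
         ("C", "D", "transcription regulation")]
usable, shown = partition_edges(edges, is_unspecified, Mode.SHOW_NOT_USE)
print(add_display_edges({"A", "B", "C"}, shown))  # [('B', 'C', 'unspecified')]
```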
The added edges going in the opposite direction are used when necessary by the network building algorithms, but they are not shown on the network built.

6 Network building algorithms (“Standard Options” pane)

We will first describe the algorithms that build one network at a time. The algorithms that build a whole series of networks will be considered starting from subsection 6.7.

6.1 Direct interactions

No real algorithmic work is performed in this case. A graph is simply built whose nodes are all the “root objects” (that is, objects from the initial list), and whose edges are all interactions between those root objects (that have passed through the specified pre-filters):

6.2 Auto Expand

A network is built “around” the initial list of root objects. All edges are considered directional by the algorithm; outgoing and incoming paths are explored separately during the network expansion process. The algorithm uses two criteria for selecting the “most relevant” nodes and edges while expanding around the root nodes: the proximity of a node, and the “traffic”, or flow, through the node. This is explained hereafter in more detail.

Each of the root nodes is considered to have a “flow” through it equal to 1. These root nodes are the origin of a “wave front” that is then iteratively calculated and used by the algorithm. First, consider the network expansion process in the outgoing path direction. If a node with the flow value n has k outgoing edges, then its outgoing flow is equally distributed between those edges, so that every edge receives n/k. Furthermore, if a node has some number of incoming edges with already calculated flow values, then those flow values are summed up to obtain the total flow through that node. The flow value 1 is considered to be the maximum, so if the sum of incoming flows exceeds 1, the result is reduced to 1. (This limitation, in particular, keeps the algorithm well-behaved even in the presence of looping paths.) Consider an example.
Let the nodes A and B each have flow = 1; let node A have a single outgoing edge, leading to C, so that the whole flow from A is directed to C and C receives flow = 1; and let node B have two outgoing edges, one to C and the other to D, so that C and D each receive ½ of the flow from B. We assume that C and D have no other incoming edges. Then the total flow through D is ½, and the total flow through C is 1 + ½, reduced to 1 because 1 is the upper limit. This is illustrated on the following scheme, where the total flow through a node is shown in parentheses after the node name:

A (1) --1--> C (1 + ½, capped at 1)
B (1) --½--> C
B (1) --½--> D (½)

At every step of the algorithm, only the closest neighbors are considered, that is, the nodes that are one outgoing edge away from some node already included in the network. From these neighboring nodes, those having the highest flow value are selected and added to the network and to the “wave front”. This process is iterated until the total number of nodes exceeds a pre-established limit; the default value is shown on the “Advanced options” pane, and may be changed by the user:

The same algorithm is then performed a second time, but applied to edges in the opposite direction, and both results are merged into one common network. In order for the final network to have a number of nodes as close as possible to the pre-established limit, the limit value is divided by 1.5 for each of the two edge directions (not by 2, because some nodes will be found in both the outgoing and the incoming edge directions). For example, if the limit is set to 50, then no more than 34 nodes will be included in the network at each of the two steps, making the total (statistically) close to 50.

It should be noted that if the initial set of nodes is larger than 2/3 of the limit, then the algorithm described above gives the same result as the “Direct interactions” algorithm, which does not consider flow through nodes.
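The flow rule and the example above can be reproduced with a short Python sketch. It is a simplification under stated assumptions, not MetaCore's code: the graph is a hypothetical dict mapping each node to its list of successors, every iteration adds all not-yet-included neighbors that share the highest flow, expansion in one direction stops once the per-direction node limit is reached, and that limit is taken as the overall limit divided by 1.5 and rounded up (which gives the 34 of the example).

```python
import math
from collections import defaultdict


def expand_one_direction(succ, roots, node_limit):
    """One pass of the flow-based expansion; `succ` maps node -> successor list."""
    flow = {root: 1.0 for root in roots}  # every root carries flow 1
    network = set(roots)
    while len(network) < node_limit:
        # Flow reaching each neighbour that is one edge away from the network.
        arriving = defaultdict(float)
        for node in network:
            targets = succ.get(node, [])
            for target in targets:
                if target not in network:
                    arriving[target] += flow[node] / len(targets)  # n/k per edge
        if not arriving:
            break  # nothing left to add in this direction
        capped = {n: min(1.0, f) for n, f in arriving.items()}  # flow never exceeds 1
        best = max(capped.values())
        for n, f in capped.items():
            if f == best:  # assumption: all neighbours tied at the top are added
                network.add(n)
                flow[n] = f
    return network, flow


def reverse_edges(succ):
    """Adjacency with every edge reversed, for the incoming-direction pass."""
    pred = defaultdict(list)
    for node, targets in succ.items():
        for target in targets:
            pred[target].append(node)
    return dict(pred)


def auto_expand(succ, roots, limit):
    per_direction = math.ceil(limit / 1.5)  # 50 -> 34 (rounding up is assumed)
    outgoing, _ = expand_one_direction(succ, roots, per_direction)
    incoming, _ = expand_one_direction(reverse_edges(succ), roots, per_direction)
    return outgoing | incoming


succ = {"A": ["C"], "B": ["C", "D"]}
_, flow = expand_one_direction(succ, ["A", "B"], node_limit=10)
print(flow["C"], flow["D"])  # 1.0 0.5
```

With a node limit of 3, only C (flow 1) would be added before D (flow ½), which shows how the flow value ranks neighbors; with a limit no larger than the number of roots, nothing is added, matching the “Direct interactions” remark above.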
6.3 Shortest paths

By checking the boxes in the appropriate columns, the user selects two sub-lists from the initial list of nodes: the “From” list (every path must start at one of these nodes) and the “To” list (every path must end at one of these nodes). The two lists may intersect or even coincide. A third sub-list, “Through”, may additionally be defined; every path must then pass through at least one of its nodes. If, however, the user checks nodes in the “Through” column only, this means (in this context only) that the same nodes are specified as both “From” and “To”.

For every node in the “From” list, a set of shortest paths to the nodes of the “To” list is built. Namely, for every from-to pair of nodes, the set of shortest paths is built using Dijkstra's algorithm. If a given from-to pair has more than one path of the same minimal length, all of them are included and shown. Strictly speaking, the network shows not the paths themselves but the edges they consist of; the user traces the paths visually.

The maximum length of a path considered for inclusion is set by the user in the “Number of steps in the path” option (see 7.2); path building stops when the path length reaches this maximum. The resulting network may also include, instead of all possible edges between the nodes found, only those edges that make up the shortest paths between them (the “Edges in Shortest Paths” option, see 7.3).

On the network built, the “From” nodes are shown in green circles, the “To” nodes in red circles, and the “Through” nodes in violet circles.
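As a rough illustration of the search described in 6.3, the Python sketch below collects the edges lying on all shortest paths from each “From” node to each “To” node, which corresponds to what the network shows when “Edges in Shortest Paths” is selected. Assumptions, all hypothetical: edges are unweighted (source, target) pairs that have already passed the pre-filters, so Dijkstra's algorithm reduces to a breadth-first search; `max_steps` stands in for the “Number of steps in the path” option; the “Through” list is not handled.

```python
from collections import defaultdict, deque


def shortest_path_edges(edges, sources, targets, max_steps):
    """Edges lying on at least one shortest directed path from a source to a target.

    Unit edge weights are assumed, so Dijkstra's algorithm reduces to a
    breadth-first search; paths longer than max_steps edges are ignored.
    """
    successors = defaultdict(list)
    for u, v in edges:
        successors[u].append(v)
    result = set()
    for s in sources:
        dist = {s: 0}
        preds = defaultdict(list)  # every predecessor on a shortest path
        queue = deque([s])
        while queue:
            u = queue.popleft()
            if dist[u] == max_steps:
                continue  # the step limit stops path building here
            for v in successors[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    preds[v].append(u)
        for t in targets:
            if t == s or t not in dist:
                continue
            # Walk back from the target over all shortest-path predecessors,
            # so ties of equal minimal length are all kept.
            stack, seen = [t], {t}
            while stack:
                v = stack.pop()
                for u in preds[v]:
                    result.add((u, v))
                    if u not in seen:
                        seen.add(u)
                        stack.append(u)
    return result
```

For a diamond A->B->D, A->C->D plus a longer route A->E->F->D, `shortest_path_edges(edges, ["A"], ["D"], 5)` returns only the four diamond edges, since both length-2 paths are kept; with `max_steps=1` it returns an empty set, because D cannot be reached within one step.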
Nodes that belong to two or three of these lists are shown in two- or three-colored circles (see the “Network legend” page).

6.4 Self regulation

For a user-specified initial set of nodes, a second set is found that contains nodes related to the initial nodes in a specific way: all nodes from which “transcription regulation” edges lead to the initial nodes. This forms the list of “transcription factors”. The “Shortest paths” algorithm (see 6.3) is then applied, with the initial set of nodes as the “From” list and the list of transcription factors as the “To” list.

As in the basic “Shortest paths” algorithm, this algorithm accepts the “Number of steps in the path” option (see 7.2) and the “Edges in Shortest Paths” option (see 7.3).

6.5 Expand by one interaction

The requested network is built by taking all outgoing and incoming edges (i.e. interactions) of the nodes in the initial list, and complementing the initial list with all those edges and with all nodes at their other ends (which are thus located one interaction away from the initial nodes). All the edges and nodes added must, of course, satisfy all the pre-filters set before network building starts.

6.6 Manual expand

This algorithm works like “Expand by one interaction”, except that it expands around each initial node by the number of interactions specified in the drop-down menu.

6.7 Analyze network

The algorithm starts by building a “large network”, applying a simplified version of the “Auto Expand” algorithm (see 6.2) to the initial list of objects. The limit on the number of edges is set to 2 or 3 times the size of the initial list (the degree to which the network is saturated with expression data markers depends on that limit).
Expansion is first done along outgoing edges; incoming edges are considered only if the limit has not been reached using outgoing edges alone.

The large network is then “cut” into smaller fragments. This is done cyclically, i.e. fragments are created sequentially, one by one. Edges used in a fragment are never reused in subsequent fragments; nodes may be reused, but with different edges leading to them in different fragments.

Creation of each new fragment starts by selecting the “most connected” node, i.e. the node with the largest number of links to neighboring nodes; only links not yet used in previous fragments are counted. This most connected node becomes the “pole” of the new fragment. A “node queue” is used, in which nodes are always ordered from the most connected to the least connected. When creation of a new fragment starts, the queue is empty. First, all direct neighbors of the pole node are put in the queue. Then the least connected node is extracted from the queue and added to the fragment, its neighbors are added to the queue, and the queue is reordered. This process continues until the queue is empty (which means that an entire connected component of the large network has been selected) or until the number of nodes in the fragment being built reaches the limit (set by the user in the “Number of nodes in a fragment” advanced option; default value 50).

Finally, all edges between the nodes of the new fragment are put into the fragment and deleted from the large network. Nodes that have been “stripped” of all their edges are also removed from the large network, the queue is emptied, and construction of the next fragment starts as described above.

6.8 Transcription regulation

This algorithm starts with a small sub-network consisting of the initial list of objects plus all the “immediate transcription factors” of those objects, i.e.
the objects linked to at least one initial object by an edge of the “transcription regulation” type. Then a separate network is built around each such transcription factor, using the Auto Expand algorithm (see 6.2) along incoming edges taken in the opposite direction, and the transcription factor’s targets from the initial list are added to it; each such network is limited by the “Number of nodes in a fragment” advanced option (default value 50). The algorithm delivers a list of networks, one per transcription factor.

6.9 Analyze network (transcription factors / receptors)

Both algorithms (the “transcription factors” variant and the “receptors” variant) start by creating two additional lists from the initial list of objects: the list of transcription factors and the list of receptors. The first list consists, as in 6.8, of the start nodes of the “transcription regulation” edges ending at some of the initial objects (their targets). The second list consists of the end nodes of the “receptor binding” edges starting at some of the initial objects (their ligands). The three lists (initial objects, transcription factors and receptors) may intersect.

The edges of the network then receive different weights, according to the option described in 7.5. These weights are used to calculate path lengths. As a special feature, the length of any canonical path (“noodle”) is taken to be 1, regardless of the number of edges in it.

The next step of the algorithm is to find the shortest paths from the receptors to the transcription factors. (Note: the algorithm does not search for paths in the opposite direction, i.e. from the transcription factors to the receptors.)

At the final stage the two algorithms differ, although they work symmetrically. The first algorithm (“Analyze network, transcription factors”) looks, for every transcription factor, for the closest receptor and delivers a network consisting of all the shortest paths from that receptor to that transcription factor.
This first algorithm thus delivers one specific network per transcription factor in the list. Similarly, the second algorithm (“Analyze network, receptors”) looks, for every receptor, for the closest transcription factor and delivers a network consisting of all the shortest paths from that receptor to that transcription factor. It delivers one specific network per receptor in the list.

For example, consider the following table of distances (shortest path lengths) between 4 transcription factors and 3 receptors, where “-” means that no path exists:

SP length   R1   R2   R3
TF1          1    5    -
TF2          3    7    2
TF3          5    3    3
TF4          4    -    2

The first algorithm will then deliver four networks, consisting of the shortest paths R1->TF1, R3->TF2, R2,R3->TF3 and R3->TF4, respectively. The second will deliver three networks: R1->TF1, R2->TF3 and R3->TF2,TF4. Note that the paths from R2 to TF2 will not be returned by either algorithm.

Every network built by these algorithms may optionally be enriched with the ligands of its receptors and the targets of its transcription factors (the “Add complementary objects” advanced option, see 7.8). The networks thus obtained may then be grouped and merged within each group (the “Group networks” advanced option, see 7.9). Namely, if one network is built for every transcription factor, all networks with the same receptors are grouped, and each group is merged into one larger network. Similarly, if one network is built for every receptor, all networks with the same transcription factors are grouped and merged within each group.

7 Advanced options and their use in the algorithms

7.1 Discard objects

This option is active with every algorithm. The user may select none, either one, or both of its two settings.
If “Experiments” is checked, the network will contain only those objects that are present in the currently active experiment files with an expression that meets the threshold set by the corresponding parameters. If “User list” is checked, the network will consist of the user-selected objects only. When both are checked, the network will consist of those user-selected objects that are present in the active experiment files with an expression that meets the threshold.

Selection of the edges to be included is straightforward: all edges (interactions) between the selected nodes (objects) that satisfy the criteria set by the chosen algorithm and by the other options are included.

7.2 Number of steps in the path

This option is active with the “Shortest paths” (6.3) and “Self regulation” (6.4) algorithms. The user may set a maximum length for the shortest paths searched between nodes; longer paths between a pair of nodes are not considered at all. For example, if this parameter is set to 5, and between some nodes A and B there are two paths of length 3 and two paths of length 4, and no path shorter than 3 (that satisfies all the specified pre-filters), then only the two paths of length 3 will be included in the network. If, in contrast, the parameter is set to 4 and there is no path shorter than 5 between A and B (after pre-filtering), then no path at all will be drawn between A and B.

Note that, depending on whether edges are treated as directed or non-directed, the “shortest path” concept is applied either separately to the paths from A to B and to the paths from B to A, or to one common set of paths regardless of their direction.

7.3 Edges in Shortest Paths

This option is active with the “Shortest paths” (6.3) and “Self regulation” (6.4) algorithms.
When it is selected, the network shows only those edges that participate in the shortest paths built by the algorithm, rather than all possible edges between the objects included in the network.

7.4 Use interaction weights

This option may be used with all algorithms except “Analyze network (transcription factors)” and “Analyze network (receptors)”. Every interaction is assigned a weight depending on its mechanism and effect. If this option is selected, shortest paths are calculated using the weight of every edge (interaction): the greater the weight, the longer the edge. The weights are currently pre-assigned to every <mechanism, effect> pair, as shown in the table below; the user cannot reassign them.

Effect codes: -1 = tech, 0 = unspecified, 1 = activation, 2 = inhibition, 6 = decrease amount, 7 = increase amount.

Mechanism \ Effect                     -1      0      1      2      6      7
-1  tech                              0.1   9999   9999   9999   9999   9999
0   unspecified                      9999    100     10     10     10     10
2   covalent modification            9999     10      1      1      3      3
3   +P                               9999     10      1      1    100    100
4   -P                               9999     10      1      1    100    100
5   bind                               10     10      1      1    100    100
6   competition                      9999     10     30      2    300    300
7   transformation                   9999    100    100    100    100    100
8   cleavage                         9999    100     20      3      1      2
9   transcription regulation (TR)    9999     10      1      1   9999      5
10  class relation (CR)              9999    0.1   9999   9999   9999   9999
12  influence on expression (IE)     9999    500    100    100   9999    300
14  complex subunit (CS)             9999    0.1   9999   9999   9999   9999
15  catalysis                        9999      1   9999   9999   9999   9999
16  transport                        9999      3     10     10      3      3
17  receptor binding                 9999      1      1      1   9999   9999

Notes:
1) Weights should be understood as distances between the two ends of a link. Therefore, the lower the weight of an interaction (one of whose ends has already been selected for inclusion in the network being built), the greater the chance that the interaction, and hence its other end, will be included in the network.
2) Consequently, the probability of including a link decreases as its weight grows. Inclusion of links weighted 9999 is highly improbable.
3) Links with a very low weight (less than 1) practically do not increase the total length of a path; both ends of such a link may be considered as one node.
4) In the table above, pink-colored cells mark effect-mechanism combinations that currently never appear in our database.
5) The weight 9999 signifies that the corresponding effect-mechanism combination should not appear.
6) Green-colored cells indicate (presumably) preferred effect-mechanism combinations. Such interactions are the first candidates for inclusion in the network being built.
7) Effect and mechanism both equal to -1 denote a link between a metabolite and a reaction.
8) Effect equal to -1 with the mechanism “bind” denotes links that were uploaded automatically using the JnJ parser.

7.5 Assign weights to interactions (“Edges weights”)

This option is active with the two algorithms to which the previously described option “Use interaction weights” (7.4) does not apply, namely the “Analyze network (transcription factors)” and “Analyze network (receptors)” algorithms (6.9). Edges fall into four categories, each with its own weight:
• Inside – edges between initial nodes.
• Nearest – edges going from or to an initial node.
• Outside – edges between nodes that do not belong to the initial list.
• Forbidden – edges between the transcription factors found and their targets, or between the receptors found and their ligands.

The default weights (shown in the option selection pane) are chosen to favor paths that pass through edges of the initial list and do not pass through other receptors or transcription factors. If needed, the user may select other weights.

The length of any canonical path (CP, “noodle”) is taken to be 1. This applies only to a whole CP, from end to end, and cannot be changed by the user. Therefore, if a portion of a path is a canonical path, the length of that portion is 1.
This may sometimes lead to different length calculations when the path runs through several overlapping noodles; in such a case, the minimum possible length is taken. Consider a path A-B-C-D-E-F-G-H-I-J-K-L that is partly covered by two different CPs, one from C to I (CP1) and another from E to J (CP2). The total length of the path may be calculated either as AB + BC + CP1 + IJ + JK + KL or as AB + BC + CD + DE + CP2 + JK + KL (with CP1 = CP2 = 1), and the shorter of the two results is taken. Moreover, ignoring the CPs altogether may sometimes give an even shorter result, e.g. if in the above example all the edges making up CP2 are “Inside” edges, whose length is close to 0 with the default weights. Such a result is also included in the length comparison.

7.6 Show non-connected root nodes

This option is available in all 6 algorithms that build one network (6.1–6.6) and not in the algorithms that build a set of networks. As its name indicates, it controls whether root nodes that have no connections in the resulting network are displayed.

7.7 Number of nodes in a fragment

This option is available with the following algorithms: “Auto expand” (6.2), “Manual expand” (6.6), “Analyze network” (6.7) and “Transcription regulation” (6.8). It sets the maximum number of nodes in the network to build (in the first two cases) or in each network fragment (in the last two cases). The limit is maintained only approximately: the graph is expanded through a series of local operations around its individual nodes, so when the limit is exceeded there is no criterion for deciding which of the nodes and edges should be dropped and which should be kept.

7.8 Add complementary objects

This option works with the two mutually symmetrical algorithms “Analyze network (transcription factors)” and “Analyze network (receptors)” (6.9).
Both algorithms build a set of network fragments, each of which shows the shortest paths between some receptor (sometimes several receptors at once, in the first algorithm) and some transcription factor (sometimes several transcription factors at once, in the second algorithm). When this option is set, each network fragment also shows the ligand(s) of its receptor(s) and the target(s) of its transcription factor(s); both ligands and targets are added to the network fragments built by either of the two algorithms.

7.9 Group networks

Like the previous option, this option works with the “Analyze network (transcription factors)” and “Analyze network (receptors)” algorithms (6.9). In both cases it reduces the number of separate network fragments by grouping some fragments into larger ones. If network fragments are built for transcription factors, the set of fragments initially contains one fragment per transcription factor. With this option activated, the set is divided into groups, each containing the fragments for transcription factors that share the same receptor, and every such group is then replaced with one larger fragment obtained by merging all fragments of the group. Symmetrically, if fragments are built for receptors, one per receptor, the set is divided into groups, each containing the fragments for different receptors that correspond to the same transcription factor, and every such group is then merged into one larger fragment.

8 Showing high-throughput experimental data on networks

Every network object (node) has associated genes. This makes it possible to display “HT data” (high-throughput experimental data) on networks.
These data are shown as multicolored circles next to some of the network nodes, namely the nodes corresponding to genes whose expression has changed notably in at least one of the experiments (taking into account the threshold value and the p-value set for the experiment). As always, only experiments that have been uploaded and activated (in the Data Manager window) before network building starts are considered.

HT data are also shown on maps, but there they appear as a series of “thermometers”. A different display method was chosen for networks because a network typically contains many more nodes than a map, so a series of thermometers would hardly be readable. For this reason, HT data on networks are shown as colored circles that convey only qualitative (increased/decreased) and not quantitative (by how much) information.

A special case is a network node corresponding to a protein complex: such a node is marked with an HT data circle only if every subunit of the complex has some data from the same experiment.
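The marking rule above, including the special case of complexes, can be sketched in a few lines of Python. The data layout and the notion of a “notable change” are assumptions, not taken from this manual: experiment data are given as {experiment: {gene: (signed change, p-value)}}, a gene counts as changed when the absolute change reaches the threshold and its p-value does not exceed the p-value limit, “has some data” for a subunit is read as “shows a notable change”, and the “mixed” label for genes changing in opposite directions is a hypothetical addition. The function names are hypothetical too.

```python
def notable_changes(genes, experiment_data, threshold, p_max):
    """Map experiment -> 'increased' / 'decreased' / 'mixed' for one node.

    experiment_data: {experiment: {gene: (signed_change, p_value)}} (assumed layout).
    """
    result = {}
    for experiment, values in experiment_data.items():
        directions = set()
        for gene in genes:
            if gene in values:
                change, p_value = values[gene]
                if abs(change) >= threshold and p_value <= p_max:
                    directions.add("increased" if change > 0 else "decreased")
        if len(directions) == 1:
            result[experiment] = directions.pop()
        elif directions:
            result[experiment] = "mixed"  # hypothetical: genes disagree in direction
    return result


def complex_experiments(subunit_genes, experiment_data, threshold, p_max):
    """Experiments in which every subunit of a complex shows a notable change."""
    per_subunit = [set(notable_changes(genes, experiment_data, threshold, p_max))
                   for genes in subunit_genes]
    if not per_subunit:
        return set()
    return set.intersection(*per_subunit)  # same experiment for all subunits
```

Only the qualitative direction is kept, in line with the circles described above; the size of the change is used solely to decide whether a circle is drawn at all.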