Network building algorithms

1 General considerations
  1.1 Terminology
  1.2 Pre-filters, algorithms and parameters
2 Network building steps
  2.1 Pre-filtering step
  2.2 Transformation step
  2.3 Building requested network
3 The Network Options window
  3.1 Structure of the window
  3.2 The textual and graphical Help screens
  3.3 Upper pane
  3.4 Pre-filters pane
  3.5 Standard options pane (selecting an algorithm)
  3.6 Advanced options pane
  3.7 Network objects pane
4 Pre-filters
  4.1 Choose a tissue (according to the Applied Biosystems’ taxonomy)
  4.2 Choose a subcellular localization
  4.3 Choose species
  4.4 Choose orthologs
  4.5 Choose object types
  4.6 Choose interaction types
5 Pre-filters on the “Advanced options” pane
  5.1 Filter interactions by confidence level
  5.2 Use unspecified reactions
  5.3 Use indirect interactions
  5.4 Use binding interactions
6 Network building algorithms (“Standard Options” pane)
  6.1 Direct interactions
  6.2 Auto Expand
  6.3 Shortest paths
  6.4 Self regulation
  6.5 Expand by one interaction
  6.6 Manual expand
  6.7 Analyze network
  6.8 Transcription regulation
  6.9 Analyze network (transcription factors / receptors)
7 Advanced options and their use in the algorithms
  7.1 Discard objects
  7.2 Number of steps in the path
  7.3 Edges in Shortest Paths
  7.4 Use interaction weights
  7.5 Assign weights to interactions (“Edges weights”)
  7.6 Show non-connected root nodes
  7.7 Number of nodes in a fragment
  7.8 Add complementary objects
  7.9 Group networks
8 Showing high-throughput experimental data on networks
1
General considerations
1.1
Terminology
It should be recalled that the term “network” is used in our technical and user documents in two different senses:
(1) the “full network”, i.e. the graph comprising all the objects and their interactions that are currently present in
the MetaCore database; (2) any “specific network”, built only of those objects and interactions that either have been
specified by the user in an “initial list of objects” or relate in some way to those initial objects, directly or
indirectly, according to user-specified criteria. Therefore every network in the second sense (“specific network”)
is a sub-graph of the network in the first sense, i.e. of the “full network”.
In this technical document, the term “network” is sometimes used in yet a third, “intermediate” sense (1a): the “pre-filtered network”, that is, a graph consisting of all those objects and their interactions that have been selected
from the MetaCore database according to user-specified “pre-filters”.
“Networks” in plural will always signify “specific networks” built according to user-defined criteria.
1.2
Pre-filters, algorithms and parameters
User-defined criteria for building specific networks are of two different kinds:
a) Criteria concerning the types of objects/interactions to be selected from the database irrespective of their
relations to the initial objects, that is, criteria that may be verified by considering characteristics of every
database object or interaction without even knowing the list of initial objects. Such criteria are called pre-filters.
b) Criteria concerning the relations between objects/interactions being considered for inclusion in a specific network
and the objects of a user-specified initial list. Such relations may be direct, i.e. found in the database tables for a
given initial object, or indirect, i.e. multi-step paths leading to the initial object via one or more “intermediate”
objects. These criteria are set by the user when specifying which algorithm should be used for network building.
After specifying an algorithm, the user can modify some of its characteristics by specifying advanced options
pertaining to the algorithm (each of those options has a predefined default setting).
2
Network building steps
Building of any network according to user-specified criteria comprises the following technical steps (which are
not discernible to the user):
1) Pre-filtering, which consists of loading all those objects and interactions from the database that pass the
user-specified pre-filters.
2) Transformation of the loaded “pre-filtered network”.
3) Building a specific network (or networks) by applying the user-specified algorithm and advanced
options.
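The three steps can be pictured as a small pipeline. The sketch below is a minimal illustration under assumed data structures (objects as a dict of attribute dicts, interactions as (source, target) pairs); the names `prefilter` and `build_network` are hypothetical and do not describe MetaCore's actual implementation. Note that the pre-filter predicate looks at one object at a time and never sees the list of root objects, which is the distinction drawn in section 1.2.

```python
from typing import Callable, Dict, List, Set, Tuple

Edge = Tuple[str, str]                 # a directed interaction (source, target)
Network = Tuple[Set[str], List[Edge]]  # (nodes, edges)


def prefilter(objects: Dict[str, dict], edges: List[Edge],
              accept: Callable[[dict], bool]) -> Network:
    """Step 1: keep objects passing the pre-filter and edges between kept objects.

    `accept` sees a single object's attributes only; it never sees the roots.
    """
    nodes = {name for name, attrs in objects.items() if accept(attrs)}
    return nodes, [(u, v) for u, v in edges if u in nodes and v in nodes]


def build_network(objects: Dict[str, dict], edges: List[Edge],
                  accept: Callable[[dict], bool],
                  transform: Callable[[Network], Network],
                  algorithm: Callable[[Network, Set[str]], Network],
                  roots: Set[str]) -> Network:
    prefiltered = prefilter(objects, edges, accept)  # step 1: pre-filtering
    transformed = transform(prefiltered)             # step 2: transformation
    return algorithm(transformed, roots)             # step 3: requested network
```

Only the third step receives the roots, so the same pre-filtered and transformed graph could in principle serve several algorithms.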
2.1
Pre-filtering step
During the pre-filtering step, all objects and interactions that satisfy all the pre-filters currently specified by the
user are loaded from the MetaCore database (pre-filters are described in full detail in the subsequent sections).
During this process, information on individual objects and interactions is assembled from a large number of
different database tables and kept together. In this way, a pre-filtered full network is dynamically constructed,
which is still far too large to be displayed.
2.2
Transformation step
Some local transformations are then applied to the pre-filtered network built in the previous step. They are
needed to calculate shortest paths more accurately and to select appropriate elements when building the specific
network requested by the user.
At present, only two such transformation procedures are implemented. One of them splits certain types of
interactions so that they are considered in both directions (see section 5.4). The other concerns nodes
representing groups of objects (see below).
Splitting a node that represents a group of objects:
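[Figure: a pre-filtered network fragment around a group node G with members E1, E2 (membership arrows e1, e2; interaction arrows a, a1, b, b1), before the transformation (left) and after it, with G split into G1 and G2 and new membership arrows f1, f2 (right)]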
This drawing schematically shows a fragment of a pre-filtered network around a node representing a group of
objects, before the transformation (on the left) and after the transformation (on the right). The node G
represents a group of two or more similar objects E1, E2… (only two are shown in the drawing). Arrows e1,
e2,… go from those member objects to the group. Arrows a, a1,…, b, b1,… denote interactions of any object
in the group with some other objects; they are assumed to be valid for every member of the group.
On the right, the same network fragment is shown after the transformation. The group G is now represented by
paired nodes G1 and G2 (always 2 nodes, irrespective of the number of group members). The node G1 receives
all the incoming interaction arrows shown in the left figure, while all the outgoing interaction arrows of the left
figure now start from the node G2. The old membership arrows e1, e2… now enter the node G2, while a new set
of outgoing membership arrows f1, f2… goes from the node G1 to every group member.
The new modified network is used in the following way. Suppose we want to consider the path a1-G-b1 as
applied to one specific member E1 of the group G. Using the “transformed” network, we will instead consider
a more detailed path a1-G1-f1-E1-e1-G2-b1. The membership arrows f1 and e1 are considered to have zero
length, so they do not artificially increase the total length of the path.
Such a “node split” may become necessary, in particular, when an interaction arrow, say a1, is valid only for
some of the group members (e.g. for E1 but not for E2).
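The node split can be sketched as a rewrite of an edge list. This is a minimal illustration assuming edges are (source, target, length) triples and that any edge from a member to the group is a membership arrow; the helper name `split_group` and the naming of the new nodes (G1 and G2 derived from G) are hypothetical. As stated above, membership arrows get length 0.

```python
from typing import List, Set, Tuple

WEdge = Tuple[str, str, float]  # (source, target, length)


def split_group(edges: List[WEdge], group: str, members: Set[str]) -> List[WEdge]:
    """Replace the group node by an entry node (group + "1") and an exit node (group + "2")."""
    entry, exit_ = group + "1", group + "2"
    result: List[WEdge] = []
    for u, v, length in edges:
        if v == group and u in members:
            result.append((u, exit_, 0.0))     # old membership arrow e_i now enters G2
        elif v == group:
            result.append((u, entry, length))  # incoming interaction now enters G1
        elif u == group:
            result.append((exit_, v, length))  # outgoing interaction now starts at G2
        else:
            result.append((u, v, length))      # edges not touching the group are unchanged
    for member in sorted(members):
        result.append((entry, member, 0.0))    # new membership arrow f_i from G1
    return result
```

With unit-length interactions A-G and G-B, the detailed path A-G1-E1-G2-B has length 1 + 0 + 0 + 1 = 2, the same as the original path A-G-B.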
2.3
Building requested network
A specific network is then built using only those objects and interactions that passed the pre-filters in
Step 1 (and were modified where necessary in Step 2). Conceptually, this process can be thought of as delimiting
a sub-graph of the potentially very large pre-filtered graph already built in Step 1. The various
network building algorithms (described in more detail in Section 6) apply different methods and criteria when
searching for the most relevant objects/interactions, sorting them out, finding shortest paths between nodes, etc.
Typically, the network is built starting from the initial user-specified list of root objects by gradually adding
their close or more distant neighbors. Whether every root object remains in the resulting network at the
end of this process depends on the algorithm and options selected by the user.
3
The Network Options window
3.1
Structure of the window
The Network Options window has 5 panes: the upper pane (always open) and four other panes that can be
individually folded or unfolded by the user. When this window is called (e.g. by clicking “Build network for
selected objects” in the upper-right corner of the “Selected network objects” pane in the “Signaling networks”
window), it opens with the “Standard Options” pane unfolded and the other three panes folded:
The “Standard options” pane is where the user selects one specific network building algorithm.
Note: The “Pre-filters” and “Advanced options” panes also contain some presets – see appropriate sections
below.
Clicking the red “Build network” button launches the user-selected (or preset) algorithm with user-selected
(or preset) options. Note: Each of the four algorithms on the upper two lines of the “Standard options” pane
builds a series of networks, not a single one.
3.2
The textual and graphical Help screens
The “Network options” window, like other MetaCore windows, has a “Help” button in its upper-left corner. A
click on this button opens a new window with a general explanation of the various network building methods.
In addition, many of the “Standard options” and “Advanced options” bear a question mark; clicking this
mark opens a new window with contextual help in a purely graphical form, with no textual explanations. The
same screen opens in every case, scrolled to the appropriate schematic image. Some of those schematic
images are reproduced hereafter to illustrate various network algorithms. All those schemes use the same legend:
“Root nodes” denote “objects from the initial list” (user defined).
3.3
Upper pane
The first line asks the user whether to build the requested network in a new window (default):
The second line is for specifying the semantic type of the network to build. The default is a merged (metabolic +
regulatory) network:
The third line asks whether or not to use canonical pathways, informally called “noodles”. By default, noodles
are used. Canonical pathways are linear sequences of interactions that have a specific biological meaning and are
manually curated by biologists. The MetaCore database currently contains descriptions of more than 50 000
canonical pathways. Checking each and every noodle for its relevance to the network under construction
takes a long time, especially when the initial list of root objects is relatively long; for this reason, a warning
appears next to the button:
A noodle is a directed sequence of network nodes and the edges between them. On the network being built, a
noodle is emphasized by coloring the corresponding edges. Most of the network building algorithms treat every
noodle as a “shortcut” between its two extremities, whose total length is always taken to be 1.
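The shortcut rule can be sketched as adding one extra edge per noodle. This is a minimal illustration assuming each noodle is given as the ordered list of its nodes and edges carry lengths; `add_noodle_shortcuts` is a hypothetical helper, not a MetaCore function. Whatever the number of edges in the noodle, the added shortcut has length 1.

```python
from typing import List, Tuple

WEdge = Tuple[str, str, float]  # (source, target, length)


def add_noodle_shortcuts(edges: List[WEdge], noodles: List[List[str]]) -> List[WEdge]:
    """Return the edges plus one length-1 shortcut per noodle (first node -> last node)."""
    shortcuts = [(path[0], path[-1], 1.0) for path in noodles if len(path) >= 2]
    return edges + shortcuts
```

A shortest-path search over the result would then reach the end of a three-edge noodle in one unit of length instead of three.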
3.4
Pre-filters pane
Only half of the available pre-filters are accessible on this pane (see section 4 for full details). The other half is
accessible on the Advanced options pane (see section 3.6).
Initially, the pane is folded (not visible). By default, no filters are active. When you activate any of them (by
clicking Select), a pop-up window opens presenting a list of options to select from.
3.5
Standard options pane (selecting an algorithm)
The four algorithms that are proposed on the upper two lines of this pane build a series of networks rather
than a single one (see their descriptions in sections 6.7 – 6.9):
Each of the remaining algorithms builds exactly one network (see their descriptions in sections 6.1 – 6.6):
Note: The upper group of four algorithms (“Analyze…”) is in certain cases inactive and not displayed, for
example, when this window is reached by pressing the “Rebuild” button on the screen of an already built network.
3.6
Advanced options pane
This pane contains some additional pre-filters (described in detail in Section 5) and also a list of additional
options (see Section 7). Pre-filters on this pane differ in behavior from those of the “Pre-filters” pane in the
following important respect: the edges (interactions) that are filtered out by any of these pre-filters will not be
considered by network building algorithms, but may optionally be added to the network already built (the
“Show but not use” column):
All these 4 pre-filters may be activated with any of the network building algorithms.
The set of additional options that appear on the “Advanced options” pane at any given moment varies
depending on the network algorithm. For every algorithm selected on the “Standard options” pane, these options
let the user set specific conditions for executing the algorithm. Options that do not affect the
current algorithm are not displayed on the pane.
The full list of those options is given below (this list never appears in full on the pane):
All these options are discussed in detail in Section 7.
3.7
Network objects pane
This pane allows the user to specify the desired position of objects on the paths being built by the currently
selected algorithm. There are 4 columns of checkboxes; using the first 3 of them, the user can individually specify,
for every object listed in the “Name” column, whether the paths that start from (respectively pass through, or
arrive at) that object should be considered when building the desired network. These conditions are logically
OR-ed, so that every path that satisfies at least one of the checked conditions is included. In the figure below, for
example, the checkboxes mark for inclusion those paths that start from “PREG1”, or go to “PRPK”, or pass
through “PIG11”, or start from “PP2C”:
The fourth column is “Avoid”; a box checked in this column signifies that the corresponding object should not
appear on any path in the network being built. Boxes checked in this column are OR-ed together, and their
whole combination is then joined by “AND NOT” to the OR-ed combination of all the boxes checked in the
first three columns.
Presently, “Shortest paths” (see section 6.3) is the only algorithm in which all four columns are active and may
be used as described above. All other algorithms offer only the through and avoid columns, while from and
to become inactive. In those other algorithms, the through column simply signifies that on every path that
otherwise satisfies all other conditions of the algorithm, at least one of the checked objects should be found.
By default, the pane shows the initial list of root objects; the through column is fully checked, the other columns
are fully unchecked. The user can add more objects by clicking “Add network objects” (the “Search network
objects” pop-up window opens).
The three buttons on top of each column are: check all (left button), uncheck all (right button), and check/uncheck
all (central toggle button).
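The column logic can be written as a single predicate over a candidate path. This is a minimal sketch, assuming a path is a sequence of object names and that “pass through” and “Avoid” are checked against every node of the path, endpoints included; `path_selected` and its parameter names are hypothetical.

```python
from typing import Sequence, Set


def path_selected(path: Sequence[str], start_from: Set[str], through: Set[str],
                  arrive_at: Set[str], avoid: Set[str]) -> bool:
    """(From OR Through OR To) AND NOT (any Avoid object on the path)."""
    if not path or any(node in avoid for node in path):
        return False
    return (path[0] in start_from
            or any(node in through for node in path)
            or path[-1] in arrive_at)
```

For the example in the figure, `start_from` would be {"PREG1", "PP2C"}, `arrive_at` {"PRPK"} and `through` {"PIG11"}; for algorithms other than “Shortest paths”, only `through` and `avoid` would be non-empty.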
4
Pre-filters
Pre-filters operate at an early stage of loading objects and interactions from the MetaCore database, by
“rejecting” those objects and interactions that should not even be considered as potential nodes and edges
of the requested network(s). Rejecting a node naturally implies rejecting all edges that start or end at that
node. If, after a number of nodes/edges have been rejected, some non-rejected node becomes isolated (an orphan
node), that node is also rejected, though only conditionally: there is a user option to return isolated nodes to the
network.
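Node rejection and its consequences can be sketched as follows. This is a minimal illustration for node pre-filters only, assuming nodes are a dict of attribute dicts and edges are (source, target) pairs. Following the text, an orphan is taken here to be a node that had at least one edge before filtering and none after; the function name and the `keep_orphans` flag (a stand-in for the user option that returns isolated nodes) are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

Edge = Tuple[str, str]


def apply_node_prefilter(nodes: Dict[str, dict], edges: List[Edge],
                         accept: Callable[[dict], bool],
                         keep_orphans: bool = False) -> Tuple[Set[str], List[Edge]]:
    kept = {name for name, attrs in nodes.items() if accept(attrs)}
    # Rejecting a node also rejects every edge that starts or ends at it.
    kept_edges = [(u, v) for u, v in edges if u in kept and v in kept]
    had_edges = {n for edge in edges for n in edge}
    still_connected = {n for edge in kept_edges for n in edge}
    # Orphans: kept nodes that lost all their edges because of the rejections above.
    orphans = {n for n in kept if n in had_edges and n not in still_connected}
    if not keep_orphans:
        kept -= orphans
    return kept, kept_edges
```

In a chain A-B-C-D where only B is rejected, A loses its only edge and is dropped as an orphan unless `keep_orphans` is set, while C and D stay connected.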
At present, 5 pre-filters are defined on the “Pre-filters” pane; they are described in this section. Four more pre-filters are accessible on the “Advanced options” pane (see section 5).
Pre-filters from the “Pre-filters” pane:
By default, all 5 pre-filters are switched off. Every filter may be individually switched on by
clicking its check-box. This activates the corresponding “Select” button; a click on it opens a pop-up with a list
in which one or more options, or even all of them, may be selected. Initially, all the options are unchecked in the
list of every pre-filter except “subcellular localization”, whose list is fully checked.
4.1
Choose a tissue (according to the Applied Biosystems’ taxonomy)
Only those objects are loaded from the database that are explicitly marked in the database tables as localized
in at least one of the tissues selected by the user from the pop-up list of tissues. Other (non-selected)
localizations of the same object do not matter.
Note: When this filter is enabled, an object for which no localization at all is specified in the database will not
pass, even if the whole list of tissues is checked. To load such objects with unspecified localization, this pre-filter
should be disabled.
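The tissue rule amounts to a set intersection. This is a minimal sketch assuming each object's tissue annotations are available as a set of names; `passes_tissue_filter` is a hypothetical name. The last case in the usage below is the one from the note: an object with no localization fails even when every tissue is selected.

```python
from typing import Set


def passes_tissue_filter(object_tissues: Set[str], selected: Set[str]) -> bool:
    """Accept an object localized in at least one selected tissue.

    Other localizations of the object do not matter, and an object with no
    localization at all is rejected even when every tissue is selected.
    """
    return bool(object_tissues & selected)
```

The subcellular localization pre-filter (section 4.2), which operates similarly, could be sketched the same way.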
Note: The Applied Biosystems’ list of tissues comprises about 30 entries. GeneGo biologists use (not
systematically) a much more detailed system of approximately 1000 different tissue specifications. These data,
however, remain in the MetaCore database without ever being shown to users. In contrast, the entire Applied
Biosystems’ gene/protein localization table has been put into MetaCore and is used by this pre-filter. The
two tissue classification systems therefore exist in parallel.
4.2
Choose a subcellular localization
This pre-filter operates similarly to the previously described one, and uses a standard taxonomy of subcellular
localizations. By default, the whole list of localizations is selected.
4.3
Choose species
This pre-filter contains the options Homo sapiens, Mus musculus, and Rattus norvegicus. The choice here
restricts the pre-filtered network to only those objects linked to genes of the chosen species.
4.4
Choose orthologs
For this pre-filter, there is a notion of “Human by default”. The exact meaning of this pre-filter is: load from the
database those Human objects that have orthologs in the selected organism. For
example, if the user has selected Mouse, then Human objects having Mouse orthologs will be accepted and all
other objects will be rejected.
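The acceptance rule can be stated as a small predicate. This is a minimal sketch; the `DbObject` record, its fields and the species strings are hypothetical stand-ins for whatever the database actually stores.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class DbObject:
    species: str                                               # hypothetical field
    ortholog_species: Set[str] = field(default_factory=set)    # hypothetical field


def passes_ortholog_filter(obj: DbObject, organism: str) -> bool:
    """Accept a Human object only if it has an ortholog in the selected organism."""
    return obj.species == "Human" and organism in obj.ortholog_species
```

With Mouse selected, a Human object with only a Rat ortholog is rejected, and so is a non-Human object, matching the example above.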
4.5
Choose object types
To select the types of objects to be accepted by this pre-filter, the user is offered the same GeneGo list of object
types that is illustrated on the “Network Legend”, which opens when clicking the Legend button in the network
window.
4.6
Choose interaction types
Similar to the previous pre-filter; interaction types are to be selected from the GeneGo list illustrated on the
Network Legend.
5
Pre-filters on the “Advanced options” pane
These pre-filters apply to edges of the network, not to its nodes; they are also presented to the user differently
from the pre-filters of the previous section. There are three columns of radio buttons; the button in the first
column, when clicked, sets the strongest filtering condition (it is set by default). If the user clicks Show but not use
instead, then the interactions rejected by the filter will nevertheless be loaded, but will not be
taken into account when performing the specified algorithm. These interactions will finally be added to the
network once it has been built (provided that both ends of such an interaction belong to that network). If the Use
option is selected instead, those low-priority interactions will not only be displayed on the network, but will also
be used by the network building algorithm.
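The three radio-button settings can be sketched as splitting the edge list into edges the algorithm may use and edges held back for display only. This is a minimal illustration; `Mode`, `split_edges` and `add_display_only` are hypothetical names, and `low_priority` stands for whichever criterion of this section is being applied.

```python
from enum import Enum
from typing import Callable, List, Set, Tuple

Edge = Tuple[str, str]


class Mode(Enum):
    FILTER = 1        # first column (default): rejected edges are not loaded
    SHOW_NOT_USE = 2  # loaded, ignored by the algorithm, added afterwards
    USE = 3           # loaded and used by the algorithm


def split_edges(edges: List[Edge], low_priority: Callable[[Edge], bool],
                mode: Mode) -> Tuple[List[Edge], List[Edge]]:
    """Return (edges the algorithm may use, edges held back for display only)."""
    usable = [e for e in edges if mode is Mode.USE or not low_priority(e)]
    held = [e for e in edges if mode is Mode.SHOW_NOT_USE and low_priority(e)]
    return usable, held


def add_display_only(nodes: Set[str], built_edges: List[Edge],
                     held: List[Edge]) -> List[Edge]:
    """After building, add held-back edges whose two ends are both in the network."""
    return built_edges + [(u, v) for u, v in held if u in nodes and v in nodes]
```

Keeping the held-back edges separate means the algorithm itself never has to know which mode is active.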
Hereafter those pre-filters are described in more detail, including the additional button in the last line.
Illustrative schemes are taken from the graphical help pop-up that opens when clicking the question mark in the
corresponding line.
5.1
Filter interactions by confidence level
By default, only manually curated interactions are loaded from the database; they are considered the most
reliable ones. The least reliable (low-trust) interactions are those obtained by automated parsing
of English texts (natural language processing, NLP).
Note: There are about 10 different “trust levels” that are set by GeneGo biologists when annotating articles.
Strictly speaking, those 10 trust levels are not fully ordered; there is only some rough idea of whether a given
trust is low, medium or high. They are primarily used by biologists themselves, e.g. when creating or updating a
map. In contrast, they are presently NOT USED by MetaCore network building algorithms. What is used in
MetaCore (the “Filter interactions by confidence level” pre-filter from the “Advanced Options” panel), is just
one of those 10 trust levels which stands for “Natural Language Processing” (NLP), that is, annotations created
by AUTOMATIC parsers of textual input. All other 9 levels are “manually curated”, that is, set by biologists in
the course of MANUAL annotation. By default, NLP-created interactions are filtered out, but the user has an option
to include them as well.
5.2
Use unspecified reactions
By default, interactions whose type and mechanism are both marked in the database as unspecified are not
loaded. The user can switch to using these interactions for network building (see scheme below), or just to
showing them on the network already built.
5.3
Use indirect interactions
By default, indirect interactions are not loaded from the database. The user can switch to using these interactions
for network building (there is no illustrative scheme for this option on the graphical help page), or just to showing them
on the network already built. Indirect interactions are interactions marked in the database as having the
mechanism “influence on expression” or “unspecified”.
5.4
Use binding interactions
The exact meaning of this option is: use interactions with the binding mechanism whose type is either not specified
(marked as unspecified) or specified as complex formation (see scheme below). By default, such
interactions are not used. The option of using them is split here into two sub-options: normal, when such an
interaction is used only in the direction in which it is specified in the database; and bidirectional, when it is
considered as defined in both directions.
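The four criteria of this section can be sketched as predicates on an interaction record. This is a minimal illustration; the `Interaction` record, its field names and the exact string values are hypothetical encodings of the database attributes named in the text. Because “unspecified” is listed both as an indirect mechanism and as part of an unspecified reaction, one interaction can match more than one predicate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    itype: str      # interaction type, e.g. "complex formation" or "unspecified"
    mechanism: str  # e.g. "binding", "influence on expression", "unspecified"
    trust: str      # trust level; "NLP" marks automatically parsed annotations


def is_nlp(i: Interaction) -> bool:                   # section 5.1
    return i.trust == "NLP"


def is_unspecified_reaction(i: Interaction) -> bool:  # section 5.2
    return i.itype == "unspecified" and i.mechanism == "unspecified"


def is_indirect(i: Interaction) -> bool:              # section 5.3
    return i.mechanism in {"influence on expression", "unspecified"}


def is_binding(i: Interaction) -> bool:               # section 5.4
    return i.mechanism == "binding" and i.itype in {"unspecified", "complex formation"}
```

Each predicate could serve as the `low_priority` criterion of the corresponding line on the “Advanced options” pane.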
In the latter case (“bidirectional”), after the pre-filtering step, a supplementary operation is
performed during the network transformation step (see section 2.2). It consists of splitting bidirectional edges
into a pair of unidirectional ones. Namely, edges corresponding to binding interactions of the above-mentioned
types, which are in essence non-directional (their “main” direction is specified in the database purely as a
formality), are considered during network building as a pair of edges in opposite directions. The added
edges going in the opposite direction are used when necessary by network building algorithms, but they are not
shown on the network built.
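The split can be sketched as adding hidden reverse edges before the algorithm runs and mapping them back afterwards. This is a minimal illustration; `split_bidirectional` and `displayable` are hypothetical names, and it is an assumption of this sketch that a hidden reverse edge used by the algorithm is displayed as the original database edge, shown once.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str]


def split_bidirectional(edges: List[Edge], binding: Set[Edge]) -> Tuple[List[Edge], Set[Edge]]:
    """Add a hidden reverse edge for every binding edge that lacks one."""
    existing = set(edges)
    hidden = {(v, u) for (u, v) in edges if (u, v) in binding and (v, u) not in existing}
    return edges + sorted(hidden), hidden


def displayable(result_edges: List[Edge], hidden: Set[Edge]) -> List[Edge]:
    """Show each hidden reverse edge in its database direction, without duplicates."""
    shown: List[Edge] = []
    for u, v in result_edges:
        edge = (v, u) if (u, v) in hidden else (u, v)
        if edge not in shown:
            shown.append(edge)
    return shown
```

If the algorithm walks the binding edge A-B backwards (B to A), the built network still displays only the database edge A to B.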
6
Network building algorithms (“Standard Options” pane)
We will first describe the algorithms that build one network at a time. The algorithms that build a whole series
of networks are considered starting from subsection 6.7.
6.1
Direct interactions
No real algorithmic work is performed in this case: a graph is simply built whose nodes are all “root objects”
(that is, objects from the initial list), and whose edges are all interactions between those root objects (that have
passed the pre-filters specified):
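In graph terms this is the sub-graph induced by the root objects on the pre-filtered network. A minimal sketch, assuming interactions are (source, target) pairs that have already passed the pre-filters; `direct_interactions` is a hypothetical name.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def direct_interactions(roots: Iterable[str], edges: List[Edge]) -> Tuple[Set[str], List[Edge]]:
    """Nodes: all root objects. Edges: every pre-filtered interaction between two roots."""
    root_set = set(roots)
    return root_set, [(u, v) for u, v in edges if u in root_set and v in root_set]
```

A root with no interaction to another root still appears as a node; edges leading to non-root objects are dropped.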
6.2
Auto Expand
A network is built “around” the initial list of root objects. All edges are considered as directional by the
algorithm; outgoing and incoming paths are explored separately during the network expansion process.
Two criteria are used by the algorithm for selecting the “most relevant” nodes and edges when expanding around the
root nodes: proximity of a node, and “traffic”, or flow through the node. This is explained hereafter in more
detail.
Each of the root nodes is considered to have a “flow” through it equal to 1. These root nodes are the origin of a
“wave front” that is then iteratively calculated and used by the algorithm.
First, consider the network expansion process in the outgoing path direction. If a node with the flow value n has k
outgoing edges, then its flow is equally distributed between those edges, so that every edge receives
n/k. Furthermore, if a node has some number of incoming edges with already calculated flow values, then those
flow values are summed up to obtain the total flow through that node. The flow value 1 is considered to be the
maximum, so if the sum of incoming flows exceeds 1, the result is reduced to 1. (This limitation, in
particular, keeps the algorithm working correctly even in the presence of looping paths.)
Consider an example. Let the nodes A and B each have flow = 1; let node A have a single outgoing edge
leading to C, so that the whole flow from A is directed to C and C receives flow = 1; and let node B have two
outgoing edges, one to C and another to D, so that C and D each receive ½ of the flow from B. We assume that C and
D have no other incoming edges. Then the total flow through D is ½, and the total flow through C is 1 + ½, reduced to 1
because 1 is the upper limit. This is illustrated in the following scheme, where the total flow through a node is
shown in parentheses after the node name:
    A (1) --1--> C (1 + ½ = 1)
    B (1) --½--> C
    B (1) --½--> D (½)
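The flow rule of this example can be checked with a short sketch of one propagation step. It assumes edges are (source, target) pairs and uses exact fractions to avoid rounding; `propagate_flow` is a hypothetical name, and the sketch covers only the splitting, summing and capping rules stated above, not the whole algorithm.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def propagate_flow(sources: Dict[str, Fraction], edges: List[Edge]) -> Dict[str, Fraction]:
    """One propagation step in the outgoing direction.

    Each source splits its flow equally over all of its outgoing edges; each
    target sums what it receives, and the total is capped at 1.
    """
    out_degree: Dict[str, int] = {}
    for u, _ in edges:
        out_degree[u] = out_degree.get(u, 0) + 1
    received: Dict[str, Fraction] = {}
    for u, v in edges:
        if u in sources:
            received[v] = received.get(v, Fraction(0)) + sources[u] / out_degree[u]
    return {node: min(total, Fraction(1)) for node, total in received.items()}
```

For A and B with flow 1 and edges A-C, B-C, B-D, this gives C = min(1 + ½, 1) = 1 and D = ½, as in the scheme.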
At every step of the algorithm, only the closest neighbors are considered, that is, the nodes that are one outgoing
edge away from some node already included in the network. From these neighboring nodes, the nodes
having the highest flow value are selected and added to the network and to the “wave front”. This process is
iterated until the total number of nodes exceeds a pre-established limit; the default value is shown on the
“Advanced options” pane and may be changed by the user:
The same algorithm is then performed a second time, but applied to edges in the opposite direction, and both
results are merged into one common network. To make the number of nodes in the final network as
close as possible to the pre-established limit, the limit value is divided by 1½ for each of the two edge directions
(not by 2, because some nodes will be selected for both the outgoing and the incoming edge directions). For
example, if the limit is set to 50, then at most 34 nodes will be included in the network at each of the two
steps, making the total (statistically) close to 50.
It should be noted that if the initial set of nodes is greater than 2/3 of the limit, then the above described
algorithm gives the same result as the “Direct interactions” algorithm that does not consider flow through nodes.
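The whole expansion can be sketched as a greedy loop run once per edge direction. This is a sketch under stated assumptions, not MetaCore's implementation: the function names are hypothetical, the per-direction budget is taken as limit / 1.5 rounded up (which gives 34 for a limit of 50), the flow offered to a neighbor counts only contributions from nodes already in the network, ties are added in alphabetical order, and expansion stops as soon as the budget is reached.

```python
import math
from fractions import Fraction
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def per_direction_limit(limit: int) -> int:
    """Node budget for each edge direction: limit / 1.5, rounded up."""
    return math.ceil(Fraction(limit) * 2 / 3)


def expand_one_direction(roots: Set[str], edges: List[Edge], cap: int) -> Set[str]:
    out_degree: Dict[str, int] = {}
    for u, _ in edges:
        out_degree[u] = out_degree.get(u, 0) + 1
    network = set(roots)
    flow: Dict[str, Fraction] = {r: Fraction(1) for r in roots}
    while len(network) < cap:
        # Flow offered to each neighbor one edge away from the current network.
        offered: Dict[str, Fraction] = {}
        for u, v in edges:
            if u in network and v not in network:
                offered[v] = offered.get(v, Fraction(0)) + flow[u] / out_degree[u]
        if not offered:
            break
        best = max(offered.values())
        for v in sorted(offered):
            if offered[v] == best and len(network) < cap:
                network.add(v)
                flow[v] = min(offered[v], Fraction(1))  # flow is capped at 1
    return network


def auto_expand(roots: Set[str], edges: List[Edge], limit: int) -> Set[str]:
    cap = per_direction_limit(limit)
    outgoing = expand_one_direction(roots, edges, cap)
    incoming = expand_one_direction(roots, [(v, u) for u, v in edges], cap)
    return outgoing | incoming
```

The remark above shows up directly: when the roots alone already fill the per-direction budget (about 2/3 of the limit), the loop adds nothing and the node set is just the roots, as with “Direct interactions”.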
6.3
Shortest paths
By checking the boxes in the appropriate columns, the user selects two sub-lists from the initial list of nodes: the “From” list (every path must start at one of these nodes) and the “To” list (every path must end at one of these nodes). The two lists may intersect or even coincide. Additionally, a third sub-list, “Through”, may be defined; every path must then pass through at least one of those nodes.
If, however, the user checks nodes in the “Through” column only, this means (in this context only) specifying the same nodes as both “From” and “To”.
For every node in the “From” list, a set of shortest paths is built to the nodes of the “To” list. Namely, for every from-to pair of nodes, the set of shortest paths is built using the Dijkstra algorithm. If for a given from-to pair there is more than one path of the same minimal length, then all of them are included and shown. Strictly speaking, it is not the shortest paths themselves that are shown but the edges they consist of; the paths may then be traced by the user (in the scheme hereafter they are shown in green):
The maximum length for a path to be considered for inclusion is set by the user in the “Number of steps in the path” option (see 7.2). Path building stops when the path length reaches this maximum value.
Optionally, the resulting network may include not all the possible edges between the nodes found, but only those that make up the shortest paths between them (the “Edges in Shortest Paths” option, see 7.3).
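The path-building step can be illustrated with a short Python sketch. With unit edge lengths, Dijkstra's algorithm reduces to breadth-first search, which is what this sketch uses; it records every predecessor lying on a shortest path so that all equally short paths are kept, stops extending paths at `max_steps` edges, and returns only the edges on shortest paths (as with the “Edges in Shortest Paths” option). The function names are hypothetical, and the “Through” constraint is not modeled.

```python
from collections import deque


def shortest_path_edges(graph, source, target, max_steps):
    """Edges lying on any shortest source->target path of at most max_steps edges."""
    dist = {source: 0}
    preds = {source: []}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_steps:
            continue  # do not extend paths beyond the step limit
        for nxt in graph.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                preds[nxt] = [node]
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:
                preds[nxt].append(node)  # another equally short way in
    if target not in dist:
        return set()
    # Walk back from the target over all recorded shortest-path predecessors.
    edges, stack, seen = set(), [target], {target}
    while stack:
        node = stack.pop()
        for p in preds[node]:
            edges.add((p, node))
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return edges


def shortest_paths_network(graph, from_nodes, to_nodes, max_steps):
    """Union of shortest-path edges over every from-to pair."""
    edges = set()
    for s in from_nodes:
        for t in to_nodes:
            if s != t:
                edges |= shortest_path_edges(graph, s, t, max_steps)
    return edges


graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
print(sorted(shortest_paths_network(graph, ["A"], ["D"], max_steps=5)))
# [('A', 'B'), ('A', 'C'), ('B', 'D'), ('C', 'D')]
```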
On the network built, the “from” nodes are put in green circles, the “to” nodes in red circles, and the “through” nodes in violet circles. Nodes participating in two or three lists are shown in two- or three-color circles; see hereafter a fragment of the “Network legend” page:
6.4
Self regulation
For a user-specified initial set of nodes, another set is found containing nodes that relate to those initial nodes in
some specific way. Namely, all those nodes are considered from which there exist “transcription regulation”
edges towards the initial nodes, thus forming the list of “transcription factors”. Then, the “Shortest paths”
algorithm is applied (see 6.3), where the “from” list is the initial set of nodes, and the “to” list is the list of the
transcription factors.
As in the basic “Shortest paths” algorithm, this algorithm accepts the “Number of steps in the path” option (see 7.2) and the “Edges in Shortest Paths” option (see 7.3).
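The two stages of this algorithm can be sketched compactly. The sketch assumes edges are given as `(source, target, type)` triples with hypothetical type labels; it finds the transcription factors, then reports, for each initial node, the transcription factors reachable within the step limit together with the shortest-path length. Only path lengths are computed here, not the path edges, to keep the example short; all function names are hypothetical.

```python
from collections import deque


def transcription_factors(edges, initial):
    """Sources of 'transcription regulation' edges ending at an initial node."""
    return {src for src, dst, kind in edges
            if kind == "transcription regulation" and dst in initial}


def reachable_within(edges, start, max_steps):
    """BFS distances from `start` over edges of any type, up to max_steps."""
    adj = {}
    for src, dst, _ in edges:
        adj.setdefault(src, []).append(dst)
    dist, queue = {start: 0}, deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] < max_steps:
            for nxt in adj.get(node, []):
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
    return dist


def self_regulation(edges, initial, max_steps):
    """(initial node, TF, length) for every TF reachable from an initial node."""
    tfs = transcription_factors(edges, initial)
    result = set()
    for node in initial:
        dist = reachable_within(edges, node, max_steps)
        for tf in tfs:
            if tf in dist and tf != node:
                result.add((node, tf, dist[tf]))
    return result


edges = [
    ("TF1", "G1", "transcription regulation"),
    ("G1", "K1", "binding"),
    ("K1", "TF1", "phosphorylation"),
    ("TF2", "G2", "transcription regulation"),
]
print(self_regulation(edges, {"G1", "G2"}, max_steps=3))  # {('G1', 'TF1', 2)}
```

Here TF1 regulates G1, and G1 reaches TF1 back in two steps, which is the kind of feedback loop this algorithm is meant to expose.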
6.5
Expand by one interaction
The requested network is built by considering all the outgoing and incoming edges (i.e. interactions) for the
nodes of the initial list, and complementing the initial list with all those edges and all nodes at the other end of
the edges (thus located at one-interaction distance from the initial nodes):
All the edges and nodes being added must, of course, satisfy all the pre-filters set before network building starts.
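As a minimal sketch, assuming edges are `(source, target)` pairs and the pre-filters are represented by a single hypothetical predicate `edge_filter` (the function name is also hypothetical):

```python
def expand_by_one(edges, initial, edge_filter=lambda e: True):
    """Add every edge touching an initial node (in either direction) that
    passes the pre-filter, together with the node at its other end."""
    nodes, kept = set(initial), set()
    for edge in edges:
        src, dst = edge
        if (src in initial or dst in initial) and edge_filter(edge):
            kept.add(edge)
            nodes.update(edge)
    return nodes, kept


edges = {("A", "B"), ("C", "A"), ("B", "D"), ("A", "X")}
# Hypothetical pre-filter: exclude any edge touching node X.
nodes, kept = expand_by_one(edges, {"A"}, edge_filter=lambda e: "X" not in e)
print(sorted(nodes))  # ['A', 'B', 'C']
```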
6.6
Manual expand
This algorithm works in a similar way to Expand by One Interaction. However, it expands around each initial node by the number of interactions specified in the drop-down menu.
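Assuming that expanding by N interactions means repeating the one-interaction step N times, each time from the nodes reached in the previous step, the procedure can be sketched as follows. The function name is hypothetical and pre-filters are omitted; with `steps=1` the result matches Expand by One Interaction.

```python
def manual_expand(edges, initial, steps):
    """Expand around the initial nodes by `steps` interactions, following
    edges in both directions; returns the nodes reached and the edges used."""
    incident = {}
    for edge in edges:
        for end in edge:
            incident.setdefault(end, []).append(edge)
    nodes, kept, frontier = set(initial), set(), set(initial)
    for _ in range(steps):
        reached = set()
        for node in frontier:
            for edge in incident.get(node, []):
                kept.add(edge)
                reached.update(edge)
        frontier = reached - nodes  # only newly reached nodes expand next time
        nodes |= reached
    return nodes, kept


edges = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "A")]
print(sorted(manual_expand(edges, {"A"}, steps=2)[0]))  # ['A', 'B', 'C', 'E']
```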
6.7
Analyze network
The algorithm starts with building a “large network” by applying a simplified version of the “Auto Expand”
algorithm (see 6.2) to the initial list of objects. The limit for the number of edges is set to 2 or 3 times the size of
the initial list (the degree of saturation of the network with expression data markers depends on that limit).
Expansion is first done in the outgoing sense of edges; incoming edges are considered only if the limit has not
been achieved using the outgoing edges only.
Then, the large network is “cut” into smaller fragments. This is done in a cyclical manner, i.e. fragments are
created sequentially one by one. Edges used in a fragment are never reused in subsequent fragments; nodes may
be reused, but with different edges leading to them in different fragments. Creation of every next fragment starts
with selecting the “most connected” node, i.e. the node that has the largest number of links to the neighboring
nodes. Only those links are considered that have not yet been used in previous fragments. This most connected
node is taken as the “pole” of a new fragment. A “node queue” is used, where nodes are always ordered from
the most connected to the least connected.
When starting creation of a new fragment, the queue is empty. First, all direct neighbors of the pole node are put in the queue. Then, the least connected node is extracted from the queue and added to the network, while its neighbors are added to the queue, and the queue is reordered. This process is repeated until the queue is empty (which signifies that a whole connected component of the total network has been selected) or until the number of nodes in the network currently being built reaches the limit value (defined by the user in the “Number of nodes in a fragment” advanced option; default value is 50).
Finally, all edges between the nodes of the new fragment are put into the fragment and deleted from the large
network. Nodes that have been “stripped” of all their edges are also removed from the large network, the queue
is emptied, and the construction of a next fragment starts as described above.
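The fragment-cutting procedure can be sketched as below. The sketch follows the description above literally (the pole is the most connected node, the least connected node is taken from the queue first, and edges inside a finished fragment are removed from the large network) and assumes an undirected network stored as a set of two-element `frozenset` edges. Degrees are recomputed on the remaining edges, and ties are broken by node name, which is an assumption. The function name is hypothetical, and the initial "large network" expansion is not shown.

```python
def cut_into_fragments(edges, limit=50):
    """Split an undirected network (set of frozenset edges) into fragments."""
    if limit < 2:
        raise ValueError("limit must be at least 2")
    remaining = set(edges)
    fragments = []

    def degree(node):
        return sum(1 for e in remaining if node in e)

    def neighbours(node):
        return {m for e in remaining if node in e for m in e if m != node}

    while remaining:
        # Nodes stripped of all edges disappear automatically here.
        all_nodes = {n for e in remaining for n in e}
        pole = max(all_nodes, key=lambda n: (degree(n), n))
        fragment = {pole}
        queue = neighbours(pole)
        while queue and len(fragment) < limit:
            node = min(queue, key=lambda n: (degree(n), n))  # least connected
            queue.discard(node)
            fragment.add(node)
            queue |= neighbours(node) - fragment
        inside = {e for e in remaining if e <= fragment}
        fragments.append((fragment, inside))
        remaining -= inside  # edges are never reused in later fragments
    return fragments


edges = {frozenset(p) for p in [("H", "a"), ("H", "b"), ("H", "c"), ("c", "d")]}
for nodes, _ in cut_into_fragments(edges, limit=3):
    print(sorted(nodes))
```

With a limit of 3, node H ends up in both fragments, illustrating that nodes (but not edges) may be reused.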
6.8
Transcription regulation
This algorithm starts with a small sub-network that consists of the initial list of objects plus all the “immediate
transcription factors” for those initial objects, i.e. the objects that are linked to at least one of the initial objects
by an edge of the “transcription regulation” type. Then, a separate network is built around every such
transcription factor, using the Auto Expand algorithm (see 6.2) along the incoming edges taken in the opposite
direction, and adding the transcription factor’s targets from the initial list; every such network is limited by the
“Number of nodes in a fragment” advanced option (default value is 50). The algorithm delivers a list of
networks, one per transcription factor.
6.9
Analyze network (transcription factors / receptors)
Both algorithms start by creating two additional lists from the initial list of objects: the list of
transcription factors and the list of receptors. The first list comprises, as in 6.8, the starting nodes of the
“transcription regulation” edges ending at some of the initial objects (their targets). The second list consists of
the ending nodes of the “receptor binding” edges starting at some of the initial objects (their ligands). The three
lists (initial, transcription factors, and receptors) may have intersections.
Then, the edges of the network receive different weights, according to the option described in 7.5. These
weights are used to calculate path lengths. As a special feature, the length of any canonical path (“noodle”) is
considered equal to 1 regardless of the number of edges in the noodle.
The next step of the algorithm consists of finding shortest paths from the receptors to the transcription factors. (Note: the present algorithm does not search for paths in the opposite direction, i.e. from the transcription factors to the receptors.)
At the final stage, the two algorithms work differently, though symmetrically. The first algorithm (“Analyze network, transcription factors”) looks, for every transcription factor, for the closest receptor and delivers a network consisting of all the shortest paths from that receptor to that transcription factor. It delivers one network per transcription factor in the list.
Similarly, the second algorithm (“Analyze network, receptors”) looks, for every receptor, for the closest transcription factor and delivers a network consisting of all the shortest paths from that receptor to that transcription factor. It delivers one network per receptor in the list.
For example, let us consider the following table of distances (shortest paths lengths) between 4 transcription
factors and 3 receptors:
SP length | R1 | R2 | R3
TF1       | 1  | 5  | -
TF2       | 3  | 7  | 2
TF3       | 5  | 3  | 3
TF4       | 4  | -  | 2
Then the first algorithm will deliver four networks, consisting of the shortest paths R1->TF1, R3->TF2, R2,3->TF3 and R3->TF4, respectively. The second will deliver three networks: R1->TF1, R2->TF3 and R3->TF2,4. Note that the path R2->TF2 will not be returned by either algorithm.
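The selection of closest partners in this example can be reproduced with a small sketch. It assumes the shortest-path lengths have already been computed and stored in a dictionary keyed by `(transcription factor, receptor)`, with missing pairs meaning "no path"; ties keep all partners at the minimal length. The function name `closest_pairs` is hypothetical.

```python
def closest_pairs(dist, by):
    """For each TF (by='tf') or each receptor (by='receptor'), keep the
    partner(s) at minimal shortest-path length."""
    if by not in ("tf", "receptor"):
        raise ValueError("by must be 'tf' or 'receptor'")
    best = {}
    for (tf, rec), length in dist.items():
        key, partner = (tf, rec) if by == "tf" else (rec, tf)
        if key not in best or length < best[key][0]:
            best[key] = (length, {partner})
        elif length == best[key][0]:
            best[key][1].add(partner)  # tie: keep every closest partner
    return {key: partners for key, (_, partners) in best.items()}


# The distance table above; "-" entries are simply absent.
distances = {
    ("TF1", "R1"): 1, ("TF1", "R2"): 5,
    ("TF2", "R1"): 3, ("TF2", "R2"): 7, ("TF2", "R3"): 2,
    ("TF3", "R1"): 5, ("TF3", "R2"): 3, ("TF3", "R3"): 3,
    ("TF4", "R1"): 4, ("TF4", "R3"): 2,
}
print(closest_pairs(distances, "receptor"))
```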
Every network built by these algorithms may optionally be enriched with, resp., the ligands of the receptors and
the targets of the transcription factors (the “Add complementary objects” advanced option, see 7.8).
The networks thus obtained may then be grouped, and merged within every group (the “Group networks”
advanced option, see 7.9). Namely, if we are building one network for every transcription factor, then all such
networks with the same receptors are grouped, and every such group is then merged into one larger network.
Similarly, if we are building one network for every receptor, then all the networks with the same transcription
factors are grouped and merged within each group.
7
Advanced options and their use in the algorithms
7.1
Discard objects
This option is active with every algorithm. The user may select none, or both, or any one of the options. If the
option “Experiments” is checked, then the current network will comprise only those objects that are present in
the currently active experiment files, showing an expression not less than the threshold established by the
appropriate parameters. If the option “User list” is checked, then the network will consist of the user-selected
objects only. When both options are selected, the network will consist of those objects in the user-selected
list that are present in the active experiment files and have an expression greater than the pre-established
threshold. Selection of edges to be included in the network is straightforward: all those edges (interactions)
between the selected nodes (objects) will be included that satisfy the criteria set by the chosen algorithm and by
the other options.
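The combined effect of the two checkboxes can be sketched as a pair of set filters followed by edge selection. The sketch assumes a hypothetical `expression` dictionary holding one value per object from the active experiments (objects absent from it are discarded) and uses "not less than the threshold" as the comparison; the function name is also hypothetical.

```python
def discard_objects(nodes, edges, expression=None, threshold=0.0, user_list=None):
    """Keep only objects passing the selected filters, then keep the edges
    whose two ends both survived."""
    keep = set(nodes)
    if expression is not None:  # "Experiments" checked
        keep = {n for n in keep if n in expression and expression[n] >= threshold}
    if user_list is not None:  # "User list" checked
        keep &= set(user_list)
    kept_edges = {(s, d) for s, d in edges if s in keep and d in keep}
    return keep, kept_edges


nodes = {"A", "B", "C", "D"}
edges = {("A", "B"), ("B", "C"), ("C", "D")}
expr = {"A": 2.0, "B": 1.5, "C": 0.2}
print(discard_objects(nodes, edges, expression=expr, threshold=1.0, user_list=["B", "C", "D"]))
```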
7.2
Number of steps in the path
This option is active with the “Shortest paths” (6.3) and “Self regulation” (6.4) algorithms. The user may set a maximum length for searching shortest paths between nodes. Paths of greater length between pairs of nodes will not be considered at all. For example, if this parameter is set to 5, and there exist between some nodes A and B two paths of length 3 and two paths of length 4, and there is no path shorter than 3 (satisfying all the pre-filters specified), then the only paths to be included in the network will be the two paths of length 3.
If, in contrast, the “Number of steps in the path” is set to 4, and there are no paths shorter than 5 between A and B (after pre-filtering), then no path at all will be drawn between A and B.
It should be noted that, depending on whether edges are considered directed or non-directed, the “shortest path” concept is applied either independently to the set of paths from A to B and to the set of paths from B to A, or to one common set of paths regardless of direction.
7.3
Edges in Shortest Paths
This option is active with the “Shortest paths” (6.3) and “Self regulation” (6.4) algorithms. It restricts the network to only those edges that participate in the shortest paths built by the algorithm, instead of showing all possible edges between the objects included in the network.
7.4
Use interaction weights
This option may be used with all algorithms except “Analyze network (transcription factors)” and “Analyze network (receptors)”. Every interaction is assigned a weight depending on its mechanism and effect. If this option is selected, then shortest paths are calculated using the weight of every edge (interaction): the greater the weight, the longer the edge. Presently the weights are rigidly pre-assigned to every <mechanism, effect> pair (see the table below); the user cannot reassign them.
Mechanism ↓ / Effect →          | -1 tech | 0 unspecified | 1 activation | 2 inhibition | 6 decrease amount | 7 increase amount
-1 tech                          | 0.1     | 9999          | 9999         | 9999         | 9999              | 9999
0 unspecified                    | 9999    | 100           | 10           | 10           | 10                | 10
2 covalent modification          | 9999    | 10            | 1            | 1            | 3                 | 3
3 +P                             | 9999    | 10            | 1            | 1            | 100               | 100
4 -P                             | 9999    | 10            | 1            | 1            | 100               | 100
5 bind                           | 10      | 10            | 1            | 1            | 100               | 100
6 competition                    | 9999    | 10            | 30           | 2            | 300               | 300
7 transformation                 | 9999    | 100           | 100          | 100          | 100               | 100
8 cleavage                       | 9999    | 100           | 20           | 3            | 1                 | 2
9 transcription regulation (TR)  | 9999    | 10            | 1            | 1            | 9999              | 5
10 class relation (CR)           | 9999    | 0.1           | 9999         | 9999         | 9999              | 9999
12 influence on expression (IE)  | 9999    | 500           | 100          | 100          | 9999              | 300
14 complex subunit (CS)          | 9999    | 0.1           | 9999         | 9999         | 9999              | 9999
15 catalysis                     | 9999    | 1             | 9999         | 9999         | 9999              | 9999
16 transport                     | 9999    | 3             | 10           | 10           | 3                 | 3
17 receptor binding              | 9999    | 1             | 1            | 1            | 9999              | 9999
Notes:
1) Weights should be understood as distances between the two ends of a link. Therefore, the lower the weight of an interaction (one of whose ends has already been selected for inclusion in the network being built), the higher the chance that the interaction, and hence its opposite end, is included in the network.
2) Consequently, the probability of including a link decreases as its weight increases. Inclusion of links weighted “9999” is highly improbable.
3) Links with very low weight (less than 1) practically do not increase the total length of a path. Both ends of such a link may be considered as one node.
4) In the table above, pink-colored cells mean that the corresponding combination of an effect and a mechanism presently never appears in our database.
5) The “9999” weight signifies that the corresponding effect-mechanism combination should not appear.
6) Green-colored cells indicate (presumably) preferred effect-mechanism combinations. Such interactions are the first candidates for inclusion into the network being built.
7) Effect and mechanism both equal to -1 denote a link between a metabolite and a reaction.
8) Effect equal to -1 where the mechanism is “bind” denotes links that have been automatically uploaded using the JnJ parser.
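To illustrate how these weights change which path counts as shortest, here is a minimal Dijkstra sketch that uses only a handful of weights copied from the table above. Treating any <mechanism, effect> pair missing from this excerpt as 9999 is an assumption of the sketch, and the names `WEIGHTS`, `FORBIDDEN` and `weighted_shortest` are hypothetical.

```python
import heapq

# A few <mechanism, effect> weights copied from the table above.
WEIGHTS = {
    ("bind", "activation"): 1,
    ("bind", "unspecified"): 10,
    ("competition", "activation"): 30,
    ("transcription regulation", "activation"): 1,
    ("transcription regulation", "increase amount"): 5,
    ("influence on expression", "unspecified"): 500,
}
FORBIDDEN = 9999  # assumed weight for pairs not listed in this excerpt


def weighted_shortest(graph, source, target):
    """Dijkstra over edges weighted by their <mechanism, effect> pair.
    `graph` maps node -> list of (neighbour, mechanism, effect)."""
    best = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d
        if d > best[node]:
            continue  # stale heap entry
        for nxt, mech, eff in graph.get(node, []):
            nd = d + WEIGHTS.get((mech, eff), FORBIDDEN)
            if nd < best.get(nxt, float("inf")):
                best[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return None  # target unreachable


graph = {
    "A": [("B", "competition", "activation"), ("C", "bind", "activation")],
    "C": [("B", "transcription regulation", "activation")],
}
print(weighted_shortest(graph, "A", "B"))  # 2
```

The direct A->B competition edge (weight 30) loses to the two-edge route through C (total 2), even though it has fewer steps.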
7.5
Assign weights to interactions (“Edges weights”)
This option is active with the two algorithms to which the previously described option “Use interaction weights”
(7.4) does not apply, namely, to the “Analyze network (transcription factors)” and “Analyze network
(receptors)” (6.9) algorithms.
The edges are divided into four categories, each with its own weight:
• Inside – edges between initial nodes.
• Nearest – edges going from or to an initial node.
• Outside – edges between nodes which do not belong to the initial list.
• Forbidden – edges between the transcription factors found and their targets, or between the receptors found and their ligands.
The default weights (shown in the option selection pane) are chosen to favor paths that pass through edges of the initial list and do not pass through other receptors or transcription factors. If needed, the user may select other weights.
The length of any canonical path (CP, “noodle”) is considered equal to 1. This applies only to a whole CP, from end to end, and this value cannot be changed by the user. Therefore, for any path, if a portion of it is a canonical path, then the length of the covered portion is 1. This can make the length ambiguous when the path is covered by several overlapping noodles; in such a case, the minimum possible length is taken. Consider the following example:
Path: A - B - C - D - E - F - G - H - I - J - K - L, where CP1 spans C to I and CP2 spans E to J.
This path from A to L is partly covered by two different CPs, one from C to I (CP1), the other from E to J (CP2). The total length of the path may be calculated either as the sum AB+BC+CP1+IJ+JK+KL or as the sum AB+BC+CD+DE+CP2+JK+KL. Of the two results, the shorter is taken. Moreover, ignoring the CPs altogether may sometimes give an even shorter result, e.g. if in the above example all the edges making up CP2 are “internal”, thus having a length close to 0 (in the default length setting). Such a result is also taken into account in the length comparison.
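One way to obtain the minimum over all these alternatives is a small dynamic program along the path: from each position, either walk the next edge or jump over a whole canonical path that starts there. The sketch below is an illustration of that idea under stated assumptions (one fixed path, nodes numbered by position, each CP given as a hypothetical `(start, end)` pair of positions); it is not presented as the product's implementation.

```python
def path_length_with_cps(edge_lengths, cps):
    """Minimum length of one path, allowing canonical-path shortcuts.

    Nodes are numbered 0..n along the path, edge_lengths[i] is the length of
    the edge from node i to node i + 1, and each CP is a (start, end) pair of
    node positions whose whole span counts as length 1.
    """
    n = len(edge_lengths)
    best = [0.0] + [float("inf")] * n
    for i in range(n):
        # walk the next edge normally
        best[i + 1] = min(best[i + 1], best[i] + edge_lengths[i])
        # or jump over a whole canonical path starting here
        for start, end in cps:
            if start == i:
                best[end] = min(best[end], best[i] + 1)
    return best[n]


# A..L are positions 0..11; CP1 = C..I = (2, 8), CP2 = E..J = (4, 9).
cps = [(2, 8), (4, 9)]
print(path_length_with_cps([1] * 11, cps))  # 6.0
```

With unit edge lengths the CP1 route wins (6 versus 7 via CP2). If the edges from C to J are given length 0, the plain path without any CP becomes the shortest (4), matching the remark above that ignoring CPs can sometimes be best.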
7.6
Show non-connected root nodes
This option is available in all the 6 algorithms that build one network (6.1–6.6), and not available in the
algorithms that build a set of networks. The following diagram illustrates how it works:
7.7
Number of nodes in a fragment
This option is available with the following algorithms: “Auto expand” (6.2), “Manual expand” (6.6), “Analyze network” (6.7), and “Transcription regulation” (6.8). It sets the maximum number of nodes in the network being built (in the first two cases) or in each network fragment (in the last two cases). The limit is maintained only approximately, because the graph expansion results from a series of local operations around its individual nodes; so, when the limit is exceeded, there are no criteria to decide which of the nodes and edges should be dropped and which ones kept.
7.8
Add complementary objects
This option works with the two mutually symmetrical algorithms, “Analyze network (transcription factors)” and
“Analyze network (receptors)” (6.9). Both algorithms build a set of network fragments, each of which draws
shortest paths between some receptor (sometimes several receptors at once, in the first algorithm) and some
transcription factor (sometimes several transcription factors at once, in the second algorithm). When this option
is set, each network fragment shows, in the first algorithm, the ligand(s) for that fragment’s receptor, and in the
second algorithm, the target(s) of that fragment’s transcription factor. Both ligands and targets are added to network fragments built by either of the two algorithms.
7.9
Group networks
This option, like the previous one, works with the “Analyze network (transcription factors)” and “Analyze network (receptors)” algorithms (6.9). In both cases it reduces the number of separate network fragments being built, by grouping some fragments into larger ones.
Namely, if we are building network fragments for transcription factors, then the set of fragments thus obtained initially contains one fragment per transcription factor. With this option activated, the set of fragments is divided into groups, each group containing the fragments for transcription factors having the same receptor. Then every such group of fragments is replaced with one larger fragment obtained by merging all the fragments of the group. Symmetrically, if we are building network fragments for receptors, one per receptor, then the set of all those fragments is divided into groups, each group containing the fragments for different receptors corresponding to the same transcription factor, and every such group is then merged into one larger fragment.
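For the transcription-factor case, grouping and merging can be sketched as follows. The sketch assumes each fragment is stored as a pair (set of receptors, set of edges) keyed by its transcription factor, and that "the same receptor" means an identical receptor set, used as a `frozenset` grouping key; both the data layout and the name `group_networks` are hypothetical.

```python
def group_networks(fragments):
    """fragments: dict TF -> (receptors, edges). Fragments whose TFs share the
    same receptor set are merged; returns receptor set -> (TFs, merged edges)."""
    groups = {}
    for tf, (receptors, edges) in fragments.items():
        key = frozenset(receptors)
        tfs, merged = groups.setdefault(key, (set(), set()))
        tfs.add(tf)
        merged |= set(edges)  # in-place union keeps the stored set updated
    return groups


# Fragments matching the first algorithm's result in the 6.9 example.
fragments = {
    "TF1": ({"R1"}, {("R1", "TF1")}),
    "TF2": ({"R3"}, {("R3", "X"), ("X", "TF2")}),
    "TF3": ({"R2", "R3"}, {("R2", "TF3"), ("R3", "TF3")}),
    "TF4": ({"R3"}, {("R3", "X"), ("X", "TF4")}),
}
groups = group_networks(fragments)
print(len(groups))  # 3
```

The fragments for TF2 and TF4, which share receptor R3, end up merged into one larger fragment.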
8
Showing high-throughput experimental data on networks
Every network object (node) has its associated genes. This makes it possible to display “HT data” (high-throughput data from experiments) on the networks. These data are shown as multicolored circles near some of
the network nodes, namely, the nodes corresponding to the genes that have notably changed their expression in
at least one of the experiments (taking into account the threshold value and the p-value set for the experiment).
As always, only those experiments are considered that have been uploaded and activated (in the Data Manager
window) before network building starts.
HT data are also shown on maps, but there they look like a range of “thermometers”. On networks, a different display method has been chosen because a network typically contains many more nodes than a typical map, so a range of thermometers would hardly be readable. For this reason, HT data are shown on networks as colored circles that express only qualitative (increased/decreased) and not quantitative (by how much) data.
A special case is a network node corresponding to a protein complex. In such a case, the node is marked with an HT data circle only if every sub-unit of the complex has some data from the same experiment.
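The marking rule can be sketched as below. Several details are assumptions, not taken from the text: each experiment maps a gene to a hypothetical `(fold_change, p_value)` pair with a signed fold change, a gene "changed notably" when `abs(fold_change) >= threshold` and `p_value <= max_p`, the up/down label follows the strongest change, and a complex counts as having data only when every subunit has at least one notably changed gene in that experiment. All names are hypothetical.

```python
def changed(genes, data, threshold, max_p):
    """Signed changes of the genes that changed notably in one experiment."""
    return [data[g][0] for g in genes
            if g in data and abs(data[g][0]) >= threshold and data[g][1] <= max_p]


def ht_markers(node_genes, complexes, experiments, threshold=2.0, max_p=0.05):
    """Return {node: {experiment: 'up' or 'down'}} for the nodes to be marked.

    node_genes: node -> list of genes; complexes: complex node -> subunit nodes;
    experiments: experiment name -> {gene: (fold_change, p_value)}.
    """
    markers = {}
    for node in set(node_genes) | set(complexes):
        for exp, data in experiments.items():
            if node in complexes:
                # A complex is marked only if every subunit has data here.
                per_subunit = [changed(node_genes.get(s, []), data, threshold, max_p)
                               for s in complexes[node]]
                if not per_subunit or not all(per_subunit):
                    continue
                values = [v for vs in per_subunit for v in vs]
            else:
                values = changed(node_genes[node], data, threshold, max_p)
                if not values:
                    continue
            strongest = max(values, key=abs)
            markers.setdefault(node, {})[exp] = "up" if strongest > 0 else "down"
    return markers


node_genes = {"P1": ["g1"], "P2": ["g2"], "P3": ["g3"]}
complexes = {"C1": ["P1", "P2"], "C2": ["P1", "P3"]}
experiments = {"exp1": {"g1": (3.0, 0.01), "g2": (-2.5, 0.01), "g3": (0.5, 0.01)}}
print(ht_markers(node_genes, complexes, experiments))
```

Complex C2 is not marked because its subunit P3 shows no notable change in `exp1`.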