BRIEFINGS IN BIOINFORMATICS. VOL 16. NO 3. 497–525
Advance Access published on 24 June 2014
doi:10.1093/bib/bbu021
Current innovations and future
challenges of network motif detection
Ngoc Tam L. Tran, Sominder Mohan, Zhuoqing Xu and Chun-Hsi Huang
Submitted: 20th January 2014; Received (in revised form) : 25th May 2014
Abstract
Network motif detection is the search for statistically overrepresented subgraphs present in a larger target network. They are thought to represent key structure and control mechanisms. Although the problem is exponential
in nature, several algorithms and tools have been developed for efficiently detecting network motifs. This work analyzes 11 network motif detection tools and algorithms. Detailed comparisons and insightful directions for using
these tools and algorithms are discussed. Key aspects of network motif detection are investigated. Network motif
types and common network motifs as well as their biological functions are discussed. Applications of network
motifs are also presented. Finally, the challenges, future improvements and future research directions for network
motif detection are also discussed.
Keywords: network motif; random network; graph isomorphism; statistical significance; network motif detection
INTRODUCTION
Network motifs were first theorized by Shen-Orr
et al. [1] as patterns of inter-connections occurring
in many different parts of a network at numbers
that are significantly higher than those in random
networks. The networks that contain motifs vary widely.
Examples include protein–protein interaction (PPI), gene regulation, food
webs, neuron connectivity, electronic circuits,
World Wide Web (WWW), network traffic and
social networks [2–5]. Certain network motifs such
as feed-forward loop (FFL) and bi-fan have been
shown to recur in completely different biological
networks [6, 7]. These motifs can be found in
Table 1.
Discovered network motifs are theorized to highlight key control mechanisms that regulate target
networks. By identifying key control mechanisms
in biological networks, researchers could increase
the accuracy and efficiency of medications while
speeding up their production. Network motifs
could also bridge gaps between distinct disciplines
and allow for fruitful collaboration. Two networks,
which share similar significance profiles, are thought
to have related structures and functioning methods.
Significance profile is a metric, which is used to
measure the significance of subgraphs’ frequencies
[6]. If certain networks, for instance, PPI and transistor, were found to have similar significance profiles, research could be borrowed and shared by both
biologists and very large scale integration (VLSI) designers. Each might have unique insights to offer
such as methods for design, control or analysis.
Although the study of network motif detection is
not a nascent field, a significant
amount of research is still needed to improve network
motif detection.
In this work, we analyze 11 network motif detection tools and algorithms. We provide detailed comparisons and insightful directions for using these tools
and algorithms. We discuss the key aspects of network motif detection, the types of network motifs
and common network motifs with their biological
functions. We also present several applications of
network motifs. Finally, we present the challenges,
future improvements and future research directions
for network motif detection.
Corresponding author. Ngoc Tam L. Tran, Department of Computer Science and Engineering, University of Connecticut, Storrs, CT
06269-2155, USA. Tel: +1-860-296-7533, E-mail: [email protected]
Ngoc Tam L. Tran is a graduate student at Department of Computer Science and Engineering, University of Connecticut.
Sominder Mohan is an undergraduate student at Swarthmore College.
Zhuoqing Xu is an undergraduate student at University of Connecticut.
Chun-Hsi Huang is an Associate Professor at Department of Computer Science and Engineering, University of Connecticut.
© The Author 2014. Published by Oxford University Press. For Permissions, please email: [email protected]
Table 1: Network motifs (Courtesy of [8–16])
Type                        Motif                               Illustrated variants
Single node with self-edge  Autoregulation                      Positive; negative
Pair-wise                   Positive feedback loops             Double-positive; double-negative
Pair-wise                   Negative feedback loops             Slow and fast interactions
Cascade                     Cascades                            Positive; negative
Hub                         Single-input module (SIM)
Bipartite                   Dense overlapping regulons (DOR)
Bipartite                   Bi-fan
Clique                      Protein clique
Clique                      Interacting transcription factors that co-regulate a third gene
Clique                      Feed-forward loop (FFL)             Coherent types 1–4; incoherent types 1–4
Clique                      Co-regulated interacting proteins
Clique                      Mixed-feedback loop between transcription factors that co-regulate a gene
Clique                      Biparallel
For the motifs SIM, DOR, bi-fan, protein clique, interacting transcription factors that co-regulate a third gene, FFL,
co-regulated interacting proteins, and mixed-feedback loop between transcription factors that co-regulate a gene,
a directed edge represents an interaction between a transcription factor and its target gene, and a bidirectional edge connects
interacting proteins.
DESCRIPTION OF NETWORK
MOTIF DETECTION
Network motif detection is the problem of finding
smaller graphs (motifs) within a larger graph (target
network) that correspond to certain statistical thresholds. Before stating a definition of the problem, we
introduce a few definitions below. To start, our
target network and all motifs found are represented
as graphs:
Graph
A graph is a set of vertices V connected to each other by a set of
edges E.
Network motifs are defined as being found directly from within the target network, meaning that
the exact shape of the motif must be present somewhere in the target’s structure. Mathematically, we
can assert that a motif must be an induced subgraph
of its target network.
Induced Subgraph
Let G be a graph with vertices V and edges E. Let V′ ⊆ V
and E′ ⊆ E. An induced subgraph H of G is a graph defined by V′ and E′,
where E′ contains every edge of E whose endpoints both lie in V′.
In other words, an induced subgraph H is completely defined by a choice of vertices
of G together with all edges of G among them.
We know that network motif graphs are subgraphs of
the target graph. However, some network motif
graphs have different shapes even though their properties
are mathematically identical. Consider two network
motif graphs, one with a star shape and the other with a
pentagon shape, as in Figure 1. These graphs look nothing
alike, yet they have the same properties. Both have five edges
and five vertices. Corresponding vertices have the same degree:
every vertex has degree two in both graphs. Each graph is connected,
so both have exactly one connected component.
Each pair of connected vertices in one graph corresponds to a pair of
connected vertices in the other. Neither graph has loop edges or parallel
edges. Graphs that correspond in this way are known as
isomorphic.
Figure 1: Isomorphic graphs.
Graph Isomorphism
Assume graphs G = {V, E} and H = {V′, E′}. G and H are
isomorphic if there exists a bijective function f between V and
V′ such that {u, v} ∈ E if and only if
{f(u), f(v)} ∈ E′.
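The definition can be checked directly on very small graphs. The following is a minimal sketch, not a method used by any of the surveyed tools: it tries every bijection between the two vertex sets, which is only practical for a handful of vertices. The function name `are_isomorphic` and the vertex numbering of the pentagon and star are assumptions made for illustration.

```python
from itertools import permutations

def are_isomorphic(vertices_g, edges_g, vertices_h, edges_h):
    """Brute-force test for undirected simple graphs: try every bijection f."""
    vg, vh = list(vertices_g), list(vertices_h)
    eg = {frozenset(e) for e in edges_g}
    eh = {frozenset(e) for e in edges_h}
    if len(vg) != len(vh) or len(eg) != len(eh):
        return False
    for image in permutations(vh):
        f = dict(zip(vg, image))
        # f is a bijection and both edge sets have equal size, so mapping
        # every edge of G onto an edge of H also covers every edge of H.
        if all(frozenset(f[x] for x in e) in eh for e in eg):
            return True
    return False

# Illustrative numbering (assumed): a pentagon and a five-pointed star.
pentagon = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
star = [(0, 2), (2, 4), (4, 1), (1, 3), (3, 0)]
# A triangle with a two-edge tail: also five vertices and five edges.
tailed_triangle = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)]

print(are_isomorphic(range(5), pentagon, range(5), star))
print(are_isomorphic(range(5), pentagon, range(5), tailed_triangle))
```

The second comparison shows that equal vertex and edge counts alone are not sufficient: the tailed triangle has vertices of degree one and three, so no bijection can preserve its edges.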
We can count the number of isomorphic induced
subgraphs in a target network to establish their frequency. However, we need to know the layouts of
these subgraphs in the target network. There are
three different classifications that can be used for
counting network motifs [17], and different tools
use different ones.
Frequency 1
The frequency of a subgraph is the number of times it occurs in a
target network. This is also known as subgraph frequency.
Frequency 2
Frequency also measures how motifs can be overlaid in the target
network. There are three separate measurements that can be
used to determine how freely motifs can share like elements [17]:
(i) F1 frequency is completely unrestricted and allows distinct
motifs to share both edges and vertices.
(ii) F2 frequency is more restricted, and distinct motifs can
only share vertices.
(iii) F3 frequency is the most restricted; distinct motifs cannot
share any vertices or edges.
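The three frequency concepts can give different counts for the same set of occurrences. The sketch below assumes the occurrences of a motif have already been found and are described by their vertex and edge sets. F1 is then a plain count, while F2 and F3 require choosing a largest set of non-overlapping occurrences; the greedy pass used here is a simplification that yields a lower bound, not necessarily the maximum. The helper names `triangle`, `greedy_disjoint_count` and `frequencies` are hypothetical.

```python
def triangle(a, b, c):
    """Describe one triangle occurrence by its vertex set and edge set."""
    return {
        "vertices": frozenset((a, b, c)),
        "edges": frozenset(frozenset(e) for e in ((a, b), (b, c), (a, c))),
    }

def greedy_disjoint_count(occurrences, part):
    """Count occurrences greedily so that no two share an element of `part`."""
    used = set()
    count = 0
    for occ in occurrences:
        if used.isdisjoint(occ[part]):
            used |= occ[part]
            count += 1
    return count

def frequencies(occurrences):
    f1 = len(occurrences)                                 # anything may be shared
    f2 = greedy_disjoint_count(occurrences, "edges")      # only vertices may be shared
    f3 = greedy_disjoint_count(occurrences, "vertices")   # nothing may be shared
    return f1, f2, f3

occurrences = [
    triangle("a", "b", "c"),
    triangle("c", "d", "e"),  # shares only vertex c with the first triangle
    triangle("a", "b", "f"),  # shares edge a-b with the first triangle
]
print(frequencies(occurrences))  # (3, 2, 1)
```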
The next step after obtaining the frequencies of
isomorphic induced subgraphs is to find out whether
or not they are significant using statistical significance
testing. By running the same frequency test on a
large number of similar random networks, we can
accumulate a large set of frequency values that provide some insight into whether or not the value obtained from the target network is significant.
The random networks are used to establish default
values for frequency and other metrics, otherwise
known as a statistical null hypothesis. Generated
random networks are also known as null-models
[18]. Testing all null models yields certain average
values for all our scores, which can be used to set
thresholds. If a particular score from the target network exceeds its
threshold, the corresponding subgraph is considered statistically significant.
Scoring Thresholds
Scoring thresholds are used to test whether or not subgraphs are
statistically overrepresented and can therefore be called motifs.
(i) Z-score: The Z-score of a motif measures how many more times
the motif occurs in the target network than in the average random network. It is calculated as follows [19]:
$$z(m) = \frac{f_{in} - \bar{f}_{rand}}{\sqrt{\sigma^2_{rand}}}$$
where m is a motif, $f_{in}$ is the number of occurrences of m in the target network, and $\bar{f}_{rand}$ and $\sigma_{rand}$ are the mean and standard deviation of
its occurrences in the set of random networks.
(ii) P-value: The P-value of a motif is the number of
random networks in which the motif appears at least as many times
as it does in the target network, divided by the total number of random networks. It is a
probability value ranging from 0 to 1. A motif is considered statistically significant if its P-value is < 0.01 [6].
(iii) Significance Profile: The significance profile is
a vector of Z-scores of a set of motifs, which is normalized to
length 1 [19]. The significance profile of a motif i is calculated
as follows [19]:
$$SP_i = \frac{z_i}{\sqrt{\sum_{j=1}^{n} z_j^2}}$$
where $z_i$ is the Z-score of motif i, and n is the number of motifs
in the set.
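The three scores above can be computed from a motif's count in the target network and its counts in the random networks. This is a small sketch under two assumptions not fixed by the text: the standard deviation is taken as the population standard deviation over the random ensemble, and a zero standard deviation is treated as an error. The function names are illustrative.

```python
import math

def z_score(f_in, rand_counts):
    """(f_in - mean) / standard deviation over the random networks."""
    mean = sum(rand_counts) / len(rand_counts)
    var = sum((c - mean) ** 2 for c in rand_counts) / len(rand_counts)
    if var == 0:
        raise ValueError("zero variance in the random ensemble")
    return (f_in - mean) / math.sqrt(var)

def p_value(f_in, rand_counts):
    """Fraction of random networks with at least f_in occurrences."""
    return sum(1 for c in rand_counts if c >= f_in) / len(rand_counts)

def significance_profile(z_scores):
    """Normalize a vector of Z-scores to length 1."""
    norm = math.sqrt(sum(z * z for z in z_scores))
    return [z / norm for z in z_scores]

rand_counts = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up counts from 8 random networks
print(z_score(10, rand_counts))          # 2.5
print(p_value(7, rand_counts))           # 0.25
print(significance_profile([3.0, 4.0]))
```

In practice many more than eight random networks would be needed before a threshold such as P-value < 0.01 can be resolved at all.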
The network motif detection problem can be
stated generally as follows:
Network Motif Detection
The search for induced isomorphic subgraphs within a target
network that occur significantly more often in the target network
than in the random network using scoring thresholds.
COMPLEXITY OF NETWORK
MOTIF DETECTION
Network motif detection is computationally very
expensive. As target networks grow, there is more
room for induced subgraphs to appear. Furthermore,
as subgraphs grow larger, there are more potential
ways to overlay them within a network as well as a
larger total number of possible motifs. All of this
cost is exacerbated by the large runtime multiplier
of having to repeat the computation for a large
number of random networks. Moreover, finding
isomorphic subgraphs in the target network is an
NP problem that is neither known to be NP-Complete nor known to be solvable in polynomial time [6].
TYPES OF NETWORK MOTIFS
Generally, there are six types of network motifs,
which can be found in Figure 2. They are single
node with self-edge, pair-wise, cascade, hub, bipartite and clique [8–12]. Some common network
motifs identified by each type and their biological
functions are discussed in the section below. The
illustrations of these network motifs can be found
in Table 1.
Single node with self-edge
The network motif identified for this type is autoregulation motif [10]. There are two types of autoregulation namely negative autoregulation (NAR)
and positive autoregulation (PAR) [9, 10].
The NAR is one of the most abundant network
motifs [11]. About 40% of known transcription factors
in Escherichia coli form NAR motifs
[11]. This network motif is also abundant in
yeast and higher organisms [11]. In the NAR motif, a
transcription factor represses the transcription of its
own gene [10]. The NAR motif has two essential
functions: (i) accelerate response time of gene circuits
and (ii) decrease cell to cell variation in protein
levels [9].
In the PAR motif, a transcription factor increases
its own rate of production. Thus, its functions are opposite
to those of the NAR motif: the response of
gene circuits is slowed and the cell-to-cell variation
in protein levels is increased [9].
Pair-wise
The network motif of this type can be an interaction
between two connected proteins [8, 9]. The identified network motifs are positive feedback loops and
negative feedback loops [9, 10].
In the positive feedback loops, two transcription
factors regulate each other [9]. There are two types
of positive feedback loops: the double-positive loop and
the double-negative loop. In the double-negative loop,
two transcription factors repress each other, so
there are two steady states: one factor is on while the
other is off, or vice versa. In the double-positive
loop, two transcription factors activate each other.
Thus, there are two steady states: both are on or
both are off [9, 10].
Figure 2: Types of network motifs (Courtesy of [8–12]).
The negative feedback loops contain interactions
between two genes or two proteins in which interactions happen on different timescales. For instance,
gene X slowly activates gene Y, which in turn rapidly inhibits gene X [10].
Cascade
The cascade network motif can be a sequence of activations of genes. When the upstream gene reaches
an appropriate threshold, it activates the downstream
gene. There are two types of cascades: positive and
negative. In a positive cascade, the genes are sequentially activated. In a negative cascade, the genes are
sequentially repressed [10].
Hub
The network motif identified for this type can be a
pattern of a regulator that regulates a group of target
genes [9, 12]. A regulator can also regulate itself [10].
An example of this type is the single-input module
(SIM) [9]. The main function of SIM is to control
synchronized expression of a group of genes with
shared function [9, 10].
Bipartite
The network motif identified for this type is a set of
regulators that jointly control a set of genes [9, 10].
Examples of this type include dense overlapping regulons (DOR) or multi-input motifs (MIMs) and
bi-fan [9].
The DOR contains several input regulators that
jointly regulate several output genes. This network
motif is found in E. coli and yeast. It has several functions such as carbon utilization, anaerobic growth
and stress response [9, 10].
The bi-fan network motif is also found in transcription regulation networks of E. coli and
Saccharomyces cerevisiae yeast. It contains two input
regulators that jointly regulate two output genes.
This network motif can be categorized into coherent
and incoherent bi-fan network motifs. The coherent
bi-fan network motif has both inputs as promoters
while the incoherent bi-fan network motif has one
input as a promoter and the other input as a repressor. In general, the bi-fan network motif controls the
order of signal propagation and its role can be signal
sorters, filters and synchronizers [13, 14].
Clique
This type of network motif can be a protein complex
consisting of three or more proteins that interact to form
a clique [9]. Several network motifs of this type have
been detected, as follows [8, 16].
Protein clique
This network motif contains three proteins interacting with each other. It is the most abundant network
motif in the PPI networks. Ninety-two percent of
the occurrences of this network motif correspond to
known protein complexes [8].
Interacting transcription factors that co-regulate a third gene
This network motif has two transcription factors
interacting with each other, and they jointly regulate
a third gene. Most of the interacting transcription factor pairs in this network motif have the same
function, which is either co-activating or co-repressing genes [8].
Feed-forward loop (FFL)
In this network motif, a transcription factor regulates
another transcription factor and both together regulate a target gene [8, 10, 15]. This network motif is
found in E. coli, yeast and other organisms [10].
There are eight types of FFLs, as each interaction
in the FFL can be either activation or repression as
in Table 1 [10]. The coherent type 1 FFL and the
incoherent type 1 FFL are the most common FFLs
[10]. The incoherent FFL has a role of sign-sensitive
accelerator. It accelerates the response time of the
target gene expression by following stimulus steps
in one direction such as from off to on but not in
the opposite direction [15]. The coherent FFL has a
role of sign-sensitive delay [15].
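The count of eight FFL types follows from giving each of the three interactions one of two signs. The sketch below enumerates them and labels each as coherent or incoherent, assuming the usual convention (not spelled out in the text above) that an FFL is coherent when the sign of the indirect path X → Y → Z equals the sign of the direct interaction X → Z. It does not attempt to reproduce the type 1 to 4 numbering of Table 1; the function name is illustrative.

```python
from itertools import product

ACTIVATION, REPRESSION = +1, -1

def classify_ffl(x_to_y, y_to_z, x_to_z):
    """Label an FFL from the signs of its three interactions."""
    indirect_sign = x_to_y * y_to_z  # net sign of the path X -> Y -> Z
    return "coherent" if indirect_sign == x_to_z else "incoherent"

ffl_types = {
    signs: classify_ffl(*signs)
    for signs in product((ACTIVATION, REPRESSION), repeat=3)
}

for signs, label in ffl_types.items():
    print(signs, label)
```

Under this convention, the all-activation FFL is coherent, while an FFL in which X activates both Y and Z but Y represses Z is incoherent.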
Co-regulated interacting proteins
In this network motif, two genes interact with
each other and are regulated by a common
transcription factor. This network motif is found in
many different cellular pathways [8].
Mixed-feedback loop between transcription factors that co-regulate a gene
This network motif can be a combination of two
network motifs: two transcription factors that coregulate a third gene and the feed-forward loop.
Thus, this topology allows combined regulation
methods [8].
Biparallel
In this network motif, a regulator controls two other
regulators, which co-regulate a target gene. This network motif is found in transcription and phosphorylation networks [16].
APPLICATIONS OF NETWORK
MOTIFS
Network motifs have a wide range of applications as
follows.
Network motifs have been used for identifying
application protocols in network traffic. This application helps network administrators secure and
manage network resources. The implementation
shows that motif profiles outperform traditional profiles in correctly identifying application protocols in
network traffic [4].
Similar network motifs found in different networks reveal the structural similarity between these
networks. Thus, network motifs can be used to classify networks into superfamilies [19].
Network motifs have been employed to validate
the construction of evolutionary trees using parsimony methods. In this application, the correctness
of evolutionary trees, which are built based on the
character overlap graph, is validated by finding
under-represented network motifs called holes in
the character overlap graph. The network motifs in
this application typically are squares without crossing
edges [20].
Network motifs found in the human signaling network
have been used to identify breast cancer patients. In
this application, three-node network motifs in the human
signaling network were screened to identify
cancer-associated motifs that distinguish breast cancer samples from
normal samples. This method is reported to achieve higher accuracy in
identifying breast cancer patients, and it may help in the
diagnosis and therapy of breast cancer as well as other
types of cancer [21].
Network motifs also help explain the
functional roles of some genes
in gene regulation. For instance, identifying recurring miRNA-containing motifs in gene regulation
networks improves the understanding of the functional
roles of miRNAs in gene regulation [22].
Network motifs also allow predicting protein–
protein interaction in the PPI network. In this application, three-node and four-node network motifs
have been used to predict the correct interaction
partner of a protein. This method achieves high
accuracy for prediction of the interactions in the protein interaction network [23].
The labeled network motifs found can be used to
predict the functions of unknown proteins in the PPI
network. In this application, network motifs are discovered based on structure and biological meanings.
The discovered network motifs are labeled so that
they can be used to predict the functions of unknown proteins in the PPI network [24].
Network motifs have been utilized for identifying
network activity. In this application, network motifs
are mapped to applications. This implementation
achieves 85% average accuracy. As a result, it improves network resource management as well as
security enforcement [25].
Another application of network motifs is that the
directed feedback loop and feed-forward loop have
been identified as dominant contributors to local
information storage capability in biological and artificial networks. This finding can explain why
some recurrent neural networks are known for good
memory performance [26].
Lastly, network motifs have been used to explore
the mechanisms of the cervical carcinoma response
to epidermal growth factor (EGF) in a regulation
network. Because a regulation network is too large and
complex to reveal directly which of its components
are significant, network motifs provide a
better understanding of the modularity as well as the
large-scale structure of the network. Thus, identifying network motifs may reveal the mechanisms
underlying the response to growth factor activation
in the regulation network [27].
CLASSIFICATION OF NETWORK
MOTIF DETECTION ALGORITHMS
Network motif detection algorithms can be classified
into two categories: network-centric and motif-centric algorithms [6].
A network-centric algorithm determines the frequency of each subgraph of a given size k in the target network by using isomorphic subgraph checking. It
compares the frequency obtained in the target network with the frequency in the random networks for
this subgraph to determine whether it is a motif [28].
A motif-centric algorithm enumerates all possible
subgraphs of size k. Then, it checks each size-k subgraph
against the target network to find matches and
determine its frequency. This frequency is compared
with the frequency in the random networks for this
subgraph to determine whether it is a motif [28].
A motif-centric algorithm has the drawback that it
may spend unnecessary time checking generated
subgraphs that do not occur in the target network [6].
The methods used by motif-centric and network-centric algorithms for counting subgraphs in the network can be classified into exact counting and
approximation [7]. The former is limited
by the enormous computational cost in large networks. Thus, it can find small motifs of up to four
nodes and motif generalizations of up to six nodes
[29]. The latter was developed to overcome
the complexity of the exact counting method so that
it can find larger motifs [29, 30]. The exact counting
methods include exhaustive recursive search (ERS)
[3], enumerating subgraphs (ESU) [31] and compact
topological motifs [7, 32]. The approximation methods
include edge sampling [3], a randomized version of ESU from a search tree [33] and
tree-filtering search [7, 34]. These methods are discussed in the next section.
GENERAL AND BASIC
TECHNIQUES USED IN NETWORK
MOTIF DETECTION
Random network generations
Generating random networks is an essential step in
network motif detection because it is used to detect
motifs in the target network. The generated random
network must have the same properties such as the
number of nodes, the number of edges, the degree of
nodes and so on as the target network. There are two
common algorithms, which are switching algorithm
and matching algorithm, used for generating random
networks [35].
The switching algorithm utilizes a Markov chain
for generating a random graph with a given degree sequence.
The algorithm uses Monte Carlo switching steps,
each of which switches a randomly chosen pair of edges (A → B, C → D)
to (A → D, C → B), subject to
the rule that no multiple edges or self-edges may be created. This process is repeated Q × E times,
where E is the number of edges in the graph and
Q is approximately 100, to achieve
sufficient randomization. This algorithm can sample
networks uniformly [35].
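The switching steps can be sketched in a few lines for a directed network stored as a list of (source, target) pairs. This is an illustrative sketch, not the code of any surveyed tool; the names `switch_randomize` and `degree_sequences` are hypothetical, and rejected switches are simply counted as steps.

```python
import random
from collections import Counter

def switch_randomize(edges, q=100, seed=None):
    """Degree-preserving randomization by repeated edge switching."""
    rng = random.Random(seed)
    edges = list(edges)
    present = set(edges)
    for _ in range(q * len(edges)):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if a == d or c == b:
            continue  # the switch would create a self-edge
        if (a, d) in present or (c, b) in present:
            continue  # the switch would create a multiple edge
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d), (c, b)
    return edges

def degree_sequences(edges):
    """Out-degree and in-degree of every node, for checking invariants."""
    return Counter(u for u, _ in edges), Counter(v for _, v in edges)

target = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (1, 3)]
randomized = switch_randomize(target, q=100, seed=1)
print(degree_sequences(randomized) == degree_sequences(target))  # True
```

Each accepted switch keeps the source of both edges and only exchanges their targets, which is why every node keeps its in-degree and out-degree.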
The matching algorithm for generating random
networks contains the following steps. First, the algorithm prepares a set of nodes where each node is
assigned a set of ‘stubs’, which are half edges of incoming and outgoing edges. Next, pairs of in-stubs
and out-stubs are randomly selected and joined to
form network edges. This step allows self-edges
and repeat edges. Next, the algorithm searches for
self-edges and repeat edges and rewires them without altering the degree of any node. This step is
carried out until no self-edge or repeat edge
exists in the network. A drawback of this algorithm
is that it generates a biased sample of random
networks [35].
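A minimal sketch of the matching steps is given below, assuming the degree sequences are supplied as dictionaries from node to out-degree and in-degree. The rewiring rule used here (swap the target of an offending edge with that of a randomly chosen edge) and the `max_steps` safeguard are assumptions of this sketch; the text does not prescribe how the rewiring is done.

```python
import random
from collections import Counter

def matching_randomize(out_deg, in_deg, seed=None, max_steps=100000):
    """Join random out-stubs to in-stubs, then rewire self-edges and repeats."""
    if sum(out_deg.values()) != sum(in_deg.values()):
        raise ValueError("out-stubs and in-stubs must be equal in number")
    rng = random.Random(seed)
    out_stubs = [v for v, d in out_deg.items() for _ in range(d)]
    in_stubs = [v for v, d in in_deg.items() for _ in range(d)]
    rng.shuffle(in_stubs)
    edges = list(zip(out_stubs, in_stubs))  # may contain self-edges and repeats
    for _ in range(max_steps):
        counts = Counter(edges)
        bad = [i for i, e in enumerate(edges) if e[0] == e[1] or counts[e] > 1]
        if not bad:
            return edges
        i = rng.choice(bad)
        j = rng.randrange(len(edges))
        (a, b), (c, d) = edges[i], edges[j]
        edges[i], edges[j] = (a, d), (c, b)  # swap targets, degrees unchanged
    raise RuntimeError("no simple network found within max_steps")

out_deg = {0: 2, 1: 2, 2: 1, 3: 1}
in_deg = {0: 1, 1: 1, 2: 2, 3: 2}
network = matching_randomize(out_deg, in_deg, seed=0)
print(sorted(network))
```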
Exhaustive recursive search algorithm
The ERS is an exact algorithm, which takes the
input network in the form of an adjacency matrix and
exhaustively scans the entire matrix for all types of
subgraphs of sizes 3 and 4 only. The algorithm
counts the number of appearances of each type of
subgraph in the target network and also in the
random networks. Subsequently, it determines isomorphic subgraphs for each subgraph type. Then,
each subgraph type is assessed for its statistical significance [3].
Edge sampling algorithm
This algorithm belongs to the family of approximate
algorithms. It samples an n-node subgraph by selecting random connected edges to expand the subgraph
until a set of n nodes is reached. The algorithm
contains the following steps. First, a random edge is
selected from the network and it is expanded by
selecting random neighboring edges repeatedly
until n nodes are reached for this subgraph. To
select a random edge for expanding the subgraph’s
size by one, the algorithm keeps a list of all neighboring edges and it randomly selects an edge from
that list. This process is repeated until a subgraph of n
nodes is reached. This edge sampling algorithm is not
uniform because the probabilities of sampling different specific subgraphs are not equal even if they have
the same topology. To compensate for this drawback,
the algorithm implements a correction method,
which calculates the probability P of sampling a specific subgraph to guarantee unbiased estimation
of subgraph concentrations. The algorithm calculates
the concentrations of n-node subgraphs as follows. It
assigns each subgraph type i a score $S_i$, which is initially set to zero.
For each sample, a weighted score W is
added to the accumulated score $S_i$ of the appropriate
subgraph type i. The estimated subgraph concentrations after $S_T$ samples are calculated as follows [30]:
$$C_i = \frac{S_i}{\sum_{k=1}^{L} S_k}$$
where $S_i$ is the score of subgraph type i, L is the total
number of different subgraph types, and k ranges over all the different subgraph types.
The concentration of each subgraph is used to
determine whether or not it is statistically significant
[30].
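The sampling step and the final normalization can be sketched as follows. This is a simplified illustration: the network is an undirected edge list, the sampler returns only the vertex set of the sampled subgraph, and the computation of the sampling probability P and of the weight W is deliberately left out, so the scores passed to `concentrations` are made-up numbers. Both function names are hypothetical.

```python
import random

def sample_subgraph(edges, n, rng):
    """Grow a connected n-node vertex set by adding random neighboring edges."""
    u, v = rng.choice(edges)
    nodes = {u, v}
    while len(nodes) < n:
        # Neighboring edges: exactly one endpoint is already in the subgraph.
        frontier = [e for e in edges if (e[0] in nodes) != (e[1] in nodes)]
        if not frontier:
            return None  # the component is too small to reach n nodes
        nodes.update(rng.choice(frontier))
    return frozenset(nodes)

def concentrations(scores):
    """C_i = S_i divided by the sum of the scores of all subgraph types."""
    total = sum(scores.values())
    return {kind: s / total for kind, s in scores.items()}

path = [(0, 1), (1, 2), (2, 3)]
print(sample_subgraph(path, 3, random.Random(0)))
print(concentrations({"type A": 3.0, "type B": 1.0}))
```

On the four-node path, the sampler can only return {0, 1, 2} or {1, 2, 3}, but the two are not reached with equal probability from every starting edge, which is the nonuniformity the correction method addresses.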
Frequent pattern finder (FPF)
The FPF algorithm searches for given size patterns
that occur with maximum frequency under a given
frequency concept. The algorithm builds a tree for
only patterns that are supported by the target graph.
It traverses the tree and examines only its promising
branches. A tree is built starting from the root that
contains the simplest possible pattern with one edge
and two vertices. The children are constructed by
having the parent’s pattern extending by one edge.
Duplicate patterns are not allowed. The canonical
label is assigned to each pattern and it is used to
identify the pattern. Isomorphic graphs are identified
if they have the same canonical label. When the
frequency of a pattern of intermediate size falls
below the frequency of the most frequent pattern of target size discovered so far, the algorithm discards this branch of
the tree. If a pattern of target size with nearly maximum frequency is found early in the search process,
the frequency threshold is high from the start and most branches of
intermediate-size patterns are discarded early. Thus,
the number of patterns to be searched decreases drastically [36].
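The idea of a canonical label can be illustrated on small undirected patterns. The sketch below is an assumption made for illustration and is not the labeling scheme of FPF: it relabels the vertices in every possible way, writes the upper triangle of the adjacency matrix as a bit string, and keeps the lexicographically smallest string. Two patterns are isomorphic exactly when these labels agree, but the cost grows factorially with pattern size.

```python
from itertools import permutations

def canonical_label(n, edges):
    """Smallest adjacency bit string over all relabelings of vertices 0..n-1."""
    best = None
    for perm in permutations(range(n)):
        # perm[u] is the new name of vertex u under this relabeling.
        relabeled = {frozenset((perm[u], perm[v])) for u, v in edges}
        bits = "".join(
            "1" if frozenset((i, j)) in relabeled else "0"
            for i in range(n)
            for j in range(i + 1, n)
        )
        if best is None or bits < best:
            best = bits
    return best

pentagon = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
star = [(0, 2), (2, 4), (4, 1), (1, 3), (3, 0)]
tailed_triangle = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)]

print(canonical_label(5, pentagon) == canonical_label(5, star))             # True
print(canonical_label(5, pentagon) == canonical_label(5, tailed_triangle))  # False
```

Storing patterns under such a label lets duplicate patterns be detected with a dictionary lookup instead of pairwise isomorphism tests.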
Enumerating subgraphs algorithm
The ESU is an exact algorithm, which enumerates all
size k subgraphs. The algorithm starts with a vertex v
from the input graph and it adds vertices, one at a
time, to the VExtension set that have two properties.
First, the label of these vertices must be larger than
the label of v. Second, these vertices can only be
neighbors to a newly added vertex w and they
cannot be neighbors to a vertex already in VSubgraph.
The subgraph is extended until size k subgraph is
reached. The algorithm outputs each size k subgraph
exactly once [31].
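The enumeration described above can be sketched as a short recursive function. The sketch assumes an undirected network given as a dictionary from integer vertex labels to sets of neighbors; `sub` and `ext` play the roles of VSubgraph and VExtension, and the function name `esu` is illustrative.

```python
def esu(adj, k):
    """Enumerate the vertex sets of all connected size-k subgraphs once each."""
    found = []

    def extend(sub, ext, v):
        if len(sub) == k:
            found.append(frozenset(sub))
            return
        ext = set(ext)
        while ext:
            w = ext.pop()
            # Vertices in sub or adjacent to sub are excluded from the new
            # candidates: only neighbors exclusive to w may be added.
            closed = sub | {u for s in sub for u in adj[s]}
            new = {u for u in adj[w] if u > v and u not in closed}
            extend(sub | {w}, ext | new, v)

    for v in adj:
        extend({v}, {u for u in adj[v] if u > v}, v)
    return found

# A triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
for subgraph in esu(adj, 3):
    print(sorted(subgraph))
```

For this network the three connected vertex sets of size 3 are {0, 1, 2}, {0, 2, 3} and {1, 2, 3}; the set {0, 1, 3} is not connected and is never generated.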
Randomized version of ESU (RAND-ESU) from a search tree
The RAND-ESU is an approximate algorithm,
which was designed to overcome the drawbacks of
the nonuniform edge sampling and the expensive
bias correction method of Kashtan et al. [37]. It
can efficiently enumerate all size-k subgraphs, and it
randomly omits some subgraphs during its execution
so that an unbiased subgraph sample is obtained. The general procedure for enumerating all size-k
subgraphs is as follows. First, it starts with a vertex v
from the input graph. Then, it extends v by adding
vertices to the VExtension set that have two properties:
(i) the label of these vertices must be greater than
the label of v, and (ii) they cannot be neighbors of
a vertex already in VSubgraph. This procedure results in an
ESU-tree with an important property that can be
used to efficiently sample random subgraphs uniformly. In addition, it is much faster because
no bias correction is needed [37].
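One way to realize the random omission is to descend into each node of the ESU-tree only with a fixed probability that depends on its depth. The sketch below assumes this scheme, with one probability per depth supplied by the caller in the hypothetical parameter `probs`; every size-k subgraph is then reached with the same probability, the product of the entries of `probs`. The network representation matches an undirected adjacency dictionary with integer labels.

```python
import random

def rand_esu(adj, k, probs, seed=None):
    """Sample connected size-k vertex sets by randomly pruning the ESU-tree."""
    rng = random.Random(seed)
    found = []

    def extend(sub, ext, v):
        if len(sub) == k:
            found.append(frozenset(sub))
            return
        ext = set(ext)
        while ext:
            w = ext.pop()
            # The child holding len(sub) + 1 vertices is visited with
            # probability probs[len(sub)]; otherwise its subtree is skipped.
            if rng.random() >= probs[len(sub)]:
                continue
            closed = sub | {u for s in sub for u in adj[s]}
            new = {u for u in adj[w] if u > v and u not in closed}
            extend(sub | {w}, ext | new, v)

    for v in adj:
        if rng.random() < probs[0]:
            extend({v}, {u for u in adj[v] if u > v}, v)
    return found

adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(len(rand_esu(adj, 3, [1.0, 1.0, 1.0])))          # 3, the full enumeration
print(len(rand_esu(adj, 3, [1.0, 1.0, 0.5], seed=7)))  # between 0 and 3
```

Setting all probabilities to 1 recovers the exact enumeration, and an estimate of a total count is obtained by dividing a sampled count by the product of the probabilities.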
Tree-filtering search
2. MAVisto
This algorithm was designed to find network motifs in
PPI networks only. First, the algorithm finds the repeated subgraphs in the network. This step is performed by finding repeated size k trees and then it
uses repeated size k trees to partition the network
graph. Subsequently, it performs graph join operation
for finding repeated size k subgraphs. Second, the algorithm verifies the frequency of repeated subgraphs
in the random networks. Finally, it determines the
uniqueness values of the repeated subgraphs using
their frequencies in the PPI network and in the
random networks. The details of this algorithm are
discussed in the subsection NeMoFinder [34].
MAVisto (Motif Analysis and Visualization tool) [40]
is network motif detection tool for biological networks. The tool was developed in 2005 for analyzing
and visualizing network motifs. MAVisto relies on an
editor called Gravisto [41] for graph visualization
and a toolkit for implementing graph algorithms.
MAVisto also employs an advanced force-directed
layout algorithm [42] for drawing networks [40].
The advanced force-directed layout algorithm is
designed for drawing aesthetically pleasing, two dimensional undirected graphs with straight edges. The
algorithm has the following characteristics. It distributes the vertices evenly. It constructs edge lengths
uniformly, and it reflects inherent symmetry.
The algorithm has the advantages of speed and simplicity [42].
MAVisto allows discovering motifs of a given size
specified by the number of nodes or the number of
edges. MAVisto uses all three different frequencies
F1 , F2 and F3 for identifying motifs. It also uses
Z-scores to measure the statistical significance of discovered motifs. MAVisto relies on the FPF algorithm
[36] discussed above for the motif finding [40].
MAVisto is written in Java. It allows Pajek-.Net[43] and GML [44] as inputs. Its output is detailed.
MAVisto contains several views for motif visualizations such as motif table, motif view, motif fingerprint and motif matches. Motif table provides
information on unique network motif label, motif’s
size, structural properties and so on. Motif view provides visualization of motif’s structure. Motif fingerprint is a diagram of motif frequency spectrum of the
target network. Motif matches view allows visual
examination of the occurrences of a motif within
the analyzed network and their matches. MAVisto
is fast for detecting motif sizes 3–5 in directed
networks by using a lookup table for isomorphic
checking [40].
MAVisto’s user friendliness and its variety of frequency thresholds make it unique, even if it does not
incorporate the fastest algorithm.
Compact topological motifs
Discovering topological motifs using compact notation is an exact counting method. This method was
designed to overcome the combinatorial explosion of
isomorphic subgraphs by using compact location lists,
which are location lists of the vertices of the motifs.
Instead of enumerating k elements out of n, it uses the
$\binom{n}{k}$ form, where k is the number of immediate
neighbors of a vertex out of n possible immediate
neighbors. Thus, it reduces the size of the output
significantly without losing information [38].
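As a rough illustration of the saving, the sketch below compares the number of explicit neighbor subsets that an exhaustive listing would produce around one vertex with the single compact record that can stand for all of them. The function names and the tuple layout of the compact record are assumptions made for this example and are not the notation used in [38].

```python
from math import comb


def explicit_occurrences(n_neighbors, k):
    """Number of ways to choose k of a vertex's n immediate neighbors."""
    return comb(n_neighbors, k)


def compact_entry(center, neighbors, k):
    """One record standing for all C(n, k) choices (hypothetical layout)."""
    return (center, tuple(sorted(neighbors)), k)


# A vertex with 20 neighbors and k = 3: many explicit subsets, one compact record.
print(explicit_occurrences(20, 3))
print(compact_entry("v", range(20), 3))
```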
NETWORK MOTIF DETECTION
TOOLS AND ALGORITHMS
1. mfinder
mfinder [30] is a command line tool, and it is the first
network motif detection tool developed in 2004. It
uses the edge sampling algorithm discussed above for
subgraphs sampling. Because the runtime of this
algorithm does not depend on the network size,
mfinder can explore large networks and detect
larger motifs that are unreachable by the exhaustive
enumeration algorithms. mfinder uses an F1 frequency threshold, meaning that the vertices and
edges of motifs are freely shared [30].
mfinder also implements several methods such as
switching, stubs and go-with-the-winners for generating random networks [39]. mfinder is not suitable
for finding large motifs because the cost of its sampling procedure grows exponentially with motif size [30]. However, its runtime is
independent of network size and it is able to detect
subgraphs that have very low concentration [30].
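The core idea of edge sampling can be sketched as follows: start from a random edge and keep adding a randomly chosen edge that leaves the current vertex set until k vertices are collected. This is a simplified illustration only. The function name and the dictionary-of-sets graph representation are assumptions, and the sketch omits the reweighting of samples by their sampling probability that an unbiased concentration estimate would require.

```python
import random


def sample_subgraph(adj, k, rng):
    """Return a connected set of k vertices grown edge by edge, or None.

    adj maps each vertex to the set of its neighbors (undirected view);
    k is assumed to be at least 2.
    """
    edges = [(u, v) for u in adj for v in adj[u] if u < v]
    u, v = rng.choice(edges)
    chosen = {u, v}
    while len(chosen) < k:
        # All edges that connect the current vertex set to a new vertex.
        frontier = [(a, b) for a in chosen for b in adj[a] if b not in chosen]
        if not frontier:
            return None  # the component is smaller than k
        _, new_vertex = rng.choice(frontier)
        chosen.add(new_vertex)
    return frozenset(chosen)


# Path graph 0-1-2-3-4: every sample of size 3 is three consecutive vertices.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(sorted(sample_subgraph(path, 3, random.Random(0))))
```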
3. NeMoFinder
NeMoFinder [34] is an algorithm developed in 2006
for detecting repeated and unique meso-scale network motifs in large PPI networks. It takes four
input parameters specified by the user: PPI network
G, frequency threshold F, uniqueness threshold S
and maximum network motif size K. Its output is a
set of repeated and unique motifs from size 2 to a
specified maximum size K [34].
NeMoFinder uses F1 frequency for finding motifs.
The algorithm contains the following steps [34].
(i) Discovering repeated subgraphs in the PPI
network
(a) Finding repeated size k trees
In this step, the algorithm first finds size 2
tree. Then, it extends to size 3 tree, size 4
tree and so on until it reaches size k tree.
Next, it counts the occurrences of each size
k tree in the network and determines if
it is a repeated tree and adds it to the set
Tk by using user-defined frequency threshold F.
(b) Using repeated size k trees to partition graph G
In this step, the algorithm uses size k trees in
Tk to divide the graph G into a set of graphs
such that each graph contains a size k tree in
Tk (2 ≤ k ≤ K).
(c) Performing graph join operation to find repeated size k
graphs
In this step, the algorithm generates size k
subgraphs for each tree t in Tk.
Then, it joins t with each of these subgraphs
to generate size k subgraphs with k edges and
add them to the candidate set Ck . Next, it
checks the occurrences of each subgraph in
Ck and determines if it is repeated subgraph
and adds it to the set of frequent subgraphs
by using user-defined threshold F. Next, the
repeated subgraphs are used to generate all
possible k vertex and k edge subgraphs.
Subsequently, the repeated subgraphs are
joined with the newly generated subgraphs to
obtain (k + 1) edge subgraphs, which are added
to the set of frequent subgraphs. This process
is repeated until no repeated subgraph can be
found or a complete graph of k(k − 1)/2
edges is reached.
Finally, the algorithm outputs a set of the
repeated trees and subgraphs from size 2 to
size K.
(ii) Verifying the frequency of repeated subgraphs in the random networks
In this step, the algorithm employs Markov
chain algorithm for generating random networks, which have the same single vertex characteristics as the PPI network. Then, it verifies
the frequency of the frequent subgraphs in each
random network.
(iii) Verifying the uniqueness values of repeated
subgraphs
Lastly, the algorithm calculates the uniqueness
value for each frequent subgraph using its frequencies in the PPI network and the random
networks.
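Steps (ii) and (iii) can be illustrated with a small sketch. It assumes that the uniqueness value of a subgraph is the fraction of random networks in which the subgraph is less frequent than in the PPI network. That definition, the function names and the sample frequencies are assumptions made for illustration and should be checked against [34].

```python
def uniqueness(freq_real, freqs_random):
    """Fraction of random networks in which the subgraph is rarer than in the real one."""
    wins = sum(1 for f in freqs_random if freq_real > f)
    return wins / len(freqs_random)


def is_repeated_and_unique(freq_real, freqs_random, F, S):
    """Apply the frequency threshold F and the uniqueness threshold S."""
    return freq_real >= F and uniqueness(freq_real, freqs_random) >= S


# Made-up frequencies of one subgraph in four random networks.
print(uniqueness(10, [3, 12, 5, 9]))
print(is_repeated_and_unique(10, [3, 12, 5, 9], F=5, S=0.7))
```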
NeMoFinder is scalable because it partitions the
network into a set of graphs. Counting the frequency of a size k subgraph in the network is thereby reduced to finding the
number of graphs that contain the subgraph, a measure that
is downward closed. Thus, this algorithm can analyze
scale-free networks. NeMoFinder also utilizes the
idea in SPIN [45] for searching repeated trees and
extending them to subgraphs for reducing the computational complexity [34].
SPIN (SPanning tree-based maximal graph
mINing) is a spanning tree-based frequent subgraph
mining algorithm that mines only maximal frequent
subgraphs from a graph database [45]. Maximal frequent subgraphs are subgraphs that are not contained
within any other frequent subgraphs [45]. Frequent
subgraph mining extracts frequent subgraphs that
have frequency above a specified threshold in a
given dataset [46]. Because SPIN only mines maximal frequent subgraphs, it can reduce the size of
the output as well as the computational time significantly [45].
NeMoFinder differs from SPIN in that it examines occurrences of a subgraph in a network while
SPIN only verifies if a subgraph occurs in a graph.
Moreover, NeMoFinder discovers repeated unlabeled subgraphs from a single graph while SPIN employs equivalence classes for finding maximal labeled
frequent subgraphs in a set of graphs [34].
The algorithm also implements the Graph Cousins
technique. This technique reduces the computational time for generating candidate subgraphs and
frequency counting for finding repeated subgraphs.
The traditional way of generating a subgraph candidate from a tree is by adding a new edge to that
tree. The resulting graph is then checked for existence in
the candidate set. However, the candidate set can
become very large and checking a graph for its existence in the candidate set involves graph isomorphism checking. Thus, the Graph Cousins technique is
designed to overcome this complexity and reduce
the computational time. There are three types of
graph cousins between graphs g and h. Type I or
Direct Cousin has h isomorphic to a subgraph g′,
which has the same number of vertices and edges
as g but g ≠ g′. Type II or Twin Cousin has h isomorphic to subgraph g. Type III or Distant Cousin
has h as a disconnected subgraph [34].
NeMoFinder is written in C++, and it can
find network motifs up to size 12 in the PPI
network [34].
4. FANMOD
FANMOD [33] is a tool developed in 2006 for fast
network motif detection. It implements the novel RAND-ESU [37] algorithm discussed above for
enumerating and sampling subgraphs [33]. This
algorithm uses F1 frequency for finding motifs.
FANMOD is written in C++ and it can detect
network motifs up to size 8. The tool can also detect
motifs in colored networks. FANMOD implements
the canonical graph labeling algorithm called
NAUTY [47] for grouping subgraphs into isomorphic subgraph classes [33].
NAUTY (No AUTomorphisms, Yes?) is a software package containing several programs written in
C language to implement McKay’s algorithm for
determining the automorphism group of a vertex-colored graph and for computing the canonical
labeling, which is used for graph isomorphism testing.
Two graphs are isomorphic if they have the same
canonical labeling [48].
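The role of a canonical labeling can be shown with a brute-force sketch for very small undirected graphs: try every relabeling of the vertices and keep the lexicographically smallest edge list. This is not McKay's algorithm, which avoids trying all permutations. The function names are hypothetical, and the approach is only practical for a handful of vertices.

```python
from itertools import permutations


def canonical_form(n, edges):
    """Smallest sorted edge list over all relabelings of vertices 0..n-1."""
    best = None
    for perm in permutations(range(n)):
        relabeled = tuple(sorted({tuple(sorted((perm[u], perm[v]))) for u, v in edges}))
        if best is None or relabeled < best:
            best = relabeled
    return best


def isomorphic(n, edges_a, edges_b):
    """Two graphs on n vertices are isomorphic iff their canonical forms agree."""
    return canonical_form(n, edges_a) == canonical_form(n, edges_b)


path_a = [(0, 1), (1, 2), (2, 3)]   # path 0-1-2-3
path_b = [(2, 0), (0, 3), (3, 1)]   # the same path with shuffled labels
star = [(0, 1), (0, 2), (0, 3)]     # same edge count, different shape
print(isomorphic(4, path_a, path_b), isomorphic(4, path_a, star))
```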
FANMOD calculates the frequency of subgraph
classes in a number of random graphs specified by the
user. These random graphs are created from the original network by switching edges between vertices.
However, they preserve the degree sequence of
the original network. FANMOD allows selecting
different switching schemes for generating random
graphs. The tool is much faster than mfinder and
MAVisto [33].
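A minimal sketch of degree-preserving edge switching for a directed simple graph is given below. It picks two arcs a→b and c→d and rewires them to a→d and c→b, rejecting any switch that would create a self-loop or a duplicate arc. The function names, the attempt limit and the example network are assumptions made for illustration and do not reproduce any specific switching scheme offered by FANMOD.

```python
import random
from collections import Counter


def degree_sequence(edges):
    """Return (out-degree, in-degree) counters of a directed edge list."""
    return Counter(u for u, _ in edges), Counter(v for _, v in edges)


def switch_edges(edges, n_switches, rng):
    """Randomize a directed simple graph (at least two arcs) by edge switching."""
    edges = list(edges)
    present = set(edges)
    done = attempts = 0
    while done < n_switches and attempts < 100 * n_switches:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if a == d or c == b:
            continue  # would create a self-loop
        if (a, d) in present or (c, b) in present:
            continue  # would create a duplicate arc
        present.difference_update([(a, b), (c, d)])
        present.update([(a, d), (c, b)])
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges


original = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 3), (2, 5)]
shuffled = switch_edges(original, 20, random.Random(7))
print(degree_sequence(shuffled) == degree_sequence(original))
```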
FANMOD only accepts an edge-list text file.
However, its output options are far better. It is
able to generate HTML files containing basic visualizations of the network motifs. It also allows exporting the results to different formats for further
analysis [33].
FANMOD’s relative speed, ease of use and rich
customization make it one of the most competitive
tools available today. However, its memory usage
increases remarkably when the subgraph size and
network size increase [49].
5. Grochow-Kellis
Grochow-Kellis [29] is a network motif detection
algorithm developed in 2007 for detecting large network motifs based on a novel symmetry-breaking
technique. Because the algorithms that use network-centric approach can only detect network
motifs up to size 8, this algorithm takes a new approach called motif-centric for discovering larger
network motifs. The algorithm has an exponential
speedup because the symmetry-breaking technique
eliminates repeated isomorphism checking. As a
result, the algorithm can detect network motifs up
to size 15. Furthermore, it can be applied to any type
of network. The algorithm uses F1 frequency for
finding motifs [29].
The algorithm has five distinguishable features.
First, instead of enumerating subgraphs, which increases the complexity, the algorithm exhaustively
looks for the instances of a single query graph in
the network by using McKay’s geng and directg [47]
programs. These programs are parts of the NAUTY
[47] package. The geng program generates small
graphs while the directg program generates small digraphs with given underlying graph [50]. Second,
the algorithm maps the query graph to the network
in all possible ways for checking isomorphic subgraphs. Third, it uses a novel technique, subgraph
symmetries, which allows finding an instance of
a single query graph only once. This technique
speeds up the algorithm by exponential factor. It
also allows writing discovered instances to the disk,
which improves memory usage. Fourth, the algorithm has a better isomorphic subgraph checking
than other motif finding algorithms because it considers the degree of each node as well as the degrees
of each node’s neighbors. Finally, the algorithm utilizes the subgraph hashing technique for hashing the
graphs using their degree sequences. This technique
improves the isomorphic subgraph checking process
significantly [29].
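The benefit of exploiting subgraph symmetries can be seen by counting how often a naive mapping procedure rediscovers the same instance. The sketch below maps a query graph into a network in all possible ways and shows that each instance is found once per automorphism of the query. It does not implement the symmetry-breaking conditions of [29]; the function name and the example graphs are assumptions.

```python
from itertools import permutations


def count_mappings(query_size, query_edges, network_nodes, network_edges):
    """Count injective vertex maps that send every query edge to a network edge."""
    net = {frozenset(e) for e in network_edges}
    count = 0
    for image in permutations(network_nodes, query_size):
        if all(frozenset((image[u], image[v])) in net for u, v in query_edges):
            count += 1
    return count


triangle = [(0, 1), (1, 2), (0, 2)]
k4_edges = [(a, b) for a in range(4) for b in range(a + 1, 4)]

mappings = count_mappings(3, triangle, range(4), k4_edges)
automorphisms = count_mappings(3, triangle, range(3), triangle)
# Every triangle of the network is found once per automorphism of the query.
print(mappings, automorphisms, mappings // automorphisms)
```

With symmetry breaking, only one mapping per instance would be generated, so the saving grows with the size of the automorphism group of the query graph.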
The algorithm has five noticeable advantages.
First, it can find larger motifs up to size 15.
Second, it can test a particular query subgraph for significance. Third, it is able to group all discovered instances of a given subgraph into clusters
so that larger structures can be examined from the
formation of these clusters. Fourth, it can save time
and space. Finally, the algorithm can be easily parallelized which is advantageous for future improvement. The algorithm was also implemented in
Java [29].
6. Kavosh
Kavosh [49] is a network motif finding algorithm
developed in 2009. It is based on counting all size
k subgraphs of the target network. The goal of this
algorithm is to find network motif of any given size
with less memory usage and lower CPU time.
Kavosh can find network motifs greater than eight
nodes. It uses the F1 frequency threshold. Kavosh is
written in C++. The algorithm contains four major
steps as follows [49].
(i) Enumeration
This step finds all subgraphs of a given size in the
target network. The algorithm implements an efficient method for enumerating size k subgraphs as
follows. To count all size k subgraphs of a given
graph with vertices that are numerically labeled,
the algorithm finds all subgraphs that include a particular vertex. Then, it removes that vertex from the
network and repeats the process consecutively for
successive vertices. To count size k subgraphs that
include a particular vertex, the algorithm builds
trees that have special properties and restrictions as
follows. They have maximum depth of k and they
are rooted at this vertex. The children of each vertex
are incoming and outgoing adjacent vertices. Each
vertex appears only once so that no duplicate vertices
are allowed. The children of a tree must have numerical labels larger than the label of the root of that
tree. These properties result in counting a subgraph
only once [49].
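The labeling restriction can be sketched in a few lines: each connected vertex set is generated only from its smallest-labeled vertex, which has the same effect as removing a vertex once all subgraphs containing it have been found. This is a simplified illustration and not the tree construction of Kavosh. The function name is hypothetical, and duplicate work inside one root is avoided with a plain set of visited vertex sets.

```python
def enumerate_connected_sets(adj, k):
    """Return all connected vertex sets of size k, each reported exactly once.

    adj maps each numerically labeled vertex to its neighbors, ignoring
    edge direction (incoming and outgoing neighbors are both listed).
    """
    found = set()
    seen = set()

    def extend(current, root):
        if current in seen:
            return
        seen.add(current)
        if len(current) == k:
            found.add(current)
            return
        for v in current:
            for w in adj[v]:
                # Only vertices with a label larger than the root may be added.
                if w > root and w not in current:
                    extend(current | {w}, root)

    for root in sorted(adj):
        extend(frozenset([root]), root)
    return found


path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(sorted(sorted(s) for s in enumerate_connected_sets(path, 3)))
```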
Kavosh also implements the revolving door
ordering algorithm [49, 51], which is the minimal
change order in which two consecutive objects
differ by exactly two positions [52]. This ordering
saves time because the calculation performed on each
object can reuse the work done for its predecessor, from which it differs only slightly [53].
The revolving door algorithm is known to be the
fastest algorithm for generating combinations of vertices when enumerating subgraphs [49].
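The revolving door order can be generated with a short recursion: list the k-subsets of {1, ..., n−1} first, then the (k−1)-subsets of {1, ..., n−1} in reverse order with n added to each. The sketch below is a plain recursive version for illustration, not the optimized loopless procedure used in practice, and the function name is an assumption.

```python
def revolving_door(n, k):
    """List the k-subsets of {1..n} so that neighbors differ by one swapped element.

    Assumes 0 <= k <= n.
    """
    if k == 0:
        return [()]
    if k == n:
        return [tuple(range(1, n + 1))]
    without_n = revolving_door(n - 1, k)
    with_n = [c + (n,) for c in reversed(revolving_door(n - 1, k - 1))]
    return without_n + with_n


for subset in revolving_door(4, 2):
    print(subset)
```

Each subset is obtained from its predecessor by removing one element and adding another, so any quantity computed for one combination needs only a small update for the next.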
(ii) Classification
This step classifies discovered subgraphs into isomorphic classes. The algorithm employs NAUTY
[47] for finding isomorphic subgraphs. It inputs the
adjacency matrix of each discovered subgraph in the
previous step to NAUTY for generating canonical
labeling as a class identifier of that subgraph [49].
(iii) Random graph generation
This step generates random graphs such that it
preserves the degree sequence of the target network.
The algorithm implements the switching method
that is similar to Milo’s random model [35, 54] for
generating random graphs. This switching method is
described in the section ‘Random network generations’ [49].
(iv) Motif identification
This step identifies motifs from discovered subgraphs based on statistical parameters such as frequency, Z-score and P-value [49].
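The statistical parameters named in this step can be computed as in the following sketch. It assumes that the Z-score uses the mean and the population standard deviation of the subgraph frequency over the random networks and that the P-value is the fraction of random networks in which the subgraph is at least as frequent as in the target network. The function names and the sample frequencies are made up for illustration.

```python
import math
from statistics import mean, pstdev


def z_score(freq_real, freqs_random):
    """Distance of the real frequency from the random mean, in standard deviations."""
    mu = mean(freqs_random)
    sigma = pstdev(freqs_random)
    if sigma == 0:
        return 0.0 if freq_real == mu else math.copysign(math.inf, freq_real - mu)
    return (freq_real - mu) / sigma


def p_value(freq_real, freqs_random):
    """Fraction of random networks with a frequency at least as high as the real one."""
    return sum(1 for f in freqs_random if f >= freq_real) / len(freqs_random)


random_freqs = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up counts in eight random networks
print(z_score(11, random_freqs), p_value(7, random_freqs))
```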
7. MODA
MODA (network MOtif Discovery Algorithm) [55]
is a network motif detection algorithm developed in
2009. It was designed to target large network motifs
(greater than size 8) efficiently. MODA is written in
C#. The algorithm uses F1 frequency for finding
network motifs. It implements the pattern growth
approach [17], which reduces the cost of isomorphic
subgraphs checking. This approach starts with size k
trees, and they are extended until a complete graph
with k nodes is reached. The algorithm exploits the
use of the previous query graph for the current query
graph if it is a supergraph of the previous one so that
the information can be re-used for calculating the
frequency of a particular query graph. This technique
reduces the computational time [55].
The algorithm also utilizes expansion trees that
extend minimal query graphs by adding edges to
them until a complete graph is obtained. The expansion tree Tk has the following distinguishable characteristics. Each node except for the root is a query
graph of size k. These query graphs become more
complete by traversing down the tree. The root is k,
which is the size of a query graph. Each node at level
i has a graph of size k and contains (k − 2 + i)
edges. The number of nodes at the first level
equals the number of nonisomorphic trees
of size k. Each node except for the root is a graph
that is nonisomorphic to all other graphs in Tk. Each
node except for the root is a subgraph of its child.
There is only one leaf node at level (k² − 3k + 4)/2, which is
a complete graph with k nodes and k(k − 1)/2 edges.
Each node also contains an adjacency matrix corresponding to the graph. The expansion tree is generated by following a particular procedure, and it is
created only once. The tree is a static data structure
so that it can be stored and retrieved whenever the
algorithm needs. The expansion tree Tk is also used
for calculating the frequency of the subgraphs. The
algorithm also implements the mapping module for
calculating subgraph frequency. The mapping module
allows storing calculated mappings in the memory for
later use. This mapping module implements the
symmetry-breaking technique [29] for counting
each subgraph only once. In addition, the algorithm implements sampling throughout the network inside
the mapping module to speed up the mapping process. It also implements the enumeration module,
which speeds up the subgraph frequency-calculating
process [55].
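The level structure of the expansion tree can be checked with simple arithmetic. A query graph at level i has k − 2 + i edges, so the trees at level 1 have k − 1 edges and the complete graph with k(k − 1)/2 edges sits at level (k² − 3k + 4)/2. The helper names below are hypothetical.

```python
def edges_at_level(k, i):
    """Number of edges of a size-k query graph at level i of the expansion tree."""
    return k - 2 + i


def leaf_level(k):
    """Level of the single leaf, the complete graph on k nodes."""
    return (k * k - 3 * k + 4) // 2


for k in range(3, 7):
    level = leaf_level(k)
    print(k, level, edges_at_level(k, level))
```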
8. G-Tries
G-Trie (Graph reTRIEval) [56] is a specialized data
structure developed in 2010. It is built on a
prefix tree, which allows graphs to share common topology. This data structure allows building a multiway
tree with the property that the descendants of a node
share a common substructure. It also allows storing
subgraphs, computing the frequency of subgraphs
efficiently, as well as efficient searching for finding
network motifs. Because of its sharing common
structure, G-Trie uses F1 frequency for finding
motifs [56].
G-Trie has the following characteristics. Each
node of the tree represents a single graph vertex
and its corresponding edges to predecessors. Nodes
that have common predecessors share common substructures. Each node is a subgraph of its children.
A graph becomes more complete by traversing
down the tree. Each vertex is assigned an index,
and the index is increased when traversing down
the tree. Each graph is represented by an adjacency
matrix. The tree is built by following a particular
procedure. It starts with the root and one subgraph
is inserted to the tree at a time. The canonical labeling is used to ensure that isomorphic graphs produce
the same adjacency matrix for the same G-Trie. It
also guarantees that the vertices that have
the largest number of edges appear first in
the matrix. Because G-Trie allows sharing common
substructures, the more common substructures there are, the
less memory is needed and the smaller the tree becomes. When there are no or few common substructures, the tree would require a
substantial amount of storage space as the motif size
and the network size increase. However, once the
tree is constructed, searching and retrieval can be performed more efficiently. This data structure eases the
subgraph census and isomorphic checking. However,
the algorithm still uses NAUTY tool [47] for
isomorphic checking. To avoid over counting
subgraphs, a symmetry-breaking technique, which
is similar to the technique in Grochow-Kellis [29],
is implemented. G-Trie was implemented in
C++ [56].
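The sharing of common substructures can be sketched with a small prefix tree over adjacency-matrix rows. Each tree node below stores the connections of one vertex to its predecessors, so two graphs that agree on their first vertices share the corresponding path. The class and function names are hypothetical, the sketch handles undirected graphs only, and it assumes the matrices are already in a canonical vertex order.

```python
class GTrieNode:
    """One vertex of a stored graph together with its edges to earlier vertices."""

    def __init__(self):
        self.children = {}
        self.is_graph = False


def insert(root, adjacency):
    """Insert an undirected graph given as a square 0/1 adjacency matrix."""
    node = root
    for i, row in enumerate(adjacency):
        key = tuple(row[:i])  # connections of vertex i to vertices 0..i-1
        node = node.children.setdefault(key, GTrieNode())
    node.is_graph = True


def count_nodes(node):
    """Total number of tree nodes, including the root."""
    return 1 + sum(count_nodes(child) for child in node.children.values())


path3 = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]       # path with its center listed first
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]

root = GTrieNode()
insert(root, path3)
insert(root, triangle)
# The two graphs share their first two vertices and differ only at the third.
print(count_nodes(root))
```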
gtrieScanner [57] developed in 2012 is the only
tool that implemented G-Trie data structure. It is a
command line tool, and it only allows finding one
network motif size at a time. gtrieScanner can take
the input network graph in text format and outputs
the result in text or html format. The tool is written
in C++. Its current release version 0.1 is only for
Linux system. gtrieScanner is a limited preliminary
tool, which is still under active development [57].
9. NetMODE
NetMODE (Network MOtif DEtection) [58] is a
network motif detection software package developed
in 2012. This is the first software package that does
not depend on NAUTY [47] tool for isomorphic
subgraphs checking. Although NAUTY is one of
the fastest tools for isomorphic subgraphs checking,
it is still too costly to call NAUTY millions or
billions of times. However, the algorithm has to pay a
cost for this independence by storing k-node subgraph data in the memory for k ≤ 5 in its pretreatment phase. NetMODE uses a novel approach when
k = 6. However, it can only detect network motifs
up to size 6. NetMODE was developed based on
Kavosh [49] but it is not a variant of Kavosh, as it
has its own features. NetMODE uses F1 frequency
for finding motifs. The distinguishable features of
NetMODE are as follows. NetMODE stores all canonical labels in the memory using brute-force
search so that it does not have to call NAUTY
[47]. However, this practice only works for
3 ≤ k ≤ 5. When k = 6, the algorithm uses a different
approach, which involves the Reconstruction
Conjecture in graph theory. The algorithm also contains two stages for finding size 6 motifs: (i) process
the input network, and (ii) process the comparison
graphs. When k ≥ 7, this approach is impractical [58].
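The idea of storing all canonical labels in memory can be illustrated for directed graphs on three vertices, where only 2^6 = 64 adjacency patterns exist. The sketch below fills a table by brute force once, after which classifying a subgraph is a single lookup. The bit encoding, the constant names and the function names are assumptions made for this example and are not NetMODE's internal format.

```python
from itertools import permutations

K = 3
# The six possible arcs between three vertices, one bit each.
PAIRS = [(i, j) for i in range(K) for j in range(K) if i != j]


def arcs_to_mask(arcs):
    """Encode a set of arcs as a 6-bit integer."""
    mask = 0
    for arc in arcs:
        mask |= 1 << PAIRS.index(arc)
    return mask


def canonical_mask(mask):
    """Smallest encoding over all relabelings of the three vertices."""
    arcs = [PAIRS[b] for b in range(len(PAIRS)) if (mask >> b) & 1]
    return min(arcs_to_mask([(p[u], p[v]) for u, v in arcs])
               for p in permutations(range(K)))


# Brute-force table built once: adjacency pattern -> canonical label.
TABLE = [canonical_mask(m) for m in range(1 << len(PAIRS))]


def classify(arcs):
    """Look up the isomorphism class of a 3-vertex directed graph."""
    return TABLE[arcs_to_mask(arcs)]


print(len(set(TABLE)))  # number of isomorphism classes, disconnected ones included
```

The table size grows very quickly with k, which is consistent with the restriction to small subgraph sizes described above.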
NetMODE contains a variety of methods for sampling similar graphs. An appropriate method can be
chosen based on the input network. It has several
variants of the switching method that is similar to
the method in [35]. This switching method is
described in the section ‘Random network generations’. The switching method in NetMODE is a
mixture of advantageous features from both Kavosh
[49] and FANMOD [33]. NetMODE samples
random graphs using nonuniform distribution. It
has an alternative method, which implements the
local constant mode, for sampling similar graphs,
but this method is slower than the switching methods. For subgraph enumeration, NetMODE employs the subgraph iteration procedure from
Kavosh [49] but without using the revolving door
algorithm [58].
NetMODE also has other features as follows. It
contains a verbose mode, which allows the users to
analyze the isomorphic subgraphs returned in the
input network and the comparison graphs. Its
stdin/stdout can be interfaced with other packages
such as R for other analyses such as drawing motifs.
It also contains a burn-in feature in which some comparison graphs generated by the switching method
are discarded. This feature leads to a better collection
of comparison graphs. NetMODE also contains high
performance computing feature for comparison
graphs. Although this feature is a basic coarse-grained
parallelism, it allows NetMODE to achieve near
linear speedup [58].
10. Acc-MOTIF
Acc-MOTIF (accelerated Motif) [59] is a network
motif detection software developed in 2012. It implements combinatorial techniques for accelerating
the motif-finding process. Acc-MOTIF contains a
number of algorithms for exactly counting isomorphic
subgraph motifs of size 3, 4 and 5 independently.
Acc-MOTIF contains two main techniques. First,
instead of listing induced subgraphs, it calculates
the number of isomorphic patterns. Second, it assigns
an integer variable to each isomorphic pattern and
increments it directly instead of checking for isomorphic subgraphs.
The algorithms have complexities of O(m√m) for motif size 3 and O(m²) for
motif size 4, where m is the number of edges in the
network graph [59].
Acc-MOTIF uses F2 frequency for finding motifs.
Its speed depends on the size of the network. Thus, it
may not be viable for large target networks.
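Counting patterns without listing them can be illustrated on undirected size-3 subgraphs. The sketch below obtains the number of triangles from common-neighbor counts and the number of open two-edge paths from vertex degrees, so no individual subgraph is ever enumerated. It is a simplified undirected example in the spirit of the combinatorial techniques described above, not one of Acc-MOTIF's actual algorithms, and the function name is hypothetical.

```python
from math import comb


def count_size3(adj):
    """Return (triangles, open_paths) of an undirected simple graph.

    adj maps each vertex to the set of its neighbors.
    """
    # Summing common neighbors over all edges counts every triangle three times.
    triple = sum(len(adj[u] & adj[v]) for u in adj for v in adj[u] if u < v)
    triangles = triple // 3
    # Each vertex of degree d is the middle of C(d, 2) two-edge paths.
    two_paths = sum(comb(len(neighbors), 2) for neighbors in adj.values())
    # A triangle contains three such paths; the remaining ones are open.
    open_paths = two_paths - 3 * triangles
    return triangles, open_paths


star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(count_size3(star))
```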
11. QuateXelero
QuateXelero is a network motif detection algorithm
developed in 2013. It is written in C++.
QuateXelero uses F1 frequency for finding motifs.
The algorithm is based on FANMOD [33].
However, it intends to reduce the number of calls
to NAUTY [47]. Thus, QuateXelero minimizes this
cost by implementing a quaternary tree data structure
in the ESU algorithm in FANMOD for faster motif
detection. An example of a quaternary tree can be
found in Figure 3.

Figure 3: An example of a quaternary tree of depth 3 that has a root and three internal nodes. One internal node has four children. The search for string ‘321’ starts at the root and visits children 3 and 2 in the path. The search is completed by adding a new leaf, which is number 1 (Courtesy of Khakabimamaghani et al. [5]).

A quaternary tree has the following properties. Each internal node can have
at most four children and at most five neighbors,
of which one is its parent and the other four are its
children. An edge connecting a parent to a child
can be labeled with a number, character or
symbol. Once the tree is constructed and labeled, it
can be used to search for a given string containing
the same set of symbols that are used for labeling the
tree. The search for a particular string, for example
string ‘321’ in Figure 3, starts at the root and propagates down the tree by visiting its children. First, the
first symbol is read from the input string, which is 3
in this case. The current pointer is set to the root and it
moves to the child whose connecting edge label
matches the symbol read from the input
string. Then, the second symbol is read from the
input string, and the process is repeated at the current
node. If a symbol is not found in a child of the current node, a new child is added to the current node
for that symbol. The current pointer is moved to this
new child. The search goes on until the search string
is exhausted. Figure 3 depicts this process with the
path containing dotted edges for the search string
‘321’. This quaternary tree allows partial classification
for enumerated subgraphs and reduces the need of
calling NAUTY [5, 47].
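The search procedure described above can be sketched as a small tree with at most four labeled children per node. The class name, the function name and the four-symbol alphabet "0123" are assumptions made for illustration; the sketch shows only the search-or-insert walk and not how QuateXelero derives the symbol string from a subgraph.

```python
class QNode:
    """Node of a quaternary tree; children are keyed by the edge label."""

    def __init__(self):
        self.children = {}


def search_or_insert(root, symbols, alphabet="0123"):
    """Follow the symbols from the root, creating missing children on the way.

    Returns the node reached and the number of newly created nodes.
    """
    node = root
    created = 0
    for symbol in symbols:
        if symbol not in alphabet:
            raise ValueError(f"unexpected symbol {symbol!r}")
        if symbol not in node.children:
            node.children[symbol] = QNode()
            created += 1
        node = node.children[symbol]
    return node, created


root = QNode()
_, first = search_or_insert(root, "321")    # every node on the path is new
_, second = search_or_insert(root, "321")   # the path already exists
_, third = search_or_insert(root, "320")    # only the last node is new
print(first, second, third)
```

A walk that ends at an existing leaf identifies a subgraph pattern that has been seen before, which is the situation in which a call to NAUTY can be avoided.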
QuateXelero contains three main steps: enumeration, classification and motif detection. In the enumeration step, QuateXelero uses quaternary tree for
enumerating subgraphs by building and extending
the tree. In the classification step, the algorithm
checks for isomorphic subgraphs by exploiting the
quaternary tree data structure, which reduces the
number of calls to NAUTY. Thus, the computational time is reduced drastically. In the motif detection step, the algorithm generates random networks
using the same method as used in G-Trie [56]. Then,
it calculates Z-scores for determining the significance
of the motifs. Like G-Tries, QuateXelero consumes
a considerable amount of memory in trading for its
speedup [5].
A summary of all 11 tools and algorithms discussed
above can be found in Table 2.
RESULTS AND DISCUSSIONS
The network datasets used for evaluating the tools
and algorithms are varied. They include biological networks, dictionaries, electronic networks, food webs,
power grid networks, social networks, WWW networks and others. Some tools and algorithms use a
single dataset. Others use a wide variety of dataset
types. All tools and algorithms were evaluated on at
least one biological network. Some tools and algorithms were evaluated on the same set of datasets for
comparison purposes. A summary of these datasets
can be found in Table 3.
mfinder
mfinder was evaluated on five different networks:
the transcription network of E. coli (423 nodes and
519 edges), the transcription network of S. cerevisiae
yeast (685 nodes, 1052 edges), the Caenorhabditis
elegans (C. elegans) neural network (280 nodes, 2170
edges), the WWW network (325 000 nodes,
1 460 000 edges), and the food web of birds, fishes
and invertebrates (83 nodes, 391 edges) [30].
The E. coli and the S. cerevisiae yeast networks are
available on the Uri Alon’s Complex Networks [30,
60]. The WWW network is the network of hyperlinks between web pages in the nd.edu domain [30].
The performance of mfinder was compared with
the exhaustive enumeration method [3] on a WWW
network [61] for motif size 3 and on a transcriptional
regulation network of E. coli [1] for motif sizes 3–5.
The results show mfinder detected all network
motifs found by the exhaustive enumeration
method. Besides, the evaluation of mfinder on the
neural network of C. elegans [62] shows it can detect
larger motif sizes 5 and 6 that are unreachable by the
exhaustive enumeration algorithm. In general, it is
able to find larger network motifs in larger networks,
with the runtimes not depending on the network
size [30]. Besides, the sampling method of mfinder
is significantly faster than the exhaustive enumeration
method. It is able to estimate the subgraph concentration at very high accuracy even for subgraphs that
have low concentration. The evaluation results also
show mfinder can detect motifs up to size 7 [30].
The experimental results show mfinder is able to
detect most common network motifs including
cascade-type motif (positive cascade), hub-type
motif (single-input module), bipartite-type motifs
(dense overlapping regulons, bi-fan) and clique-type motifs (feed-forward loop, biparallel). In addition to common network motifs, mfinder is able to
detect several other network motifs. Although the
tool does not specify the forms of network motifs
it is able to detect, it has the capability to discover
network motifs from two to eight nodes.
mfinder is publicly available. The tool can be
run on Windows 2000, Windows XP and Linux.
However, it is no longer supported.
MAVisto
The FPF algorithm [36] in MAVisto was tested only
on a transcription network of S. cerevisiae yeast. This
network comes from the transcriptional regulatory
networks in S. cerevisiae, and it contains 62 nodes
and 93 edges [36].
The performance of MAVisto was not compared
with other tools and algorithms. The evaluation results show it can detect network motifs up to size 7.
Besides, different frequency concepts produce different results. For instance, if the analysis for a particular
network searches for all possible occurrences of a pattern then the frequency concept F1 would produce
better results [36].
Thus, depending on a particular analysis, an appropriate frequency concept should be chosen.
Although the experimental results do not explicitly identify the forms of discovered network motifs
and the tool does not specify the types of network
motifs it is able to detect, MAVisto has the capability
to detect network motifs from two to nine nodes.
MAVisto is publicly available but it is no longer
supported.
NeMoFinder
NeMoFinder was evaluated on two real life datasets.
The Uetz [63] dataset consists of 957 PPIs and 1004
proteins of S. cerevisiae. The MIPS CYGD dataset
[64] has 10 199 PPIs and 4341 proteins after eliminating redundancy and orphan links from the whole
genome PPI network of S. cerevisiae [34].
7
8
12
8
15
>8
>8
Unspecified 9
6
5
Unspecified 12
mfinder
MAVisto
NeMoFinder
FANMOD
Grochow-Kellis
Kavosh
MODA
G-Tries
NetMODE
Acc-Motif
QuateXelero
5
6
9
12
15
8
13
9
8
Motif’s
max. size
Algorithm /
tool name
Output
format
Unspecified
Unspecified
Adjacency list
Unspecified
Unspecified
Unspecified
Unspecified
Text
Unspecified
Unspecified
Unspecified
Unspecified
Adjacency list
Text
(Text format)
Unspecified
Adjacency list
Text, HTML
(Text format)
Unspecified
Pajek-.Net, GML Graphics
Adjacency list
Text
(Text format)
Motif’s max. Input format
size in acceptable
running time
Edge Sampling
Method
Algorithm
Software package
Software package
Data structure
Algorithm
Networkcentric
Only PPI
network
2013
Yes
Yes
N/A
2012
2012
N/A
N/A
Yes
N/A
N/A
N/A
No
No
Maintenance?
2010
2009
2009
Tested on biolo- Networkcentric
gical, social,
and electronic
networks
Pattern growth, Tested on biolo- Motif-centric
sampling,
gical network
symmetryonly
breaking, expansion tree
Pattern growth Various network Networktree,
types
centric
Symmetrybreaking
Exhaustive, pat- Tested on social Networkcentric
and biological
tern growth
networks
tree, various
switching
methods,
Parallelism
Exhaustive,
Various network Motif-centric
combinatorial
types
techniques
Exhaustive, pat- Various network Networktern growth
types
centric
quaternary
tree
2006
2006
2005
2007
Any network
Various network Networktypes
centric
Networkcentric
Biological network only
2004
Network /
Published
motif centric year
Various network Networktypes
centric
Networkspecific
Motif-centric
Frequent pattern finder
(FPF) with
pattern tree
Algorithm
Graph Cousins,
pattern
growth
Best overall, tool, Randomized
visualization
enumeration,
sampling, pattern growth
tree
Algorithm
Symmetrybreaking,
exhaustive,
sampling
Algorithm
Exhaustive, pattern growth
tree
Tool, visualization
and analysis
Command-line
tool
Userfriendliness
Table 2: A summary of 11 network motif finding tools and algorithms
Yes
Yes
Yes
No
Yes
Yes
No
Yes
No
Yes
Yes
Available
online?
Reference
34
29
56
55
59
Windows, Linux 5
All platforms
Windows 32-bit 58
N/A
N/A
Windows 32-bit, 49
Linux
N/A
Windows 32-bit, 33
Mac, Linux
N/A
Windows 2000, 30
Windows XP,
Linux OS
All platforms
40
Platform
QuateXelero [5]
Acc-MOTIF [59]
NetMODE [58]
MODA [55]
G-Tries [56]
Kavosh [49]
Grochow-Kellis [29]
FANMOD [33]
NeMoFinder [34]
PPI network of S. cerevisiae yeast
PPI network of S. cerevisiae yeast
Transcription network of E. coli
Transcription network of S. cerevisiae yeast
Neural network of C. elegans
Food web of the Ythan estuary
PPI network of S. cerevisiae
Transcription network of S. cerevisiae
Metabolic pathway of E. coli network
Transcription network of S. cerevisiae yeast
Real social network
Electronic network
Transcription network of E. coli
Network of common associations between a group of dolphin
Electronic circuit network
Benchmark social network with heterogeneous communities
PPI network of yeast
U.S.A. western states power grid network
Real social network
Metabolic pathway of E. coli network
Transcription network of S. cerevisiae yeast
Complete directed graph
Transcription network of E. coli
Transcription network of S. cerevisiae yeast
Roget (Roget.net is a directed network contain cross-references in Roget’s
Thesaurus)
CS phd
Epa
California
ODLIS (odlis.net is directed network based on the ODLIS: Online
Dictionary of Library and Information Science)
Words E.
PairsFSG
foldoc.net is a directed network of Free On-line Dictionary of Computing
Transcription network of S. cerevisiae yeast
Metabolic pathway of E. coli
PPI network of the budding yeast
Real social network
Dolphins social network
Electronic circuit
Transcription network of E.coli
Transcription network of S. cerevisiae yeast
Neural network of C. elegans
W W W network of hyperlinks between web pages in ndu domain
Food web of birds, fishes and invertebrates
Transcription network of S. cerevisiae yeast
mfinder [30]
MAVisto [40]
Dataset
Tool/algorithm
N/A
N/A
423
688
306
135
1379
685
672
688
67
97
423
62
252
1000
2361
4941
67
672
688
50
418
688
1022
1882
4271
6175
2900
7381
5018
12905
688
672
2361
67
62
252
Genealogy
N/A
N/A
Dictionary
N/A
N/A
Dictionary
Biological network
Biological network
Biological network
Social network
Social network
Electronic network
423
685
280
325 000
83
62
Number of nodes
Biological network
Biological network
Biological network
Biological network
Biological network
Food web
Biological network
Biological network
Biological network
Biological network
Social network
Electronic network
Biological network
Dolphins network
Electronic network
Social network
Biological network
Power grid network
Social network
Biological network
Biological network
Directed graph
Biological network
Biological network
Dictionary
Biological network
Biological network
Biological network
W W W network
Food web
Biological network
Type
Table 3: A summary of the datasets used by motif finding tools and algorithms
46 281
63 608
109 092
1079
1275
6646
182
159
399
1740
8965
16150
18 241
N/A
N/A
519
1079
2345
597
2493
1052
1276
1079
182
189
519
159
399
7770
6646
6594
182
1276
1079
2540
519
1079
5074
519
1052
2170
1460 000
391
93
Number of edges
N/A
N/A
Pajek datasets
Uri Alon’s Complex Networks
N/A
Pajek datasets
N/A
University of Michigan Network Data
Uri Alon’s Complex Networks
Pajek datasets
N/A
N/A
Pajek datasets
Uri Alon’s Complex Networks
Uri Alon’s Complex Networks
N/A
ndu domain
N/A
Young Lab (Transcriptional Regulatory
Networks in S. cerevisiae)
N/A
N/A
Uri Alon’s Complex Networks
Uri Alon’s Complex Networks
N/A
N/A
N/A
N/A
Uri Alon’s Complex Networks
Uri Alon’s Complex Networks
N/A
N/A
Uri Alon’s Complex Networks
University of Michigan Network Data
Uri Alon’s Complex Networks
N/A
Pajek datasets
University of Michigan Network Data
N/A
Uri Alon’s Complex Networks
Uri Alon’s Complex Networks
N/A
Uri Alon’s Complex Networks
Uri Alon’s Complex Networks
Pajek datasets
Data source
Figure 4: Runtimes for different network motif sizes
for NeMoFinder, FPF (MAVisto), Sampling (mfinder)
and Enumeration in Uetz PPI Network of S. cerevisiae
[63] (Courtesy of Chen et al. [34])
The performance of NeMoFinder was compared with other algorithms, namely the enumeration method (exhaustive recursive search) [3], the sampling method (edge sampling algorithm) [30] and FPF [17], as shown in Figure 4. The results show that NeMoFinder finds larger motifs and achieves better runtimes, with a 20- to 100-fold speedup on the Uetz PPI network. In addition, it can detect all motifs up to size 13 within an acceptable running time for this network.
NeMoFinder also outperforms FPF, with up to 100-fold speedup under various frequency thresholds. It can find motifs up to size 12 for the MIPS dataset [34].
Although the experimental results do not explicitly identify the forms of discovered network motifs
and the algorithm does not specify the types of network motifs it is able to detect, NeMoFinder has the
capability to detect network motifs from 2 to 13
nodes. NeMoFinder is only an algorithm, and its
source code is not publicly available.
Figure 5: Comparison of runtimes for different network motif sizes for the Grochow-Kellis algorithm and
two versions of the Milo et al. algorithm [3] in the PPI
network of S. cerevisiae. The speedup of the Grochow-Kellis algorithm is also indicated (Courtesy of
Grochow et al. [29]).
FANMOD
The RAND-ESU [37] algorithm in FANMOD was evaluated on four different networks: the transcription network of E. coli [1] (423 nodes, 519 edges), the transcription network of S. cerevisiae yeast [3] (688 nodes, 1079 edges), the neural network of C. elegans [30] (306 nodes, 2345 edges) and the food web of the Ythan estuary (135 nodes, 597 edges). The E. coli and the S. cerevisiae yeast networks are available from Uri Alon’s Complex Networks [33, 60].
The RAND-ESU [37] algorithm in FANMOD was compared with the edge sampling algorithm (ESA) in mfinder on the four datasets above. The results show that RAND-ESU is faster than ESA by several orders of magnitude for subgraph sizes ≥5. In addition, RAND-ESU gives more consistent sampling quality than ESA across different networks because it is unbiased and can estimate the total number of subgraphs [37].
Although the experimental results do not explicitly identify the forms of discovered network motifs and the tool does not specify the types of network motifs it is able to detect, FANMOD has the capability to detect network motifs from three to eight nodes. FANMOD is publicly available. Its latest update was in 2006. The tool can be run on Windows 32-bit, Mac and Linux.
Grochow-Kellis
The Grochow-Kellis algorithm was evaluated on two biological networks: the PPI network of S. cerevisiae yeast (1379 nodes, 2493 edges) and the transcription network of S. cerevisiae yeast (685 nodes, 1052 edges) [29].
The algorithm was compared with two versions of the Milo et al. algorithm, which is the exhaustive recursive search [3]. The results in Figure 5 show that Grochow-Kellis achieves an exponential improvement in time over the other two algorithms [29].
Although the experimental results do not explicitly identify the forms of discovered network motifs
and the algorithm does not specify the types of network motifs it is able to detect, Grochow-Kellis has
the capability to detect all types of network motifs
from 1 to 15 nodes.
The software implementing this algorithm is not
publicly available; it is accessible only by
request [29].
Kavosh
Kavosh was evaluated on four different networks: the metabolic pathway of E. coli (672 nodes, 1276 edges), the transcription network of S. cerevisiae yeast (688 nodes, 1079 edges), the real social network (67 nodes, 182 edges) and the electronic network (97 nodes, 189 edges). The E. coli and the S. cerevisiae yeast networks are available from Uri Alon’s Complex Networks [49, 60].
Kavosh’s performance was compared with mfinder [30], MAVisto [40] and FANMOD [33] in Table 4. For the E. coli network, Kavosh is comparable to FANMOD and outperforms the other tools. It can find larger motifs in acceptable running times. For the yeast, social and electronic networks, Kavosh outperforms all other tools, and it can also find larger motifs in acceptable running times [49].
Although the experimental results do not explicitly identify the forms of discovered network motifs and the algorithm does not specify the types of network motifs it is able to detect, Kavosh has the capability to detect network motifs with three or more nodes. Kavosh’s source code is publicly available. Its latest update was in 2013. Kavosh can be run on Windows 32-bit and Linux.
MODA
MODA was tested only on the E. coli [1] transcription network, which contains 423 nodes and 519 edges. This network is available from Uri Alon’s Complex Networks [55, 60].
MODA was assessed for its computational time for enumerating subgraph appearances but not for determining the occurrences of the motifs. Its runtime was compared with Grochow-Kellis [29], mfinder [30], FANMOD [33] and MAVisto [40]. It could not be compared with NeMoFinder because no implementation of NeMoFinder is available. The comparison results in Figure 6 show that MODA outperforms mfinder, Grochow-Kellis and MAVisto for enumerating subgraphs in the target network only. This comparison does not include the computational time for the randomized networks. The results show that MODA is able to find size 9 motifs in an acceptable running time [55].
Although the experimental results do not explicitly identify the forms of discovered network motifs and the algorithm does not specify the types of network motifs it is able to detect, MODA has the capability to detect network motifs with two or more nodes. MODA’s source code is publicly available. Its latest update was in 2009.
Table 4: Performance comparisons between Kavosh, FANMOD [33], MAVisto [40] and mfinder [30] using E. coli
network [64], social network [30] and electronic network [30]
E. coli
S. cerevisiae
Social
Electronic
Kavosh
FANMOD
MAVisto
mfinder
Kavosh
FANMOD
MAVisto
mfinder
Kavosh
FANMOD
MAVisto
mfinder
Kavosh
FANMOD
MAVisto
mfinder
3
4
5
6
7
8
9
10
11
12
0.30
0.81
13 532
31
1.35
2.20
15 784
32
0.04
0.46
393
12
0.08
0.53
210.00
7.00
1.84
2.53
^
297
34.59
41.41
^
306
0.23
0.84
1492
49
0.36
1.06
1727.00
14.00
14.91
15.71
^
23 671.8
1003.92
1111.95
^
33 548.2
1.63
3.07
^
798
0.02
4.34
6 696 000.00
109.80
141.98
132.24
^
^
20 212.99
24 292.05
^
^
10.48
17.63
^
181076.8
11.39
24.24
^
2020.20
1374.01
1205.97
^
^
746 385.86
926 745.34
^
^
69.43
117.43
^
^
77.22
160.00
^
^
13173.74
9256.61
^
^
17111178.28
18851135.4
^
^
415.66
845.93
^
^
422.61
967.99
^
^
121110.31
^
^
^
337 076 691.32
^
^
^
2594.19
^
^
^
2823.70
^
^
^
112 0560.16
^
^
^
7 211199 226.13
^
^
^
14 611.23
^
^
^
18 037.56
^
^
^
N/A
N/A
N/A
N/A
N/A
N/A
N/A
N/A
^
^
^
^
N/A
N/A
N/A
N/A
135 752.35
^
^
^
^
^
^
^
N/A
N/A
N/A
N/A
997 893.27
^
^
^
Network motif size is listed across from 3 to 12. The column underneath each motif size shows different runtimes in seconds for each algorithm
(Courtesy of Kashani et al. [49]).
Figure 6: Runtimes of the MODA, Grochow-Kellis [29], mfinder [30], FANMOD [33] and MAVisto [40] algorithms for motif sizes 3 to 9 (Courtesy of Omidi et al. [55]).
G-Trie
G-Trie was evaluated on a variety of networks: the dolphins social network [66, 67] (62 nodes, 159 edges), the electronic circuit network [3] (252 nodes, 399 edges), the benchmark social network with heterogeneous communities [68] (1000 nodes, 7770 edges), the PPI network of yeast [69, 70] (2361 nodes, 6646 edges) and the U.S.A. Western states power grid network [67, 71] (4941 nodes, 6594 edges) [56].
The dolphins social network and the power grid network are available from the University of Michigan Network Data [56]. The PPI network of yeast is available on the Pajek datasets [72] site. The electronic circuit is accessible from Uri Alon’s Complex Networks [60].
The performance of G-Trie was compared with FANMOD [33] as the network-centric method and Grochow-Kellis [29] as the motif-centric method on the networks above, using the same common C++ platform, for both the original network and the random networks. The comparison results in Table 5 show that G-Trie outperforms FANMOD [33] and Grochow-Kellis [29] for all networks with different motif size ranges. They also show that G-Trie can detect motifs up to size 9 in efficient running times [56].
Although the experimental results do not explicitly identify the forms of discovered network motifs and the algorithm does not specify the types of network motifs it is able to detect, G-Trie has the capability to detect network motifs with three or more nodes. G-Trie is only a data structure, and its source code is not publicly available.
Table 5: Comparison of G-Trie with FANMOD [33] and Grochow-Kellis [29] on five different networks (dolphins
[66, 67], circuit [3], social [68], yeast [69, 70] and power [67, 71] networks)
                        Census original network         Average census on similar random networks   Speedup of G-Trie
Network    Motif size   FanMod    Grochow    G-Trie     FanMod    Grochow    G-Trie                   vs FanMod   vs Grochow
Dolphins   5            0.07      0.03       0.01       0.13      0.04       0.01                     16.00       4.75
Dolphins   6            0.48      0.28       0.04       1.14      0.35       0.07                     17.27       5.24
Dolphins   7            3.02      3.44       0.23       8.34      3.55       0.46                     18.21       7.74
Dolphins   8            19.44     73.16      1.69       67.94     37.31      4.03                     16.87       9.27
Dolphins   9            100.86    2984.22    6.98       493.98    366.79     24.84                    19.88       14.76
Circuit    6            0.49      0.41       0.03       0.55      0.24       0.03                     19.57       8.43
Circuit    7            3.28      3.73       0.22       3.53      1.34       0.17                     20.55       7.81
Circuit    8            17.78     48         1.52       21.42     7.91       1.06                     20.17       7.45
Social     3            0.31      0.11       0.02       0.35      0.11       0.02                     14.67       4.75
Social     4            7.78      1.37       0.56       13.27     1.86       0.57                     23.28       3.26
Social     5            208.3     31.85      14.88      531.65    62.66      22.11                    24.05       2.83
Yeast      3            0.47      0.33       0.02       0.57      0.35       0.02                     31.67       19.33
Yeast      4            10.07     2.04       0.36       12.9      2.25       0.41                     31.15       5.44
Yeast      5            268.51    34.1       12.73      400.13    47.16      14.98                    26.70       3.15
Power      3            0.51      1.46       0          0.91      1.37       0.01                     113.25      171.00
Power      4            1.38      4.34       0.02       3.01      4.4        0.03                     107.43      157.07
Power      5            4.68      16.95      0.1        12.38     17.54      0.14                     91.06       128.96
Power      6            20.36     95.58      0.55       67.65     92.74      0.88                     76.52       104.90
Power      7            101.04    765.91     3.36       408.15    630.65     5.17                     78.92       121.94
The execution time is in seconds. The speedup ratios are also indicated (Courtesy of Ribeiro et al. [56]).
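As a small arithmetic illustration of how the speedup columns of Table 5 can be read, the sketch below recomputes a few ratios from the average census times on random networks. That the reported speedup corresponds to this ratio is our reading of the table, not a statement from the source, and the published times are rounded, so the recomputed values only approximate the printed ones. The function name `speedup` is introduced here for illustration.

```python
def speedup(baseline_seconds, gtrie_seconds):
    """Ratio of a baseline runtime to the G-Trie runtime (both in seconds)."""
    return baseline_seconds / gtrie_seconds


# Average census times on random networks, copied from Table 5:
# (network, motif size) -> (FanMod, Grochow, G-Trie)
random_census = {
    ("Dolphins", 9): (493.98, 366.79, 24.84),
    ("Power", 7): (408.15, 630.65, 5.17),
}

for (network, size), (fanmod, grochow, gtrie) in random_census.items():
    print(network, size,
          round(speedup(fanmod, gtrie), 2),
          round(speedup(grochow, gtrie), 2))
```

For these two rows the recomputed ratios agree with the printed speedups (19.88 and 14.76 for Dolphins, 78.92 and 121.94 for Power) to within rounding.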
NetMODE
NetMODE was tested on four different networks: the real social network [49] (67 nodes, 182 edges), the metabolic pathway of E. coli [49] (672 nodes, 1276 edges), the transcription network of S. cerevisiae yeast [49] (688 nodes, 1079 edges) and the complete directed graph (50 nodes, 2540 edges). The social, E. coli and S. cerevisiae yeast networks come from the Kavosh source code, and they are accessible from Uri Alon’s Complex Networks [58, 60].
NetMODE was compared with Kavosh [49] and FANMOD [33], with and without multiple cores, using several switching methods. The comparison results in Table 6 show that NetMODE achieves better runtimes for 4-node and 6-node subgraphs for the yeast and social networks [58].
Although the experimental results do not explicitly identify the forms of discovered network motifs
and the software does not specify the types of network motifs it is able to detect, NetMODE has the
capability to detect network motifs from three to six
nodes. NetMODE’s source code is publicly available.
Its latest update was in 2012. NetMODE can be run
on Windows 32-bit.
Acc-MOTIF
Acc-MOTIF was evaluated on various networks selected from Uri Alon’s Complex Networks [60] and the
Pajek datasets [72] site. The description of each
dataset can be found in Tables 3 and 7.
Acc-MOTIF was compared with FANMOD [33]
on the networks listed in Table 7. The
Table 6: Comparisons of runtimes in seconds between NetMODE, Kavosh [49] and FANMOD [33] under various
switching methods for the transcription network of S. cerevisiae yeast (4-node) [49] and the social network (6-node) [49]
(Courtesy of Li et al. [58])
Network                          Tool/algorithm    Fixed bidirectional edges   No regard   Global constant   Local constant   Uniform local constant
Yeast (4-node subgraph census)   Kavosh            214.2                       –           –                 –                –
Yeast (4-node subgraph census)   FANMOD            –                           318         319               318              –
Yeast (4-node subgraph census)   NetMODE           12.1                        12.6        12                12.3             –
Yeast (4-node subgraph census)   NetMODE 4-core    2.9                         2.9         3                 3                –
Social (6-node subgraph census)  Kavosh            50.5                        –           –                 –                –
Social (6-node subgraph census)  FANMOD            –                           464         139               147              –
Social (6-node subgraph census)  NetMODE           33.5                        87.2        31.4              34.9             34.5
Social (6-node subgraph census)  NetMODE 4-core    10.3                        24.1        10.1              10.8             11.6
Table 7: Comparisons of runtimes between Acc-MOTIF and FANMOD [33] for network motif sizes 3 and 4 on
various networks (Courtesy of Meira et al. [59])
                                                                                         Motifs k = 3 (milliseconds)     Motifs k = 4 (seconds)
Network                                                         Nodes (n), Edges (m)     acc-MOTIF      FANMOD           acc-MOTIF         FANMOD           Reference
Transcription network of E. coli                                (418, 519)               0.9 ± 0.1      2.7 ± 0.2        0.021 ± 0.001     0.08 ± 0.003     60
Transcription network of S. cerevisiae yeast                    (688, 1079)              0.9 ± 0.04     7.4 ± 0.6        0.043 ± 0.0004    0.19 ± 0.002     60
Roget (Roget’s Thesaurus)                                       (1022, 5074)             2 ± 0.04       34 ± 0.5         0.27 ± 0.01       0.76 ± 0.01      71
CS phd (genealogy network)                                      (1882, 1740)             1.2 ± 0.03     3.2 ± 0.2        0.055 ± 0.001     0.04 ± 0.0005    71
Epa                                                             (4271, 8965)             2.7 ± 0.3      131 ± 1          0.58 ± 0.01       9.2 ± 0.07       71
California                                                      (6175, 16150)            4.3 ± 0.1      216 ± 2          1.2 ± 0.01        12.6 ± 0.02      71
ODLIS (Online Dictionary of Library and Information Science)    (2900, 18241)            8.1 ± 0.3      1025 ± 5         4.5 ± 0.03        210 ± 2          71
Words E.                                                        (7381, 46281)            46 ± 0.4       7028 ± 174       105 ± 0.7         > 7200           60
PairsFSG                                                        (5018, 63608)            42 ± 0.3       1687 ± 19        13 ± 0.4          153 ± 3          71
Foldoc (Free On-line Dictionary of Computing)                   (12905, 109092)          92 ± 1         2938 ± 7         13.3 ± 0.5        439 ± 8          71
result shows it achieves significant speedup over
FANMOD [33] for motif sizes 3 and 4 [59].
Although the experimental results do not explicitly identify the forms of discovered network motifs
and the software does not specify the types of network motifs it is able to detect, Acc-MOTIF has the
capability to detect network motifs from three to five
nodes.
Acc-MOTIF software is publicly available. Its current version is 2.0, and it is still under active development [59].
QuateXelero
QuateXelero was evaluated on six networks of different types: the transcription network of S. cerevisiae
yeast [60] (688 nodes, 1079 edges), the metabolic
pathway of E. coli [65] (672 nodes, 1275 edges), the
PPI network of the budding yeast [69, 70] (2361
nodes, 6646 edges), the real social network [49]
(67 nodes, 182 edges), the dolphins social network
[66, 68] (62 nodes, 159 edges) and the electronic
circuit network [3] (252 nodes, 399 edges). The
E. coli, S. cerevisiae yeast and social networks are directed networks. The PPI network of budding yeast
and the dolphins network are undirected networks. The
electronic circuit network is treated as both a directed and an undirected network [5].
The S. cerevisiae yeast and the electronic circuit
networks are available on the Uri Alon’s Complex
Networks [60]. The PPI network of the budding
yeast is accessible on the Pajek datasets [72] site.
The dolphins social network is available on the
University of Michigan Network Data [5].
QuateXelero was compared with Kavosh [49] and
G-Tries [56] on various networks above with different motif size ranges for the target network and the
random networks. The comparison results can be
found in Table 8. The results show QuateXelero
can detect motifs up to size 12 in acceptable running
times. The results also reveal the following strengths
and weaknesses of QuateXelero [5].
Comparison with Kavosh
QuateXelero is much faster than Kavosh in all situations. However, QuateXelero consumes a large amount of memory compared with Kavosh for constructing the quaternary tree [5].
Comparison with G-Tries
QuateXelero is always faster than the ESU of G-Tries for enumeration on original networks. It is generally faster for enumeration on random networks for smaller motifs. QuateXelero is better than the ESU of G-Tries for larger motif sizes in directed networks. However, for undirected networks, QuateXelero is better than the ESU of G-Tries for smaller and larger motifs but not for medium-sized motifs. The memory usage of QuateXelero and G-Tries is comparable for some networks. However, QuateXelero does not show better memory usage than G-Tries in general [5].
Although the experimental results do not explicitly identify the forms of discovered network motifs and the algorithm does not specify the types of network motifs it is able to detect, QuateXelero has the capability to detect network motifs with two or more nodes. QuateXelero’s source code is publicly available. Its latest update was in 2013. QuateXelero can be run on Windows and Linux.
CHALLENGES FOR COMPARING
DIFFERENT TOOLS AND
ALGORITHMS
mfinder and MAVisto are no longer supported. mfinder can be run only on older versions of Windows or Linux. MAVisto does not run on the current version of Java. There is no implementation of NeMoFinder available for testing this algorithm [55]. The source code for Grochow-Kellis is not publicly available and can only be obtained on request. G-Trie is only a data structure and its source code is not publicly available. The gtrieScanner tool that implements G-Trie can find only one network motif size at a time [57]. Testing G-Trie and QuateXelero on large networks and larger motifs requires a considerable amount of memory.
Different tools and algorithms accept different input formats. Converting the input into the format accepted by each tool and algorithm involves developing procedures or scripts in a programming language.
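The conversion of input formats mentioned for comparing different tools can be sketched as follows. This is a minimal example under stated assumptions: the input is a whitespace-separated edge list with arbitrary node labels, and the two output layouts (integer pairs, optionally followed by a constant weight column) are illustrative only. The function name `to_integer_edge_list` is hypothetical, and the exact format expected by each tool should be taken from its own documentation.

```python
def to_integer_edge_list(lines, weight=None):
    """Relabel nodes with consecutive integers starting at 1.

    lines  -- iterable of strings such as "araC araB" (source, target)
    weight -- optional constant appended as a third column (an assumption
              for tools that expect "source target weight" rows)
    Returns the converted rows and the label-to-id mapping.
    """
    ids = {}
    rows = []
    for line in lines:
        parts = line.split()
        # Skip blank lines, malformed lines and comment lines.
        if len(parts) < 2 or line.lstrip().startswith("#"):
            continue
        source, target = parts[0], parts[1]
        for label in (source, target):
            if label not in ids:
                ids[label] = len(ids) + 1
        fields = [str(ids[source]), str(ids[target])]
        if weight is not None:
            fields.append(str(weight))
        rows.append(" ".join(fields))
    return rows, ids


edges = ["araC araB", "crp araB", "# a comment", ""]
rows, ids = to_integer_edge_list(edges, weight=1)
print("\n".join(rows))
```

Keeping the returned mapping makes it possible to translate the node ids in a tool's output back to the original labels.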
OBSERVATIONS ON NETWORK
MOTIF DETECTION TOOLS AND
ALGORITHMS
mfinder is a tool developed in 2004 for overcoming
the drawbacks of the exhaustive enumeration
method by implementing subgraphs sampling
Yeast
Yeast
Yeast
Yeast
Yeast
Electronic
Electronic
Electronic
Electronic
Electronic
Electronic
Electronic
Electronic
E.coli
E.coli
E.coli
E.coli
E.coli
E.coli
E.coli
Social
Social
Social
Social
Social
Social
Social
5
6
7
8
9
5
6
7
8
9
10
11
12
5
6
7
8
9
10
11
5
6
7
8
9
10
11
Kavosh
QuateXelero
Processing times
Directed
23.4
0.5
Directed
438.5
8.9
Directed
14 056.2
166.4
Directed
22 4497
2609.5
Directed
^
53 852.1
Directed / Undirected
0.13
0
Directed / Undirected
0.8
0.08
Directed / Undirected
5.9
0.3
Directed / Undirected
38.7
1.9
Directed / Undirected
278.2
11.9
Directed / Undirected
2614.2
71.2
Directed / Undirected
^
493.3
Directed / Undirected
^
^
Directed
0.48
0.05
Directed
4.3
0.3
Directed
45.3
2.8
Directed
410.7
23.6
Directed
4000
190.7
Directed
^
^
Directed
^
^
Directed
0.11
0.06
Directed
0.82
0.36
Directed
5.4
2.6
Directed
33.3
16.3
Directed
220.3
96.22
Directed
^
^
Directed
^
^
Network Motif size Directionality
46.80
49.27
84.47
86.03
^
nan
10.00
19.67
20.37
23.38
36.72
^
^
9.60
14.33
16.18
17.40
20.98
^
^
1.83
2.28
2.08
2.04
2.29
^
^
QX
G-Tries
QX
Average census
on randoms
ESU þ
G-Tries
Total time
QX
G-Tries
Memory
30.846
0.733
0.693
0.955
37.85
10.51
1.5 MB
532.806
11.201
11.909
17.856
651.07
190.20
2.3 MB
12 314.314
164.596
220.656
344.539 14494.60
3611.77
7.1MB
^
^
^
^
^
^
^
848186.49
6205.98 13 544.20 23 950.802 915 907.47 125 962.31 711 M
0.184
0.015
0.014
0.009
2.13
1.28
1.2 MB
1.097
0.063
0.068
0.051
8.49
5.59
2.4 MB
7.780
0.390
0.376
0.302
45.81
31.29
8.6 MB
^
^
^
^
^
^
^
65.89
2.34
2.360
2.604
79.18
16.11 42 M
483.41
13.89
11.626
14.962
550.76
90.76 206 M
3998.61
82.75
76.793
113.920
4438.76
663.11
1.0 G
^
504.40
^
796.268
^
4557.15
^
0.612
0.063
0.126
0.037
15.63
5.51
4.5 MB
5.604
0.546
0.910
0.303
104.07
33.65 22.1MB
51.092
4.430
7.195
2.600
822.42
274.45 135.4 MB
^
^
^
^
^
^
^
728.69
47.52
86.493
41.078
1223.27
264.96
1.2 G
6357.95
352.46
929.200
402.146
11461.69
2443.21
7.6 G
53 819.37
^
8834.432
^
101184.86
^
44.0 G
0.094
0.031
0.019
0.009
3.55
1.31
5.4 MB
0.581
0.218
0.118
0.070
22.74
9.00 30.7 MB
3.532
1.451
0.725
0.612
154.91
72.78 184.9 MB
^
^
^
^
^
^
^
21.25
9.62
5.830
12.228
119.34
83.37
1.5 G
121.01
54.30
34.570
85.252
731.66
558.70
7.9 G
669.35
273.54
237.761
653.014
4527.95
4368.38 40.0 G
ESU of
G-Tries
Comparison
Census on original
versus Kavosh
1.8 MB
2.5 MB
8.8 MB
^
889 M
3.4 MB
3.9 MB
8.5 MB
^
130 M
678 M
4.6 G
25 G
7.7 MB
13.6 MB
74.6 MB
^
2.4 G
19 G
^
2.7 MB
13.9 MB
143.7 MB
^
2.8 G
18 G
59 G
QX
60
60
60
60
60
3
3
3
3
3
3
3
3
72
72
72
72
72
72
72
49
49
49
49
49
49
49
Reference
Table 8: Comparisons of runtimes between QuateXelero and Kavosh [49] for network motif sizes 5-11 on various networks (Courtesy of Khakabimamaghani et al.
[5])
technique. However, its subgraph sampling is biased, so it incurs an extra, expensive cost to correct the biased estimates. Nonetheless, the gains are noticeable. It is significantly faster than the exhaustive enumeration method, and its runtime does not depend on the network size. It can detect motifs of sizes 5 and 6, which are inaccessible to the exhaustive enumeration method [30].
MAVisto is a tool developed in 2005 to provide the motif analysis and visualization that mfinder does not support. The tool offers more flexible input formats as well as a rich set of visualizations for motif analysis. MAVisto is fast only for motif sizes 3–5 in directed networks, although it can find motifs up to size 8. MAVisto was designed for finding network motifs in biological networks only [40].
NeMoFinder is an algorithm developed in 2006 for detecting repeated and unique network motifs in PPI networks only. It is the first algorithm in the development timeline able to detect network motifs up to size 12. Its runtime is better than those of the enumeration method, the sampling method and FPF, a substantial improvement over the tools existing at that time. The algorithm can also analyze scale-free networks [34].
FANMOD is a tool developed in 2006 that aims at fast network motif detection and better motif analysis through a graphical user interface and flexible output formats. It is an attractive tool because of its relative speed, ease of use, rich customization and flexible export formats. However, FANMOD cannot find motifs larger than size 8 because of computational explosion: the number of calls to NAUTY for subgraph isomorphism checking becomes enormous as the motif size and network size increase. In addition, its memory usage grows markedly as the subgraph size and network size increase [33].
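To make the growth in isomorphism checks concrete, the following sketch enumerates all connected k-node subgraphs of a small undirected graph in the style of the ESU enumeration on which RAND-ESU is based. It is a simplified illustration, not FANMOD's implementation: it handles undirected graphs only, performs no sampling and no isomorphism classification, and the function name and adjacency-dictionary input are our own choices. Every subgraph it yields would need one classification step in a full motif finder.

```python
def enumerate_connected_subgraphs(adj, k):
    """Yield each connected k-node vertex set exactly once (ESU-style).

    adj -- dict mapping each node to the set of its neighbours
           (undirected graph, nodes must be mutually comparable)
    """
    def extend(sub, ext, root):
        if len(sub) == k:
            yield frozenset(sub)
            return
        # Nodes already in the subgraph or adjacent to it.
        closed = sub | {n for s in sub for n in adj[s]}
        ext = set(ext)
        while ext:
            w = ext.pop()
            # Exclusive neighbours of w: larger than the root and neither
            # inside nor adjacent to the current subgraph.
            new = {u for u in adj[w] if u > root and u not in closed}
            yield from extend(sub | {w}, ext | new, root)

    for v in sorted(adj):
        yield from extend({v}, {u for u in adj[v] if u > v}, v)


# Path 1-2-3-4: the connected 3-node subgraphs are {1,2,3} and {2,3,4}.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(sorted(sorted(s) for s in enumerate_connected_subgraphs(path, 3)))
```

Restricting extensions to nodes larger than the root, and to exclusive neighbours, is what prevents the same vertex set from being generated twice.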
Grochow-Kellis is an algorithm developed in 2007 for detecting large network motifs based on a novel symmetry-breaking technique. All tools and algorithms developed up to this point use the network-centric approach, which prevents them from detecting larger network motifs because a subgraph census must be taken over the entire target network. This is the first algorithm to use the motif-centric approach, which searches for a single query subgraph, for detecting large network motifs. One drawback is that not every query subgraph it generates occurs in the target network, so some computation time is spent unnecessarily. Nevertheless, the gains of this approach are very noticeable. The algorithm achieves an exponential speedup and eliminates the memory usage limitations. It can detect network motifs up to size 15, which makes it surpass all other existing tools and algorithms for finding large network motifs. The algorithm can also be applied to any type of network, and it can be easily parallelized [29].
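The motif-centric idea of searching for one query subgraph at a time can be illustrated with a deliberately naive sketch. The code below is not the Grochow-Kellis algorithm: it tries every ordered assignment of target nodes to query nodes and removes duplicate matches afterwards by collecting node sets, which is exactly the redundant work that symmetry breaking is designed to avoid. The function name and the edge-list representation are assumptions made for this example.

```python
from itertools import permutations


def count_query_occurrences(edges, query_edges, query_size):
    """Count node sets whose induced subgraph matches the query.

    edges       -- iterable of directed (source, target) pairs
    query_edges -- directed edges of the query on nodes 0..query_size-1
    """
    edge_set = set(edges)
    nodes = sorted({n for e in edge_set for n in e})
    query = set(query_edges)
    found = set()
    for mapping in permutations(nodes, query_size):
        # Induced match: an edge exists between two mapped target nodes
        # exactly when the query has the corresponding edge.
        if all(((mapping[i], mapping[j]) in edge_set) == ((i, j) in query)
               for i in range(query_size)
               for j in range(query_size) if i != j):
            found.add(frozenset(mapping))
    return len(found)


target = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
feed_forward_loop = [(0, 1), (0, 2), (1, 2)]
three_cycle = [(0, 1), (1, 2), (2, 0)]
print(count_query_occurrences(target, feed_forward_loop, 3))
print(count_query_occurrences(target, three_cycle, 3))
```

The second query has no occurrence in this target, which mirrors the drawback noted above: time spent on a query subgraph that does not appear in the network is wasted.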
Kavosh is an algorithm developed in 2009 with the goal of finding network motifs of any given size with less memory usage and lower CPU time. The algorithm was tested on biological, social and electronic networks, and the results show that it can detect network motifs larger than size 8. The algorithm surpasses mfinder, MAVisto and FANMOD, but it was not compared with other existing algorithms [49].
MODA is an algorithm developed in 2009, also with the goal of efficiently detecting motifs larger than size 8. It was tested only on the E. coli transcription network [1]. It outperforms Grochow-Kellis [29], mfinder [30], FANMOD [33] and MAVisto [40] for this network only [55].
G-Trie is a multi-way tree data structure developed in 2010 for storing subgraphs, which saves computational time and speeds up motif finding. It was tested on various network types, and it is able to detect motifs up to size 9. It outperforms FANMOD [33] and Grochow-Kellis [29], but it consumes a huge amount of memory in exchange for efficient search and retrieval [56].
NetMODE is a software package developed in 2012. It is the first software package that does not depend on NAUTY for subgraph isomorphism checking. However, the tradeoff for this independence is that it has to store k-node subgraph data in memory, which leads to large memory consumption. NetMODE was tested on social and biological networks. It was compared with Kavosh [49] and FANMOD [33] but not with other algorithms. It is able to detect motifs up to size 6 [58].
Acc-MOTIF is a software package developed in 2012. It contains several algorithms for finding motifs of sizes 3, 4 and 5 independently. It uses combinatorial techniques for subgraph isomorphism checking and for finding significant motifs. Acc-MOTIF was tested on various network types for detecting motifs of sizes 3 and 4. It was compared with FANMOD [33] only, and the results show that it outperforms FANMOD significantly [59].
QuateXelero is an algorithm developed in 2013 to reduce the number of calls to NAUTY by implementing a quaternary tree data structure. It was tested on various network types. It outperforms Kavosh [49] in all cases and outperforms G-Tries [56] in some cases. It was not compared with other tools and algorithms. Like G-Tries, QuateXelero consumes a substantial amount of memory in exchange for its speedup [5].
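The trade of memory for fewer isomorphism calls can be illustrated with a simple cache. The sketch below is not QuateXelero's quaternary tree and does not call NAUTY: a brute-force `canonical_form` stands in for the expensive canonical labeling, and a dictionary keyed by the sequence of four-valued node-pair states (no edge, forward, backward, both) stands in for the tree lookup. The four-state encoding is our assumption about why a quaternary structure is natural here, and all names are hypothetical.

```python
from itertools import permutations


def pair_states(nodes, edges):
    """Encode an ordered node sequence as one state in {0,1,2,3} per pair."""
    states = []
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            forward = (nodes[i], nodes[j]) in edges
            backward = (nodes[j], nodes[i]) in edges
            states.append(int(forward) + 2 * int(backward))
    return tuple(states)


def canonical_form(nodes, edges):
    """Expensive stand-in for NAUTY: smallest encoding over all orderings."""
    return min(pair_states(p, edges) for p in permutations(nodes))


class CachedClassifier:
    """Remember the class of every encoding seen so far."""

    def __init__(self):
        self.cache = {}
        self.expensive_calls = 0

    def classify(self, nodes, edges):
        key = pair_states(nodes, edges)
        if key not in self.cache:
            self.expensive_calls += 1
            self.cache[key] = canonical_form(nodes, edges)
        return self.cache[key]


edges = {(1, 2), (2, 3), (4, 5), (5, 6)}
classifier = CachedClassifier()
first = classifier.classify((1, 2, 3), edges)
second = classifier.classify((4, 5, 6), edges)  # same encoding, cache hit
print(first == second, classifier.expensive_calls)
```

The cache grows with the number of distinct encodings encountered, which is one way to see why the speedup comes at the price of memory.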
Some tools and algorithms were designed for a specific network type; others were designed for various networks. Newer tools and algorithms were designed to overcome shortcomings such as limited motif size, large memory usage or massive computational time. Because developing an efficient network motif detection tool poses many challenges, the products released so far are only algorithms or limited tools. However, newer tools and algorithms always show improvement in some aspects.
These improvements of motif finding tools and algorithms are visible over the years. FANMOD is a user-friendly tool for motif visualization and analysis, but it can detect motifs only up to size 8. None of the tools and algorithms able to detect motifs of size > 8 provides a user-friendly interface or motif visualization and analysis. Although they are able to handle large networks and large motif sizes, they consume a large amount of memory or were tested on only one or a few networks.
Kim et al. [7] classified network motifs into structural network motifs and biological network motifs. Structural network motifs are detected based on structural uniqueness using scoring thresholds. Biological network motifs are detected based on biological significance, such as topological properties and Gene Ontology (GO) term relevance, regardless of their structure. The authors argued that structural uniqueness can suggest biological network motifs but is not sufficient for determining them. Thus, they developed five algorithms, EDGEGO-BNM, EDGEBETWEENNESS-BNM, NMF-BNM, NMFGO-BNM and VOLTAGE-BNM, for efficiently detecting biological network motifs using biological significance criteria. These algorithms also consider nonnetwork motifs for their biological significance. The authors also validated the existence of the discovered biological network motifs based on three criteria: (i) motifs included in a complex, (ii) motifs included in a functional module and (iii) GO term clustering score. Protein complexes are defined as groups of proteins interacting mutually within a cell at the same time and place. Functional modules are groups of binding proteins participating in different cellular processes at different times. The GO term clustering score is calculated based on GO term relevance. Their validations revealed that, when biological significance is used, nonnetwork motifs are also found to be biologically meaningful network motifs.
The authors compared their algorithms with mfinder, ESU and RAND-ESU in FANMOD. The comparisons showed that these algorithms produce more reliable biological network motifs than mfinder, ESU and FANMOD. In addition, they are capable of finding structural network motifs as well. However, these algorithms are not able to detect large biological network motifs [7].
Users may consider these algorithms for finding biological network motifs.
All tools and algorithms discussed above use their
own strategies for detecting network motifs based on
structural uniqueness. Thus, the detected network
motifs can be considered structural network
motifs [7]. Although MAVisto was designed for detecting network motifs in biological networks and
NeMoFinder was designed for finding meso-scale
network motifs in large PPI networks, none of the tools and
algorithms above considers biological significance when finding network motifs, and none uses the three
validation criteria above to validate the existence
of discovered network motifs in biological networks.
Additionally, these tools and algorithms do not evaluate nonnetwork motifs for their biological significance but instead filter them out.
At this point, users may wonder which tool or
algorithm is the right choice for their research. mfinder and MAVisto are no longer supported, and both
are limited tools: mfinder does not support recent
versions of Windows, and MAVisto does not support
recent versions of Java. There is no implementation
of NeMoFinder available. FANMOD is a fast and
user-friendly tool for various network types, but it
can detect motifs only up to size 8. Grochow-Kellis is
only an algorithm, but it can detect motifs up to
size 15 and can be used for any network. Its
source code is not openly available, but it can be
obtained on request. Kavosh is also an algorithm,
and it was tested on biological, social and electronic
networks only. Its authors claim it can detect
motifs larger than size 8; the comparison results
show it can detect motifs of size 12 in an acceptable
running time. MODA is also an algorithm. It was
tested only on the E. coli transcription network, and it
outperforms Grochow-Kellis, mfinder, FANMOD
and MAVisto for this network only. Its authors
claim it can detect motifs larger than size 8;
the comparison results show it can detect motifs of
size 9 in an acceptable running time. G-Tries is
only a data structure, but it can be used for various
network types. Its source code is not publicly available. G-Tries outperforms FANMOD and Grochow-Kellis,
and it can detect motifs up to size 9 in efficient
running times. However, it consumes a large amount
of memory. NetMODE is a software package that can
detect motifs up to size 6. It was tested on social and
biological networks only. Acc-Motif is a tool that
can detect motifs of sizes 3, 4 and 5 independently. It
can be used for various network types, and it is much
faster than FANMOD. It supports all
platforms and is under active development.
QuateXelero is only an algorithm, but it can be
used for various network types. It outperforms
Kavosh in all cases and G-Tries in
some cases. It is fast, but it consumes a large
amount of memory. It can detect motifs of size 12 in
acceptable running times. Each tool and algorithm thus has its own pros and
cons. Hence, an
appropriate tool or algorithm should be chosen carefully,
depending on the type of research being conducted and the computing resources
available to the users locally.
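The size limits quoted in this comparison can be collected into a small lookup to shortlist candidates for a target motif size. This is only a convenience sketch: the dictionary and function names are hypothetical, the numbers are the ones stated in the paragraph above, and other factors discussed there (memory use, availability of source code, networks tested) are deliberately left out.

```python
# Largest motif size each tool or algorithm is reported to handle in the
# comparison above. Tools without a stated size (mfinder, MAVisto,
# NeMoFinder) are omitted rather than guessed.
REPORTED_MAX_SIZE = {
    "FANMOD": 8,
    "Grochow-Kellis": 15,
    "Kavosh": 12,
    "MODA": 9,
    "G-Tries": 9,
    "NetMODE": 6,
    "Acc-Motif": 5,
    "QuateXelero": 12,
}


def candidates(motif_size):
    """Return the names whose reported maximum size covers motif_size."""
    return sorted(
        name
        for name, max_size in REPORTED_MAX_SIZE.items()
        if max_size >= motif_size
    )


print(candidates(9))
```

A shortlist produced this way is only a starting point; the final choice still depends on the network type and the available computing resources.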
Any tool or algorithm that is developed
should fulfill the needs of its users. The features
that users probably care about most include accuracy, speed, motif size and ease of use. Therefore, we
offer some general remarks as follows.
(i) Practical Motif Size Limit: This limitation of
a tool or an algorithm is critical because discovering
larger network motifs may answer several important
questions. Does a given motif appear independently
in the network? Or do the instances of that motif combine to form larger structures? If it is the latter, then
what is the function of these larger structures? Do
different networks that share a certain network motif
also share the same structural combinations of that
motif? These questions can be answered by finding
and analyzing large subgraphs [73].
Moreover, the small size of network motifs limits
the scale at which features of organization in networks
can be discovered. Hence, the possibility of insight
into the function of organized structure at different
scales is limited [74].
There is no specific number for the size of larger
network motifs that need to be explored. However,
the capability of a tool or an algorithm to discover
larger network motifs provides the opportunity
for making further discoveries.
(ii) User-Friendliness: Users rarely
encounter the inner workings of a network motif
detection tool. They simply use the provided interface to select parameters and observe progress. For
this reason, it is vital that competitive tools have
sensible user interfaces. Though some tools are
very fast, their limited user interfaces and difficult
setup procedures have led to a tepid reception by
users.
(iii) I/O Formats: Graphs are stored in different
formats, and it is important for a tool or an algorithm
to accept a wide variety of input formats. It is also
vital to provide motif visualization that helps
users better observe the results, and to allow them to export the results into various formats
for further analysis.
(iv) Sampling or Exhaustive: This is an important aspect of a tool or an algorithm, as it influences
the accuracy of potential results. Thus, it is important
that a tool or an algorithm be very clear about the
kind of method it implements.
(v) Network-Specific: Some tools and algorithms are designed to target one or a few specific
types of networks. Some can be used for any network
type. Some tools were tested on only one or a few
networks. Therefore, it is vital that a tool or an algorithm be specific about the type of target network it
was developed for.
CONCLUSIONS AND FUTURE WORK
We have analyzed 11 different network motif detection tools and algorithms as well as discussed their
individual strengths and weaknesses. We have seen
improvements in network motif detection tools and
algorithms over the years. However, several
improvements are still needed in the field of network
motif detection.
Discover larger motifs
This requires exploring new techniques for developing better tools and algorithms that enable researchers
to discover and analyze larger motifs. Currently, the
literature is saturated with tools and algorithms that
can find motifs in the single-digit size range. Discovering
larger motifs poses a significant challenge for future
development, as the exponential runtime has a significant impact on performance.
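To give a rough feel for this growth, the sketch below counts the k-vertex subsets of an n-vertex network, C(n, k). This is only a crude upper bound on the number of candidate subgraphs: enumeration algorithms visit connected subgraphs only, which are usually far fewer. The network size of 1000 vertices and the function name are assumptions chosen for illustration.

```python
from math import comb


def candidate_vertex_sets(n_vertices, motif_size):
    """Number of motif_size-vertex subsets of an n_vertices-vertex network."""
    return comb(n_vertices, motif_size)


# Hypothetical network with 1000 vertices; sizes 3 to 9 cover the
# single-digit range mentioned above.
for k in range(3, 10):
    print(k, candidate_vertex_sets(1000, k))
```

Each increase of the motif size by one multiplies the bound by roughly (n - k) / (k + 1), which is why even modest increases in size quickly become expensive.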
Improve runtime
The tools and algorithms that have succeeded up
until now have implemented novel ways to speed up
the computation. The most obvious way to decrease runtime is to implement parallel algorithms. The network motif detection problem is
highly parallelizable [58], and only recently have
tools begun to exploit this fact. However, these
tools and algorithms have exploited only coarse-grained parallelism. There are independent aspects of
network motif detection that can be executed
simultaneously to reduce the runtime. Some examples follow. Random networks can
be processed simultaneously. This step of the algorithm is the simplest one to parallelize. Query
subgraphs can be processed concurrently for different
subgraph sizes. Isomorphism checking of subgraphs of
different sizes can also be performed in parallel. Efficient fine-grained parallelism is perhaps the
most crucial improvement currently needed for network motif detection.
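The simplest case mentioned above, processing random networks simultaneously, can be sketched as follows. This is a minimal illustration under several assumptions, not the method of any tool reviewed here: the motif is fixed to the undirected triangle, random networks are produced by degree-preserving edge switching, the input is a simple graph without duplicate edges, and all function names are hypothetical. A thread pool is used to keep the example self-contained; in CPython a process pool would be needed for a real speedup on CPU-bound counting.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor


def count_triangles(edges):
    """Count triangles in an undirected simple graph given as edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Each triangle is found once per edge, i.e. three times in total.
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3


def randomized_copy(edges, seed, swaps_per_edge=10):
    """Degree-preserving randomization by repeated edge switching."""
    rng = random.Random(seed)
    edge_list = [tuple(sorted(e)) for e in edges]
    present = set(edge_list)
    for _ in range(swaps_per_edge * len(edge_list)):
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c
        new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        # Reject swaps that would create self-loops or duplicate edges.
        if a == d or c == b or new1 in present or new2 in present:
            continue
        present -= {edge_list[i], edge_list[j]}
        present |= {new1, new2}
        edge_list[i], edge_list[j] = new1, new2
    return edge_list


def motif_z_score(edges, n_random=100, workers=4):
    """Compare the real triangle count with counts in random networks."""
    real = count_triangles(edges)

    def one_sample(seed):
        # Each random network is independent, so samples can run concurrently.
        return count_triangles(randomized_copy(edges, seed))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        random_counts = list(pool.map(one_sample, range(n_random)))
    mean = statistics.mean(random_counts)
    std = statistics.pstdev(random_counts)
    z = (real - mean) / std if std > 0 else None
    return real, mean, std, z


if __name__ == "__main__":
    toy = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4),
           (4, 5), (3, 5), (5, 6), (6, 7)]
    print(motif_z_score(toy, n_random=50))
```

Because every random network gets its own seed and its own random number generator, the samples share no mutable state, which is what makes this step easy to distribute.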
Provide user-friendly interface
Most recent tools and algorithms that are capable of detecting larger motifs (> 8 nodes) do not
provide a user-friendly interface for motif visualization and analysis. Thus, this feature should be
included in future versions of these tools as well
as in new tools, as it allows researchers to gain more
insight into the behavior of network motifs.
Improve I/O
New tools and algorithms, and future versions of
current ones, should accept a wide
variety of input formats as well as allow exporting
the results into various formats for further analysis.
Provide web tool
As of this writing, no web tool has been developed
for network motif detection. It would be more
convenient for users to use a tool via the web.
This would save users time and resources, because
they would not have to install
the tool locally, and some tools might consume
many resources that may not be available on the local
machine.
There are many different paths for future development of network motif detection. We address
some of them here. One direction would be to predict how motifs appear in a network, which could
provide additional optimizations for future
tools and algorithms. Furthermore, a smaller motif
could be part of a bigger motif, which would
mean researchers could do a quick search for small
motifs and then use them as seeds to find larger ones.
This avenue of research has huge implications for
developing efficient sampling algorithms for large motifs.
Another avenue is to reuse computation with less
memory or without any extra memory.
Key Points
Most network motif detection tools and algorithms that are able to detect motifs larger than 8 nodes do not provide a user-friendly interface or motif visualization and analysis.
Detecting large network motifs comes at a cost: either a large memory footprint or unnecessary computational time.
Efficient fine-grained parallelism is one of the most crucial improvements needed for network motif detection.
No web tool has yet been developed for network motif detection.
FUNDING
This work was supported in part by the National
Science Foundation (NSF) [OCI-1156837 to S.M.,
C.-H.H.]; and U.S. Department of Education
Graduate Fellowships in Areas of National Need
(GAANNs) [P200A130153 to N.T.L.T.].
References
1. Shen-Orr SS, Milo R, Mangan S, et al. Network motifs in the transcriptional regulation network of Escherichia coli. Nat Genet 2002;31:64–8.
2. Albert I, Albert R. Conserved network motifs allow protein-protein interaction prediction. Bioinformatics 2004;20(18):3346–52.
3. Milo R, Shen-Orr S, Itzkovitz S, et al. Network motifs: simple building blocks of complex networks. Science 2002;298:824–7.
4. Allan EG, Turkett WH, Fulp EW. Using Network Motifs to Identify Application Protocols. Global Telecommunications Conference 2009. GLOBECOM 2009. Honolulu, Hawaii: IEEE, pp. 1–7.
5. Khakabimamaghani S, Sharafuddin I, Dichter N, et al. QuateXelero: an accelerated exact network motif detection algorithm. PLoS One 2013;8(7):e68073.
6. Wong E, Baur B, Quader S, et al. Biological network motif detection: principles and practice. Brief Bioinform 2011;13(2):202–15.
7. Kim W, Li M, Wang J, et al. Biological network motif detection and evaluation. BMC Syst Biol 2011;5:1–13.
8. Yeger-Lotem E, Sattath S, Kashtan N, et al. Network motifs in integrated cellular networks of transcription–regulation and protein–protein interaction. PNAS 2004;101(16):5934–9.
9. Alon U. Network motifs: theory and experimental approaches. Nat Rev Genet 2007;8:450–61.
10. Alon U. SnapShot: network motifs. Cell 2010;143:326.e1.
11. Madar D, Dekel E, Bren A, et al. Negative auto-regulation increases the input dynamic-range of the arabinose system of Escherichia coli. BMC Syst Biol 2011;5:1–9.
12. Jin G, Zhang S, Zhang X, et al. Hubs with network motifs organize modularity dynamically in the protein-protein interaction network of yeast. PLoS One 2007;11:e1207.
13. Ingram PJ, Stumpf MPH, Stark J. Network motifs: structure does not determine function. BMC Genomics 2006;7:108.
14. Lipshtat A, Purushothaman SP, Iyengar R, et al. Functions of bifans in context of multiple regulatory motifs in signaling networks. Biophys J 2008;94:2566–79.
15. Mangan S, Alon U. Structure and function of the feedforward loop network motif. PNAS 2003;100(21):11980–5.
16. Zhu X, Gerstein M, Snyder M. Getting connected: analysis and principles of biological networks. Genes Dev 2007;21:1010–24.
17. Schreiber F, Schwöbbermeyer H. Frequency concepts and pattern detection for the analysis of motifs in networks. Trans Comput Syst Biol III Lect Notes Comput Sci 2005;3737:89–104.
18. Schmidt C, Weiss T, Komusiewicz C, et al. An analytical approach to network motif detection in samples of networks with pairwise different vertex labels. Comput Math Methods Med 2012;2012:1–12.
19. Milo R, Itzkovitz S, Kashtan N, et al. Superfamilies of evolved and designed networks. Science 2004;303(5663):1538–42.
20. Przytycka TM. An important connection between network motifs and parsimony models. Res Comput Mol Biol 2006;3909:321–35.
21. Chen L, Qu X, Cao M, et al. Identification of breast cancer patients based on human signaling network motifs. Sci Rep 2013;3368:1–7.
22. Tsang J, Zhu J, van Oudenaarden A. MicroRNA-mediated feedback and feedforward loops are recurrent network motifs in mammals. Mol Cell 2007;26(5):753–67.
23. Albert I, Albert R. Conserved network motifs allow protein-protein interaction prediction. Bioinformatics 2004;20(18):3346–52.
24. Chen J, Hsu W, Lee ML, et al. Labeling network motifs in protein interactomes for protein function prediction. ICDE 2007;546–55.
25. Turkett W, Fulp E, Lever C. Graph mining of motif profiles for computer network activity inference. MLG; 2011;1–8.
26. Lizier JT, Atay FM, Jost J. Information storage, loop motifs,
and clustered structure in complex networks. Phys Rev E
2012;86:1–5.
27. Wu SF, Qian WY, Zhang JW, et al. Network motifs in the
transcriptional regulation network of cervical carcinoma
cells respond to EGF. Arch Gynecol Obstet 2013;287:771–7.
28. Kim W, Diko M, Rawson K. Network motif detection:
algorithms, parallel and cloud computing, and related tools.
Tsinghua Sci Technol 2013;18(5):469–89.
29. Grochow JA, Kellis M. Network motif discovery using
subgraph enumeration and symmetry-breaking. Proceedings
of the 11th Annual International Conference on Research in
Computational Molecular Biology, 2007, Oakland, CA, USA,
Vol. 4453. Springer Berlin Heidelberg, 2007, 92–106.
30. Kashtan N, Itzkovitz S, Milo R, et al. Efficient sampling
algorithm for estimating subgraph concentrations and detecting network motifs. Bioinformatics 2004;20:1746–58.
31. Wernicke S. Efficient detection of network motifs. IEEE/
ACM Trans Comput Biol Bioinform 2006;3(4):347–59.
32. Parida L. Discovering topological motifs using a compact
notation. J Comput Biol 2007;14(3):300–23.
33. Wernicke S, Rasche F. FANMOD: a tool for fast network
motif detection. Bioinformatics 2006;22:1152–3.
34. Chen J, Hsu W, Lee ML, et al. NeMoFinder: Dissecting
genome-wide protein-protein interactions with meso-scale
network motifs. Proceedings of the 12th ACM SIGKDD
International Conference on Knowledge Discovery and Data
Mining 2006, Philadelphia, PA, USA;106–15.
35. Milo R, Kashtan N, Itzkovitz S, et al. On the uniform
generation of random graphs with prescribed degree sequences 2004. http://arxiv.org/abs/cond-mat/0312028
(27 December 2013, date last accessed).
36. Schreiber F, Schwöbbermeyer H. Towards motif detection
in networks: frequency concepts and flexible search.
Proceedings of the International Workshop on Network Tools and
Applications in Biology 2004, Camerino, Italy;91–102.
37. Wernicke S. A faster algorithm for detecting network
motifs. Algorithms Bioinform 2005;3692:165–77.
38. Parida L. Discovering topological motifs using a compact
notation. J Comput Biol 2007;14(3):300–23.
39. Kashtan N, Itzkovitz S, Milo R, et al. Network motif detection tool MFinder tool guide. Technical report 2005.
Rehovot, Israel: Departments of Molecular Cell Biology
and Computer Science and Applied Mathematics,
Weizmann Institute of Science, 2005.
40. Schreiber F, Schwöbbermeyer H. MAVisto: a tool for
the exploration of network motifs. Bioinformatics 2005;21:
3572–4.
41. Bachmaier C, Brandenburg FJ, Forster M, et al. Gravisto:
graph visualization toolkit. Graph Drawing 2005;3383:502–3.
42. Fruchterman TMJ, Reingold EM. Graph drawing by
force-directed placement. Softw Pract Exp 1991;21(11):
1129–64.
43. Batagelj V, Mrvar A. Pajek—analysis and visualization of
large networks. Graph Drawing 2002;477–8.
44. Himsolt M. Graphlet: design and implementation of a graph
editor. Softw Pract Exp 2000;30(11):1303–24.
45. Huan J, Wang W, Prins J, et al. Spin: Mining maximal
frequent subgraphs from graph databases. Proceedings of the
10th ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining 2004, Seattle, WA, USA;581–6.
46. Jiang C, Coenen F, Zito M. A survey of frequent subgraph
mining algorithms. Knowl Eng Rev 2013;28(01):75–105.
47. McKay BD. Practical graph isomorphism. Congr Numer
1981;30:45–87.
48. Hartke SG, Radcliffe AG. McKay’s canonical graph labeling
algorithm. Contemp Math 2009;479:99–111.
49. Kashani ZRM, Ahrabian H, Elahi E, et al. Kavosh: a new
algorithm for finding network motifs. BMC Bioinformatics
2009;10:318.
50. McKay B, Piperno A. Nauty and Traces. http://pallini.di.uniroma1.it/
(22 February 2014, date last accessed).
51. Kreher D, Stinson D. Combinatorial Algorithms: Generation,
Enumeration and Search. Florida: CRC Press LLC, 1998.
52. Alamgir Z, Abbasi S. Combinatorial algorithms for listing
paths in minimal change order. Comb Algorithmic Aspects
Netw 2007;4852:112–30.
53. Nijenhuis A, Wilf HS. Combinatorial Algorithms for Computers
and Calculators. London: Academic Press, 1978.
54. Maslov S, Sneppen K. Specificity and stability in topology
of protein networks. Science 2002;296(5569):910–13.
55. Omidi S, Schreiber F, Masoudi-Nejad A. MODA: an efficient algorithm for network motif discovery in biological
networks. Genes Genet Syst 2009;84:385–95.
56. Ribeiro P, Silva F. G-Tries: an efficient data structure for
discovering network motifs. Proceedings of the 2010 ACM
Symposium on Applied Computing 2010, Sierre, Switzerland;
1559–66.
57. gtrieScanner - Quick Discovery of Network Motifs.
http://www.dcc.fc.up.pt/gtries/ (28 December 2013, date last
accessed).
58. Li X, Stones DS, Wang H, et al. NetMODE: Network
motif detection without Nauty. PLoS One 2012;7(12):
e50093.
59. Meira LAA, Maximo VR, Fazenda L, et al. Accelerated
Motif Detection Using Combinatorial Techniques. Signal
Image Technology and Internet Based Systems (SITIS), 2012 Eighth
International Conference on 25-29 November, 2012;744–53.
60. Uri AlonLab. http://www.weizmann.ac.il/mcb/UriAlon/
(30 December 2013, date last accessed).
61. Barabasi AL, Albert R. Emergence of scaling in random
networks. Science 1999;286:509–12.
62. Achacoso TB, Yamamoto WS. AY’s Neuroanatomy of C. elegans
for Computation. Boca Raton, FL: CRC Press, 1992.
63. Uetz P, Giot L, Cagney G, et al. A comprehensive analysis
of protein-protein interactions in Saccharomyces cerevisiae.
Nature 2000;403(6770):623–7.
64. Mewes HW, Frishman D, Guldener U, et al. MIPS: a database for genomes and protein sequences. Nucleic Acids Res
2002;30(1):31–34.
65. The E. coli Database. http://www.kegg.com/ (2009, date
last accessed).
66. Lusseau D, Schneider K, Boisseau OJ, et al. The bottlenose
dolphin community of Doubtful Sound features a large proportion of long-lasting associations. Can geographic isolation
explain this unique trait? Behav Ecol Sociobiol 2003;54(4):
396–405.
67. Newman M. Network data.
http://www-personal.umich.edu/~mejn/netdata/ (27 December 2013, date last
accessed).
68. Lancichinetti A, Fortunato S, Radicchi F. Benchmark
graphs for testing community detection algorithms. Phys
Rev E 2008;78:1–6.
69. Bu D, Zhao Y, Cai L, et al. Topological structure analysis of
the protein-protein interaction network in budding yeast.
Nucleic Acids Res 2003;31(9):2443–50.
70. Batagelj V, Mrvar A. Pajek datasets.
http://vlado.fmf.uni-lj.si/pub/networks/data/ (27 December 2013, date last
accessed).
71. Watts DJ, Strogatz SH. Collective dynamics of ‘small-world’ networks. Nature 1998;393(6684):440–2.
72. Batagelj V, Mrvar A. Pajek datasets 2006.
http://vlado.fmf.uni-lj.si/pub/networks/data/ (30 December 2013, date last
accessed).
73. Kashtan N, Itzkovitz S, Milo R, et al. Topological generalizations of network motifs. Phys Rev E Stat Nonlin Soft Matter
Phys 2004;70(3 Pt 1):031909.
74. Baskerville K, Paczuski M. Subgraph ensembles and motif
discovery using a new heuristic for graph isomorphism.
Phys Rev E 2006;74(5 Pt 1):051903.