Detection of Top-K Central Nodes in Social Networks: A Compressive Sensing Approach

2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
Hamidreza Mahyar
Department of Computer Engineering, Sharif University of Technology (SUT), Email: [email protected]
Abstract—In analyzing the structural organization of a social network, identifying important nodes is a fundamental problem. The concept of network centrality deals with the assessment of the relative importance of a particular node within the network. Most traditional network centrality definitions have a high computational cost and require full knowledge of the network topological structure. On the one hand, in many applications we are only interested in detecting the top-k nodes of the network with the largest values under a specific centrality metric. On the other hand, it is not feasible to efficiently identify central nodes in a large real-world social network by calculating centrality values for all nodes. As a result, recent years have witnessed increased attention toward the challenging problem of detecting the top-k central nodes in social networks with high accuracy and without full knowledge of the network topology. To this end, we present in this paper a compressive sensing approach, called CS-TopCent, that efficiently identifies such central nodes by exploiting the sparsity property of social networks. Extensive simulation results demonstrate that our method converges to an accurate solution for a wide range of social networks.
Index Terms—Compressive Sensing; Detection of Central Nodes; Top-k List of Nodes; Social Networks.
I. INTRODUCTION
In recent years, the study of networks (collections of nodes joined in pairs by links) has been an active area inspired mostly by the empirical study of real-world systems. These networks exhibit significant non-trivial topological features, with patterns of connection between nodes that are neither purely random nor purely regular. Typical examples of such networks include
large communication systems (e.g. Internet, telephone network, WWW), technological and transportation infrastructures
(e.g. railroad and airline routes), biological systems (e.g.
gene and/or protein interaction networks), information systems
(e.g. network of citations between academic papers), and
a variety of social interaction structures (e.g. online social
networks) [1–3]. In analyzing the structural organization of a
network, identifying important nodes has been a fundamental
problem. Node importance can be utilized in sorting the
search results of a search engine [4], identifying key actors
in a terrorist network, controlling the spread of diseases in a
biological network [5], cooperative localization in a wireless
sensor network [6], preventing blackouts caused by cascading
failure [7], detecting influential directors in a governance
network [8], investigating absence of influential spreaders in
rumor dynamics [9], and detecting key players and marketing
targets in a social network [10]. By identifying such central
nodes, one can efficiently devise strategies for prevention of
diseases or crime, effective marketing plans and so on.
ASONAM '15, August 25-28, 2015, Paris, France
© 2015 ACM. ISBN 978-1-4503-3854-7/15/08 $15.00
DOI: http://dx.doi.org/10.1145/2808797.2808811
The concept of network centrality, a fundamental notion in Social Network Analysis (SNA), deals with the assessment of the relative importance of a particular node within the network according to some criteria. This concept has been around for decades, and many different centrality measures have been proposed over the years [11; 12]. Each of them targets a different goal and considers node centrality from a different point of view. The conventional measures of node centrality considered in this paper are degree centrality and betweenness centrality. A good measure should usually include information from both global properties and the local neighborhood; hence, many researchers combine these indicators into a new one to identify central nodes in networks [13; 14]. On the one hand, most traditional network centrality definitions have a high computational cost and require full knowledge of the network topological structure. For instance, the conventional betweenness centrality requires solving the all-pairs shortest-paths (APSP) problem in the network, which has long been known to be infeasible in large social networks. When complete structural information of a network is available, there exist approximation and exact approaches that can obtain the central nodes. However, for networks whose complete structural information is not available, these algorithms are no longer adequate for the task [15].
On the other hand, in many applications we are only
interested in detecting top-k nodes with the largest values
considering a specific centrality measure. It is often crucial to
efficiently detect the top-k most central nodes of a network,
while the exact order in the top-k list as well as the exact
value of the node centrality are by far not so important [16].
For most purposes, the exact value of node centrality is irrelevant; rather, it is the relative importance of nodes that specifically matters. Moreover, for the vast majority of applications it is sufficient to identify a set of nodes of similar importance; hence, identification of the top-k most important nodes is remarkably more relevant than precisely ordering the nodes based on their relative centrality [17]. If the adjacency list of the network is known (which is often not the case in social networks), the straightforward method that comes to mind is to measure centrality values for all nodes and then use one of the standard sorting algorithms such as Quicksort or Heapsort. However, even their modest average complexity
O(n log(n)) can be very high for large-scale social networks.
So, it is natural to develop algorithms with high accuracy
for efficiently computing high-quality approximations of the
nodes centralities. A common method for this purpose then
is to utilize network sampling approaches. To estimate characteristics of a network, these algorithms must perform at least two steps [18]: (1) a subset of nodes in the network must be sampled, and (2) the characteristics of the nodes of interest must be estimated in the induced sub-graph consisting of the sampled nodes. It is noteworthy that the two sub-problems
mentioned above yield two sources of error when estimating
the top-k most central nodes of the network through sampling:
(i) Sampling (Collection) error, due to the fact that only
a partial view of the network might be available, and (ii)
Identification error, due to the fact that even if a complete view
of the network is available then the identification of the top
k central nodes might be inaccurate. Furthermore, because of
massive scale, distributed management, and access limitation
of the real-world social networks, direct measurement of each
individual node in sampling methods can be operationally
difficult with too much overhead and cost. Consequently,
proposing a new approach for efficiently detecting the top-k
central nodes of a social network in an indirect manner without
full knowledge of network topological structure to overcome
the above shortcomings is an inevitable task in social network
analysis. In this paper, we address this substantial problem.
II. PROBLEM STATEMENT AND MAIN IDEA
As previously stated, in a large number of real-life applications, we only need to efficiently detect the top-k
central nodes of the network considering a specific centrality
metric [16; 17]. However, it is not feasible to efficiently
identify central nodes in a large real-world social network
via calculation of centrality for all the nodes. In this case,
a prevalent approach for this task is to use network sampling
approaches [19]. In such methods, one collects a subset of nodes as the sample set, and the centrality of the sampled nodes is then approximated on the induced sub-graph. Finally, the nodes with the highest centrality values are selected as the top-k most central nodes of the network and the remaining nodes are completely discarded [20], which is reminiscent of compression algorithms. In these popular
approaches, three major drawbacks can be seen:
1) Such approaches yield two sources of error; sampling
(collection) error and identification (compression) error.
2) The approach of sampling at the complete rate and then removing the least significant centrality coefficients clearly wastes system resources.
3) Constructing devices or designing algorithms capable of sampling at the complete rate with direct measurement of each individual node can be difficult, costly, and sometimes impossible due to the massive scale, distributed management, and access limitations of large real-world social networks.
Thus, proposing an efficient approach that addresses the problem of identifying the top-k central nodes in a social network and also overcomes the aforementioned disadvantages is our main motivation for this paper. Two main questions arise with this kind of processing [21]: “Why go to so much effort to acquire all the data in sampling when most of what we get will be thrown away? Can we not just directly measure the part that will not end up being thrown away?”. In contrast to the conventional methods that acquire all the sample data first and then compress it, in this paper we use compressive sensing theory, which aims to sample and compress sparse signals simultaneously. It indicates that by taking advantage of the sparsity property, one can efficiently and accurately recover high-dimensional vectors from a much smaller number of non-adaptive measurements or incomplete observations. In large-scale social networks, it is thus desirable to develop methods that can recover high-dimensional unknown node characteristics from a total number of measurements much smaller than their dimension. This is still possible if we have prior knowledge about some properties of the nodes, i.e., sparsity, in the networks. In our problem, the number of top-k central nodes is much smaller than the total number of all nodes, which is precisely the sparsity property in a social network.
Compressive Sensing, also known as Compressed Sensing or Compressive Sampling (CS) [21–25], is a new research domain in signal processing and information theory that has recently drawn much attention for its capability to efficiently acquire and extract sparse information. Over the last couple of years, CS has been applied in several fields such as astronomy, biology, image and video processing, medicine, and cognitive radio [26; 27], but its applications in networks [28; 29] are still in their early stages due to some challenges. One of the most limiting challenges is the construction of a measurement matrix that is feasible with respect to two fundamental constraints: (1) Most existing results in CS rely critically on the assumption that any subset of vector entries can be aggregated together [21; 23], but this assumption is not necessarily true in network monitoring problems, where only nodes that induce a path or connected sub-graph can be aggregated together in the same measurement. In other words, measurements are limited by network topological constraints. (2) More substantially, in networks a measurement matrix belongs to a more restrictive class taking only non-negative integer entries, while random Gaussian measurement matrices are usually used in the CS literature.
As a result, compressive sensing in networks, in comparison
with other CS problems, is entirely different and interesting in
its own right because we can represent a network by its graph.
Therefore, the main idea behind this paper is to propose a new approach, for the first time, to efficiently identify the top-k high-centrality nodes of a social network in an indirect manner and without full knowledge of the network structure via the compressive sensing framework.
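The recovery principle underlying this framework can be illustrated with a minimal sketch in pure Python, independent of the method proposed later in the paper: an iterative soft-thresholding (ISTA) loop for a LASSO-style objective min_x ½∥Ax − y∥₂² + λ∥x∥₁ on a small synthetic instance. All sizes, seeds, and names here are illustrative assumptions, not part of the paper.

```python
import random

def mat_vec(A, x):
    """Multiply an m-by-n matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def soft_threshold(z, t):
    """Proximal operator of the l1 norm: shrink each entry toward zero by t."""
    return [(abs(v) - t) * (1 if v > 0 else -1) if abs(v) > t else 0.0 for v in z]

def objective(A, y, x, lam):
    """LASSO objective 0.5*||Ax - y||^2 + lam*||x||_1."""
    r = [yi - ai for yi, ai in zip(y, mat_vec(A, x))]
    return 0.5 * sum(v * v for v in r) + lam * sum(abs(v) for v in x)

def ista(A, y, lam=0.1, step=0.001, iters=3000):
    """Iterative soft-thresholding: gradient step on the quadratic term,
    then the l1 proximal step. Monotonically decreases the objective for a
    sufficiently small step size."""
    n = len(A[0])
    At = [list(col) for col in zip(*A)]  # transpose of A
    x = [0.0] * n
    for _ in range(iters):
        r = [yi - ai for yi, ai in zip(y, mat_vec(A, x))]   # residual y - Ax
        g = mat_vec(At, r)                                  # negative gradient
        x = soft_threshold([xi + step * gi for xi, gi in zip(x, g)], step * lam)
    return x

# A 3-sparse signal observed through m = 12 random measurements, m << n = 30
random.seed(1)
n, m = 30, 12
x_true = [0.0] * n
for i in (4, 11, 22):
    x_true[i] = 5.0
A = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
y = mat_vec(A, x_true)
x_hat = ista(A, y)
print(objective(A, y, x_hat, 0.1) < objective(A, y, [0.0] * n, 0.1))  # True
```

This is only a sketch of the generic sparse-recovery idea with an unconstrained Gaussian matrix; the rest of the paper is precisely about the case where the measurement matrix is constrained by the network topology.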
III. MODEL AND PROBLEM FORMULATION
Consider the network G = (V, E) where V represents the
set of nodes (vertices) with the cardinality |V | = n and E
as the set of links (edges) with the cardinality |E|. We define
the neighborhood set of node v ∈ V reachable in h hops as
N h (v) = {v ′ ∈ V | v ′ ̸= v and dG (v, v ′ ) ≤ h}, where dG is
the geodesic distance. Centrality provides the standard means
to compare nodes in networks. The simplest of all
centrality measures is the degree centrality [30]:

C_D(v) = |N^1(v)| / (n − 1)    (1)

which measures the connectivity of a certain node v. However, the degree centrality of a node in a large social network may not be representative of its influence on the whole network. A more involved measure is closeness centrality, which is defined by the average distance of all nodes in the network from v as [31]:

C_C(v) = (∑_{u∈V} d_G(v, u)) / (n − 1)    (2)

Since the above equation really describes “farness”, it is also common to take the reciprocal of the above to justify the term “closeness”. The most popular measure is perhaps the betweenness centrality, which measures the proportion of shortest paths in the network that go through node v and can be introduced by C_B^∞(v), where [32]:

C_B^h(v) = ∑_{u,w∈N^h(v)} σ_uw(v) / σ_uw    (3)

where σ_uw is the number of equal-length shortest paths between nodes u and w, and σ_uw(v) is the number of those that pass through node v.

After defining the main centrality measures in social networks, let us model and formulate the problem of detecting top-k central nodes in networks using the strong mathematical framework of compressive sensing. Considering the network G = (V, E), suppose every node i has a real value x_i, and the vector x = (x_i | i = 1, 2, ..., n) is associated with the set V. The ℓ_p-norm of the vector x is defined as [21]:

∥x∥_p = (∑_{i=1}^{n} |x_i|^p)^{1/p}    (4)

Note that for p = 0, ∥x∥_0 is the number of non-zero elements in x; for p = 1, ∥x∥_1 is the sum of the absolute values of the elements in x; for p = 2, ∥x∥_2 is the usual Euclidean norm; and for p = ∞, ∥x∥_∞ is the maximum of the absolute values in x. We call x a k-sparse vector if ∥x∥_0 = k, namely x has only k non-zero elements. In other words, the sparsity of the vector x is k. For instance, the top-k central nodes have the sparsity property in social networks, in that the number of these nodes is much smaller than the set of all nodes in the network. Suppose that we have m measurements over the network, each of which is a connected sub-graph over G. Based on compressive sensing in networks, we would like to efficiently identify the k central nodes from these m measurements considering network topological constraints.

Let x ∈ R^n be a non-negative integer vector whose p-th entry is the value over node p, and let y ∈ R^m denote the vector of m measurements whose q-th entry represents the total additive value of the nodes in a connected sub-graph over G. Let A be an m × n measurement matrix whose i-th row corresponds to the i-th measurement. For i = 1, ..., m and j = 1, ..., n, A_ij = 1 if and only if the i-th measurement includes node j, and zero otherwise. Hence, in compact form we can write this linear system as:

y_{m×1} = A_{m×n} x_{n×1}    (5)

[Fig. 1: An example network with three measurements (graphic not reproduced here).]

For example, for the network in Fig. 1 with n = 6 nodes, |E| = 10 links, and m = 3 path measurements, the feasible measurement matrix A for measuring node features is:

                v1  v2  v3  v4  v5  v6
    m1: v5 ⇝ v4 [ 1   1   0   1   1   0 ]
A = m2: v1 ⇝ v3 [ 1   0   1   1   1   1 ]    (6)
    m3: v5 ⇝ v1 [ 1   0   1   0   1   0 ]
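The toy system of Eqs. (5)-(6) can be checked numerically. A minimal sketch follows; the node values x below are illustrative stand-ins (the actual values shown on Fig. 1 are not legible in this copy):

```python
# Illustrative node values x_j on nodes v1..v6 (hypothetical, for the sketch only)
x = [3.0, 1.0, 4.0, 2.0, 6.0, 5.0]

# Measurement matrix A of Eq. (6): A[i][j] = 1 iff measurement i visits node v_{j+1}
A = [
    [1, 1, 0, 1, 1, 0],  # m1: connected sub-graph over v1, v2, v4, v5
    [1, 0, 1, 1, 1, 1],  # m2: connected sub-graph over v1, v3, v4, v5, v6
    [1, 0, 1, 0, 1, 0],  # m3: connected sub-graph over v1, v3, v5
]

# Each measurement y_i is the additive sum of the values on the visited nodes (Eq. 5)
y = [sum(a, start=0.0) if False else sum(ai * xi for ai, xi in zip(row, x)) for row in A for a in [row]]
```

The comprehension above is clearer written plainly:

```python
x = [3.0, 1.0, 4.0, 2.0, 6.0, 5.0]
A = [
    [1, 1, 0, 1, 1, 0],
    [1, 0, 1, 1, 1, 1],
    [1, 0, 1, 0, 1, 0],
]
y = [sum(ai * xi for ai, xi in zip(row, x)) for row in A]
print(y)  # [12.0, 20.0, 13.0] -- m = 3 equations for n = 6 unknowns
```

With m ≪ n the system is under-determined; it is the sparsity constraint discussed next that makes recovery possible.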
In compressive sensing, the set of sparse solutions to this system is of interest. Thus, we need to add a constraint to limit the solution space. Now, the main question is how to estimate the node vector x from the measurement vector y in the case of an under-determined system (m ≪ n). In this case, the system has infinitely many solutions and, based on the fundamental theory of linear algebra, reconstruction of a unique vector is impossible. However, it is still possible if we add the constraint that the vector x is sufficiently sparse (e.g., the number of top-k central nodes is often much smaller than the set of all nodes), which is a reasonable assumption in our problem (k ≪ n). It is worth noting that sparse recovery over networks using compressive sensing has a closely related field called graph-constrained group testing [33–37]. Group testing and compressive sensing over networks have the same requirements for the measurement matrix, and the differences lie only in: (1) x is a logical vector in group testing, instead of a real vector as in the CS problem, and (2) the operations used in each group testing measurement are the logical “AND” and “OR”, in contrast to the additive linear mixing of the vector x over the real numbers in compressive sensing. Note that compressive sensing can perform better than group testing in terms of the required number of measurements [38]. Hence, we have used compressive sensing throughout this paper. In addition, CS may abstractly model complex systems
even when the measurements from certain elements are not available. Therefore, our proposed approach can potentially be used in other applications besides social network analysis, e.g., understanding the global diffusion of information.
IV. THE PROPOSED METHOD: CS-TOPCENT
In this section, we propose a compressive sensing approach for the detection of top-k central nodes (called CS-TopCent) in social networks. In this method, we construct a feasible measurement matrix A to infer social networks and identify the top-k central nodes inside a network via indirect measurements. The pseudo code of the proposed method is shown in Algorithm 1. This algorithm generally includes 7 steps:
(i) Every node v ∈ V locally computes its weight W (v) in
lines (6)-(8).
(ii) A first node is selected relative to P (v) which is calculated for all nodes v ∈ V in the graph G in lines
(10)-(13).
(iii) The transition matrix is constructed based on the transition probabilities Ptrans in lines (16)-(19), such that
Ptrans (v, u) is the probability of moving from node v to
node u.
(iv) The next node is selected under two different options
according to node existence in the neighbor set of current
node, proportional to the probabilities Ptrans (vcurrent , u)
in lines (15)-(25). The traversed link should not be visited
any more by that measurement.
(v) The update function is called in line (26) and performed
according to the Algorithm 2.
(vi) Steps (iii), (iv), and (v) are repeated ‘l’ times, which is the length of a measurement, to generate a new row of the matrix A in lines (14)-(28).
(vii) All the previous steps are repeated ‘m’ times to construct
a feasible measurement matrix with ‘m’ measurements in
lines (9)-(30).
Now, we describe these steps in detail. As we want to recover the top-k central nodes as a sparse property of social networks, we try to traverse these nodes more often than the other nodes with our measurements. To achieve this, we consider a
weight over the nodes of the network based on local clustering
coefficient [39], defined as the proportion of links between the
nodes within its neighborhood divided by the number of links
that could possibly exist between them. We assume each node
knows its neighbor nodes. More formally, for the node v ∈ V ,
the local clustering coefficient is [39]:
C(v) = 2 |{e_uw : u, w ∈ N^1(v), e_uw ∈ E}| / (|N^1(v)| (|N^1(v)| − 1))    (7)
where e_uw is the link between the nodes u and w. The node weight can be computed in a distributed fashion by letting each node locally compute its local clustering coefficient using its degree and the degrees of its neighbors.
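The local computation of Eq. (7) can be sketched per node in Python; the adjacency sets below form a hypothetical toy graph (assumed undirected, so the sets are symmetric):

```python
from itertools import combinations

def local_clustering(adj, v):
    """Eq. (7): realized links among the neighbours of v, over the possible ones."""
    nbrs = sorted(adj[v])
    d = len(nbrs)
    if d < 2:
        return 0.0  # convention: undefined ratio treated as zero
    links = sum(1 for u, w in combinations(nbrs, 2) if w in adj[u])
    return 2.0 * links / (d * (d - 1))

# Toy undirected graph as adjacency sets (hypothetical example)
adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}}
weights = {v: local_clustering(adj, v) for v in adj}
print(weights)  # nodes 2 and 4 have fully interconnected neighbourhoods (1.0)
```

These per-node weights play the role of W(v) in Algorithm 1.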
In this method, three situations may occur for a link in the network G during measurement construction: (1) the link is not selected by that measurement, (2) it is visited once by that measurement and then removed (never visited
Algorithm 1 The Proposed Method: CS-TopCent
Input: V(G), m, l
1: V(G): set of network nodes
2: m: number of measurements
3: l: measurement length
4: A = NULL    /* Initializing measurement matrix */
5: Ptrans = NULL    /* Initializing transition matrix */
6: foreach v ∈ V do    /* Local computation at each node */
7:     W(v) = 2 |{e_uw : u, w ∈ N^1(v), e_uw ∈ E}| / (|N^1(v)| (|N^1(v)| − 1))
8: end for
9: for i = 1 → m do
10:     foreach v ∈ V do    /* First node selection */
11:         P(v) = (1/(n−1)) (1 − W(v) / ∑_{u∈V} W(u))
12:     end for
13:     vcurrent = Select first node relative to P(v)
14:     for j = 1 → l do
15:         if ∃ u ∈ N^1(vcurrent) then    /* Next node selection */
16:             foreach u ∈ N^1(vcurrent) do
17:                 Score_u = 1 − ∑_{v∈N^1(u)} W(u, v)
18:                 Ptrans(vcurrent, u) = Score_u / ∑_u Score_u
19:             end for
20:             vnext = Select next node relative to Ptrans(vcurrent, u)
21:             N^1(vcurrent) = N^1(vcurrent) − {vnext}
22:             N^1(vnext) = N^1(vnext) − {vcurrent}
23:         else
24:             vnext = Trace back to the previous node
25:         end if
26:         CALL update(Ptrans, vcurrent, vnext)
27:         vcurrent = vnext
28:     end for
29:     Add the visited nodes to the matrix A as a new row
30: end for
Output: feasible measurement matrix A
again by that measurement), and (3) it is visited once and, if backtracking to the previous node is needed, it is visited a second time. Note that after a link removal, we need to update the transition matrix, so the update function is called in step (v). As shown in Algorithm 2, we recalculate the transition probabilities for both vcurrent and vnext and all their neighbors. We expect this update function to yield a more accurate method.
In the proposed method, to efficiently recover the central nodes in the data vector, we select a good start node for each of the m measurements and also assign proper probabilities to the neighbors of the current node for choosing the best next node, according to steps (ii), (iii), and (iv). For every measurement, we first select a good start node proportional to the probabilities P(v), and then select the next node relative to the probabilities Ptrans. The next node is chosen l times, which is the length of a measurement, in step (vi). Calculating the transition probability involves two steps, scoring and normalization, in step (iii). Because of link removal, it is possible that a node has no neighbor left to select as the next node; in this case we backtrack to the previously visited node, as shown in line (24). The set of visited network nodes forms a measurement as a new row in the measurement matrix A.
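The measurement construction just described can be sketched in Python. This is a loose sketch under stated assumptions: the link weights W(u, v) used in the scoring step of Algorithm 1 are not fully specified in this excerpt, so the sketch reuses the clustering-based node weight W(u) in their place, and node selection uses the standard-library `random.choices`.

```python
import random

def build_measurement(adj, W, l, rng):
    """One CS-TopCent-style measurement: an l-step walk over a connected
    sub-graph that never reuses a link, with backtracking when stuck.
    Score_u reuses the node weight W(u) in place of the link weights
    W(u, v) of Algorithm 1 -- an assumption of this sketch."""
    nodes = list(adj)
    n = len(nodes)
    total_w = sum(W[v] for v in nodes) or 1.0
    # First-node probabilities: P(v) = (1/(n-1)) * (1 - W(v) / sum_u W(u))
    p = [(1.0 / (n - 1)) * (1.0 - W[v] / total_w) for v in nodes]
    current = rng.choices(nodes, weights=p)[0]
    live = {v: set(adj[v]) for v in adj}      # links not yet traversed
    visited, path = {current}, [current]
    for _ in range(l):
        nbrs = sorted(live[current])
        if nbrs:
            scores = [1.0 - W[u] for u in nbrs]          # sketch of Score_u
            if sum(scores) <= 0:
                nxt = rng.choice(nbrs)
            else:
                nxt = rng.choices(nbrs, weights=scores)[0]
            live[current].discard(nxt)                   # consume the link
            live[nxt].discard(current)
            current = nxt
            visited.add(current)
            path.append(current)
        elif len(path) > 1:
            path.pop()                                   # trace back
            current = path[-1]
        else:
            break
    return [1 if v in visited else 0 for v in nodes]     # one row of A

# Toy graph with its local clustering coefficients as node weights
adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3, 5}, 5: {4}}
W = {1: 1.0, 2: 2 / 3, 3: 2 / 3, 4: 1 / 3, 5: 0.0}
rng = random.Random(7)
A = [build_measurement(adj, W, l=3, rng=rng) for _ in range(3)]
for row in A:
    print(row)
```

Each returned row marks the nodes of one connected sub-graph, so stacking m such rows gives a feasible measurement matrix in the sense of Section II.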
Algorithm 2 The Update Algorithm for CS-TopCent: update(Ptrans, vcurrent, vnext)
Input: Ptrans, vcurrent, vnext
1: Ptrans: Transition Matrix
2: vcurrent: Current node
3: vnext: Next node
4: Ptrans(vcurrent, vnext) = 0
5: foreach u ∈ N^1(vcurrent) do
6:     Recalculate Ptrans(vcurrent, u)
7:     Recalculate Ptrans(u, vcurrent)
8: end for
9: foreach u ∈ N^1(vnext) do
10:     Recalculate Ptrans(vnext, u)
11:     Recalculate Ptrans(u, vnext)
12: end for
Output: Ptrans

Overall, we construct a feasible measurement matrix with non-negative integer entries by using m measurements with the step size of l, as stated in steps (vi) and (vii). In the proposed approach, each measurement goes through a connected sub-graph, which evidences the feasibility of the measurement matrix A considering network topological constraints. After constructing the measurement matrix A via the CS-TopCent algorithm and adding the accumulated sum of the values on the visited nodes to the vector y for each measurement, we form the linear system y_{m×1} = A_{m×n} x_{n×1}. Finally, we want to find the sparse solution of this system, so we use the LASSO model [40; 41] as the reconstruction method for the optimization step, which is defined by:

min_x ∥x∥_1 + ∥Ax − y∥_2^2    (8)

We will experimentally evaluate the performance of our approach, CS-TopCent, with extensive simulations on various networks in the next section.

V. EXPERIMENTAL EVALUATION

In this section, we evaluate the performance of the proposed method, called CS-TopCent, under various configurations. First, we introduce the datasets we used for the evaluation. Next, we explain the settings of the tests. Finally, the achieved results and their analyses are presented.

A. Datasets

We consider some well-known real-world social networks as test data: (1) NetSci, a coauthorship network of scientists [42], with 1589 nodes and 2742 links. (2) Zachary's Karate Club [43], with 34 nodes and 78 links. (3) the Dolphin Social Network [44], with 62 nodes and 159 links. (4) Les Miserables, a coappearance network [45], with 77 nodes and 254 links. (5) Books about US Politics [46], with 105 nodes and 441 links.

B. Settings

For each of the test cases on the datasets, we generated 10 sets of measurements. For each network and each set of measurements, we performed the experiments. The denoted points in the figures represent the mean value of the tests over all sets with its asymmetric standard deviation. To evaluate the accuracy of our approach, we measure the precision and recall of the method. Precision refers to the number of correctly recovered nodes in the list of top-k central nodes divided by the total number of recovered nodes, and recall refers to the number of correctly recovered nodes in the list of top-k central nodes divided by the total number of nodes in the network. To avoid the trade-off between precision and recall and also to consider both, we use the F-measure metric. This metric is the harmonic mean of precision and recall, defined as:

F-measure = 2 × (Precision × Recall) / (Precision + Recall)    (9)

The standard deviation in each figure quantifies the amount of variation of the F-measures at each point. For the optimization step, we use the SPAMS package in MATLAB [47].
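The accuracy metric of Eq. (9) is straightforward to compute; a small sketch follows (note that, following the definition above, recall is taken over the total number of nodes in the network, and the example numbers are hypothetical):

```python
def f_measure(recovered, true_top_k, n_total):
    """Precision, recall and F-measure as defined in Section V-B."""
    correct = len(set(recovered) & set(true_top_k))
    precision = correct / len(recovered) if recovered else 0.0
    recall = correct / n_total            # per the paper's definition
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)

# Hypothetical example: 4 recovered nodes, 2 of them in the true top-k, n = 10
score = f_measure([1, 2, 3, 4], [1, 2, 5, 6], n_total=10)
print(round(score, 4))  # precision 0.5, recall 0.2 -> 0.2857
```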
In this paper, we consider two popular node centrality measures, degree and betweenness centrality, throughout the experiments. We evaluate our approach in two different scenarios: (1) to measure the effect of compressive sensing in our approach, the rankings produced by CS-TopCent are compared with those of conventional methods for the detection of top-k central nodes, and (2) our method is compared with the work in [48], RW, which is one of the state-of-the-art methods for sparse recovery in networks via compressive sensing and indirect measurement of nodes.
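For reference, the conventional degree-centrality ranking used as the baseline in the first scenario can be sketched on a hypothetical toy edge list (standing in for one of the benchmark networks):

```python
from collections import Counter

# Hypothetical toy edge list (not one of the actual datasets)
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4), (4, 5), (5, 6), (3, 5)]

deg = Counter()
for u, v in edges:
    deg[u] += 1
    deg[v] += 1

n = len(deg)
centrality = {v: d / (n - 1) for v, d in deg.items()}   # Eq. (1)
k = 3
top_k = sorted(centrality, key=centrality.get, reverse=True)[:k]
print(top_k)  # node 3 (degree 4) ranks first
```

This is exactly the "sort all nodes, keep the top k" baseline whose cost motivates the compressive sensing alternative.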
C. Evaluation Results
Experiment 1 (Effect of Compressed Sensing): As previously stated, it is not feasible to efficiently identify the top-k central nodes in a social network via calculation of centrality values for all the nodes. In this case, a common approach for this task is to use network sampling methods, which have the three major drawbacks mentioned in Section II. Therefore, we suggest a new approach based on compressive sensing theory to efficiently detect the top-k central nodes using indirect measurements. In Table I, we compare our proposed method CS-TopCent with the traditional degree centrality ranking of the network nodes. For the two example networks, the top 20 high-degree nodes (by their IDs) are listed without any specific order based on both the conventional method and our proposed approach. In the conventional method, we suppose that we have the network topological structure and can directly measure each network node; thus we sort the nodes according to their degrees and then select the top 20 list of high-degree nodes. The recovery percentage for each network demonstrates that our proposed method can efficiently recover high degree centrality nodes even without direct measurement of the network nodes and without full knowledge of the network topology.
In Table II, we compare our proposed method CS-TopCent with the conventional method for calculating the top 10 list of high betweenness centrality nodes in two other networks. The recovery percentage shows the accuracy of the proposed
TABLE I: Effect of CS in detecting the top 20 high degree centrality nodes. In the first two columns for each network, the high degree centrality nodes (by their IDs) are listed without specific order by the conventional method and the proposed method CS-TopCent. The Recovered column indicates whether each node recovered by our method exists in the top 20 list of high centrality nodes of the conventional method.

Books                                   LesMis
Degree  CS-TopCent  Recovered           Degree  CS-TopCent  Recovered
9       4           1                   12      12          1
13      9           1                   49      49          1
4       13          1                   56      52          0
85      31          1                   28      40          0
73      41          1                   26      26          1
67      59          1                   24      56          1
74      67          1                   59      24          1
31      72          1                   63      28          1
12      73          1                   65      71          0
41      74          1                   64      63          1
48      85          1                   66      42          1
10      7           0                   25      1           1
75      10          1                   27      59          1
76      12          1                   42      25          1
11      48          1                   58      27          1
72      87          1                   60      50          0
87      5           0                   62      69          1
14      8           0                   1       58          1
59      11          1                   67      65          1
77      14          1                   69      57          0
% of recovery: 85%                      % of recovery: 75%
TABLE II: Effect of CS in detecting the top 10 high betweenness centrality nodes. In the first two columns for each network, the high betweenness centrality nodes (by their IDs) are listed without specific order by the conventional method and the proposed method CS-TopCent. The Recovered column indicates whether each node recovered by our method exists in the top 10 list of high centrality nodes of the conventional method.

Karate                                  Dolphin
Betwn.  CS-TopCent  Recovered           Betwn.  CS-TopCent  Recovered
34      34          1                   37      15          1
1       29          0                   2       21          1
33      14          1                   41      44          1
3       3           1                   38      2           1
2       20          0                   8       18          1
4       28          0                   18      37          1
32      32          1                   21      38          1
9       33          1                   15      46          0
14      1           1                   44      41          1
24      9           1                   55      9           0
% of recovery: 70%                      % of recovery: 80%
method. The values of k in the detection of top-k central nodes are different for the two tests due to the sizes of the networks.
Experiment 2 (Effect of the number of measurements on accuracy for degree centrality): Fig. 2 shows the performance evaluation of our method CS-TopCent in comparison with the RW method [48], in terms of accuracy for the detection of high degree centrality nodes for different numbers of measurements. We set the length l of a measurement to n/2. Each point on the horizontal axis is proportional to the number of required measurements divided by the number of all nodes in the network. As shown, in all test cases our CS-TopCent method performs better than RW in terms of having a higher F-measure for most numbers of measurements. In addition, our method achieves a higher F-measure even with a small number of measurements (i.e., when the number of measurements is less than half the number of nodes in the network) compared to RW.
This improvement can be very important in situations where performing measurements has a high cost and the goal is to achieve an acceptable recovery at a reasonable cost. The percentage of improvement for each network is stated below the figures. The reasons for this improvement in recovery are several. First, in our approach we avoid traversing links more than twice, through the cases defined in Algorithm 1. This leads to coverage of a greater part of the network, compared to RW, in which no particular measure is explicitly taken to avoid this issue. Second, an efficient neighbor selection method in the measurements leads to a fair coverage of nodes. Third, after each transition we call the update function, shown in Algorithm 2, to account for all changes and obtain a more accurate solution.
Experiment 3 (Effect of measurement length on accuracy for degree centrality): Fig. 3 shows the performance evaluation of our method CS-TopCent in comparison with the RW method, in terms of accuracy for the detection of high degree centrality nodes for different measurement lengths and a fixed number of measurements. In this experiment, for all networks and for each percentage of recovery, we ran a set of measurements containing n/5 measurements. It is noteworthy that we set the number of measurements to 20% of the number of network nodes to show that our approach outperforms the RW method even with a small number of measurements. Each point on the horizontal axis is proportional to the length of the measurement divided by the number of all nodes in the network. As clearly depicted, in all test cases our proposed method has a higher F-measure for most measurement lengths. The percentage of improvement for each network is stated below the figures.
Experiment 4 (Effect of the number of measurements on accuracy for betweenness centrality): Fig. 4 shows the performance of our method in comparison with the RW method in detecting high betweenness centrality nodes for different numbers of measurements. The measurement length l is set to n/2. Each point on the horizontal axis is the number of required measurements divided by the total number of nodes. Our method performs better than RW in all test cases, achieving a higher F-measure for most numbers of measurements, even with a small number of measurements. The reasons for the better recovery are the same as in the previous experiments.
In [18], it is noted that degree centrality can serve as a proxy for identifying nodes with high betweenness centrality. To achieve this goal within the CS-TopCent framework, we measure both the importance of a node from an information-flow standpoint and its topological location in the connected region. Therefore, our proposed method can efficiently detect central nodes based on both degree centrality and betweenness centrality in social networks.
Experiment 5 (Effect of measurement length on accuracy for betweenness centrality): Fig. 5 shows the performance of our method compared to the RW method, in terms of accuracy in detecting high betweenness centrality nodes for different measurement lengths and a fixed number of measurements. The number of measurements is set to n/5 for all
Fig. 2: Experiment 2: Effect of the number of measurements on accuracy for degree centrality with measurements of length n/2. Panels: (a) NetSci (Imp. = 91%), (b) Dolphin (Imp. = 4%), (c) Books (Imp. = 14%), (d) Karate (Imp. = 19%), (e) LesMis (Imp. = 3%).
Fig. 3: Experiment 3: Effect of measurement length on accuracy for degree centrality with n/5 measurements. Panels: (a) NetSci (Imp. = 78%), (b) Dolphin (Imp. = 2%), (c) Books (Imp. = 18%), (d) Karate (Imp. = 19%), (e) LesMis (Imp. = 7%).
networks and for each percentage of recovery. Each point on the horizontal axis is the length of the measurement divided by the total number of nodes. As clearly depicted, in all test cases our proposed method achieves a higher F-measure for most measurement lengths.
D. Complexity Analysis
Consider the network G = (V, E). Under our assumption that each node keeps a hash table of its neighbors, checking whether a node is a neighbor of another node takes nearly constant time. A graph traversal algorithm can then check whether every network node appears in the neighbor lists of its neighbors; therefore, this can be done locally for each node v_i in time |N^1(v_i)|(|N^1(v_i)| − 1), where N^1(v) is the set of neighbors of node v. Since this check is local and each node has at most n − 1 neighbors, the worst-case computational cost of the task is O(n^2), where |V| = n. Lines (10)-(12) of Algorithm 1 can be moved outside the outer for loop and executed just once; thus, computing Σ_{u∈V} W(u) costs O(n), and the local computation of the value P(v) for any node v takes constant time. In the algorithm, selecting the best next node by checking the computed values at each node can also be done in O(n). Lines (15)-(25) of our algorithm can easily be done in O(n). Moreover, the update function, called in line (26), costs at most O(n) to update the transition probabilities of the current node and the next node. The next-node assignment in line (27) takes constant time. Therefore, the overall computation time is O(n^2 + m × l × n), where m is the number of measurements, n is the number of nodes, and l denotes the measurement length.
The space complexity is O(n^2) for the transition matrix and O(m × n) for the measurement matrix. In addition, each node locally stores information about its neighbors in at most O(n) space. Therefore, the total space complexity is O(n^2 + m × n), where m is the number of measurements and n is the number of network nodes.
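The hash-table assumption underlying the O(n^2) bound can be illustrated with a small sketch (our own helper name, not from the paper): storing each node's neighbors in a hash set makes membership tests expected O(1), so the local check over all neighbor pairs of v costs |N^1(v)|(|N^1(v)| − 1) set lookups.

```python
# Hypothetical illustration of the neighbor-storage assumption: each node
# maps to a hash set of its neighbors, so `w in neighbors[u]` is expected O(1).

def local_pair_check(neighbors, v):
    """Count ordered neighbor pairs (u, w) of v such that u and w are adjacent.

    With hash-set storage this runs in |N1(v)| * (|N1(v)| - 1) expected
    constant-time lookups, matching the cost stated in the analysis.
    """
    nv = list(neighbors[v])
    count = 0
    for u in nv:
        for w in nv:
            if u != w and w in neighbors[u]:   # expected O(1) set lookup
                count += 1
    return count

# Small undirected example: triangle {0, 1, 2} plus a pendant node 3 on 1
neighbors = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}
print(local_pair_check(neighbors, 0))  # 2: the pairs (1, 2) and (2, 1)
```

Summed over all nodes, and since |N^1(v)| ≤ n − 1, this yields the O(n^2) worst-case term in the total running time.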
VI. C ONCLUSION AND F UTURE W ORK
In this paper, we investigated the problem of detecting the top-k central nodes in social networks. We addressed this problem in view of the disadvantages of sampling approaches, such as sampling error, identification error, low precision, high computational cost, direct measurement of network nodes, and the need for full knowledge of the topological structure. We proposed a new approach, called CS-TopCent, that constructs a feasible measurement matrix for efficiently detecting the top-k nodes with respect to a specific centrality metric (i.e., degree centrality or betweenness centrality) in social networks using compressive sensing theory. The simulation results demonstrate that our proposed approach improves the accuracy of detecting high centrality nodes compared to related work, achieving a high F-measure score via indirect measurements and without full knowledge of the network topological structure. As future work, we intend to propose an efficient method for detecting top-k central nodes based on other centrality metrics, such as closeness centrality and PageRank centrality.
VII. ACKNOWLEDGEMENTS
I would like to thank my supervisors, Prof. Ali Movaghar and Prof. Hamid R. Rabiee, for their patient guidance, encouragement, and advice throughout my time as their student.
R EFERENCES
[1] S. H. Strogatz, “Exploring complex networks,” Nature, vol. 410, pp. 268–276, Mar.
2001.
[2] R. Albert and A.-L. Barabasi, “Statistical mechanics of complex networks,” Rev.
Mod. Phys., vol. 74, pp. 47–97, 2002.
[3] S. Dorogovtsev and J. F. F. Mendes, “Evolution of networks,” Advances in Physics,
vol. 51, pp. 1079–1187, 2002.
[4] S. Brin and L. Page, “The anatomy of a large-scale hypertextual web search engine,”
Computer networks and ISDN systems, vol. 30, pp. 107–117, 1998.
[5] J. G. Liu, Z. M. Ren, and Q. Guo, “Ranking the spreading influence in complex
networks,” Physica A, vol. 392, pp. 4154–4159, 2013.
[6] N. Patwari, J. N. Ash, S. Kyperountas, A. O. Hero, R. L. Moses, and N. S. Correal,
“Locating the nodes: cooperative localization in wireless sensor networks,” Signal
Processing Magazine, IEEE, vol. 22, pp. 54–69, 2005.
Fig. 4: Experiment 4: Effect of the number of measurements on accuracy for betweenness centrality with measurements of length n/2. Panels: (a) NetSci (Imp. = 99%), (b) Dolphin (Imp. = 18%), (c) Books (Imp. = 24%), (d) Karate (Imp. = 20%), (e) LesMis (Imp. = 33%).
Fig. 5: Experiment 5: Effect of measurement length on accuracy for betweenness centrality with n/5 measurements. Panels: (a) NetSci (Imp. = 93%), (b) Dolphin (Imp. = 23%), (c) Books (Imp. = 28%), (d) Karate (Imp. = 15%), (e) LesMis (Imp. = 40%).
[7] A. E. Motter and Y. C. Lai, “Cascade-based attacks on complex networks,” Phys.
Rev. E, vol. 6, 2002.
[8] X. Huang, I. Vodenska, F. Wang, S. Havlin, and H. E. Stanley, “Identifying
influential directors in the united states corporate governance network,” Phys. Rev.
E, vol. 84, 2011.
[9] J. Borge-Holthoefer and Y. Moreno, “Absence of influential spreaders in rumor
dynamics,” Phys. Rev. E, vol. 85, 2012.
[10] S. P. Borgatti, “Identifying sets of key players in a social network,” Computational
and Mathematical Organization Theory, vol. 12, pp. 21–34, 2006.
[11] L. Freeman, “A set of measures of centrality based on betweenness,” Sociometry,
vol. 40, pp. 35–41, 1977.
[12] G. Sabidussi, “The centrality index of a graph,” Psychometrika, vol. 31, pp. 581–
603, 1966.
[13] C. H. Comin and L. D. Costa, “Evaluation of node importance in complex
networks,” Phys. Rev. E, vol. 84, 2011.
[14] Y. Yao and D. Liao, “Identifying all-around nodes for spreading dynamics in
complex networks,” Physica A, vol. 391, pp. 4012–4017, 2012.
[15] P. Pantazopoulos, M. Karaliopoulos, and I. Stavrakakis, “On the local approximations of node centrality in internet router-level topologies,” Self-Organizing Systems,
vol. 8221, pp. 115–126, 2014.
[16] K. Avrachenkov, N. Litvak, D. Nemirovsky, E. Smirnova, and M. Sokol, “Monte
carlo methods for top-k personalized pagerank lists and name disambiguation,”
INRIA, Tech Report RR-7367, 2010.
[17] N. Kourtellis, T. Alahakoon, R. Simha, A. Iamnitchi, and R. Tripathi, “Identifying
high betweenness centrality nodes in large social networks,” Social Network
Analysis and Mining, vol. 3, pp. 899–914, 2013.
[18] Y. Lim, D. S. Menasche, B. Ribeiro, D. Towsley, and P. Basu, “Online estimating
the k central nodes of a network,” in IEEE Network Science Workshop, Jun. 2011,
pp. 118–122.
[19] P. Wang, J. Zhao, B. Ribeiro, J. C. Lui, D. Towsley, and X. Guan, “Practical characterization of large networks using neighborhood information,” arXiv:1311.3037v1,
Nov. 2013.
[20] A. S. Maiya and T. Y. Berger-Wolf, “Online sampling of high centrality individuals
in social networks,” Advances in Knowledge Discovery and Data Mining, vol. 6118,
pp. 91–98, 2010.
[21] D. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory, vol. 52, no. 4, pp.
1289–1306, Apr. 2006.
[22] R. Berinde, A. Gilbert, P. Indyk, H. Karloff, and M. Strauss, “Combining geometry
and combinatorics: a unified approach to sparse signal recovery,” in 46th Annual
Allerton Conference on Communication, Control, and Computing, Sep. 2008, pp.
798–805.
[23] E. J. Candes and T. Tao, “Decoding by linear programming,” IEEE Trans. Inf.
Theory, vol. 51, no. 12, pp. 4203–4215, Dec. 2005.
[24] E. J. Candes, “Near-optimal signal recovery from random projections: Universal
encoding strategies,” IEEE Trans. Inf. Theory, vol. 52, no. 12, pp. 5406–5425, Dec.
2006.
[25] D. Donoho and J. Tanner, “Sparse nonnegative solution of underdetermined linear
equations by linear programming,” Natl. Acad. Sci. U.S.A., vol. 102, no. 27, pp.
9446–9451, Mar. 2005.
[26] M. Davenport, M. Duarte, Y. Eldar, and G. Kutyniok, “Introduction to compressed
sensing, chapter in compressed sensing: Theory and applications,” Cambridge
University Press, 2012.
[27] A. C. Sankaranarayanan, P. K. Turaga, R. Chellappa, and R. G. Baraniuk,
“Compressive acquisition of dynamic scenes,” CoRR, abs/1201.4895, pp. 3747–
3752, 2012.
[28] H. Mahyar, H. R. Rabiee, and Z. S. Hashemifar, "UCS-NT: An unbiased compressive sensing framework for network tomography," in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Vancouver, Canada, May 2013, pp. 4534–4538.
[29] H. Mahyar, H. R. Rabiee, Z. S. Hashemifar, and P. Siyari, "UCS-WN: An unbiased compressive sensing framework for weighted networks," in Conference on Information Sciences and Systems (CISS), Baltimore, USA, Mar. 2013.
[30] K. Avrachenkov, N. Litvak, M. Sokol, and D. Towsley, "Quick detection of nodes with large degrees," in Proc. 9th Workshop on Algorithms and Models for the Web Graph, 2012, pp. 54–65.
[31] K. Okamoto, W. Chen, and X. Y. Li, "Ranking of closeness centrality for large-scale social networks," Frontiers in Algorithmics, vol. 5059, pp. 186–195, 2008.
[32] N. Kourtellis, T. Alahakoon, R. Simha, A. Iamnitchi, and R. Tripathi, "Identifying high betweenness centrality nodes in large social networks," Social Network Analysis and Mining, pp. 1–16, 2012.
[33] P. Babarczi, J. Tapolcai, and P. H. Ho, "Adjacent link failure localization with monitoring trails in all-optical mesh networks," IEEE/ACM Trans. Netw., vol. 19, no. 3, pp. 907–920, Jun. 2011.
[34] M. Cheraghchi, A. Karbasi, S. Mohajer, and V. Saligrama, "Graph constrained group testing," IEEE Trans. Inf. Theory, vol. 58, no. 1, pp. 248–262, Jan. 2012.
[35] N. Harvey, M. Patrascu, Y. Wen, S. Yekhanin, and V. Chan, "Nonadaptive fault diagnosis for all-optical networks via combinatorial group testing on graphs," in IEEE INFOCOM, May 2007, pp. 697–705.
[36] J. Tapolcai, B. Wu, P. H. Ho, and L. Rónyai, "A novel approach for failure localization in all-optical mesh networks," IEEE/ACM Trans. Netw., vol. 19, no. 1, pp. 275–285, Feb. 2011.
[37] B. Wu, P. H. Ho, J. Tapolcai, and X. Jiang, "A novel framework of fast and unambiguous link failure localization via monitoring trails," in IEEE INFOCOM, Mar. 2010, pp. 1–5.
[38] M. Wang, W. Xu, E. Mallada, and A. Tang, "Sparse recovery with graph constraints: Fundamental limits and measurement construction," in IEEE INFOCOM, Mar. 2012, pp. 1871–1879.
[39] D. J. Watts and S. H. Strogatz, "Collective dynamics of small-world networks," Nature, vol. 393, no. 6684, pp. 440–442, 1998.
[40] R. Tibshirani, "Regression shrinkage and selection via the LASSO," Journal of the Royal Statistical Society B, vol. 58, pp. 267–288, 1994.
[41] E. J. Candes, M. Rudelson, T. Tao, and R. Vershynin, "Error correction via linear programming," in 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS), Oct. 2005, pp. 668–681.
[42] M. E. J. Newman, "Finding community structure in networks using the eigenvectors of matrices," Preprint physics/0605087, 2006.
[43] W. W. Zachary, "An information flow model for conflict and fission in small groups," Journal of Anthropological Research, vol. 33, no. 4, pp. 452–473, 1977.
[44] D. Lusseau, "The emergent properties of a dolphin social network," Proceedings of the Royal Society of London, Series B: Biological Sciences, vol. 270, pp. S186–S188, Nov. 2003.
[45] D. E. Knuth, The Stanford GraphBase: A Platform for Combinatorial Computing. Addison-Wesley, Reading, MA, 1993.
[46] M. Newman, A collection of network data sets, Aug. 2013. [Online]. Available: http://www-personal.umich.edu/~mejn/netdata/
[47] SPArse Modeling Software (SPAM). [Online]. Available: http://spams-devel.gforge.inria.fr/index.html
[48] W. Xu, E. Mallada, and A. Tang, "Compressive sensing over graphs," in IEEE INFOCOM, Apr. 2011, pp. 2087–2095.