2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining

Detection of Top-K Central Nodes in Social Networks: A Compressive Sensing Approach

Hamidreza Mahyar, Department of Computer Engineering, Sharif University of Technology (SUT), Email: [email protected]

Abstract—In analyzing the structural organization of a social network, identifying important nodes has been a fundamental problem. The concept of network centrality deals with the assessment of the relative importance of a particular node within the network. Most of the traditional network centrality definitions have a high computational cost and require full knowledge of the network topological structure. On the one hand, in many applications we are only interested in detecting the top-k central nodes of the network with the largest values of a specific centrality metric. On the other hand, it is not feasible to efficiently identify central nodes in a large real-world social network via calculation of centrality values for all nodes. As a result, recent years have witnessed increased attention toward the challenging problem of detecting the top-k central nodes in social networks with high accuracy and without full knowledge of the network topology. To this end, in this paper we present a compressive sensing approach, called CS-TopCent, to efficiently identify such central nodes by exploiting a sparsity property of social networks. Extensive simulation results demonstrate that our method converges to an accurate solution for a wide range of social networks.

Index Terms—Compressive Sensing; Detection of Central Nodes; Top-k List of Nodes; Social Networks.

I. INTRODUCTION

In recent years, the study of networks (collections of nodes joined in pairs by links) has been an active area inspired mostly by the empirical study of real-world systems.
They represent significant non-trivial topological features, with patterns of connection between nodes that are neither purely random nor purely regular. Typical examples of these networks include large communication systems (e.g., the Internet, the telephone network, the WWW), technological and transportation infrastructures (e.g., railroad and airline routes), biological systems (e.g., gene and/or protein interaction networks), information systems (e.g., the network of citations between academic papers), and a variety of social interaction structures (e.g., online social networks) [1–3]. In analyzing the structural organization of a network, identifying important nodes has been a fundamental problem. Node importance can be utilized in sorting the search results of a search engine [4], identifying key actors in a terrorist network, controlling the spread of diseases in a biological network [5], cooperative localization in a wireless sensor network [6], preventing blackouts caused by cascading failures [7], detecting influential directors in a governance network [8], investigating the absence of influential spreaders in rumor dynamics [9], and detecting key players and marketing targets in a social network [10]. By identifying such central nodes, one can efficiently devise strategies for the prevention of diseases or crime, effective marketing plans, and so on. ASONAM '15, August 25-28, 2015, Paris, France © 2015 ACM. ISBN 978-1-4503-3854-7/15/08 $15.00 DOI: http://dx.doi.org/10.1145/2808797.2808811 The concept of network centrality, a fundamental term in Social Network Analysis (SNA), deals with the assessment of the relative importance of a particular node within the network according to some criteria. This concept has been around for decades, and many different centrality measures have been proposed over the years [11; 12]. Each of them targets a different goal and considers node centrality from a different point of view.
The conventional measures of node centrality considered in this paper are degree centrality and betweenness centrality. A good measure should usually combine information from both global properties and the local neighborhood; hence, many researchers combine these indicators into a new one to identify central nodes in networks [13; 14]. On the one hand, most of the traditional network centrality definitions have a high computational cost and require full knowledge of the network topological structure. For instance, the conventional betweenness centrality requires solving the all-pairs shortest-paths (APSP) problem in the network, which has long been known to be infeasible in large social networks. When complete structural information of a network is available, there exist approximation and exact approaches that can obtain the central nodes. However, for networks for which complete structural information is not available, these algorithms are no longer adequate for the task [15]. On the other hand, in many applications we are only interested in detecting the top-k nodes with the largest values of a specific centrality measure. It is often crucial to efficiently detect the top-k most central nodes of a network, while the exact order within the top-k list, as well as the exact values of the node centralities, are far less important [16]. For most purposes, the exact value of node centrality is irrelevant; what matters is the relative importance of nodes. Moreover, for the vast majority of applications it is sufficient to identify a set of nodes of similar importance; hence, identification of the top-k most important nodes is remarkably more relevant than precisely ordering the nodes based on their relative centrality [17].
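To make the "top-k set matters more than exact order" point concrete, once centrality scores are available the top-k set can be extracted without a full sort via partial selection; a minimal sketch with Python's `heapq.nlargest` (the centrality dictionary below is purely hypothetical):

```python
import heapq

# Hypothetical pre-computed centrality scores (node id -> value);
# obtaining these for all nodes is exactly what is expensive in practice.
centrality = {1: 0.9, 2: 0.1, 3: 0.7, 4: 0.4, 5: 0.8}

# heapq.nlargest runs in O(n log k), cheaper than a full O(n log n) sort
# when k << n; it returns the k node ids with the largest values.
top_k = heapq.nlargest(3, centrality, key=centrality.get)
print(top_k)  # -> [1, 5, 3]
```

Note that this only sidesteps the sorting cost, not the cost of computing the scores themselves, which is the harder problem addressed in this paper.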
If the adjacency list of the network is known (which is often not the case in social networks), the straightforward method that comes to mind, after measuring the centrality values of all nodes, is to use one of the standard sorting algorithms such as Quicksort or Heapsort. However, even their modest average complexity of O(n log n) can be very high for large-scale social networks. It is therefore natural to develop accurate algorithms for efficiently computing high-quality approximations of the node centralities. A common method for this purpose is to utilize network sampling approaches. To estimate the characteristics of a network, these algorithms must perform at least two steps [18]: (1) a subset of nodes in the network must be sampled, and (2) the characteristics of the nodes of interest must be estimated on the induced sub-graph consisting of the sampled nodes. It is noteworthy that the two sub-problems mentioned above yield two sources of error when estimating the top-k most central nodes of the network through sampling: (i) sampling (collection) error, due to the fact that only a partial view of the network might be available, and (ii) identification error, due to the fact that even if a complete view of the network is available, the identification of the top-k central nodes might be inaccurate. Furthermore, because of the massive scale, distributed management, and access limitations of real-world social networks, direct measurement of each individual node in sampling methods can be operationally difficult, with too much overhead and cost. Consequently, proposing a new approach for efficiently detecting the top-k central nodes of a social network in an indirect manner, without full knowledge of the network topological structure, to overcome the above shortcomings is an inevitable task in social network analysis. In this paper, we address this substantial problem.

II.
PROBLEM STATEMENT AND MAIN IDEA

As previously stated, in a large number of real-life applications we only need to efficiently detect the top-k central nodes of a network with respect to a specific centrality metric [16; 17]. However, it is not feasible to efficiently identify central nodes in a large real-world social network via calculation of centrality for all the nodes. In this case, a prevalent approach for this task is to use network sampling methods [19]. In such methods, one collects a subset of nodes as the sample set and then approximates the centrality values of the sampled nodes on the induced sub-graph. Finally, the nodes with the highest centrality values are selected as the top-k most central nodes of the network and the remaining nodes are completely discarded [20], which is reminiscent of compression algorithms. These popular approaches have three major drawbacks: 1) They yield two sources of error: sampling (collection) error and identification (compression) error. 2) Sampling at the complete rate and then removing the least significant centrality coefficients obviously wastes system resources. 3) Constructing devices or proposing algorithms capable of sampling at the complete rate and directly measuring each individual node can be difficult, costly, and sometimes impossible due to the massive scale, distributed management, and access limitations of large real-world social networks. Thus, proposing an efficient approach that addresses the problem of identifying the top-k central nodes in a social network while overcoming the aforementioned disadvantages is our main motivation for this paper. Two main questions arise with this kind of processing [21]: "Why go to so much effort to acquire all the data in sampling when most of what we get will be thrown away? Can we not just directly measure the part that will not end up being thrown away?"
In contrast to the conventional methods that acquire all the sample data first and then compress it, in this paper we use compressive sensing theory, which aims to sample and compress sparse signals simultaneously. It indicates that, by taking advantage of the sparsity property, one can efficiently and accurately recover high-dimensional vectors from a much smaller number of non-adaptive measurements or incomplete observations. In large-scale social networks, it is remarkable to develop methods that can recover high-dimensional unknown node characteristics from a total number of measurements much smaller than their dimension. This is possible if we have prior knowledge about some properties of the nodes, i.e., sparsity, in the networks. In our problem, the number of top-k central nodes is much smaller than the total number of all nodes, which is precisely the sparsity property in a social network. Compressive Sensing, also known as Compressed Sensing or Compressive Sampling (CS) [21–25], is a recent research domain in signal processing and information theory that has drawn much attention for its capability to efficiently acquire and extract sparse information. Over the last couple of years, CS has been applied in several fields such as astronomy, biology, image and video processing, medicine, and cognitive radio [26; 27], but its applications in networks [28; 29] are still in their early stages due to some challenges. One of the most limiting challenges is the construction of a measurement matrix that is feasible under two fundamental constraints: (1) Although most existing CS results rely critically on the assumption that any subset of vector entries can be aggregated together [21; 23], this assumption is not necessarily true in network monitoring problems, where only nodes that induce a path or connected sub-graph can be aggregated together in the same measurement.
In other words, measurements are limited by the network topological constraints. (2) More substantially, in networks the measurement matrix belongs to a more restrictive class, taking only non-negative integer entries, while random Gaussian measurement matrices are usually used in the CS literature. As a result, compressive sensing in networks, in comparison with other CS problems, is entirely different and interesting in its own right, because we can represent a network by its graph. Therefore, the main idea behind this paper is to propose a new approach, for the first time, to efficiently identify the top-k high-centrality nodes of social networks in an indirect manner and without full knowledge of the network structure via the compressive sensing framework.

III. MODEL AND PROBLEM FORMULATION

Consider the network G = (V, E), where V represents the set of nodes (vertices) with cardinality |V| = n and E the set of links (edges) with cardinality |E|. We define the neighborhood set of node v ∈ V reachable in h hops as N^h(v) = {v' ∈ V | v' ≠ v and d_G(v, v') ≤ h}, where d_G is the geodesic distance. Centrality provides the standard means of comparing nodes in networks. The simplest of all centrality measures is the degree centrality [30]:

C_D(v) = |N^1(v)| / (n − 1)   (1)

which measures the connectivity of a certain node v. However, the degree centrality of a node in a large social network may not be representative of its influence on the whole network.
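Eq. (1) can be sketched directly from an adjacency list; the toy star graph below is a made-up example, not one of the paper's datasets:

```python
def degree_centrality(adj):
    """Eq. (1): C_D(v) = |N^1(v)| / (n - 1), from an adjacency-list dict."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}

# Toy star graph: node 0 is connected to nodes 1..3.
adj = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(degree_centrality(adj))  # node 0 touches all others, so C_D(0) = 1.0
```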
A more involved measure is closeness centrality, which is defined through the average distance of node v to all other nodes in the network as [31]:

C_C(v) = ( Σ_{u∈V} d_G(v, u) ) / (n − 1)   (2)

Since the above equation really describes "farness", it is also common to take its reciprocal to justify the term "closeness". The most popular measure is perhaps the betweenness centrality, which measures the proportion of shortest paths in the network that go through node v and can be introduced by C_B(v), where [32]:

C_B(v) = Σ_{u,w∈N(v)} σ_uw(v) / σ_uw   (3)

where σ_uw is the number of equal-length shortest paths between nodes u and w, and σ_uw(v) is the number of those that pass through node v. After defining the main centrality measures in social networks, let us model and formulate the problem of detecting the top-k central nodes in networks using the strong mathematical framework of compressive sensing. Considering the network G = (V, E), suppose every node i has a real value x_i, and the vector x = (x_i | i = 1, 2, ..., n) is associated with the set V. The ℓ_p-norm of the vector x is defined as [21]:

‖x‖_p = ( Σ_{i=1}^{n} |x_i|^p )^{1/p}   (4)

Note that for p = 0, ‖x‖_0 is the number of non-zero elements in x; for p = 1, ‖x‖_1 is the sum of the absolute values of the elements of x; for p = 2, ‖x‖_2 is the usual Euclidean norm; and for p = ∞, ‖x‖_∞ is the maximum of the absolute values in x. We call x a k-sparse vector if ‖x‖_0 = k, namely x has only k non-zero elements; in other words, the sparsity of the vector x is k. For instance, the top-k central nodes have the sparsity property in social networks, as the number of these nodes is much smaller than the number of all nodes in the network. Suppose that we have m measurements over the network, each of which is a connected sub-graph of G. Based on compressive sensing in networks, we would like to efficiently identify the k central nodes from these m measurements considering the network topological constraints. Let x ∈ R^n be a non-negative integer vector whose p-th entry is the value over node p, and let y ∈ R^m denote the vector of m measurements whose q-th entry represents the total additive value of the nodes in a connected sub-graph of G. Let A be an m × n measurement matrix whose i-th row corresponds to the i-th measurement. For i = 1, ..., m and j = 1, ..., n, A_ij = 1 if and only if the i-th measurement includes node j, and zero otherwise. Hence, in compact form we can write this linear system as:

y_{m×1} = A_{m×n} x_{n×1}   (5)

[Fig. 1: An example network with n = 6 nodes, |E| = 10 links, and three path measurements.]

For example, for the network in Fig. 1 with n = 6 nodes, |E| = 10 links, and m = 3 path measurements, a feasible measurement matrix A for measuring node features is:

           v1  v2  v3  v4  v5  v6
  m1: v5 [  0   1   0   1   1   0 ]
  m2: v1 [  1   0   1   1   1   1 ]   (6)
  m3: v5 [  0   0   1   0   1   0 ]

In compressive sensing, the set of sparse solutions to this system is of interest; thus, we need to add a constraint to limit the solution space. Now, the main question is how to estimate the node vector x from the measurement vector y in the case of an under-determined system (m ≪ n). In this case, the system has numerous solutions, and by the fundamental theory of linear algebra, reconstruction of a unique vector is impossible. However, it is still possible if we add the constraint that the vector x is sufficiently sparse (e.g., the number of top-k central nodes is often much smaller than the number of all nodes), which is a reasonable assumption in our problem (k ≪ n). It is worth noting that sparse recovery over networks using compressive sensing has a closely related field called graph-constrained group testing [33–37].
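The linear system of Eq. (5) can be illustrated with a small 3 × 6 path-measurement matrix in the spirit of the Fig. 1 example; the concrete entries and the 2-sparse node-value vector x below are illustrative assumptions, not data from the paper:

```python
import numpy as np

# Toy feasible measurement matrix: rows are path measurements,
# columns are nodes v1..v6; A[i][j] = 1 iff measurement i visits node j.
A = np.array([[0, 1, 0, 1, 1, 0],
              [1, 0, 1, 1, 1, 1],
              [0, 0, 1, 0, 1, 0]])

# Hypothetical 2-sparse node-value vector (only v4 and v5 are non-zero).
x = np.array([0, 0, 0, 3, 7, 0])

# Eq. (5): each measurement aggregates the values of the nodes on its path.
y = A @ x
print(y)  # each entry is the additive sum of node values along one path
```

With m = 3 < n = 6 the system is under-determined, which is exactly why the sparsity constraint on x is needed for recovery.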
Group testing and compressive sensing over networks have the same requirements for the measurement matrix, and the differences lie only in the following: (1) x is a logical vector in group testing, instead of a real vector as in the CS problem, and (2) the operations used in each group-testing measurement are the logical "AND" and "OR", in contrast to the additive linear mixing of the vector x over the real numbers in compressive sensing. Note that compressive sensing can perform better than group testing in terms of the required number of measurements [38]. Hence, we use compressive sensing throughout this paper. In addition, CS can abstractly model complex systems even when the measurements from certain elements are not available. Therefore, our proposed approach can potentially be used in other applications besides social network analysis, e.g., understanding the global diffusion of information.

IV. THE PROPOSED METHOD: CS-TOPCENT

In this section, we propose a compressive sensing approach for the detection of the top-k central nodes (called CS-TopCent) in social networks. In this method, we construct a feasible measurement matrix A to infer social networks and identify the top-k central nodes inside a network via indirect measurements. The pseudo-code of the proposed method is shown in Algorithm 1. This algorithm generally includes seven steps: (i) Every node v ∈ V locally computes its weight W(v) in lines (6)-(8). (ii) A first node is selected relative to P(v), which is calculated for all nodes v ∈ V in the graph G in lines (10)-(13). (iii) The transition matrix is constructed based on the transition probabilities Ptrans in lines (16)-(19), such that Ptrans(v, u) is the probability of moving from node v to node u.
(iv) The next node is selected under two different options, according to whether a node exists in the neighbor set of the current node, proportional to the probabilities Ptrans(vcurrent, u), in lines (15)-(25). The traversed link should not be visited any more by that measurement. (v) The update function is called in line (26) and performed according to Algorithm 2. (vi) Steps (iii), (iv), and (v) are carried out l times, where l is the length of a measurement, to generate a new row of the matrix A in lines (14)-(28). (vii) All the previous steps are repeated m times to construct a feasible measurement matrix with m measurements in lines (9)-(30). Now, we describe these steps in detail. As we want to recover the top-k central nodes as a sparse property of social networks, we try to traverse these nodes more often than the other nodes with our measurements. To achieve this, we assign a weight to the nodes of the network based on the local clustering coefficient [39], defined as the proportion of links between the nodes within a node's neighborhood divided by the number of links that could possibly exist between them. We assume each node knows its neighbor nodes. More formally, for a node v ∈ V, the local clustering coefficient is [39]:

C(v) = 2 |{e_uw : u, w ∈ N^1(v), e_uw ∈ E}| / ( |N^1(v)| (|N^1(v)| − 1) )   (7)

where e_uw is the link between nodes u and w. The node weights can be computed in a distributed fashion by letting each node locally compute its local clustering coefficient using its degree and the degrees of its neighbors.
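A minimal sketch of the local clustering coefficient of Eq. (7), computed from an adjacency-list dict (the toy graph is a made-up example):

```python
from itertools import combinations

def local_clustering(adj, v):
    """Eq. (7): fraction of realized links among the neighbors of v."""
    nbrs = adj[v]
    d = len(nbrs)
    if d < 2:
        return 0.0  # fewer than two neighbors: no possible neighbor links
    links = sum(1 for u, w in combinations(nbrs, 2) if w in adj[u])
    return 2 * links / (d * (d - 1))

# Toy graph: triangle 0-1-2 plus a pendant node 3 attached to 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(local_clustering(adj, 0))  # one of three neighbor pairs is linked -> 1/3
```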
In this method, three situations may occur for a link of the network G during measurement construction: (1) the link is not selected by that measurement; (2) it is visited once by that measurement and then removed (never visited again by that measurement); and (3) it is visited once and, if back-tracking to the previous node is needed, it is visited a second time. Note that after a link removal we need to update the transition matrix, so the update function is called in step (v). As shown in Algorithm 2, we recalculate the transition probabilities for vcurrent, vnext, and all their neighbors. We expect this update function to yield a more accurate method.

Algorithm 1 The Proposed Method: CS-TopCent
Input: V(G): set of network nodes; m: number of measurements; l: measurement length
 1: A = NULL                 /* initialize measurement matrix */
 2: Ptrans = NULL            /* initialize transition matrix */
 3: for each v ∈ V do        /* local computation at each node */
 4:   W(v) = 2 |{e_uw : u, w ∈ N^1(v), e_uw ∈ E}| / ( |N^1(v)| (|N^1(v)| − 1) )
 5: end for
 6: for i = 1 → m do
 7:   for each v ∈ V do      /* first node selection */
 8:     P(v) = (1 / (n − 1)) ( 1 − W(v) / Σ_{u∈V} W(u) )
 9:   end for
10:   vcurrent = select the first node relative to P(v)
11:   for j = 1 → l do
12:     if ∃ u ∈ N^1(vcurrent) then   /* next node selection */
13:       for each u ∈ N^1(vcurrent) do
14:         Score_u = 1 − Σ_{v∈N^1(u)} W(u, v)
15:         Ptrans(vcurrent, u) = Score_u / Σ_u Score_u
16:       end for
17:       vnext = select the next node relative to Ptrans(vcurrent, u)
18:       N^1(vcurrent) = N^1(vcurrent) − {vnext}
19:       N^1(vnext) = N^1(vnext) − {vcurrent}
20:     else
21:       vnext = trace back to the previous node
22:     end if
23:     CALL update(Ptrans, vcurrent, vnext)
24:     vcurrent = vnext
25:   end for
26:   Add the visited nodes to the matrix A as a new row
27: end for
Output: feasible measurement matrix A
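The construction of a single measurement row can be sketched as follows. This is a simplified, non-authoritative rendering of the walk in Algorithm 1: the scoring term is condensed to `1 − W(u)` per neighbor, the per-step Ptrans update is folded into re-scoring, and all helper names are ours:

```python
import random

def one_measurement(adj, W, length, rng):
    """Sketch of one measurement (one row of A): a weighted walk that
    removes traversed links and backtracks when stuck. W maps each node
    to its local clustering coefficient (Eq. (7))."""
    adj = {v: set(n) for v, n in adj.items()}   # local copy; links get removed
    total = sum(W.values()) or 1.0
    nodes = list(adj)
    # First-node selection, favoring low-clustering (bridge-like) nodes,
    # in the spirit of P(v) = (1/(n-1)) (1 - W(v)/sum_u W(u)).
    first_w = [max(1e-9, 1 - W[v] / total) for v in nodes]
    cur = rng.choices(nodes, weights=first_w)[0]
    visited, trail = {cur}, []
    for _ in range(length):
        if adj[cur]:
            nbrs = list(adj[cur])
            scores = [max(1e-9, 1 - W[u]) for u in nbrs]  # simplified Score_u
            nxt = rng.choices(nbrs, weights=scores)[0]
            adj[cur].discard(nxt)
            adj[nxt].discard(cur)               # traversed link is removed
            trail.append(cur)
        elif trail:
            nxt = trail.pop()                   # trace back to the previous node
        else:
            break
        visited.add(nxt)
        cur = nxt
    return visited  # row i of A: A[i][v] = 1 iff v was visited

rng = random.Random(7)
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}   # triangle plus pendant node
W = {0: 1.0, 1: 1.0, 2: 1 / 3, 3: 0.0}                # clustering coefficients
print(sorted(one_measurement(adj, W, 3, rng)))
```

The walk is stochastic, so repeated calls with different seeds produce different rows, which is what populates the m rows of A.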
In the proposed method, to efficiently recover the central nodes in the data vector, we select a good start node for each of the m measurements and assign proper probabilities to the neighbors of the current node for measuring the best next node, according to steps (ii), (iii), and (iv). For every measurement, we first select a good start node proportional to the probabilities P(v), and then select the next node relative to the probabilities Ptrans. The next node is chosen l times, where l is the length of a measurement, in step (vi). Calculating the transition probability involves two steps, Scoring and Normalization, in step (iii). Because of link removal, it is possible that a node does not have any neighbor to select as the next node; in this case, we trace back to the previously visited node, as shown in line (24). The set of visited network nodes constitutes a measurement, added as a new row to the measurement matrix A.

Algorithm 2 The Update Algorithm for CS-TopCent: update(Ptrans, vcurrent, vnext)
Input: Ptrans: transition matrix; vcurrent: current node; vnext: next node
1: Ptrans(vcurrent, vnext) = 0
2: for each u ∈ N^1(vcurrent) do
3:   Recalculate Ptrans(vcurrent, u)
4:   Recalculate Ptrans(u, vcurrent)
5: end for
6: for each u ∈ N^1(vnext) do
7:   Recalculate Ptrans(vnext, u)
8:   Recalculate Ptrans(u, vnext)
9: end for
Output: Ptrans

Overall, we construct a feasible measurement matrix with non-negative integer entries by using m measurements with step size l, as stated in steps (vi) and (vii). In the proposed approach, each measurement goes through a connected sub-graph, which evidences the feasibility of the measurement matrix A under the network topological constraints. After constructing the measurement matrix A via the CS-TopCent algorithm and adding the cumulative sum of the values of the visited nodes to the vector y for each measurement, we form the linear system y_{m×1} = A_{m×n} x_{n×1}. Finally, we want to find the sparse solution of this system, so we use the LASSO model [40; 41] as the reconstruction method for the optimization step, defined by:

min_x ‖x‖_1 + ‖Ax − y‖_2^2   (8)

We will experimentally evaluate the performance of our approach, CS-TopCent, with extensive simulations on various networks in the next section.

V. EXPERIMENTAL EVALUATION

In this section, we evaluate the performance of the proposed method, CS-TopCent, under various configurations. First, we introduce the datasets used for the evaluation. Next, we explain the settings of the tests. Finally, the achieved results and their analyses are presented.

A. Datasets

We consider some well-known real-world social networks as test data: (1) NetSci, the coauthorship network of scientists [42], with 1589 nodes and 2742 links; (2) Zachary's Karate Club [43], with 34 nodes and 78 links; (3) the Dolphin Social Network [44], with 62 nodes and 159 links; (4) Les Miserables, a coappearance network [45], with 77 nodes and 254 links; and (5) Books about US Politics [46], with 105 nodes and 441 links.

B. Settings

In each of the test cases for the datasets, we generated 10 sets of measurements. For each network and each set of measurements, we performed the experiments. The denoted points in the figures represent the mean value of the tests over all sets, with their asymmetric standard deviation. To evaluate the accuracy of our approach, we measure the precision and recall of the method. Precision refers to the number of correctly recovered nodes in the list of top-k central nodes divided by the total number of recovered nodes, and recall refers to the number of correctly recovered nodes in the list of top-k central nodes divided by the total number of nodes in the network. To avoid the trade-off between precision and recall and to consider both, we use the F-measure metric, the harmonic mean of precision and recall, defined as:

F-measure = 2 × (Precision × Recall) / (Precision + Recall)   (9)

The standard deviation in each figure quantifies the amount of variation of the F-measures at each point. For the optimization step, we use the SPAMS package in MATLAB [47]. In this paper, we consider two popular node centrality measures, degree and betweenness centrality, throughout the experiments. We evaluate our approach in two different scenarios: (1) to measure the effect of compressive sensing in our approach to the problem, we show the rankings produced by CS-TopCent in comparison with conventional methods for detection of the top-k central nodes, and (2) we compare our method with the work in [48], RW, which is one of the state-of-the-art methods for sparse recovery in networks via compressive sensing and indirect measurement of nodes.

C. Evaluation Results

Experiment 1 (Effect of Compressed Sensing): As previously stated, it is not feasible to efficiently identify the top-k central nodes in a social network via calculation of centrality values for all the nodes. In this case, a common approach is to use network sampling methods, which have the three major drawbacks mentioned in Section II. Therefore, we suggest a new approach based on compressive sensing theory to efficiently detect the top-k central nodes using indirect measurements. In Table I, we compare our proposed method CS-TopCent with the traditional degree centrality ranking of the network nodes. For the two example networks, the top 20 high degree nodes (by their IDs) are listed, without any specific order, for both the conventional method and our proposed approach. In the conventional method, we suppose that the network topological structure is available and each network node can be measured directly; thus we sort the nodes according to their degrees and then select the top-20 list of high degree nodes.
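The optimization step of Eq. (8) and the F-measure of Eq. (9) can be sketched as follows; ISTA (iterative soft-thresholding) is used here as a simple stand-in for the SPAMS LASSO solver, and the random 0/1 matrix is an illustrative assumption rather than a matrix produced by CS-TopCent:

```python
import numpy as np

def ista(A, y, lam=0.1, iters=2000):
    """Minimize ||Ax - y||_2^2 + lam * ||x||_1 by iterative soft-thresholding
    (a simple stand-in for the SPAMS LASSO solver used in the paper)."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        g = x - (A.T @ (A @ x - y)) / L    # gradient step
        x = np.sign(g) * np.maximum(np.abs(g) - lam / (2 * L), 0.0)  # shrink
    return x

def f_measure(recovered, truth):
    """Eq. (9): harmonic mean of precision and recall over top-k node sets."""
    tp = len(recovered & truth)
    if tp == 0:
        return 0.0
    p, r = tp / len(recovered), tp / len(truth)
    return 2 * p * r / (p + r)

rng = np.random.default_rng(0)
n, m, k = 40, 20, 3
A = rng.integers(0, 2, size=(m, n)).astype(float)  # toy 0/1 measurement matrix
x_true = np.zeros(n)
x_true[[3, 17, 29]] = [5.0, 4.0, 3.0]              # hypothetical k-sparse signal
x_hat = ista(A, A @ x_true, lam=0.05)
top = set(np.argsort(-np.abs(x_hat))[:k])          # recovered top-k support
print(f_measure(top, {3, 17, 29}))
```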
The recovery percentage for each network demonstrates that our proposed method can efficiently recover high degree centrality nodes even without direct measurement of the network nodes and without full knowledge of the network topology. In Table II, we compare our proposed method CS-TopCent with the conventional method for calculating the top-10 list of high betweenness centrality nodes in two other networks. The recovery percentage shows the accuracy of the proposed method. The values of k in the detection of the top-k central nodes differ between the two tests due to the sizes of the networks.

TABLE I: Effect of CS in detecting the top 20 high degree centrality nodes. For each network, the first two columns list the high degree centrality nodes by their IDs, without specific order, for the conventional method and the proposed method CS-TopCent. The Recovered column indicates whether each node recovered by our method exists in the top-20 list of high centrality nodes of the conventional method.

Books:
  Degree:      9 13  4 85 73 67 74 31 12 41 48 10 75 76 11 72 87 14 59 77
  CS-TopCent:  4  9 13 31 41 59 67 72 73 74 85  7 10 12 48 87  5  8 11 14
  Recovered:   1  1  1  1  1  1  1  1  1  1  1  0  1  1  1  1  0  0  1  1
  % of recovery: 85%

LesMis:
  Degree:     12 49 56 28 26 24 59 63 65 64 66 25 27 42 58 60 62  1 67 69
  CS-TopCent: 12 49 52 40 26 56 24 28 71 63 42  1 59 25 27 50 69 58 65 57
  Recovered:   1  1  0  0  1  1  1  1  0  1  1  1  1  1  1  0  1  1  1  0
  % of recovery: 75%

TABLE II: Effect of CS in detecting the top 10 high betweenness centrality nodes. For each network, the first two columns list the high betweenness centrality nodes by their IDs, without specific order, for the conventional method and the proposed method CS-TopCent. The Recovered column indicates whether each node recovered by our method exists in the top-10 list of high centrality nodes of the conventional method.

Karate:
  Betwn.:     34  1 33  3  2  4 32  9 14 24
  CS-TopCent: 34 29 14  3 20 28 32 33  1  9
  Recovered:   1  0  1  1  0  0  1  1  1  1
  % of recovery: 70%

Dolphin:
  Betwn.:     37  2 41 38  8 18 21 15 44 55
  CS-TopCent: 15 21 44  2 18 37 38 46 41  9
  Recovered:   1  1  1  1  1  1  1  0  1  0
  % of recovery: 80%

Experiment 2 (Effect of the number of measurements on accuracy for degree centrality): Fig. 2 shows the performance evaluation of our method CS-TopCent in comparison with the RW method [48], in terms of accuracy in detecting high degree centrality nodes for different numbers of measurements. We set the length l of a measurement to n/2. Each point on the horizontal axis is proportional to the number of required measurements divided by the number of all nodes in the network. As shown, in all test cases our CS-TopCent method performs better than RW, with a higher F-measure for most numbers of measurements. In addition, our method achieves a higher F-measure even with a small number of measurements (i.e., when the number of measurements is less than half the number of nodes in the network) compared to RW. This improvement can be very important in situations where performing measurements has a high cost and the goal is an acceptable recovery at a reasonable cost. The percentage of improvement for each network is stated below the figures. The reasons for this improvement in recovery can be explored in several ways. First, in our approach we avoid traversing links more than twice, through the cases defined in Algorithm 1; this leads to coverage of a greater part of the network compared to RW, in which no particular measure is explicitly taken to avoid this issue. Second, an efficient neighbor-selection method in the measurements leads to a fair coverage of nodes. Third, after each transition we call the update function, shown in Algorithm 2, to account for all changes and obtain a more accurate solution.

Experiment 3 (Effect of measurement length on accuracy for degree centrality): Fig. 3 shows the performance evaluation of our method CS-TopCent in comparison with the RW method, in terms of accuracy in detecting high degree centrality nodes for different measurement lengths and a fixed number of measurements. In this experiment, for all networks and for each percentage of recovery, we ran a set of n/5 measurements. It is noteworthy that we set the number of measurements to 20% of the number of network nodes to show that our approach outperforms the RW method even with a small number of measurements. Each point on the horizontal axis is proportional to the length of the measurement divided by the number of all nodes in the network. As clearly depicted, in all test cases our proposed method has a higher F-measure for most measurement lengths. The percentage of improvement for each network is stated below the figures.

Experiment 4 (Effect of the number of measurements on accuracy for betweenness centrality): Fig. 4 shows the performance evaluation of our method in comparison with the RW method for the detection of high betweenness centrality nodes for different numbers of measurements. The measurement length l is set to n/2. Each point on the horizontal axis is proportional to the number of required measurements divided by the number of all nodes. Our method performs better than RW in all test cases, with a higher F-measure for most numbers of measurements, even with a small number of measurements. The reasons for the better recovery results are the same as in the previous experiments. In [18], it is noted that degree centrality can be considered an alias for identifying nodes with high betweenness centrality. To achieve this goal in the CS-TopCent framework, we measure both the node's importance from an information-flow standpoint and the topological location of the node in the connected region.
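The "% of recovery" rows in Tables I and II can be reproduced with a small helper (a hypothetical function name, not part of CS-TopCent); the Karate IDs below follow Table II:

```python
def recovery_percentage(recovered_ids, reference_top_k):
    """Share of nodes returned by the method that also appear in the
    conventional top-k list (the '% of recovery' rows in Tables I-II)."""
    reference = set(reference_top_k)
    hits = sum(1 for v in recovered_ids if v in reference)
    return 100.0 * hits / len(recovered_ids)

# Karate example from Table II: conventional top-10 betweenness IDs
# versus the IDs recovered by CS-TopCent.
conventional = [34, 1, 33, 3, 2, 4, 32, 9, 14, 24]
cs_topcent   = [34, 29, 14, 3, 20, 28, 32, 33, 1, 9]
print(recovery_percentage(cs_topcent, conventional))  # -> 70.0
```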
Therefore, our proposed method can efficiently detect central nodes in social networks based on both degree centrality and betweenness centrality. Experiment 5 (Effect of measurement length on accuracy for betweenness centrality): Fig. 5 shows the performance of our method compared with the RW method, in terms of accuracy in detecting high betweenness centrality nodes, for different measurement lengths and a fixed number of measurements. The number of measurements is set to n/5 for all networks and for each percentage of recovery. Each point on the horizontal axis is proportional to the measurement length divided by the total number of nodes. As clearly depicted, in all test cases our proposed method has a higher F-measure for most measurement lengths.

Fig. 2: Experiment 2: Effect of the number of measurements on accuracy for degree centrality, with measurements of length n/2. Improvements: (a) NetSci 91%, (b) Dolphin 4%, (c) Books 14%, (d) Karate 19%, (e) LesMis 3%.

Fig. 3: Experiment 3: Effect of measurement length on accuracy for degree centrality, with n/5 measurements. Improvements: (a) NetSci 78%, (b) Dolphin 2%, (c) Books 18%, (d) Karate 19%, (e) LesMis 7%.

D. Complexity Analysis

Consider the network G = (V, E). Under our assumption that each node keeps a hash-table data structure for its neighbors, checking whether a node is a neighbor of another node takes nearly constant time. A graph traversal algorithm can then check whether every network node appears in the neighbor lists of its neighbors; therefore, this can be done locally for each node v_i in time |N^1(v_i)|·(|N^1(v_i)| − 1), where N^1(v) is the set of neighbors of node v.
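The local neighbor check just described can be sketched with a hash-based adjacency structure. This is an illustrative sketch under the paper's hash-table assumption; the toy graph and function name are hypothetical, not from the paper:

```python
# Illustrative sketch of the complexity argument: storing each node's
# neighbors in a hash set makes a membership test O(1) on average, so the
# local check at node v costs about |N(v)| * (|N(v)| - 1) such tests.

# Adjacency list as a dict of sets (hypothetical small graph).
adj = {
    1: {2, 3},
    2: {1, 3},
    3: {1, 2, 4},
    4: {3},
}

def local_pair_checks(v):
    """Count the neighbor-of-neighbor membership tests done locally at v."""
    tests = 0
    for u in adj[v]:
        for w in adj[v]:
            if u != w:
                tests += 1
                _ = w in adj[u]  # O(1) average-case hash lookup
    return tests

print(local_pair_checks(3))  # |N(3)| * (|N(3)| - 1) = 3 * 2 = 6
```

Since a node can have at most n − 1 neighbors, this per-node cost is bounded by (n − 1)(n − 2), which gives the O(n^2) worst case stated in the analysis.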
Since the above check can be done locally and each node has at most n − 1 neighbors, the worst-case computational cost of this task is O(n^2), where |V| = n. Lines (10)-(12) of Algorithm 1 can be moved outside the outer for loop and executed just once; thus, computing Σ_{u∈V} W(u) costs O(n), and the value P(v) for any node v can then be computed locally in constant time. In the algorithm, selecting the best next node by checking the computed values at each node can also be done in O(n). Lines (15)-(25) of our algorithm can easily be done in O(n). Moreover, the update function called in line (26) costs at most O(n) to update the transition probabilities of the current node and the next node. The next-node assignment in line (27) takes constant time. Therefore, the total computation time is O(n^2 + m·l·n), where m is the number of measurements, n is the number of nodes, and l denotes the measurement length. The space complexity is O(n^2) for the transition matrix and O(m·n) for the measurement matrix. In addition, each node locally stores information about its neighbors in at most O(n) space. Therefore, the overall space complexity is O(n^2 + m·n), where m is the number of measurements and n is the number of network nodes.

VI. CONCLUSION AND FUTURE WORK

In this paper, we investigated the problem of detecting the top-k central nodes in social networks. We aimed to overcome the disadvantages of sampling approaches, such as sampling error, identification error, low precision, high computational cost, direct measurement of network nodes, and the need for full knowledge of the topological structure. Using compressive sensing theory, we proposed a new approach, called CS-TopCent, that constructs a feasible measurement matrix for efficiently detecting the top-k nodes under a specific centrality metric (namely, degree centrality or betweenness centrality) in social networks.
The simulation results demonstrate that, via indirect measurements and without full knowledge of the network's topological structure, our proposed approach improves the accuracy of detecting high-centrality nodes, in terms of F-measure, compared with related work. As future work, we plan to propose an efficient method for detecting the top-k central nodes based on other centrality metrics, such as closeness centrality and PageRank centrality.

VII. ACKNOWLEDGEMENTS

I would like to thank my supervisors, Prof. Ali Movaghar and Prof. Hamid R. Rabiee, for the patient guidance, encouragement, and advice they have provided throughout my time as their student.

REFERENCES
[1] S. H. Strogatz, “Exploring complex networks,” Nature, vol. 410, pp. 268–276, Mar. 2001.
[2] R. Albert and A.-L. Barabasi, “Statistical mechanics of complex networks,” Rev. Mod. Phys., vol. 74, pp. 47–97, 2002.
[3] S. Dorogovtsev and J. F. F. Mendes, “Evolution of networks,” Advances in Physics, vol. 51, pp. 1079–1187, 2002.
[4] S. Brin and L. Page, “The anatomy of a large-scale hypertextual web search engine,” Computer Networks and ISDN Systems, vol. 30, pp. 107–117, 1998.
[5] J. G. Liu, Z. M. Ren, and Q. Guo, “Ranking the spreading influence in complex networks,” Physica A, vol. 392, pp. 4154–4159, 2013.
[6] N. Patwari, J. N. Ash, S. Kyperountas, A. O. Hero, R. L. Moses, and N. S. Correal, “Locating the nodes: cooperative localization in wireless sensor networks,” IEEE Signal Processing Magazine, vol. 22, pp. 54–69, 2005.

Fig. 4: Experiment 4: Effect of the number of measurements on accuracy for betweenness centrality, with measurements of length n/2. Improvements: (a) NetSci 99%, (b) Dolphin 18%, (c) Books 24%, (d) Karate 20%, (e) LesMis 33%.

Improvements in Fig. 5 below: (a) NetSci 93%, (b) Dolphin 23%, (c) Books 28%, (d) Karate 15%, (e) LesMis 40%. Fig.
5: Experiment 5: Effect of measurement length on accuracy for betweenness centrality, with n/5 measurements.
[7] A. E. Motter and Y. C. Lai, “Cascade-based attacks on complex networks,” Phys. Rev. E, vol. 66, 2002.
[8] X. Huang, I. Vodenska, F. Wang, S. Havlin, and H. E. Stanley, “Identifying influential directors in the United States corporate governance network,” Phys. Rev. E, vol. 84, 2011.
[9] J. Borge-Holthoefer and Y. Moreno, “Absence of influential spreaders in rumor dynamics,” Phys. Rev. E, vol. 85, 2012.
[10] S. P. Borgatti, “Identifying sets of key players in a social network,” Computational and Mathematical Organization Theory, vol. 12, pp. 21–34, 2006.
[11] L. Freeman, “A set of measures of centrality based on betweenness,” Sociometry, vol. 40, pp. 35–41, 1977.
[12] G. Sabidussi, “The centrality index of a graph,” Psychometrika, vol. 31, pp. 581–603, 1966.
[13] C. H. Comin and L. D. Costa, “Evaluation of node importance in complex networks,” Phys. Rev. E, vol. 84, 2011.
[14] Y. Yao and D. Liao, “Identifying all-around nodes for spreading dynamics in complex networks,” Physica A, vol. 391, pp. 4012–4017, 2012.
[15] P. Pantazopoulos, M. Karaliopoulos, and I. Stavrakakis, “On the local approximations of node centrality in internet router-level topologies,” Self-Organizing Systems, vol. 8221, pp. 115–126, 2014.
[16] K. Avrachenkov, N. Litvak, D. Nemirovsky, E. Smirnova, and M. Sokol, “Monte Carlo methods for top-k personalized PageRank lists and name disambiguation,” INRIA, Tech. Report RR-7367, 2010.
[17] N. Kourtellis, T. Alahakoon, R. Simha, A. Iamnitchi, and R. Tripathi, “Identifying high betweenness centrality nodes in large social networks,” Social Network Analysis and Mining, vol. 3, pp. 899–914, 2013.
[18] Y. Lim, D. S. Menasche, B. Ribeiro, D. Towsley, and P. Basu, “Online estimating the k central nodes of a network,” in IEEE Network Science Workshop, Jun. 2011, pp. 118–122.
[19] P. Wang, J. Zhao, B. Ribeiro, J. C. Lui, D. Towsley, and X.
Guan, “Practical characterization of large networks using neighborhood information,” arXiv:1311.3037v1, Nov. 2013.
[20] A. S. Maiya and T. Y. Berger-Wolf, “Online sampling of high centrality individuals in social networks,” Advances in Knowledge Discovery and Data Mining, vol. 6118, pp. 91–98, 2010.
[21] D. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006.
[22] R. Berinde, A. Gilbert, P. Indyk, H. Karloff, and M. Strauss, “Combining geometry and combinatorics: a unified approach to sparse signal recovery,” in 46th Annual Allerton Conference on Communication, Control, and Computing, Sep. 2008, pp. 798–805.
[23] E. J. Candes and T. Tao, “Decoding by linear programming,” IEEE Trans. Inf. Theory, vol. 51, no. 12, pp. 4203–4215, Dec. 2005.
[24] E. J. Candes, “Near-optimal signal recovery from random projections: Universal encoding strategies,” IEEE Trans. Inf. Theory, vol. 52, no. 12, pp. 5406–5425, Dec. 2006.
[25] D. Donoho and J. Tanner, “Sparse nonnegative solution of underdetermined linear equations by linear programming,” Proc. Natl. Acad. Sci. U.S.A., vol. 102, no. 27, pp. 9446–9451, Mar. 2005.
[26] M. Davenport, M. Duarte, Y. Eldar, and G. Kutyniok, “Introduction to compressed sensing,” chapter in Compressed Sensing: Theory and Applications, Cambridge University Press, 2012.
[27] A. C. Sankaranarayanan, P. K. Turaga, R. Chellappa, and R. G. Baraniuk, “Compressive acquisition of dynamic scenes,” CoRR, abs/1201.4895, pp. 3747–3752, 2012.
[28] H. Mahyar, H. R. Rabiee, and Z. S. Hashemifar, “UCS-NT: An Unbiased Compressive Sensing Framework for Network Tomography,” in IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013, Vancouver, Canada, May 2013, pp. 4534–4538.
[29] H. Mahyar, H. R. Rabiee, Z. S. Hashemifar, and P.
Siyari, “UCS-WN: An Unbiased Compressive Sensing Framework for Weighted Networks,” in Conference on Information Sciences and Systems, CISS 2013, Baltimore, USA, Mar. 2013.
[30] K. Avrachenkov, N. Litvak, M. Sokol, and D. Towsley, “Quick detection of nodes with large degrees,” in Proc. 9th Workshop on Algorithms and Models for the Web Graph, 2012, pp. 54–65.
[31] K. Okamoto, W. Chen, and X. Y. Li, “Ranking of closeness centrality for large-scale social networks,” Frontiers in Algorithmics, vol. 5059, pp. 186–195, 2008.
[32] N. Kourtellis, T. Alahakoon, R. Simha, A. Iamnitchi, and R. Tripathi, “Identifying high betweenness centrality nodes in large social networks,” Soc. Netw. Anal. and Mining, pp. 1–16, 2012.
[33] P. Babarczi, J. Tapolcai, and P. H. Ho, “Adjacent link failure localization with monitoring trails in all-optical mesh networks,” IEEE/ACM Trans. Netw., vol. 19, no. 3, pp. 907–920, Jun. 2011.
[34] M. Cheraghchi, A. Karbasi, S. Mohajer, and V. Saligrama, “Graph constrained group testing,” IEEE Trans. Inf. Theory, vol. 58, no. 1, pp. 248–262, Jan. 2012.
[35] N. Harvey, M. Patrascu, Y. Wen, S. Yekhanin, and V. Chan, “Nonadaptive fault diagnosis for all-optical networks via combinatorial group testing on graphs,” in IEEE INFOCOM, May 2007, pp. 697–705.
[36] J. Tapolcai, B. Wu, P. H. Ho, and L. Rónyai, “A novel approach for failure localization in all-optical mesh networks,” IEEE/ACM Trans. Netw., vol. 19, no. 1, pp. 275–285, Feb. 2011.
[37] B. Wu, P. H. Ho, J. Tapolcai, and X. Jiang, “A novel framework of fast and unambiguous link failure localization via monitoring trails,” in IEEE INFOCOM, Mar. 2010, pp. 1–5.
[38] M. Wang, W. Xu, E. Mallada, and A. Tang, “Sparse recovery with graph constraints: Fundamental limits and measurement construction,” in IEEE INFOCOM, Mar. 2012, pp. 1871–1879.
[39] D. J. Watts and S. H. Strogatz, “Collective dynamics of small-world networks,” Nature, vol. 393, no. 6684, pp. 440–442, 1998.
[40] R.
Tibshirani, “Regression shrinkage and selection via the LASSO,” Journal of the Royal Statistical Society B, vol. 58, pp. 267–288, 1996.
[41] E. J. Candes, M. Rudelson, T. Tao, and R. Vershynin, “Error correction via linear programming,” in 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS), Oct. 2005, pp. 668–681.
[42] M. E. J. Newman, “Finding community structure in networks using the eigenvectors of matrices,” preprint physics/0605087, 2006.
[43] W. W. Zachary, “An information flow model for conflict and fission in small groups,” Journal of Anthropological Research, vol. 33, no. 4, pp. 452–473, 1977.
[44] D. Lusseau, “The emergent properties of a dolphin social network,” Proceedings of the Royal Society of London, Series B: Biological Sciences, vol. 270, pp. S186–S188, Nov. 2003.
[45] D. E. Knuth, The Stanford GraphBase: A Platform for Combinatorial Computing, Addison-Wesley, Reading, MA, 1993.
[46] M. E. J. Newman, A collection of network data sets, Aug. 2013. [Online]. Available: http://www-personal.umich.edu/~mejn/netdata/
[47] SPArse Modeling Software (SPAMS). [Online]. Available: http://spams-devel.gforge.inria.fr/index.html
[48] W. Xu, E. Mallada, and A. Tang, “Compressive sensing over graphs,” in IEEE INFOCOM, Apr. 2011, pp. 2087–2095.