Hierarchies in directed networks

Nikolaj Tatti
HIIT, Aalto University, Espoo, Finland

Abstract: Interactions in many real-world phenomena can be explained by a strong hierarchical structure. Typically, this structure or ranking is not known; instead we only have observed outcomes of the interactions, and the goal is to infer the hierarchy from these observations. Discovering a hierarchy in the context of directed networks can be formulated as follows: given a graph, partition vertices into levels such that, ideally, there are only edges from upper levels to lower levels. The ideal case can only happen if the graph is acyclic. Consequently, in practice we have to introduce a penalty function that penalizes edges violating the hierarchy. A practical variant of such a penalty is agony, where each violating edge is penalized based on the severity of the violation. A hierarchy minimizing agony can be discovered in O(m^2) time, and much faster in practice. In this paper we introduce several extensions to agony. We extend the definition to weighted graphs and allow a cardinality constraint that limits the number of levels. While these are conceptually trivial extensions, current algorithms cannot handle them, nor can they be easily extended. We provide an exact algorithm with O(m^2 log n) running time by showing the connection of agony to the capacitated circulation problem. We also show that this bound is in fact pessimistic and we can compute agony for large datasets. In addition, we show that we can compute agony in polynomial time for any convex penalty, and, to complete the picture, we show that finding an optimal hierarchy under any concave penalty is an NP-hard problem.

I. INTRODUCTION

Interactions in many real-world phenomena can be explained by a strong hierarchical structure. As an example, it is more likely that a line manager in a large, conservative company will write emails to her employees than the other way around.
Typically, this structure or ranking is not known; instead we only have observed outcomes of the interactions, and the goal is to infer the hierarchy from these observations. Discovering hierarchies or ranking has applications in various domains, such as ranking players [3], discovering dominant animals [8], hierarchy discovery in social networks [11], and summarizing browsing behaviour [10].

We consider the following problem of discovering hierarchy in the context of directed networks: given a directed graph, partition vertices into ranked groups such that there are only edges from upper groups to lower groups. Unfortunately, such a partitioning is only possible when the input graph has no cycles. Consequently, a more useful problem definition is to define a penalty function p on the edges. This function should penalize edges that violate a hierarchy. Given a penalty function, we are then asked to find the hierarchy that minimizes the total penalty.

The feasibility of the optimization problem depends drastically on the choice of the penalty function. If we attach a constant penalty to any edge that violates the hierarchy, that is, the target vertex is ranked higher than or equal to the source vertex, then this problem corresponds to the feedback arc set problem, a well-known NP-hard problem [1], for which no constant-ratio approximation algorithm is known [4].

A more practical variant is to penalize the violating edges by the severity of their violation. That is, given an edge (u, v) we compare the ranks of the vertices r(u) and r(v) and assign a penalty of max(r(u) − r(v) + 1, 0). Here, the edges that respect the hierarchy receive a penalty of 0, edges that are in the same group receive a penalty of 1, and the penalty increases linearly as the violation becomes more severe. This particular score is referred to as agony. Minimizing agony was introduced by Gupte et al. [6], where the authors provide an exact O(nm^2) algorithm, where n is the number of vertices and m is the number of edges. A faster discovery algorithm with the computational complexity of O(m^2) was introduced by Tatti [17]. In practice, the bound O(m^2) is very pessimistic and we can compute agony for large graphs in reasonable time.

In this paper we focus on agony, and provide the following main extensions for discovering hierarchies in graphs.

weighted graphs: We extend the notion of agony to graphs with weighted edges. Despite being a conceptually trivial extension, current algorithms [6, 17] for computing agony are specifically designed to work with unit weights, and cannot be used directly or extended trivially. Consequently, we need a new approach to minimize the agony, and in order to do so, we demonstrate that we can transform the problem into a capacitated circulation problem, a classic graph task known to have a polynomial-time algorithm.

cardinality constraint: The original definition of agony does not restrict the number of groups in the resulting partition. Here, we introduce a cardinality constraint k and ask for the optimal hierarchy with at most k groups. This constraint works both with weighted and non-weighted graphs. Current algorithms for solving agony cannot handle cardinality constraints. Luckily, we can enforce the constraint when we transform the problem into a capacitated circulation problem.

convex penalties: Minimizing agony uses a linear penalty for edges. We show that if we replace the linear penalty with a convex penalty, we can still solve the problem in polynomial time. However, this extension increases the computational complexity.

concave penalties: To complete the picture, we also study concave edge penalties. We show that in this case discovering the optimal hierarchy is an NP-hard problem. This provides a stark difference between concave and convex penalties.
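As a concrete illustration of the agony score with weighted edges and a cardinality constraint k, the following minimal sketch evaluates the penalty max(r(u) − r(v) + 1, 0) on a toy graph and finds the best ranking by exhaustive search. The function names, the edge-list representation, and the toy graph are assumptions made for illustration only; the exhaustive search is exponential and is not the circulation-based algorithm of this paper.

```python
from itertools import product


def agony(edges, rank):
    """Weighted agony: sum of w * max(r(u) - r(v) + 1, 0) over edges (u, v, w)."""
    return sum(w * max(rank[u] - rank[v] + 1, 0) for u, v, w in edges)


def brute_force_min_agony(vertices, edges, k):
    """Try every rank assignment with values in 0..k-1 (toy sizes only)."""
    best_score, best_rank = None, None
    for values in product(range(k), repeat=len(vertices)):
        rank = dict(zip(vertices, values))
        score = agony(edges, rank)
        if best_score is None or score < best_score:
            best_score, best_rank = score, rank
    return best_score, best_rank


# Hypothetical toy graph: a directed 3-cycle plus one heavier edge (u, v, weight).
vertices = ["a", "b", "c"]
edges = [("a", "b", 1), ("b", "c", 1), ("c", "a", 1), ("a", "c", 2)]

for k in (1, 2, 3):
    score, rank = brute_force_min_agony(vertices, edges, k)
    print(k, score, rank)
```

For this toy graph the optimal agony is 5 when k = 1 (every edge lies within the single group) and 3 when k = 2 or k = 3, so relaxing the constraint beyond two groups does not help here.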
The proofs are in the complete version of the paper.¹

¹ http://research.ics.aalto.fi/dmg/

II. PRELIMINARIES AND PROBLEM DEFINITION

We begin by establishing preliminary notation and then defining the main problem.

The main input to our problem is a weighted directed graph, which we denote by G = (V, E, w), where w is a function mapping an edge to a positive real number. If w is not provided, we assume that each edge has a weight of 1. We will often denote n = |V| and m = |E|.

As mentioned in the introduction, our goal is to partition the vertices V. We express this partition with a rank assignment r, a function mapping a vertex to an integer. To obtain the groups from the rank assignment we simply group the vertices having the same rank.

Given a graph G = (V, E) and a rank assignment r, we say that an edge (u, v) is forward if r(u) < r(v); otherwise the edge is backward, even if r(u) = r(v). Ideally, a rank assignment r should not have backward edges, that is, for any (u, v) ∈ E we should have r(u) < r(v). However, this is only possible when G is a DAG. For the more general case, we assume that we are given a penalty function p, mapping an integer to a real number. The penalty for a single edge (u, v) is then equal to p(d), where d = r(u) − r(v). If p(d) = 0 whenever d < 0, then the forward edges receive 0 penalty.

We highlight two penalty functions. The first one assigns a constant penalty to each backward edge, and the second assigns a linear penalty to each backward edge:

$$ p_c(d) = \begin{cases} 1 & \text{if } d \geq 0, \\ 0 & \text{otherwise}, \end{cases} \qquad p_l(d) = \max(0, d + 1). $$

For example, an edge (u, v) for which r(u) = r(v) is penalized by p_l(r(u) − r(v)) = 1, the penalty is equal to 2 if r(u) = r(v) + 1, and so on.

Given a penalty function and a rank assignment, we define the score to be the sum of the weighted penalties.

Definition 1. Assume a weighted directed graph G = (V, E, w) and a rank assignment r. Assume also a cost function p mapping an integer to a real number. We define the score of the rank assignment to be

$$ q(G, r, p) = \sum_{(u, v) \in E} w(u, v)\, p(r(u) - r(v)). $$

We will refer to the score q(G, r, p_l) as agony.

Example 2. Consider the left ranking r_1 of a graph G given in Figure 1. This ranking has 5 backward edges, so the penalty is q(G, r_1, p_c) = 5. On the other hand, 2 edges, (i, a) and (e, g), have an agony of 1, 2 edges have an agony of 2, and (d, b) has an agony of 3. Hence, the agony is equal to

$$ q(G, r_1, p_l) = 2 \times 1 + 2 \times 2 + 1 \times 3 = 9. $$

The agony for the right ranking r_2 is q(G, r_2, p_l) = 7. Consequently, r_2 yields a better ranking in terms of agony.

Figure 1. Toy graphs. Backward edges are represented by dotted lines, while the forward edges are represented by solid lines. Ranks are represented by dashed grey horizontal lines.

We can now state our main optimization problem.

Problem 1. Given a graph G = (V, E, w), a cost function p, and an integer k, find a rank assignment r minimizing q(G, r, p) such that 0 ≤ r(v) ≤ k − 1 for every v ∈ V. We will denote the optimal score by q(G, k, p).

We should point out that we have an additional constraint by demanding that the rank assignment may have only k distinct values, that is, we want to find at most k groups. Note that if we assume that the penalty function is non-decreasing and does not penalize the forward edges, then setting k = |V| is equivalent to ignoring the constraint. This is the case since there are at most |V| groups and we can safely assume that these groups obtain consecutive ranks. However, an optimal solution may have fewer than k groups; for example, if G has no edges and we use p_l (or p_c), then a rank assignment mapping each vertex to 0 yields the optimal score of 0. We should also point out that, unlike with p_c, it is possible that the ranking minimizing agony must yield a partition that contains non-singleton groups.

It is easy to see that minimizing q(G, p_c) is equivalent to finding a directed acyclic subgraph with as many edges as possible. This is known as the FEEDBACK ARC SET (FAS) problem, which is NP-complete [1]. On the other hand, if we assume that G has unit weights, and set k = |V|, then minimizing agony has a polynomial-time O(m^2) algorithm [6, 17].

III. COMPUTING AGONY

In this section we present a technique for minimizing agony, that is, solving Problem 1 using p_l as a penalty. In order to do this we show that this problem is in fact a dual problem of a known graph problem, closely related to the minimum cost max-flow problem.

A. Agony is a dual problem of Circulation

Minimizing agony is closely related to a circulation problem, where the goal is to find a circulation with a minimal cost satisfying certain balance equations.

Problem 2 (CIRCULATION). Assume a directed graph G = (V, E, s, c) with weights s and capacities c on edges. Find a flow f such that 0 ≤ f(e) ≤ c(e) for every e ∈ E and

$$ \sum_{(v, u) \in E} f(v, u) = \sum_{(u, v) \in E} f(u, v) \quad \text{for every } v \in V, $$

minimizing

$$ \sum_{(u, v) \in E} s(u, v)\, f(u, v). $$

We denote the above sum as circ(G).

This problem is known as the capacitated circulation problem, and can be solved in O(m log n (m + n log n)) time with an algorithm presented by Orlin [13]. We should stress that we allow s to be negative; otherwise the optimal solution would be a zero flow. We also allow capacities for certain edges to be infinite, that is, f(e) ≤ c(e) is not enforced if c(e) = ∞.

In order to transform the problem of minimizing agony into the capacitated circulation problem, assume that we are given a graph G = (V, E, w) and an integer k. We define a graph H = (W, F, s, c) as follows. The vertex set W consists of 2 groups: (i) |V| vertices, each corresponding to a vertex in G, and (ii) 2 additional vertices α and ω. For each edge e = (u, v) ∈ E, we add an edge f = (u, v) to F. We set c(f) = w(e) and s(f) = −1. We add edges (v, ω) and (α, v) for every v ∈ V with a weight of 0 and a capacity of ∞, and finally we add (ω, α) with s(ω, α) = k − 1 and c(ω, α) = ∞. We will denote this graph by H(G, k) = H.

Example 3. Consider G = (V, E), a graph with 4 vertices and 4 edges, given in Figure 2. Set the cardinality constraint k = 4. In order to construct H(G, k) we add two additional vertices α and ω to enforce the cardinality constraint k. We set the edge costs to −1 and the edge capacities to be the weights of the input graph. We connect α and ω with a, b, c, and d, and finally we connect ω to α. The resulting graph is given in Figure 2.

Figure 2. Toy graph G and the related circulation graph H(G, 4). Edge costs and capacities for (α, v) and (v, ω) are omitted to avoid clutter.

The following proposition shows the connection between the agony and the capacitated circulation problem.

Proposition 4. Assume a weighted directed graph G = (V, E, w) and an integer k. Then q(G, k, p_l) = −circ(H(G, k)).

B. Algorithm for minimizing agony

Proposition 4 states that we can compute agony but it does not provide direct means to discover an optimal rank assignment. However, a closer look at the proof reveals that minimizing agony is a dual problem of CIRCULATION. Luckily, the algorithms for solving CIRCULATION by Edmonds and Karp [2] or by Orlin [13] solve the dual problem, and are guaranteed to have an integer-valued solution as long as the edge costs s(u, v) are integers, which is the case for us.

If we are not enforcing the cardinality constraint, that is, we are solving q(G, k) with k = |V|, we can obtain a significant speed-up by decomposing G into strongly connected components and solving the ranking for individual components.

Proposition 5. Assume a graph G, and set k = |V|. Let {C_i} be the strongly connected components of G, ordered in a topological order. Let r_i be the ranking minimizing q(G(C_i), |C_i|). Let

$$ b_i = \sum_{j=1}^{i-1} |C_j|. $$

Then the ranking r(v) = r_i(v) + b_i, where C_i is the component containing v, yields the optimal score q(G, k).

IV. ALTERNATIVE PENALTY FUNCTIONS

We have shown that we can find a ranking minimizing edge penalties p_l in polynomial time. In this section we consider alternative penalties. More specifically, we consider convex penalties, which are solvable in polynomial time, and concave penalties, which are NP-complete.

A. Convex penalty function

We say that the penalty function is convex if p(x) ≤ (p(x − 1) + p(x + 1))/2 for every x ∈ Z. Let us consider a penalty function that can be written as

$$ p_s(x) = \sum_{i=1}^{\ell} \max(0, \alpha_i (x - \beta_i)), $$

where α_i > 0 and β_i ∈ Z for 1 ≤ i ≤ ℓ. This penalty function is convex. On the other hand, if we are given a convex penalty function p such that p(x) = 0 for x < 0, then we can safely assume that an optimal rank assignment will have values between 0 and |V| − 1. We can define a penalty function p_s with ℓ ≤ |V| terms such that p_s(x) = p(x) for x < |V|. Consequently, finding an optimal rank assignment using p_s will also yield an optimal rank assignment with respect to p. Note that p_l is a special case of p_s.

This hints that we can solve q(G, k, p_s) with a technique similar to the one given in Section III. In order to do this, assume a graph G = (V, E, w) and an integer k. Set n = |V| and m = |E|. We define a graph H = (W, F, s, c) as follows. The vertex set W consists of 2 groups: (i) n vertices, each corresponding to a vertex in G, and (ii) 2 additional vertices α and ω. For each edge e = (u, v) ∈ E, we add ℓ edges f_i = (u, v) to F. We set s(f_i) = β_i and c(f_i) = α_i w(e). We add edges (v, ω) and (α, v) for every v ∈ V with 0 weight and infinite capacity. Finally we add (ω, α) with s(ω, α) = k − 1 and c(ω, α) = ∞. We denote this graph by H(G, k, p_s) = H.

Figure 3. Toy graph G and the related circulation graph H(G, 3, p_s). To avoid clutter the vertices α and ω and the adjacent edges are omitted.

Example 6. Consider a graph G given in Figure 3 and a penalty function p_s(d) = max(0, d + 1) + 2 max(0, d − 3). The graph H = H(G, 3, p_s) has 5 vertices, the original vertices and the two additional vertices. Each edge in G results in two edges in H. This gives us 6 edges plus the 7 edges adjacent to α or ω. The graph H without α and ω is given in Figure 3.

Proposition 7. Assume a weighted directed graph G = (V, E, w), an integer k, and a penalty p_s. Then q(G, k, p_s) = −circ(H(G, k, p_s)).

Note that this proposition is a generalization of Proposition 4. If we set ℓ = 1, α_1 = 1, and β_1 = −1, then H(G, k, p_s) = H(G, k), and Proposition 7 is equivalent to Proposition 4. The proof of Proposition 7 is essentially equivalent to the proof of Proposition 4, and we omit the proof for brevity. More importantly, we can discover the optimal ranking by solving the capacitated circulation problem, similarly as we did when minimizing agony.

Finally, let us address the computational complexity of the problem. The circulation graph H(G, k, p_s) will have n + 2 vertices and O(ℓm + n) edges. If the penalty function p is convex, then we need at most ℓ = n functions to represent p in the range [0, n − 1]. Moreover, if we enforce the cardinality constraint k, we need only ℓ = k components. Consequently, we will have O(dm + n) edges, where d = min(k, ℓ, n) for p_s, and d = min(k, n) for a convex penalty p. This gives us a computational time of O(dm log n (dm + n log n)).

B. Concave penalty function

We have shown that we can solve Problem 1 for any convex penalty. Let us consider concave penalties, that is, penalties for which p(x) ≥ (p(x − 1) + p(x + 1))/2. There is a stark difference compared to the convex penalties, as the minimization problem becomes computationally intractable.

Proposition 8.
Assume a monotonic penalty function p : Z → R such that p(x) = 0 for x < 0, p(2) > p(1), and there is an integer t such that

$$ p(t) > \frac{p(t - 1) + p(t + 1)}{2} \tag{1} $$

and

$$ \frac{p(s)}{s + 1} \geq \frac{p(y)}{y + 1} $$

for every 0 ≤ s ≤ y and y ∈ [t − 1, t, t + 1]. Then, determining whether q(G, k, p) ≤ σ for a given graph G, integer k, and threshold σ is an NP-complete problem.

While the conditions in Proposition 8 seem overly complicated, they are quite easy to satisfy. Assume that we are given a penalty function that is concave in [−1, ∞], and p(−1) = 0. Then due to concavity we have

$$ \frac{p(x)}{x + 1} \geq \frac{p(x + 1)}{x + 2} \quad \text{for } x \geq 0. $$

This leads to the following corollary.

Corollary 9. Assume a monotonic penalty function p : Z → R such that p(x) = 0 for x < 0, p(2) > p(1), and p is concave and non-linear in [−1, s] for some s ≥ 1. Then, determining whether q(G, k, p) ≤ σ for a given graph G, integer k, and threshold σ is an NP-complete problem.

Note that, due to the strict inequality in Equation 1, p must be non-linear. This condition is needed since p_l satisfies every other requirement. Corollary 9 covers many penalty functions such as p(x) = √(x + 1) or p(x) = log(x + 2), for x ≥ 0. Note that the function needs to be concave only in [−1, s] for some s ≥ 1. In the extreme case s = 1, the choice t = 0 satisfies the conditions in Proposition 8.

V. RELATED WORK

The problem of ranking an object based on its dominating relationships to other objects is a classic problem. Perhaps the best-known ranking method is the Elo rating devised by Elo [3], used to rank chess players. In a similar fashion, Jameson et al. [8] applied a statistical model, where the likelihood of a vertex dominating another is based on the difference of their ranks, to animal dominance data. Maiya and Berger-Wolf [11] suggested an approach for discovering hierarchies, directed trees from weighted graphs such that parents tend to dominate the children.
To score such a hierarchy they propose a statistical model where the probability of an edge is high between a parent and a child. To find a good hierarchy the authors employ a greedy heuristic.

The technical relationship between our approach and the previous studies on agony by Gupte et al. [6] and Tatti [17] is a very natural one. The authors of both papers demonstrate that minimizing agony in an unweighted graph is a dual problem to finding a maximum eulerian subgraph, a subgraph in which, for each vertex, the number of outgoing edges equals the number of incoming edges. Discovering the maximum eulerian subgraph is a special case of the capacitated circulation problem, where the capacities are set to 1. However, the algorithms in [6, 17] are specifically designed to work with unweighted edges. Consequently, if our input graph has weighted edges or we wish to enforce the cardinality constraint, we need to use the capacitated circulation solver.

The stark difference of computational complexities for different edge penalties is intriguing: while we can compute agony and any other convex score in polynomial time, minimizing the concave penalties is NP-hard. Minimizing the score q(G, k, p_c) is equivalent to FEEDBACK ARC SET (FAS), which is known to be APX-hard with a coefficient of c = 1.3606 [1]. Moreover, there is no known constant-ratio approximation algorithm for FAS, and the best known approximation algorithm has ratio O(log n log log n) [4]. In this paper we have shown that minimizing a concave penalty is NP-hard. An interesting theoretical question is whether this optimization problem is also APX-hard, and whether it is possible to develop an approximation algorithm.

Role mining, where vertices are assigned different roles based on their adjacent edges and other features, has received some attention. Henderson et al. [7] studied assigning roles to vertices based on their features, while McCallum et al. [12] assigned topic distributions to individual vertices.
A potential direction for future work is to study whether we can use the obtained ranking as a feature in role discovery.

VI. EXPERIMENTS

In this section we present our experimental evaluation.

Table I. BASIC CHARACTERISTICS OF THE DATASETS AND THE EXPERIMENTS. THE 6TH COLUMN IS THE NUMBER OF GROUPS IN THE OPTIMAL RANKING. THE COLUMNS NAMED iter. REPRESENT THE NUMBER OF ITERATIONS NEEDED BY THE CAPACITATED CIRCULATION SOLVER.

                          max SCC              with SCC         w/o SCC
Name       |V|    |E|    |V'|   |E'|   |r|    iter.  time      iter.  time      base
Amazon     403k   3m     395k   3m     17     9k     6h24m     3k     7h29m     4h27m
Gnutella   63k    148k   14k    51k    24     83     8s        136    45s       45s
EmailEU    265k   419k   34k    151k   9      1k     10m       8k     3h8m      2m
Epinions   76k    509k   32k    444k   10     4k     49m       4k     1h5m      20m
Slashdot   82k    870k   71k    841k   9      4k     1h38m     4k     2h21m     1h5m
Google     876k   5m     435k   3m     31     52k    8h50m     6k     26h43m    2h32m
WikiVote   7k     104k   1k     39k    12     738    43s       1k     5m        7s
Nfl        32     205    32     205    6      157    22ms      157    22ms      –
Reef       258    4232   1      0      19     0      10ms      454    1.2s      –
HiggsRep.  37k    31k    263    569    11     3k     0.2s      7k     10m       –
HiggsRet.  425k   734k   13k    64k    22     5k     10m       45k    31h34m    –
HiggsM.    303k   445k   5k     20k    21     12k    2m        45k    19h38m    –

² The source code is available at http://research.ics.aalto.fi/dmg/

Datasets and setup: For our experiments we took 10 large networks from the SNAP repository [9]. In addition, for illustrative purposes, we used two small datasets:

Nfl, consisting of National Football League teams. We created an edge (x, y) if team x scored more points against team y during the 2014 regular season, and we assigned the weight to be the difference between the points. Since not every team plays against every team, the graph is not a tournament.

Reef, a food web of guilds of species [15], available at [16]. The dataset consisted of 3 food webs of coral reef systems: The Cayman Islands, Jamaica, and Cuba. An edge (x, y) appears if a guild x is known to prey on a guild y.
Since the guilds are common among all 3 graphs, we combined the food webs into one graph, and weighted the edges accordingly, that is, each edge received a weight between 1 and 3.

The sizes of the graphs, along with the sizes of the largest strongly connected component, are given in the first 4 columns of Table I. The 3 Higgs and Nfl graphs had weighted edges, and for the remaining graphs we assigned a weight of 1 for each edge. We removed any self-loops as they have no effect on the ranking, as well as any singleton vertices.

For each dataset we computed the agony. To solve the CIRCULATION problem we used the algorithm by Edmonds and Karp [2]. While the algorithm by Orlin [13] has better theoretical bounds, the pathological case that is solved by Orlin [13] does not occur in practice for our datasets. For the unweighted graphs we also computed the agony using RELIEF, an algorithm suggested by Tatti [17]. Note that neither this algorithm nor the algorithm by Gupte et al. [6] works for weighted graphs or when the cardinality constraint k given in Problem 1 is enforced. We implemented both algorithms in C++ and performed experiments using a Linux desktop equipped with an Opteron 2220 SE processor.²

Results: We begin with running times given in Table I. We report the running times of our approach with and without the strongly connected component decomposition as suggested by Proposition 5, and compare it against the baseline RELIEF, whenever possible. Note that we can use the decomposition only if we do not enforce the cardinality constraint. Our first observation is that the decomposition always helps to speed up the algorithm. In fact, this speed-up may be dramatic if the size of the strongly connected component is significantly smaller than the size of the input graph, for example, with HiggsRetweet.
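The decomposition suggested by Proposition 5 can be sketched as follows: find the strongly connected components in topological order, rank each component on its own, and shift the ranks of the i-th component by the total size of the preceding components. This is a minimal sketch under stated assumptions: the function names and the toy graph are hypothetical, Kosaraju's algorithm is used to obtain the components, and an exhaustive search stands in for the capacitated circulation solver, so it is only usable on tiny inputs.

```python
from itertools import product


def agony(edges, rank):
    """Weighted agony: sum of w * max(r(u) - r(v) + 1, 0) over edges (u, v, w)."""
    return sum(w * max(rank[u] - rank[v] + 1, 0) for u, v, w in edges)


def brute_force_ranking(vertices, edges, k):
    """Exhaustive stand-in for the exact solver; feasible only for tiny inputs."""
    best = None
    for values in product(range(k), repeat=len(vertices)):
        rank = dict(zip(vertices, values))
        score = agony(edges, rank)
        if best is None or score < best[0]:
            best = (score, rank)
    return best[1]


def sccs_in_topological_order(vertices, edges):
    """Kosaraju's algorithm; components come out with source components first."""
    out_nbrs = {v: [] for v in vertices}
    in_nbrs = {v: [] for v in vertices}
    for u, v, _ in edges:
        out_nbrs[u].append(v)
        in_nbrs[v].append(u)

    def dfs(start, nbrs, seen, visit_order):
        # Iterative DFS; a vertex is appended once all its descendants are done.
        seen.add(start)
        stack = [(start, iter(nbrs[start]))]
        while stack:
            node, it = stack[-1]
            child = next((c for c in it if c not in seen), None)
            if child is None:
                stack.pop()
                visit_order.append(node)
            else:
                seen.add(child)
                stack.append((child, iter(nbrs[child])))

    seen, finish = set(), []
    for v in vertices:
        if v not in seen:
            dfs(v, out_nbrs, seen, finish)

    seen, components = set(), []
    for v in reversed(finish):  # decreasing finishing time
        if v not in seen:
            comp = []
            dfs(v, in_nbrs, seen, comp)  # search on the reversed graph
            components.append(comp)
    return components


def decomposed_ranking(vertices, edges):
    """Rank each component separately, then stack the ranks (k = |V|)."""
    rank, offset = {}, 0
    for comp in sccs_in_topological_order(vertices, edges):
        members = set(comp)
        inner = [(u, v, w) for u, v, w in edges if u in members and v in members]
        local = brute_force_ranking(comp, inner, len(comp))
        for v in comp:
            rank[v] = local[v] + offset
        offset += len(comp)  # b_i = |C_1| + ... + |C_{i-1}|
    return rank


# Hypothetical toy graph: two weighted 2-cycles joined by a single edge.
vertices = ["a", "b", "c", "d"]
edges = [("a", "b", 3), ("b", "a", 1), ("b", "c", 1), ("c", "d", 1), ("d", "c", 2)]
ranking = decomposed_ranking(vertices, edges)
print(ranking, agony(edges, ranking))
```

On this toy graph the stacked ranking has an agony of 4, which matches an exhaustive search over all rank assignments with k = 4; the edge between the two components becomes a forward edge and contributes no penalty.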
The number of iterations needed to solve the circulation problem is significantly smaller than the worst-case scenario of O(m log n), making this algorithm feasible even for large graphs. Interestingly enough, this number increases in some cases when using the decomposition, but a single iteration may be significantly faster when the decomposition is used. The baseline RELIEF is almost always faster than our approach, even though the computational complexity guarantees, O(m^2) for RELIEF and O(m^2 log n + mn log^2 n) for CANON, are practically the same. Nevertheless, the difference between the computation times is less than an order of magnitude.

Figure 4. Agony, execution time, and number of iterations needed by capacitated circulation solver as a function of the constraint k for Gnutella.

Our next step is to study the effect of the constraint k, the maximum number of different rank values. We see in the 6th column of Table I that, despite the large number of vertices, the optimal rank assignment has a low number of groups, typically around 10–20 groups, even if the cardinality constraint is not enforced. This suggests the following strategy to compute the agony when the cardinality constraint is enforced and the input graph has small connected components: First compute the agony quickly without the constraint using the strongly connected component decomposition. If the obtained ranking has no more than k groups, then we are done. Otherwise, enforce the constraint k and solve the problem without the decomposition.

Next, we consider agony as a function of k, which we have plotted in Figure 4 for Gnutella. We see that agony remains relatively constant as we decrease k, and starts to increase more prominently once we consider assignments with k ≤ 5.

Enforcing the constraint k has an impact on running time. In Figure 4 we plotted the running time as a function of k.
As we can see, lower values of k are computationally more taxing to solve even though they have the same theoretical bounds on running time. The main reason for the additional computational burden is that solving agony with small values of k requires more iterations.

Let us look at the ranking that we obtained from the Nfl dataset using k = 3 groups, given in Table II. The obtained ranking is very sensible: 7 of the 8 teams in the top group are playoff teams of the 2014 season, while the bottom group consists of teams that have a significant losing record.

Table II. RANK ASSIGNMENT DISCOVERED FOR Nfl DATASET WITH k = 3 GROUPS

Rank Teams 1. 2. 3. DEN BAL NE DAL SEA PHI KC GB PIT STL NYG MIA CAR NO SD MIN CIN BUF DET IND HOU SF ARI WSH OAK TB JAX TEN CLE ATL NYJ CHI

Finally, let us look at the rankings obtained from the Reef dataset. The graph is in fact a DAG with 19 groups. To reduce the number of groups we rank the guilds into k = 4 groups. The condensed results are given in Table III. We see that the top group consists of large fishes and sharks, the second group contains mostly smaller fishes, a large portion of the third group are crustacea, while the last group contains the bottom of the food chain, planktons and algae. We should point out that this ranking is based purely on the food web, and not on the type of species. For example, cleaner crustacea is obviously very different from plankton. Yet cleaner crustacea only eats planktonic bacteria and micro-detritivores while being eaten by many other guilds. Consequently, it is ranked in the bottom group.

Table III. RANKED GUILDS OF Reef DATASET, WITH k = 4. FOR SIMPLICITY, WE REMOVED THE DUPLICATE GUILDS IN THE SAME GROUP, AND GROUPED SIMILAR GUILDS (MARKED AS ITALIC, THE NUMBER IN PARENTHESES INDICATING THE NUMBER OF GUILDS).

Sharks (6), Amberjack, Barracuda, Bigeye, Coney grouper, Flounder, Frogfish, Grouper, Grunt, Hind, Lizardfish, Mackerel, Margate, Palometa, Red hind, Red snapper, Remora, Scorpionfish, Sheepshead, Snapper, Spotted eagle ray

Angelfish, Atlantic spadefish, Ballyhoo, Barracuda, Bass, Batfish, Blenny, Butterflyfish, Caribbean Reef Octopus, Caribbean Reef Squid, Carnivorous fish II-V, Cornetfish, Cowfish, Damselfish, Filefish, Flamefish, Flounder, Goatfish, Grunt, Halfbeak, Hamlet, Hawkfish, Hawksbill turtle, Herring, Hogfish, Jack, Jacknife fish, Jawfish, Loggerhead sea turtle, Margate, Moray, Needlefish, Porcupinefish I-II, Porkfish, Pufferfish, Scorpionfish, Seabream, Sergeant major, Sharptail eel, Slender Inshore Squid, Slippery dick, Snapper, Soldierfish, Spotted drum, Squirrelfish, Stomatopods II, Triggerfish, Trumpetfish, Trunkfish, Wrasse, Yellowfin mojarra

Crustacea (31), Ahermatypic benthic corals, Ahermatypic gorgonians, Anchovy, Angelfish, Benthic carnivores II, Blenny, Carnivorous fish I, Common Octopus, Corallivorous gastropods IV, Deep infaunal soft substrate suspension feeders, Diadema, Echinometra, Goby, Green sea turtle, Herbivorous fish I-IV, Herbivorous gastropods I, Hermatypic benthic carnivores I, Hermatypic corals, Hermatypic gorgonians, Herring, Infaunal hard substrate suspension feeders, Lytechinus, Macroplanktonic carnivores II-IV, Macroplanktonic herbivores I, Molluscivores I, Omnivorous gastropod, Parrotfish, Pilotfish, Silverside, Stomatopods I, Tripneustes, Zooplanktivorous fish I-II

Planktons (7), Algae (6), Sponges (2), Feeders (11), Benthic carnivores I, Carnivorous ophiuroids, Cleaner crustacea I, Corallivorous polychaetes, Detritivorous gastropods I, Echinoid carnivores I, Endolithic polychaetes, Epiphyte grazer I, Epiphytic autotrophs, Eucidaris, Gorgonian carnivores I, Herbivorous gastropod carnivores I, Herbivorous gastropods II-IV, Holothurian detritivores, Macroplanktonic carnivores I, Micro-detritivores, Molluscivores II-III, Planktonic bacteria, Polychaete predators (gastropods), Seagrasses, Sponge-anemone carnivores I, Spongivorous nudibranchs

VII. CONCLUDING REMARKS

In this paper we studied the problem of discovering a hierarchy in a directed graph that minimizes agony. We introduced several natural extensions: (i) we demonstrated how to compute the agony for weighted edges, and (ii) how to limit the number of groups in a hierarchy. Neither extension can be handled by current algorithms, hence we provide a new technique by demonstrating that minimizing agony can be solved by solving a capacitated circulation problem, a well-known graph problem with a polynomial-time solution.

We can further generalize the setup by allowing each edge to have its own individual penalty function. As long as the penalty functions are convex, the construction done in Section IV-A can be used to solve the optimization problem. We can further generalize the cardinality constraint by requiring that only a subset of vertices must have ranks within some range. We can have multiple such constraints.

REFERENCES

[1] I. Dinur and S. Safra. On the hardness of approximating vertex cover. Ann. Math., 162(1):439–485, 2005.
[2] J. Edmonds and R. M. Karp. Theoretical improvements in algorithmic efficiency for network flow problems. Journal of ACM, 19(2):248–264, 1972.
[3] A. E. Elo. The rating of chessplayers, past and present. Arco Pub., 1978.
[4] G. Even, J. (Seffi) Naor, B. Schieber, and M. Sudan. Approximating minimum feedback sets and multicuts in directed graphs. Algorithmica, 20(2):151–174, 1998.
[5] M. Garey and D. Johnson. Computers and intractability: a guide to the theory of NP-completeness. WH Freeman & Co., 1979.
[6] M. Gupte, P. Shankar, J. Li, S. Muthukrishnan, and L. Iftode. Finding hierarchy in directed online social networks. In WWW, pages 557–566, 2011.
[7] K. Henderson, B. Gallagher, T. Eliassi-Rad, H. Tong, S. Basu, L. Akoglu, D. Koutra, C. Faloutsos, and L. Li. Rolx: Structural role extraction & mining in large graphs. In ACM SIGKDD, pages 1231–1239, 2012.
[8] K. A. Jameson, M. C. Appleby, and L. C. Freeman. Finding an appropriate order for a hierarchy based on probabilistic dominance. Anim. Behav., 57:991–998, 1999.
[9] J. Leskovec and A. Krevl. SNAP Datasets. http://snap.stanford.edu/data, Jan. 2015.
[10] L. Macchia, F. Bonchi, F. Gullo, and L. Chiarandini. Mining summaries of propagations. In ICDM, pages 498–507, 2013.
[11] A. S. Maiya and T. Y. Berger-Wolf. Inferring the maximum likelihood hierarchy in social networks. In ICSE, pages 245–250, 2009.
[12] A. McCallum, X. Wang, and A. Corrada-Emmanuel. Topic and role discovery in social networks with experiments on enron and academic email. J. Artif. Int. Res., 30(1):249–272, 2007.
[13] J. B. Orlin. A faster strongly polynomial minimum cost flow algorithm. Operations Research, 41(2), 1993.
[14] C. H. Papadimitriou and K. Steiglitz. Combinatorial Optimization: Algorithms and Complexity. Prentice-Hall, Inc., 1982.
[15] P. D. Roopnarine and R. Hertog. Detailed food web networks of three Greater Antillean Coral Reef systems: The Cayman Islands, Cuba, and Jamaica. Dataset Papers in Ecology, 2013, 2013.
[16] P. D. Roopnarine and R. Hertog. Data from: Detailed food web networks of three Greater Antillean Coral Reef systems: The Cayman Islands, Cuba, and Jamaica, 2012. Dryad Digital Repository, http://dx.doi.org/10.5061/dryad.c213h.
[17] N. Tatti. Faster way to agony - discovering hierarchies in directed graphs. In ECML PKDD, pages 163–178, 2014.