On Defining and Computing Communities

Introduction
Implicit Definitions
Explicit Definitions
www Application (Relative Type)
Martin Olsen
AU Herning
Aarhus University
[email protected]
UPF, Barcelona, November 29th 2012
Zachary's Karate Club
[Figure: the karate club network, nodes numbered 1 to 34]
A community structure consisting of two communities
How do we compute communities/community structures?
Dolphins in Doubtful Sound
Market Basket Network from Seminar November 15th
Objective: Find big cliques!
Are Cliques Communities?
Are the green triangles communities?
Could you imagine the three green nodes forming a group
at a reception where the connections indicate friendship
(and people are shy)?
Background Quotes
The above snippet is from a presentation by Santo
Fortunato
"... there is still no agreement among scholars on what a
network with communities looks like ..." (Lancichinetti and
Fortunato, 2009)
Outline
1 Introduction
2 Implicit Definitions
3 Explicit Definitions
   Absolute Type
   Relative Type
4 www Application (Relative Type)
Notation
$G(V, E)$ is an unweighted undirected graph
$N_i(T)$ is the set of neighbours of $i \in V$ in $T \subseteq V$: $N_i(T) = \{j \in T : \{i, j\} \in E\}$
$d(i)$ is the degree of $i \in V$
A partition $\Pi$ of $V$ is a collection of non-empty disjoint subsets of $V$ with union $V$
$\Pi_i$ denotes the set in $\Pi$ containing $i$.
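The notation above can be mirrored in a few lines of Python. This is only a sketch: the adjacency-set dictionary and the helper names `make_graph`, `neighbours_in`, `degree` and `block_of` are assumptions made for illustration and do not come from the talk.

```python
def make_graph(edges):
    """Build an undirected, unweighted graph as a dict of adjacency sets."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def neighbours_in(graph, i, T):
    """N_i(T): the neighbours of i that lie in the vertex set T."""
    return graph[i] & set(T)


def degree(graph, i):
    """d(i): the number of neighbours of i."""
    return len(graph[i])


def block_of(partition, i):
    """Pi_i: the set of the partition that contains i."""
    for block in partition:
        if i in block:
            return block
    raise ValueError("vertex is not covered by the partition")


g = make_graph([(1, 2), (2, 3), (3, 1), (3, 4)])
print(neighbours_in(g, 3, {1, 4}), degree(g, 3), block_of([{1, 2}, {3, 4}], 3))
```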
Iteratively Removing Edges (Newman and Girvan, 2004)
Intersection = node
Road = edge
Island = community
A bridge hosts many shortest paths
Find a bridge and remove it, then repeat. When a new island appears, evaluate the partition using modularity
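The edge-removal idea can be sketched in plain Python as follows. This is a minimal illustration and not the authors' implementation: it assumes that a "bridge" is the edge carrying the most shortest paths (edge betweenness, computed here with a Brandes-style breadth-first search), and it assumes the standard modularity formula. All function names are hypothetical.

```python
from collections import deque


def edge_betweenness(graph):
    """Shortest-path load per edge; each vertex pair is counted from both ends."""
    score = {frozenset((u, v)): 0.0 for u in graph for v in graph[u]}
    for s in graph:
        dist, order, queue = {s: 0}, [], deque([s])
        sigma = {v: 0 for v in graph}  # number of shortest paths from s
        sigma[s] = 1
        preds = {v: [] for v in graph}
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):  # accumulate loads from the farthest vertices back
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    return score


def components(graph):
    """Connected components ("islands") as a list of vertex sets."""
    seen, parts = set(), []
    for s in graph:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        while stack:
            v = stack.pop()
            for w in graph[v] - comp:
                comp.add(w)
                stack.append(w)
        seen |= comp
        parts.append(comp)
    return parts


def modularity(graph, parts):
    """Modularity of a vertex partition, measured on the original graph."""
    m = sum(len(nbrs) for nbrs in graph.values()) / 2
    total = 0.0
    for part in parts:
        inside = sum(len(graph[v] & part) for v in part) / 2
        deg = sum(len(graph[v]) for v in part)
        total += inside / m - (deg / (2 * m)) ** 2
    return total


def girvan_newman(graph):
    """Remove the busiest edge repeatedly; keep the best partition seen."""
    work = {v: set(nbrs) for v, nbrs in graph.items()}
    best_parts = components(work)
    best_q, n_parts = modularity(graph, best_parts), len(best_parts)
    while any(work.values()):
        score = edge_betweenness(work)
        u, v = max(score, key=score.get)
        work[u].discard(v)
        work[v].discard(u)
        parts = components(work)
        if len(parts) > n_parts:  # a new island appeared: evaluate it
            n_parts = len(parts)
            q = modularity(graph, parts)
            if q > best_q:
                best_q, best_parts = q, parts
    return best_parts, best_q


# Two triangles joined by a single edge (the "bridge" between vertices 2 and 3).
two_triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(girvan_newman(two_triangles))
```

On this toy graph the first edge removed is the bridge, and the two triangles are the partition with the highest modularity.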
Building Link Partitions "Bottom-up" (Ahn et al., 2010)
Some methods partition the edges rather than the vertices ⇒ overlapping communities
In this talk we partition the vertices!
A Formal Definition of Communities (Flake et al., 2000)
Community structure $\Pi$: $\forall i \in V : |N_i(\Pi_i)| \geq |N_i(V \setminus \Pi_i)|$
Computing Π is NP-hard in general but there are "easy"
cases (for example cubic graphs and graphs with girth ≥ 5
and minimum degree at least 3) (Bazgan et al.)
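Checking the condition for a given partition is easy, even though finding a suitable partition is hard in general. A minimal checker might look like this, assuming the graph is stored as a dictionary of adjacency sets (the function name is hypothetical):

```python
def satisfies_flake_condition(graph, partition):
    """True if every vertex has at least as many neighbours inside its own
    set as outside it: |N_i(Pi_i)| >= |N_i(V \\ Pi_i)| for all i."""
    for block in partition:
        block = set(block)
        for i in block:
            inside = len(graph[i] & block)
            outside = len(graph[i] - block)
            if inside < outside:
                return False
    return True


# Two triangles joined by the edge {2, 3}.
two_triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(satisfies_flake_condition(two_triangles, [{0, 1, 2}, {3, 4, 5}]))
print(satisfies_flake_condition(two_triangles, [{0, 1}, {2, 3, 4, 5}]))
```

The trivial partition consisting of $V$ alone always passes this check, since no vertex then has an outside neighbour; the interesting cases are partitions with at least two sets.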
Alliances
Alliance $A$: $\forall i \in A : |N_i(A)| + 1 \geq |N_i(V \setminus A)|$
Any planar graph with minimum degree at least 4 can be
efficiently partitioned into two alliances
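The alliance condition translates directly into a one-line test. The sketch below assumes an adjacency-set dictionary; the name `is_alliance` is illustrative only.

```python
def is_alliance(graph, A):
    """True if |N_i(A)| + 1 >= |N_i(V \\ A)| holds for every i in A."""
    A = set(A)
    return all(len(graph[i] & A) + 1 >= len(graph[i] - A) for i in A)


# A star: centre 0 with leaves 1..4.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(is_alliance(star, {0, 1}))     # the centre has 1 + 1 < 3 outside neighbours
print(is_alliance(star, {0, 1, 2}))  # the centre has 2 + 1 >= 2 outside neighbours
```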
A Potential Problem
The clique in the upper left corner is not a community
Dolphins in Doubtful Sound ... revisited
A weaker community structure definition:
$\forall i \in V, \forall C \in \Pi : |N_i(\Pi_i)| \geq |N_i(C)|$
Π = Nash stable partition in additive hedonic game
NP-hard in general but there are "easy" cases
... but can we ignore the cardinality of the sets?
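To see how the weaker definition differs from the stronger one, consider a vertex whose outside neighbours are spread over several sets. The sketch below (hypothetical function name, adjacency-set dictionary assumed) checks the weaker condition only.

```python
def is_nash_stable(graph, partition):
    """True if no vertex has strictly more neighbours in another set of the
    partition than in its own: |N_i(Pi_i)| >= |N_i(C)| for all C."""
    blocks = [set(b) for b in partition]
    for own in blocks:
        for i in own:
            inside = len(graph[i] & own)
            if any(len(graph[i] & other) > inside for other in blocks):
                return False
    return True


# Vertex 0 sits in the triangle {0, 1, 2} and has one neighbour in each of
# three separate pairs, so it has 2 neighbours inside and 3 outside.
g = {
    0: {1, 2, 3, 5, 7}, 1: {0, 2}, 2: {0, 1},
    3: {0, 4}, 4: {3}, 5: {0, 6}, 6: {5}, 7: {0, 8}, 8: {7},
}
partition = [{0, 1, 2}, {3, 4}, {5, 6}, {7, 8}]
print(is_nash_stable(g, partition))
```

Here vertex 0 violates the stronger condition (2 inside versus 3 outside) but satisfies the weaker one, because no single other set attracts it more than the triangle does.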
The Planted l-Partition Model (Condon and Karp, 2001)
[Figure: block matrix of connection probabilities, p within groups and q between groups]
u and v connect with probability p if $\Pi_u = \Pi_v$ and with probability q if $\Pi_u \neq \Pi_v$, where $p > q$
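A graph from this model can be sampled with the standard library alone. The sketch below assumes l groups of equal size k and independent edge draws; the function name and parameter names are chosen for illustration.

```python
import random
from itertools import combinations


def planted_partition(l, k, p, q, seed=None):
    """Sample a graph with l planted groups of k vertices each.

    Two vertices are joined with probability p if they share a group
    and with probability q otherwise.
    """
    rng = random.Random(seed)
    group = {v: v // k for v in range(l * k)}  # vertex -> index of its group
    graph = {v: set() for v in group}
    for u, v in combinations(range(l * k), 2):
        prob = p if group[u] == group[v] else q
        if rng.random() < prob:
            graph[u].add(v)
            graph[v].add(u)
    return graph, group


graph, group = planted_partition(l=4, k=25, p=0.5, q=0.05, seed=2012)
inside = sum(1 for u in graph for v in graph[u] if group[u] == group[v]) // 2
across = sum(1 for u in graph for v in graph[u] if group[u] != group[v]) // 2
print(inside, across)
```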
Empirical Justification
"... that strong communities form naturally in BT, with users
inside a typical community being 5 to 25 times more likely
to connect to each other than with outside users." (Choffnes
et al., 2010)
Guimera et al. (2006) study an e-mail network of approximately
1700 users at the University Rovira i Virgili, Tarragona, Spain,
and make similar observations with respect to the "centers" of the university
Formal Definitions
Definition
A Community Structure is a partition $\Pi$ of $V$ such that $|\Pi| \geq 2$ and $|C| \geq 2$ for all $C \in \Pi$ and
$$\forall i \in V, \forall C \in \Pi : \frac{|N_i(\Pi_i)|}{|\Pi_i| - 1} \geq \frac{|N_i(C)|}{|C|}$$
Definition
A community is a member of a community structure.
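The definition can be checked mechanically for a given partition. The following sketch assumes an adjacency-set dictionary and compares the two fractions by cross-multiplication, which is safe because both denominators are positive when every set has at least two members. The function name is hypothetical.

```python
def is_community_structure(graph, partition):
    """Check the relative definition: at least two sets, each of size >= 2, and
    |N_i(Pi_i)| / (|Pi_i| - 1) >= |N_i(C)| / |C| for every vertex i and set C."""
    blocks = [set(b) for b in partition]
    covered = set().union(*blocks) if blocks else set()
    if covered != set(graph) or sum(len(b) for b in blocks) != len(graph):
        return False  # not a partition of V
    if len(blocks) < 2 or any(len(b) < 2 for b in blocks):
        return False
    for own in blocks:
        for i in own:
            inside = len(graph[i] & own)
            for other in blocks:
                # inside / (|own| - 1) >= |N_i(other)| / |other|, cross-multiplied
                if inside * len(other) < len(graph[i] & other) * (len(own) - 1):
                    return False
    return True


# Two triangles joined by the edge {2, 3}.
two_triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(is_community_structure(two_triangles, [{0, 1, 2}, {3, 4, 5}]))
print(is_community_structure(two_triangles, [{0, 1}, {2, 3, 4, 5}]))
```

In the second partition vertex 2 is linked to both members of the set {0, 1} but to only one of its three fellow members, so the condition fails for it.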
Are Cliques Communities? ... revisited
Are the green triangles communities?
Could you imagine the three green nodes forming a group
at a reception where the connections indicate friendship
(and people are shy)?
A positive result
Theorem
A community structure can be computed in polynomial time for
any connected undirected graph $G(V, E)$ containing at least four
nodes, except the star graph $S_n$.
Sketch of Proof I
Local search on all partitions $\Pi$ with all members having a center
Maximize objective function:
$$f(\Pi) = |\Pi| + \#\text{crossing edges in } G \leq n + \frac{n(n-1)}{2}$$
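The objective function is cheap to evaluate. The sketch below computes $f(\Pi)$ and the stated upper bound for an adjacency-set dictionary; it does not implement the local search itself, and the names are illustrative. Since $f$ takes integer values and is bounded by $n + n(n-1)/2$, a search that strictly increases $f$ in every step presumably stops after a polynomial number of improvements.

```python
def objective(graph, partition):
    """f(Pi) = |Pi| + number of edges whose endpoints lie in different sets."""
    label = {v: k for k, block in enumerate(partition) for v in block}
    crossing = sum(1 for u in graph for v in graph[u] if label[u] != label[v]) // 2
    return len(partition) + crossing


def upper_bound(n):
    """n + n(n-1)/2: at most n sets plus at most every possible edge crossing."""
    return n + n * (n - 1) // 2


# Two triangles joined by the edge {2, 3}.
two_triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(objective(two_triangles, [{0, 1, 2}, {3, 4, 5}]), upper_bound(len(two_triangles)))
```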
Sketch of Proof II
Unthinkable situations for the result $\Pi^{(*)}$ of our local search
Objective function:
$$f(\Pi) = |\Pi| + \#\text{crossing edges in } G \leq n + \frac{n(n-1)}{2}$$
A Graph with Partial Membership Information
How do we compute a community containing the two
members in the upper right corner?
Computing Communities With Partial Information
Definition
The COMMUNITY problem:
Instance: An undirected graph G(V, E) and a subset of
nodes S ⊆ V.
Question: Does a community C ⊂ V exist such that S ⊆ C?
Theorem
COMMUNITY is NP-complete.
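For very small graphs the COMMUNITY question can be answered by exhaustive search. The sketch below enumerates every partition of the vertex set, so its running time grows with the Bell numbers; it is a brute-force illustration under the relative definition (each set of size at least 2, at least two sets, and $|N_i(\Pi_i)| / (|\Pi_i| - 1) \geq |N_i(C)| / |C|$), not the method of the paper. All names are hypothetical.

```python
def set_partitions(items):
    """Yield every partition of items as a list of sets."""
    items = list(items)
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        for k in range(len(smaller)):  # put `first` into an existing set
            yield smaller[:k] + [smaller[k] | {first}] + smaller[k + 1:]
        yield [{first}] + smaller      # or give it a set of its own


def is_community_structure(graph, blocks):
    """Relative definition, with the fractions compared by cross-multiplication."""
    if len(blocks) < 2 or any(len(b) < 2 for b in blocks):
        return False
    for own in blocks:
        for i in own:
            inside = len(graph[i] & own)
            for other in blocks:
                if inside * len(other) < len(graph[i] & other) * (len(own) - 1):
                    return False
    return True


def find_community(graph, S):
    """Return some community containing S, or None if there is none."""
    S = set(S)
    for partition in set_partitions(graph):
        if is_community_structure(graph, partition):
            for block in partition:
                if S <= block:
                    return block
    return None


two_triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(find_community(two_triangles, {0, 1}))
print(find_community(star, {1}))
```

For the star the search finds nothing: any set that misses the centre contains only leaves with no neighbours inside it, while the centre's set still attracts them.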
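COMMUNITY $\geq_p$ 3SAT
[Figure: the COMMUNITY instance for $(x_1 \lor x_2) \land (\lnot x_1 \lor \lnot x_2)$, with nodes $w_1, w_2$, $x_1, \lnot x_1, x_2, \lnot x_2$, $y_1, y_2$, $z_1, z_2$, $c_1, c_2$ and a fully connected group of 4 nodes]
Simple 2SAT example; the paper shows how to deal with 3SAT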
Computing Members of a www-community
Thickness of arrow from u = $\frac{N_u(C)}{d(u)}$
Looking for community C:
$$\forall u \in C, \forall v \in V \setminus C : \frac{N_u(C)}{d(u)} \geq \frac{N_v(C)}{d(v)}$$
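The condition says that every member points at least as large a fraction of its links into C as any non-member does. The sketch below makes several assumptions that the slide does not spell out: the web graph is stored as a dictionary of out-link sets, $N_u(C)$ is read as the number of links from u into C, $d(u)$ is the out-degree, and a page without links gets ratio 0. The function names are hypothetical.

```python
from fractions import Fraction


def attraction(links, u, C):
    """Fraction of u's links that point into C (the 'thickness' of its arrow)."""
    out = links[u]
    return Fraction(len(out & C), len(out)) if out else Fraction(0)


def is_www_community(links, C):
    """True if every member of C is at least as attracted to C as every non-member."""
    C = set(C)
    outside = set(links) - C
    if not C or not outside:
        return True  # nothing to compare
    weakest_member = min(attraction(links, u, C) for u in C)
    strongest_outsider = max(attraction(links, v, C) for v in outside)
    return weakest_member >= strongest_outsider


# Pages 0..2 mostly link to each other; pages 3 and 4 link only to each other.
links = {0: {1, 2}, 1: {0, 2}, 2: {0, 3}, 3: {4}, 4: {3}}
print(is_www_community(links, {0, 1, 2}))
print(is_www_community(links, {0, 3}))
```

Exact fractions are used so that ties between members and non-members are decided without floating-point error.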
CS Web Site Ranking in Denmark (April 2005 Crawl)
1
2
3
4
5
6
7
11
13
17
556 members
www.daimi.au.dk 267
www.diku.dk 655
www.itu.dk 918
www.cs.auc.dk 1022
www.brics.dk 1132
www.imm.dtu.dk 1124
www.dina.kvl.dk 1153
www.it-c.dk 2313
www.cs.aau.dk 2010
www.imada.sdu.dk 2998
1460 members
www.au.dk 109
www.sdu.dk 108
www.daimi.au.dk 267
www.hum.au.dk 221
www.diku.dk 655
www.ifa.au.dk 681
www.itu.dk 918
www.cs.auc.dk 1022
www.imm.dtu.dk 1124
www.bsd-dk.dk 1895
Localized PageRank used on the found communities.
Representatives are marked with bold
Link Building
Google uses PageRank values to rank search results
What new incoming links will give a maximum PageRank
increase?
Maybe focusing on your community is wise?
Thank You!
We have looked at several implicit and explicit ways of
defining communities
There are many hard problems in this area but we can
efficiently compute a community structure for (almost) any
graph for the most natural definition
Suggestions for future work:
Heuristics/algorithms with quantitative guarantees of the
number of "satisfied" vertices
Deciding the computational complexity of computing
alliances in general
...