Community Detection based on Distance Dynamics
Reporter: Yi Liu    Student ID: 015033910017
Department of Computer Science and Engineering
Shanghai Jiao Tong University
Outline
We consider the community detection problem from a
new point of view: distance dynamics.
• Background and challenges
• Problem Statement
• Existing Solutions and Their Drawbacks
• Basic idea and Attractor algorithm
• Experimental evaluation
• Conclusion
Yi Liu
SJTU
2015/12/03
Background
Social networks are a hot research topic in the area of the Internet of
Things.
Community detection is a complex and meaningful task in social network
analysis. In recent years, detecting the community structure of networks
has attracted wide attention.
• Network of sensors
• Food chain
• World Wide Web
• Urban traffic network
Background
Papadopoulos S. et al. Data Mining and Knowledge Discovery, 2012, 24(3): 515-554.
How can we find intrinsic community structure in networks?
What is a community?
From the view of sociology, a “community” can be perceived as a group of
persons who are connected to each other by relatively durable social
relations to form a tight and cohesive social entity, due to the presence
of a “unity of will” or “sharing common values”.
[Figure: opinion formation on a problem in a group. Discussion moves from local to global, and three communities form around Opinion 1, Opinion 2, and Opinion 3.]
What is a community?
Useful information:
• Content
• Position
• Graduate institutions
• Friends
Quiz 1
What is a community from the view of sociology?
Graph Model
A graph G = (V, E) consists of a set V of vertices and a set E
of edges. Each edge is a pair (v, u), where v, u ∈ V.
Directed Graph
V = {Ann, Jill, Bob, Ted}
E = {(Ann, Bob), (Bob, Jill), (Bob, Ted), (Ted, Jill)}
Undirected Graph
V = {Ann, Jill, Bob, Ted}
E = {(Ann, Bob), (Bob, Jill), (Ted, Bob), (Ted, Jill), (Ted, Ann), (Ann, Jill)}
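As a minimal sketch of the graph definition, the following Python snippet stores an edge list as adjacency sets, once reading each pair (v, u) as a directed edge and once as an undirected edge. The helper name `build_adjacency` is illustrative only, and the edge list is the four-edge example on Ann, Jill, Bob, and Ted.

```python
def build_adjacency(vertices, edges, directed):
    """Return a dict mapping each vertex to the set of vertices it links to."""
    adj = {v: set() for v in vertices}
    for v, u in edges:
        adj[v].add(u)          # edge v -> u
        if not directed:
            adj[u].add(v)      # undirected edges are stored in both directions
    return adj

V = ["Ann", "Jill", "Bob", "Ted"]
E = [("Ann", "Bob"), ("Bob", "Jill"), ("Bob", "Ted"), ("Ted", "Jill")]

directed_adj = build_adjacency(V, E, directed=True)
undirected_adj = build_adjacency(V, E, directed=False)

print(directed_adj["Jill"])    # no outgoing edges in the directed reading
print(undirected_adj["Jill"])  # Bob and Ted in the undirected reading
```

The same edge list gives different neighbor sets depending on whether direction is respected, which is the only difference between the two graph types at this level.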
Weighted Graph
• Sometimes edges have a third component, a weight or cost,
whose semantics are specific to the graph.
• A graph that has values associated with its edges is called
a weighted graph. The graph can be either directed or
undirected. The weights can represent things like:
1. Physical distance between two vertices.
2. Time it takes to get from one vertex to another.
3. How much it costs to travel from vertex to vertex.
Weighted Graph
An Example of Weighted Undirected Graph
[Figure: weighted undirected graph on Bob, Jill, Ann, and Ted with edge weights 20, 2, 60, and 1.2.]
Bob and Ann meet each other 2 times a year
Bob and Ann meet each other 20 times a year
Bob and Ted meet each other 60 times a year
Ted and Jill meet each other only 1.2 times a year
Graph Model
The degree of a vertex v is the number of edges incident to v,
denoted TD(v).
[Figure: the weighted undirected example graph on Bob, Jill, Ann, and Ted.]
Subgraph: Let G = (V, E) be a graph with vertex set V and edge set E.
A subgraph of G is a graph G' = (V', E') where
1. V' is a subset of V.
2. E' consists of edges (v, w) in E such that both v and w are in V'.
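The two definitions above can be sketched in a few lines of Python. The function names `degree` and `induced_subgraph` are illustrative, the graph is assumed to be undirected and free of self-loops, and the edge list is a small example on Ann, Jill, Bob, and Ted.

```python
def degree(edges, v):
    """TD(v): number of edges incident to v in an undirected edge list."""
    return sum(1 for a, b in edges if v in (a, b))

def induced_subgraph(vertices, edges, keep):
    """Return (V', E'): V' is a subset of V, E' keeps edges with both ends in V'."""
    v_sub = set(keep) & set(vertices)
    e_sub = [(a, b) for a, b in edges if a in v_sub and b in v_sub]
    return v_sub, e_sub

V = ["Ann", "Jill", "Bob", "Ted"]
E = [("Ann", "Bob"), ("Bob", "Jill"), ("Bob", "Ted"), ("Ted", "Jill")]

bob_degree = degree(E, "Bob")
v_sub, e_sub = induced_subgraph(V, E, {"Bob", "Jill", "Ted"})
```

Dropping Ann removes only the edge (Ann, Bob), since every other edge has both endpoints in the kept vertex set.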
Graph Model
Definition 1 (Neighbors of node u)
Given an undirected graph G = (V, E, W), the neighborhood of a node u ∈ V is
the set Γ(u) containing node u and its adjacent nodes.
[Figure: the weighted undirected example graph on Bob, Jill, Ann, and Ted.]
Definition 2 (Jaccard Distance)
Given an undirected graph G = (V, E, W), the Jaccard distance of two nodes u
and v is defined as:
$$d(u, v) = 1 - \frac{|\Gamma(u) \cap \Gamma(v)|}{|\Gamma(u) \cup \Gamma(v)|}$$
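A minimal Python sketch of Definitions 1 and 2, assuming the unweighted form of the Jaccard distance (edge weights are ignored here). The adjacency dictionary is a small illustrative graph on Ann, Bob, Jill, and Ted, and the function names are hypothetical.

```python
def neighborhood(adj, u):
    """Gamma(u): node u together with its adjacent nodes (Definition 1)."""
    return adj[u] | {u}

def jaccard_distance(adj, u, v):
    """d(u, v) = 1 - |Gamma(u) & Gamma(v)| / |Gamma(u) | Gamma(v)| (Definition 2)."""
    gu, gv = neighborhood(adj, u), neighborhood(adj, v)
    return 1.0 - len(gu & gv) / len(gu | gv)

# Illustrative undirected graph: Ann-Bob, Bob-Jill, Bob-Ted, Ted-Jill.
adj = {
    "Ann": {"Bob"},
    "Bob": {"Ann", "Jill", "Ted"},
    "Jill": {"Bob", "Ted"},
    "Ted": {"Bob", "Jill"},
}

d_ann_bob = jaccard_distance(adj, "Ann", "Bob")    # 1 - 2/4
d_jill_ted = jaccard_distance(adj, "Jill", "Ted")  # 1 - 3/3
```

Because Γ(u) includes u itself, two linked nodes always share at least two members, so the distance of an existing edge is strictly below 1 at initialization.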
Community In Graph
A community is a subgraph whose nodes are more densely linked
to each other than to the rest of the graph.
Equivalently, a graph has a community structure if the number of links within
each subgraph is higher than the number of links between those subgraphs.
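This criterion can be checked directly for a given assignment of nodes to communities. The sketch below is illustrative: the function name `count_links`, the toy graph (two triangles joined by one edge), and the labels "A" and "B" are all hypothetical.

```python
def count_links(edges, community_of):
    """Count edges inside communities and edges running between communities."""
    inside = between = 0
    for u, v in edges:
        if community_of[u] == community_of[v]:
            inside += 1
        else:
            between += 1
    return inside, between

# Hypothetical example: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community_of = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}

inside, between = count_links(edges, community_of)
```

With six internal links against a single link between the groups, this assignment matches the informal definition of a community structure.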
Quiz 2
How many subgraphs are there?
Challenges
How can we intuitively detect natural communities with high quality in large networks?
• Large-scale networks
    • time constraints
    • memory limitations
• High-quality communities
    • user-defined criteria
• Parametrization
    • outliers
    • sensitivity to parameter(s)
Existing Solutions and Their Drawbacks
Cut-Criteria Based Community Detection
Ncut is a well-known algorithm for graph clustering that optimizes the
normalized cut criterion. Because eigenvalue decomposition is applied to
speed up finding the optimal cut, it is also commonly called spectral clustering.
The criterion considers the connection between groups relative to the density of each group.
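To make the criterion concrete, here is a sketch that only evaluates a two-way normalized cut for a given partition; it does not perform the eigenvalue-based optimization. It assumes the standard two-way definition Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V) on an unweighted, undirected graph, and the function name and toy graph are hypothetical.

```python
def normalized_cut(edges, part_a):
    """Two-way normalized cut of an unweighted, undirected edge list.

    cut(A, B) counts edges crossing the partition; assoc(X, V) is the sum of
    the degrees of the nodes in X. Both sides are assumed to contain edges.
    """
    part_a = set(part_a)
    cut = assoc_a = assoc_b = 0
    for u, v in edges:
        u_in_a, v_in_a = u in part_a, v in part_a
        if u_in_a != v_in_a:
            cut += 1
        # Each endpoint contributes one unit of degree to its own side.
        assoc_a += u_in_a + v_in_a
        assoc_b += (not u_in_a) + (not v_in_a)
    return cut / assoc_a + cut / assoc_b

# Hypothetical example: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

good = normalized_cut(edges, {0, 1, 2})  # split along the bridge
bad = normalized_cut(edges, {0})         # isolate a single node
```

The split along the bridge scores lower than isolating a single node, which is the behavior the normalization is meant to produce.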
Existing Solutions and Their Drawbacks
Drawbacks of Ncut
• Although this type of community detection usually allows
identifying communities with high quality, it is not capable of
handling large-scale networks.
• In addition, it is a non-trivial task to determine a suitable
number of communities without prior knowledge.
Existing Solutions and Their Drawbacks
Modularity optimization is currently the most popular community detection
approach. It is based on the modularity measure, which uses the expected
cut to measure clustering quality.
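As an illustration of the measure itself, the sketch below computes modularity for a fixed assignment, assuming the usual Newman-Girvan form on an unweighted, undirected graph. It evaluates a given clustering only and is not a detection algorithm; the function name and toy graph are hypothetical.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Q = sum over communities c of [ L_c / m - (D_c / (2m))^2 ].

    m is the number of edges, L_c the number of edges inside community c,
    and D_c the total degree of the nodes in c (the expected-fraction term).
    """
    m = len(edges)
    inside = defaultdict(int)        # L_c
    total_degree = defaultdict(int)  # D_c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        total_degree[cu] += 1
        total_degree[cv] += 1
        if cu == cv:
            inside[cu] += 1
    return sum(inside[c] / m - (total_degree[c] / (2 * m)) ** 2
               for c in total_degree)

# Hypothetical example: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {n: "A" for n in range(6)}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
```

Putting every node into one community gives Q = 0, while the two-triangle split scores higher because it has more internal edges than the degree-based expectation.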
Drawbacks
• Modularity-based community detection algorithms tend to fail on
many real-world networks due to the “resolution limit”.
• The situation becomes worse as the network size increases.
Basic idea
• Dynamic point of view
    • Consider a given network as a dynamic system in which each node
    interacts with its local neighbors.
• Distance Dynamics vs Node Dynamics
    • Investigate the dynamics of edges instead of the dynamics of nodes.
Interaction model – three interaction patterns
Assumption: If two nodes are linked, each node attracts the other and
pulls it toward itself.
Direct interaction: makes u and v closer.
Common neighbors: make u and v closer.
Exclusive neighbors: make u and v closer or farther apart.
Interaction model – three interaction patterns
Pattern 1: Influence from directly linked nodes
Formally, to characterize the change of the distance d(u, v), we define DI
as the influence from the interactions of directly linked nodes.
In its definition, deg(u) is the degree of node u and f(·) is a coupling function; sin(·) is used in
this study. The term 1 − d(u, v) indicates the similarity between u and v.
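A minimal Python sketch of this pattern, assuming DI takes the form DI = −(f(1 − d(u, v))/deg(u) + f(1 − d(u, v))/deg(v)) with f = sin. The exact expression is an assumption chosen to match the quantities named above, and the function name `direct_influence` is hypothetical.

```python
import math

def direct_influence(d_uv, deg_u, deg_v, f=math.sin):
    """Assumed form: DI = -( f(1 - d(u,v)) / deg(u) + f(1 - d(u,v)) / deg(v) )."""
    similarity = 1.0 - d_uv  # similarity between u and v
    return -(f(similarity) / deg_u + f(similarity) / deg_v)

# Example values: distance 0.5 between a node of degree 2 and a node of degree 4.
di = direct_influence(0.5, deg_u=2, deg_v=4)
```

Under this assumed form the result is never positive for distances in [0, 1], so a direct link can only shrink d(u, v), and high-degree endpoints contribute a weaker pull.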
Interaction model – three interaction patterns
Pattern 2: Influence from common neighbors
We define CI as the change of d(u, v) from the influence of common neighbors.
For each common neighbor x, the two terms (1 − d(x, v)) and (1 − d(x, u)) are used to
further quantify the degree of influence compared to the influence from directly linked nodes.
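A sketch of the common-neighbor pattern under an assumed form: each common neighbor x contributes f(1 − d(x, u))·(1 − d(x, v))/deg(u) + f(1 − d(x, v))·(1 − d(x, u))/deg(v), and CI is the negated sum, with f = sin. This expression, the function name, and the data layout (adjacency sets without the node itself, distances keyed by `frozenset` pairs) are assumptions made for illustration.

```python
import math

def common_neighbor_influence(u, v, adj, dist, f=math.sin):
    """Assumed form of CI: every common neighbor x pulls u and v together."""
    deg_u, deg_v = len(adj[u]), len(adj[v])
    total = 0.0
    for x in adj[u] & adj[v]:                  # common neighbors of u and v
        s_xu = 1.0 - dist[frozenset((x, u))]   # similarity between x and u
        s_xv = 1.0 - dist[frozenset((x, v))]   # similarity between x and v
        total += f(s_xu) * s_xv / deg_u + f(s_xv) * s_xu / deg_v
    return -total

# Hypothetical example: a triangle a-b-c in which every edge has distance 0.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
dist = {frozenset(p): 0.0 for p in [("a", "b"), ("b", "c"), ("a", "c")]}

ci = common_neighbor_influence("a", "b", adj, dist)
```

In the triangle, the single common neighbor c is maximally similar to both endpoints, so its pull on the edge (a, b) is as strong as this assumed form allows.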
Interaction model – three interaction patterns
Pattern 3: Influence from exclusive neighbors
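Interaction model – interaction pattern
Combining the three patterns, the distance of each edge evolves as:
$$d(u, v, t+1) = d(u, v, t) + DI(t) + CI(t) + EI(t) \qquad (1)$$
where EI denotes the influence from exclusive neighbors.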
Attractor Algorithm
1. Initialization: compute the Jaccard distance for each edge.
2. Dynamics: update the distance of each edge based on equation (1).
3. Community detection: cut off the edges with a distance of 1.
[Figure: three snapshots of the network at (a) T=0, (b) T=1, and (c) T=9.]
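The three phases can be sketched as a skeleton in Python. This is not a full implementation: the per-edge update is passed in as a hypothetical callback `delta(u, v, dist, adj)`, and the toy rule used below only exercises the control flow; it is not equation (1). Synchronous updates, clamping to [0, 1], and freezing edges once they reach 0 or 1 are assumptions of this sketch, as is the absence of self-loops.

```python
def jaccard_distance(adj, u, v):
    gu, gv = adj[u] | {u}, adj[v] | {v}
    return 1.0 - len(gu & gv) / len(gu | gv)

def attractor_skeleton(edges, delta, max_iter=100):
    """Initialize edge distances, iterate them, cut edges at 1, group nodes."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Phase 1: the initial distance of every edge is its Jaccard distance.
    dist = {frozenset(e): jaccard_distance(adj, *e) for e in edges}

    # Phase 2: update all edges together until every distance is 0 or 1.
    for _ in range(max_iter):
        if all(d in (0.0, 1.0) for d in dist.values()):
            break
        new_dist = {}
        for e, d in dist.items():
            if d in (0.0, 1.0):
                new_dist[e] = d                      # converged edges stay fixed
            else:
                u, v = tuple(e)
                new_dist[e] = min(1.0, max(0.0, d + delta(u, v, dist, adj)))
        dist = new_dist

    # Phase 3: drop edges with distance 1, then collect connected components.
    kept = {u: set() for u in adj}
    for e, d in dist.items():
        if d < 1.0:
            u, v = tuple(e)
            kept[u].add(v)
            kept[v].add(u)
    communities, seen = [], set()
    for start in kept:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(kept[node] - component)
        seen |= component
        communities.append(component)
    return communities

def toy_delta(u, v, dist, adj):
    """Toy rule for demonstration only: push the bridge apart, pull the rest in."""
    return 0.3 if {u, v} == {2, 3} else -0.3

# Hypothetical example: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
communities = attractor_skeleton(edges, toy_delta)
```

Keeping the update rule as a parameter separates the bookkeeping (initialization, convergence test, edge removal) from the interaction model, so a real implementation of equation (1) could be dropped in without touching the loop.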
Parameterization: cohesion parameter λ
The cohesion parameter λ:
• determines the coarseness of communities. A large λ produces small,
fine-grained communities, and a small λ yields large communities.
• Experiments demonstrate that Attractor achieves high-quality results when
the cohesion parameter λ is within [0.4, 0.6].
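One way to read the role of the cohesion parameter is as a threshold on similarity: an exclusive neighbor attracts when its similarity to the other endpoint reaches the threshold and repels otherwise. The sketch below encodes that reading; the function name `rho` and the exact rule are assumptions made for illustration, not taken from the slides.

```python
def rho(similarity, lam):
    """Assumed sign rule: keep the similarity if it reaches lam, else shift it below zero."""
    return similarity if similarity >= lam else similarity - lam

# The same similarity of 0.5 under the two ends of the recommended range.
attract = rho(0.5, lam=0.4)  # similarity reaches the threshold: positive influence
repel = rho(0.5, lam=0.6)    # similarity falls short: negative influence
```

Under this reading, raising the threshold turns more exclusive neighbors into repelling ones, which pushes more edge distances toward 1 and therefore yields more, smaller communities.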
Evaluation – comparison on synthetic network
[Figure: comparison on synthetic data under varying noise and density.]
Evaluation – comparison on real network
Table. Statistics of real-world data sets, where AD: average degree;
CC: clustering coefficient.
Evaluation – comparison on real network
Labeled networks
Unlabeled networks
Evaluation – comparison on real network
Figure. Attractor on karate club network.
Colors of nodes indicate different detected communities.
Evaluation – comparison on real network
Figure. Attractor on American football network.
Evaluation – small communities and anomalies
Attractor finds many small communities, and the local
noise level shows that it can detect anomalies effectively.
Evaluation – time complexity
Attractor has a low time complexity of O(|E|), so it can handle large
real-world networks.
Further Application
Fail alarm
Autistic patients
Precision marketing
Conclusion
Based on distance dynamics, Attractor has several benefits:
• Intuitive community detection: Instead of optimizing user-defined
measures, Attractor investigates community structure from
a new point of view: distance dynamics.
• Small community and anomaly detection: Attractor allows
discovering arbitrary-size communities and anomalies that exist in
real networks.
• Scalability: Attractor has a low time complexity of O(|E|) and is easy
to speed up.
Thanks for your attention!
Q&A