Social Media and Social Computing

3.3 Network-Centric Community Detection
 Network-Centric Community Detection
– It considers the global topology of a network.
– It aims to partition the nodes of a network into a number of disjoint sets.
• A group in this case is not defined independently, but relative to the partition of the whole network.
 Classification of Network-Centric Community Detection
– Vertex Similarity
• Structural equivalence, Jaccard similarity, Cosine similarity
– Latent Space Model
• Multi-Dimensional Scaling (MDS)
– Block Model Approximation
– Spectral Clustering
– Modularity Maximization
 Vertex Similarity
– The similarity of two vertices is defined in terms of the similarity of their social circles,
e.g., the number of friends they share in common.
– Various definitions and methods for vertex similarity
• Structural equivalence
• Automorphic equivalence
• Regular equivalence
• k-means algorithm with connection features
• Jaccard similarity
• Cosine similarity
 Structural Equivalence
– Actors 𝑣𝑖 and 𝑣𝑗 are structurally equivalent
if, for any actor 𝑣𝑘 such that 𝑣𝑘 ≠ 𝑣𝑖 and 𝑣𝑘 ≠ 𝑣𝑗,
𝑒(𝑣𝑖, 𝑣𝑘) ∈ 𝐸 iff 𝑒(𝑣𝑗, 𝑣𝑘) ∈ 𝐸.
• That is, actors 𝑣𝑖 and 𝑣𝑗 are connected to exactly
the same set of actors in the network.
• For example,
 nodes 1 and 3 are structurally equivalent
 nodes 5 and 6 are structurally equivalent
– Nodes of the same equivalence class form a community.
• But this definition is too restrictive for practical use.
• Relaxed definitions of equivalence, such as “automorphic equivalence”
and “regular equivalence”, have been proposed.
 But no scalable approach exists to find them.
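Structural equivalence can be checked directly from neighbor sets. A minimal sketch in Python, using a small hypothetical graph (the slide's figure is not shown here) in which nodes 1 and 3, and nodes 5 and 6, are structurally equivalent:

```python
# Two nodes are structurally equivalent if they link to exactly the
# same set of other nodes (the pair itself is excluded).

def structurally_equivalent(adj, i, j):
    """adj: dict mapping node -> set of neighbors (undirected graph)."""
    ni = adj[i] - {j}   # neighbors of i, ignoring j itself
    nj = adj[j] - {i}   # neighbors of j, ignoring i itself
    return ni == nj

# A small undirected graph, assumed only for illustration.
adj = {
    1: {2},
    2: {1, 3, 4},
    3: {2},
    4: {2, 5, 6},
    5: {4, 6},
    6: {4, 5},
}
print(structurally_equivalent(adj, 1, 3))  # True: both link only to node 2
print(structurally_equivalent(adj, 5, 6))  # True: both link to node 4 (and each other)
print(structurally_equivalent(adj, 2, 4))  # False: different neighbor sets
```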
 Jaccard and Cosine Similarity
– For two nodes 𝑣𝑖 and 𝑣𝑗 in a network,
the similarity between the two is defined as

𝐽𝑎𝑐𝑐𝑎𝑟𝑑(𝑣𝑖, 𝑣𝑗) = |𝑁𝑖 ∩ 𝑁𝑗| / |𝑁𝑖 ∪ 𝑁𝑗| = Σ𝑘 𝐴𝑖𝑘𝐴𝑗𝑘 / (|𝑁𝑖| + |𝑁𝑗| − Σ𝑘 𝐴𝑖𝑘𝐴𝑗𝑘)

𝐶𝑜𝑠𝑖𝑛𝑒(𝑣𝑖, 𝑣𝑗) = (𝐴𝑖 ∙ 𝐴𝑗) / (||𝐴𝑖|| × ||𝐴𝑗||) = Σ𝑘 𝐴𝑖𝑘𝐴𝑗𝑘 / (√(Σ𝑠 𝐴𝑖𝑠²) ∙ √(Σ𝑡 𝐴𝑗𝑡²)) = |𝑁𝑖 ∩ 𝑁𝑗| / √(|𝑁𝑖| ∙ |𝑁𝑗|)

• where | ∗ | is the cardinality of a set, “∙” denotes the inner product of
vectors, and || ∗ || is the norm (or magnitude) of a vector.
– For example, 𝑁4 = {1,3,5,6} and 𝑁6 = {4,5,7,8}

𝐽𝑎𝑐𝑐𝑎𝑟𝑑(4, 6) = |{5}| / |{1,3,4,5,6,7,8}| = 1/7

𝐶𝑜𝑠𝑖𝑛𝑒(4, 6) = |{5}| / √(4 ∙ 4) = 1/4
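The two set-based definitions are a few lines in Python; the sketch below reproduces the slide's example with nodes 4 and 6:

```python
from math import sqrt

# Jaccard and cosine similarity over neighbor sets.
def jaccard(ni, nj):
    return len(ni & nj) / len(ni | nj)

def cosine(ni, nj):
    return len(ni & nj) / sqrt(len(ni) * len(nj))

# Neighbor sets from the example above.
N4 = {1, 3, 5, 6}
N6 = {4, 5, 7, 8}
print(jaccard(N4, N6))  # 1/7 ≈ 0.142857...
print(cosine(N4, N6))   # 1/4 = 0.25
```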
 Jaccard and Cosine Similarity (cont.)
– For example, 𝑁7 = {5,6,8,9} and 𝑁9 = {7}
• 𝑁7 ∩ 𝑁9 = ∅, so
𝐽𝑎𝑐𝑐𝑎𝑟𝑑(7, 9) = 0 and 𝐶𝑜𝑠𝑖𝑛𝑒(7, 9) = 0
• However, two nodes are likely to share some
similarity if they are connected.
– A modification
 Include node 𝑣 itself when we compute 𝑁𝑣.
 Then 𝑁7 = {5,6,7,8,9} and 𝑁9 = {7,9}

𝐽𝑎𝑐𝑐𝑎𝑟𝑑(7, 9) = |{7,9}| / |{5,6,7,8,9}| = 2/5

𝐶𝑜𝑠𝑖𝑛𝑒(7, 9) = |{7,9}| / √(5 ∙ 2) = 2/√10
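The modification is a one-line change: add the node itself to its own neighborhood before computing similarity. A sketch reproducing the example with nodes 7 and 9:

```python
from math import sqrt

# Self-inclusive neighborhoods: include the node itself in N_v so that
# directly connected nodes get nonzero similarity.
def jaccard(ni, nj):
    return len(ni & nj) / len(ni | nj)

def cosine(ni, nj):
    return len(ni & nj) / sqrt(len(ni) * len(nj))

def neighborhood(adj, v):
    return adj[v] | {v}  # include v itself

# Adjacency from the example: nodes 7 and 9 are connected.
adj = {7: {5, 6, 8, 9}, 9: {7}}
N7 = neighborhood(adj, 7)  # {5, 6, 7, 8, 9}
N9 = neighborhood(adj, 9)  # {7, 9}
print(jaccard(N7, N9))  # 2/5 = 0.4
print(cosine(N7, N9))   # 2/√10 ≈ 0.632
```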
 Jaccard and Cosine Similarity (cont.)
– Similarity-based community detection method
• 1) Set a community threshold 𝜎.
• 2) Compute the similarity for each pair of nodes in the given graph.
• 3) If the similarity of two nodes is over the threshold 𝜎, the two nodes belong to
the same community.
[Figure: an example graph annotated with pairwise similarities 0.9, 0.7, 0.4, and 0.2;
with threshold 𝜎 = 0.5, only the pairs with similarity 0.9 and 0.7 fall into the same community]
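The three steps above can be sketched with a union-find pass over all node pairs. The graph below is hypothetical (two triangles), chosen only to illustrate the procedure:

```python
from itertools import combinations
from math import sqrt

def cosine(ni, nj):
    # cosine similarity over neighbor sets; 0 for empty sets
    return len(ni & nj) / sqrt(len(ni) * len(nj)) if ni and nj else 0.0

def find(parent, x):
    # union-find root lookup with path halving
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def threshold_communities(adj, sigma):
    """Merge every pair whose similarity exceeds sigma into one community."""
    nodes = list(adj)
    parent = {v: v for v in nodes}
    for u, v in combinations(nodes, 2):
        # self-inclusive neighborhoods, as in the modification above
        if cosine(adj[u] | {u}, adj[v] | {v}) > sigma:
            parent[find(parent, u)] = find(parent, v)
    groups = {}
    for v in nodes:
        groups.setdefault(find(parent, v), set()).add(v)
    return list(groups.values())

adj = {
    1: {2, 3}, 2: {1, 3}, 3: {1, 2},   # one triangle
    4: {5, 6}, 5: {4, 6}, 6: {4, 5},   # another triangle
}
print(threshold_communities(adj, 0.5))  # two communities: {1,2,3} and {4,5,6}
```

Computing all pairwise similarities is O(n²) pairs, which motivates the scalability note below.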
– Computing the similarity for each pair of nodes is
 time-consuming when n is very large.
• [Note] Shingling algorithm
 Discovering Large Dense Subgraphs in Massive Graphs, David Gibson, Ravi
Kumar, Andrew Tomkins, IBM Almaden Research Center
 Finding dense clusters in web graph, Prof. Donald J. Patterson; scribes: Minh Doan,
Ching-wei Huang, Siripen Pongpaichet
 Latent Space Model
– It maps nodes into a low-dimensional Euclidean space.
• Proximity between nodes is preserved in the new space.
• Nodes are then clustered in the new space using methods like k-means.
 MDS (multi-dimensional scaling)
– It requires as input a proximity matrix 𝑃 ∈ ℝⁿˣⁿ, with each entry
𝑝𝑖𝑗 denoting the distance between a pair of nodes 𝑖 and 𝑗 in the
network.
– Let 𝑆 ∈ ℝⁿˣˡ denote the coordinates of the nodes in the 𝑙-dimensional space.
– The relationship between 𝑆 and 𝑃 (classical MDS, via double centering):

𝑆𝑆ᵀ ≈ −½ (𝐼 − 𝟏𝟏ᵀ/𝑛)(𝑃 ∘ 𝑃)(𝐼 − 𝟏𝟏ᵀ/𝑛)

• 𝐼 is the identity matrix
• 𝟏 is an n-dimensional column vector with each entry being 1
• ∘ is the element-wise matrix multiplication
I. Borg and P. Groenen. Modern Multidimensional Scaling:
Theory and Applications. Springer, 2005.
 MDS (multi-dimensional scaling) (cont.)
– Suppose that
• 𝑉 contains the top 𝑙 eigenvectors (those with the largest eigenvalues) of the
centered proximity matrix derived from 𝑃, and
• Λ is the diagonal matrix of the top 𝑙 eigenvalues
 That is, Λ = 𝑑𝑖𝑎𝑔(𝜆1, 𝜆2, … , 𝜆𝑙)
– The optimal 𝑆 is 𝑆 = 𝑉Λ¹ᐟ².
– The classical k-means algorithm can then be applied to 𝑆 to find
community partitions.
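The whole pipeline (double centering, eigendecomposition, coordinates) is short with NumPy. A minimal sketch on a hypothetical 4-node distance matrix, two tight pairs far from each other; k-means would then be run on the rows of 𝑆:

```python
import numpy as np

# Hypothetical proximity matrix: nodes {0,1} and {2,3} form two close pairs.
P = np.array([
    [0.0, 1.0, 4.0, 4.0],
    [1.0, 0.0, 4.0, 4.0],
    [4.0, 4.0, 0.0, 1.0],
    [4.0, 4.0, 1.0, 0.0],
])
n = P.shape[0]
l = 2  # target dimensionality

# Double centering: -1/2 (I - 11^T/n) (P∘P) (I - 11^T/n)
J = np.eye(n) - np.ones((n, n)) / n
P_centered = -0.5 * J @ (P * P) @ J

# Top-l eigenpairs of the centered matrix (eigh returns ascending order)
vals, vecs = np.linalg.eigh(P_centered)
order = np.argsort(vals)[::-1][:l]
Lam = np.diag(np.clip(vals[order], 0, None))  # guard tiny negatives
V = vecs[:, order]

S = V @ np.sqrt(Lam)   # node coordinates, S = V Λ^{1/2}
print(S)               # rows 0,1 land near each other, as do rows 2,3
# k-means (k=2) on the rows of S would recover the two communities.
```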
 MDS Example (1/3)
 MDS Example (2/3)
 MDS Example (3/3)
[Figure: the k-means algorithm applied to the MDS coordinates to obtain the final communities]