1
GRAPH PARTITIONING AND CLUSTERING FOR COMMUNITY DETECTION
Presented By: Group One
Outline
2
Introduction: Hong Hande
Graph Partitioning: Muthu Kumar C and Xie Shudong
Partitional Clustering: Agus Pratondo
Spectral Clustering: Li Furong and Song Chonggang
Summary and Applications of Community Detection: Aleksandr Farseev
3
INTRODUCTION
-BY HONG HANDE
Facebook Group
4
https://www.facebook.com/thebeatles?rf=111113312246958
Flickr group
5
http://www.flickr.com/groups/49246928@N00/pool/with/417646359/#photo_417646359
CS6234 Advanced Algorithms
6
Whole class as a community
Sub-community
Graph construction from web data (1)
7
Webpage www.x.com links to:
href = "www.y.com"
href = "www.z.com"
Webpage www.y.com links to:
href = "www.x.com"
href = "www.a.com"
href = "www.b.com"
Webpage www.z.com links to:
href = "www.a.com"
[Figure: the resulting graph on nodes x, y, z, a, b]
Graph construction from web data (2)
8
Web pages as a graph
9
Cnn.com
Lots of links, lots of images. (1316 tags)
http://www.aharef.info/2006/05/websites_as_graphs.htm
Internet as a graph
10
nodes = service providers
edges = connections
hierarchical structure
S. Carmi, S. Havlin, S. Kirkpatrick, Y. Shavitt, E. Shir. A model of Internet topology using k-shell decomposition. PNAS 104 (27), pp. 11150-11154, 2007.
Emerging structures
11
Graphs (from the web, daily life) present certain structural characteristics:
groups of nodes interacting with each other
dense inter-connections
functional/topical associations
Community: a.k.a. group, subgroup, module, cluster
Community Types
12
Explicit
The result of conscious human decision
Implicit
Emerging from the interactions & activities of users
Need special methods to be discovered
Defining Communities
13
Often communities are defined with respect to a graph G = (V, E) representing a set of objects (V) and their relations (E).
Even if such a graph is not explicit in the raw data, it is usually possible to construct one, e.g.
feature vectors → distances → graph
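This construction can be sketched in a few lines of Python; the sample data and the epsilon-threshold rule below are our own illustrative choices:

import numpy as np

# Toy feature vectors (hypothetical data; one row per object).
points = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                   [5.0, 5.0], [5.1, 4.9]])

# Pairwise Euclidean distances between all objects.
dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

# Build the graph: connect objects whose distance is below a threshold.
eps = 1.0
adjacency = (dists < eps) & ~np.eye(len(points), dtype=bool)
print(adjacency.astype(int))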
Communities and graphs
14
Given a graph, a community is defined as a set of nodes
that are more densely connected to each other than to
the rest of the network nodes
Internal edge
External edge
Graph cuts
15
A cut is a partition of the vertices of a graph into two
disjoint subsets.
The cut-set of the cut is the set of edges whose endpoints are in different subsets of the partition.
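A direct reading of this definition as code (a sketch; the adjacency-matrix representation and the names are our own):

def cut_set_weight(W, S):
    # Total weight (or edge count, for 0/1 matrices) of the cut-set:
    # edges whose endpoints lie in different subsets of the partition (S, V \ S).
    n = len(W)
    S = set(S)
    T = [v for v in range(n) if v not in S]
    return sum(W[u][v] for u in S for v in T)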
Community detection methods
16
Graph partitioning
Node clustering:
  K-means clustering
  Spectral clustering
17
GRAPH PARTITIONING
MUTHU KUMAR C
Graph Partitioning
18
Dividing vertices into groups of predefined size.
Given a graph G = (V, E, W_E), with vertices V, edges E and edge weights W_E, choose a partition such that:
V = V1 ∪ V2 ∪ … ∪ Vp
Vi ∩ Vj = ∅ for all i ≠ j
Bisectioning: partitioning into two equal-sized groups of vertices.
How many partitions?
19
There exist many possible partitionings to search. Just to divide into 2 equal-sized partitions there are
$\binom{n}{n/2} = \frac{n!}{\left((n/2)!\right)^2}$
choices, which is exponential in n.
Choosing the optimal partitioning is NP-complete.
[Figure: several of the possible bisections of an 8-node example graph]
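To see this growth concretely, a couple of lines of Python evaluating the closed form above:

from math import comb

# Number of ways to choose one half of a bisection of n labeled vertices.
for n in (4, 8, 16, 32, 64):
    print(n, comb(n, n // 2))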
Kernighan/Lin Algorithm [1]
20
An iterative, 2-way, balanced partitioning (bisectioning) heuristic.
The algorithm can also be extended to solve more general partitioning problems.
Given $G = (V, E, W_E)$, find a partition such that:
$V = A \cup B$, $A \cap B = \emptyset$, $|A| = |B|$
and the cut size T between A and B is minimized:
$T = \sum_{a \in A,\, b \in B} w(a, b)$, where $w(\cdot, \cdot) \in W_E$
1. Kernighan, B. W., & Lin, S. (1970). An efficient heuristic procedure for
partitioning graphs. Bell system technical journal, 49(2), 291-307.
Kernighan-Lin: Definitions
21
Let $a \in A$ and $b \in B$ be two vertices.
External cost: $E_a = \sum_{b \in B} w(a, b)$
Internal cost: $I_a = \sum_{a' \in A \setminus \{a\}} w(a, a')$
Moving node a from A to B increases T by $I_a$ and decreases T by $E_a$.
This is measured as $D_a = E_a - I_a$.
$E_b$, $I_b$ and $D_b$ are defined analogously for $b \in B$.
K/L Algorithm: Swap
22
[Figure: swapping a ∈ A with b ∈ B; in the example the cut size drops from T = 4 to T = 3]
Kernighan-Lin Algorithm
25
// KERNIGHAN-LIN, page 1 of 2
COMPUTE T = COST(A,B) FOR INITIAL A, B
REPEAT  // sweep begins
  Compute costs D(v) for all v in V
  Unmark all vertices in V
  While there are unmarked nodes
    Find an unmarked pair (ai, bi) with maximal gain g(ai, bi)
    Mark ai and bi, but do not swap them
    Update D(v) for all unmarked v, as though ai and bi had been swapped
  Endwhile

Each sweep greedily computes |V|/2 candidate pairs to swap and picks the best sequence of such swaps.

$g_{ab} = D_a + D_b - 2\,w(a, b)$  (1)
$newD_{a'} = D_{a'} + 2\,w(a', a) - 2\,w(a', b)$ for $a' \in A \setminus \{a\}$  (2)
$newD_{b'} = D_{b'} + 2\,w(b', b) - 2\,w(b', a)$ for $b' \in B \setminus \{b\}$
Kernighan-Lin Algorithm
26
// KERNIGHAN-LIN, page 2 of 2
We have now computed:
*) a sequence of pairs (a1,b1), …, (ak,bk), and
*) gains g(1), …, g(k), where k = |V|/2,
numbered in the order in which we marked them.
  Pick m ≤ k which maximizes $GAIN = \sum_{i=1}^{m} g(i)$
  // GAIN is the reduction in cost from swapping (a1,b1) through (am,bm)
  If GAIN > 0 then  // it is worth swapping
    Update newA = A - {a1,…,am} ∪ {b1,…,bm}
    Update newB = B - {b1,…,bm} ∪ {a1,…,am}
    Update T = T - GAIN
  endif
UNTIL GAIN <= 0  // sweep ends
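For concreteness, here is a minimal Python sketch of one such sweep, assuming vertices are integer indices into a symmetric weight matrix w (function and variable names are our own, not from the original paper):

import numpy as np

def kl_sweep(w, A, B):
    # One Kernighan-Lin sweep over the partition (A, B) of vertices 0..n-1.
    A, B = list(A), list(B)
    # D(v) = external cost E(v) minus internal cost I(v).
    D = {}
    for a in A:
        D[a] = sum(w[a][b] for b in B) - sum(w[a][x] for x in A if x != a)
    for b in B:
        D[b] = sum(w[b][a] for a in A) - sum(w[b][x] for x in B if x != b)
    unA, unB = set(A), set(B)
    pairs, gains = [], []
    while unA and unB:
        # Find the unmarked pair with maximal gain g = D(a) + D(b) - 2*w(a,b).
        a, b = max(((x, y) for x in unA for y in unB),
                   key=lambda p: D[p[0]] + D[p[1]] - 2 * w[p[0]][p[1]])
        pairs.append((a, b))
        gains.append(D[a] + D[b] - 2 * w[a][b])
        unA.remove(a)
        unB.remove(b)  # mark the pair, but do not swap yet
        # Update D as though a and b had been swapped (Equation 2).
        for x in unA:
            D[x] += 2 * w[x][a] - 2 * w[x][b]
        for y in unB:
            D[y] += 2 * w[y][b] - 2 * w[y][a]
    # Pick the prefix of candidate swaps with maximal cumulative gain.
    prefix = np.cumsum(gains)
    m = int(np.argmax(prefix))
    if prefix[m] <= 0:
        return A, B, 0.0  # no improving prefix: the sweep ends
    for a, b in pairs[:m + 1]:
        A[A.index(a)], B[B.index(b)] = b, a
    return A, B, float(prefix[m])

Calling kl_sweep repeatedly until the returned gain is 0 reproduces the outer REPEAT loop.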
Kernighan/Lin Example
30-38
[Figure: 8-node example graph; initial partition A = {1, 2, 3, 4} and B = {5, 6, 7, 8}; cut cost 9]
Edges are unweighted in this example.

Sweep, step 1. Unmarked: 1, 2, 3, 4, 5, 6, 7, 8; cut cost 9. Calculate D values to find the best pair:
D(1) = 1, D(2) = 1, D(3) = 2, D(4) = 1, D(5) = 1, D(6) = 2, D(7) = 1, D(8) = 1
Nodes 3 and 5 lead to maximum gain: g1 = 2 + 1 - 0 = 3. Mark the pair (3, 5) as a candidate swap. Gain in the current pass: G1 = g1 = 3; the cut cost after this swap would be 6.

Step 2. Unmarked: 1, 2, 4, 6, 7, 8. Updated D values (as though (3, 5) had been swapped):
D(1) = -1, D(2) = -1, D(4) = 3, D(6) = 2, D(7) = -1, D(8) = -1
Maximum gain: g2 = 3 + 2 - 0 = 5 for the pair (4, 6). Mark it; G2 = G1 + g2 = 8; the cut cost would drop to 1.

Step 3. Unmarked: 1, 2, 7, 8. Updated D values:
D(1) = -3, D(2) = -3, D(7) = -3, D(8) = -3
Maximum gain: g3 = -3 - 3 - 0 = -6 for the pair (1, 7). Mark it; G3 = G2 + g3 = 2; the cut cost would rise to 7.

Step 4. Unmarked: 2, 8. Updated D values: D(2) = -1, D(8) = -1
Last pair (2, 8): g4 = -1 - 1 - 0 = -2. Mark it; G4 = G3 + g4 = 0; the cut cost would be back to 9.

Maximum positive gain Gm = 8 with m = 2. Since Gm > 0, the first m = 2 swaps, (3, 5) and (4, 6), are executed, reducing the cut cost from 9 to 1. More passes are run until Gm ≤ 0.
[Figure: the partition after executing the two swaps]
Escaping Local Minima
39
Gains are not necessarily monotonically increasing: in the sequence of m swaps chosen, some $g_{a_i, b_i}$ may be negative.
Accepting such interim losses lets the algorithm possibly escape "local minima".
But there is no guarantee of an optimal solution.
Demerits
40
Bisectioning does not generalize well to k-way partitioning.
Partitioning into predefined sizes limits its utility to niche applications.
41
ANALYSIS OF K/L ALGORITHM
XIE SHUDONG
K/L Algorithm: Analysis
42-53

COMPUTE T = COST(A,B) FOR INITIAL A, B ............................ O(|V|²)
REPEAT (p iterations) ............................................. O(p·|V|³) overall
  Compute costs D(v) for all v in V ............................... O(|V|²)
  Unmark all vertices in V ........................................ O(|V|)
  While there are unmarked nodes .................................. O(|V|³)
    Find an unmarked pair (a, b) with maximal g(a, b) ............. O(|V|²)
    Mark 'a' and 'b' (but do not swap them) ....................... O(1)
    Update D(v) for all unmarked v,
      as though 'a' and 'b' had been swapped ...................... O(|V|)
  Endwhile
  Pick m maximizing $Gain = \sum_{i=1}^{m} g(i)$ .................. O(|V|)
  If Gain > 0 then ................................................ O(|V|)
    Update newA = A - {a1,…,am} ∪ {b1,…,bm} ....................... O(|V|)
    Update newB = B - {b1,…,bm} ∪ {a1,…,am} ....................... O(|V|)
    Update T = T - Gain ........................................... O(1)
  endif
UNTIL Gain <= 0

Where the bounds come from:
Initial cut cost: at most |V|/2 × |V|/2 = |V|²/4 external edges to sum, so O(|V|²).
Costs D: for one node a, D(a) = E(a) - I(a) takes O(|V|); for all |V| nodes, O(|V|²).
Finding the best pair: the (i+1)-th iteration of the while loop has (|V|/2 - i)(|V|/2 - i) candidate pairs, so O(|V|²).
Updating D: newD(a') = D(a') + 2·w(a', a) - 2·w(a', b) is O(1) per node; the (i+1)-th iteration has |V| - 2i unmarked nodes, so O(|V|).
The while loop finds |V|/2 pairs, giving O(|V|³) per sweep.
Picking m: one scan over the prefix sums g(1), g(1) + g(2), …, g(1) + … + g(|V|/2), so O(|V|).

How many iterations? With p iterations of the REPEAT loop, the total running time is O(p·|V|³). Empirical testing by Kernighan and Lin on small graphs (|V| ≤ 360) showed convergence after 2 to 4 passes.
54
K-MEANS CLUSTERING
by Agus Pratondo
Graph in Rn
55
[Figure: weighted graph on nodes a, b, c, d, e, x, y with edge weights a-b = 2, b-x = 1, b-y = 5, c-x = 2, d-x = 3, d-e = 1, e-y = 1]
Graph in Rn
56
Adjacency matrix:
    a  b  c  d  e  x  y
a   0  2  0  0  0  0  0
b   2  0  0  0  0  1  5
c   0  0  0  0  0  2  0
d   0  0  0  0  1  3  0
e   0  0  0  1  0  0  1
x   0  1  2  3  0  0  0
y   0  5  0  0  1  0  0
[Figure: the same weighted graph as on the previous slide]
Graph in Rn
57
Each row of the adjacency matrix gives a point in R⁷:
a → (0, 2, 0, 0, 0, 0, 0)
b → (2, 0, 0, 0, 0, 1, 5)
c → (0, 0, 0, 0, 0, 2, 0)
d → (0, 0, 0, 0, 1, 3, 0)
e → (0, 0, 0, 1, 0, 0, 1)
x → (0, 1, 2, 3, 0, 0, 0)
y → (0, 5, 0, 0, 1, 0, 0)
Algorithm
58
Algorithm: Basic K-means
1: Select K points as the initial centroids
2: repeat
3: Form K clusters by assigning all points to the closest centroids
4: Re-compute the centroids of each cluster
5: until the centroids do not change
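A compact numpy sketch of these five lines (initial centroids sampled from the data; the stopping test is our choice):

import numpy as np

def kmeans(points, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # 1: select K data points as the initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):                                  # 2: repeat
        # 3: assign every point to the closest centroid.
        dists = np.linalg.norm(points[:, None] - centroids[None, :], axis=-1)
        labels = dists.argmin(axis=1)
        # 4: re-compute the centroid of each cluster
        #    (assumes no cluster becomes empty, a simplification).
        new = np.array([points[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centroids):                     # 5: until no change
            return labels, centroids
        centroids = new
    return labels, centroids

On the R⁷ points built from the adjacency matrix earlier, kmeans(points, 3) would return one label per node plus the final centroids.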
K-means example, step 1
59
Let K = 3
[Scatter plot of the data points]
K-means example, step 1
60
Let K = 3
Pick 3 initial cluster centers (randomly).
[Scatter plot with the three initial centers k1, k2, k3]
K-means example, step 2-3
62
Assign each point to the closest cluster center.
[Scatter plot: points assigned to k1, k2, k3]
K-means example, step 4
64
Move each cluster center to the mean of its cluster.
[Scatter plot: centers k1, k2, k3 move to the new means k'1, k'2, k'3]
K-means example, step 3-4 (repeat)
66
Reassign the points that are now closest to a different cluster center.
[Scatter plot with the updated centers k1, k2, k3]
K-means example, step 3-4 (repeat)
67
Three points change cluster.
[Scatter plot: the three reassigned points]
K-means example, step 3-4 (repeat)
68
Re-compute the cluster means.
[Scatter plot with the re-computed means]
K-means example, step 3-4 (repeat)
69
Move the cluster centers to the cluster means.
The centers change, so repeat steps 3-4.
[Scatter plot with the moved centers k1, k2, k3]
K-means example, step 5
70
No cluster center changes: the algorithm has converged.
[Scatter plot: the final three clusters]
Time Complexity
71
Algorithm: Basic K-means
1: Select K points as the initial centroids
2: repeat
3: Form K clusters by assigning all points to the closest centroids
4: Re-compute the centroids of each cluster
5: until the centroids do not change
The loop is repeated i times.
Step 3: there are n points; for each point, the distance to each of the k cluster centers is evaluated.
Step 4: there are k cluster centers to re-compute.
So with k clusters, n points, and i iterations, the time complexity is O(kni).
Discussion
72
Results can vary significantly depending on the initial choice of seeds (number and position).
To increase the chance of finding the global optimum: restart with different random seeds.
Problem with initializations
73
[Figure: a bad local optimum in which one cluster takes the four topmost points and another the four bottommost points]
Problem with initializations
74
[Figure: another bad local optimum in which one cluster takes the four leftmost points and another the four rightmost points]
76
SPECTRAL CLUSTERING
-BY LI FURONG
Motivation
77-80
Two kinds of clusters:
convex shaped, compact → handled well by k-means
non-convex shaped, connected → handled by spectral clustering
[Figure: examples of convex-shaped and non-convex-shaped clusters]
Key Idea
81-82
Project the data points into a new space in which clusters can be trivially detected.
Next, we will cover:
how to find the new space
how to represent data points in that space
Matrix Representations of Graphs
84-86
Adjacency matrix W, with entry $w_{ij}$ the weight of the edge between nodes i and j.
Degree of a node i: $d_i = \sum_{j} w_{ij}$
Degree matrix D: the diagonal matrix with the degrees $d_1, \ldots, d_n$ on its diagonal.
Graph Laplacian
87-88
Graph Laplacian: $L = D - W$
Next, we will see some properties of L which will be used for spectral clustering.
We will work closely with linear algebra, especially eigenvalues and eigenvectors.
Properties of Graph Laplacian (1)
89-93
Recall: (1) $d_i = \sum_j w_{ij}$   (2) $L = D - W$
Property: for every vector $f \in \mathbb{R}^n$,
$f^T L f = \frac{1}{2} \sum_{i,j=1}^{n} w_{ij} (f_i - f_j)^2$
Proof:
$f^T L f = f^T D f - f^T W f$   (apply Equation 2)
$= \sum_i d_i f_i^2 - \sum_{i,j} f_i f_j w_{ij}$
$= \frac{1}{2} \Big( \sum_i d_i f_i^2 - 2 \sum_{i,j} f_i f_j w_{ij} + \sum_j d_j f_j^2 \Big)$   (apply Equation 1)
$= \frac{1}{2} \sum_{i,j} w_{ij} (f_i - f_j)^2$
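The identity is easy to sanity-check numerically; the snippet below is a random spot check, not a proof:

import numpy as np

rng = np.random.default_rng(1)
n = 6
W = rng.random((n, n))
W = np.triu(W, 1); W = W + W.T            # symmetric weights, zero diagonal
D = np.diag(W.sum(axis=1))                # degree matrix
L = D - W                                 # graph Laplacian
f = rng.standard_normal(n)

lhs = f @ L @ f
rhs = 0.5 * sum(W[i, j] * (f[i] - f[j]) ** 2 for i in range(n) for j in range(n))
print(np.isclose(lhs, rhs))               # True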
Properties of Graph Laplacian (2)
96-99
The smallest eigenvalue of L is 0; the corresponding eigenvector is the constant one vector $\mathbb{1}$.
Proof:
$L\mathbb{1} = (D - W)\mathbb{1} = D\mathbb{1} - W\mathbb{1} = 0$, since by (1) each row of W sums to the corresponding diagonal entry of D, so $\mathbb{1}$ is an eigenvector with eigenvalue 0.
By the property above, $f^T L f \ge 0$ for every f, so all eigenvalues of L are non-negative and 0 is the smallest.
We Have Done a Lot of Work…
101-104
Transform the graph to the Laplacian L.
Study the properties of L, basically the eigenvalues and eigenvectors.
Finally, we can see the relationship between the graph and the eigenvalues!
Number of Connected Components & Eigenvalues of L
105-110
A connected component of an undirected graph is a subgraph in which any two vertices are connected to each other by paths, and which is connected to no additional vertices in the supergraph.
If an eigenvalue λ has multiplicity k, then there are k linearly independent eigenvectors corresponding to λ.
Indicator vector $\mathbb{1}_A$: entry 1 at every vertex of the component A and 0 elsewhere.
Proposition 2: the multiplicity of the eigenvalue 0 of L equals the number of connected components $A_1, \ldots, A_k$ of the graph, and the eigenspace of eigenvalue 0 is spanned by the indicator vectors $\mathbb{1}_{A_1}, \ldots, \mathbb{1}_{A_k}$.
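The proposition can be checked numerically on a small example of our own, two disjoint triangles:

import numpy as np

tri = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
W = np.block([[tri, np.zeros((3, 3))],
              [np.zeros((3, 3)), tri]])          # graph with 2 components
L = np.diag(W.sum(axis=1)) - W
eigvals = np.linalg.eigvalsh(L)
print(np.sum(np.isclose(eigvals, 0.0)))          # 2 zero eigenvalues = 2 components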
Proof of Proposition 2
111-118
One connected component: let f be an eigenvector with eigenvalue 0. Then
$0 = f^T L f = \frac{1}{2} \sum_{i,j} w_{ij} (f_i - f_j)^2$,
so $f_i = f_j$ whenever $w_{ij} > 0$. Any two vertices i and j of the component are joined by a path i, m, …, n, j of positive-weight edges, so f is constant over the component: $f = (1, \ldots, 1)^T$ up to scaling.
Several connected components: L is block diagonal with one block per component, and the argument applies to each block separately, yielding one indicator eigenvector per component.
119
SPECTRAL CLUSTERING
-BY SONG CHONGGANG
Spectral Clustering Algorithm
120-125
Input: graph G, number k of clusters to form
1. Compute the adjacency matrix W and degree matrix D
2. Laplacian L = D - W
3. Compute the first k eigenvectors $u_1, \ldots, u_k$ of L   ← new space found!
4. Let $U \in \mathbb{R}^{n \times k}$ contain the vectors $u_1, \ldots, u_k$ as columns
5. Let $y_i \in \mathbb{R}^k$ be the vector corresponding to the i-th row of U   ← representing data in the new space!
6. Cluster the points $(y_i)_{i=1,\ldots,n}$ into k clusters using k-means
Time complexity: O(n³)
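Putting the six steps together in numpy, a minimal sketch using the unnormalized Laplacian (it reuses the kmeans function sketched in the k-means section):

import numpy as np

def spectral_clustering(W, k):
    # Unnormalized spectral clustering on adjacency matrix W.
    D = np.diag(W.sum(axis=1))          # degree matrix
    L = D - W                           # graph Laplacian
    # eigh returns eigenvalues in ascending order, so the first k columns
    # are the eigenvectors of the k smallest eigenvalues.
    _, vecs = np.linalg.eigh(L)
    U = vecs[:, :k]                     # row i of U embeds node i in R^k
    labels, _ = kmeans(U, k)            # k-means in the embedded space
    return labels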
Example
126-135
Now let's go through an example with n = 6 and k = 2.
Step 1: compute the weighted adjacency matrix W and degree matrix D. [Figure: the 6-node example graph and its matrices W and D]
Step 2: Laplacian matrix L = D - W. [Figure: the Laplacian matrix L]
Step 3: eigen-decomposition of L, giving the eigenvalues and eigenvectors; the first k = 2 eigenvectors form the matrix U. [Figure: the eigenvalues and eigenvectors of L]
Step 4: embedding. Each row of U represents a data point; map it to a two-dimensional space. [Figure: the six embedded points, with both coordinates ranging from -0.5 to 0.5]
Step 5: clustering. K-means on the embedded points separates them into Cluster A and Cluster B. [Figure: the k-means result in the embedded space]
Why Spectral Clustering Works (1)
136-138
Consider an ideal case: there are no similarities between any nodes in different connected components.
This conforms to Proposition 2.
Compute the weighted adjacency matrix W and degree matrix D; L = D - W; compute L's 3 eigenvectors of eigenvalue 0, which are the indicator vectors of the 3 components.
Why Spectral Clustering Works (2)
139-140
Still in the ideal case: let the three eigenvectors be the three columns of a matrix U, and project the rows of U into a 3-dimensional space.
[Figure: U assembled from the three indicator vectors; all nodes of one component map to the same point]
Why Spectral Clustering Works (3)
141
If we now run k-means in this space, we get very good results.
Recall: # of 0 eigenvalues = # of connected components.
Why Spectral Clustering Works (4)
142-143
What if it is not the ideal case? We need to introduce perturbation theory.
Perturbation is like noise.
[Figure: the ideal case vs. a nearly ideal case with small perturbation]
Why Spectral Clustering Works (5)
144-145
Perturbation theory will not be formally discussed here; references will be offered on IVLE.
What you need to know is:
For the ideal case, the between-cluster similarity is 0, and the first k eigenvectors of the Laplacian matrix L are indicators of the clusters.
For a real case, L' = L + H, where H is the perturbation. Perturbation theory tells us that the eigenvectors computed from L' will be very close to the ideal vectors from L, with the difference bounded by a small value.
APPLICATIONS AND SUMMARY
-BY ALEKSANDR FARSEEV
Applications: VLSI
147
Very-large-scale integration (VLSI)
- the process of creating integrated
circuits by combining thousands
of transistors into a single chip.
Applications: VLSI design
148
[Figure: the VLSI design flow: System Specification → Architectural Design → Functional Design and Logic Design → Circuit Design → Physical Design (Partitioning, Chip Planning, Placement, Clock Tree Synthesis, Signal Routing, Timing Closure) → Physical Verification and Signoff (DRC, LVS, ERC) → Fabrication → Packaging and Testing → Chip]
Applications: VLSI design (2)
149
[Figure: a circuit on 8 gates partitioned into Block A and Block B by two different cuts. Cut ca: four external connections. Cut cb: two external connections.]
Applications: Social Media
150
One-modality (one-mode) network: a type of network where all vertices are of the same kind.
Multi-modality (multi-mode) network: a type of network where vertices are of different kinds.
Hypergraph: a generalization of a graph where an edge (hyperedge) can connect any number of vertices.
(k, k)-(hyper)network: a network with k modalities and hyperedges involving exactly k vertices, each vertex from one unique modality.
[Figure: a (3, 3)-network]
Applications: Social Media (2)
151
[Figure: graph representation of the (3, 3)-network]
Applications: Social Media (3)
152
A (2, 2) User-Venue network is represented by a matrix $A_{m \times n}$ relating venues to users.
From it, a (1, 2) User-User similarity network is derived as
$A^T A = W$, with $a_i^T a_j = w_{i,j}$ for $i \neq j$ and $w_{i,i} = 0$.
[Figure: an example user-venue bipartite graph, its matrix $A_{m \times n}$, and the derived user-user similarity matrix $W_{n \times n}$]
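In numpy the derivation takes two lines; the affinity matrix below is a made-up toy, not the one from the figure:

import numpy as np

# Hypothetical affinity matrix A (rows = venues, columns = users).
A = np.array([[1, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 0, 0, 1],
              [1, 0, 1, 0]])
W = A.T @ A                  # w_ij = a_i . a_j: venues shared by users i and j
np.fill_diagonal(W, 0)       # the slide's convention: w_ii = 0
print(W)                     # user-user similarity matrix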
Applications: Social Media (4)
153
(3, 3) User - Venue - Photo network
(1, 2) User – User network
Applications: Social Media (5)
154
Applications: Social Media (6)
155
Applications: Social Media (7)
156
http://next.comp.nus.edu.sg
Other applications
157
Parallel processing
Parallel Graph Computations
Complex Networks
Power Grids
Geographically Embedded Networks
Road Networks
Image Processing
Summary: KL Graph Partitioning
158
Time complexity: O(N³·i), where N is the number of objects and i is the number of iterations.
Can only perform bipartitioning.
Cannot detect overlapping communities.
Summary: K-Means
159
Fast: O(N·k·i), where N is the number of objects, k is the number of clusters, and i is the number of iterations.
Easy to implement.
Need to specify k, the number of clusters, in advance.
Not suitable for discovering clusters with non-convex shapes.
Cannot detect overlapping communities.
Summary: Spectral clustering
160
Time complexity: O(N³·i), where N is the number of objects and i is the number of iterations.
Able to discover clusters with non-convex shapes.
Need to specify k, the number of clusters, in advance.
Cannot detect overlapping communities.
Sources
162
1. Kernighan, B. W., & Lin, S. (1970). An efficient heuristic procedure for partitioning graphs. Bell System Technical Journal, 49(2), 291-307.
2. James Demmel, CS 267: Applications of Parallel Computers, Graph Partitioning, http://www.cs.berkeley.edu/~demmel/cs267_Spr09
3. Andrew B. Kahng, Jens Lienig, Igor L. Markov, Jin Hu, VLSI Physical Design: From Graph Partitioning to Timing Closure.
4. Sadiq M. Sait & Habib Youssef, Chapter 2: Partitioning, King Fahd University of Petroleum & Minerals, College of Computer Sciences & Engineering, Department of Computer Engineering, September 2003.
5. http://shabal.in/visuals/kmeans/2.html
6. www.cs.ucr.edu/~eamonn/205/MachineLearning3.ppt
7. info.psu.edu.sa/psu/cis/asameh/cs-500/dm13-clustering.ppt