Planted partition models
Daniel Hsu
COMS 4772
Planted partition models

▶ Also called "stochastic block models" in statistics.
▶ Regarded as a model for "community structure" in networks.
▶ Extremely fashionable, not very realistic.
▶ Interesting to study.
Planted bisection

▶ n people, partitioned into two groups of n/2 each.
▶ Edges (e.g., links, friendships, interactions) between pairs of people appear at random, independently:
  ▶ Two people in the same group have an edge with probability p.
  ▶ Two people in different groups have an edge with probability q < p.
▶ We only observe the edges (the adjacency matrix); the partition is "hidden".
▶ Goal: recover the groups.
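The sampling model in the bullets above can be sketched in NumPy (a minimal sketch; the helper name `sample_planted_bisection` and the parameter values are illustrative, not from the slides):

```python
import numpy as np

def sample_planted_bisection(n, p, q, seed=None):
    """Sample an adjacency matrix from the planted bisection model:
    people 0..n/2-1 form group 1, the rest group 2.  (Hypothetical
    helper; the name and interface are not from the slides.)"""
    rng = np.random.default_rng(seed)
    half = n // 2
    group = np.array([0] * half + [1] * half)
    # Edge probability p within a group, q across groups.
    probs = np.where(group[:, None] == group[None, :], p, q)
    # Draw each pair independently (strict upper triangle), symmetrize.
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    return (upper | upper.T).astype(int)

A = sample_planted_bisection(200, p=0.5, q=0.1, seed=0)
```

Only the matrix `A` would be handed to a recovery algorithm; the group labels inside the sampler play the role of the hidden partition.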
Random adjacency matrix

▶ Random adjacency matrix A ∈ {0, 1}^{n×n}.
▶ Expected value:
  \[
  \mathbb{E}(A) \;=\;
  \begin{pmatrix}
    p & \cdots & p & q & \cdots & q \\
    \vdots & & \vdots & \vdots & & \vdots \\
    p & \cdots & p & q & \cdots & q \\
    q & \cdots & q & p & \cdots & p \\
    \vdots & & \vdots & \vdots & & \vdots \\
    q & \cdots & q & p & \cdots & p
  \end{pmatrix}
  \]
  (Assuming people are ordered so the first group is 1, 2, \ldots, n/2.)
Spectral analysis

▶ E(A) has rank 2:
  \[
  \mathbb{E}(A) \;=\;
  \frac{p+q}{2}
  \begin{pmatrix}
    +1 & \cdots & +1 \\
    \vdots & & \vdots \\
    +1 & \cdots & +1
  \end{pmatrix}
  \;+\;
  \frac{p-q}{2}
  \begin{pmatrix}
    +1 & \cdots & +1 & -1 & \cdots & -1 \\
    \vdots & & \vdots & \vdots & & \vdots \\
    +1 & \cdots & +1 & -1 & \cdots & -1 \\
    -1 & \cdots & -1 & +1 & \cdots & +1 \\
    \vdots & & \vdots & \vdots & & \vdots \\
    -1 & \cdots & -1 & +1 & \cdots & +1
  \end{pmatrix}.
  \]
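The rank-2 decomposition above can be verified numerically; the two ±1 matrices are the outer products 𝟏𝟏ᵀ and χχᵀ, where χ holds +1 for group 1 and −1 for group 2 (a small illustrative sketch; the sizes are not from the slides):

```python
import numpy as np

# Small illustrative example: n = 8 people, first half in group 1.
n, p, q = 8, 0.5, 0.1
half = n // 2
ones = np.ones(n)
chi = np.concatenate([np.ones(half), -np.ones(half)])  # +1 / -1 per group

# Rank-2 decomposition from the slide: the all-ones matrix is
# outer(ones, ones) and the +/-1 block matrix is outer(chi, chi).
EA = (p + q) / 2 * np.outer(ones, ones) + (p - q) / 2 * np.outer(chi, chi)

print(np.linalg.matrix_rank(EA))   # 2
```

The block pattern also comes out as expected: the within-group blocks of `EA` are all p and the cross-group blocks are all q.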
Spectral clustering

▶ Top eigenvalue and eigenvector of E(A):
  \[
  \lambda_1 = \frac{p+q}{2}\cdot n, \qquad
  v_1 = \frac{1}{\sqrt{n}}\,\mathbf{1} .
  \]
▶ Second eigenvalue and eigenvector of E(A):
  \[
  \lambda_2 = \frac{p-q}{2}\cdot n, \qquad
  v_{2,i} =
  \begin{cases}
    +\frac{1}{\sqrt{n}} & \text{if person $i$ in group 1}, \\[2pt]
    -\frac{1}{\sqrt{n}} & \text{if person $i$ in group 2}.
  \end{cases}
  \]
▶ Spectral clustering: extract the second eigenvector v̂₂ of A, and partition people based on the sign of the corresponding entry in v̂₂.
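The spectral clustering step can be sketched end to end (a minimal sketch with illustrative parameters; `numpy.linalg.eigh` returns eigenvalues in ascending order, so the second eigenvector is column −2):

```python
import numpy as np

# Illustrative parameters, not from the slides.
rng = np.random.default_rng(0)
n, p, q = 400, 0.5, 0.1
half = n // 2
group = np.array([0] * half + [1] * half)     # hidden partition

# Sample A from the planted bisection model.
probs = np.where(group[:, None] == group[None, :], p, q)
upper = np.triu(rng.random((n, n)) < probs, k=1)
A = (upper | upper.T).astype(float)

# eigh sorts eigenvalues in ascending order, so the eigenvector for the
# second-largest eigenvalue is column -2.
vals, vecs = np.linalg.eigh(A)
v2 = vecs[:, -2]

# Partition by sign of the second eigenvector of A.
labels = (v2 > 0).astype(int)
# The sign of an eigenvector is arbitrary, so score against both labelings.
err = min(np.mean(labels != group), np.mean(labels == group))
```

With p and q this far apart the signal eigenvalue (p−q)/2 · n dwarfs the noise, so the sign pattern recovers the groups nearly perfectly.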
Noise

▶ A = E(A) + Z for some zero-mean random matrix Z.
▶ Using the Matrix Bernstein inequality: with high probability,
  \[
  \|Z\|_2 \;\le\; O\!\left(\sqrt{pn\log n} + \log n\right).
  \]
▶ Sharper result (Vu, 2007): with high probability,
  \[
  \|Z\|_2 \;\le\; C\sqrt{pn}
  \qquad\text{whenever } p \ge \frac{C'\log^4 n}{n} .
  \]
▶ Now relate the eigenvectors of A to those of E(A).
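The √(pn) scaling of ‖Z‖₂ can be checked empirically (an illustrative sketch; the parameters and the expectation that the ratio is a modest constant are assumptions consistent with the bound, not values from the slides):

```python
import numpy as np

# Empirical look at ||Z||_2 = ||A - E(A)||_2 (illustrative parameters).
rng = np.random.default_rng(0)
n, p, q = 400, 0.5, 0.1
half = n // 2
group = np.array([0] * half + [1] * half)

probs = np.where(group[:, None] == group[None, :], p, q)
upper = np.triu(rng.random((n, n)) < probs, k=1)
A = (upper | upper.T).astype(float)

EA = probs.astype(float)
np.fill_diagonal(EA, 0.0)       # the sampled graph has no self-loops

Z = A - EA
spec = np.linalg.norm(Z, 2)     # spectral norm = largest singular value
ratio = spec / np.sqrt(p * n)   # should be a modest constant C
```

Rerunning with larger n keeps `ratio` roughly stable, which is what the bound ‖Z‖₂ ≤ C√(pn) predicts.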
Perturbation analysis

▶ Pretend we already know (p + q)/2.
▶ Let v* be the top eigenvector of E(A) − ((p+q)/2) · 𝟏𝟏ᵀ:
  v*_i = ±1/√n, with corresponding eigenvalue λ* = ((p−q)/2) · n.
▶ Let v̂ be the top eigenvector of A − ((p+q)/2) · 𝟏𝟏ᵀ.
▶ Assume p − q ≥ 4C·√(p/n), so that
  \[
  \frac{p-q}{2}\cdot n - C\sqrt{pn} \;\ge\; \frac{p-q}{4}\cdot n .
  \]
▶ Using Weyl's inequality: the corresponding eigenvalue satisfies
  \[
  \hat\lambda \;\ge\; \lambda^* - \|Z\|_2
  \;\ge\; \frac{p-q}{2}\cdot n - C\sqrt{pn}
  \;\ge\; \frac{p-q}{4}\cdot n .
  \]
▶ Using Davis–Kahan:
  \[
  \varepsilon \;:=\; \bigl\|(I - \hat v\hat v^\top)v^*\bigr\|_2
  \;\le\; \frac{C\sqrt{pn}}{\frac{p-q}{4}\cdot n}
  \;=\; \frac{4C}{p-q}\cdot\sqrt{\frac{p}{n}} .
  \]
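A numerical companion to the bounds above (illustrative parameters; as the slide does, we pretend (p+q)/2 is known exactly and subtract it before taking the top eigenvector):

```python
import numpy as np

# Illustrative parameters, not from the slides.
rng = np.random.default_rng(0)
n, p, q = 400, 0.5, 0.1
half = n // 2
group = np.array([0] * half + [1] * half)

probs = np.where(group[:, None] == group[None, :], p, q)
upper = np.triu(rng.random((n, n)) < probs, k=1)
A = (upper | upper.T).astype(float)

# Subtract the known rank-one part ((p+q)/2) * 11^T.
center = (p + q) / 2 * np.ones((n, n))
vstar = np.where(group == 0, 1.0, -1.0) / np.sqrt(n)   # v*, entries +-1/sqrt(n)

vals, vecs = np.linalg.eigh(A - center)
vhat = vecs[:, -1]                     # top eigenvector of A - center

lam_hat = vals[-1]                     # Weyl: should exceed ((p-q)/4) * n
# Davis-Kahan quantity: ||(I - vhat vhat^T) v*||_2 (sign-invariant in vhat).
eps = np.linalg.norm(vstar - (vhat @ vstar) * vhat)
```

In typical runs `eps` comes out well under the Davis–Kahan bound, which is pessimistic for random (rather than worst-case) noise.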
Comparing unit vectors

▶ ‖(I − v̂v̂ᵀ)v*‖₂² = 1 − ⟨v̂, v*⟩², so
  \[
  \min\Bigl\{\|v^* - \hat v\|_2^2,\; \|v^* - (-\hat v)\|_2^2\Bigr\}
  \;=\; 2\bigl(1 - \sqrt{1-\varepsilon^2}\bigr)
  \;\le\; 2\varepsilon^2 .
  \]
▶ (WLOG assume the min is achieved by ‖v* − v̂‖₂².)
▶ Classification error rate: since v*_i = ±1/√n,
  \[
  \frac1n\sum_{i=1}^n \mathbf{1}\{\operatorname{sign}(v_i^*) \ne \operatorname{sign}(\hat v_i)\}
  \;\le\; \frac1n\sum_{i=1}^n \bigl(1 - n v_i^* \hat v_i\bigr)^2
  \;=\; \sum_{i=1}^n (v_i^* - \hat v_i)^2
  \;=\; \|v^* - \hat v\|_2^2
  \;\le\; 2\varepsilon^2 .
  \]
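The identities above hold for any pair of unit vectors, so they can be sanity-checked numerically (random vectors here are purely illustrative):

```python
import numpy as np

# Two arbitrary unit vectors standing in for v* and vhat.
rng = np.random.default_rng(0)
n = 50
vstar = rng.standard_normal(n)
vstar /= np.linalg.norm(vstar)
vhat = rng.standard_normal(n)
vhat /= np.linalg.norm(vhat)

# eps = ||(I - vhat vhat^T) vstar||_2, the Davis-Kahan quantity.
eps = np.linalg.norm(vstar - (vhat @ vstar) * vhat)

# Minimum over the sign ambiguity of vhat.
m = min(np.linalg.norm(vstar - vhat) ** 2,
        np.linalg.norm(vstar + vhat) ** 2)
```

Here `eps**2` equals 1 − ⟨v̂, v*⟩², `m` equals 2(1 − √(1 − ε²)), and `m ≤ 2·eps**2`, matching the displayed chain.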
Boosting accuracy

▶ Suppose 2ε² ≈ 1/3, but you really want a perfect partitioning.
▶ Say Ŝ ⊆ {1, 2, . . . , n} is the estimate of the first group; about 1/3 of its members actually belong to the second group.
▶ People who are really in the first group will have more edges to people in Ŝ than people who are really in the second group.
▶ Use this fact to classify people very accurately.
▶ (Technically, we need independence, but this can be achieved by "sample splitting".)
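The boosting step can be sketched as follows (a minimal sketch with illustrative parameters; note it reuses the same A for the initial estimate and the cleanup, ignoring the sample-splitting caveat from the slide):

```python
import numpy as np

# Illustrative parameters, not from the slides.
rng = np.random.default_rng(0)
n, p, q = 400, 0.5, 0.1
half = n // 2
group = np.array([0] * half + [1] * half)

probs = np.where(group[:, None] == group[None, :], p, q)
upper = np.triu(rng.random((n, n)) < probs, k=1)
A = (upper | upper.T).astype(int)

# A deliberately noisy estimate S of the first group: drop 60 true
# members and mistakenly include 60 members of group 2 instead.
S = np.zeros(n, dtype=bool)
S[: half - 60] = True
S[half : half + 60] = True

# True group-1 members have more edges into S (mostly p-edges) than
# group-2 members (mostly q-edges), so threshold the edge counts.
counts = A @ S
labels = (counts < np.median(counts)).astype(int)
err = min(np.mean(labels != group), np.mean(labels == group))
```

Because the two groups' expected edge counts into Ŝ are separated by roughly (p − q)·|Ŝ|/3 here, a simple threshold classifies almost everyone correctly even though a third of Ŝ was wrong.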