Iterative Consensus Clustering

Shaina Race, Dr. Carl Meyer, and Kevin Valakuzhy
May 21, 2013

Outline: Introduction, Methodology, Adjustments, Results, Conclusions

Introduction

Two Problems
There are countless algorithms in the literature for clustering data; however, there is generally no best method, even for a given class of data.
To demonstrate, we take a benchmark text dataset with 11,000 documents (1,000 for each of 11 clusters) and test 6 different algorithms on 3 subsets of documents.
Algorithm    Subset 1 (ABCF)   Subset 2 (BCFG)   Subset 3 (GHI)
PDDP         0.333             0.451             0.887
k-means      0.462             0.607             0.616
NMF          0.450             0.595             0.893
NCut         0.498             0.269             0.871
MinCut       0.503             0.273             0.869
PIC          0.495             0.266             0.651

Table: Accuracy of Algorithms on Subsets of Documents

The vast majority (almost all) of these algorithms require
the user to input the number of clusters for the algorithm to
create.
In an applied setting, this information is unlikely to be
known.
We present a flexible framework which aims to solve both
these problems. Rather than focusing our energy on a new
algorithm to partition the existing data, we focus on
creating a new data structure which better reflects the
associative patterns of the data.

Similarity Matrices
Many clustering algorithms, particularly those of the “spectral”
variety, rely on a similarity matrix to draw cluster connections
between points.
A matrix of pairwise similarities, S, where Si,j measures some notion of similarity between observations xi and xj.
The most common similarity function for spectral clustering is
called the Gaussian similarity function:
Si,j = exp( −‖xi − xj‖₂² / (2σ²) ),

where σ is a tuning parameter.
Our method uses a consensus matrix to describe similarity.

Similarity Matrix → Adjacency Matrix
Any similarity matrix can be viewed as an adjacency matrix for nodes on an
undirected graph. The n data points act as nodes on the graph and edges are
drawn between nodes with weights from the similarity matrix.
[Figure: undirected graph on three nodes A, B, and C with weighted edges]
Here, the thickness of the edge corresponds to the weight of the similarity.

Taking a Walk on the Graph
[Figure: the three-node graph with labeled edge weights, viewed as a random walk]
We can induce a random walk on the vertices of the graph by creating a transition probability matrix, P, from the similarity matrix, S, as P = D^(-1)S, where D is a diagonal matrix containing the row sums of S.
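
To make these two steps concrete, here is a minimal NumPy sketch (illustrative only, not code from the talk) that builds the Gaussian similarity matrix S and row-normalizes it into P = D^(-1)S; the data matrix X and the bandwidth sigma are placeholders.

```python
import numpy as np

def gaussian_similarity(X, sigma=1.0):
    """S[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)) for rows x_i of X."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def transition_matrix(S):
    """P = D^(-1) S, where D is the diagonal matrix of row sums of S."""
    return S / S.sum(axis=1, keepdims=True)

# Tiny illustration with random placeholder data.
X = np.random.rand(10, 3)
P = transition_matrix(gaussian_similarity(X, sigma=0.5))
print(P.sum(axis=1))  # every row sums to 1, so P defines a random walk
```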

Counting the Number of Blocks
We can extract information about the number of blocks in the similarity matrix
from the eigenvalues of this transition probability matrix.
In fact, if there are exactly k blocks on the diagonal, we can expect to find exactly
k eigenvalues close to 1. Furthermore, if there is no “subcluster structure,”
meaning that none of the diagonal blocks further break down into meaningful
clusters, we should expect to see a relatively large gap between the magnitude
of the kth eigenvalue and the (k+1)st eigenvalue.
[Figure: eigenvalues λi plotted against index i for a graph with k = 3 blocks; three eigenvalues lie near 1]

Methodology

The Consensus Similarity Matrix
We’ll take an ensemble approach to clustering by using
many, say N, different clustering algorithms.
These algorithms require the user to input the number of clusters, k.
We will choose one or more values for k, denoted k̃ = [k̃1, k̃2, ..., k̃J], and use each of the N algorithms to partition the data into k̃i clusters, i = 1, ..., J.
The result is a set of JN clusterings.
We will record these clusterings in a consensus matrix,
M, by setting Mij equal to the number of times observation i
was clustered with observation j.
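
As a sketch of this bookkeeping (illustrative Python, not the authors' code; the function name and toy labels are made up), the consensus matrix can be accumulated directly from the JN label vectors:

```python
import numpy as np

def consensus_matrix(labelings):
    """M[i, j] = number of clusterings in which observations i and j share a cluster.

    `labelings` is a list of 1-D integer label vectors, one per clustering
    (JN of them in total)."""
    n = len(labelings[0])
    M = np.zeros((n, n))
    for labels in labelings:
        labels = np.asarray(labels)
        M += (labels[:, None] == labels[None, :])  # co-membership indicator
    return M

# Two toy clusterings of 5 observations.
M = consensus_matrix([[0, 0, 1, 1, 2], [0, 0, 0, 1, 1]])
print(M)  # diagonal entries equal the number of clusterings (here 2)
```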

Consensus Matrix Example
[Figure: two different clusterings of 11 points, used to build the consensus matrix below]

      1  2  3  4  5  6  7  8  9 10 11
  1 [ 2  1  1  0  0  0  0  0  0  0  0 ]
  2 [ 1  2  0  1  0  0  0  0  0  0  0 ]
  3 [ 1  0  2  1  0  0  0  0  0  0  0 ]
  4 [ 0  1  1  2  0  0  0  0  0  0  0 ]
  5 [ 0  0  0  0  2  1  1  1  0  0  0 ]
  6 [ 0  0  0  0  1  2  0  2  1  0  0 ]
  7 [ 0  0  0  0  1  0  2  0  1  0  0 ]
  8 [ 0  0  0  0  1  2  0  2  1  0  0 ]
  9 [ 0  0  0  0  0  1  1  1  2  0  0 ]
 10 [ 0  0  0  0  0  0  0  0  0  2  2 ]
 11 [ 0  0  0  0  0  0  0  0  0  2  2 ]

Consensus Matrix Eigenvalues
Take a look at the eigenvalues of the transition probability matrix of the random walk
induced by the consensus matrix of our toy example.
[Figure: eigenvalue plot for the transition matrix of the toy consensus matrix; three eigenvalues sit at 1]
As you can see, using k̃ = 5 and two clusterings, we have recovered the correct value
of k = 3 by counting the multiplicity of the eigenvalue 1.
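
As a quick numerical check of this claim (a sketch we add here, not part of the original slides), we can form P = D^(-1)M from the 11 × 11 consensus matrix above and count the eigenvalues at 1:

```python
import numpy as np

# The 11 x 11 consensus matrix from the toy example above.
M = np.array([
    [2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 2, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 2, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 2, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 2, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 2, 0, 2, 1, 0, 0],
    [0, 0, 0, 0, 1, 0, 2, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 2, 0, 2, 1, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 1, 2, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2],
], dtype=float)

P = M / M.sum(axis=1, keepdims=True)            # P = D^(-1) M
eigvals = np.sort(np.linalg.eigvals(P).real)[::-1]
k = int(np.sum(eigvals > 0.99))                 # multiplicity of the eigenvalue 1
print(eigvals[:4], "estimated k =", k)          # three eigenvalues at 1 -> k = 3
```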

Our Algorithm - Base version
Cluster the data matrix, X, using N different algorithms. For each algorithm, partition the data into k̃1, k̃2, ..., k̃J clusters.
Choosing the k̃i's > k (over-estimating) may be best.
Form a Consensus Matrix, M, such that Mij is the number of
times xi was clustered with xj .
Examine the eigenvalues of the probability transition matrix P = D^(-1)M, where D = diag(Me).
For computational considerations, eigenvalues can be computed using the symmetric matrix D^(-1/2) M D^(-1/2), which has the same eigenvalues as P.
Count the number of eigenvalues near 1 (called the Perron cluster of eigenvalues), observing a gap between λk and λk+1.
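
The following sketch (illustrative NumPy; `estimate_k` is our own placeholder name) carries out this eigenvalue step using the symmetric normalization and treats the largest gap between consecutive eigenvalues as the end of the Perron cluster, which is one simple way to operationalize the counting rule above.

```python
import numpy as np

def estimate_k(M, max_k=None):
    """Estimate the number of clusters from a consensus matrix M.

    Works with the symmetric matrix D^(-1/2) M D^(-1/2), which has the same
    eigenvalues as P = D^(-1) M, and returns the k with the largest gap
    between the k-th and (k+1)-th eigenvalues.  Assumes M has no zero rows."""
    d_inv_sqrt = 1.0 / np.sqrt(M.sum(axis=1))
    A = d_inv_sqrt[:, None] * M * d_inv_sqrt[None, :]
    eigvals = np.sort(np.linalg.eigvalsh(A))[::-1]   # real, descending
    if max_k is None:
        max_k = len(eigvals) - 1
    gaps = eigvals[:max_k] - eigvals[1:max_k + 1]
    return int(np.argmax(gaps)) + 1, eigvals
```

On the toy consensus matrix above, this heuristic should again pick out k = 3, since the three eigenvalues at 1 are followed by a clear drop.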

Adjustments

Drop Tolerance, τ
Many datasets contain a lot of noise, so we can expect our clustering algorithms to make mistakes.
It is reasonable to expect that the majority of algorithms will
not make the same mistake.
We introduce a drop tolerance 0 ≤ τ < 1 for which we may
drop (set to zero) entries Mij in the consensus matrix if
Mij < τ JN.
For example, to interpret τ = 0.1, we'd say: "if xi and xj are clustered together in fewer than 10% of the clusterings, then disconnect them in the graph."
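
In code, the drop tolerance is a single thresholding step (a sketch under the same conventions; the function name and the toy numbers are placeholders):

```python
import numpy as np

def apply_drop_tolerance(M, tau, num_clusterings):
    """Zero out consensus entries smaller than tau * (number of clusterings)."""
    M = M.copy()
    M[M < tau * num_clusterings] = 0.0
    return M

# With JN = 30 clusterings and tau = 0.1, any pair clustered together
# fewer than 3 times is disconnected in the graph.
M = np.array([[30.0, 2.0], [2.0, 30.0]])
print(apply_drop_tolerance(M, tau=0.1, num_clusterings=30))
```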

Medlars-Cranfield-CISI Document Collection
n = 3,891 documents, m = 11,001 terms in the dictionary. Using 9 different dimension reductions, each paired with 3 different clustering algorithms, we clustered this data into k̃ = [2, 3, ..., 10] clusters. The following eigenvalue plots show the "uncoupling" effect of using a drop tolerance parameter of τ = 0.1.
[Figure: two eigenvalue plots (λi vs. i) illustrating the uncoupling effect of the drop tolerance τ = 0.1]

Clustering the Consensus Matrix
The consensus matrix (M) is a similarity matrix. Thus we can
use it as input to our clustering algorithms.
In fact, the accuracy of the algorithms is generally higher when
clustering the consensus matrix as opposed to the raw data.
Here we see evidence of this using the Medlars-Cranfield-CISI
collection and the consensus matrix from the previous slide.
Algorithm      Raw Data   Cosine Similarity   Consensus Matrix
PDDP           0.83       0.85                0.85
PDDP-kmeans    0.70       0.70                0.97
NMFCluster     0.51       0.65                0.97
kmeans         0.71       0.70                0.97
PIC            -          0.89                0.73
NCUT                      0.96                0.96
NJW                       0.85                0.96

Table: Accuracy of Algorithms on Different Input Matrices

Iterated Consensus Clustering
We can iterate our approach as follows:
1. Cluster the original data matrix X using N different algorithms, each to find k̃ = [k̃1, k̃2, ..., k̃J] clusters.
2. Form a consensus matrix, M1, with the JN clusterings, setting entries M1ij = 0 if M1ij < τJN.
3. Cluster the consensus matrix, M1, using N different algorithms, each to find k̃ = [k̃1, k̃2, ..., k̃J] clusters.
4. Form a consensus matrix, M2, with the JN clusterings, setting entries M2ij = 0 if M2ij < τJN.
5. Repeat steps 3–4 until a Perron cluster of eigenvalues appears.
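
A compact sketch of this loop (illustrative Python; `cluster_ensemble` is a hypothetical user-supplied callable that returns the JN label vectors for whatever matrix it is given, and the stopping test is a crude stand-in for "a Perron cluster of eigenvalues appears"):

```python
import numpy as np

def iterated_consensus(X, cluster_ensemble, tau=0.1, max_iters=10, tol=1e-3):
    """Iterated consensus clustering (sketch).

    `cluster_ensemble(S)` should run the N algorithms for each value in
    k-tilde on S (the raw data on the first pass, the consensus matrix on
    later passes) and return the resulting JN label vectors."""
    S = X
    for _ in range(max_iters):
        labelings = cluster_ensemble(S)
        JN = len(labelings)
        n = len(labelings[0])
        M = np.zeros((n, n))
        for labels in labelings:
            labels = np.asarray(labels)
            M += (labels[:, None] == labels[None, :])   # co-membership counts
        M[M < tau * JN] = 0.0                            # drop tolerance
        # Eigenvalues of P = D^(-1) M via the symmetric normalization.
        d_inv_sqrt = 1.0 / np.sqrt(M.sum(axis=1))
        A = d_inv_sqrt[:, None] * M * d_inv_sqrt[None, :]
        eigvals = np.sort(np.linalg.eigvalsh(A))[::-1]
        if np.sum(eigvals > 1.0 - tol) > 1:              # crude Perron-cluster test
            return M, eigvals
        S = M                                            # cluster the consensus matrix next
    return M, eigvals
```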

Results

Using only k-means: 20 Newsgroups Subset
A subset of the 20 Newsgroups document collection: 300 documents from each of 6 clusters (1,800 total). We used 3 dimension reductions and 10 iterations of k-means to find k̃ = [20, 21, ..., 30] clusters. Using no drop tolerance and iterating the procedure once, we see a clear Perron cluster with k = 6 eigenvalues.
[Figure: two eigenvalue plots (λi vs. i); after one iteration, a clear Perron cluster of six eigenvalues appears]

Conclusions
Our method succeeds at determining the number of
clusters in a wide range of datasets.
The user’s confidence in the final solution may be greater
when algorithms return the same answer.
Using a drop tolerance and iteration, the method seems to
work quite well in the presence of noise.
The underlying ideas apply to any clustering algorithms the user would prefer to use; the method is thus flexible and scalable.
The consensus matrix is a favorable alternative to
traditional similarity matrices for spectral clustering.

Thank you!