Iterative Consensus Clustering
Shaina Race, Dr. Carl Meyer, and Kevin Valakuzhy
May 21, 2013


Introduction: Motivation

Two Problems

There are countless algorithms in the literature for clustering data; however, there is generally no best method, even for a given class of data. To demonstrate, we take a benchmark text dataset with 11,000 documents (1,000 for each of 11 clusters) and test 6 different algorithms on 3 subsets of the documents.

                   PDDP    k-means   NMF     NCut    MinCut   PIC
Subset 1 (ABCF)    0.333   0.462     0.450   0.498   0.503    0.495
Subset 2 (BCFG)    0.451   0.607     0.595   0.269   0.273    0.266
Subset 3 (GHI)     0.887   0.616     0.893   0.871   0.869    0.651

Table: Accuracy of Algorithms on Subsets of Documents

The vast majority (almost all) of these algorithms also require the user to input the number of clusters for the algorithm to create. In an applied setting, this information is unlikely to be known.

We present a flexible framework which aims to solve both of these problems. Rather than focusing our energy on a new algorithm to partition the existing data, we focus on creating a new data structure which better reflects the associative patterns of the data.


Introduction: Graphs and Markov Chains

Similarity Matrices

Many clustering algorithms, particularly those of the "spectral" variety, rely on a similarity matrix to draw cluster connections between points: a matrix of pairwise similarities, S, where S_ij measures some notion of similarity between observations x_i and x_j.

The most common similarity function for spectral clustering is the Gaussian similarity function:

    S_ij = exp( -||x_i - x_j||_2^2 / (2σ^2) )

where σ is a tuning parameter. Our method instead uses a consensus matrix to describe similarity.
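As an illustration of the Gaussian similarity construction just defined (not code from the talk), here is a minimal NumPy sketch; the data matrix X and the bandwidth sigma below are placeholder choices.

    import numpy as np

    def gaussian_similarity(X, sigma=1.0):
        # S_ij = exp(-||x_i - x_j||_2^2 / (2 sigma^2))
        sq_norms = np.sum(X ** 2, axis=1)
        sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
        np.maximum(sq_dists, 0.0, out=sq_dists)  # guard against small negative round-off
        return np.exp(-sq_dists / (2.0 * sigma ** 2))

    # Illustrative use on random data
    X = np.random.rand(10, 3)
    S = gaussian_similarity(X, sigma=0.5)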
Similarity Matrix -> Adjacency Matrix

Any similarity matrix can be viewed as an adjacency matrix for nodes on an undirected graph: the n data points act as nodes on the graph, and edges are drawn between nodes with weights from the similarity matrix.

[Figure: a small example graph with nodes A, B, C; the thickness of each edge corresponds to the weight of the similarity.]

Taking a Walk on the Graph

We can induce a random walk on the vertices of the graph by creating a transition probability matrix P from the similarity matrix S as P = D^{-1}S, where D is a diagonal matrix containing the row sums of S.

Counting the Number of Blocks

We can extract information about the number of blocks in the similarity matrix from the eigenvalues of this transition probability matrix. In fact, if there are exactly k blocks on the diagonal, we can expect to find exactly k eigenvalues close to 1. Furthermore, if there is no "subcluster structure," meaning that none of the diagonal blocks further break down into meaningful clusters, we should expect to see a relatively large gap between the magnitudes of the k-th and (k+1)-th eigenvalues.

[Figure: plot of the eigenvalues λ_i versus index i for an example similarity graph with three diagonal blocks; exactly k = 3 eigenvalues sit near 1, followed by a visible gap.]
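As a sketch of the random-walk construction and the eigenvalue count described above (assuming a similarity matrix S with nonnegative entries and nonzero row sums; the cutoff for "close to 1" is an illustrative choice, not one prescribed in the talk):

    import numpy as np

    def count_perron_eigenvalues(S, threshold=0.9):
        # Random-walk transition matrix P = D^{-1} S, with D = diag(row sums of S)
        d = S.sum(axis=1)
        P = S / d[:, None]
        # P is row-stochastic but generally not symmetric, so use the general solver
        eigvals = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
        # Count eigenvalues "close to 1"; a large gap after the k-th signals k blocks
        k = int(np.sum(eigvals > threshold))
        return k, eigvals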
Methodology: Consensus Clustering

The Consensus Similarity Matrix

We'll take an ensemble approach to clustering by using many, say N, different clustering algorithms. These algorithms require the user to input the number of clusters, k. We will choose one or more values for k, denoted k̃ = [k̃_1, k̃_2, ..., k̃_J], and use each of the N algorithms to partition the data into k̃_i clusters, i = 1:J. The result is a set of JN clusterings. We record these clusterings in a consensus matrix, M, by setting M_ij equal to the number of times observation i was clustered with observation j.


Methodology: A Motivating Example

Consensus Matrix Example

[Figure: a toy dataset of 11 points shown under two different clusterings, each into k̃ = 5 clusters.]

The resulting consensus matrix:

        1  2  3  4  5  6  7  8  9 10 11
   1    2  1  1  0  0  0  0  0  0  0  0
   2    1  2  0  1  0  0  0  0  0  0  0
   3    1  0  2  1  0  0  0  0  0  0  0
   4    0  1  1  2  0  0  0  0  0  0  0
   5    0  0  0  0  2  1  1  1  0  0  0
   6    0  0  0  0  1  2  0  2  1  0  0
   7    0  0  0  0  1  0  2  0  1  0  0
   8    0  0  0  0  1  2  0  2  1  0  0
   9    0  0  0  0  0  1  1  1  2  0  0
  10    0  0  0  0  0  0  0  0  0  2  2
  11    0  0  0  0  0  0  0  0  0  2  2

Consensus Matrix Eigenvalues

Take a look at the eigenvalues of the transition probability matrix of the random walk induced by the consensus matrix of our toy example.

[Figure: eigenvalue plot for the toy example; the eigenvalue 1 appears with multiplicity 3.]

As you can see, using k̃ = 5 and two clusterings, we have recovered the correct value of k = 3 by counting the multiplicity of the eigenvalue 1.
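A minimal sketch of building a consensus matrix from a list of label vectors, one per clustering. The two labelings below are hypothetical: they were chosen so that they reproduce the 11-point example matrix above, but they need not match the clusterings shown in the talk's figure.

    import numpy as np

    def consensus_matrix(labelings):
        # M_ij = number of clusterings in which observations i and j share a cluster
        labelings = np.asarray(labelings)        # shape: (number of clusterings, n)
        n = labelings.shape[1]
        M = np.zeros((n, n))
        for labels in labelings:
            M += (labels[:, None] == labels[None, :]).astype(float)
        return M

    # Two illustrative labelings of 11 points, each with 5 clusters
    labels_1 = [0, 0, 1, 1, 2, 2, 3, 2, 3, 4, 4]
    labels_2 = [0, 1, 0, 1, 2, 3, 2, 3, 3, 4, 4]
    M = consensus_matrix([labels_1, labels_2])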
Methodology: Algorithm

Our Algorithm - Base Version

The steps, sketched in code below, are:

1. Cluster the data matrix, X, using N different algorithms. For each algorithm, partition the data into k̃_1, k̃_2, ..., k̃_J clusters. Choosing the k̃_i's greater than k (over-estimating) may be best.
2. Form a consensus matrix, M, such that M_ij is the number of times x_i was clustered with x_j.
3. Examine the eigenvalues of the probability transition matrix P = D^{-1}M, where D = diag(Me). For computational considerations, the eigenvalues can be computed from the symmetric matrix D^{-1/2} M D^{-1/2}, which has the same eigenvalues as P.
4. Count the number of eigenvalues near 1 (called the Perron cluster of eigenvalues), observing a gap between λ_k and λ_{k+1}.
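A rough sketch of the base version under assumptions of our own choosing: the two-algorithm ensemble (scikit-learn's KMeans and AgglomerativeClustering) and the largest-gap rule stand in for the larger ensemble and the visual gap inspection used in the talk.

    import numpy as np
    from scipy.linalg import eigvalsh
    from sklearn.cluster import KMeans, AgglomerativeClustering

    def base_consensus_clustering(X, k_values):
        # Steps 1-2: ensemble clusterings over each k in k_values (the k-tilde values)
        n = X.shape[0]
        M = np.zeros((n, n))
        for k in k_values:
            for algo in (KMeans(n_clusters=k, n_init=10),
                         AgglomerativeClustering(n_clusters=k)):
                labels = algo.fit_predict(X)
                M += (labels[:, None] == labels[None, :])   # co-membership counts
        # Steps 3-4: eigenvalues of D^{-1/2} M D^{-1/2}, same as those of P = D^{-1} M
        d = M.sum(axis=1)                                   # D = diag(M e)
        L = M / np.sqrt(np.outer(d, d))
        eigvals = np.sort(eigvalsh(L))[::-1]
        # Estimate k from the largest gap among the leading eigenvalues (one simple rule)
        gaps = eigvals[:-1] - eigvals[1:]
        k_hat = int(np.argmax(gaps[: min(20, n - 1)])) + 1
        return k_hat, eigvals, M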
Adjustments: Drop Tolerance

Drop Tolerance, τ

Many datasets contain a lot of noise, so we can expect our clustering algorithms to make mistakes. It is reasonable to expect, however, that the majority of algorithms will not make the same mistake. We therefore introduce a drop tolerance 0 ≤ τ < 1 and drop (set to zero) entries M_ij in the consensus matrix if M_ij < τJN. To interpret τ = 0.1, we'd say: "if x_i and x_j are clustered together in fewer than 10% of the clusterings, then disconnect them in the graph."

Medlars-Cranfield-CISI Document Collection

n = 3,891 documents and m = 11,001 terms in the dictionary. Using 9 different dimension reductions, each paired with 3 different clustering algorithms, we clustered this data into k̃ = [2, 3, ..., 10] clusters. The following eigenvalue plots show the "uncoupling" effect of using a drop tolerance parameter of τ = 0.1.

[Figure: two plots of the leading 20 eigenvalues λ_i versus i, without and with the drop tolerance τ = 0.1.]

Clustering the Consensus Matrix

The consensus matrix, M, is a similarity matrix, so we can use it as input to our clustering algorithms. In fact, the accuracy of the algorithms is generally higher when clustering the consensus matrix than when clustering the raw data. Here we see evidence of this using the Medlars-Cranfield-CISI collection and the consensus matrix from the previous slide.

                      PDDP   PDDP-kmeans   NMFCluster   kmeans   PIC    NCUT   NJW
Raw Data              0.83   0.70          0.51         0.71     -      -      -
Cosine Similarity     0.85   0.70          0.65         0.70     0.89   0.96   0.85
Consensus Matrix      0.85   0.97          0.97         0.97     0.73   0.96   0.96

Table: Accuracy of Algorithms on Different Input Matrices
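One way to feed a consensus matrix to an off-the-shelf spectral method is scikit-learn's precomputed-affinity interface; this is our substitution for illustration, not the specific NCut/NJW implementations behind the table above.

    from sklearn.cluster import SpectralClustering

    def cluster_consensus_matrix(M, k):
        # Treat the consensus matrix as a precomputed affinity (similarity) matrix
        model = SpectralClustering(n_clusters=k, affinity="precomputed", random_state=0)
        return model.fit_predict(M)

    # e.g. for the toy consensus matrix built earlier:
    # labels = cluster_consensus_matrix(M, k=3)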
Adjustments: Iteration

Iterated Consensus Clustering

We can iterate our approach as follows (a sketch of this loop appears after the list):

1. Cluster the original data matrix X using N different algorithms, each to find k̃ = [k̃_1, k̃_2, ..., k̃_J] clusters.
2. Form a consensus matrix, M1, with the JN clusterings, setting entries M1_ij = 0 if M1_ij < τJN.
3. Cluster the consensus matrix M1 using the N different algorithms, each to find k̃ = [k̃_1, k̃_2, ..., k̃_J] clusters.
4. Form a consensus matrix, M2, with the JN clusterings, setting entries M2_ij = 0 if M2_ij < τJN.
5. Repeat steps 3-4 until a Perron cluster of eigenvalues appears.
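A rough sketch of the iteration, assuming a helper build_consensus (a placeholder for any routine that returns the co-membership counts together with the number JN of clusterings, such as a wrapper around the ensemble sketched earlier) and a simplistic test for the appearance of a Perron cluster:

    import numpy as np
    from scipy.linalg import eigvalsh

    def apply_drop_tolerance(M, tau, num_clusterings):
        # Zero out M_ij when i and j were co-clustered in fewer than tau * JN clusterings
        M = M.copy()
        M[M < tau * num_clusterings] = 0.0
        return M

    def perron_cluster_size(M, tol=1e-3):
        # Count eigenvalues of D^{-1/2} M D^{-1/2} within tol of 1
        d = M.sum(axis=1)
        L = M / np.sqrt(np.outer(d, d))
        return int(np.sum(eigvalsh(L) > 1.0 - tol))

    def iterated_consensus(X, build_consensus, k_values, tau=0.1, max_iters=10):
        # build_consensus(data, k_values) -> (M, JN): placeholder for any ensemble routine
        data = X
        M = None
        for _ in range(max_iters):
            M, num_clusterings = build_consensus(data, k_values)
            M = apply_drop_tolerance(M, tau, num_clusterings)
            if perron_cluster_size(M) > 1:   # more than one eigenvalue near 1: stop
                break
            data = M                         # next round clusters the consensus matrix
        return perron_cluster_size(M), M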
Results

Using Only k-means: 20 Newsgroups Subset

Subset of the 20 Newsgroups document collection: 300 documents from each of 6 clusters (1,800 total). We used 3 dimension reductions and 10 iterations of k-means to find k̃ = 20, 21, ..., 30 clusters. Using no drop tolerance and iterating the procedure once, we see a clear Perron cluster with k = 6 eigenvalues.

[Figure: eigenvalue plots before and after one iteration; after iterating, a Perron cluster of 6 eigenvalues near 1 is visible.]


Conclusions

Our method succeeds at determining the number of clusters in a wide range of datasets.

The user's confidence in the final solution may be greater when algorithms return the same answer.

Using a drop tolerance and iteration, the method seems to work quite well in the presence of noise.

The underlying ideas apply to any clustering algorithms the user prefers to use, so the method is flexible and scalable.

The consensus matrix is a favorable alternative to traditional similarity matrices for spectral clustering.

Thank you!