A General Model for Relational Clustering

Bo Long and Zhongfei (Mark) Zhang, Computer Science Dept. / Watson School, SUNY Binghamton
Xiaoyun Wu, Yahoo! Inc.
Philip S. Yu, IBM Watson Research Center

Multi-Type Relational Data (MTRD) Is Everywhere!
- Bibliometrics: papers, authors, journals, key words
- Social networks: people, institutions, friendship links
- Biological data: genes, proteins, conditions
- Corporate databases: customers, products, suppliers, shareholders

Challenges for Clustering
- Data objects are not identically distributed: the object types are heterogeneous (e.g., papers and authors).
- Data objects are not independent: heterogeneous objects are related to each other.
- Hence the IID assumption does not hold.

Relational Data -> Flat Data?
One option is to flatten the relations into per-type feature tables (e.g., a paper table whose columns are word counts and author indicators, or an author table whose columns are paper indicators). This transformation suffers from:
- High-dimensional and sparse data
- Data redundancy
Moreover, flattening ignores the interactions between the hidden structures of the different object types, which makes it difficult to discover the global community structure (e.g., among users, queries, and Web pages).

A General Model: Collective Factorization on Related Matrices
- Formulate multi-type relational data as a set of related matrices; cluster different types of objects simultaneously by factorizing the related matrices simultaneously.
- Make use of the interactions between the hidden structures of the different object types.

Data Representation
Represent an MTRD set as a set of related matrices:
- Relation matrix R^{(ij)}: the relations between the i-th and the j-th types of objects.
- Feature matrix F^{(i)}: the feature values of the i-th type of objects.
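As a concrete (entirely made-up) illustration of this representation, a tiny bibliometric set with papers, key words, and authors could be encoded as one feature matrix and one relation matrix:

```python
import numpy as np

# Hypothetical toy data: 4 papers, 5 key words, 3 authors.
# Feature matrix F^(1): papers x words (word counts per paper).
F1 = np.array([
    [1, 3, 0, 0, 1],
    [2, 1, 0, 0, 0],
    [0, 0, 2, 1, 0],
    [0, 0, 1, 3, 1],
])

# Relation matrix R^(12): papers x authors (1 iff the author wrote the paper).
R12 = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 1, 1],
])

print(F1.shape, R12.shape)  # (4, 5) (4, 3)
```

With more object types, each additional pairwise relation simply contributes another matrix R^{(ij)} to the set.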
[Diagram: example MTRD sets (e.g., authors-papers-words, users-movies) represented by relation matrices R^{(12)}, R^{(23)} and a feature matrix F^{(1)}.]

Matrix Factorization
Explore the hidden structure of a data matrix through its factorization:
- F^{(i)} \approx C^{(i)} B^{(i)}, where C^{(i)} is the cluster indicator matrix and B^{(i)} is the feature basis matrix.
- R^{(ij)} \approx C^{(i)} A^{(ij)} (C^{(j)})^T, where A^{(ij)} is the cluster association matrix.

Model: Collective Factorization on Related Matrices (CFRM)

\min_{\{C^{(i)}\}, \{B^{(i)}\}, \{A^{(ij)}\}} \sum_{1 \le i \le m} w_b^{(i)} \left\| F^{(i)} - C^{(i)} B^{(i)} \right\|^2 + \sum_{1 \le i < j \le m} w_a^{(ij)} \left\| R^{(ij)} - C^{(i)} A^{(ij)} (C^{(j)})^T \right\|^2

CFRM Model: Example
For a tri-type chain (1)-(2)-(3) with one feature matrix F^{(1)} and two relation matrices R^{(12)}, R^{(23)}:

\min\; w_b^{(1)} \left\| F^{(1)} - C^{(1)} B^{(1)} \right\|^2 + w_a^{(12)} \left\| R^{(12)} - C^{(1)} A^{(12)} (C^{(2)})^T \right\|^2 + w_a^{(23)} \left\| R^{(23)} - C^{(2)} A^{(23)} (C^{(3)})^T \right\|^2

Spectral Clustering
- Algorithms that cluster points using eigenvectors of matrices derived from the data.
- They obtain a data representation in a low-dimensional space that can be easily clustered.
- Traditional spectral clustering focuses on homogeneous data.

Main Theorem
The CFRM minimization is equivalent to the trace maximization

\max_{(C^{(i)})^T C^{(i)} = I_{k_i}} \sum_{1 \le i \le m} w_b^{(i)} \mathrm{tr}\!\left( (C^{(i)})^T F^{(i)} (F^{(i)})^T C^{(i)} \right) + \sum_{1 \le i < j \le m} w_a^{(ij)} \mathrm{tr}\!\left( (C^{(i)})^T R^{(ij)} C^{(j)} (C^{(j)})^T (R^{(ij)})^T C^{(i)} \right)

Algorithm Derivation: Iterative Updating
Fixing every C^{(j)} with j \ne p, the problem reduces to

\max_{(C^{(p)})^T C^{(p)} = I_{k_p}} \mathrm{tr}\!\left( (C^{(p)})^T M^{(p)} C^{(p)} \right)

where

M^{(p)} = w_b^{(p)} F^{(p)} (F^{(p)})^T + \sum_{p < j \le m} w_a^{(pj)} R^{(pj)} C^{(j)} (C^{(j)})^T (R^{(pj)})^T + \sum_{1 \le j < p} w_a^{(jp)} (R^{(jp)})^T C^{(j)} (C^{(j)})^T R^{(jp)}

Spectral Relaxation
- Apply a real relaxation to C^{(p)}, allowing it to be an arbitrary orthonormal matrix.
- By the Ky Fan theorem, the optimal solution is then given by the k_p leading eigenvectors of M^{(p)}.
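The relaxed subproblem is a classical Ky Fan trace maximization, so each update is just a symmetric eigendecomposition. A minimal NumPy sketch of one such update (the matrices below are random placeholders, not data from the paper):

```python
import numpy as np

def leading_eigenvectors(M, k):
    """Columns = k leading eigenvectors of the symmetric matrix M.
    By the Ky Fan theorem they maximize tr(C^T M C) s.t. C^T C = I_k."""
    vals, vecs = np.linalg.eigh(M)               # eigenvalues in ascending order
    return vecs[:, np.argsort(vals)[::-1][:k]]

# Hypothetical single update step: build M^(1) for a bi-type set from one
# relation matrix R12 and the current (orthonormal) C^(2), with w_a = 1.0.
rng = np.random.default_rng(0)
R12 = rng.random((6, 4))
C2, _ = np.linalg.qr(rng.random((4, 2)))
M1 = 1.0 * R12 @ C2 @ C2.T @ R12.T
C1 = leading_eigenvectors(M1, 2)
print(np.allclose(C1.T @ C1, np.eye(2)))         # True: C1 is orthonormal
```

The achieved trace equals the sum of the k leading eigenvalues of M^{(p)}, which is exactly the optimum the Ky Fan theorem guarantees.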
Spectral Relational Clustering (SRC)
Iterate over the object types, updating each C^{(p)} as the k_p leading eigenvectors of the current M^{(p)}, until convergence.

Spectral Relational Clustering: Example
For the tri-type chain (1)-(2)-(3):
- Update C^{(1)} as the k_1 leading eigenvectors of M^{(1)} = w_a^{(12)} R^{(12)} C^{(2)} (C^{(2)})^T (R^{(12)})^T.
- Update C^{(2)} as the k_2 leading eigenvectors of M^{(2)} = w_a^{(12)} (R^{(12)})^T C^{(1)} (C^{(1)})^T R^{(12)} + w_a^{(23)} R^{(23)} C^{(3)} (C^{(3)})^T (R^{(23)})^T.
- Update C^{(3)} as the k_3 leading eigenvectors of M^{(3)} = w_a^{(23)} (R^{(23)})^T C^{(2)} (C^{(2)})^T R^{(23)}.

Advantages of Spectral Relational Clustering (SRC)
- As simple as traditional spectral approaches.
- Applicable to relational data with various structures.
- Adaptive low-dimensional embedding.
- Efficient: O(tmn^2 k); for sparse data this reduces to O(tmzk), where z denotes the number of non-zero elements.

Special Case 1: k-means and Spectral Clustering
Flat data is a special MTRD with only one feature matrix F:

\min_{C, B} \| F - C B \|^2

By the main theorem, k-means is equivalent to the trace maximization

\max_{C^T C = I_k} \mathrm{tr}\!\left( C^T F F^T C \right)

Special Case 2: Bipartite Spectral Graph Partitioning (BSGP)
A bipartite graph is a special MTRD with one relation matrix R:

\min_{(C^{(1)})^T C^{(1)} = I_{k_1},\; (C^{(2)})^T C^{(2)} = I_{k_2}} \left\| R - C^{(1)} A (C^{(2)})^T \right\|^2

BSGP restricts the clusters of the two object types to have one-to-one associations, i.e., it imposes diagonal constraints on A.

Experiments
- Bi-type relational data: document-word data.
- Tri-type relational data: category-document-word data.
- Comparing algorithms: Normalized Cut (NC), Bipartite Spectral Graph Partitioning (BSGP), Mutual Reinforcement K-means (MRK), and Consistent Bipartite Graph Co-partitioning (CBGC).
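Putting the three updates together, the SRC loop for this tri-type chain can be sketched in NumPy as follows (the function name, random initialization, and toy data are illustrative assumptions, not code from the paper):

```python
import numpy as np

def top_eigvecs(M, k):
    """k leading eigenvectors of a symmetric matrix M."""
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, np.argsort(vals)[::-1][:k]]

def src_tri_type(R12, R23, k1, k2, k3, wa12=1.0, wa23=1.0, iters=20, seed=0):
    """SRC sketch for a tri-type chain (1)-(2)-(3): alternately update each
    C^(p) as the leading eigenvectors of its M^(p)."""
    rng = np.random.default_rng(seed)
    # Random orthonormal starting points for C^(2) and C^(3).
    C2, _ = np.linalg.qr(rng.standard_normal((R12.shape[1], k2)))
    C3, _ = np.linalg.qr(rng.standard_normal((R23.shape[1], k3)))
    for _ in range(iters):
        C1 = top_eigvecs(wa12 * R12 @ C2 @ C2.T @ R12.T, k1)
        C2 = top_eigvecs(wa12 * R12.T @ C1 @ C1.T @ R12
                         + wa23 * R23 @ C3 @ C3.T @ R23.T, k2)
        C3 = top_eigvecs(wa23 * R23.T @ C2 @ C2.T @ R23, k3)
    return C1, C2, C3

# Toy run on random relation matrices (8 type-1, 6 type-2, 5 type-3 objects).
R12 = np.random.default_rng(1).random((8, 6))
R23 = np.random.default_rng(2).random((6, 5))
C1, C2, C3 = src_tri_type(R12, R23, 2, 2, 2)
print(C1.shape, C2.shape, C3.shape)  # (8, 2) (6, 2) (5, 2)
```

The rows of each C^{(p)} give the low-dimensional embedding of the p-th object type; a final discretization step (e.g., k-means on those rows) then yields the clusters.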
Experimental Results on Bi-type Relational Data
[Figure: NMI comparisons of SRC, NC, and BSGP on the bi-type data sets Multi2, Multi3, Multi5, Multi8, and Multi10 (y-axis: Normalized Mutual Information).]
[Figure: embeddings (u1 vs. u2) of a multi2 data set under NC, BSGP, and SRC.]
[Figure: convergence of the SRC objective value over the number of iterations.]

Experimental Results on Tri-type Relational Data
[Figure: NMI comparisons of SRC, MRK, and CBGC on the tri-type data sets BRM, TM1, TM2, and TM3 (y-axis: Normalized Mutual Information).]

Summary
- Collective Factorization on Related Matrices (CFRM): a general model for MTRD clustering.
- Spectral Relational Clustering (SRC): a novel spectral approach that is simple, applicable to relational data with various structures, adaptive in its low-dimensional embedding, and efficient.
- Theoretical analysis and experiments demonstrate the effectiveness and promise of the model and the algorithm.
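The experiments score clusterings with Normalized Mutual Information. One common variant of the metric, I(A;B) / sqrt(H(A) H(B)) (the slides do not specify which normalization was used), can be sketched as:

```python
import numpy as np
from collections import Counter

def nmi(labels_a, labels_b):
    """Normalized Mutual Information I(A;B) / sqrt(H(A) H(B)) between two
    clusterings given as label lists (assumes each has >= 2 clusters)."""
    n = len(labels_a)
    ca, cb = Counter(labels_a), Counter(labels_b)
    cab = Counter(zip(labels_a, labels_b))       # joint cluster counts
    mi = sum((nij / n) * np.log(nij * n / (ca[a] * cb[b]))
             for (a, b), nij in cab.items())
    ha = -sum((c / n) * np.log(c / n) for c in ca.values())
    hb = -sum((c / n) * np.log(c / n) for c in cb.values())
    return mi / np.sqrt(ha * hb)

print(round(nmi([0, 0, 1, 1], [1, 1, 0, 0]), 3))  # 1.0 (identical partitions)
```

NMI is invariant to cluster relabeling, which is why it is a standard choice for comparing a discovered clustering against ground-truth classes.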