Nonnegative Matrix Factorization: Algorithms and Applications
Haesun Park ([email protected])
School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, USA
SIAM International Conference on Data Mining, April 2011
This work was supported in part by the National Science Foundation.

Co-authors: Jingu Kim (CSE, Georgia Tech), Yunlong He (Math, Georgia Tech), Da Kuang (CSE, Georgia Tech)

Outline
- Overview of NMF
- Fast algorithms with the Frobenius norm, and theoretical results on convergence: multiplicative updating; alternating nonnegativity-constrained least squares (active-set-type methods, ...); hierarchical alternating least squares
- Variations/extensions of NMF: sparse NMF, regularized NMF, nonnegative PARAFAC
- Efficient adaptive NMF algorithms
- Applications of NMF; NMF for clustering
- Extensive computational results
- Discussion

Nonnegative Matrix Factorization (NMF) (Lee & Seung, 99; Paatero & Tapper, 94)
Given A ∈ R_+^{m×n} and a desired rank k << min(m, n), find W ∈ R_+^{m×k} and H ∈ R_+^{k×n} such that A ≈ WH:

  min_{W≥0, H≥0} ‖A − WH‖_F

(Notation: R_+ denotes the nonnegative real numbers.)
- The problem is nonconvex, and W and H are not unique (e.g., Ŵ = WD ≥ 0 and Ĥ = D⁻¹H ≥ 0 give the same product for a suitable D).
- NMF improves the approximation as k increases: if rank_+(A) > k, then
    min_{W_{k+1}≥0, H_{k+1}≥0} ‖A − W_{k+1} H_{k+1}‖_F < min_{W_k≥0, H_k≥0} ‖A − W_k H_k‖_F,
  where W_i ∈ R_+^{m×i} and H_i ∈ R_+^{i×n}.
- But the SVD does better: if A = UΣV^T, then ‖A − U_k Σ_k V_k^T‖_F ≤ min ‖A − WH‖_F over all W ∈ R_+^{m×k}, H ∈ R_+^{k×n}.
- So why NMF? Dimension reduction with better interpretation, and a lower-dimensional representation for nonnegative data.

Nonnegative Rank of A ∈ R_+^{m×n} (J. Cohen and U. Rothblum, LAA, 93)
- rank_+(A) is the smallest integer k for which there exist V ∈ R_+^{m×k} and U ∈ R_+^{k×n} such that A = VU.
- Note: rank(A) ≤ rank_+(A) ≤ min(m, n).
- If rank(A) ≤ 2, then rank_+(A) = rank(A).
- If either m ∈ {1, 2, 3} or n ∈ {1, 2, 3}, then rank_+(A) = rank(A).
- (Perron-Frobenius) There are nonnegative left and right singular vectors u_1 and v_1 of A associated with the largest singular value σ_1; hence the rank-1 truncated SVD of A is a best rank-one NMF of A.

Applications of NMF
- Text mining
  - Topic modeling: NMF as an alternative to PLSI (Gaussier et al., 05; Ding et al., 08)
  - Document clustering (Xu et al., 03; Shahnaz et al., 06)
  - Topic detection and trend tracking, email analysis (Berry et al., 05; Keila et al., 05; Cao et al., 07)
- Image analysis and computer vision
  - Feature representation, sparse coding (Lee et al., 99; Guillamet et al., 01; Hoyer et al., 02; Li et al., 01)
  - Video tracking (Bucak et al., 07)
- Social networks: community structure and trend detection (Chi et al., 07; Wang et al., 08)
- Recommender systems (Zhang et al., 06)
- Bioinformatics: microarray data analysis (Brunet et al., 04; H. Kim and Park, 07)
- Acoustic signal processing, blind source separation (Cichocki et al., 04)
- Financial data (Drakakis et al., 08)
- Chemometrics (Andersson and Bro, 00), and many more

Algorithms for NMF
- Multiplicative update rules: Lee and Seung, 99
- Alternating least squares (ALS): Berry et al., 06
- Alternating nonnegative least squares (ANLS):
  - Lin, 07: projected gradient descent
  - D. Kim et al., 07: quasi-Newton
  - H. Kim and Park, 08: active set
  - J. Kim and Park, 08: block principal pivoting
- Other algorithms and variants:
  - Cichocki et al., 07: hierarchical ALS (HALS)
  - Ho, 08: rank-one residue iteration (RRI)
  - Zdunek, Cichocki, and Amari, 06: quasi-Newton
  - Chu and Lin, 07: low-dimensional polytope approximation
  - Other rank-one downdating based algorithms (Vavasis, ...)
  - C. Ding, T. Li: tri-factor NMF, orthogonal NMF, ...
- Books: Cichocki, Zdunek, Phan, and Amari, NMF and NTF: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation, Wiley, 09; Andersson and Bro, nonnegative tensor factorization, 00; and many more

Block Coordinate Descent (BCD) Method
A constrained nonlinear problem:
  min f(x)  (e.g., f(W, H) = ‖A − WH‖_F)  subject to x ∈ X = X_1 × X_2 × · · · × X_p,
where x = (x_1, x_2, ..., x_p) and x_i ∈ X_i ⊂ R^{n_i}, i = 1, ..., p. The BCD method generates x^{(k+1)} = (x_1^{(k+1)}, ..., x_p^{(k+1)}) by
  x_i^{(k+1)} = arg min_{ξ∈X_i} f(x_1^{(k+1)}, ..., x_{i−1}^{(k+1)}, ξ, x_{i+1}^{(k)}, ..., x_p^{(k)}).
Theorem (Bertsekas, 99): Suppose f is continuously differentiable over the Cartesian product of closed, convex sets X_1, X_2, ..., X_p, and suppose that for each i and x ∈ X the minimum above is uniquely attained. Then every limit point of the sequence {x^{(k)}} generated by the BCD method is a stationary point.
NOTE: Uniqueness is not required when p = 2 (Grippo and Sciandrone, 00).

BCD with k(m + n) Scalar Blocks
Minimize with respect to a single entry w_ij or h_ij while all other components of W and H are fixed:
  w_ij ← arg min_{w_ij≥0} ‖(r_i^T − Σ_{k'≠j} w_{ik'} h_{k'}^T) − w_ij h_j^T‖_2
  h_ij ← arg min_{h_ij≥0} ‖(a_j − Σ_{k'≠i} w_{k'} h_{k'j}) − w_i h_ij‖_2
where w_1, ..., w_k are the columns of W, h_1^T, ..., h_k^T are the rows of H, r_1^T, ..., r_m^T are the rows of A, and a_1, ..., a_n are the columns of A. Each subproblem is a scalar quadratic with a closed-form solution.

Lee and Seung (01)'s multiplicative updating (MU) rule:
  w_ij ← w_ij (AH^T)_ij / (WHH^T)_ij,   h_ij ← h_ij (W^T A)_ij / (W^T WH)_ij
Derivation based on a gradient-descent form:
  w_ij ← w_ij + [w_ij / (WHH^T)_ij] [(AH^T)_ij − (WHH^T)_ij]
  h_ij ← h_ij + [h_ij / (W^T WH)_ij] [(W^T A)_ij − (W^T WH)_ij]
Compare with the solution of scalar coordinate descent, rewritten as
  w_ij ← [w_ij + ((AH^T)_ij − (WHH^T)_ij) / (HH^T)_jj]_+
  h_ij ← [h_ij + ((W^T A)_ij − (W^T WH)_ij) / (W^T W)_ii]_+
In MU, conservative steps are taken to ensure nonnegativity; Bertsekas' theorem on convergence is not applicable to MU.
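The MU rule above is simple to implement. Below is a minimal NumPy sketch of one update step; the small constant `eps` in the denominators is a practical safeguard against division by zero added in this sketch, not part of the original rule:

```python
import numpy as np

def mu_step(A, W, H, eps=1e-12):
    """One multiplicative update (Lee & Seung, 01) for min ||A - WH||_F
    with W, H >= 0. eps is a sketch-level safeguard against zero division."""
    H *= (W.T @ A) / (W.T @ W @ H + eps)
    W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Small demonstration on random nonnegative data.
rng = np.random.default_rng(0)
A = rng.random((30, 20))
W, H = rng.random((30, 4)), rng.random((4, 20))
res = [np.linalg.norm(A - W @ H, "fro")]
for _ in range(200):
    W, H = mu_step(A, W, H)
    res.append(np.linalg.norm(A - W @ H, "fro"))
```

Each step keeps the factors nonnegative and does not increase the residual, but, as noted above, Bertsekas' theorem does not apply to MU, so convergence to a stationary point is not guaranteed.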
Haesun Park [email protected] Nonnegative Matrix Factorization: Algorithms and Applications BCD with 2k Vector Blocks H A W Minimize functions of wi or hi while all other components in W and H are fixed: kA − k X wj hjT kF j=1 = k(A − k X wj hjT ) − wi hiT kF = kR (i) − wi hiT kF j=1 j6=i wi ← arg min kR (i) − wi hiT kF hi ← arg min kR (i) − wi hiT kF wi ≥0 hi ≥0 Each subproblem has the form minx≥0 kcx T − GkF and T has a closed form solution x = [ Gc T cc ]+ ! Hierarchical Alternating Least Squares (HALS) (Cichocki et al, 07, 09), (actually HA-NLS) Rank-one Residue Iteration (RRI) (Ho, 08) Haesun Park [email protected] Nonnegative Matrix Factorization: Algorithms and Applications BCD with Scalar Blocks vs. 2k Vector Blocks H A W H W A In scalar BCD, w1j , w2j , · · · , wmj can be computed independently. Also, hi1 , hi2 , · · · , hin can be computed independently. → scalar BCD ⇔ 2k vector BCD in NMF Haesun Park [email protected] Nonnegative Matrix Factorization: Algorithms and Applications Successive Rank-1 Deflation in SVD and NMF Successive rank-1 deflation works for SVD but not for NMF A − σ1 u1 v1T ≈ σ2 u2 v2T ? A − w1 h1T ≈ w2 h2T ? 0 4 @ 6 0 6 4 0 0 1 0 B A 0 =@ 1 1 √ 2 1 √ 2 0 − √1 1 √ 2 0 2 0 10 C 0 A@ 1 10 0 0 0 2 0 10 0 B 0 A@ 1 1 √ 2 1 √ 2 0 1 √ 2 − √1 2 0 0 1 C 0 A 1 The best nonnegative approx. is sum of two successive rank-1 4 6 0 5 5 0 0 0 0 6 4 0 ≈ 5 5 0 + 0 0 0 0 0 1 0 0 0 0 0 1 The best rank-2 nonnegative approx. is 4 6 0 4 6 1 0 0 6 4 0 6 4 WH = = 0 1 0 0 0 0 0 0 NOTE: 2k vector BCD 6= successive rank-1 deflation for NMF Haesun Park [email protected] Nonnegative Matrix Factorization: Algorithms and Applications BCD with 2 Matrix Blocks H A W Minimize functions of W or H while the other is fixed: W ← arg min kH T W T − AT kF W ≥0 H ← arg min kWH − AkF H≥0 Alternating Nonnegativity-constrained Least Squares (ANLS) No closed form solution. Projected gradient method (Lin, 07) Projected quasi-Newton method (D. Kim et al., 07) Active-set method (H. 
Kim and Park, 08) Block principal pivoting method (J. Kim and Park, 08) ALS (M. Berry et al. 06) ?? Haesun Park [email protected] Nonnegative Matrix Factorization: Algorithms and Applications NLS : minX ≥0 kCX − Bk2F = P minxi kCxi − bi k22 Nonnegativity-constrained Least Squares (NLS) problem Projected Gradient method (Lin, 07) x (k +1) ← P+ (x (k ) − αk ∇f (x (k ) )) * P+ (·): Projection operator to the nonnegative orthant * Back-tracking selection of step αk ProjectedQuasi-Newton (Kim et al., 07) method (k ) − αD (k ) ∇f (y (k ) ) y y P + (k +1) x ← = zk 0 * Gradient scaling only for nonzero variables These do not fully exploit the structure of the NLS problems in NMF Active Set method (H. Kim and Park, (08) Lawson and Hanson (74), Bro and De Jong (97), Van Benthem and Keenan (04) ) Block principal pivoting method (J. Kim and Park, 08) linear complementarity problems (LCP) (Judice and Pires, 94) Haesun Park [email protected] Nonnegative Matrix Factorization: Algorithms and Applications Active-set type Algorithms for minx≥0 kCx − bk2 , C : m × k KKT conditions: y = C T Cx − C T b y ≥ 0, x ≥ 0, xi yi = 0, i = 1, · · · , k If we know P = {i|xi > 0} in the solution in advance then we only need to solve min kCP xP − bk2 , and the rest of xi = 0, where CP : columns of C with the indices in P C * x b + + 0 0 + Haesun Park [email protected] Nonnegative Matrix Factorization: Algorithms and Applications Active-set type Algorithms for minx≥0 kCx − bk2 , C : m × k KKT conditions: y = C T Cx − C T b y ≥ 0, x ≥ 0, xi yi = 0, i = 1, · · · , k Active set method (Lawson and Hanson 74) E = {1, · · · , k } (i.e. x = 0 initially), P = null Repeat while E not null and yi < 0 for some i Exchange indices between E and P while keeping feasibility and reducing the objective function value Block Principal Pivoting method (Portugal et al. 94 MathComp): Lacks any monotonicity or feasibility but finds a correct active-passive set partitioning. 
Guess two index sets P and E that partition {1, · · · , k } Repeat Let xE = 0 and xP = arg minxP kCP xP − bk22 Then yE = CET (CP xP − b) and yP = 0 If xP ≥ 0 and yE ≥ 0, then optimal values are found. Otherwise, update P and E. Haesun Park [email protected] Nonnegative Matrix Factorization: Algorithms and Applications How block principal pivoting works k = 10, Initially P = {1, 2, 3, 4, 5}, E = {6, 7, 8, 9, 10} Update by CPT CP xP = CPT b, and yE = CET (CP xP − b) x y P 0 P 0 P 0 P 0 P 0 E 0 E 0 E 0 E 0 E 0 Haesun Park [email protected] Nonnegative Matrix Factorization: Algorithms and Applications How block principal pivoting works Update by CPT CP xP = CPT b, and yE = CET (CP xP − b) x y P + 0 P - 0 P - 0 P + 0 P - 0 E 0 - E 0 + E 0 - E 0 + E 0 + Haesun Park [email protected] Nonnegative Matrix Factorization: Algorithms and Applications How block principal pivoting works Update by CPT CP xP = CPT b, and yE = CET (CP xP − b) x x y P + 0 P P - 0 E 0 P - 0 E 0 P + 0 P P - 0 E E 0 - P E 0 + E E 0 - P E 0 + E 0 E 0 + E 0 y 0 0 0 0 0 0 Haesun Park [email protected] Nonnegative Matrix Factorization: Algorithms and Applications How block principal pivoting works Update by CPT CP xP = CPT b, and yE = CET (CP xP − b) x y x y P + 0 P + 0 P - 0 E 0 + P - 0 E 0 - P + 0 P + 0 P - 0 E 0 + E 0 - P - 0 E 0 + E 0 + E 0 - P + 0 E 0 + E 0 + E 0 + E 0 + Haesun Park [email protected] Nonnegative Matrix Factorization: Algorithms and Applications How block principal pivoting works Update by CPT CP xP = CPT b, and yE = CET (CP xP − b) x y P + 0 P - P P x y x y P + 0 P 0 E 0 + E - 0 E 0 - P 0 + 0 P + 0 P 0 P - 0 E 0 + E 0 E 0 - P - 0 E 0 E 0 + E 0 + E 0 E 0 - P + 0 P E 0 + E 0 + E 0 E 0 + E 0 + E 0 Haesun Park [email protected] 0 0 0 Nonnegative Matrix Factorization: Algorithms and Applications How block principal pivoting works Update by CPT CP xP = CPT b, and yE = CET (CP xP − b) x y P + 0 + E 0 + 0 - P + 0 P + 0 P + 0 0 E 0 + E 0 + - P - 0 E 0 + 0 + E 0 + E 0 + E 0 - P + 0 P + 0 E 0 + E 
0 + E 0 + E 0 + E 0 + E 0 + x y x y P + 0 P + 0 P - 0 E 0 P - 0 E P + 0 P - E 0 E Haesun Park [email protected] Nonnegative Matrix Factorization: Algorithms and Applications How block principal pivoting works Update by CPT CP xP = CPT b, and yE = CET (CP xP − b) x y P + 0 + E 0 + 0 - P + 0 P + 0 P + 0 0 E 0 + E 0 + 0 - P - 0 E 0 + 0 + E 0 + E 0 + E 0 - P + 0 P + 0 E 0 + E 0 + E 0 + E 0 + E 0 + E 0 + x y x y P + 0 P + 0 P - 0 E 0 P - 0 E P + 0 P - E E Solved! Haesun Park [email protected] Nonnegative Matrix Factorization: Algorithms and Applications Refined Exchange Rules Active set algorithm is a special instance of single principal pivoting algorithm (H. Kim and Park, SIMAX 08) Block exchange rule without modification does not always work. The residual is not guaranteed to monotonically decrease. Block exchange rule may cycle (although rarely). Modification: if the block exchange rule fails to decrease the number of infeasible variables, use a backup exchange rule With this modification, block principal pivoting algorithm finds the solution of NLS in a finite number of iterations. Haesun Park [email protected] Nonnegative Matrix Factorization: Algorithms and Applications Structure of NLS problems in NMF (J. Kim and Park, 08) Matrix is long and thin, solutions vectors short, many right hand side vectors. minH≥0 kWH − Ak2F 2 minW ≥0 H T W T − AT F Haesun Park [email protected] Nonnegative Matrix Factorization: Algorithms and Applications Efficient Algorithm for minX ≥0 kCX − Bk2F (J. Kim and Park, 08) Precompute C T C and C T B Update xP and yE by CPT CP xP = CPT b and yE = CET CP xP − CET b All coefficients can be retrieved from C T C and C T B C T C and C T B is small. Storage is not a problem. → Exploit common P and E sets among col. in B in each iteration. X is flat and wide. → More common cases of P and E sets. 
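The block principal pivoting iteration described above can be sketched for a single right-hand side as follows (an illustrative simplification, not the authors' implementation: the backup rule here flips only the largest-index infeasible variable, in the spirit of the refined exchange rules above):

```python
import numpy as np

def nnls_bpp(C, b, max_iter=100):
    """Sketch of block principal pivoting for min_{x>=0} ||Cx - b||_2.
    C^T C and C^T b are precomputed once; when the block exchange stops
    reducing the number of infeasible variables, the backup rule flips
    only the largest infeasible index."""
    k = C.shape[1]
    CtC, Ctb = C.T @ C, C.T @ b
    P = np.zeros(k, dtype=bool)          # passive set; the rest is E
    x, y = np.zeros(k), -Ctb             # x_E = 0, y = C^T C x - C^T b
    best = k + 1                         # fewest infeasibles seen so far
    for _ in range(max_iter):
        infeas = (P & (x < 0)) | (~P & (y < 0))
        m = int(infeas.sum())
        if m == 0:
            return np.maximum(x, 0.0)    # KKT conditions hold: solved
        if m < best:
            best = m
            P[infeas] = ~P[infeas]       # block exchange of all infeasibles
        else:
            i = int(np.max(np.flatnonzero(infeas)))
            P[i] = ~P[i]                 # backup rule: flip one variable
        x, y = np.zeros(k), np.zeros(k)
        if P.any():
            x[P] = np.linalg.solve(CtC[np.ix_(P, P)], Ctb[P])
            y[~P] = CtC[np.ix_(~P, P)] @ x[P] - Ctb[~P]
        else:
            y[:] = -Ctb
    return np.maximum(x, 0.0)            # fallback if not converged

# Small demonstration; the optimality of x can be checked via the KKT
# conditions x >= 0, y >= 0, x_i y_i = 0.
rng = np.random.default_rng(1)
C = rng.random((20, 6))
b = rng.random(20)
x = nnls_bpp(C, b)
y = C.T @ (C @ x - b)
```

In the multiple-right-hand-side setting of NMF, the same iteration would additionally group columns sharing the same (P, E) partition, as described above.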
Proposed algorithm for NMF (ANLS/BPP): the ANLS framework + the block principal pivoting algorithm for NLS, with improvements for multiple right-hand sides.

Sparse NMF and Regularized NMF
Sparse NMF (for sparse H) (H. Kim and Park, Bioinformatics, 07):
  min_{W,H} ‖A − WH‖_F² + η‖W‖_F² + β Σ_{j=1}^n ‖H(:, j)‖_1²,  with W_ij, H_ij ≥ 0 for all i, j.
ANLS reformulation (H. Kim and Park, 07): alternate
  min_{H≥0} ‖[W; √β e_{1×k}] H − [A; 0_{1×n}]‖_F²
  min_{W≥0} ‖[H^T; √η I_k] W^T − [A^T; 0_{k×m}]‖_F²
Regularized NMF (Pauca et al., 06):
  min_{W,H} ‖A − WH‖_F² + η‖W‖_F² + β‖H‖_F²,  with W_ij, H_ij ≥ 0 for all i, j.
ANLS reformulation: alternate
  min_{H≥0} ‖[W; √β I_k] H − [A; 0_{k×n}]‖_F²
  min_{W≥0} ‖[H^T; √η I_k] W^T − [A^T; 0_{k×m}]‖_F²

Nonnegative PARAFAC
Consider a 3-way nonnegative tensor T ∈ R_+^{m×n×p} and its PARAFAC decomposition
  min_{A,B,C≥0} ‖T − [[A, B, C]]‖_F²,
where A ∈ R_+^{m×k}, B ∈ R_+^{n×k}, C ∈ R_+^{p×k}. The loading matrices A, B, and C can be iteratively estimated by an NLS algorithm such as the block principal pivoting method.

Nonnegative PARAFAC (J. Kim and Park, in preparation)
Iterate until a stopping criterion is satisfied:
  min_{A≥0} ‖Y_{BC} A^T − T_{(1)}‖_F,  min_{B≥0} ‖Y_{AC} B^T − T_{(2)}‖_F,  min_{C≥0} ‖Y_{AB} C^T − T_{(3)}‖_F,
where Y_{BC} = B ⊙ C ∈ R^{(np)×k}, Y_{AC} = A ⊙ C ∈ R^{(mp)×k}, Y_{AB} = A ⊙ B ∈ R^{(mn)×k}, the matrices T_{(1)} ∈ R^{(np)×m}, T_{(2)} ∈ R^{(mp)×n}, T_{(3)} ∈ R^{(mn)×p} are the unfoldings of T, and F ⊙ G = [f_1 ⊗ g_1  f_2 ⊗ g_2  · · ·  f_k ⊗ g_k] ∈ R^{(mn)×k} is the Khatri-Rao product of F ∈ R^{m×k} and G ∈ R^{n×k}.
The matrices are even longer and thinner, which is ideal for ANLS/BPP. The approach extends similarly to higher-order tensors.

Experimental Results (NMF) (J. Kim and Park, 2011)
NMF algorithms compared:
  ANLS-BPP     ANLS / block principal pivoting   J. Kim and HP, 08
  ANLS-AS      ANLS / active set                 H. Kim and HP, 08
  ANLS-PGRAD   ANLS / projected gradient         Lin, 07
  ANLS-PQN     ANLS / projected quasi-Newton     D. Kim et al., 07
  HALS         Hierarchical ALS                  Cichocki et al., 07
  MU           Multiplicative updating           Lee and Seung, 01
  ALS          Alternating least squares         Berry et al., 06

Active-set vs. Block Principal Pivoting (J. Kim and Park, 2011)
[Figure: time per iteration (top) and cumulative time (bottom) for ANLS-BPP, ANLS-AS-UPDATE, and ANLS-AS-GROUP on the ATNT image data (10,304 × 400, k = 10) and the TDT2 text data (19,009 × 3,087, k = 160).]

Residual vs. Execution Time (J. Kim and Park, 2011)
[Figure: relative objective value vs. time for HALS, MU, ALS, ANLS-PGRAD, ANLS-PQN, and ANLS-BPP on the TDT2 text data (19,009 × 3,087) with k = 10 and k = 160.]
[Figure: the same comparison on the ATNT image data (10,304 × 400, k = 10) and the 20 Newsgroups text data (26,214 × 11,314, k = 160).]
[Figure: the same comparison on the PIE 64 image data (4,096 × 11,554) with k = 80 and k = 160.]

BPP vs. HALS: Influence of Sparsity (J. Kim and Park, 2011)
[Figure: relative objective value vs. time, and factor sparsity/change vs. iteration, for HALS and ANLS-BPP on 10,000 × 2,000 synthetic data (k = 160) created from factors with 90% (left) and 95% (right) sparsity.]

Adaptive NMF for Varying Reduced Rank k → k̃ (He, Kim, Cichocki, and Park, in preparation)
Given (W, H) with rank k, how can (W̃, H̃) with rank k̃ be computed fast? (E.g., for model selection in NMF clustering.)
AdaNMF:
- Initialize W̃ and H̃ using W and H:
  - If k̃ > k, compute an NMF A − WH ≈ ∆W ∆H and set W̃ = [W ∆W], H̃ = [H; ∆H].
  - If k̃ < k, initialize W̃ and H̃ with the k̃ pairs (w_i, h_i) having the largest ‖w_i h_i^T‖_F = ‖w_i‖_2 ‖h_i‖_2.
- Update W̃ and H̃ using the HALS algorithm.
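The AdaNMF steps above can be sketched as follows (an illustrative simplification, not the authors' code: a few multiplicative updates stand in for the inner NMF of the residual when the rank grows, and `hals_sweep` is a hypothetical helper showing one HALS refinement pass):

```python
import numpy as np

def hals_sweep(A, W, H, eps=1e-12):
    """One HALS pass: closed-form nonnegative update of each column of W
    and row of H against the residual excluding component i."""
    for i in range(W.shape[1]):
        R = A - W @ H + np.outer(W[:, i], H[i])
        H[i] = np.maximum(R.T @ W[:, i] / (W[:, i] @ W[:, i] + eps), 0)
        W[:, i] = np.maximum(R @ H[i] / (H[i] @ H[i] + eps), 0)
    return W, H

def ada_nmf(A, W, H, k_new, sweeps=30):
    """Warm-start an NMF of rank k to rank k_new, then refine with HALS."""
    k = W.shape[1]
    rng = np.random.default_rng(0)
    if k_new > k:                        # grow: factor the residual A - WH
        R = np.maximum(A - W @ H, 0)
        dW = rng.random((A.shape[0], k_new - k))
        dH = rng.random((k_new - k, A.shape[1]))
        for _ in range(20):              # a few MU steps on the residual
            dH *= (dW.T @ R) / (dW.T @ dW @ dH + 1e-12)
            dW *= (R @ dH.T) / (dW @ dH @ dH.T + 1e-12)
        W, H = np.hstack([W, dW]), np.vstack([H, dH])
    elif k_new < k:                      # shrink: keep largest rank-1 terms
        norms = np.linalg.norm(W, axis=0) * np.linalg.norm(H, axis=1)
        keep = np.argsort(norms)[::-1][:k_new]
        W, H = W[:, keep], H[keep]
    for _ in range(sweeps):
        W, H = hals_sweep(A, W, H)
    return W, H

# Demonstration: grow a rank-3 factorization to rank 5.
A = np.random.default_rng(2).random((40, 30))
W0 = np.random.default_rng(3).random((40, 3))
H0 = np.random.default_rng(4).random((3, 30))
W2, H2 = ada_nmf(A, W0.copy(), H0.copy(), k_new=5)
```

The warm start matters because the old factors already explain much of A, so the HALS refinement typically needs far fewer sweeps than computing the new-rank NMF from a random initialization.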
Haesun Park [email protected] Nonnegative Matrix Factorization: Algorithms and Applications Model Selection in NMF Clustering (He, Kim, Cichocki, and Park, in preparation) Consensus matrix based on A ≈ WH: ( 0 max(H(:, i)) = max(H(:, j)) Cijt = , t = 1, . . . , l 1 max(H(:, i)) 6= max(H(:, j)) Dispersion coefficient ρ(k ) = P t C = 1l C 1 n2 Pn i=1 Pn − 21 )2 , where j=1 4(Cij Reordered Consensus matrix, k=4 Reordered Consensus Matrix, k=3 1 1 200 0.9 200 0.9 400 0.8 400 0.8 600 0.7 600 0.7 800 0.6 800 0.6 1000 0.5 1000 0.5 1200 0.4 1200 0.4 1400 0.3 1400 0.3 0.2 1600 0.1 1800 0 2000 0.95 0.9 1600 Dispersion coefficient 0.85 0.8 0.75 0.7 0.2 0.65 1800 2000 500 1000 1500 2000 Reordered Consensus Matrix, k=5 0.1 500 1000 1500 2000 0 3 3.5 4 4.5 5 Approximation Rank k 5.5 6 Reordered Consensus Matrix, k=6 1 1 12 200 0.9 200 0.9 400 0.8 400 0.8 600 0.7 600 0.7 800 0.6 800 0.6 1000 0.5 1000 0.5 1200 0.4 1200 0.4 1400 0.3 1400 0.3 1600 0.2 1600 0.2 1800 0.1 1800 0.1 0 2000 11 Execution time (seconds) 10 9 8 7 6 5 4 2000 500 1000 1500 2000 500 1000 1500 2000 0 AdaNMF Recompute Warm−restart Clustering results on MNIST digit images (784 × 2000) by AdaNMF with k = 3, 4, 5 and 6. Averaged consensus matrices, dispersion coefficient, execution time Haesun Park [email protected] Nonnegative Matrix Factorization: Algorithms and Applications Adaptive NMF for Varying Reduced Rank (He, Kim, Cichocki, and Park, in preparation) 0.18 0.18 Recompute Warm−restart AdaNMF 0.16 Recompute Warm−restart AdaNMF 0.16 0.14 Relative error Relative error 0.14 0.12 0.12 0.1 0.08 0.1 0.06 0.08 0.04 0.06 0 1 2 3 4 Execution time 5 6 7 0.02 0 2 4 6 Execution time 8 10 12 Relative error vs. exec. time of AdaNMF and “recompute”. Given an NMF of 600 × 600 synthetic matrix with k = 60, compute NMF with k̃ = 50, 80. 
Haesun Park [email protected] Nonnegative Matrix Factorization: Algorithms and Applications Adaptive NMF for Varying Reduced Rank (He, Kim, Cichocki, and Park, in preparation) Theorem: For A ∈ Rm×n + , If rank+ (A) > k , then (k +1) (k +1) min kA − W H kF < min kA − W (k ) H (k ) kF , m×i (i) where W ∈ R+ and H (i) ∈ Ri×n + . 0.35 Training error Testing error rank=20 rank=40 rank=60 rank=80 0.3 0.3 0.25 0.25 Classification error Relative Objective Function Value of NMF 0.35 0.2 0.15 0.2 0.15 0.1 0.1 0.05 0.05 0 0 20 40 60 80 100 120 Approximation Rank k of NMF 140 160 0 5 10 15 20 25 30 35 Reduced rank k of NMF 40 45 50 Rank path on synthetic data set: relative residual vs. k ORL Face image (10304 × 400) classification errors (by LMNN) on training and testing set vs. k . k -dim rep. HT of training data T by BPP minHT ≥0 kWHT − T kF Haesun Park [email protected] Nonnegative Matrix Factorization: Algorithms and Applications NMF for Dynamic Data (DynNMF) (He, Kim, Cichocki, and Park, in preparation) Given an NMF (W , H) for A = [δA Â], how to compute NMF (W̃ , H̃) for à = [ ∆A] fast ? (Updating and Downdating) DynNMF (Sliding Window NMF) Initialize H̃ as follows: Let Ĥ be the remaining columns of H. Solve min∆H≥0 kW ∆H − ∆Ak2F using block principal pivoting Set H̃ = [Ĥ ∆H] Run HALS on à with initial factors W̃ = W and H̃ Haesun Park [email protected] Nonnegative Matrix Factorization: Algorithms and Applications DynNMF for Dynamic Data (He, Kim, Cichocki, and Park, in preparation) PET2001 data with 3064 images from a surveillance video. DynNMF on 110, 592 × 400 data matrix each time, with 100 new columns and 100 obsolete columns. The residual images track the moving vehicle in the video. Haesun Park [email protected] Nonnegative Matrix Factorization: Algorithms and Applications NMF as a Clustering Method (Kuang and Park, in preparation) Clustering and Lower Rank Approximation are related. NMF for Clustering: Document (Xu et al. 03), Image (Cai et al. 
08), Microarray (Kim & Park 07), etc. Equivalence of objective functions between k-means and NMF: (Ding, et al., 05; Kim & Park, 08) min n X kai − wSi k22 = min kA − WHk2F i=1 Si = j when i-th point is assigned to j-th cluster (j ∈ {1, · · · , k }) k-means: W : k cluster centroids, H ∈ E NMF: W : basis vectors for rank-k approximation, H: representation of A in W space (E: matrices whose columns are columns of an identity matrix ) NOTE: The equivalence of obj. functions holds when H ∈ E, A ≥ 0. Haesun Park [email protected] Nonnegative Matrix Factorization: Algorithms and Applications NMF and K-means min kA − WHk2F s.t. H ∈ E Paths to solution: K-means: Expectation-Minimization NMF: Relax the condition on H to H ≥ 0 with orthogonal rows or H ≥ 0 with sparse columns - soft clustering TDT2 text data set: (clustering accuracy aver. among 100 runs) # clusters 2 6 10 14 18 K-means 0.8099 0.7295 0.7015 0.6675 0.6675 NMF/ANLS 0.9990 0.8717 0.7436 0.7021 0.7160 Sparsity constraint improves clusteringP result (J. Kim and Park, 08): minW ≥0,H≥0 kA − WHk2F + ηkW k2F + β nj=1 kH(:, j)k21 # of times achieving optimal assignment (a synthetic data set, with a clear cluster structure ): k NMF SNMF 3 69 100 6 65 100 9 74 100 12 68 100 15 44 97 NMF and SNMF much better than k-means in general. Haesun Park [email protected] Nonnegative Matrix Factorization: Algorithms and Applications NMF, K-means, and Spectral Clustering (Kuang and Park, in preparation) Equivalence of objective functions is not enough to explain the clustering capability of NMF: NMF is more related to spherical k-means, than to k-means → NMF shown to work well in text data clustering Symmetric NMF: minS≥0 kA − SS T kF , A ∈ R+ n×n : affinity matrix Spectral clustering → Eigenvectors (Ng et al. 01), A normalized if needed, Laplacian,... Symmetric NMF (Ding et al.) 
→ can handle nonlinear structure, and S ≥ 0 natually captures a cluster structure in S Haesun Park [email protected] Nonnegative Matrix Factorization: Algorithms and Applications Summary/Discussions Overview of NMF with Frobenius norm and algorithms Fast algorithms and convergence via BCD framework Adaptive NMF algorithms Variations/Extensions of NMF : nonnegative PARAFAC and sparse NMF NMF for clustering Extensive computational comparisons Haesun Park [email protected] Nonnegative Matrix Factorization: Algorithms and Applications Summary/Discussions Overview of NMF with Frobenius norm and algorithms Fast algorithms and convergence via BCD framework Adaptive NMF algorithms Variations/Extensions of NMF : nonnegative PARAFAC and sparse NMF NMF for clustering Extensive computational comparisons NMF for clustering and semi-supervised clustering NMF and probability related methods NMF and geometric understanding NMF algorithms for large scale problems, parallel implementation? GPU? Fast NMF with other divergences (Bregman and Csiszar divergences) NMF for blind source separation? Uniqueness? More theoretical study on NMF especially for foundations for computational methods NMF Matlab codes and papers available at http://www.cc.gatech.edu/∼hpark and http://www.cc.gatech.edu/∼jingu Haesun Park [email protected] Nonnegative Matrix Factorization: Algorithms and Applications Collaborators Today’s talk Jingu Kim CSE, Georgia Tech Yunlong He Math, Georgia Tech Krishnakumar Balasubramanian Prof. Michael Berry Prof. Moody Chu Dr. Andrzej Cichocki Prof. Chris Ding Prof. Lars Elden Dr. Mariya Ishteva Dr. Hyunsoo Kim Anoop Korattikara Prof. Guy Lebanon Liangda Li Prof. Tao Li Prof. Robert Plemmons Andrey Puretskiy Prof. Max Welling Dr. Stan Young Da Kuang CSE, Georgia Tech CSE, Georgia Tech EECS, Univ. of Tennessee Math, North Carolina State Univ. Brain Science Institute, RIKEN, Japan CSE, UT Arlington Math, Linkoping Univ., Sweden CSE, Georgia Tech Wistar Inst. 
CS, UC Irvine CSE, Georgia Tech CSE, Georgia Tech CS, Florida International Univ. CS, Wake Forest Univ. EECS, Univ. of Tennessee CS, UC Irvine NISS Thank you! Haesun Park [email protected] Nonnegative Matrix Factorization: Algorithms and Applications Related Papers by H. Park’s Group H. Kim and H. Park, Sparse Non-negative Matrix Factorizations via Alternating Non-negativity-constrained Least Squares for Microarray Data Analysis. Bioinformatics, 23(12):1495-1502, 2007. H. Kim, H. Park, and L. Eldén. Non-negative Tensor Factorization Based on Alternating Large-scale Non-negativity-constrained Least Squares. Proc. of IEEE 7th International Conference on Bioinformatics and Bioengineering (BIBE), pp. 1147-1151, 2007. H. Kim and H. Park, Nonnegative Matrix Factorization Based on Alternating Non-negativity-constrained Least Squares and the Active Set Method. SIAM Journal on Matrix Analysis and Applications, 30(2):713-730, 2008. J. Kim and H. Park, Sparse Nonnegative Matrix Factorization for Clustering, Georgia Tech Technical Report GT-CSE-08-01, 2008. J. Kim and H. Park. Toward Faster Nonnegative Matrix Factorization: A New Algorithm and Comparisons. Proc. of the 8th IEEE International Conference on Data Mining (ICDM), pp. 353-362, 2008. B. Drake, J. Kim, M. Mallick, and H. Park, Supervised Raman Spectra Estimation Based on Nonnegative Rank Deficient Least Squares. In Proceedings of the 13th International Conference on Information Fusion, Edinburgh, UK, 2010. Haesun Park [email protected] Nonnegative Matrix Factorization: Algorithms and Applications Related Papers by H. Park’s Group A. Korattikara, L. Boyles, M. Welling, J. Kim, and H. Park, Statistical Optimization of Non-Negative Matrix Factorization. Proc. of The Fourteenth International Conference on Artificial Intelligence and Statistics, JMLR: W&CP 15, 2011. J. Kim and H. Park, Fast Nonnegative Matrix Factorization: an Active-set-like Method and Comparisons, Submitted for review, 2011. J. Kim and H. 
Park, Fast Nonnegative Tensor Factorization with an Active-set-like Method, In High-Performance Scientific Computing: Algorithms and Applications, Springer, in preparation. Y. He, J. Kim, A. Cichocki, and H. Park, Fast Adaptive NMF Algorithms for Varying Reduced Rank and Dynamic Data, in preparation. L. Li, G. Lebanon, and H. Park, Fast Algorithm for Non-Negative Matrix Factorization with Bregman and Csiszar Divergences, in preparation. D. Kuang and H. Park, Nonnegative Matrix Factorization for Spherical and Spectral Clustering, in preparation. Haesun Park [email protected] Nonnegative Matrix Factorization: Algorithms and Applications