Nonnegative Matrix Factorization:
Algorithms and Applications
Haesun Park
[email protected]
School of Computational Science and Engineering
Georgia Institute of Technology
Atlanta, GA, USA
SIAM International Conference on Data Mining, April, 2011
This work was supported in part by the National Science Foundation.
Haesun Park [email protected]
Nonnegative Matrix Factorization: Algorithms and Applications
Co-authors
Jingu Kim
CSE, Georgia Tech
Yunlong He
Math, Georgia Tech
Da Kuang
CSE, Georgia Tech
Outline
Overview of NMF
Fast algorithms with Frobenius norm
Theoretical results on convergence
Multiplicative updating
Alternating nonnegativity-constrained least squares: active-set
type methods, ...
Hierarchical alternating least squares
Variations/Extensions of NMF: sparse NMF, regularized NMF,
nonnegative PARAFAC
Efficient adaptive NMF algorithms
Applications of NMF, NMF for Clustering
Extensive computational results
Discussions
Nonnegative Matrix Factorization (NMF)
(Lee&Seung 99, Paatero&Tapper 94)
Given A ∈ R_+^{m×n} and a desired rank k ≪ min(m, n),
find W ∈ R_+^{m×k} and H ∈ R_+^{k×n} s.t. A ≈ WH.
  min_{W≥0, H≥0} ‖A − WH‖_F
Nonconvex
W and H not unique (e.g., Ŵ = WD ≥ 0, Ĥ = D^{−1}H ≥ 0)
Notation: R_+ : nonnegative real numbers
Nonnegative Matrix Factorization (NMF)
(Lee & Seung 99, Paatero & Tapper 94)
NMF improves the approximation as k increases: if rank_+(A) > k,
  min_{W_{k+1}≥0, H_{k+1}≥0} ‖A − W_{k+1} H_{k+1}‖_F < min_{W_k≥0, H_k≥0} ‖A − W_k H_k‖_F,
where W_i ∈ R_+^{m×i} and H_i ∈ R_+^{i×n}.
But the SVD does better: if A = U Σ V^T, then
  ‖A − U_k Σ_k V_k^T‖_F ≤ min ‖A − WH‖_F over W ∈ R_+^{m×k} and H ∈ R_+^{k×n}.
So why NMF? Dimension reduction with
better interpretation / lower-dimensional representation for nonnegative
data.
Nonnegative Rank of A ∈ R_+^{m×n}
(J. Cohen and U. Rothblum, LAA, 93)
The nonnegative rank rank_+(A) is the smallest integer k for which there exist
V ∈ R_+^{m×k} and U ∈ R_+^{k×n} such that A = VU.
Note: rank(A) ≤ rank_+(A) ≤ min(m, n)
If rank(A) ≤ 2, then rank_+(A) = rank(A).
If either m ∈ {1, 2, 3} or n ∈ {1, 2, 3}, then rank_+(A) = rank(A).
Nonnegative Rank of A ∈ R_+^{m×n}
(J. Cohen and U. Rothblum, LAA, 93)
(Perron-Frobenius) There are nonnegative left and right singular
vectors u_1 and v_1 of A associated with the largest singular value σ_1.
→ The rank-1 truncated SVD of A is the best rank-one NMF of A.
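The Perron-Frobenius remark above can be illustrated in a few lines: power iteration on a nonnegative matrix, started from a positive vector, produces nonnegative dominant singular vectors, so the rank-1 truncated SVD is itself a rank-1 NMF. A minimal pure-Python sketch (list-of-lists matrices; the function name and iteration count are our own choices, not from the talk):

```python
def rank1_nmf(A, iters=100):
    """Return nonnegative u, v with A ≈ u v^T (best rank-1 approximation)."""
    m, n = len(A), len(A[0])
    v = [1.0] * n                       # positive start keeps iterates nonnegative
    for _ in range(iters):
        u = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]   # u ∝ A v
        v = [sum(A[i][j] * u[i] for i in range(m)) for j in range(n)]   # v ∝ A^T u
        s = sum(x * x for x in v) ** 0.5
        v = [x / s for x in v]          # normalize v to unit length
    u = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
    return u, v
```

On a rank-1 nonnegative matrix this recovers the factorization exactly (up to scaling between u and v).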
Applications of NMF
Text mining
Topic model: NMF as an alternative way for PLSI ( Gaussier et al.,
05; Ding et al., 08)
Document clustering (Xu et al., 03; Shahnaz et al., 06)
Topic detection and trend tracking, email analysis (Berry et al., 05;
Keila et al., 05; Cao et al., 07)
Image analysis and computer vision
Feature representation, sparse coding (Lee et al., 99; Guillamet et
al., 01; Hoyer et al., 02; Li et al. 01)
Video tracking (Bucak et al., 07)
Social network
Community structure and trend detection ( Chi et al., 07; Wang et
al., 08)
Recommender systems (Zhang et al., 06)
Bioinformatics: microarray data analysis (Brunet et al., 04; H. Kim
and Park, 07)
Acoustic signal processing, blind source separation (Cichocki et
al., 04)
Financial data (Drakakis et al., 08)
Chemometrics (Andersson and Bro, 00)
and SO MANY MORE...
Algorithms for NMF
Multiplicative update rules: Lee and Seung, 99
Alternating least squares (ALS): Berry et al., 06
Alternating nonnegative least squares (ANLS)
Lin, 07, Projected gradient descent
D. Kim et al., 07, Quasi-Newton
H. Kim and Park, 08, Active-set
J. Kim and Park, 08, Block principal pivoting
Other algorithms and variants
Cichocki et al., 07, Hierarchical ALS (HALS)
Ho, 08, Rank-one Residue Iteration (RRI)
Zdunek, Cichocki, Amari 06, Quasi-Newton
Chu and Lin, 07, Low dim polytope approx.
Other rank-1 downdating based algorithms (Vavasis,..)
C. Ding, T. Li, tri-factor NMF, orthogonal NMF, ...
Cichocki, Zdunek, Phan, Amari: NMF and NTF: Applications to
Exploratory Multi-way Data Analysis and Blind Source Separation,
Wiley, 09
Andersson and Bro, Nonnegative Tensor Factorization, 00
And SO MANY MORE...
Block Coordinate Descent (BCD) Method
A constrained nonlinear problem:
  min f(x)   (e.g., f(W, H) = ‖A − WH‖_F)
  subject to x ∈ X = X_1 × X_2 × · · · × X_p,
where x = (x_1, x_2, . . . , x_p), x_i ∈ X_i ⊂ R^{n_i}, i = 1, . . . , p.
The block coordinate descent method generates x^{(k+1)} = (x_1^{(k+1)}, . . . , x_p^{(k+1)}) by
  x_i^{(k+1)} = arg min_{ξ ∈ X_i} f(x_1^{(k+1)}, . . . , x_{i−1}^{(k+1)}, ξ, x_{i+1}^{(k)}, . . . , x_p^{(k)}).
Block Coordinate Descent (BCD) Method
A constrained nonlinear problem:
  min f(x)   subject to x ∈ X = X_1 × X_2 × · · · × X_p.
Th. (Bertsekas, 99): Suppose f is continuously differentiable over the
Cartesian product of closed, convex sets X_1, X_2, . . . , X_p, and suppose
for each i and x ∈ X the minimum
  min_{ξ ∈ X_i} f(x_1^{(k+1)}, . . . , x_{i−1}^{(k+1)}, ξ, x_{i+1}^{(k)}, . . . , x_p^{(k)})
is uniquely attained. Then every limit point of the sequence {x^{(k)}} generated
by the BCD method is a stationary point.
NOTE: Uniqueness is not required when p = 2 (Grippo and Sciandrone, 00).
BCD with k (m + n) Scalar Blocks
Minimize functions of w_ij or h_ij while all other components in W
and H are fixed:
  w_ij ← arg min_{w_ij ≥ 0} ‖(r_i^T − Σ_{k̃≠j} w_{ik̃} h_{k̃}^T) − w_ij h_j^T‖_2
  h_ij ← arg min_{h_ij ≥ 0} ‖(a_j − Σ_{k̃≠i} w_{k̃} h_{k̃j}) − w_i h_ij‖_2
where W = [w_1 · · · w_k], H = [h_1^T ; . . . ; h_k^T] (rows h_i^T), and
A = [a_1 · · · a_n] = [r_1^T ; . . . ; r_m^T] (rows r_i^T).
Each subproblem is a scalar quadratic function with a closed-form solution.
BCD with k (m + n) Scalar Blocks
Lee and Seung (01)'s multiplicative updating (MU) rule:
  w_ij ← w_ij (A H^T)_ij / (W H H^T)_ij,   h_ij ← h_ij (W^T A)_ij / (W^T W H)_ij
Derivation based on gradient-descent form:
  w_ij ← w_ij + (w_ij / (W H H^T)_ij) [ (A H^T)_ij − (W H H^T)_ij ]
  h_ij ← h_ij + (h_ij / (W^T W H)_ij) [ (W^T A)_ij − (W^T W H)_ij ]
Rewriting of the solution of coordinate descent:
  w_ij ← [ w_ij + ( (A H^T)_ij − (W H H^T)_ij ) / (H H^T)_jj ]_+
  h_ij ← [ h_ij + ( (W^T A)_ij − (W^T W H)_ij ) / (W^T W)_ii ]_+
In MU, conservative steps are taken to ensure nonnegativity.
Bertsekas' theorem on convergence is not applicable to MU.
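The MU rule above can be sketched in a few lines of pure Python (list-of-lists matrices, for illustration only; a real implementation would use a numerical library). The small eps in the denominators is an assumption we add to avoid division by zero; it is not part of the rule as stated:

```python
def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def mu_step(A, W, H, eps=1e-12):
    """One MU sweep: h_ij *= (W^T A)_ij/(W^T W H)_ij, then w_ij *= (A H^T)_ij/(W H H^T)_ij."""
    Wt = transpose(W)
    WtA, WtWH = matmul(Wt, A), matmul(matmul(Wt, W), H)
    H = [[H[i][j] * WtA[i][j] / (WtWH[i][j] + eps)
          for j in range(len(H[0]))] for i in range(len(H))]
    Ht = transpose(H)
    AHt, WHHt = matmul(A, Ht), matmul(W, matmul(H, Ht))
    W = [[W[i][j] * AHt[i][j] / (WHHt[i][j] + eps)
          for j in range(len(W[0]))] for i in range(len(W))]
    return W, H

def residual(A, W, H):
    WH = matmul(W, H)
    return sum((A[i][j] - WH[i][j]) ** 2
               for i in range(len(A)) for j in range(len(A[0]))) ** 0.5

A = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]   # rank-1 nonnegative test matrix
W = [[0.5], [0.5], [0.5]]                  # k = 1 initial factors
H = [[1.0, 1.0]]
for _ in range(200):
    W, H = mu_step(A, W, H)
```

Because the updates multiply by nonnegative ratios, the iterates stay nonnegative automatically, which is exactly the "conservative step" property noted above.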
BCD with 2k Vector Blocks
Minimize functions of w_i or h_i while all other components in W
and H are fixed:
  ‖A − Σ_{j=1}^k w_j h_j^T‖_F = ‖(A − Σ_{j≠i} w_j h_j^T) − w_i h_i^T‖_F = ‖R^(i) − w_i h_i^T‖_F
  w_i ← arg min_{w_i ≥ 0} ‖R^(i) − w_i h_i^T‖_F
  h_i ← arg min_{h_i ≥ 0} ‖R^(i) − w_i h_i^T‖_F
Each subproblem has the form min_{x ≥ 0} ‖c x^T − G‖_F and
has the closed-form solution x = [G^T c / (c^T c)]_+ !
Hierarchical Alternating Least Squares (HALS) (Cichocki et al., 07, 09)
(actually HA-NLS)
Rank-one Residue Iteration (RRI) (Ho, 08)
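One HALS sweep over the 2k vector blocks can be sketched as follows, using the closed-form solution x = [G^T c / (c^T c)]_+ for each block. Pure Python for illustration (dense lists, no vectorization); the eps guard against a zero denominator is our own addition:

```python
def residual_matrix(A, W, H, skip):
    """R^(i) = A - sum_{j != skip} w_j h_j^T."""
    m, n, k = len(A), len(A[0]), len(W[0])
    return [[A[r][c] - sum(W[r][j] * H[j][c] for j in range(k) if j != skip)
             for c in range(n)] for r in range(m)]

def hals_sweep(A, W, H, eps=1e-12):
    m, n, k = len(A), len(A[0]), len(W[0])
    for i in range(k):
        R = residual_matrix(A, W, H, i)
        # w_i <- [R h_i / (h_i^T h_i)]_+
        hh = sum(H[i][c] ** 2 for c in range(n)) + eps
        for r in range(m):
            W[r][i] = max(0.0, sum(R[r][c] * H[i][c] for c in range(n)) / hh)
        # h_i <- [R^T w_i / (w_i^T w_i)]_+
        ww = sum(W[r][i] ** 2 for r in range(m)) + eps
        for c in range(n):
            H[i][c] = max(0.0, sum(R[r][c] * W[r][i] for r in range(m)) / ww)
    return W, H
```

On a rank-1 nonnegative matrix with k = 1, a single sweep already reaches the optimum, since w ∝ A h is the dominant left direction.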
BCD with Scalar Blocks vs. 2k Vector Blocks
In scalar BCD, w_1j, w_2j, · · · , w_mj can be computed independently.
Also, h_i1, h_i2, · · · , h_in can be computed independently.
→ scalar BCD ⇔ 2k vector BCD in NMF
Successive Rank-1 Deflation in SVD and NMF
Successive rank-1 deflation works for SVD but not for NMF:
  A − σ_1 u_1 v_1^T ≈ σ_2 u_2 v_2^T ?    A − w_1 h_1^T ≈ w_2 h_2^T ?
Example:
  [ 4 6 0 ]   [ 1/√2   1/√2  0 ] [ 10 0 0 ] [  1/√2  1/√2  0 ]
  [ 6 4 0 ] = [ 1/√2  −1/√2  0 ] [  0 2 0 ] [ −1/√2  1/√2  0 ]
  [ 0 0 1 ]   [  0      0    1 ] [  0 0 1 ] [   0     0    1 ]
The best nonnegative approximation obtained as the sum of two successive
rank-1 approximations is
  [ 4 6 0 ]   [ 5 5 0 ]   [ 0 0 0 ]
  [ 6 4 0 ] ≈ [ 5 5 0 ] + [ 0 0 0 ]
  [ 0 0 1 ]   [ 0 0 0 ]   [ 0 0 1 ]
The best rank-2 nonnegative approximation is
       [ 4 6 ]             [ 4 6 0 ]
  WH = [ 6 4 ] [ 1 0 0 ] = [ 6 4 0 ]
       [ 0 0 ] [ 0 1 0 ]   [ 0 0 0 ]
NOTE: 2k vector BCD ≠ successive rank-1 deflation for NMF
BCD with 2 Matrix Blocks
Minimize functions of W or H while the other is fixed:
  W ← arg min_{W ≥ 0} ‖H^T W^T − A^T‖_F
  H ← arg min_{H ≥ 0} ‖W H − A‖_F
Alternating Nonnegativity-constrained Least Squares (ANLS)
No closed-form solution.
  Projected gradient method (Lin, 07)
  Projected quasi-Newton method (D. Kim et al., 07)
  Active-set method (H. Kim and Park, 08)
  Block principal pivoting method (J. Kim and Park, 08)
  ALS (M. Berry et al., 06) ??
NLS: min_{X ≥ 0} ‖CX − B‖_F² = Σ_i min_{x_i ≥ 0} ‖C x_i − b_i‖₂²
Nonnegativity-constrained Least Squares (NLS) problem
Projected gradient method (Lin, 07): x^{(k+1)} ← P_+( x^{(k)} − α_k ∇f(x^{(k)}) )
  * P_+(·): projection operator onto the nonnegative orthant
  * Back-tracking selection of the step α_k
Projected quasi-Newton method (D. Kim et al., 07):
  x^{(k+1)} = [ P_+( y^{(k)} − α D^{(k)} ∇f(y^{(k)}) ) ; 0 ], where x^{(k)} = [ y^{(k)} ; z^{(k)} ]
  * Gradient scaling only for nonzero variables
These do not fully exploit the structure of the NLS problems in NMF.
Active-set method (H. Kim and Park, 08;
Lawson and Hanson, 74; Bro and De Jong, 97; Van Benthem and Keenan, 04)
Block principal pivoting method (J. Kim and Park, 08):
from linear complementarity problems (LCP) (Judice and Pires, 94)
Active-set type Algorithms for
min_{x ≥ 0} ‖Cx − b‖₂, C: m × k
KKT conditions: y = C^T C x − C^T b,
  y ≥ 0, x ≥ 0, x_i y_i = 0, i = 1, · · · , k
If we knew P = {i | x_i > 0} at the solution in advance,
then we would only need to solve min ‖C_P x_P − b‖₂ and set the rest
x_i = 0, where C_P: columns of C with the indices in P.
Active-set type Algorithms for
min_{x ≥ 0} ‖Cx − b‖₂, C: m × k
KKT conditions: y = C^T C x − C^T b,
  y ≥ 0, x ≥ 0, x_i y_i = 0, i = 1, · · · , k
Active set method (Lawson and Hanson, 74):
  E = {1, · · · , k} (i.e., x = 0 initially), P = ∅
  Repeat while E is not empty and y_i < 0 for some i:
    Exchange indices between E and P while keeping feasibility and
    reducing the objective function value
Block principal pivoting method (Portugal et al., 94, Math. Comp.):
lacks monotonicity and feasibility but finds a correct
active-passive set partitioning.
  Guess two index sets P and E that partition {1, · · · , k}
  Repeat:
    Let x_E = 0 and x_P = arg min_{x_P} ‖C_P x_P − b‖₂²
    Then y_E = C_E^T (C_P x_P − b) and y_P = 0
    If x_P ≥ 0 and y_E ≥ 0, then the optimal values are found;
    otherwise, update P and E.
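The block principal pivoting loop above can be sketched in pure Python. This is an illustrative simplification: it always swaps the full infeasible blocks and omits the backup exchange rule discussed later in the talk, so it is not guaranteed to terminate on hard instances. The tiny Gaussian-elimination solver is our own helper, included so the sketch is self-contained:

```python
def solve(M, v):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(v)
    a = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[p] = a[p], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x

def bpp_nls(C, b, max_iter=30):
    """Block principal pivoting sketch for min_{x>=0} ||Cx - b||_2."""
    m, k = len(C), len(C[0])
    CtC = [[sum(C[t][i] * C[t][j] for t in range(m)) for j in range(k)] for i in range(k)]
    Ctb = [sum(C[t][i] * b[t] for t in range(m)) for i in range(k)]
    P = set(range(k))                       # initial guess: all variables passive
    for _ in range(max_iter):
        x, y = [0.0] * k, [0.0] * k
        Pl = sorted(P)
        if Pl:                              # x_P solves C_P^T C_P x_P = C_P^T b
            xP = solve([[CtC[i][j] for j in Pl] for i in Pl], [Ctb[i] for i in Pl])
            for i, v in zip(Pl, xP):
                x[i] = v
        for i in range(k):
            if i not in P:                  # y_E = C_E^T (C_P x_P - b)
                y[i] = sum(CtC[i][j] * x[j] for j in Pl) - Ctb[i]
        bad_P = {i for i in P if x[i] < 0}
        bad_E = {i for i in range(k) if i not in P and y[i] < 0}
        if not bad_P and not bad_E:
            return x                        # KKT conditions hold: optimal
        P = (P - bad_P) | bad_E             # full block exchange
    return x
```

Note that only C^T C and C^T b are touched inside the loop, which is the structural point exploited a few slides later.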
How block principal pivoting works
k = 10; initially P = {1, 2, 3, 4, 5}, E = {6, 7, 8, 9, 10}
Update by C_P^T C_P x_P = C_P^T b and y_E = C_E^T (C_P x_P − b)
[Figure sequence: at each iteration the signs of x_P and y_E are inspected;
indices with x_i < 0 move from P to E, indices with y_i < 0 move from E to P,
and the update is repeated until x_P ≥ 0 and y_E ≥ 0. Solved!]
Refined Exchange Rules
The active set algorithm is a special instance of a single principal
pivoting algorithm (H. Kim and Park, SIMAX 08)
Block exchange rule without modification does not always work.
The residual is not guaranteed to monotonically decrease.
Block exchange rule may cycle (although rarely).
Modification: if the block exchange rule fails to decrease the
number of infeasible variables, use a backup exchange rule
With this modification, block principal pivoting algorithm finds the
solution of NLS in a finite number of iterations.
Structure of NLS problems in NMF (J. Kim and Park, 08)
The matrix is long and thin, the solution vectors are short, and there are
many right-hand-side vectors:
  min_{H ≥ 0} ‖WH − A‖_F²
  min_{W ≥ 0} ‖H^T W^T − A^T‖_F²
Efficient Algorithm for min_{X ≥ 0} ‖CX − B‖_F²
(J. Kim and Park, 08)
Precompute C^T C and C^T B.
Update x_P and y_E by C_P^T C_P x_P = C_P^T b and y_E = C_E^T C_P x_P − C_E^T b.
  All coefficients can be retrieved from C^T C and C^T B.
  C^T C and C^T B are small; storage is not a problem.
Exploit common P and E sets among the columns of B in each iteration.
  X is flat and wide → more common cases of P and E sets.
Proposed algorithm for NMF (ANLS/BPP):
  ANLS framework + block principal pivoting algorithm for NLS
  with improvements for multiple right-hand sides
Sparse NMF and Regularized NMF
Sparse NMF (for sparse H) (H. Kim and Park, Bioinformatics, 07):
  min_{W,H} ‖A − WH‖_F² + η ‖W‖_F² + β Σ_{j=1}^n ‖H(:, j)‖₁²,  W_ij, H_ij ≥ 0 ∀ij
ANLS reformulation (H. Kim and Park, 07): alternate the following:
  min_{H ≥ 0} ‖ [ W ; √β e_{1×k} ] H − [ A ; 0_{1×n} ] ‖_F²
  min_{W ≥ 0} ‖ [ H^T ; √η I_k ] W^T − [ A^T ; 0_{k×m} ] ‖_F²
Regularized NMF (Pauca et al., 06):
  min_{W,H} ‖A − WH‖_F² + η ‖W‖_F² + β ‖H‖_F²,  W_ij, H_ij ≥ 0 ∀ij
ANLS reformulation: alternate the following:
  min_{H ≥ 0} ‖ [ W ; √β I_k ] H − [ A ; 0_{k×n} ] ‖_F²
  min_{W ≥ 0} ‖ [ H^T ; √η I_k ] W^T − [ A^T ; 0_{k×m} ] ‖_F²
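The sparse-NMF reformulation folds the l1 penalty into an ordinary NLS problem by stacking √β times a row of ones under W and a zero row under A. A pure-Python sketch of that construction (the function name is ours; `beta` is the sparsity parameter from the formulation above):

```python
import math

def augment_for_sparse_H(W, A, beta):
    """Build [W; sqrt(beta) e_{1xk}] and [A; 0_{1xn}] for the sparse-H subproblem."""
    k, n = len(W[0]), len(A[0])
    W_aug = [row[:] for row in W] + [[math.sqrt(beta)] * k]
    A_aug = [row[:] for row in A] + [[0.0] * n]
    return W_aug, A_aug
```

For any h ≥ 0, ‖W_aug h − a_aug‖₂² = ‖W h − a‖₂² + β (Σ_i h_i)² = ‖W h − a‖₂² + β ‖h‖₁², which is exactly the penalized column subproblem.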
Nonnegative PARAFAC
Consider a 3-way nonnegative tensor T ∈ R_+^{m×n×p} and
its PARAFAC approximation min_{A,B,C ≥ 0} ‖T − [[A, B, C]]‖_F²,
where A ∈ R_+^{m×k}, B ∈ R_+^{n×k}, C ∈ R_+^{p×k}.
The loading matrices (A, B, and C) can be iteratively estimated
by an NLS algorithm such as the block principal pivoting method.
Nonnegative PARAFAC (J. Kim and Park, in preparation)
Iterate until a stopping criterion is satisfied:
  min_{A ≥ 0} ‖Y_{BC} A^T − T_{(1)}‖_F
  min_{B ≥ 0} ‖Y_{AC} B^T − T_{(2)}‖_F
  min_{C ≥ 0} ‖Y_{AB} C^T − T_{(3)}‖_F
where
  Y_{BC} = B ⊙ C ∈ R^{(np)×k}, T_{(1)} ∈ R^{(np)×m},
  Y_{AC} = A ⊙ C ∈ R^{(mp)×k}, T_{(2)} ∈ R^{(mp)×n},
  Y_{AB} = A ⊙ B ∈ R^{(mn)×k}, T_{(3)} ∈ R^{(mn)×p} are unfolded matrices,
and F ⊙ G = [f_1 ⊗ g_1  f_2 ⊗ g_2 · · · f_k ⊗ g_k] ∈ R^{(mn)×k} is the
Khatri-Rao product of F ∈ R^{m×k} and G ∈ R^{n×k}.
Matrices are longer and thinner, ideal for ANLS/BPP.
Can be similarly extended to higher-order tensors.
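The Khatri-Rao product used above is just a columnwise Kronecker product, and fits in a few lines of pure Python (list-of-lists matrices, for illustration):

```python
def khatri_rao(F, G):
    """Columnwise Kronecker product: column j of the result is f_j (x) g_j,
    so for F (m x k) and G (n x k) the result is (mn) x k."""
    m, n, k = len(F), len(G), len(F[0])
    return [[F[i][j] * G[t][j] for j in range(k)]
            for i in range(m) for t in range(n)]
```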
Experimental Results (NMF) (J. Kim and Park, 2011)
NMF Algorithms Compared
  Name         Description                        Author
  ANLS-BPP     ANLS / block principal pivoting    J. Kim and HP 08
  ANLS-AS      ANLS / active set                  H. Kim and HP 08
  ANLS-PGRAD   ANLS / projected gradient          Lin 07
  ANLS-PQN     ANLS / projected quasi-Newton      D. Kim et al. 07
  HALS         Hierarchical ALS                   Cichocki et al. 07
  MU           Multiplicative updating            Lee and Seung 01
  ALS          Alternating least squares          Berry et al. 06
Active-set vs. Block principal pivoting (J. Kim and Park, 2011)
[Figure: elapsed time vs. iteration for ANLS-BPP, ANLS-AS-UPDATE, and
ANLS-AS-GROUP on ATNT (k = 10) and TDT2 (k = 160);
top: time per iteration, bottom: cumulative time.]
ATNT image data: 10,304 × 400, k = 10; TDT2 text data: 19,009 × 3,087, k = 160
Residual vs. Execution time (J. Kim and Park, 2011)
[Figure: relative objective value vs. time (sec) for HALS, MU, ALS,
ANLS-PGRAD, ANLS-PQN, and ANLS-BPP on TDT2 with k = 10 and k = 160.]
TDT2 text data: 19,009 × 3,087, k = 10 and k = 160
Residual vs. Execution time (J. Kim and Park, 2011)
[Figure: relative objective value vs. time (sec) for HALS, MU, ALS,
ANLS-PGRAD, ANLS-PQN, and ANLS-BPP on ATNT (k = 10) and
20 Newsgroups (k = 160).]
ATNT image data: 10,304 × 400, k = 10, and
20 Newsgroups text data: 26,214 × 11,314, k = 160
Residual vs. Execution time (J. Kim and Park, 2011)
[Figure: relative objective value vs. time (sec) for HALS, MU, ALS,
ANLS-PGRAD, ANLS-PQN, and ANLS-BPP on PIE 64 with k = 80 and k = 160.]
PIE 64 image data: 4,096 × 11,554, k = 80 and k = 160
BPP vs HALS: Influence of Sparsity (J. Kim and Park, 2011)
[Figure: relative objective value vs. time (sec) for HALS and ANLS-BPP,
and proportion of elements (W sparsity, H sparsity, W change, H change)
vs. iteration, on syn-90 and syn-95 with k = 160.]
Synthetic data 10,000 × 2,000 created by factors with different sparsities.
Left: 90% sparsity, right: 95% sparsity
Adaptive NMF for Varying Reduced Rank k → k̃
(He, Kim, Cichocki, and Park, in preparation)
Given (W , H) with k , how to compute (W̃ , H̃) with k̃ fast?
E.g., model selection for NMF clustering
AdaNMF
Initialize W̃ and H̃ using W and H
If k̃ > k , compute NMF for A − WH ≈ ∆W ∆H. Set W̃ = [W ∆W ]
and H̃ = [H; ∆H]
If k̃ < k, initialize W̃ and H̃ with the k̃ pairs (w_i, h_i) having the
largest ‖w_i h_i^T‖_F = ‖w_i‖₂ ‖h_i‖₂
Update W̃ and H̃ using HALS algorithm.
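The rank-shrinking initialization above (keep the k̃ pairs with largest ‖w_i‖₂‖h_i‖₂) can be sketched in pure Python; the function name is our own, and W is stored with columns w_i, H with rows h_i, as on the slides:

```python
def shrink_rank(W, H, k_new):
    """Keep the k_new (w_i, h_i) pairs with largest ||w_i||_2 * ||h_i||_2."""
    k = len(W[0])
    norm = lambda v: sum(x * x for x in v) ** 0.5
    score = [(norm([row[i] for row in W]) * norm(H[i]), i) for i in range(k)]
    keep = sorted(i for _, i in sorted(score, reverse=True)[:k_new])
    W_new = [[row[i] for i in keep] for row in W]   # selected columns of W
    H_new = [H[i] for i in keep]                    # selected rows of H
    return W_new, H_new
```

The selected factors would then be refined with HALS, as described in the AdaNMF steps.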
Model Selection in NMF Clustering (He, Kim, Cichocki, and Park, in preparation)
Consensus matrix based on A ≈ WH, over runs t = 1, . . . , l:
  C_ij^t = 1 if arg max H(:, i) = arg max H(:, j), and 0 otherwise;
  C̄ = (1/l) Σ_t C^t
Dispersion coefficient: ρ(k) = (1/n²) Σ_{i=1}^n Σ_{j=1}^n 4 (C̄_ij − 1/2)²
[Figure: reordered consensus matrices for k = 3, 4, 5, 6; dispersion
coefficient vs. approximation rank k; execution time (seconds) for AdaNMF,
Recompute, and Warm-restart.]
Clustering results on MNIST digit images (784 × 2000) by AdaNMF
with k = 3, 4, 5 and 6: averaged consensus matrices, dispersion
coefficient, execution time
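The model-selection quantities above can be sketched in pure Python: build a consensus matrix from hard assignments (arg max over each column of H) and compute the dispersion coefficient. In the talk the consensus matrix is averaged over l runs before ρ is computed; this single-run sketch uses our own function names:

```python
def consensus(H):
    """C_ij = 1 if columns i and j of H have the same argmax (same cluster)."""
    n = len(H[0])
    labels = [max(range(len(H)), key=lambda r: H[r][j]) for j in range(n)]
    return [[1.0 if labels[i] == labels[j] else 0.0 for j in range(n)]
            for i in range(n)]

def dispersion(C):
    """rho = (1/n^2) * sum_ij 4 * (C_ij - 1/2)^2; equals 1 for a crisp clustering."""
    n = len(C)
    return sum(4.0 * (C[i][j] - 0.5) ** 2 for i in range(n) for j in range(n)) / n ** 2
```

A perfectly stable clustering over all runs yields an averaged C of 0/1 entries and ρ = 1; instability pushes entries toward 1/2 and ρ toward 0.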
Adaptive NMF for Varying Reduced Rank
(He, Kim, Cichocki, and Park, in preparation)
[Figure: relative error vs. execution time for Recompute, Warm-restart,
and AdaNMF.]
Relative error vs. execution time of AdaNMF and "recompute". Given an
NMF of a 600 × 600 synthetic matrix with k = 60, compute the NMF with
k̃ = 50, 80.
Adaptive NMF for Varying Reduced Rank
(He, Kim, Cichocki, and Park, in preparation)
Theorem: For A ∈ R_+^{m×n}, if rank_+(A) > k, then
  min ‖A − W^{(k+1)} H^{(k+1)}‖_F < min ‖A − W^{(k)} H^{(k)}‖_F,
where W^{(i)} ∈ R_+^{m×i} and H^{(i)} ∈ R_+^{i×n}.
[Figure: relative objective function value of NMF vs. approximation rank k;
classification error on training and testing sets vs. reduced rank k,
for ranks 20, 40, 60, 80.]
Rank path on synthetic data set: relative residual vs. k.
ORL Face image (10,304 × 400) classification errors (by LMNN) on training
and testing sets vs. k.
k-dimensional representation H_T of training data T computed by BPP:
min_{H_T ≥ 0} ‖W H_T − T‖_F
NMF for Dynamic Data (DynNMF) (He, Kim, Cichocki, and Park, in preparation)
Given an NMF (W, H) for A = [δA Â], how can we compute an NMF
(W̃, H̃) for à = [Â ∆A] fast?
(δA: obsolete columns; ∆A: new columns. Updating and downdating.)
DynNMF (Sliding-Window NMF)
Initialize H̃ as follows:
  Let Ĥ be the remaining columns of H.
  Solve min_{∆H ≥ 0} ‖W ∆H − ∆A‖_F² using block principal pivoting.
  Set H̃ = [Ĥ ∆H].
Run HALS on à with initial factors W̃ = W and H̃.
DynNMF for Dynamic Data (He, Kim, Cichocki, and Park, in preparation)
PET2001 data with 3,064 images from a surveillance video.
DynNMF operates on a 110,592 × 400 data matrix at each step, with 100 new
columns and 100 obsolete columns. The residual images track the
moving vehicle in the video.
NMF as a Clustering Method (Kuang and Park, in preparation)
Clustering and lower-rank approximation are related.
NMF for clustering: document (Xu et al., 03), image (Cai et al., 08), microarray (Kim & Park, 07), etc.
Equivalence of objective functions between k-means and NMF
(Ding et al., 05; Kim & Park, 08):
  min Σ_{i=1}^n ‖a_i − w_{S_i}‖₂² = min ‖A − WH‖_F²
where S_i = j when the i-th point is assigned to the j-th cluster (j ∈ {1, · · · , k}).
k-means: W holds the k cluster centroids, H ∈ E
NMF: W holds basis vectors for the rank-k approximation;
H is the representation of A in the space spanned by W
(E: matrices whose columns are columns of an identity matrix)
NOTE: The equivalence of objective functions holds when H ∈ E and A ≥ 0.
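The objective equivalence above is easy to verify numerically: when H is a cluster-indicator matrix (each column a column of the identity), the k-means objective Σ_i ‖a_i − w_{S_i}‖₂² coincides with ‖A − WH‖_F². A pure-Python check on a hypothetical toy assignment (all names and data are ours, for illustration):

```python
def kmeans_objective(A, W, S):
    """sum_i ||a_i - w_{S_i}||_2^2 with points and centroids stored as columns."""
    m, n = len(A), len(A[0])
    return sum((A[r][i] - W[r][S[i]]) ** 2 for i in range(n) for r in range(m))

def nmf_objective(A, W, H):
    """||A - W H||_F^2 for list-of-lists matrices."""
    m, n, k = len(A), len(A[0]), len(W[0])
    return sum((A[r][i] - sum(W[r][j] * H[j][i] for j in range(k))) ** 2
               for i in range(n) for r in range(m))

A = [[1.0, 2.0, 9.0], [0.0, 1.0, 8.0]]   # three 2-d points as columns
W = [[1.5, 9.0], [0.5, 8.0]]             # two centroids as columns
S = [0, 0, 1]                            # cluster assignment of each point
H = [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]   # indicator matrix encoding S
```

Relaxing the indicator constraint on H to H ≥ 0 is what turns this into the NMF "soft clustering" discussed next.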
NMF and K-means
  min ‖A − WH‖_F² s.t. H ∈ E
Paths to solution:
  K-means: Expectation-Maximization-style alternation
  NMF: relax the condition on H to H ≥ 0 with orthogonal rows, or
  H ≥ 0 with sparse columns (soft clustering)
TDT2 text data set (clustering accuracy averaged over 100 runs):
  # clusters   2        6        10       14       18
  K-means      0.8099   0.7295   0.7015   0.6675   0.6675
  NMF/ANLS     0.9990   0.8717   0.7436   0.7021   0.7160
Sparsity constraints improve the clustering result (J. Kim and Park, 08):
  min_{W ≥ 0, H ≥ 0} ‖A − WH‖_F² + η ‖W‖_F² + β Σ_{j=1}^n ‖H(:, j)‖₁²
Number of times the optimal assignment was achieved
(a synthetic data set with a clear cluster structure):
  k      3     6     9     12    15
  NMF    69    65    74    68    44
  SNMF   100   100   100   100   97
NMF and SNMF are much better than k-means in general.
NMF, K-means, and Spectral Clustering (Kuang and Park, in preparation)
Equivalence of objective functions is not enough to explain the
clustering capability of NMF:
NMF is more closely related to spherical k-means than to k-means
→ NMF shown to work well in text data clustering
Symmetric NMF: min_{S ≥ 0} ‖A − S S^T‖_F, A ∈ R_+^{n×n}: affinity matrix
Spectral clustering → eigenvectors (Ng et al., 01), A normalized if needed, Laplacian, ...
Symmetric NMF (Ding et al.) → can handle nonlinear structure, and
S ≥ 0 naturally captures a cluster structure in S
Summary/Discussions
Overview of NMF with Frobenius norm and algorithms
Fast algorithms and convergence via BCD framework
Adaptive NMF algorithms
Variations/Extensions of NMF : nonnegative PARAFAC and
sparse NMF
NMF for clustering
Extensive computational comparisons
NMF for clustering and semi-supervised clustering
NMF and probability related methods
NMF and geometric understanding
NMF algorithms for large scale problems, parallel
implementation? GPU?
Fast NMF with other divergences (Bregman and Csiszar
divergences)
NMF for blind source separation? Uniqueness?
More theoretical study on NMF especially for foundations for
computational methods
NMF Matlab codes and papers available at
http://www.cc.gatech.edu/~hpark and
http://www.cc.gatech.edu/~jingu
Collaborators
Today's talk:
  Jingu Kim, CSE, Georgia Tech
  Yunlong He, Math, Georgia Tech
  Da Kuang, CSE, Georgia Tech
  Krishnakumar Balasubramanian, CSE, Georgia Tech
  Prof. Michael Berry, EECS, Univ. of Tennessee
  Prof. Moody Chu, Math, North Carolina State Univ.
  Dr. Andrzej Cichocki, Brain Science Institute, RIKEN, Japan
  Prof. Chris Ding, CSE, UT Arlington
  Prof. Lars Elden, Math, Linkoping Univ., Sweden
  Dr. Mariya Ishteva, CSE, Georgia Tech
  Dr. Hyunsoo Kim, Wistar Inst.
  Anoop Korattikara, CS, UC Irvine
  Prof. Guy Lebanon, CSE, Georgia Tech
  Liangda Li, CSE, Georgia Tech
  Prof. Tao Li, CS, Florida International Univ.
  Prof. Robert Plemmons, CS, Wake Forest Univ.
  Andrey Puretskiy, EECS, Univ. of Tennessee
  Prof. Max Welling, CS, UC Irvine
  Dr. Stan Young, NISS
Thank you!
Related Papers by H. Park’s Group
H. Kim and H. Park, Sparse Non-negative Matrix Factorizations via
Alternating Non-negativity-constrained Least Squares for Microarray
Data Analysis. Bioinformatics, 23(12):1495-1502, 2007.
H. Kim, H. Park, and L. Eldén. Non-negative Tensor Factorization
Based on Alternating Large-scale Non-negativity-constrained Least
Squares. Proc. of IEEE 7th International Conference on Bioinformatics
and Bioengineering (BIBE), pp. 1147-1151, 2007.
H. Kim and H. Park, Nonnegative Matrix Factorization Based on
Alternating Non-negativity-constrained Least Squares and the Active
Set Method. SIAM Journal on Matrix Analysis and Applications,
30(2):713-730, 2008.
J. Kim and H. Park, Sparse Nonnegative Matrix Factorization for
Clustering, Georgia Tech Technical Report GT-CSE-08-01, 2008.
J. Kim and H. Park. Toward Faster Nonnegative Matrix Factorization: A
New Algorithm and Comparisons. Proc. of the 8th IEEE International
Conference on Data Mining (ICDM), pp. 353-362, 2008.
B. Drake, J. Kim, M. Mallick, and H. Park, Supervised Raman Spectra
Estimation Based on Nonnegative Rank Deficient Least Squares. In
Proceedings of the 13th International Conference on Information
Fusion, Edinburgh, UK, 2010.
Related Papers by H. Park’s Group
A. Korattikara, L. Boyles, M. Welling, J. Kim, and H. Park, Statistical
Optimization of Non-Negative Matrix Factorization. Proc. of The
Fourteenth International Conference on Artificial Intelligence and
Statistics, JMLR: W&CP 15, 2011.
J. Kim and H. Park, Fast Nonnegative Matrix Factorization: an
Active-set-like Method and Comparisons, Submitted for review, 2011.
J. Kim and H. Park, Fast Nonnegative Tensor Factorization with an
Active-set-like Method, In High-Performance Scientific Computing:
Algorithms and Applications, Springer, in preparation.
Y. He, J. Kim, A. Cichocki, and H. Park, Fast Adaptive NMF Algorithms
for Varying Reduced Rank and Dynamic Data, in preparation.
L. Li, G. Lebanon, and H. Park, Fast Algorithm for Non-Negative Matrix
Factorization with Bregman and Csiszar Divergences, in preparation.
D. Kuang and H. Park, Nonnegative Matrix Factorization for Spherical
and Spectral Clustering, in preparation.