IEEE TRANSACTIONS ON IMAGE PROCESSING, 2010
Projective Nonnegative Graph Embedding
Xiaobai Liu^{1,2}, Shuicheng Yan^{2}, Hai Jin^{1}
^{1} Huazhong University of Science and Technology, China
^{2} National University of Singapore, Singapore
{elelxb, eleyans}@nus.edu.sg, [email protected]
Abstract—We present in this paper a general formulation
for nonnegative data factorization, called projective nonnegative graph embedding (PNGE), which 1) explicitly decomposes the data into two nonnegative components favoring the
characteristics encoded by the so-called intrinsic and penalty
graphs [31], respectively, and 2) explicitly describes how to
transform each new testing sample into its low-dimensional
nonnegative representation. In the past, such a nonnegative
decomposition was often obtained for the training samples only,
e.g., nonnegative matrix factorization (NMF) and its variants,
nonnegative graph embedding (NGE) and its refined version
multiplicative nonnegative graph embedding (MNGE). Those conventional approaches for out-of-sample extension either suffer from high computational cost or violate the basic nonnegativity assumption. In this work, PNGE offers a unified solution to the out-of-sample extension problem: the nonnegative coefficient vector of each datum is assumed to be projected from its original feature representation with a universal nonnegative transformation matrix. A multiplicative nonnegative updating rule with provable convergence is then derived to learn the basis matrix and the transformation matrix. Extensive experiments in comparison with state-of-the-art algorithms for nonnegative data factorization demonstrate the algorithmic properties in convergence, sparsity, and classification power.
Index Terms—Graph Embedding; Nonnegative Matrix Factorization; Out-of-Sample; Face Recognition.
I. INTRODUCTION
Nonnegative matrix factorization (NMF) [9] has proved effective in searching for a set of nonnegative bases to characterize nonnegative multivariate data. NMF decomposes the
data matrix as the product of two matrices that possess only
nonnegative elements. Generally NMF results in a dimension-reduced representation of the original data, and thus belongs
to the techniques for feature extraction and dimensionality
reduction. More importantly, NMF can also be interpreted as
a part-based representation of the data because only additive,
not subtractive, combinations are allowed. Such a property
is achieved by the nonnegativity constraints imposed on both
basis matrix and coefficient matrix, unlike other data factorization methods, such as principal component analysis (PCA) [26] and independent component analysis (ICA) [10], which generally learn holistic bases from the data.
NMF is related to a number of previous works on matrix
factorization, such as positive matrix factorization [20] and
various minimum-volume transforms used in the analysis
*This work was performed when Xiaobai Liu was a research engineer at
National University of Singapore. The research is jointly supported by the
AcRF Tier-1 Grant of R-263-000-464-112, Singapore, and the National High
Technology Research and Development Program of China (863 Program)
under grant No.2006AA01A115.
of sensing data. Recently, there have also been some efforts to
extend and apply the method in diverse fields of science,
such as biomedical applications [15], [12], face and object
recognition [7], [22], color science [2], [21], polyphonic music
transcription [23], and so on. The work of Lee and Seung [18]
brought much attention to NMF in the machine learning and data mining communities. Li et al. [16] imposed extra constraints to obtain localized and part-based decompositions by extending
the standard NMF. Hazan et al. [11], [25] proposed to use
nonnegative tensor factorization for handling the data encoded
as high-order tensors. Wang et al. proposed the Fisher-NMF
[29], which was further studied by Kotsia et al. [14], by
adding an extra term of scatter difference to the objective
function of NMF. Recently, Yang et al. presented a general
solution, called nonnegative graph embedding (NGE) [32], for
nonnegative data factorization by integrating the characteristics
of both intrinsic and penalty graphs [31], and this work was
further refined in [27] with an efficient multiplicative updating
rule.
Despite the wide study and applications of NMF and its
variants, the fundamental out-of-sample extension problem,
namely, how to obtain the encoding coefficient vector for the
testing datum, has not been well explored. There are three
classical ways for computing such coefficients. The first way
is to directly project the new testing samples on the subspace
spanned by the columns of the basis matrix which is factorized
from the training samples. The second way is to reconstruct
the testing sample with the training samples and then use the
derived reconstruction coefficients to combine the encoding
coefficient vectors of the training samples. In the third way,
given the feature basis matrix learnt from training samples,
one can use the updating rule for coefficient matrix in NMF
to optimize the objective function defined over testing samples.
These three methods, however, are far from satisfactory, since the first and second ones may violate the basic nonnegativity assumption for the encoding coefficients, while the third one is generally time-consuming and is ineffective for NGE-like algorithms, which have extra terms for discriminating power in the objective function.
This work is dedicated to providing a general out-of-sample
extensible formulation for nonnegative data factorization, targeting two characteristics: 1) the explicit projection from the original data representation to the nonnegative dimension-reduced feature space spanned by the column vectors of the
basis matrix, and 2) inheriting the unifying capability of the
graph embedding [31] for most conventional dimensionality
reduction algorithms. Similar to the NMF framework, our
objective function involves two nonnegative parameter matrices. One is the basis matrix, and the other is the projection
matrix to transform each datum from the original feature
representation to the desired low-dimensional feature space.
Therefore, the expectations of this formulation are two folds.
The first is to derive sparse, part-based and localized decomposition of images within the context of image analysis. The
second is to learn the transformation matrix, which can be
directly used to obtain the encoding coefficient vectors of the
new testing samples. Compared with conventional ways for
out-of-sample extension, the proposed formulation can satisfy
the basic assumption on the nonnegativity of the encoding
coefficients, which is originally imposed by NMF. In addition,
we also present an efficient updating procedure to optimize
the basis matrix and transformation matrix iteratively, which
is multiplicative and without time-consuming operations like
matrix inverse in [31]. The correctness and convergency of the
proposed procedure are theoretically provable.
This paper is organized as follows. In Section II, we reformulate the nonnegative data factorization problem with the
consideration of both out-of-sample extension and utilization
of possible label information. The nonnegative multiplicative updating rule along with the algorithmic convergence proof is given in Section III. Analysis and discussion on
related works are presented in Section IV and the experiments
with comparisons over several public face datasets are reported
in Section V. We conclude this paper by discussing the future
work in Section VI.
II. FORMULATION FOR PROJECTIVE NONNEGATIVE GRAPH EMBEDDING (PNGE)
In this section, we formulate the problem of multi-variate
data factorization within the framework of nonnegative data
decomposition. Let X = [x1 , x2 , . . . , xN ] denote the data
sample set, in which xi ∈ Rm denotes the feature descriptor
of the ith sample and N is the total sample number. Here,
we assume that the matrix X is nonnegative. Letting k denote
the dimension of the desired dimension-reduced feature space,
the task of data factorization is to derive a nonnegative basis
matrix W = [w1 , w2 , · · · , wk ] ∈ Rm×k and a nonnegative
encoding coefficient matrix H = [h1 , h2 , · · · , hN ] ∈ Rk×N
such that the data matrix X can be approximated as the
product of these two matrices. Note that we utilize in this work
the following rule to facilitate presentation: for any matrix A,
Ai means the ith row vector of A, its corresponding lowercase
version ai means the ith column vector of A, and Aij denotes
the element of A at the ith row and jth column.
A. Motivations
In this work, we re-investigate the NMF framework from
two aspects: 1) how to efficiently obtain the encoding coefficient vectors of new testing samples, and 2) how to perform
nonnegative data factorization targeting both reconstruction power and other specific purposes such as discriminating power.
For the first question, letting y denote the original feature
vector of the new testing sample and hy denote the encoding
coefficient vector to derive, intuitively there may exist three
different solutions.
Method-I: basis matrix reconstruction based. Each testing sample y is projected into the linear space spanned by the column vectors of the basis matrix W, namely,
$$h_{y} \approx W^{\dagger} y, \qquad (1)$$
where W † indicates the pseudo-inverse of matrix W . Although
this method has been widely used by a number of previous
works [27], [16], [19], it is however not able to guarantee the
non-negativity of hy , which thus violates the basic assumption
of nonnegative data factorization research.
Method-II: image reconstruction based. First, the datum
y is reconstructed with the training samples, and then we
obtain hy by combining the encoding coefficient vectors
of the training samples based on the derived reconstruction
coefficients. The reconstruction process can, but is not limited to, be formulated as an $\ell_2$-norm optimization problem, defined as
$$\tilde{\alpha} = \arg\min_{\alpha} \|y - X\alpha\|_{2}^{2}. \qquad (2)$$
The above objective is convex with respect to the variable $\alpha$ and thus there exists a globally optimal solution, namely $\tilde{\alpha} = (X^{T}X)^{\dagger}X^{T}y$. Then, we can calculate $h_y$ by using the reconstruction coefficients,
$$h_{y} = H\tilde{\alpha}. \qquad (3)$$
Clearly, this method cannot ensure the nonnegativity of hy ,
either.
Method-III: update rule based. Intuitively the encoding
coefficient vector for a new testing sample y can be obtained
by using the same updating rule as in NMF, by assuming that
W is fixed. Formally, given the learnt basis matrix W , hy can
be obtained as
$$h_{y} = \arg\min_{h} \|y - Wh\|^{2}, \quad \text{s.t. } h \geq 0. \qquad (4)$$
Compared to the previous two methods, the above optimization
procedure is able to obtain a nonnegative encoding coefficient
vector for the testing sample. This method is, however, not
effective for NGE-like algorithms, which may contain in the
objective function the additional regularization terms. Another
underlying issue is the possibly high computational cost in the
updating process, especially for large-scale applications.
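To make the contrast concrete, a minimal NumPy sketch of these three conventional strategies is given below. The function names, the fixed iteration count and the random initialization in Method-III are illustrative assumptions rather than part of any of the cited formulations.

```python
import numpy as np

def oos_method1(W, y):
    """Method-I: project y onto the span of the basis W via the pseudo-inverse.
    The result may contain negative entries."""
    return np.linalg.pinv(W) @ y

def oos_method2(X_train, H, y):
    """Method-II: reconstruct y from the training samples in the least-squares sense,
    then combine their encoding vectors with the reconstruction coefficients.
    Again, nonnegativity of the result is not guaranteed."""
    alpha, *_ = np.linalg.lstsq(X_train, y, rcond=None)
    return H @ alpha

def oos_method3(W, y, n_iter=200, eps=1e-9):
    """Method-III: keep W fixed and run the standard NMF multiplicative update on h,
    which preserves nonnegativity but needs an iterative loop per test sample."""
    h = np.random.rand(W.shape[1])
    for _ in range(n_iter):
        h *= (W.T @ y) / (W.T @ W @ h + eps)
    return h
```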
In this work, we present a new solution to the out-of-sample extension problem towards three characteristics: 1)
low computational cost in deriving the encoding coefficient
vector for the testing sample; 2) ensuring the nonnegativity
of the encoding coefficient vector of the testing sample; and
3) being applicable to general nonnegative data factorization
problems towards different purposes. In this solution, the
encoding coefficient vector is assumed to be obtained by
transforming the original feature vector with a nonnegative
projection matrix. For the third property, we inherit our
previous work on multiplicative nonnegative graph embedding
framework [27] to integrate the purpose of data reconstruction
and other specific purpose like discriminating power into a
unified objective function. Then these two parts are combined
to form a new general formulation for out-of-sample extensible
nonnegative data factorization.
B. Objective for Projective nonnegative Data Factorization
The basic idea of PNGE is to impose the extra constraint
that the coefficient of each data point lies within the subspace
spanned by the column vectors of one projection matrix.
Formally, letting H = [h1 , h2 , . . . , hN ] denote the encoding
coefficient matrix, we assume that the coefficient vector hi
is obtained by linear transformation from the sample xi as
follows,
$$h_{i} = Px_{i} \quad \text{or} \quad H = PX, \qquad (5)$$
where P ∈ Rk×m denotes the projection matrix to transform an m-dimensional feature vector into a k-dimensional
feature space. Then the objective function for nonnegative data
factorization can be defined as,
$$\arg\min_{W,P} \|X - WPX\|^{2}, \quad \text{s.t. } W, P \geq 0. \qquad (6)$$
After obtaining the projection matrix P , for each new testing
sample y, we can easily obtain the corresponding encoding
coefficient vector hy ,
hy = P y.
(7)
This process is computationally efficient, and the obtained
hy is naturally nonnegative, which coincides with the basic
assumption on nonnegativity for NMF.
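The following short sketch illustrates this projective decomposition with randomly generated nonnegative data standing in for real samples; the dimensions and variable names are arbitrary choices, and W and P are shown unlearnt only to make the shapes of Eqns. (5)-(7) explicit.

```python
import numpy as np

rng = np.random.default_rng(0)
m, N, k = 1024, 200, 100        # feature dim, number of training samples, reduced dim
X = rng.random((m, N))          # nonnegative training data, columns are samples
W = rng.random((m, k))          # nonnegative basis matrix (to be learnt)
P = rng.random((k, m)) * 1e-3   # nonnegative projection matrix (to be learnt)

H = P @ X                                    # encoding coefficients, Eqn. (5)
recon_err = np.linalg.norm(X - W @ H) ** 2   # objective of Eqn. (6)

y = rng.random(m)               # a new testing sample
h_y = P @ y                     # out-of-sample encoding, Eqn. (7); nonnegative by construction
```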
C. Objective for Graph Embedding
The criterion guiding NMF is to minimize the data reconstruction error, which is however often not the ultimate purpose for many tasks. As stated in [31], most conventional algorithms for feature extraction can be unified within a framework called graph embedding [31], and the specific purpose of a given algorithm is characterized by the so-called intrinsic and penalty graphs. By following the work of multiplicative nonnegative graph embedding [27], we formulate PNGE within the general graph embedding framework. More specifically, we divide the coefficient matrix H into two parts,
namely,
$$H = \begin{bmatrix} \hat{H} \\ \tilde{H} \end{bmatrix} = PX = \begin{bmatrix} \hat{P} \\ \tilde{P} \end{bmatrix} X, \qquad (8)$$
where Ĥ = [ĥ1, ĥ2, . . . , ĥN] ∈ R^{q×N}, which serves the specific purpose of graph embedding, and H̃ =
[h̃1 , h̃2 , . . . , h̃N ] ∈ R(k−q)×N , which contains the additional
information for data reconstruction. Note that Ĥ is expected to
be good for the purpose of graph embedding while the whole
H is used for data reconstruction purpose. Hence, the targets
of nonnegative data reconstruction and the purpose of graph
embedding coexist harmoniously and do not mutually compromise as in conventional formulations with two objectives.
Accordingly, the basis matrix W is also divided into two parts,
W = [Ŵ , W̃ ],
(9)
where Ŵ ∈ Rm×q and W̃ ∈ Rm×(k−q) . We consider W̃ as
the complementary space of Ŵ .
Let G = {X, S} be an undirected weighted graph with vertex set X and real symmetric weight matrix S ∈ R^{N×N}. The matrix S measures the similarity between data pairs and is assumed to be nonnegative. Its Laplacian matrix L and the corresponding diagonal matrix D are defined as
$$L = D - S, \quad D_{ii} = \sum_{j \neq i} S_{ij}, \ \forall i. \qquad (10)$$
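As a concrete illustration of Eqn. (10), the sketch below builds S, D and L for a column-sample matrix; the Gaussian-kernel similarity is only an assumed example of a nonnegative symmetric weight matrix, since the actual S depends on the graph embedding algorithm being instantiated.

```python
import numpy as np

def graph_laplacian(X, sigma=1.0):
    """Build S, D and L = D - S of Eqn. (10) for a column-sample matrix X (m x N).
    A Gaussian kernel is used only as an example of a nonnegative symmetric S."""
    d2 = np.sum((X[:, :, None] - X[:, None, :]) ** 2, axis=0)  # pairwise squared distances
    S = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(S, 0.0)                 # exclude self-similarity (sum over j != i)
    D = np.diag(S.sum(axis=1))               # D_ii = sum_{j != i} S_ij
    return S, D, D - S
```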
Graph embedding generally involves an intrinsic graph G, which characterizes the favorable relationships among the data samples, and a penalty graph G^p = {X, S^p}, which characterizes the unfavorable relationships among the data. As defined in Eqn. (10), for graph G^p, the Laplacian matrix is defined as L^p = D^p − S^p, where D^p is the corresponding diagonal matrix. Thus, we have two factorization targets to preserve the graph properties, defined as
$$\begin{cases} \max_{\hat{H}} \sum_{i \neq j} \|\hat{h}_{i} - \hat{h}_{j}\|^{2} S^{p}_{ij}, \\ \min_{\hat{H}} \sum_{i \neq j} \|\hat{h}_{i} - \hat{h}_{j}\|^{2} S_{ij}, \end{cases} \qquad (11)$$
where ĥi is the ith column vector of Ĥ. Intuitively, when Sij is
larger (i.e., xi and xj are more similar), the kĥi − ĥj k2 should
be smaller if we expect to minimize the second objective
function in (11), namely, the ĥi and ĥj shall be more similar,
and thus the similarity property of intrinsic graph G shall
be preserved by {ĥi }. Similarly based on the first objective
function in (11), we can observe that the similarity property of
the penalty graph shall be avoided by $\{\hat{h}_i\}$. For a certain H, as
$$\sum_{i \neq j} \|h_{i} - h_{j}\|^{2} S^{p}_{ij} = \sum_{i \neq j} \|\hat{h}_{i} - \hat{h}_{j}\|^{2} S^{p}_{ij} + \sum_{i \neq j} \|\tilde{h}_{i} - \tilde{h}_{j}\|^{2} S^{p}_{ij},$$
maximizing the objective function with respect to Ĥ on the penalty graph S^p is equivalent to minimizing the objective function with respect to the complementary part, namely H̃. Based on this observation, we intuitively change the objectives in Eqn. (11) into,
$$\begin{cases} \min_{\tilde{H}} \sum_{i \neq j} \|\tilde{h}_{i} - \tilde{h}_{j}\|^{2} S^{p}_{ij}, \\ \min_{\hat{H}} \sum_{i \neq j} \|\hat{h}_{i} - \hat{h}_{j}\|^{2} S_{ij}, \end{cases} = \begin{cases} \mathrm{Tr}(\tilde{H}L^{p}\tilde{H}^{T}) = \mathrm{Tr}(\tilde{P}XL^{p}X^{T}\tilde{P}^{T}), \\ \mathrm{Tr}(\hat{H}L\hat{H}^{T}) = \mathrm{Tr}(\hat{P}XLX^{T}\hat{P}^{T}), \end{cases} \qquad (12)$$
where Tr(·) is the trace of a square matrix and A^T denotes the transpose of matrix A.
D. Unified Formulation
To achieve the above two objectives for PNGE, we define
a unified objective function to optimize as follows,
$$\min_{W,P} \|X - WPX\|^{2} + \lambda\left[\mathrm{Tr}(\hat{P}XLX^{T}\hat{P}^{T}) + \mathrm{Tr}(\tilde{P}XL^{p}X^{T}\tilde{P}^{T})\right], \quad \text{s.t. } W, P \geq 0, \qquad (13)$$
where λ is a tunable parameter to balance the aforementioned
two terms. This formulation is however ill-posed, and the objective tends to drive P̂ toward zero. The same issue is also suffered by the formulation of Fisher-NMF [29].
As mentioned above, W is the basis matrix and hence it is
natural to require that the column vectors of W are normalized,
namely,
kwi k = 1, i = 1, 2, · · · , k,
(14)
where k · k denotes the norm of a vector.
This extra constraint makes the optimization problem more
complicated, and in this work, we compensate the norms of
the basis matrix into the coefficient matrix and get the final
objective function as,
$$\min_{W,P} \|X - WPX\|^{2} + \lambda\,\mathrm{Tr}(\hat{Q}\hat{P}XLX^{T}\hat{P}^{T}\hat{Q}^{T}) + \lambda\,\mathrm{Tr}(\tilde{Q}\tilde{P}XL^{p}X^{T}\tilde{P}^{T}\tilde{Q}^{T}), \quad \text{s.t. } W, P \geq 0, \qquad (15)$$
where
$$\hat{Q} = \mathrm{diag}\{\|w_{1}\|, \|w_{2}\|, \cdots, \|w_{q}\|\}, \qquad (16)$$
$$\tilde{Q} = \mathrm{diag}\{\|w_{q+1}\|, \|w_{q+2}\|, \cdots, \|w_{k}\|\}. \qquad (17)$$
Note that as the matrices S and S^p are symmetric, the matrices L and L^p are also symmetric. The objective function defined in Eqn. (15) is biquadratic, and generally a closed-form solution does not exist. In the next section, we present a nonnegative multiplicative updating procedure to solve the out-of-sample extensible data factorization problem under nonnegative constraints.
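For clarity, a small sketch that evaluates the objective of Eqn. (15) for given W, P and graph Laplacians is shown below; the helper name `pnge_objective` and the way the partition size q is passed in are our own assumptions.

```python
import numpy as np

def pnge_objective(X, W, P, L, Lp, q, lam):
    """Objective of Eqn. (15): reconstruction error plus the two graph terms,
    weighted by the norms of the corresponding basis columns (Eqns. (16)-(17))."""
    W_hat, W_til = W[:, :q], W[:, q:]
    P_hat, P_til = P[:q, :], P[q:, :]
    Q_hat = np.diag(np.linalg.norm(W_hat, axis=0))   # Eqn. (16)
    Q_til = np.diag(np.linalg.norm(W_til, axis=0))   # Eqn. (17)
    recon = np.linalg.norm(X - W @ P @ X) ** 2
    intrinsic = np.trace(Q_hat @ P_hat @ X @ L @ X.T @ P_hat.T @ Q_hat)
    penalty = np.trace(Q_til @ P_til @ X @ Lp @ X.T @ P_til.T @ Q_til)
    return recon + lam * (intrinsic + penalty)
```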
III. MULTIPLICATIVE ITERATIVE SOLUTION
The optimization of Eqn. (15) can actually be formulated
as a high-order optimization problem, which, although intractable, can be transformed into a set of tractable subproblems. Here, we also use this philosophy and optimize W
and P by a multiplicative nonnegative iterative procedure.
A. Preliminaries
We first introduce the concept of auxiliary function and the lemma which shall be used for the algorithmic deduction.

Definition: Function G(A, A′) is an auxiliary function for function F(A) if the following conditions are satisfied:
$$G(A, A') \geq F(A), \qquad G(A, A) = F(A). \qquad (18)$$
From the above definition, we have the following lemma with proof omitted [18].

Lemma 3.1: If G is an auxiliary function, then F is non-increasing under the update
$$A^{t+1} = \arg\min_{A} G(A, A^{t}), \qquad (19)$$
where t means the t-th iteration.

B. Optimize W for Given P

For a fixed P at the current iteration step, the objective function in Eqn. (15) with respect to W can be written as,
$$F(W) = \|X - WPX\|^{2} + \mathrm{Tr}(\hat{Q}\hat{P}X(\lambda L)X^{T}\hat{P}^{T}\hat{Q}^{T}) + \mathrm{Tr}(\tilde{Q}\tilde{P}X(\lambda L^{p})X^{T}\tilde{P}^{T}\tilde{Q}^{T}). \qquad (20)$$
Let
$$Y_{w} = \begin{bmatrix} \hat{P}X(\lambda L)X^{T}\hat{P}^{T} & 0 \\ 0 & \tilde{P}X(\lambda L^{p})X^{T}\tilde{P}^{T} \end{bmatrix} \circ I \qquad (21)$$
$$\;\;\; = Y_{w}^{+} - Y_{w}^{-}, \qquad (22)$$
where the operator $\circ$ indicates the element-wise matrix multiplication, I indicates the identity matrix, and the matrices $Y_{w}^{+}$ and $Y_{w}^{-}$ are defined as,
$$Y_{w}^{+} = \begin{bmatrix} \hat{P}X(\lambda D)X^{T}\hat{P}^{T} & 0 \\ 0 & \tilde{P}X(\lambda D^{p})X^{T}\tilde{P}^{T} \end{bmatrix} \circ I, \qquad (23)$$
$$Y_{w}^{-} = \begin{bmatrix} \hat{P}X(\lambda S)X^{T}\hat{P}^{T} & 0 \\ 0 & \tilde{P}X(\lambda S^{p})X^{T}\tilde{P}^{T} \end{bmatrix} \circ I. \qquad (24)$$
Thus, Eqn. (20) can be rewritten as
$$F(W) = \|X - WPX\|^{2} + \mathrm{Tr}(WY_{w}W^{T}). \qquad (25)$$
We integrate the nonnegative constraints into the objective function with respect to W, and set $\phi_{ij}$ as the Lagrange multiplier for the constraint $W_{ij} \geq 0$. Letting $\Phi = [\phi_{ij}]$, the Lagrangian L(W) is then defined as,
$$L(W) = \|X - WPX\|^{2} + \mathrm{Tr}(WY_{w}W^{T}) + \mathrm{Tr}(\Phi W^{T}). \qquad (26)$$
By setting the derivative of L(W) with respect to W to zero,
$$\frac{\partial L}{\partial W} = -2XX^{T}P^{T} + 2WPXX^{T}P^{T} + 2WY_{w} + \Phi. \qquad (27)$$
Along with the Karush-Kuhn-Tucker (KKT) condition [13] of $\phi_{ij}W_{ij} = 0$, we get the following equation,
$$-(XX^{T}P^{T})_{ij}W_{ij} + (WPXX^{T}P^{T})_{ij}W_{ij} + (WY_{w})_{ij}W_{ij} = -(XX^{T}P^{T})_{ij}W_{ij} + (WPXX^{T}P^{T})_{ij}W_{ij} + (WY_{w}^{+})_{ij}W_{ij} - (WY_{w}^{-})_{ij}W_{ij} = 0, \qquad (28)$$
which leads to the following update rule:
$$W_{ij} \leftarrow W_{ij}\,\frac{(XX^{T}P^{T} + WY_{w}^{-})_{ij}}{(WPXX^{T}P^{T} + WY_{w}^{+})_{ij}}. \qquad (29)$$

C. Convergence of Update Rule for W

We denote $F_{ab}$ as the part of F(W) relevant to $W_{ab}$; we have
$$F'_{ab}(W) = (-2XX^{T}P^{T} + 2WPXX^{T}P^{T} + 2WY_{w})_{ab}, \qquad (30)$$
$$F''_{ab}(W) = 2(PXX^{T}P^{T} + Y_{w})_{bb}. \qquad (31)$$
The auxiliary function of $F_{ab}$ is then designed as,
$$G(W_{ab}, W^{t}_{ab}) = F_{ab}(W^{t}_{ab}) + F'_{ab}(W^{t}_{ab})(W_{ab} - W^{t}_{ab}) + \frac{(W^{t}PXX^{T}P^{T} + W^{t}Y_{w}^{+})_{ab}}{W^{t}_{ab}}(W_{ab} - W^{t}_{ab})^{2}. \qquad (32)$$
Lemma 3.2: Eqn. (32) is an auxiliary function for $F_{ab}$, namely the part of F(W) relevant to $W_{ab}$.

Proof: Since $G(W_{ab}, W_{ab}) = F_{ab}(W_{ab})$, we need only to show that $G(W_{ab}, W^{t}_{ab}) \geq F_{ab}(W_{ab})$. First, we get the Taylor series expansion of $F_{ab}$ as,
$$F_{ab}(W_{ab}) = F_{ab}(W^{t}_{ab}) + F'_{ab}(W^{t}_{ab})(W_{ab} - W^{t}_{ab}) + \frac{1}{2}F''_{ab}(W^{t}_{ab})(W_{ab} - W^{t}_{ab})^{2}. \qquad (33)$$
Then, since
$$(W^{t}PXX^{T}P^{T})_{ab} \geq W^{t}_{ab}(PXX^{T}P^{T})_{bb}, \qquad (34)$$
$$(W^{t}Y_{w}^{+})_{ab} \geq W^{t}_{ab}(Y_{w})_{bb}, \qquad (35)$$
we have the following relation,
$$\frac{(W^{t}PXX^{T}P^{T} + W^{t}Y_{w}^{+})_{ab}}{W^{t}_{ab}} \geq (PXX^{T}P^{T} + Y_{w})_{bb}. \qquad (36)$$
Thus, $G(W_{ab}, W^{t}_{ab}) \geq F_{ab}(W_{ab})$ holds.

Lemma 3.3: Eqn. (29) could be obtained by minimizing the auxiliary function $G(W_{ab}, W^{t}_{ab})$.

Proof: Let $\partial G(W_{ab}, W^{t}_{ab})/\partial W_{ab} = 0$; we have,
$$F'_{ab}(W^{t}_{ab}) + 2\,\frac{(W^{t}PXX^{T}P^{T} + W^{t}Y_{w}^{+})_{ab}}{W^{t}_{ab}}(W_{ab} - W^{t}_{ab}) = 0. \qquad (37)$$
Then we can obtain the iterative update rule for W as,
$$W^{t+1}_{ij} \leftarrow W^{t}_{ij}\,\frac{(XX^{T}P^{T} + W^{t}Y_{w}^{-})_{ij}}{(W^{t}PXX^{T}P^{T} + W^{t}Y_{w}^{+})_{ij}}, \qquad (38)$$
and the lemma is proved.

D. Optimize P for Given W

After updating the matrix W, we normalize the column vectors of W and consequently convey the norms to the projective matrix P, namely,
$$P_{i} \Leftarrow P_{i} \times \|w_{i}\|, \ \forall i, \qquad (39)$$
$$w_{i} \Leftarrow w_{i}/\|w_{i}\|, \ \forall i. \qquad (40)$$
Then based on the normalized W in Eqn. (40), the objective function in Eqn. (15) with respect to P is simplified to be,
$$F(P) = \|X - WPX\|^{2} + \lambda\,\mathrm{Tr}(\hat{P}XLX^{T}\hat{P}^{T}) + \lambda\,\mathrm{Tr}(\tilde{P}XL^{p}X^{T}\tilde{P}^{T}), \quad \text{s.t. } P \geq 0. \qquad (41)$$
Let $\psi_{ij}$ be the Lagrange multiplier for the constraint $P_{ij} \geq 0$, and $\Psi = [\psi_{ij}]$; the Lagrangian L(P) is then defined as,
$$L(P) = \|X - WPX\|^{2} + \lambda\,\mathrm{Tr}(\hat{P}XLX^{T}\hat{P}^{T}) + \lambda\,\mathrm{Tr}(\tilde{P}XL^{p}X^{T}\tilde{P}^{T}) + \mathrm{Tr}(\Psi P^{T}). \qquad (42)$$
Thus, the partial derivative of L with respect to P is,
$$\frac{\partial L}{\partial P} = -2W^{T}XX^{T} + 2W^{T}WPXX^{T} + 2\lambda\begin{bmatrix} \hat{P}XLX^{T} \\ \tilde{P}XL^{p}X^{T} \end{bmatrix} + \Psi. \qquad (43)$$
Along with the Karush-Kuhn-Tucker (KKT) condition [13] of $\Psi_{ij}P_{ij} = 0$, we get the following equation,
$$-(W^{T}XX^{T})_{ij}P_{ij} + (W^{T}WPXX^{T})_{ij}P_{ij} + \lambda\begin{bmatrix} \hat{P}XLX^{T} \\ \tilde{P}XL^{p}X^{T} \end{bmatrix}_{ij}P_{ij} = 0. \qquad (44)$$
As $L = D - S$ and $L^{p} = D^{p} - S^{p}$, we have
$$-(W^{T}XX^{T})_{ij}P_{ij} + (W^{T}WPXX^{T})_{ij}P_{ij} + \lambda\begin{bmatrix} \hat{P}XDX^{T} \\ \tilde{P}XD^{p}X^{T} \end{bmatrix}_{ij}P_{ij} - \lambda\begin{bmatrix} \hat{P}XSX^{T} \\ \tilde{P}XS^{p}X^{T} \end{bmatrix}_{ij}P_{ij} = 0, \qquad (45)$$
which leads to the updating rule as,
$$P_{ij} = P_{ij}\,\frac{\left(W^{T}XX^{T} + \lambda\begin{bmatrix} \hat{P}XSX^{T} \\ \tilde{P}XS^{p}X^{T} \end{bmatrix}\right)_{ij}}{\left(W^{T}WPXX^{T} + \lambda\begin{bmatrix} \hat{P}XDX^{T} \\ \tilde{P}XD^{p}X^{T} \end{bmatrix}\right)_{ij}}. \qquad (46)$$

E. Convergence of Update Rule for P

Denote $F_{ab}$ as the part of F(P) relevant to $P_{ab}$; we have
$$F'_{ab}(P) = -2(W^{T}XX^{T})_{ab} + 2(W^{T}WPXX^{T})_{ab} + 2\lambda\left(\begin{bmatrix} \hat{P}XLX^{T} \\ \tilde{P}XL^{p}X^{T} \end{bmatrix}\right)_{ab}, \qquad (47)$$
$$F''_{ab}(P) = 2(W^{T}W)_{aa}(XX^{T})_{bb} + 2\lambda\left(\begin{bmatrix} XLX^{T} \\ XL^{p}X^{T} \end{bmatrix}\right)_{bb}. \qquad (48)$$
The auxiliary function of $F_{ab}(P)$ is then designed as,
$$G(P_{ab}, P^{t}_{ab}) = F_{ab}(P^{t}_{ab}) + F'_{ab}(P^{t}_{ab})(P_{ab} - P^{t}_{ab}) + \frac{\left(W^{T}WP^{t}XX^{T} + \lambda\begin{bmatrix} \hat{P}^{t}XDX^{T} \\ \tilde{P}^{t}XD^{p}X^{T} \end{bmatrix}\right)_{ab}}{P^{t}_{ab}}(P_{ab} - P^{t}_{ab})^{2}. \qquad (49)$$
Lemma 3.4: Eqn. (49) is an auxiliary function for $F_{ab}(P)$, namely the part of F(P) relevant to $P_{ab}$.

Proof: Since $G(P_{ab}, P_{ab}) = F_{ab}(P_{ab})$, we need only to show that $G(P_{ab}, P^{t}_{ab}) \geq F_{ab}(P_{ab})$. First, we get the Taylor series expansion of $F_{ab}(P)$ as,
$$F_{ab}(P_{ab}) = F_{ab}(P^{t}_{ab}) + F'_{ab}(P^{t}_{ab})(P_{ab} - P^{t}_{ab}) + \frac{1}{2}F''_{ab}(P^{t}_{ab})(P_{ab} - P^{t}_{ab})^{2}. \qquad (50)$$
Then, since
$$(W^{T}WP^{t}XX^{T})_{ab} \geq P^{t}_{ab}(W^{T}W)_{aa}(XX^{T})_{bb}, \quad (\hat{P}^{t}XDX^{T})_{ab} \geq (\hat{P}^{t}XLX^{T})_{bb}, \quad (\tilde{P}^{t}XD^{p}X^{T})_{ab} \geq (\tilde{P}^{t}XL^{p}X^{T})_{bb},$$
we have the following relation,
$$\frac{\left(W^{T}WP^{t}XX^{T} + \lambda\begin{bmatrix} \hat{P}^{t}XDX^{T} \\ \tilde{P}^{t}XD^{p}X^{T} \end{bmatrix}\right)_{ab}}{P^{t}_{ab}} \geq (W^{T}W)_{aa}(XX^{T})_{bb} + \lambda\left(\begin{bmatrix} XLX^{T} \\ XL^{p}X^{T} \end{bmatrix}\right)_{bb}. \qquad (51)$$
Thus, $G(P_{ab}, P^{t}_{ab}) \geq F_{ab}(P_{ab})$ holds.

Lemma 3.5: Eqn. (46) could be obtained by minimizing the auxiliary function $G(P_{ab}, P^{t}_{ab})$.

Proof: Let $\partial G(P_{ab}, P^{t}_{ab})/\partial P_{ab} = 0$; we have,
$$2\,\frac{\left(W^{T}WP^{t}XX^{T} + \lambda\begin{bmatrix} \hat{P}XDX^{T} \\ \tilde{P}XD^{p}X^{T} \end{bmatrix}\right)_{ab}}{P^{t}_{ab}}(P_{ab} - P^{t}_{ab}) + F'_{ab}(P^{t}_{ab}) = 0. \qquad (52)$$
Then we can obtain the iterative update rule for P as,
$$P^{t+1}_{ij} = P^{t}_{ij}\,\frac{\left(W^{T}XX^{T} + \lambda\begin{bmatrix} \hat{P}^{t}XSX^{T} \\ \tilde{P}^{t}XS^{p}X^{T} \end{bmatrix}\right)_{ij}}{\left(W^{T}WP^{t}XX^{T} + \lambda\begin{bmatrix} \hat{P}^{t}XDX^{T} \\ \tilde{P}^{t}XD^{p}X^{T} \end{bmatrix}\right)_{ij}}, \qquad (53)$$
and the lemma is proved.

Algorithm 1 summarizes the entire procedure for PNGE.
Algorithm 1. Multiplicative procedure for Projective Nonnegative Graph Embedding

Inputs: a set of samples $X = [x_{1}, \ldots, x_{N}] \in \mathbb{R}^{m\times N}$;
Outputs: feature basis matrix $W \in \mathbb{R}^{m\times k}$ and projection matrix $P \in \mathbb{R}^{k\times m}$;
1) Initialize W = rand(m, k) and P = rand(k, m);
2) For t = 1 : $T_{max}$
   a) $W_{ij} \Leftarrow W_{ij}\,\frac{(XX^{T}P^{T} + W^{t}Y_{w}^{-})_{ij}}{(W^{t}PXX^{T}P^{T} + W^{t}Y_{w}^{+})_{ij} + \epsilon}$, where $\epsilon = 10^{-9}$;
   b) $P_{i} \Leftarrow P_{i} \times \|w_{i}\|, \ \forall i$;  $w_{i} \Leftarrow w_{i}/\|w_{i}\|, \ \forall i$;
   c) $P_{ij} \Leftarrow P_{ij}\,\frac{\left(W^{T}XX^{T} + \lambda\begin{bmatrix} \hat{P}XSX^{T} \\ \tilde{P}XS^{p}X^{T} \end{bmatrix}\right)_{ij}}{\left(W^{T}WPXX^{T} + \lambda\begin{bmatrix} \hat{P}XDX^{T} \\ \tilde{P}XD^{p}X^{T} \end{bmatrix}\right)_{ij} + \epsilon}$, where $\epsilon = 10^{-9}$;
3) Output the basis matrix W and the projection matrix P.
The ε = 10⁻⁹ in each update rule is added to avoid division by zero. The minimization of the proposed formulation is conducted by a multiplicative iterative procedure, and a similar optimization strategy is also adopted in the previous work by Lee and Seung [18]. Actually, the iterative procedure can only guarantee the gradient descent property, and it is difficult to justify that every limit point is a stationary point, although stationarity is a necessary condition for achieving a local minimum. Lin et al. [17] slightly modified the original iterative algorithm to safely achieve convergence. Their modification, however, increases the computational complexity while achieving similar performance, as reported in [17]. Therefore, in this work, we do not adopt the modification, and experimental results show that Algorithm 1 usually converges to a local minimum.
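For readers who wish to experiment with the procedure, a compact NumPy sketch of Algorithm 1 is given below. It transcribes the multiplicative updates of Eqns. (38), (39)-(40) and (53) together with the ε safeguard; the function name, the random initialization and the assumption that the graph matrices S, S^p, D, D^p are precomputed are our own choices, and the sketch is not tuned for efficiency.

```python
import numpy as np

def pnge(X, S, Sp, D, Dp, k, q, lam=0.8, T=4000, eps=1e-9, seed=0):
    """Sketch of Algorithm 1. X: m x N nonnegative data (columns are samples).
    S, D: weight and degree matrices of the intrinsic graph; Sp, Dp: those of the
    penalty graph. k: reduced dimension; q: size of the graph-embedding part."""
    rng = np.random.default_rng(seed)
    m, N = X.shape
    W = rng.random((m, k))
    P = rng.random((k, m))
    XXt = X @ X.T
    for _ in range(T):
        Ph, Pt = P[:q], P[q:]
        # Y_w^+ and Y_w^- of Eqns. (23)-(24): block-diagonal matrices, with only
        # their diagonal entries kept (the element-wise product with I).
        Yp = np.zeros((k, k))
        Ym = np.zeros((k, k))
        Yp[:q, :q] = np.diag(np.diag(lam * Ph @ X @ D @ X.T @ Ph.T))
        Yp[q:, q:] = np.diag(np.diag(lam * Pt @ X @ Dp @ X.T @ Pt.T))
        Ym[:q, :q] = np.diag(np.diag(lam * Ph @ X @ S @ X.T @ Ph.T))
        Ym[q:, q:] = np.diag(np.diag(lam * Pt @ X @ Sp @ X.T @ Pt.T))
        # Step a): multiplicative update of W, Eqn. (38).
        W *= (XXt @ P.T + W @ Ym) / (W @ P @ XXt @ P.T + W @ Yp + eps)
        # Step b): normalize the columns of W and convey the norms to P, Eqns. (39)-(40).
        norms = np.linalg.norm(W, axis=0) + eps
        P *= norms[:, None]
        W /= norms[None, :]
        # Step c): multiplicative update of P, Eqn. (53).
        num = W.T @ XXt + lam * np.vstack([P[:q] @ X @ S @ X.T, P[q:] @ X @ Sp @ X.T])
        den = W.T @ W @ P @ XXt + lam * np.vstack([P[:q] @ X @ D @ X.T, P[q:] @ X @ Dp @ X.T])
        P *= num / (den + eps)
    return W, P
```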
IV. RELATED WORKS
There exist quite a few works related to the proposed
projective NGE algorithm, including the standard NMF [18],
[19] and its variants [16], [5], [30], and the supervised learning
formulations [31], [27].
A. NMF and Its Variants
The goal of NMF is to decompose the nonnegative data as
the product of a basis matrix and another encoding coefficient
matrix by imposing the nonnegative constraints on both, and
thus only non-subtractive combinations are allowed. Currently, there are also a number of variants of NMF. Li et al.
in [16] propose the localized NMF (LNMF), which imposes
the extra feature sparsity constraints on basis components to
make the reduced feature space suitable for tasks where feature
localization is important. They also provide an updating procedure to search the optimal solution, and LNMF can produce
sparse and localized basis components. Ding et al. proposed the convex NMF (CNMF) [5], which restricts the feature bases to be convex combinations of the samples. One should note
that the basic motivation of using projection matrix is different
between PNGE and CNMF, since the former focuses on how
to project the new testing sample into the learnt subspace.
Another variant of NMF is defined as solving the following optimization problem,
$$\arg\min_{W} \|X - WW^{T}X\|^{2}, \quad \text{s.t. } W \geq 0. \qquad (54)$$
Yuan et al. provide in [30] the updating rule, as follows,
$$W_{ij} = W_{ij}\,\frac{2(XX^{T}W)_{ij}}{(WW^{T}XX^{T}W + XX^{T}WW^{T}W)_{ij}}, \qquad (55)$$
$$W_{ij} = W_{ij}/\|w_{j}\|, \qquad (56)$$
where kwj k denotes the norm of the column vector wj . With
this tri-factor factorization, the out-of-sample extension can
be directly derived by using the matrix W for projection.
However, the update rule defined in Eqn. (55) cannot guarantee that the objective function value decreases monotonically, because the normalization of W after each update step may increase the objective function value. Although the basic idea of projection is shared with the proposed PNGE algorithm in this work, our method is characterized by: 1) PNGE provides a unified formulation to integrate the purpose of nonnegative data factorization with the specific purpose of graph embedding, and 2) PNGE may achieve more discriminative power than PNMF does, as validated in the experimental section.
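A minimal sketch of this PNMF-style update (Eqns. (55)-(56)), using random nonnegative data as a placeholder, is given below; as discussed above, it is the per-step column normalization that can break monotonic descent.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((1024, 200))      # nonnegative data, columns are samples (placeholder)
W = rng.random((1024, 100))      # nonnegative basis
XXt = X @ X.T
eps = 1e-9

for _ in range(100):
    # Eqn. (55): multiplicative update of W for the objective ||X - W W^T X||^2.
    W *= 2 * (XXt @ W) / (W @ W.T @ XXt @ W + XXt @ W @ W.T @ W + eps)
    # Eqn. (56): normalize each column of W; this step may increase the objective.
    W /= np.linalg.norm(W, axis=0, keepdims=True) + eps

H_test = W.T @ X[:, :5]          # out-of-sample encoding via W^T, as noted above
```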
B. Nonnegative Graph Embedding
Another work related to PNGE is our previous work on nonnegative graph embedding (NGE) [32] and its refined version,
multiplicative nonnegative graph embedding (MNGE) [27].
The graph embedding framework [31] provides a unified
formulation for dimensionality reduction, and Yang et al.
[32] extended this work by introducing additional nonnegative
constraints and proposed to integrate the data factorization
with the purpose of general graph embedding. Despite the mathematical soundness of NGE, its computational cost is very high due to the matrix inverse calculation. Wang et al. [27] further extended this work to MNGE by introducing efficient multiplicative update rules. MNGE achieves a satisfactory performance, whereas it cannot be directly applied to obtain the encoding coefficient vector for a new testing
sample. To address this issue, we further investigate in this
work a new out-of-sample extensible formulation for nonnegative data factorization.
V. EXPERIMENTS
In this section, we systematically evaluate the proposed projective nonnegative graph embedding (PNGE) algorithm for the face recognition task and examine its properties in terms of algorithmic convergence, basis sparsity, and classification capability of the derived encoding coefficient vectors.
A. Datasets
Four different publicly available databases, YALE¹, ORL [3], CMU PIE [24], and YALE-B [8], are used for the experiments. The Yale face database contains 165 grayscale images of 15 individuals. There are 11 images per subject, one per different facial expression or configuration: center-light, with-glasses, happy, left-light, without-glasses, normal, right-light, sad, sleepy, surprised, and wink. The ORL database contains 10 different images for each of 40 distinct subjects. All the 400 images have been captured against a dark homogeneous background with the subjects in an upright, frontal position with tolerance for some side movement. The CMU PIE dataset contains more than 40,000 facial images of 68 people. In this work, we use a subset of 5 near frontal poses (C05, C07, C09, C27, and C29) and illuminations indexed as 08 and 11, and therefore each person has ten images. For the YALE-B dataset, which is collected by Georghiades et al. [8], we choose a subset of images for 38 individuals with 64 near frontal facial images under different illuminations per individual. Fig. 1 shows some exemplar images from the above four datasets, in which each row shows ten images captured under different conditions for one single person.

¹ http://cvc.yale.edu/projects/yalefaces/yalefaces.html

Fig. 1. Exemplar images from the ORL (1st row), CMU PIE (2nd row), YALE (3rd row) and YALE-B (4th row) datasets. Each row shows ten images captured in different situations for one single person.

Fig. 2. Convergence of our proposed method. The objective function value decreases along with the increase of the iteration number. The reduced dimension is set as 100 and the tuning factor is set as λ = 0.8. The data used are from the ORL dataset.

B. Baselines and Settings

We use the following popular dimensionality reduction and nonnegative learning algorithms for comparison: 1) principal component analysis (PCA) [26]; 2) the standard NMF [18], [19]; 3) localized NMF (LNMF) [16]; 4) projective NMF (PNMF) [30]; 5) linear discriminant analysis (LDA) [1]; 6) marginal Fisher analysis (MFA) [31]; 7) multiplicative nonnegative graph embedding (MNGE) [27]; and 8) the proposed projective nonnegative graph embedding (PNGE). Herein, algorithms (1-4) are unsupervised while algorithms (5-8) are supervised. As NGE is essentially the same as MNGE in formulation, we do not further report the results from NGE. CNMF is designed for clustering purposes, and thus we do not evaluate it for the face recognition task, either.

For comparison, we also simplify the proposed PNGE to fit the unsupervised setting, named Unsupervised PNGE (UPNGE), by removing the latter two terms in the objective function defined in Eqn. (15). The objective function is then defined as,
$$\arg\min_{W,P} \|X - WPX\|^{2}, \quad \text{s.t. } W, P \geq 0. \qquad (57)$$
Accordingly, we can optimize the factorization by using the following update rules,
$$W_{ij} = W_{ij}\,\frac{(XX^{T}P^{T})_{ij}}{(WPXX^{T}P^{T})_{ij}}, \qquad (58)$$
$$P_{ij} = P_{ij}\,\frac{(W^{T}XX^{T})_{ij}}{(W^{T}WPXX^{T})_{ij}}. \qquad (59)$$
The convergence proof of the above updating rules is similar to that of PNGE in Section III.
Face recognition based on the learnt dimension-reduced feature subspace from the various algorithms is performed as follows. i) Feature extraction. Each training face image xi
is projected into the dimension-reduced feature space as a
feature vector hi , i = 1, . . . , N . A query face image y to be
classified is represented by its projection into the same feature
space, denoted as hy . ii) Nearest neighbor classification. The
Euclidean distance between the query and each training datum,
d(hy , hi ), is calculated. Thus, the query image is assigned to
the class to which the closest training datum belongs.
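This two-step protocol can be written compactly as follows; the function name is ours, and the projection P is assumed to be the learnt PNGE projection matrix (other algorithms supply the analogous dimension-reduction map).

```python
import numpy as np

def nn_classify(P, X_train, y_train, X_test):
    """i) Feature extraction: project images into the reduced space with the learnt P.
    ii) Nearest neighbor: assign each query the label of its closest training encoding.
    y_train is a 1-D array of class labels aligned with the columns of X_train."""
    H_train = P @ X_train                          # k x N_train encoding vectors
    H_test = P @ X_test                            # k x N_test encoding vectors
    d2 = np.sum((H_test[:, :, None] - H_train[:, None, :]) ** 2, axis=0)
    return y_train[np.argmin(d2, axis=1)]          # label of the nearest training datum
```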
For the MFA, MNGE and PNGE algorithms, the intrinsic graph and penalty graph are set in the same way, where the number of nearest kindred neighbors of each data point is fixed to be
min(nc − 1, 3), where nc is the sample number for the c-th
class and the concerned datum is assumed to belong to the c-th
class, and the number of shortest pairs from different classes
is set as 20 for each class in this work.
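A sketch of this graph construction under the stated settings is given below; since the weighting scheme is not spelled out here, simple binary 0/1 edge weights are assumed, and the helper name is ours.

```python
import numpy as np

def mfa_graphs(X, labels, k_intra=3, n_penalty=20):
    """Intrinsic graph S: each sample is linked to its min(n_c - 1, k_intra) nearest
    same-class (kindred) neighbors. Penalty graph Sp: for each class, the n_penalty
    shortest pairs to samples of other classes are linked. Binary weights assumed."""
    N = X.shape[1]
    d2 = np.sum((X[:, :, None] - X[:, None, :]) ** 2, axis=0)   # pairwise squared distances
    S = np.zeros((N, N))
    Sp = np.zeros((N, N))
    for c in np.unique(labels):
        idx_c = np.where(labels == c)[0]
        idx_o = np.where(labels != c)[0]
        kc = min(len(idx_c) - 1, k_intra)
        for i in idx_c:
            order = idx_c[np.argsort(d2[i, idx_c])]
            order = order[order != i][:kc]          # nearest kindred neighbors of i
            S[i, order] = S[order, i] = 1.0
        # n_penalty shortest cross-class pairs for class c
        pair_d = d2[np.ix_(idx_c, idx_o)]
        flat = np.argsort(pair_d, axis=None)[:n_penalty]
        rows, cols = np.unravel_index(flat, pair_d.shape)
        Sp[idx_c[rows], idx_o[cols]] = Sp[idx_o[cols], idx_c[rows]] = 1.0
    return S, Sp
```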
We evaluate these algorithms on the above four face
datasets. All the face images are manually aligned by fixing
the locations of two eyes and cropped. The size of each
cropped image is 32 × 32 pixels, with 256 gray levels per
pixel. The features (pixel values) are then scaled to be within
[0, 1] (divided by 256). Thus, each image is represented by
a 1024-dimensional vector. We randomly select half of the
images from each class/subject and use them for training, and
the remaining images for testing. The reported accuracies are
the mean values of those calculated from five random splits
of the data into training and testing subsets.
For all above learning algorithms, the dimension of the
subspace is tuned within k ∈ {6 × 6, 7 × 7, . . . , 10 × 10}.
For MNGE and PNGE, we divide both the basis matrix and
the coefficient matrix into two parts and set the size of the first
part as q = 0.6 × k. The other free parameter λ is tuned over an empirical set {0.2, 0.4, 0.6, . . . , 2.4}, and the optimal value is determined by 10-fold cross-validation.
C. Algorithmic Convergence and Sparsity
As proved in the previous section, the update rule in Algorithm 1 guarantees a locally optimal solution for our objective
function defined in Eqn. (15). Fig. 2 shows how the objective
function value decreases with the increase of the iteration
number on the ORL dataset. The reduced dimension is set
as k = 100 (q = 0.6 × k) and the tuning factor is set as
λ = 0.8. We can observe that, generally, PNGE will converge
after about 4000 iterations. We implement the algorithm using
MATLAB 2008a and conduct the experiment on a computer
with Intel(R) Core(TM)2 Duo 2.66GHz CPU and 8GB of
RAM. On average, each iteration of PNGE costs about 0.5 seconds. We set the maximum iteration number of the NMF-related algorithms as 4000 and keep it constant in all the following experiments.
We also evaluate in this experiment the sparsity property
of the PNGE related algorithms. Fig. 3 shows the resulting
feature basis components of NMF, LNMF, PNMF, MNGE and
PNGE for feature dimension of 100. Higher pixel values are
in the darker colors. From the basis components learnt from
different algorithms, we can observe that: 1) NMF bases are generally nearly holistic, and less sparse than those from the other four algorithms, which is also observed by Li et al. in [16]; 2) both LNMF and PNMF can produce localized, part-based components and thus achieve sparse representations for the input images; and 3) MNGE and PNGE can also produce sparse bases, which are however combinations of several localized regions, instead of a single small region as in LNMF and PNMF. The sparsity property of PNGE makes it potentially more robust to various factors, such as pose changes,
illumination changes and expression variations as well as
image misalignments, which shall be validated in the next
subsections. Besides, compared to MNGE, PNGE provides a
general way for obtaining the nonnegative dimension-reduced
encoding coefficient vectors for the new testing images.
D. Exp-I: Face Recognition via Unsupervised Learning
To evaluate the discriminating power of the proposed PNGE
algorithm, we compare it with four popular unsupervised subspace learning algorithms: PCA [26], NMF [18], LNMF [16]
and PNMF [30], on the face recognition task. Among the NMF
based methods, PNMF and PNGE can be directly used to
obtain the representation of the testing samples by using the
projection matrix P for PNGE and matrix W T for PNMF. For
NMF and LNMF, we use the three methods defined in Eqn. (1),
Eqn. (3), and Eqn. (4) respectively as discussed in Section II
to derive the dimension-reduced encoding coefficient vectors
for the testing samples.
The comparison results of different algorithms on the
ORL [3], CMU PIE [24], YALE and YALE-B [8] datasets
are reported in Table I. Each column contains the accuracies
of different algorithms for one dataset and the performance
winner is indicated in bold font. From these results, we can obtain the following observations. 1) The performance of PCA is usually lower than that of the other NMF-based algorithms, which demonstrates the potential discriminative power of nonnegative data factorization. 2) UPNGE outperforms PNMF over all the four datasets, though both of them utilize the projection matrix to obtain the dimension-reduced representations. In addition, we can observe that Method-III for handling out-of-sample data slightly outperforms the other two methods on average, since it can guarantee the nonnegativity of the
encoding coefficient vectors. This also validates the benefits
of our novel formulation towards nonnegativity of the derived
encoding coefficient vectors. Fig. 4 shows the face recognition
accuracies of the above five algorithms with different reduced
dimensions on the four datasets. For NMF and LNMF, we only
report the results of using Method-I to obtain the encoding
coefficient vectors of the new testing samples.
E. Exp-II: Face Recognition via Supervised Learning
We also evaluate the classification capability of our proposed algorithm by comparing with the popular supervised
subspace learning algorithms, including LDA [1], MFA [31]
and MNGE [27] based on the graphs of MFA. For MNGE,
we use Method-III, namely, the update rule based method, to obtain the encoding coefficient vectors of the testing images. We do not use Method-I and Method-II, since these two methods cannot guarantee nonnegativity and their reconstruction criteria are inconsistent with that of the supervised formulation of
MNGE. We apply the nearest neighbor approach to recognize
the testing samples in the dimension-reduced feature space.
Table II shows the comparison results of different algorithms on ORL, PIE, YALE and YALE-B datasets. We
can make the following observations. First, the accuracies of
MNGE and PNGE are slightly higher than those of LDA and
MFA. While all the four algorithms utilize the label information, the nonnegative property is able to bring additional
potential in discriminating power. Second, PNGE on average outperforms MNGE over three of the four datasets, which shows that the novel formulation in PNGE can provide not only a solution for new testing samples, but also a competitive classification capability. Besides, compared with the results in Table I, we can clearly observe that the performances of
the supervised subspace learning algorithms are much better
than those of the unsupervised ones. Fig. 5 further shows the
face recognition accuracies of the above four algorithms with
different reduced dimensions on the four datasets.
VI. CONCLUSIONS AND FUTURE WORK
We would like to conclude this work by discussing some
aspects of our proposed PNGE framework and mentioning
some future research directions:
1) The proposed PNGE framework can reap the benefits
of both nonnegative data factorization and the purpose
of graph embedding. The experiments well validate the
basic idea and show that our proposed algorithm can
outperform the popular subspace learning algorithms,
under both supervised and unsupervised settings.
2) This work provides an extensive proof on the correctness
and convergency of the proposed update procedure.
Actually, the formulation in Eqn. (15) can be generalized to add other quadratic terms, which means that the formulation
in PNGE is ready to be extended to incorporate other
graph embedding algorithms. We will further investigate
this topic in our future work.
3) PNGE is currently limited to linear projections, and non-linear techniques (e.g., kernel tricks) may further boost the algorithmic performance. We shall investigate this in our future work.
4) Another further research direction is how to extend the current framework for tensor-based nonnegative data decomposition.
Fig. 3. Basis components of NMF(1-st row), LNMF(2-nd row), PNMF(3-rd row), MNGE (4-th row) and PNGE (5-th row). The data used are from the ORL
dataset.
TABLE I
Face recognition accuracies (%) of different unsupervised learning algorithms. Three methods are used to obtain the dimension-reduced encoding coefficient vectors of the new testing samples, including: I) basis matrix reconstruction based; II) image reconstruction based; and III) update rule based. The proposed PNMF can be directly applied to obtain the encoding coefficient vectors of the new testing samples based on the projection matrix. In the first column, we indicate the algorithm to evaluate. In the remaining columns, we indicate the average value and standard deviations (in the parentheses) of the results from five random splits of the datasets. The reduced feature dimension is set as k = 100.

Algorithm  | ORL           | PIE           | YALE          | YALE-B
PCA        | 84.42(±2.34)  | 79.43(±1.04)  | 52.10(±4.56)  | 80.44(±0.63)
NMF(I)     | 83.35(±3.15)  | 82.44(±3.21)  | 54.74(±3.68)  | 82.56(±3.23)
NMF(II)    | 85.61(±2.18)  | 83.31(±1.48)  | 54.57(±3.13)  | 82.69(±3.44)
NMF(III)   | 86.12(±2.27)  | 82.31(±1.58)  | 57.98(±3.34)  | 82.33(±4.31)
LNMF(I)    | 87.50(±1.20)  | 85.34(±2.98)  | 58.33(±1.34)  | 81.67(±2.88)
LNMF(II)   | 87.66(±2.25)  | 84.56(±2.37)  | 60.59(±2.34)  | 80.34(±2.86)
LNMF(III)  | 83.66(±1.34)  | 86.88(±3.04)  | 61.44(±3.15)  | 82.33(±3.09)
PNMF       | 81.55(±3.46)  | 82.89(±4.15)  | 55.34(±4.65)  | 80.37(±5.29)
UPNGE      | 88.12(±2.26)  | 85.11(±3.15)  | 67.62(±2.23)  | 85.36(±4.40)
TABLE II
Face recognition accuracies (%) of different supervised learning algorithms. The reduced feature dimension is set as k = 100. For MNGE and PNGE, the optimal value of the parameter λ is determined by the 10-fold cross-validation.

Algorithm | ORL           | PIE           | YALE          | YALE-B
LDA       | 93.25(±3.38)  | 94.28(±2.31)  | 80.47(±4.42)  | 93.47(±1.06)
MFA       | 94.00(±2.74)  | 95.45(±3.44)  | 82.67(±3.45)  | 92.36(±2.71)
MNGE      | 93.04(±4.12)  | 95.65(±3.16)  | 82.78(±4.01)  | 94.00(±3.22)
PNGE      | 94.35(±4.10)  | 97.85(±3.10)  | 84.21(±4.54)  | 94.76(±2.03)
Fig. 4. Face Recognition accuracies over different feature dimensions for PCA, NMF(I), LNMF(I), PNMF and UPNGE algorithms on (a) ORL, (b) PIE, (c)
YALE and (d) YALE-B datasets. For each dimension, the mean accuracy calculated from five random splits of the dataset for training and testing is reported.
Fig. 5. Face recognition accuracies over different feature dimensions for LDA, MFA, MNGE and PNGE algorithms on (a) ORL, (b) PIE, (c) YALE and
(d) YALE-B datasets. For each dimension, we report the mean accuracy calculated from five random splits of the dataset for training and testing. For MNGE
and PNGE, the optimal value of the parameter λ is determined by the 10-fold cross-validation.
REFERENCES
[1] P. Belhumeur, J. Hespanha, and D. Kriegman. Eigenfaces vs. fisherfaces:
recognition using class specific linear projection. IEEE Trans. Pattern
Analysis and Machine Intelligence, pp. 711-720, 2002.
[2] G. Buchsbaum and O. Bloch. Color Categories Revealed by Nonnegative
Matrix Factorization of Munsell Color Spectra. Vision Research, vol. 42,
pp. 559-563, 2002.
[3] D. Cai, X. He, Y. Hu, J. Han, and T. Huang, Learning a Spatially Smooth
Subspace for Face Recognition. Proc. IEEE International Conference on
Computer Vision and Pattern Recognition, 2007.
[4] M. Catral, L. Han, M. Neumann and RJ. Plemmons. On reduced rank
nonnegative matrix factorization for symmetric nonnegative matrices.
Linear Algebra and Its Applications, vol. 393, pp. 107-126, 2004.
[5] C. Ding, T. Li and M. Jordan. Convex and Semi-Nonnegative Matrix
Factorizations. IEEE Trans. Pattern Analysis and Machine Intelligence,
2008.
[6] C. Ding, X. He, and H. Simon. On the equivalence of nonnegative matrix
factorization and spectral clustering. In Proc. SIAM Data Mining Conf,
pp. 1-8, 2005.
[7] D. Guillamet and J. Vitria. Nonnegative Matrix Factorization for Face
Recognition. In Proc. Conf. Topics in Artificial Intelligence, pp. 336-344,
2002.
[8] A. Georghiades, P. Belhumeur and D. Kriegman. From Few to Many:
Illumination Cone Models for Face Recognition under Variable Lighting
and Pose. IEEE Trans. Pattern Analysis Machine Intelligence, vol. 23,
no. 6, pp. 643-660, 2001.
[9] P.O. Hoyer. Non-negative matrix factorization with sparseness constraints.
Machine Learning Research, vol. 5. pp. 1457-1469, 2004.
[10] A. Hyvarinen, J. Karhunen and E. Oja. Survey on independent component analysis. Neural Computing Surveys, vol. 2, pp. 94-128, 1999.
[11] T. Hazan, S. Polak, and A. Shashua. Sparse image coding using a 3d
non-negative tensor factorization. In Proc. International Conference on
Computer Vision, vol. 1, pp. 50-57, 2005.
[12] A. Heger and L. Holm. Sensitive Pattern Discovery with 'Fuzzy' Alignments of Distantly Related Proteins. Bioinformatics, vol. 19, no. 1, pp.
130-137, 2003.
[13] H. Kuhn and A. Tucker. Nonlinear programming. In Proc. 2nd Berkeley
Symposium, 1951.
[14] I. Kotsia, S. Zafeiriou, and I. Pitas. A Novel Discriminant Non-Negative
Matrix Factorization Algorithm With Applications to Facial Image Characterization Problems. IEEE Trans. Information Forensics and Security,
vol. 2, pp. 588-595, 2007.
[15] P. Kim and B. Tidor. Subsystem Identification through Dimensionality
Reduction of Large-Scale Gene Expression Data. Genome Research, vol.
13, pp. 1706-1718, 2003.
[16] S. Li, X. Hou, H. Zhang, and Q. Cheng. Learning spatially localized,
parts-based representation. In Proc. IEEE Computer Vision and Pattern
Recognition, pp. 207-212, 2001.
[17] Chih-Jen Lin. On the Convergence of Multiplicative Update Algorithms
for Non-negative Matrix Factorization. IEEE Transactions on Neural
Networks, vol. 18, no. 6, pp. 1589-1596, 2007.
[18] D. Lee and H. Seung. Learning the parts of objects by nonnegative
matrix factorization. Nature, vol. 401, pp. 788-791, 1999.
[19] D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems, vol. 13, pp.
1-8, 2001.
[20] P. Paatero. Least squares formulation of robust non-negative factor
analysis. Chemometrics and Intelligent Laboratory Systems, vol. 37, pp.
23-35, 1997.
[21] R. Ramanath, R. Kuehni, W. Snyder, and D. Hinks. Spectral Spaces and
Color Spaces. Color Research and Application, vol. 29, pp. 29-37, 2004.
[22] R. Ramanath, W. Snyder, and H. Qi. Eigenviews for Object Recognition
in Multispectral Imaging Systems. In Proc. Applied Imagery Pattern
Recognition Workshop, 2003.
[23] P. Smaragdis and J. Brown. Nonnegative Matrix Factorization for
Polyphonic Music Transcription. In Proc. IEEE Workshop Applications
of Signal Processing to Audio and Acoustics, 2003.
11
[24] T. Sim, S. Baker, and M. Bsat. The CMU Pose, Illumination, and Expression Database. IEEE Trans. Pattern Analysis and Machine
Intelligence, vol. 25, no. 12, pp. 1615-1618, 2003.
[25] A. Shashua and T. Hazan. Non-Negative Tensor Factorization with
Applications to Statistics and Computer Vision. In Proc. International
Conference on Machine Learning, vol. 1, pp. 792-799, 2005.
[26] M. Turk and A. Pentland. Eigenfaces for recognition. Journal of Cognitive Neuroscience, vol. 3, pp. 71-86, 1991.
[27] C. Wang, Z. Song, S. Yan, L. Zhang, and H. Zhang. Multiplicative
NonNegative Graph Embedding. In Proc. IEEE Conference on Computer
Vision and Pattern Recognition, 2009.
[28] J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma. Robust face
recognition via sparse representation. IEEE Trans. Pattern Analysis and
Machine Intelligence, vol. 31, no. 2, pp.210-227, 2009.
[29] Y. Wang, Y. Jia, C. Hu, and M. Turk. Fisher non-negative matrix
factorization for learning local features. In Proc. Asian Conference on
Computer Vision, pp. 1-8, 2004.
[30] Z. Yuan and E. Oja. Projective nonnegative matrix factorization for image compression and feature extraction. In Proc. Scandinavian Conference
on Image Analysis, pp. 333-342, 2005.
[31] S. Yan, D. Xu, B. Zhang, Q. Yang, H. Zhang, and S. Lin. Graph
Embedding and Extensions: A General Framework for Dimensionality
Reduction. IEEE Trans. Pattern Analysis Machine Intelligence, vol. 29,
pp. 40-51, 2007.
[32] J. Yang, S. Yan, Y. Fu, X. Li, and T. Huang. Non-Negative Graph
Embedding. IEEE Conf. on Computer Vision and Pattern Recognition,
2008.
[33] D. Zhang, Z. Zhou, and S. Chen. Non-negative Matrix Factorization
on Kernels. In Proc. Pacific Rim International Conference on Artificial
Intelligence (PRICAI), Guilin, China, LNAI 4099, pp.404-412, 2006.