LEARNING THE SPARSE REPRESENTATION FOR CLASSIFICATION
Jianchao Yang, Jiangping Wang, and Thomas Huang
Beckman Institute, University of Illinois at Urbana-Champaign, USA
{jyang29, jwang63, huang}@ifp.illinois.edu
ABSTRACT
In this work, we propose a novel supervised matrix factorization method used directly as a multi-class classifier. The
coefficient matrix of the factorization is enforced to be sparse
by ℓ1 -norm regularization. The basis matrix is composed of
atom dictionaries from different classes, which are trained in a
jointly supervised manner by penalizing inhomogeneous representations given the labeled data samples. The learned basis
matrix models the data of interest as a union of discriminative
linear subspaces by sparse projection. The proposed model is
based on the observation that many high-dimensional natural
signals lie in a much lower dimensional subspace or union of subspaces. Experiments conducted on several datasets show the effectiveness of such a representation model for classification, which also suggests that a tight reconstructive representation model can be very useful for discriminant analysis.
Index Terms— Sparse representation, sparse coding, matrix factorization, dictionary training, face recognition, digit
recognition.
1. INTRODUCTION
In machine learning and computer vision, one often seeks a data representation suited for subsequent clustering or classification tasks. Principal Component Analysis (PCA) [1] aims to find a low-rank orthonormal basis that best reconstructs the original data in the ℓ2-norm sense. As a data processing step, PCA is widely applied in many classification tasks, e.g., Eigenfaces [2]. Motivated by the psychological and physiological evidence for parts-based representation of mental objects in the human brain [3],
nonnegative matrix factorization (NMF) that captures the notion of nonsubtractive parts was proposed. The nonnegativity property of NMF and its extensions turns out to be useful
for learning meaningful parts-based representations for various applications, such as face recognition, text mining, and
collaborative filtering. Reconstructive representations, e.g.,
PCA and NMF, can achieve compact representations that are
most effective for signal processing, such as compression and
denoising, but not necessarily for classification. For classification purposes, such reconstructive data representations
have to work with subsequent discriminant models. However,
such a two-step approach does not ensure the optimality of
the choice of a discriminative model.
On the other hand, discriminative representations, such
as linear discriminant analysis (LDA) [4], are usually better suited for classification tasks. The difference between
reconstruction and discrimination has been widely investigated in the pattern recognition literature. While both kinds of methods have been broadly applied in classification, the discriminative methods have often outperformed the reconstructive methods [5] [6]. Many previous works on subspace learning [7] and metric learning [8] seek the partial data representations that are best for recognition. However, discriminative models are usually sensitive to noise, missing data, and outliers [9], and are prone to overfitting. Furthermore, these
discriminative models almost surely reduce the entropy of the
original data, and thus the derived representations will only
increase the Bayes error.
Recently, there has been growing interest in sparse modeling of natural high-dimensional signals. Sparse representations are representations that account for all or most of the
information of a signal with a linear combination of a small
number of elementary signals called atoms, which are usually
chosen from an over-complete dictionary. The over-complete
dictionary allows more representation flexibility and efficiency. Different from conventional orthonormal bases, such as wavelets and PCA, searching for the sparse
representation of a signal over an over-complete dictionary
is achieved by optimizing an objective function that consists
of two terms: one that measures the reconstruction error and
the other that measures the sparsity of the representation. The
sparsity of the representation is usually measured by the ℓ0-norm, which counts the number of nonzero coefficients. However, solving an ℓ0-norm minimization is NP-hard, and much recent progress has centered on the ℓ1/ℓ0 equivalence in high-dimensional spaces [10]. In the past few years, sparse representation by ℓ1 minimization has been applied in many tasks,
including image denoising [11], image super-resolution [12],
graph learning [13], face recognition [14], and image classification [15] [16].
The success of sparse representation in various areas is largely due to the fact that natural high-dimensional
signals are sparse signals lying in a much lower dimensional
space. Based on the observation that many real data belonging to the same class, e.g., faces or digits, actually live in the
same subspace of a much lower dimension, a new test sample
can be well represented by the training samples from the same
class, which naturally leads to a sparse representation over the
whole training set. Such a sparse representation has been successfully applied to face recognition by Wright et al. in [14],
which suggests that a proper reconstructive model can imply
great discriminative power for classification. In this work,
we directly seek such a reconstructive model based on the
sparsity assumption in a supervised manner, where the data is
modeled as a union of low dimensional linear subspaces, and
each class is modeled as a union of a small subset of these linear subspaces. Accordingly, the classification model is simply
based on the sparse projection errors of the test sample on the
nearest low-dimensional linear subspaces. Different from previous supervised sparse models [17], [18] and [19], our model
is purely reconstructive and explores the possibility of classification based solely on the data representation.
The remainder of this paper is organized as follows. Section 2 discusses related work, including matrix factorization and sparse coding. Section 3 presents our supervised dictionary training, or regularized matrix factorization, algorithm. Section 4 presents experimental results on several datasets. Finally, Section 5 concludes the paper with a discussion.
2. RELATED WORK

2.1. Matrix Factorization and Sparse Coding

For matrix factorization, we aim to find a basis matrix D ∈ R^{d×K} and a coefficient matrix Z ∈ R^{K×N} that represent the original data matrix X ∈ R^{d×N} well, subject to constraints on the basis matrix D, the coefficient matrix Z, or both, in order to obtain desired properties for the representation. For example, Nonnegative Matrix Factorization (NMF) enforces nonnegativity on both D and Z, and the optimization can be solved iteratively in a multiplicative way,

arg min_{D,Z} ∥X − DZ∥_F   s.t. D ≽ 0, Z ≽ 0.   (1)

On the other hand, Principal Component Analysis (PCA) aims to find a low-rank approximation of the original data matrix in an orthonormal system. Independent Component Analysis (ICA), in contrast, finds a linear representation of non-Gaussian data such that the components are statistically independent.

Many of these matrix factorization algorithms assume some compactness property on the basis matrix D (i.e., K < d), implying dimensionality reduction in the representation. For many real signals that are sparse, we can instead enlarge the basis matrix to be over-complete and enforce sparsity on the representation in order to better model the data,

arg min_{D,Z} ∥X − DZ∥_F   s.t. ∥z_i∥_0 ≤ k,   (2)

where z_i denotes the i-th column of Z, and ∥·∥_0 counts the number of nonzero elements. Eqn. (2) aims to learn an over-complete dictionary D such that the representations of X in terms of D are k-sparse. Many algorithms can be applied to obtain approximate solutions for this problem, e.g., K-SVD [20]. Such an over-complete dictionary has led to state-of-the-art performance on many image processing tasks, including denoising and inpainting [11]. An alternative formulation of Eqn. (2) replaces the ℓ0-norm by the ℓ1-norm and uses it as a regularization to promote sparsity,

arg min_{D,Z} ∥X − DZ∥_F + λ∥Z∥_{ℓ1}   s.t. ∥D_i∥_2 ≤ 1,   (3)

which is usually referred to as sparse coding [21] [22]. Different from conventional matrix factorization techniques, sparse coding is nonlinear and models the data as a union of low-dimensional subspaces that are adapted to the data space.
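As an illustration of Eqn. (3), the following sketch alternates between an ISTA (iterative soft-thresholding) update for the sparse codes Z and a projected gradient step for the dictionary D, using the standard squared-error form of the data term. The dictionary size, step sizes, and iteration counts are illustrative assumptions, not settings taken from this paper.

```python
# A minimal NumPy sketch of l1-regularized sparse coding (cf. Eqn. (3)),
# alternating an ISTA step on Z with a projected gradient step on D.
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding: the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_coding(X, K=64, lam=0.1, n_iter=50, seed=0):
    """X: d x N data matrix; returns an over-complete dictionary D and sparse codes Z."""
    rng = np.random.default_rng(seed)
    d, N = X.shape
    D = rng.standard_normal((d, K))
    D /= np.linalg.norm(D, axis=0, keepdims=True)        # enforce ||d_j||_2 <= 1
    Z = np.zeros((K, N))
    for _ in range(n_iter):
        # ISTA step on Z for 0.5*||X - D Z||_F^2 + lam*||Z||_1
        L = np.linalg.norm(D, 2) ** 2                     # Lipschitz constant of the gradient
        Z = soft_threshold(Z - (D.T @ (D @ Z - X)) / L, lam / L)
        # Gradient step on D, then project every atom back onto the unit ball
        G = (D @ Z - X) @ Z.T
        step = 1.0 / (np.linalg.norm(Z, 2) ** 2 + 1e-8)
        D = D - step * G
        D /= np.maximum(np.linalg.norm(D, axis=0, keepdims=True), 1.0)
    return D, Z
```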
2.2. Classification based on Sparse Representation

Casting the recognition problem as one of finding a sparse representation of the test image in terms of the training set as a whole, up to some sparse errors due to occlusion, Wright et al. [14] showed that such a simple sparse representation based approach is robust to partial occlusions and can achieve promising recognition accuracy on public face datasets. Formally, given a set of training samples for the c-th class D_c = [d_{c,1}, d_{c,2}, ···], a test sample x from class c can be well represented by D_c with coefficients z_c. As the label of x is unknown, it is assumed that z_c can be recovered from the sparse representation of x in terms of the dictionary constructed from the training samples of all I classes by

min_z ∥z∥_1   s.t. ∥Dz − x∥_2^2 ≤ ϵ,   (4)

where D = [D_1, D_2, ···, D_I] and z = [z_1^⊤, z_2^⊤, ···, z_I^⊤]^⊤. The class label is then determined by the minimum residual error criterion after projecting the test sample onto each class:

ĉ = arg min_c ∥Dδ_c(z) − x∥_2^2 = arg min_c ∥D_c z_c − x∥_2^2,   (5)

where δ_c(·) is a vector indicator function that keeps the elements corresponding to the c-th class and sets all others to zero. In [14], Wright et al. further extended the basic formulation of (4) by attaching a trivial basis (identity matrix) to D to capture sparse errors.

In the general case, if the assumption holds that high-dimensional data samples belonging to the same class live in the same union of a small set of low-dimensional subspaces, then a new test sample can be represented by only a few basis atoms from the same class, i.e., the new test sample has a sparse representation in terms of all the basis atoms, which can be recovered by ℓ1-norm minimization. In [14], these basis atoms are modeled directly by the raw training samples, which is not scalable to large-scale applications with tens of thousands of training samples. How to find a compact basis matrix that can well model and differentiate different classes of high-dimensional sparse data is the focus of this work.
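As a concrete illustration of the decision rule in Eqns. (4)-(5), the sketch below codes a test sample over the concatenated class dictionaries with an ℓ1 penalty (in the unconstrained lasso form rather than the constrained form of Eqn. (4)) and assigns the class with the smallest reconstruction residual. The ISTA solver and the value of λ are illustrative assumptions.

```python
# A self-contained sketch of SRC-style classification (cf. Eqns. (4)-(5)).
import numpy as np

def lasso_ista(D, x, lam=0.05, n_iter=200):
    """Minimize 0.5*||x - D z||_2^2 + lam*||z||_1 with ISTA."""
    L = np.linalg.norm(D, 2) ** 2
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = D.T @ (D @ z - x)
        z = np.sign(z - g / L) * np.maximum(np.abs(z - g / L) - lam / L, 0.0)
    return z

def src_classify(class_dicts, x, lam=0.05):
    """class_dicts: list [D_1, ..., D_I] of d x K_c matrices; x: test sample of length d."""
    D = np.hstack(class_dicts)
    z = lasso_ista(D, x, lam)
    residuals, start = [], 0
    for Dc in class_dicts:
        zc = z[start:start + Dc.shape[1]]                 # delta_c(z) in Eqn. (5)
        residuals.append(np.linalg.norm(Dc @ zc - x))
        start += Dc.shape[1]
    return int(np.argmin(residuals))                      # class with the smallest residual
```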
3. SUPERVISED SPARSE MATRIX FACTORIZATION
In this work, we are interested in classification of high-dimensional data, where we assume each class of the data can be well approximated by a union of low-dimensional linear subspaces. We require our basis matrix to be over-complete so that the unions of subspaces from different classes do not
overlap. A trivial way of finding such basis matrices is to
learn an atom dictionary for each class independently, which,
however, is suboptimal for classification. Instead, we propose
to train these atom dictionaries jointly such that a new test
sample will be sparsely projected only onto the linear subspace of the same class.
3.1. Formulation
Suppose we are given N training samples {x_i, y_i}_{i=1}^N falling into I classes, where x_i ∈ R^{d×1} is the i-th training sample and y_i ∈ {1, 2, ..., I} is its label. We want to find a composite basis matrix D = [D_1, D_2, ..., D_I], where D_c ∈ R^{d×K} is the atom dictionary that can (sparsely) represent the c-th class well but not the others. We also require the coefficient matrix Z to be sparse column-wise, i.e., each data sample x_i has a sparse representation with respect to the basis matrix D. Suppose the signal of interest is k-sparse, i.e., it has a sparse representation in some domain with at most k nonzero coefficients. We aim to learn the basis matrix D and representation matrix Z such that

arg min_{D,Z} ∥X − DZ∥_F   s.t. ∥z_i∥_0 < k,  ∥d_j∥_2 ≤ 1,   (6)

where z_i is the i-th column of Z, and d_j is the j-th column of the composite basis matrix D. Since we are interested in classification, we hope that the solution Z of Eqn. (6) has the following property:

P_{y_i} z_i = 0,   (7)

where P_{y_i} selects the inhomogeneous representation coefficients of x_i, i.e., the coefficients corresponding to atom dictionaries other than D_{y_i}. Eqn. (7) means that the sparse representation of x_i in terms of D concentrates only on the atom dictionary D_{y_i}. The ideal basis matrix D should be composed of dictionaries where each dictionary D_c can sparsely represent data samples from the c-th class but not others. Therefore, if we find the sparse representation of a given test sample x_t in terms of the trained dictionary D, we can identify the label of x_t based on which atom dictionary accounts for the nonzero coefficients.
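For concreteness, P_{y_i} can be realized as a 0/1 diagonal matrix that zeros out the block of coefficients belonging to the sample's own atom dictionary and keeps all other blocks. The sketch below, with hypothetical class dictionary sizes, shows one such construction.

```python
# A small sketch of the selector P_{y_i} in Eqn. (7).
import numpy as np

def inhomogeneous_selector(class_sizes, y):
    """class_sizes: list of K_c for c = 0..I-1; y: zero-based class index of the sample."""
    mask = np.ones(sum(class_sizes))
    start = sum(class_sizes[:y])
    mask[start:start + class_sizes[y]] = 0.0              # zero out the block of class y
    return np.diag(mask)

# Example: three classes with 4 atoms each and a sample from the second class.
P = inhomogeneous_selector([4, 4, 4], 1)
# P @ z keeps only the coefficients falling on the other classes' dictionaries,
# exactly the part that Eqn. (7) requires to vanish.
```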
3.2. Approximate Optimization
Finding a basis matrix D satisfying Eqn. (6) and Eqn. (7) is no easy task. In this paper, we propose a simple and efficient approximate optimization that integrates Eqn. (6) with Eqn. (7). We replace the ℓ0-norm by an ℓ1-norm regularization to enforce sparsity of the representation, and add an ℓ2-norm penalty on the inhomogeneous coefficients P_{y_i} z_i,

arg min_{D,Z} ∑_{i=1}^{N} ∥x_i − Dz_i∥_2^2 + λ∥z_i∥_1 + γ∥P_{y_i} z_i∥_2^2   s.t. ∥d_j∥_2 ≤ 1.   (8)

Eqn. (8) demands that the learned dictionary sparsely represent a given data sample while requiring its inhomogeneous representation to be small in the ℓ2-norm sense. Expanding the ℓ2 reconstruction error term, the problem is equivalent to

arg min_{D,Z} ∑_{i=1}^{N} z_i^⊤ A z_i + b^⊤ z_i + λ∥z_i∥_1   s.t. ∥d_j∥_2 ≤ 1,   (9)

where

A = D^⊤ D + γ P_{y_i}^⊤ P_{y_i},   (10)
b = −2 D^⊤ x_i,   (11)

which is ready to be solved by many existing sparse coding packages by alternately optimizing over the sparse representations Z and the dictionary D [21].
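A minimal sketch of this alternating scheme is given below, assuming an ISTA solver for the per-sample subproblem in Eqn. (9) (with A and b from Eqns. (10)-(11)) and a least-squares dictionary update followed by atom renormalization. These concrete solver choices are illustrative and not necessarily those used in the paper.

```python
# A hedged sketch of the alternating optimization for Eqns. (8)-(11).
import numpy as np

def solve_code(D, x, P, lam=0.1, gamma=1.0, n_iter=200):
    """Minimize z^T A z + b^T z + lam*||z||_1 for one sample with its selector P."""
    A = D.T @ D + gamma * (P.T @ P)                       # Eqn. (10)
    b = -2.0 * (D.T @ x)                                  # Eqn. (11)
    L = 2.0 * np.linalg.norm(A, 2)                        # Lipschitz constant of the smooth part
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = 2.0 * (A @ z) + b                             # gradient of z^T A z + b^T z
        z = np.sign(z - g / L) * np.maximum(np.abs(z - g / L) - lam / L, 0.0)
    return z

def update_dictionary(X, Z):
    """Least-squares refit of D with every atom projected back onto the unit ball."""
    K = Z.shape[0]
    D = X @ Z.T @ np.linalg.pinv(Z @ Z.T + 1e-6 * np.eye(K))
    return D / np.maximum(np.linalg.norm(D, axis=0, keepdims=True), 1.0)

# Outer loop (schematically): code every training sample with its own P_{y_i},
# stack the codes into Z, refit D, and repeat until the objective stops decreasing.
```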
3.3. Classification by Regularized Sparse Projection
Given the learned composite dictionary D = [D_1, D_2, ..., D_I] and a new test sample x_t, we want to find its most efficient sparse projection in order to identify its label. To be consistent with the training phase, we would need to add an ℓ2 regularization term on the inhomogeneous representation coefficients, which, however, are unknown since the label of x_t is not given. The training process in Eqn. (8) suggests that the data sample x_t should have the lowest sparse projection error on the corresponding atom dictionary if the correct label is identified. Therefore, we first find the sparse representation of x_t for each class,
z_t^c = arg min_z ∥x_t − Dz∥_2^2 + λ∥z∥_1 + γ∥P_c z∥_2^2.   (12)

We then assign the class label by finding the lowest sparse projection residual error,

y_t = arg min_c ∥Dδ_c(z_t^c) − x_t∥_2^2.   (13)
In practice, we find that simply using the sparse representation based classification (SRC) algorithm described in Section 2.2, which neglects the ℓ2 regularization terms, actually performs on par with the above classification method. Therefore, we use
the SRC algorithm for classification in all our experiments,
which is fast due to the compactness of our learned dictionary.
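For illustration, the ℓ2 penalty in Eqn. (12) can be folded into an augmented lasso problem, since ∥x_t − Dz∥_2^2 + γ∥P_c z∥_2^2 = ∥[x_t; 0] − [D; √γ P_c] z∥_2^2, so each class-specific code can be obtained from any standard ℓ1 solver. The sketch below implements Eqns. (12)-(13) this way; the ISTA routine, λ, and γ are illustrative assumptions rather than the paper's actual implementation.

```python
# A sketch of the test-time rule in Eqns. (12)-(13), using the augmented-lasso
# reformulation described above; lam and gamma are illustrative assumptions.
import numpy as np

def lasso_ista(D, x, lam, n_iter=200):
    """Minimize 0.5*||x - D z||_2^2 + lam*||z||_1 with ISTA."""
    L = np.linalg.norm(D, 2) ** 2
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = D.T @ (D @ z - x)
        z = np.sign(z - g / L) * np.maximum(np.abs(z - g / L) - lam / L, 0.0)
    return z

def classify(class_dicts, x_t, lam=0.05, gamma=1.0):
    """class_dicts: list [D_1, ..., D_I]; returns the zero-based label of x_t."""
    D = np.hstack(class_dicts)
    sizes = [Dc.shape[1] for Dc in class_dicts]
    total = sum(sizes)
    residuals = []
    for c, Dc in enumerate(class_dicts):
        start = sum(sizes[:c])
        mask = np.ones(total)
        mask[start:start + sizes[c]] = 0.0                # P_c zeros the c-th block
        D_aug = np.vstack([D, np.sqrt(gamma) * np.diag(mask)])
        x_aug = np.concatenate([x_t, np.zeros(total)])
        z = lasso_ista(D_aug, x_aug, lam)                 # class-specific code z_t^c, Eqn. (12)
        zc = z[start:start + sizes[c]]
        residuals.append(np.linalg.norm(Dc @ zc - x_t))   # projection residual, Eqn. (13)
    return int(np.argmin(residuals))
```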
4. EXPERIMENTAL ANALYSIS
In this section, we present our experimental analysis on three datasets: Brodatz textures, Extended Yale-B [23], and MNIST [24]. For texture analysis, we are interested in learning dictionaries for visual textures with different statistics, and we show that supervised training substantially improves classification. For face recognition and digit recognition, we show how our supervised learning process helps with learning discriminant representation dictionaries for each class. The learned composite dictionary is compact and thus scalable to large-dataset scenarios in real applications. For experimental evaluation, we compare mainly with two algorithms:
1. “Random”: the atom dictionary for each class is generated by randomly sampling raw data samples from
that class. If we use all the data, the algorithm is essentially the SRC algorithm in [14].
2. “Unsupervised”: the atom dictionary for each class is
trained independently using sparse coding.
For all the experiments, we initialize our dictionary with a
Gaussian random matrix. As we will see, our algorithm correctly finds and groups the atom dictionary for each class.
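A small sketch of these two setup steps is given below, under assumed data shapes and an assumed K: the “Random” baseline draws K raw training samples per class as atoms, and the trained dictionaries start from a column-normalized Gaussian random matrix.

```python
# Hedged sketches of the "Random" baseline dictionary and the Gaussian
# random initialization used for dictionary training; K and the data
# shapes are illustrative assumptions.
import numpy as np

def random_atom_dictionary(X_class, K, seed=0):
    """Sample K columns of the class's d x N_c training matrix as atoms."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(X_class.shape[1], size=K, replace=False)
    D = X_class[:, idx].astype(float)
    return D / np.linalg.norm(D, axis=0, keepdims=True)

def gaussian_init(d, K, seed=0):
    """Column-normalized Gaussian random initialization for a d x K atom dictionary."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((d, K))
    return D / np.linalg.norm(D, axis=0, keepdims=True)
```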
4.1. Texture Analysis

We select three textures from the Brodatz texture dataset for training the composite basis matrix. The top row of Figure 1 shows the three example texture images, which are quite different in appearance. From each texture image, we randomly sample 10,000 image patches of size 15 × 15, half of which are used for training and the rest for testing. We use two atom dictionary sizes, K = 500 and K = 1000; therefore, we have 15,000 image patches for training a basis matrix with 1,500 atoms and one with 3,000 atoms. With the Gaussian random matrix as initialization, our supervised optimization process gradually learns the three atom dictionaries for the three textures. As shown in the bottom row of Figure 1, it is evident that the learned atom dictionaries nicely capture the statistics of each texture. For example, texture 1 is composed of large structures of the bricks and fine textures on the brick surface. The learned atom dictionary captures the large structures by some translated vertical and horizontal edges, and the fine textures with some DCT-like basis atoms.

Fig. 1. Top row: three textures selected from the Brodatz dataset for analysis; bottom row: trained dictionaries for each texture.

To evaluate the quality of our trained composite basis matrix, we compare with “Random” and “Unsupervised”. Table 1 lists the recognition accuracies for each method; our supervised training outperforms the other two significantly for both K = 500 and K = 1000.

Table 1. Classification accuracy (%) comparison using random atom dictionaries, unsupervised atom dictionaries, and supervised atom dictionaries, with K = 500 and K = 1000 for each atom dictionary.

Algorithm     | K = 500 | K = 1000
Random        | 88.25   | 90.49
Unsupervised  | 91.74   | 92.86
Supervised    | 95.00   | 95.90

4.2. Face Recognition
For face recognition, we use the Extended Yale Face Database
B for evaluation. The database consists of 38 individuals and
each with 64 near-frontal face images. The cropped face images were captured under various laboratory-controlled lighting conditions and are simply normalized to 32 × 32 pixels in all our experiments. We randomly sample 50 face images from each individual for training, and the rest are used for testing.
For training the composite basis matrix, we try three different atom dictionary sizes K = 10, 20, 30 for both supervised training and unsupervised training. Table 2 lists the
classification results on this dataset. Our supervised training
outperforms both random sampling and unsupervised training notably. For random sampling, the best result is achieved
by using all the 50 training images for each individual. As
we can see, our supervised scheme using K = 10 already
achieves the same performance as using all the raw images
directly. The performance improves to 98.44% for K = 20
due to a more flexible model capturing more face image variations, but drops back to 98.05% for K = 30, probably due
to overfitting with the small training dataset. Also, it should be noted that our supervised training algorithm outperforms unsupervised training significantly when K = 10, because in this constrained model the supervised training better identifies the discriminant subspace for each class.
Table 2. Face recognition accuracy (%) on the Extended Yale Face Database B. We compare with atom dictionaries generated by random sampling from the training data and by unsupervised training.

K  | Random | Unsuper. | Supervised
10 | 86.96  | 93.39    | 98.05
20 | 94.94  | 96.88    | 98.44
30 | 97.28  | 97.85    | 98.05
40 | 97.47  | -        | -
50 | 98.05  | -        | -
Figure 2 (a) shows the trained atom dictionaries for K =
10. We only show the dictionaries for 10 subjects out of 38
due to space limitation, where each row corresponds to one
subject. As theoretically shown by Basri and Jacobs [25],
only nine (properly chosen) basis illuminations are sufficient
to generate basis images that span images of the object under all possible lighting conditions to a good approximation.
Our trained dictionary aligns with this theoretical finding to a
large extent. As shown in the figure, each atom dictionary is
composed of one or two mean-like faces for the subject, while the remaining part-based basis atoms compensate for the lighting variations, since lighting accounts for most of the face variations in this database.
4.3. Digit Recognition
For handwritten digit recognition, we apply our algorithm to the large-scale MNIST dataset [24]. The database consists of 70,000 handwritten digits, of which 60,000 are used for training and 10,000 for testing. The digits have been size-normalized and centered in a fixed-size image. We choose three easily confused digits, 4, 7, and 9, for our experiments. For dictionary training, we again try different sizes of the atom dictionary, and the results are listed in Table 3. In all cases compared, supervised dictionary training outperforms unsupervised dictionary training, and significantly outperforms random sampling for the same dictionary sizes. Even though it is much more compact and efficient, our algorithm can perform better than sparse representation based classification using the whole training set.

Table 3. Classification accuracy (%) for different sizes of atom dictionaries. We compare with random sampling and unsupervised dictionary training.

Size K | Random | Unsuper. | Supervised
100    | 94.50  | 98.01    | 98.24
200    | 95.86  | 98.24    | 98.54
300    | 97.02  | 98.57    | 98.71
2000   | 97.95  | -        | -
All    | 98.54  | -        | -

Fig. 2. (a) The trained dictionaries for the Extended Yale Face Database B; each row is one atom dictionary for one subject. (b) The trained dictionaries for digits 4, 7, and 9 in MNIST; starting from a Gaussian random matrix, the algorithm correctly learns the atom dictionaries for each digit.

In Figure 2 (b), we show the discriminatively learned atom dictionaries for digits 4, 7, and 9. Again, starting from a random Gaussian matrix, our learning algorithm correctly finds the atom dictionaries for each digit.
5. CONCLUSION AND FUTURE WORK
In this paper, we propose a supervised sparse matrix factorization model, where the basis matrix is composed of atom
dictionaries from each class and the coefficient (representation) matrix is sparse, indicating the identities of the data samples. The learning process enforces each data sample to be
represented by a low-dimensional subspace of the same class,
which is found by solving an ℓ1 -norm regularization problem.
Our approach is based on the observation that many natural
high-dimensional signals belonging to the same class tend
to lie in the same low-dimensional subspace. Although the
obtained sparse representations can be combined with subsequent discriminant models, such as SVM, we show that
these data representations can be used directly for classification. Experiments on three datasets, including texture images, a face database, and handwritten digits, show
promising results of the supervised model with a simple sparsity assumption. For future work, we will investigate structured sparse models, e.g., group sparsity, in our reconstructive
representation model for better discriminative analysis.
6. REFERENCES
[1] I. Jolliffe, Principal Component Analysis, Springer-Verlag, 1986.
[2] M. Turk and A. Pentland, “Face recognition using
eigenfaces,” in IEEE Conference on Computer Vision
and Pattern Recognition, 1991, pp. 586–591.
[3] D. Lee and H. Seung, “Learning the parts of objects by
nonnegative matrix factorization,” Nature, 1999.
[4] R. Duda, P. Hart, and D. Stork, Pattern classification,
Wiley-Interscience, 2 edition, 2000.
[5] A. Martinez and A. Kak, “PCA versus LDA,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 2, pp. 228–233, 2001.
[6] P. Belhumeur, J. Hespanha, and D. Kriegman, “Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711–720, 1997.
[7] S. Yan, D. Xu, B. Zhang, H. Zhang, Q. Yang, and S. Lin, “Graph embedding and extensions: a general graph framework for dimensionality reduction,” IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 40–51, 2007.

[8] A. Bar-Hillel, T. Hertz, N. Shental, and D. Weinshall, “Learning a Mahalanobis metric from equivalence constraints,” Journal of Machine Learning Research, vol. 6, pp. 937–965, 2005.

[9] Ke Huang and Selin Aviyente, “Sparse representation for signal classification,” in Advances in Neural Information Processing Systems (NIPS), 2006.

[10] D. L. Donoho, “For most large underdetermined systems of linear equations, the minimal ℓ1-norm solution is also the sparsest solution,” Communications on Pure and Applied Mathematics, vol. 59, no. 6, pp. 797–829, 2006.

[11] M. Elad and M. Aharon, “Image denoising via sparse and redundant representations over learned dictionaries,” IEEE Transactions on Image Processing, vol. 15, pp. 3736–3745, 2006.

[12] Jianchao Yang, John Wright, Thomas Huang, and Yi Ma, “Image super-resolution as sparse representation of raw image patches,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008, pp. 1–8.

[13] Bin Cheng, Jianchao Yang, Shuicheng Yan, Yun Fu, and Thomas Huang, “Learning with ℓ1-graph for image analysis,” IEEE Transactions on Image Processing, vol. 19, no. 4, pp. 858–866, 2010.

[14] John Wright, Allen Yang, Arvind Ganesh, Shankar Sastry, and Yi Ma, “Robust face recognition via sparse representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), February 2009.

[15] David M. Bradley and J. Andrew Bagnell, “Differential sparse coding,” in NIPS, 2008.

[16] Jianchao Yang, Kai Yu, Yihong Gong, and Thomas Huang, “Linear spatial pyramid matching using sparse coding for image classification,” in CVPR, 2009.

[17] Jianchao Yang, Kai Yu, and Thomas Huang, “Supervised translation-invariant sparse coding,” in CVPR, 2010.

[18] Julien Mairal, Francis Bach, Jean Ponce, and Guillermo Sapiro, “Supervised dictionary learning,” in NIPS, 2008.

[19] Fernando Rodriguez and Guillermo Sapiro, “Learning discriminative and reconstructive non-parametric dictionaries,” 2007.

[20] Michal Aharon, Michael Elad, and Alfred Bruckstein, “K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Transactions on Signal Processing, vol. 54, no. 11, pp. 4311–4322, 2006.

[21] Honglak Lee, Alexis Battle, Rajat Raina, and Andrew Y. Ng, “Efficient sparse coding algorithms,” in Advances in Neural Information Processing Systems (NIPS), 2007.

[22] Julien Mairal, Francis Bach, Jean Ponce, and Guillermo Sapiro, “Online learning for matrix factorization and sparse coding,” Journal of Machine Learning Research, vol. 11, pp. 19–60, 2010.

[23] A. S. Georghiades, P. N. Belhumeur, and D. J. Kriegman, “From few to many: Illumination cone models for face recognition under variable lighting and pose,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 643–660, 2001.

[24] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.

[25] R. Basri and D. Jacobs, “Lambertian reflectance and linear subspaces,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 2, pp. 218–233, 2003.