Homework 7

EE378B Inference, Estimation, and Information Processing
Non-negative Matrix Factorization
Andrea Montanari
Lecture 13-14 - Due on 3-6-2017
Paper submissions are preferred. You can submit the homework in class, or in the box on the second
floor of Packard (there is a box for EE378B). You can use any programming language. Submit your code as
part of the homework. Tables and figures are encouraged to help illustrate your results. There are several
sub-questions for each question. Make sure that it is easy for readers to figure out which sub-question you
are answering.
This homework is about automatic recognition of handwritten digits. The dataset to be used is the
MNIST test dataset in the file mnist_test.csv, which you can find at the following URL:
http://web.stanford.edu/class/ee378b/homework/mnist_test.csv
The data consist of n = 10, 000 images. An image is a 28 × 28 gray-level array. Each image is stored as a
line of the file mnist_test.csv, in the format
label, pix(1,1), pix(1,2), pix(1,3), . . . ,
where label is a label corresponding to the digit represented by the image, and pix(s,t) is the pixel intensity
at row s, column t (an integer in the range {0, 1, . . . , 255}).
Hereafter, we will denote the i-th image by the vector x_i ∈ R^d, d = 28^2 = 784.
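For concreteness, a minimal loading sketch in Python (assuming NumPy is available and the file has been downloaded to the working directory; the variable names below are illustrative, not required) could be:

```python
import numpy as np

# Each row of the CSV is: label, pix(1,1), pix(1,2), ..., pix(28,28).
# Assumes the file has no header row.
raw = np.loadtxt("mnist_test.csv", delimiter=",")

labels = raw[:, 0].astype(int)   # digit labels, one per image
X = raw[:, 1:].astype(float)     # n x d matrix of pixel intensities, d = 784

print(X.shape)                   # expected: (10000, 784)
```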
(1) A possible approach to this task is to use principal component analysis. Explicitly, compute the empirical
covariance Σ ∈ R^{d×d} by
\[
\Sigma \equiv \frac{1}{n} \sum_{i=1}^{n} x_i x_i^{\mathsf{T}}, \tag{1}
\]
and let U ∈ R^{d×r} be the matrix whose columns are given by the first r = 10 eigenvectors of Σ (corresponding
to the largest eigenvalues).
Plot the r principal components (columns of U) as 28 × 28 pixel images. (Scale them so as to make them
visible!)
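One possible way to carry out this computation is sketched below (a sketch, not the required approach; it assumes the matrix X loaded in the earlier snippet):

```python
import numpy as np
import matplotlib.pyplot as plt

n, d = X.shape
Sigma = (X.T @ X) / n                 # empirical covariance as in Eq. (1)

# Eigen-decomposition of the symmetric matrix Sigma (eigenvalues ascending).
eigvals, eigvecs = np.linalg.eigh(Sigma)
U = eigvecs[:, ::-1][:, :10]          # first r = 10 eigenvectors (largest eigenvalues)

# Plot each principal component as a 28 x 28 image; imshow rescales to the
# min/max of each component, which keeps the images visible.
fig, axes = plt.subplots(2, 5, figsize=(10, 4))
for k, ax in enumerate(axes.flat):
    ax.imshow(U[:, k].reshape(28, 28), cmap="gray")
    ax.set_title(f"u{k + 1}")
    ax.axis("off")
plt.show()
```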
(2) A more interpretable representation is obtained by non-negative matrix factorization. The data can be
represented by a matrix X ∈ R^{n×d}, whose i-th row represents the i-th image. We suggest rescaling the rows
so that X1 = 1, and implementing the original algorithm of (Lee, Seung, 1999), which attempts to maximize
the following cost over W ∈ R^{n×r} and H ∈ R^{r×d}:
\[
F(W, H) = \sum_{i=1}^{n} \sum_{j=1}^{d} \Big[ X_{ij} \log (WH)_{ij} - (WH)_{ij} \Big]. \tag{2}
\]
Apply this algorithm with r ∈ {3, 6, 10} and, for each of these cases, plot the resulting archetypes h_1, . . . , h_r
(the rows of H) as 28 × 28 pixel images. (Scale them so as to make them visible!)
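One common form of the multiplicative updates for the objective (2) is sketched below. This is an illustration under specific assumptions (rows of X rescaled to sum to one, a small constant eps added to avoid division by zero, uniform random initialization), not a prescribed implementation:

```python
import numpy as np

def nmf_poisson(X, r, n_iter=200, eps=1e-12, seed=0):
    """Multiplicative updates for the objective in Eq. (2), in one common
    variant of the 1999 Lee-Seung scheme (details such as the normalization
    step and the stopping rule may be adapted)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.uniform(size=(n, r))
    H = rng.uniform(size=(r, d))
    for _ in range(n_iter):
        R = X / (W @ H + eps)                      # ratio X_ij / (WH)_ij
        W *= R @ H.T                               # W_ia <- W_ia * sum_j X_ij/(WH)_ij * H_aj
        W /= W.sum(axis=0, keepdims=True) + eps    # normalize the columns of W
        R = X / (W @ H + eps)
        H *= W.T @ R                               # H_aj <- H_aj * sum_i W_ia * X_ij/(WH)_ij
    return W, H

# Example usage (r = 10); the archetypes are the rows of H,
# to be plotted as 28 x 28 images as in part (1).
W, H = nmf_poisson(X, r=10)
```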
(3) Consider the case r = 10 and pick a random image x_i for each digit in {0, 1, . . . , 9}:
• Plot the coefficients of each image x_i in terms of the principal components u_1, . . . , u_r computed at
point (1) above. For image i, such a representation is given by the projection
\[
y_i = U^{\mathsf{T}} x_i. \tag{3}
\]
• Plot the NMF coefficients of each image x_i. These are given by the corresponding row in the weight
matrix W.
Comment on the differences.
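A possible sketch for producing both sets of coefficients is given below (it assumes X, labels, U, and the matrix W returned by the NMF routine with r = 10 from the earlier snippets are in memory; names are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# One randomly chosen image index per digit 0, ..., 9.
picks = [rng.choice(np.where(labels == digit)[0]) for digit in range(10)]

fig, axes = plt.subplots(10, 2, figsize=(8, 20))
for row, i in enumerate(picks):
    y_i = U.T @ X[i]                 # PCA coefficients, Eq. (3)
    w_i = W[i]                       # NMF coefficients: i-th row of W
    axes[row, 0].bar(range(1, 11), y_i)
    axes[row, 0].set_title(f"digit {labels[i]}: PCA coefficients")
    axes[row, 1].bar(range(1, 11), w_i)
    axes[row, 1].set_title(f"digit {labels[i]}: NMF coefficients")
plt.tight_layout()
plt.show()
```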