Case Study 4: Collaborative Filtering
Collaborative Filtering, Matrix Completion, Alternating Least Squares
Machine Learning/Statistics for Big Data, CSE599C1/STAT592, University of Washington
Carlos Guestrin, February 28th, 2013
©Carlos Guestrin 2013

Collaborative Filtering

- Goal: find movies of interest to a user, based on the movies watched by that user and by others
- Methods: matrix factorization, GraphLab

[Figure: movie posters (Women on the Verge of a Nervous Breakdown, The Celebration, City of God, Wild Strawberries, La Dolce Vita) with the prompt "What do I recommend???"]

Cold-Start Problem

- Challenge: the cold-start problem (a new movie or a new user, with no ratings yet)
- Methods: use features of the movie/user

Netflix Prize

- Given 100 million ratings on a scale of 1 to 5, predict 3 million held-out ratings to the highest accuracy
- 17,770 total movies and 480,189 total users, hence over 8 billion entries in the full movie-by-user matrix
- How to fill in the blanks?
- Figures from Ben Recht

Matrix Completion Problem

- $X_{ij}$ is known for black cells, unknown for white cells
- Rows index movies; columns index users
- How do we fill in the missing data?

Interpreting Low-Rank Matrix Completion (aka Matrix Factorization)

- Approximate $X = L R'$: a low-rank factorization with a factor vector $L_u$ per user and $R_v$ per movie, predicting $\hat{r}_{uv} = L_u \cdot R_v$

Matrix Completion via Rank Minimization

- Given observed values: the ratings $r_{uv}$ in the training data $X$
- Find matrix: $\Theta$
- Such that: $\Theta_{uv} = r_{uv}$ for all observed entries
- But… infinitely many matrices satisfy these constraints
- Introduce bias: prefer the simplest explanation, $\min_\Theta \operatorname{rank}(\Theta)$ subject to the constraints
- Two issues: rank minimization is NP-hard, and exact equality ignores noise in the ratings

Approximate Matrix Completion

- Minimize squared error over the observed entries (other loss functions are possible)
- Choose rank $k$: write $\Theta = L R'$ with $k$-dimensional factors
- Optimization problem:

$$\min_{L,R} \sum_{(u,v,r_{uv}) \in X : r_{uv} \neq ?} (L_u \cdot R_v - r_{uv})^2$$

Coordinate Descent for Matrix Factorization

$$\min_{L,R} \sum_{(u,v,r_{uv}) \in X : r_{uv} \neq ?} (L_u \cdot R_v - r_{uv})^2$$

- Fix movie factors, optimize for user factors
- First observation: with $R$ fixed, the objective decomposes over users, so each $L_u$ can be optimized independently

Minimizing Over User Factors

- For each user $u$, with $V_u$ the set of movies rated by $u$:

$$\min_{L_u} \sum_{v \in V_u} (L_u \cdot R_v - r_{uv})^2$$

- In matrix form: stacking the factors $R_v$, $v \in V_u$, into a matrix $R_{V_u}$ and the corresponding ratings into a vector $r_u$, this is $\min_{L_u} \|R_{V_u} L_u - r_u\|^2$
- Second observation: this is ordinary least squares; solve by the normal equations, a $k \times k$ linear system per user

Coordinate Descent for Matrix Factorization: Alternating Least-Squares

$$\min_{L,R} \sum_{(u,v,r_{uv}) \in X : r_{uv} \neq ?} (L_u \cdot R_v - r_{uv})^2$$

- Fix movie factors, optimize for user factors: independent least-squares problems over users, $\min_{L_u} \sum_{v \in V_u} (L_u \cdot R_v - r_{uv})^2$
- Fix user factors, optimize for movie factors: independent least-squares problems over movies, $\min_{R_v} \sum_{u \in U_v} (L_u \cdot R_v - r_{uv})^2$
- System may be underdetermined: e.g., a user with fewer than $k$ ratings yields a singular least-squares system, so regularization is needed
- Converges to a local optimum of the jointly non-convex objective (a code sketch follows this deck's summary)

Effect of Regularization

$$\min_{L,R} \sum_{(u,v,r_{uv}) \in X : r_{uv} \neq ?} (L_u \cdot R_v - r_{uv})^2 + \lambda_u \|L\|_F^2 + \lambda_v \|R\|_F^2$$

- Ridge penalties on the factors of $X = L R'$ keep each least-squares subproblem well posed and control overfitting

What you need to know…

- Matrix completion problem for collaborative filtering
- Underdetermined problem -> low-rank approximation
- Rank minimization is NP-hard
- Minimize least-squares prediction error on the known values, for a given rank of the matrix; must use regularization
- Coordinate descent algorithm = "Alternating Least Squares"
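To make the alternating updates concrete, here is a minimal NumPy sketch of regularized ALS, under some assumptions not in the slides: ratings arrive as a dict {(u, v): r_uv} of observed entries only, and a single shared penalty lam plays the role of both λ_u and λ_v. The names (als, solve_row, and so on) are illustrative, not from the lecture.

```python
import numpy as np

def als(ratings, n_users, n_movies, k=10, lam=0.1, n_iters=20, seed=0):
    """ALS for: sum over observed (u,v) of (L_u . R_v - r_uv)^2 + lam(||L||_F^2 + ||R||_F^2)."""
    rng = np.random.default_rng(seed)
    L = rng.normal(scale=0.1, size=(n_users, k))    # user factors L_u
    R = rng.normal(scale=0.1, size=(n_movies, k))   # movie factors R_v

    # Group observed ratings by user and by movie for the two half-steps.
    by_user, by_movie = {}, {}
    for (u, v), r in ratings.items():
        by_user.setdefault(u, []).append((v, r))
        by_movie.setdefault(v, []).append((u, r))

    def solve_row(F, r):
        # Regularized least squares min_x ||F x - r||^2 + lam ||x||^2,
        # solved via the k x k normal equations (F'F + lam I) x = F'r.
        A = F.T @ F + lam * np.eye(F.shape[1])
        return np.linalg.solve(A, F.T @ r)

    for _ in range(n_iters):
        for u, obs in by_user.items():      # each user factor is independent
            vs, rs = zip(*obs)
            L[u] = solve_row(R[list(vs)], np.asarray(rs))
        for v, obs in by_movie.items():     # each movie factor is independent
            us, rs = zip(*obs)
            R[v] = solve_row(L[list(us)], np.asarray(rs))
    return L, R
```

Note the role of the ridge term: without `lam * np.eye(...)`, a user with fewer than k ratings makes the normal equations singular and `np.linalg.solve` fails, which is exactly the underdetermined case flagged on the slides.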
Case Study 4: Collaborative Filtering
SGD for Matrix Completion, Matrix-norm Minimization
Machine Learning/Statistics for Big Data, CSE599C1/STAT592, University of Washington
Carlos Guestrin, March 7th, 2013
©Carlos Guestrin 2013

Stochastic Gradient Descent

$$\min_{L,R} \frac{1}{2} \sum_{r_{uv}} (L_u \cdot R_v - r_{uv})^2 + \frac{\lambda_u}{2} \|L\|_F^2 + \frac{\lambda_v}{2} \|R\|_F^2$$

- Observe one rating at a time: $r_{uv}$
- Gradient observing $r_{uv}$: with error $\epsilon_t = L_u \cdot R_v - r_{uv}$, the gradient with respect to $L_u$ is $\epsilon_t R_v + \lambda_u L_u$, and with respect to $R_v$ it is $\epsilon_t L_u + \lambda_v R_v$
- Updates:

$$L_u^{(t+1)} \leftarrow (1 - \eta_t \lambda_u) L_u^{(t)} - \eta_t \epsilon_t R_v^{(t)}, \qquad R_v^{(t+1)} \leftarrow (1 - \eta_t \lambda_v) R_v^{(t)} - \eta_t \epsilon_t L_u^{(t)}$$

Local Optima v. Global Optima

- We are solving:

$$\min_{L,R} \sum_{r_{uv}} (L_u \cdot R_v - r_{uv})^2 + \lambda_u \|L\|_F^2 + \lambda_v \|R\|_F^2$$

- We (kind of) wanted to solve: $\min_\Theta \operatorname{rank}(\Theta)$ subject to $\Theta_{uv} = r_{uv}$ for the observed entries
- Which is NP-hard…
- How do these things relate???

Eigenvalue Decompositions for PSD Matrices

- Given a (square) symmetric positive semidefinite matrix: $\Theta = U \Lambda U'$ with eigenvalues $\lambda_i \geq 0$
- Thus rank is: the number of nonzero eigenvalues
- Approximation: keep only the $k$ largest eigenvalues, zeroing out the rest
- Property of trace: $\operatorname{tr}(\Theta) = \sum_i \lambda_i$, so for PSD matrices the trace is the $\ell_1$ norm of the eigenvalue vector
- Thus, approximate rank minimization by: trace minimization, a convex surrogate that encourages many zero eigenvalues

Generalizing the Trace Trick

- Non-square matrices ain't got no trace
- For (square) positive semidefinite matrices, matrix factorization: $\Theta = L L'$, and $\operatorname{tr}(\Theta) = \|L\|_F^2$
- For rectangular matrices, singular value decomposition: $\Theta = U \Sigma V'$ with singular values $\sigma_i \geq 0$
- Nuclear norm: $\|\Theta\|_* = \sum_i \sigma_i(\Theta)$, the sum of the singular values (for PSD matrices it coincides with the trace)

Nuclear Norm Minimization

- Optimization problem: $\min_\Theta \|\Theta\|_*$ subject to $\Theta_{uv} = r_{uv}$ for all observed entries
- Possible to relax equality constraints, trading fit for norm: $\min_\Theta \sum_{r_{uv}} (\Theta_{uv} - r_{uv})^2 + \lambda \|\Theta\|_*$
- Both are convex problems! (solved by semidefinite programming)

Analysis of Nuclear Norm

- Nuclear norm minimization is a convex relaxation of the rank minimization problem:

$$\min_\Theta \operatorname{rank}(\Theta) \ \text{ s.t. } \ r_{uv} = \Theta_{uv},\ \forall r_{uv} \in X, r_{uv} \neq ? \qquad \longrightarrow \qquad \min_\Theta \|\Theta\|_* \ \text{ s.t. } \ r_{uv} = \Theta_{uv},\ \forall r_{uv} \in X, r_{uv} \neq ?$$

- Theorem [Candès, Recht '08]: if there is a true matrix of rank $k$, and we observe at least $C\,k\,n^{1.2}\log n$ random entries of the true matrix, then the true matrix is recovered exactly, with high probability, by convex nuclear norm minimization
- Under certain conditions (incoherence of the true matrix's singular vectors)

Nuclear Norm Minimization versus Direct (Bilinear) Low Rank Solutions

- Nuclear norm minimization:

$$\min_\Theta \sum_{r_{uv}} (\Theta_{uv} - r_{uv})^2 + \lambda \|\Theta\|_*$$

- Annoying because: $\Theta$ has one parameter per matrix entry, and generic semidefinite programming does not scale to Netflix-sized problems
- Instead:

$$\min_{L,R} \sum_{r_{uv}} (L_u \cdot R_v - r_{uv})^2 + \lambda_u \|L\|_F^2 + \lambda_v \|R\|_F^2$$

- Annoying because: the objective is non-convex in $(L, R)$, so we may only find a local optimum
- But:

$$\|\Theta\|_* = \inf_{L,R} \left\{ \frac{1}{2}\|L\|_F^2 + \frac{1}{2}\|R\|_F^2 \ : \ \Theta = L R' \right\}$$

- So, taking $\lambda_u = \lambda_v = \lambda/2$, the bilinear problem is the nuclear-norm problem restricted to factorizations $\Theta = L R'$ of the chosen rank
- And local minima of the bilinear problem correspond to global minima of the convex problem
- Under certain conditions [Burer, Monteiro '04] (a small numerical check of the variational formula appears at the end of this section)

What you need to know…

- Stochastic gradient descent for matrix factorization
- Norm minimization as a convex relaxation of rank minimization: trace norm for PSD matrices, nuclear norm in general
- Intuitive relationship between nuclear norm minimization and direct (bilinear) minimization

Case Study 4: Collaborative Filtering
Nonnegative Matrix Factorization, Projected Gradient
Machine Learning/Statistics for Big Data, CSE599C1/STAT592, University of Washington
Carlos Guestrin, March 7th, 2013
©Carlos Guestrin 2013

Matrix factorization solutions can be unintuitive…

- Many, many, many applications of matrix factorization
- E.g., in text data, can do topic modeling (alternative to LDA): $X = L R'$
- Would like: interpretable factors, e.g., topics as nonnegative weights over words
- But… unconstrained least-squares factors can contain negative entries, which have no natural interpretation

Nonnegative Matrix Factorization

- $X = L R'$, just like before, but with nonnegativity constraints:

$$\min_{L \geq 0,\ R \geq 0} \sum_{r_{uv}} (L_u \cdot R_v - r_{uv})^2 + \lambda_u \|L\|_F^2 + \lambda_v \|R\|_F^2$$

- Constrained optimization problem
- Many, many, many, many solution methods… we'll check out a simple one

Projected Gradient

- Standard optimization:
  - Want to minimize: $\min_\Theta f(\Theta)$
  - Use gradient updates: $\Theta^{(t+1)} \leftarrow \Theta^{(t)} - \eta_t \nabla f(\Theta^{(t)})$
- Constrained optimization:
  - Given a convex set $C$ of feasible solutions
  - Want to find minima within $C$: $\min_{\Theta \in C} f(\Theta)$
- Projected gradient:
  - Take a gradient step (ignoring constraints): $\tilde{\Theta}^{(t+1)} \leftarrow \Theta^{(t)} - \eta_t \nabla f(\Theta^{(t)})$
  - Projection into feasible set: $\Theta^{(t+1)} \leftarrow \Pi_C(\tilde{\Theta}^{(t+1)}) = \arg\min_{\Theta \in C} \|\Theta - \tilde{\Theta}^{(t+1)}\|$

Projected Stochastic Gradient Descent for Nonnegative Matrix Factorization

$$\min_{L \geq 0,\ R \geq 0} \frac{1}{2} \sum_{r_{uv}} (L_u \cdot R_v - r_{uv})^2 + \frac{\lambda_u}{2} \|L\|_F^2 + \frac{\lambda_v}{2} \|R\|_F^2$$

- Gradient step observing $r_{uv}$, ignoring constraints (with error $\epsilon_t = L_u^{(t)} \cdot R_v^{(t)} - r_{uv}$):

$$\begin{bmatrix} \tilde{L}_u^{(t+1)} \\ \tilde{R}_v^{(t+1)} \end{bmatrix} \leftarrow \begin{bmatrix} (1 - \eta_t \lambda_u) L_u^{(t)} - \eta_t \epsilon_t R_v^{(t)} \\ (1 - \eta_t \lambda_v) R_v^{(t)} - \eta_t \epsilon_t L_u^{(t)} \end{bmatrix}$$

- Convex set: the nonnegative orthant, $C = \{(L, R) : L \geq 0, R \geq 0\}$
- Projection step: clip negative entries to zero, $L_u^{(t+1)} = \max(\tilde{L}_u^{(t+1)}, 0)$ and $R_v^{(t+1)} = \max(\tilde{R}_v^{(t+1)}, 0)$ elementwise (see the sketch after this deck)

What you need to know…

- In many applications, want factors to be nonnegative
- Corresponds to a constrained optimization problem
- Many possible approaches to solve, e.g., projected gradient
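As a companion to the earlier ALS sketch, here is a minimal sketch of one projected stochastic-gradient step for NMF, implementing the gradient-then-clip update above. The names (projected_sgd_step, eta, lam_u, lam_v) are illustrative, not from the lecture; dropping the two np.maximum projections recovers exactly the unconstrained SGD update from the "Stochastic Gradient Descent" slide.

```python
import numpy as np

def projected_sgd_step(L, R, u, v, r_uv, eta=0.01, lam_u=0.1, lam_v=0.1):
    """One projected SGD step on rating r_uv, keeping factors L, R nonnegative."""
    eps = L[u] @ R[v] - r_uv                                # prediction error eps_t
    # Gradient step, ignoring the nonnegativity constraints:
    Lu_tilde = (1 - eta * lam_u) * L[u] - eta * eps * R[v]
    Rv_tilde = (1 - eta * lam_v) * R[v] - eta * eps * L[u]
    # Projection onto the nonnegative orthant is elementwise clipping at zero:
    L[u] = np.maximum(Lu_tilde, 0.0)
    R[v] = np.maximum(Rv_tilde, 0.0)
    return 0.5 * eps ** 2                                   # loss on this rating, before the step
```

The projection is this cheap only because the feasible set is the nonnegative orthant: the Euclidean projection of a point onto it simply zeroes the negative coordinates, which is why projected (stochastic) gradient is a natural first method for NMF. A training run would call this in a loop over randomly shuffled observed ratings with a decaying step size eta.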
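Finally, a small numerical check, added here as an illustration rather than taken from the lecture, of the variational characterization $\|\Theta\|_* = \inf \{ \frac{1}{2}\|L\|_F^2 + \frac{1}{2}\|R\|_F^2 : \Theta = L R' \}$: the SVD-based factorization attains the infimum.

```python
import numpy as np

rng = np.random.default_rng(0)
Theta = rng.normal(size=(6, 4)) @ rng.normal(size=(4, 5))  # a 6 x 5 matrix of rank <= 4

# Nuclear norm = sum of singular values.
U, s, Vt = np.linalg.svd(Theta, full_matrices=False)
nuclear = s.sum()

# The factorization L = U sqrt(S), R = V sqrt(S) satisfies Theta = L R',
# and its Frobenius cost (1/2)||L||_F^2 + (1/2)||R||_F^2 attains the infimum.
L = U * np.sqrt(s)
R = Vt.T * np.sqrt(s)
cost = 0.5 * np.linalg.norm(L, "fro") ** 2 + 0.5 * np.linalg.norm(R, "fro") ** 2

print(np.allclose(Theta, L @ R.T))   # True: a valid factorization
print(np.isclose(nuclear, cost))     # True: the bilinear cost equals ||Theta||_*
```

Any other factorization $\Theta = L R'$ can only increase the Frobenius cost, which is the sense in which the regularized bilinear objective on the slides mimics nuclear-norm regularization.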