Neural networks
Sparse coding - online dictionary learning algorithm

Hugo Larochelle
Département d'informatique, Université de Sherbrooke
[email protected]
November 1, 2012

Sparse coding
Topics: learning algorithm (putting it all together)
• Learning alternates between inference and dictionary learning:

  \min_D \frac{1}{T} \sum_{t=1}^T l(x^{(t)}) = \min_D \frac{1}{T} \sum_{t=1}^T \min_{h^{(t)}} \frac{1}{2} \|x^{(t)} - D h^{(t)}\|_2^2 + \lambda \|h^{(t)}\|_1

• While D has not converged
  ‣ find the sparse codes h(x^{(t)}) for all the training set with ISTA
  ‣ update the dictionary: for fixed codes,

    \arg\min_D \frac{1}{T} \sum_{t=1}^T \frac{1}{2} \|x^{(t)} - D h(x^{(t)})\|_2^2

    has the closed-form (normal-equation) solution

    D \Leftarrow \left( \sum_{t=1}^T x^{(t)} h(x^{(t)})^\top \right) \left( \sum_{t=1}^T h(x^{(t)}) h(x^{(t)})^\top \right)^{-1}

    i.e. with A \Leftarrow \sum_{t=1}^T h(x^{(t)}) h(x^{(t)})^\top and B \Leftarrow \sum_{t=1}^T x^{(t)} h(x^{(t)})^\top, set D \Leftarrow B A^{-1}
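The alternating scheme can be sketched in NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: `ista`, the step size `alpha`, the iteration counts, the random initialization, and the pseudo-inverse guard against a singular A are all choices made here for the sketch.

```python
import numpy as np

def ista(x, D, lam, alpha=0.01, n_steps=100):
    """Infer the sparse code h(x) by iterative shrinkage-thresholding (ISTA)."""
    h = np.zeros(D.shape[1])
    for _ in range(n_steps):
        h = h - alpha * (D.T @ (D @ h - x))        # gradient step on the quadratic term
        h = np.sign(h) * np.maximum(np.abs(h) - alpha * lam, 0.0)  # shrink
    return h

def batch_dictionary_learning(X, K, lam=0.1, n_epochs=10):
    """Batch alternation: infer all codes, then solve for D in closed form."""
    T, d = X.shape
    rng = np.random.default_rng(0)
    D = rng.normal(size=(d, K)) / np.sqrt(d)       # do NOT initialize D to 0
    for _ in range(n_epochs):
        H = np.array([ista(x, D, lam) for x in X])          # codes, T x K
        A = H.T @ H                                         # sum_t h h^T
        B = X.T @ H                                         # sum_t x h^T
        D = B @ np.linalg.pinv(A)                           # D <- B A^{-1} (pinv: A may be singular)
        D /= np.maximum(np.linalg.norm(D, axis=0), 1e-8)    # keep columns normalized
    return D
```

The pseudo-inverse and the column normalization are safeguards for the sketch; sparse codes can make A rank-deficient.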
  - alternatively, run the block-coordinate descent algorithm to update D
• Similar to the EM algorithm

Sparse coding
Topics: online learning algorithm
• This algorithm is ''batch'' (i.e. not online)
  ‣ single update of the dictionary per pass on the training set
  ‣ for large datasets, we'd like to update D after visiting each x^{(t)}
• Solution: for each x^{(t)}
  ‣ perform inference of h(x^{(t)}) for the current D
  ‣ update running averages of the quantities required to update D
  ‣ use current value of D as ''warm start'' to block-coordinate descent
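The running-average bookkeeping can be sketched as a single function; the decay rate `beta`, the function name, and the shapes (x is d-dimensional, h is K-dimensional) are assumptions for illustration:

```python
import numpy as np

def update_stats(A, B, x, h, beta=0.95):
    """One online update of the sufficient statistics for the dictionary step.

    beta controls how quickly old examples are forgotten.
    """
    A = beta * A + (1 - beta) * np.outer(h, h)   # A <- beta A + (1-beta) h h^T
    B = beta * B + (1 - beta) * np.outer(x, h)   # B <- beta B + (1-beta) x h^T
    return A, B
```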
• The running averages (with decay \beta) are

  B \Leftarrow \beta B + (1 - \beta) \, x^{(t)} h(x^{(t)})^\top
  A \Leftarrow \beta A + (1 - \beta) \, h(x^{(t)}) h(x^{(t)})^\top

• These statistics suffice because the reconstruction term expands as

  \frac{1}{2} \|x^{(t)} - D h(x^{(t)})\|_2^2 = \frac{1}{2} \left( x^{(t)\top} x^{(t)} - 2 \, x^{(t)\top} D h(x^{(t)}) + h(x^{(t)})^\top D^\top D h(x^{(t)}) \right)

  so, averaged over t, the objective in D depends on the data only through B and A

• Recall: inference uses the ISTA update h^{(t)} \Leftarrow \mathrm{shrink}(h^{(t)} + \alpha D^\top (x^{(t)} - D h^{(t)}), \, \alpha\lambda), where

  \mathrm{shrink}(a, b) = [\dots, \ \mathrm{sign}(a_i) \max(|a_i| - b_i, 0), \ \dots]
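A quick numerical check of this expansion (toy sizes, random data; all variable names are hypothetical): the summed reconstruction loss computed directly matches the value computed from A and B alone.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 4, 3, 10
X = rng.normal(size=(T, d))      # T inputs x^(t)
H = rng.normal(size=(T, K))      # T codes h(x^(t))
D = rng.normal(size=(d, K))      # dictionary

# Direct evaluation: sum_t (1/2) ||x^(t) - D h^(t)||^2
direct = 0.5 * sum(np.sum((x - D @ h) ** 2) for x, h in zip(X, H))

# Evaluation through the sufficient statistics only
A = H.T @ H                      # sum_t h h^T
B = X.T @ H                      # sum_t x h^T
const = np.sum(X ** 2)           # sum_t x^T x (independent of D)
via_stats = 0.5 * (const - 2 * np.trace(D.T @ B) + np.trace(D.T @ D @ A))

assert np.isclose(direct, via_stats)
```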
Sparse coding
Topics: online learning algorithm
• Initialize D (not to 0!)
• While D hasn't converged
  ‣ for each x^{(t)}
    - infer the code

      h(x^{(t)}) = \arg\min_{h^{(t)}} \frac{1}{2} \|x^{(t)} - D h^{(t)}\|_2^2 + \lambda \|h^{(t)}\|_1

    - update the dictionary statistics

      B \Leftarrow \beta B + (1 - \beta) \, x^{(t)} h(x^{(t)})^\top
      A \Leftarrow \beta A + (1 - \beta) \, h(x^{(t)}) h(x^{(t)})^\top

    - update the dictionary, either with a single gradient step

      D \Leftarrow D + \alpha \, (x^{(t)} - D h(x^{(t)})) \, h(x^{(t)})^\top

      or with block-coordinate descent: while D hasn't converged
      ★ for each column j, perform the update

        D_{\cdot,j} \Leftarrow \frac{1}{A_{j,j}} \left( B_{\cdot,j} - D A_{\cdot,j} + D_{\cdot,j} A_{j,j} \right)
        D_{\cdot,j} \Leftarrow D_{\cdot,j} \, / \, \|D_{\cdot,j}\|_2

• Reference: Online Dictionary Learning for Sparse Coding. Mairal, Bach, Ponce and Sapiro, 2009.
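Putting it all together, here is a minimal sketch of the online loop with block-coordinate column updates. The hyperparameters (`beta`, `lam`, the number of inner sweeps `n_inner`, the ISTA settings) and the guard for unused columns are assumptions made for the sketch, not values from the slides.

```python
import numpy as np

def sparse_code(x, D, lam, alpha=0.01, n_steps=100):
    """ISTA inference of h(x) (stand-in for any l1-regularized solver)."""
    h = np.zeros(D.shape[1])
    for _ in range(n_steps):
        h = h - alpha * (D.T @ (D @ h - x))
        h = np.sign(h) * np.maximum(np.abs(h) - alpha * lam, 0.0)
    return h

def online_dictionary_learning(X, K, lam=0.1, beta=0.95, n_inner=5):
    """One pass of online dictionary learning (Mairal et al.-style sketch)."""
    T, d = X.shape
    rng = np.random.default_rng(0)
    D = rng.normal(size=(d, K))                    # initialize D (not to 0!)
    D /= np.linalg.norm(D, axis=0)
    A = np.zeros((K, K))                           # running average of h h^T
    B = np.zeros((d, K))                           # running average of x h^T
    for x in X:
        h = sparse_code(x, D, lam)                 # inference with current D
        A = beta * A + (1 - beta) * np.outer(h, h)
        B = beta * B + (1 - beta) * np.outer(x, h)
        for _ in range(n_inner):                   # block-coordinate descent,
            for j in range(K):                     # warm-started at current D
                if A[j, j] < 1e-12:
                    continue                       # column not used yet; skip
                D[:, j] = (B[:, j] - D @ A[:, j] + D[:, j] * A[j, j]) / A[j, j]
                D[:, j] /= max(np.linalg.norm(D[:, j]), 1e-8)
    return D
```

A fixed number of inner sweeps stands in for the "while D hasn't converged" inner loop; the warm start means each inner loop begins from the dictionary produced by the previous example.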