Medical Information Systems Laboratory, Data Mining Group
[Literature Survey] Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion
Kenya Hanawa (塙賢哉), Tomoyuki Hiroyasu (廣安知之), Utako Yamamoto (山本詩子)
November 1, 2014

1 Title
Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion

2 Authors
Pascal Vincent, Hugo Larochelle, Isabelle Lajoie, Yoshua Bengio, Pierre-Antoine Manzagol

3 Source
Journal of Machine Learning Research 11 (2010) 3371-3408

4 Abstract
We explore an original strategy for building deep networks, based on stacking layers of denoising autoencoders that are trained locally to denoise corrupted versions of their inputs. The resulting algorithm is a straightforward variation on the stacking of ordinary autoencoders. On a benchmark of classification problems, however, it is shown to yield significantly lower classification error, bridging the performance gap with deep belief networks and in several cases surpassing it. Higher-level representations learned in this purely unsupervised fashion also help boost the performance of subsequent SVM classifiers. Qualitative experiments show that, contrary to ordinary autoencoders, denoising autoencoders are able to learn Gabor-like edge detectors from natural image patches and larger stroke detectors from digit images. This work clearly establishes the value of using a denoising criterion as a tractable unsupervised objective to guide the learning of useful higher-level representations.

5 Keywords
deep learning, unsupervised feature learning, deep belief networks, autoencoders, denoising

6 References
[1] On research into neural networks that generalize well on difficult recognition tasks
J.L. McClelland, D.E. Rumelhart, and the PDP Research Group. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, volume 2. MIT Press, Cambridge, 1986.
G.E. Hinton. Connectionist learning procedures. Artificial Intelligence, 40:185-234, 1989.
P.E. Utgoff and D.J. Stracuzzi. Many-layered learning. Neural Computation, 14:2497-2539, 2002.

[2] The deep learning viewpoint draws on knowledge of the brain's layered architecture, such as the primary visual cortex
J. Håstad. Almost optimal lower bounds for small depth circuits. In Proceedings of the 18th annual ACM Symposium on Theory of Computing, pages 6-20, Berkeley, California, 1986. ACM Press.
J. Håstad and M. Goldmann. On the power of small-depth threshold circuits. Computational Complexity, 1:113-129, 1991.
Y. Bengio and Y. LeCun. Scaling learning algorithms towards AI. In L. Bottou, O. Chapelle, D. DeCoste, and J. Weston, editors, Large Scale Kernel Machines. MIT Press, 2007.
Y. Bengio. Learning deep architectures for AI.
Foundations and Trends in Machine Learning, 2(1):1-127, 2009. Also published as a book. Now Publishers, 2009.

[3] On the optimization of multilayer neural networks
Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle. Greedy layer-wise training of deep networks. In Bernhard Schölkopf, John Platt, and Thomas Hoffman, editors, Advances in Neural Information Processing Systems 19 (NIPS ’06), pages 153-160. MIT Press, 2007.

[4] On new approaches to deep architectures
G.E. Hinton, S. Osindero, and Y.W. Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18:1527-1554, 2006.
G.E. Hinton and R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504-507, July 2006.
M. Ranzato, C.S. Poultney, S. Chopra, and Y. LeCun. Efficient learning of sparse representations with an energy-based model. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19 (NIPS ’06), pages 1137-1144. MIT Press, 2007.
H. Lee, C. Ekanadham, and A. Ng. Sparse deep belief net model for visual area V2. In J.C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20 (NIPS ’07), pages 873-880, Cambridge, MA, 2008. MIT Press.

[5] On semi-supervised learning approaches
D. Erhan, Y. Bengio, A. Courville, P.A. Manzagol, P. Vincent, and S. Bengio. Why does unsupervised pretraining help deep learning? Journal of Machine Learning Research, 11:625-660, February 2010.
J. Weston, F. Ratle, and R. Collobert. Deep learning via semi-supervised embedding. In William W. Cohen, Andrew McCallum, and Sam T. Roweis, editors, Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML ’08), pages 1168-1175, New York, NY, USA, 2008. ACM. ISBN 978-1-60558-205-4. doi: 10.1145/1390156.1390303.

[6] On PCA
Y. Cho and L. Saul. Kernel methods for deep learning. In Y. Bengio, D. Schuurmans, C. Williams, J. Lafferty, and A. Culotta, editors, Advances in Neural Information Processing Systems 22 (NIPS ’09), pages 342-350.
NIPS Foundation, 2010.

[7] On RBMs
G.E. Hinton. Training products of experts by minimizing contrastive divergence. Neural Computation, 14:1771-1800, 2002.
P. Smolensky. Information processing in dynamical systems: Foundations of harmony theory. In D.E. Rumelhart and J.L. McClelland, editors, Parallel Distributed Processing, volume 1, chapter 6, pages 194-281. MIT Press, Cambridge, 1986.
Y. Bengio and O. Delalleau. Justifying and generalizing contrastive divergence. Neural Computation, 21(6):1601-1621, June 2009.
H. Larochelle, Y. Bengio, J. Louradour, and P. Lamblin. Exploring strategies for training deep neural networks. Journal of Machine Learning Research, 10:1-40, January 2009.
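
Supplementary sketch. The strategy summarized in the abstract (Section 4), training each layer locally to reconstruct a clean input from a corrupted copy and then stacking the layers, can be illustrated with a small pure-Python example. This is a minimal sketch under stated assumptions, not the authors' implementation: the masking noise, tied weights, cross-entropy reconstruction loss, toy data, layer sizes, noise level `p`, and learning rate `lr` are all illustrative choices, and the names `corrupt`, `DenoisingAutoencoder`, and `pretrain_stack` are hypothetical.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def corrupt(x, p, rng):
    """Masking noise (assumed corruption): zero each component with probability p."""
    return [0.0 if rng.random() < p else v for v in x]


class DenoisingAutoencoder:
    """One layer with tied weights (an assumption): y = s(W x~ + b), z = s(W^T y + c)."""

    def __init__(self, n_in, n_hidden, rng):
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                  for _ in range(n_hidden)]
        self.b = [0.0] * n_hidden
        self.c = [0.0] * n_in

    def encode(self, x):
        return [sigmoid(sum(w * v for w, v in zip(row, x)) + bj)
                for row, bj in zip(self.W, self.b)]

    def decode(self, y):
        return [sigmoid(sum(self.W[j][i] * y[j] for j in range(len(y))) + ci)
                for i, ci in enumerate(self.c)]

    def train_step(self, x, p, lr, rng):
        """One SGD step: encode the corrupted input, reconstruct the CLEAN input."""
        x_tilde = corrupt(x, p, rng)
        y = self.encode(x_tilde)
        z = self.decode(y)
        # Cross-entropy with a sigmoid output: gradient w.r.t. the output
        # pre-activation is simply (z - x).
        dz = [zi - xi for zi, xi in zip(z, x)]
        # Back-propagate to the hidden pre-activation (before touching W).
        dy = [sum(self.W[j][i] * dz[i] for i in range(len(x))) * y[j] * (1.0 - y[j])
              for j in range(len(y))]
        for j in range(len(y)):
            for i in range(len(x)):
                # Tied weights: encoder and decoder gradients are summed.
                self.W[j][i] -= lr * (dy[j] * x_tilde[i] + y[j] * dz[i])
            self.b[j] -= lr * dy[j]
        for i in range(len(x)):
            self.c[i] -= lr * dz[i]
        eps = 1e-12
        return -sum(xi * math.log(max(zi, eps)) + (1.0 - xi) * math.log(max(1.0 - zi, eps))
                    for xi, zi in zip(x, z))


def pretrain_stack(data, layer_sizes, p=0.25, lr=0.1, epochs=50, seed=0):
    """Greedy layer-wise pretraining: each layer is trained on the previous layer's codes."""
    rng = random.Random(seed)
    layers, inputs = [], data
    for n_hidden in layer_sizes:
        dae = DenoisingAutoencoder(len(inputs[0]), n_hidden, rng)
        for _ in range(epochs):
            for x in inputs:
                dae.train_step(x, p, lr, rng)
        layers.append(dae)
        # Corruption is used only while training a layer; the next layer
        # receives encodings of the uncorrupted inputs.
        inputs = [dae.encode(x) for x in inputs]
    return layers


# Toy binary patterns (illustrative data, not from the paper).
toy_data = [
    [1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
    [1.0, 0.0, 1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0, 0.0, 1.0],
]
stack = pretrain_stack(toy_data, layer_sizes=[4, 2])

# Top-level representation of the first pattern.
code = toy_data[0]
for layer in stack:
    code = layer.encode(code)
print(len(stack), code)
```

In the setting the abstract describes, the resulting top-level codes would then be passed to a supervised stage, such as fine-tuning the whole network or training an SVM on the learned representation; that stage is omitted here.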