CNN / AlexNet
Sungjoon Choi

AlexNet + not-so-minor (actually SUPER IMPORTANT) heuristics:
- ReLU nonlinearity
- Local Response Normalization
- Data augmentation
- Dropout

What is ImageNet?
ILSVRC 2010: ImageNet Large Scale Visual Recognition Challenge (ILSVRC).
It uses a subset of ImageNet with roughly 1M images in 1K categories.
Are these all just cats? (Of course, some are super cute!)
These are all different categories! (Egyptian, Persian, Tiger, Siamese, and Tabby cat)

Convolutional Neural Network
This is pretty much everything about the convolutional neural network:
Convolution + Subsampling + Full Connection

What is a CNN?
CNNs are basically layers of convolutions followed by subsampling and fully connected layers.
Intuitively speaking, the convolution and subsampling layers work as feature extraction layers, while a fully connected layer classifies which category the current input belongs to using the extracted features.
http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/

Why is a CNN so powerful?
- Local invariance: loosely speaking, as the convolution filters 'slide' over the input image, the exact location of the object we want to find does not matter much.
- Compositionality: there is a hierarchy in CNNs. It is GOOD! Huge representation capacity!
https://starwarsanon.wordpress.com/tag/darth-sidious-vs-yoda/

What is convolution?
http://deeplearning.stanford.edu/wiki/index.php/Feature_extraction_using_convolution

Details of convolution: zero-padding, stride, channel

Conv: Zero-padding?
- Input size: n_in = 5
- Output size: n_out = 5
- Filter size: n_filter = 3
- Zero-padding size: n_padding = 1
n_out = n_in + 2 * n_padding - n_filter + 1
5 = 5 + 2 * 1 - 3 + 1

Conv: Stride?
(Left) Stride size: 1  (Right) Stride size: 2
If the stride size equals the filter size, there is no overlap.

Conv: Channel
Input: [batch, in_height, in_width, in_channels]
Filter: [filter_height, filter_width, in_channels, out_channels]
Example: input [batch, in_height=4, in_width=4, in_channels=3], filter [filter_height=3, filter_width=3, in_channels=3, out_channels=7]
What is the number of parameters in this convolution layer?
189 = 3 * 3 * 3 * 7 (weights only, ignoring biases)

AlexNet
What is the number of parameters?
Why are the layers divided into two parts?

AlexNet: ReLU Nonlinearity
ReLU vs. tanh: faster convergence!

Local Response Normalization
The response-normalized activity is given by
b^i_{x,y} = a^i_{x,y} / ( k + \alpha \sum_{j = \max(0, i - n/2)}^{\min(N-1, i + n/2)} (a^j_{x,y})^2 )^\beta
It implements a form of lateral inhibition inspired by real neurons.

Reducing Overfitting
This is often called regularization in the machine learning literature; more details will be covered next week.
In AlexNet, two regularization methods are used:
- Data augmentation
- Dropout

Reg: Data Augmentation
http://www.slideshare.net/KenChatfield/chatfield14-devil

Reg: Data Augmentation 1
Original image (256 x 256) -> smaller patch (224 x 224)
This increases the size of the training set by a factor of 2048 (= 32 * 32 * 2).
The factor of two comes from horizontal reflections.

Reg: Data Augmentation 2
Original patch (224 x 224) -> color variation -> altered patch (224 x 224)
To each RGB image pixel, the following quantity is added:
[p_1, p_2, p_3] [alpha_1 * lambda_1, alpha_2 * lambda_2, alpha_3 * lambda_3]^T
where p_i and lambda_i are the i-th eigenvector and eigenvalue of the 3x3 covariance matrix of RGB pixel values.
Probabilistically, no single patch will ever be the same during training (a factor of infinity!). A minimal sketch of this color augmentation follows below.
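Below is a minimal NumPy sketch of this PCA-based color augmentation. The function name, the float RGB image layout, and the noise scale sigma = 0.1 (the standard deviation reported in the AlexNet paper) are assumptions; the original preprocessing pipeline may differ in details.

```python
import numpy as np

def pca_color_augment(image, sigma=0.1):
    """Sketch of AlexNet-style 'fancy PCA' color jitter.

    image: float array of shape (H, W, 3) with RGB values.
    sigma: std. dev. of the random alphas (assumed 0.1 as in the paper).
    """
    # Flatten to a list of RGB pixels and compute the 3x3 covariance matrix.
    pixels = image.reshape(-1, 3)
    cov = np.cov(pixels, rowvar=False)

    # lambda_i (eigenvalues) and p_i (eigenvectors, as columns).
    eigvals, eigvecs = np.linalg.eigh(cov)

    # Draw alpha_i ~ N(0, sigma^2) once per image.
    alphas = np.random.normal(0.0, sigma, size=3)

    # Quantity added to every pixel: [p1 p2 p3] [a1*l1, a2*l2, a3*l3]^T
    delta = eigvecs @ (alphas * eigvals)

    # Broadcast the same 3-vector shift over all pixels.
    return image + delta
```

Drawing the alpha_i once per image (rather than per pixel) keeps the color shift globally consistent, so the altered patch still looks like a natural photograph.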
Reg: Dropout
The original dropout [1] sets the output of each hidden neuron to zero with probability 0.5 during training.
At test time, the paper simply uses all neurons but multiplies their outputs by 0.5, so the expected input to the next layer matches training.
[1] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. arXiv, 2012.
http://www.eddyazar.com/the-regrets-of-a-dropout-and-why-you-should-drop-out-too/
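A minimal NumPy sketch of this train/test behavior is shown below. The function names and the keep_prob parameter are illustrative, not from the paper; modern implementations usually prefer "inverted" dropout, which rescales during training instead of at test time.

```python
import numpy as np

def dropout_train(activations, keep_prob=0.5):
    # Training: drop each hidden unit independently, i.e. set its output
    # to zero with probability 1 - keep_prob (0.5 in AlexNet).
    mask = np.random.rand(*activations.shape) < keep_prob
    return activations * mask

def dropout_test(activations, keep_prob=0.5):
    # Test time (AlexNet-style): keep every unit but scale its output by
    # keep_prob, so the expected activation matches the training phase.
    return activations * keep_prob
```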