Salient Deconvolutional Networks
Aravindh Mahendran, Andrea Vedaldi, University of Oxford

(A) GOAL & CONTRIBUTIONS

Recently, several methods for understanding CNNs through visualization have been proposed:
1. DeConvNet visualizes the patterns selected by individual neurons [5];
2. Class saliency visualizes the "network attention" pattern [3].
However, both are heuristic and their meaning remains unclear.

Our goal is to unify, compare, and understand such techniques. We do so by:
1. Introducing a generalized construction for reversed architectures.
2. Exploring in detail three variants: DeConvNet, SaliNet, and the hybrid DeSaliNet.
3. Identifying the limitations of these networks for the purpose of CNN visualization.

(F) Interpretation: Fourier Phase Information

The auxiliary information plays a role analogous to the phase information in a Fourier transform: randomizing the Fourier magnitudes while keeping the ground-truth phase yields an edge image similar to the result obtained from a reversed CNN using ReLU back-propagation.

[Figure: input image; Fourier reconstructions with random magnitude; DeSaliNet and DeConvNet results from positive random input.]

(B) Reversing CNN Architectures

We reverse a "forward CNN" layer by layer, using heuristics to form a "reversed CNN". For example:

Forward layer  →  Reversed layer
Convolution    →  Convolution transpose
Max pooling    →  Un-pooling
ReLU           →  Reversed ReLU

A neuron selector picks the neuron to visualize, and auxiliary information (pooling "switches", ReLU masks) is passed from the forward network to the reversed one.

[Figure: forward CNN (conv, ReLU, max pool) mapping an input image to a neuron selector, and the reversed CNN (un-pool, reversed ReLU, convolution transpose) mapping it back to a result image, with auxiliary information linking the two.]

[Figure: an input image and its reversals from VGG-VD pool5_3 and fc8 using SaliNet, DeConvNet, and DeSaliNet.]

Back-propagation defines a natural reverse for each layer: if layer i computes φ_i : x ↦ y, then its BP-reverse computes

    φ_i* : (x, ŷ) ↦ x̂ = ∂⟨y, ŷ⟩ / ∂x,

where ŷ is the input of the reversed layer. However, other definitions are also commonly used. We consider the variations used in:
- Deconvolutional networks (DeConvNet) – Zeiler et al. [5]
- Class saliency (SaliNet) – Simonyan et al. [3]
- Improved DeConvNets (DeSaliNet) – Springenberg et al. [4]
- Back-propagation – Rumelhart et al. [7]
- Semantic segmentation – Noh et al. [2]

(C) Reversing Layers: Max Pooling and ReLU

DeConvNet, SaliNet, and DeSaliNet all un-pool using the "switches", i.e. the max locations recorded during the forward pass. They differ only in how they reverse ReLU. Writing r = [x > 0] for the forward rectification mask and ŷ for the input of the reversed layer:

Method      | Reversed ReLU
DeConvNet   | x̂ = [ŷ > 0] ⊙ ŷ       (ReLU on the backward signal)
SaliNet     | x̂ = r ⊙ ŷ             (ReLU back-propagation)
DeSaliNet   | x̂ = [ŷ > 0] ⊙ r ⊙ ŷ   (ReLU ∘ ReLU-BP)
(no op.)    | x̂ = ŷ

(D) Analysis of Reversed Architectures

We compare two types of reversed pooling (un-pooling with switches vs. un-pooling to the centre of each pooling window) and the four types of reversed ReLU above (ReLU ∘ ReLU-BP, ReLU, ReLU-BP, no operation). Result images are shown below.

(G) Foreground Object Selectivity

• Foreground–background differentiation is perhaps implicit in the CNN hidden-layer activations.
• We extract it and project it into the image using a reversed CNN.

The resulting figures suggest that SaliNet and DeSaliNet highlight foreground objects better than DeConvNet, whose response is spread uniformly over the image.

(H) Weakly Supervised Foreground Object Segmentation

• Use the output of the reversed CNN to seed a GrabCut segmentation and segment the foreground object.
• Pipeline: image → CNN → select strongest neuron → reversed CNN → mask → GrabCut → foreground segmentation.
• We compare against the weakly supervised baseline of Guillaumin et al. [1].

Method             | Per-pixel accuracy   | IoU
                   | AlexNet  | VGG-16    | AlexNet | VGG-16
SaliNet            | 82.82    | 82.45     | 57.07   | 56.33
DeSaliNet          | 82.31    | 83.29     | 55.57   | 56.25
DeConvNet          | 75.85    | 76.52     | 48.26   | 48.16
Baseline           | 78.97               | 46.27
Guillaumin et al.  | 84.4                | 57.3

In the table, the "baseline" seeds GrabCut with a Gaussian mask centred at the image centre for the foreground and Gaussian masks at the image corners for the background (see figure below).
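The BP-reversed layer defined in panel (B) can be checked numerically for a simple case. The following NumPy sketch (variable and function names are ours, not from the poster) assumes a fully connected layer y = Wx, for which x̂ = ∂⟨y, ŷ⟩/∂x reduces to Wᵀŷ:

```python
import numpy as np

rng = np.random.default_rng(0)

# Forward layer phi: x -> y = Wx (a fully connected layer without bias).
W = rng.standard_normal((4, 6))
x = rng.standard_normal(6)
y = W @ x

# BP-reverse of phi: given the signal y_hat entering the reversed layer,
# return x_hat = d<y, y_hat>/dx. For a linear layer this is W^T y_hat.
def bp_reverse_linear(W, y_hat):
    return W.T @ y_hat

y_hat = rng.standard_normal(4)
x_hat = bp_reverse_linear(W, y_hat)

# Sanity check against a central-difference gradient of <phi(x), y_hat>.
eps = 1e-6
num = np.array([
    ((W @ (x + eps * e)) @ y_hat - (W @ (x - eps * e)) @ y_hat) / (2 * eps)
    for e in np.eye(6)
])
assert np.allclose(x_hat, num, atol=1e-4)
```

For a convolution the same definition yields the convolution transpose, which is why DeConvNet-style reversal and back-propagation coincide on linear layers and differ only at ReLU and pooling.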
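The four reversed-ReLU rules from panel (C) can likewise be sketched in NumPy. This is a minimal illustration; the `reversed_relu` helper and its mode names are our own, not an implementation from the paper:

```python
import numpy as np

def reversed_relu(y_hat, x, mode):
    """Reverse a ReLU layer. x is the forward input; r = [x > 0] is the
    rectification mask saved as auxiliary information; y_hat is the signal
    entering the reversed layer."""
    r = (x > 0)
    if mode == "deconvnet":   # ReLU applied to the backward signal
        return np.maximum(y_hat, 0)
    if mode == "salinet":     # ReLU back-propagation: gate by the forward mask
        return y_hat * r
    if mode == "desalinet":   # ReLU o ReLU-BP: both conditions
        return np.maximum(y_hat, 0) * r
    if mode == "noop":        # pass the signal through unchanged
        return y_hat
    raise ValueError(mode)

x = np.array([-1.0, 2.0, 3.0, -4.0])
y_hat = np.array([5.0, -6.0, 7.0, 8.0])
assert np.array_equal(reversed_relu(y_hat, x, "deconvnet"), [5.0, 0.0, 7.0, 8.0])
assert np.array_equal(reversed_relu(y_hat, x, "salinet"), [0.0, -6.0, 7.0, 0.0])
assert np.array_equal(reversed_relu(y_hat, x, "desalinet"), [0.0, 0.0, 7.0, 0.0])
```

Note that only SaliNet can propagate negative values (it gates by the forward mask but does not rectify the backward signal), while DeSaliNet applies both conditions and so produces the sparsest output.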
[Figure: original image, ground truth, and, for DeSaliNet, SaliNet, DeConvNet, and the baseline, the seed mask and the resulting GrabCut segmentation.]

ReLU in the backward direction imparts edges and structure to the output.

(E) Lack of Neuron Selectivity

We change the neuron selector (maximally active neuron, random neuron, or random noise in place of a selection) and view the resulting image for DeConvNet, SaliNet, and DeSaliNet.

• Changing the selected neuron does not significantly change the output.
• This suggests that these reversed networks are not suitable for visualizing individual neurons.
• The auxiliary information dominates the output.

References

1. Guillaumin, M., Küttel, D., Ferrari, V.: ImageNet auto-annotation with segmentation propagation. IJCV (2014)
2. Noh, H., Hong, S., Han, B.: Learning deconvolution network for semantic segmentation. In: Proc. ICCV (2015)
3. Simonyan, K., Vedaldi, A., Zisserman, A.: Deep inside convolutional networks: Visualising image classification models and saliency maps. In: ICLR (2014)
4. Springenberg, J.T., Dosovitskiy, A., Brox, T., Riedmiller, M.: Striving for simplicity: The all convolutional net. In: ICLR Workshop (2015)
5. Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: Proc. ECCV (2014)
6. Gulshan, V., Rother, C., Criminisi, A., Blake, A., Zisserman, A.: Geodesic star convexity for interactive image segmentation. In: Proc. CVPR (2010)
7. Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. Nature 323, 533–536 (1986)

Acknowledgements: BP for Aravindh Mahendran, ERC StG IDIU for Andrea Vedaldi.