Exponential expressivity in deep neural networks through transient chaos

Ben Poole¹, Subhaneil Lahiri¹, Maithra Raghu²,³, Jascha Sohl-Dickstein³, Surya Ganguli¹
¹Stanford University, ²Cornell University, ³Google Brain

Introduction

Goal: develop a theoretical understanding of deep neural networks.
● Expressivity: represent a large class of functions
● Trainability: tractable algorithms for finding good solutions
● Generalizability: work well in unseen regions of input space

Our work: expressivity in neural networks with random weights.
● A new framework for analyzing random deep networks using mean field theory and Riemannian geometry
● Random deep nets are exponentially more expressive than shallow ones: approximating a deep net with a shallow net requires exponentially more neurons!
● Random deep nets can represent exponentially curved decision boundaries in input space

Expressivity in random neural networks

Fully-connected neural network with nonlinearity $\phi$:
$h^l_i = \sum_j W^l_{ij}\, x^{l-1}_j + b^l_i, \qquad x^l_i = \phi(h^l_i)$
with independent random normal weights and biases: $W^l_{ij} \sim \mathcal{N}(0, \sigma_w^2 / N_{l-1})$ (weight variance $\sigma_w^2$) and $b^l_i \sim \mathcal{N}(0, \sigma_b^2)$ (bias variance $\sigma_b^2$).

Theory: how do simple input manifolds propagate through a deep network?
● A single point: when does its length grow or shrink, and how fast?
● A pair of points: do they become more similar or more different?
● A smooth manifold: how does its curvature and volume change?

Signal propagation in random neural networks

Self-averaging approximation: for large $N_l$, the average over neurons in a layer ≈ the average over random weights for one neuron.

Length propagation. Recursion relation for the squared length $q^l = \frac{1}{N_l} \sum_i (h^l_i)^2$ of a point as it propagates through the network:
$q^l = \mathcal{V}(q^{l-1} \mid \sigma_w, \sigma_b) = \sigma_w^2 \int \mathcal{D}z\, \phi\big(\sqrt{q^{l-1}}\, z\big)^2 + \sigma_b^2, \qquad \mathcal{D}z = \frac{dz}{\sqrt{2\pi}}\, e^{-z^2/2}$
$q^*$ is the fixed point of this iterative map. [Figure: the iterative length map for increasing $\sigma_w$, with fixed point $q^*$.]

Correlation propagation. For a pair of inputs whose lengths have converged to $q^*$, the correlation $c^l_{12} = q^l_{12} / \sqrt{q^l_{11} q^l_{22}}$ obeys
$q^l_{12} = \sigma_w^2 \int \mathcal{D}z_1\, \mathcal{D}z_2\, \phi(u_1)\, \phi(u_2) + \sigma_b^2, \qquad u_1 = \sqrt{q^*}\, z_1, \quad u_2 = \sqrt{q^*}\Big(c^{l-1}_{12}\, z_1 + \sqrt{1 - (c^{l-1}_{12})^2}\, z_2\Big)$
The correlation map always has a fixed point at $c = 1$, whose stability is determined by the slope of the correlation map at 1:
$\chi_1 = \frac{\partial c^l_{12}}{\partial c^{l-1}_{12}}\bigg|_{c=1} = \sigma_w^2 \int \mathcal{D}z\, \big[\phi'\big(\sqrt{q^*}\, z\big)\big]^2$
$\chi_1$ acts like a local stretching factor:
● $\chi_1 < 1$: nearby points come closer together. Ordered regime: nearby points become more correlated.
● $\chi_1 > 1$: nearby points are driven apart. Chaotic regime: nearby points become decorrelated.
[Figure: autocorrelation vs. depth in the ordered and chaotic regimes.]

Riemannian geometry of manifold propagation

How does a smooth input manifold $x^0(\theta)$ deform as it propagates through the layers to $x^1(\theta)$, $x^2(\theta)$, $x^3(\theta)$, ...? We track the local stretching, the local curvature, and the global curvature (Grassmannian length):

Regime                 | Local stretch      | Local curvature    | Grassmannian length
Ordered ($\chi_1 < 1$) | exponential decay  | exponential growth | constant
Chaotic ($\chi_1 > 1$) | exponential growth | constant           | exponential growth

In the chaotic regime, deep networks with random weights create a space-filling curve that exponentially expands in length without decreasing local curvature, leading to an exponential growth in the global curvature.

Experiment: random deep nets are more expressive than wide shallow nets. [Figure: error of shallow networks of increasing width trained to approximate functions computed by deep random networks.]

References

M. Raghu, B. Poole, J. Kleinberg, S. Ganguli, J. Sohl-Dickstein. On the expressive power of deep neural networks. arXiv:1606.05336.
S. Schoenholz, J. Gilmer, S. Ganguli, J. Sohl-Dickstein. Deep Information Propagation. arXiv:1611.01232.

Code to reproduce all results: github.com/ganguli-lab/deepchaos

Acknowledgements: BP is supported by NSF IGERT and SIGF. SG and SL thank the Burroughs-Wellcome, Sloan, McKnight, James S. McDonnell, and Simons Foundations, and the Office of Naval Research for support.
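The mean-field recursions above are straightforward to evaluate numerically. Below is a minimal sketch, not the repository's implementation, assuming a tanh nonlinearity and a simple quadrature grid for the Gaussian averages: it iterates the length map to its fixed point $q^*$ and computes $\chi_1$ to locate the ordered and chaotic regimes.

```python
import numpy as np

def gauss_avg(f, n=2001):
    """Approximate E[f(z)] for z ~ N(0, 1) with a Riemann sum on a grid."""
    z = np.linspace(-10.0, 10.0, n)
    w = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    return np.sum(f(z) * w) * (z[1] - z[0])

def length_map(q, sw2, sb2):
    """Length map: V(q) = sigma_w^2 * E[phi(sqrt(q) z)^2] + sigma_b^2."""
    return sw2 * gauss_avg(lambda z: np.tanh(np.sqrt(q) * z)**2) + sb2

def q_star(sw2, sb2, iters=500):
    """Iterate the length map from q = 1 to its fixed point q*."""
    q = 1.0
    for _ in range(iters):
        q = length_map(q, sw2, sb2)
    return q

def chi1(sw2, sb2):
    """Slope of the correlation map at c = 1:
       chi1 = sigma_w^2 * E[phi'(sqrt(q*) z)^2], with tanh' = 1 - tanh^2."""
    qs = q_star(sw2, sb2)
    return sw2 * gauss_avg(lambda z: (1.0 - np.tanh(np.sqrt(qs) * z)**2)**2)

# chi1 < 1 -> ordered regime; chi1 > 1 -> chaotic regime.
for sigma_w in [0.5, 1.0, 2.0, 4.0]:
    sw2, sb2 = sigma_w**2, 0.3  # illustrative variances, not the poster's
    print(f"sigma_w={sigma_w}: q*={q_star(sw2, sb2):.3f}, "
          f"chi1={chi1(sw2, sb2):.3f}")
```

Sweeping $\sigma_w$ at fixed $\sigma_b^2$ this way traces out the order-to-chaos transition, which occurs where $\chi_1$ crosses 1.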
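A companion sketch of the manifold-propagation picture, again illustrative: the width, depth, and variance settings below are arbitrary choices, not the poster's. It propagates a great circle through a deep random tanh network and tracks the Euclidean length of the resulting curve, which grows exponentially with depth in the chaotic regime.

```python
import numpy as np

rng = np.random.default_rng(0)
N, depth = 1000, 10                      # width and depth (assumed values)
sigma_w, sigma_b = 4.0, np.sqrt(0.3)     # chaotic-regime parameters (assumed)

# Great circle of radius sqrt(N) in a random 2D subspace of input space,
# so the squared length per neuron starts at O(1).
theta = np.linspace(0.0, 2.0 * np.pi, 500, endpoint=False)
u = np.linalg.qr(rng.standard_normal((N, 2)))[0]       # orthonormal pair
x = np.sqrt(N) * (np.outer(np.cos(theta), u[:, 0])
                  + np.outer(np.sin(theta), u[:, 1]))  # shape (500, N)

def curve_length(pts):
    """Total Euclidean length of the closed curve through the rows of pts."""
    segs = np.diff(pts, axis=0, append=pts[:1])
    return np.linalg.norm(segs, axis=1).sum()

print(f"layer  0: length = {curve_length(x):9.1f}")
for l in range(1, depth + 1):
    W = rng.standard_normal((N, N)) * sigma_w / np.sqrt(N)
    b = rng.standard_normal(N) * sigma_b
    x = np.tanh(x @ W.T + b)    # h^l = W x^{l-1} + b,  x^l = phi(h^l)
    print(f"layer {l:2d}: length = {curve_length(x):9.1f}")
```

Setting $\sigma_w$ below the transition instead makes the printed lengths shrink layer by layer, the ordered-regime behavior in the table above.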