Exponential expressivity in deep neural networks through transient chaos

Ben Poole¹, Subhaneil Lahiri¹, Maithra Raghu²,³, Jascha Sohl-Dickstein³, Surya Ganguli¹
¹Stanford University, ²Cornell University, ³Google Brain
Introduction
Goal: develop a theoretical understanding of deep neural networks
● Expressivity: represent a large class of functions
● Trainability: tractable algorithms for finding good solutions
● Generalizability: work well in unseen regions of input space

Our work: expressivity in neural networks with random weights
● A new framework for analyzing random deep networks using mean field theory and Riemannian geometry
● Random deep nets are exponentially more expressive than shallow ones: approximating a deep net with a shallow one requires exponentially more neurons!
● Random deep nets can represent exponentially curved decision boundaries in input space
Expressivity in random neural networks
Fully-connected neural network with nonlinearity $\phi$, mapping an input $x^0$ to an output through pre-activations $h^l$ and activations $x^l = \phi(h^l)$:

$h^l_i = \sum_j W^l_{ij}\, x^{l-1}_j + b^l_i, \qquad l = 1, \dots, D$

Independent random Normal weights and biases, with weight variance $\sigma_w^2$ and bias variance $\sigma_b^2$ (see the code sketch after the questions below):

$W^l_{ij} \sim \mathcal{N}\!\left(0,\, \sigma_w^2 / N_{l-1}\right), \qquad b^l_i \sim \mathcal{N}\!\left(0,\, \sigma_b^2\right)$

Theory: how do simple input manifolds propagate through a deep network?
● A single point: when does its length grow or shrink, and how fast?
● A pair of points: do they become more similar or more different?
● A smooth manifold: how do its curvature and volume change?
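As a concrete illustration of this setup, here is a minimal NumPy sketch of propagating a batch of inputs through such a random network. This is our own illustration, not the poster's code (the authors' code is at github.com/ganguli-lab/deepchaos); the function name and arguments are ours.

import numpy as np

def random_deep_net(x0, depth, width, sigma_w, sigma_b, phi=np.tanh, seed=0):
    # x0: input array of shape (N_0, batch). Returns final pre-activations h^D.
    rng = np.random.default_rng(seed)
    x = x0
    for l in range(depth):
        n_in = x.shape[0]
        # W^l_ij ~ N(0, sigma_w^2 / N_{l-1}),  b^l_i ~ N(0, sigma_b^2)
        W = rng.normal(0.0, sigma_w / np.sqrt(n_in), size=(width, n_in))
        b = rng.normal(0.0, sigma_b, size=(width, 1))
        h = W @ x + b          # pre-activations h^l
        x = phi(h)             # activations x^l = phi(h^l)
    return h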
Signal propagation in random neural networks

Self-averaging approximation: for large $N_l$, an average over the neurons in a layer ≈ an average over the random weights for a single neuron.

Length propagation: recursion relation for the length $q^l = \frac{1}{N_l} \sum_i (h^l_i)^2$ of a single point as it propagates through the network:

$q^l = \sigma_w^2 \int \mathcal{D}z\, \phi\!\left(\sqrt{q^{l-1}}\, z\right)^2 + \sigma_b^2, \qquad \mathcal{D}z = \frac{dz}{\sqrt{2\pi}}\, e^{-z^2/2}$

$q^*$ is the fixed point of this iterative map; increasing $\sigma_w$ increases $q^*$.

[Figure: the iterative length map and its fixed point $q^*$ for increasing $\sigma_w$.]
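The fixed point is easy to find numerically. The sketch below (our illustration; the parameter values are arbitrary) iterates the length map for $\phi = \tanh$, evaluating the Gaussian integral by Gauss-Hermite quadrature:

import numpy as np

# Probabilists' Gauss-Hermite nodes/weights: E_{z~N(0,1)}[f(z)] ≈ sum(w * f(z))
z, w = np.polynomial.hermite_e.hermegauss(101)
w = w / np.sqrt(2 * np.pi)

def length_map(q, sigma_w, sigma_b, phi=np.tanh):
    # q^l = sigma_w^2 * E_z[ phi(sqrt(q^{l-1}) z)^2 ] + sigma_b^2
    return sigma_w**2 * np.sum(w * phi(np.sqrt(q) * z)**2) + sigma_b**2

def q_star(sigma_w, sigma_b, q0=1.0, iters=100):
    q = q0
    for _ in range(iters):
        q = length_map(q, sigma_w, sigma_b)
    return q

for sw in (1.0, 2.0, 4.0):
    print(sw, q_star(sw, sigma_b=0.3))   # q* increases with sigma_w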
Correlation propagation: recursion for the correlation $c^l_{12} = q^l_{12} / \sqrt{q^l_{11} q^l_{22}}$ between a pair of points. At the length fixed point $q^*$,

$c^l_{12} = \mathcal{C}\!\left(c^{l-1}_{12}\right) = \frac{1}{q^*}\left[\sigma_w^2 \int \mathcal{D}z_1\, \mathcal{D}z_2\, \phi(u_1)\, \phi(u_2) + \sigma_b^2\right]$

with $u_1 = \sqrt{q^*}\, z_1$ and $u_2 = \sqrt{q^*}\left(c^{l-1}_{12}\, z_1 + \sqrt{1 - (c^{l-1}_{12})^2}\, z_2\right)$.

The correlation map always has a fixed point at $c = 1$, whose stability is determined by the slope of the correlation map at 1:

$\chi_1 = \left.\frac{\partial c^l_{12}}{\partial c^{l-1}_{12}}\right|_{c=1} = \sigma_w^2 \int \mathcal{D}z \left[\phi'\!\left(\sqrt{q^*}\, z\right)\right]^2$

$\chi_1 < 1$: nearby points come closer together. Ordered regime: nearby points become more correlated.
$\chi_1 > 1$: nearby points are driven apart. Chaotic regime: nearby points become decorrelated.

[Figure: autocorrelation vs. depth in the ordered and chaotic regimes.]
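The sketch below (again our own illustration) computes $\chi_1$ for $\phi = \tanh$, showing it cross 1 as $\sigma_w$ grows, which is the order-to-chaos transition:

import numpy as np

z, w = np.polynomial.hermite_e.hermegauss(201)
w = w / np.sqrt(2 * np.pi)   # E_{z~N(0,1)}[f(z)] ≈ sum(w * f(z))

def q_star(sigma_w, sigma_b, iters=200):
    # Fixed point of the length map (same iteration as the previous sketch).
    q = 1.0
    for _ in range(iters):
        q = sigma_w**2 * np.sum(w * np.tanh(np.sqrt(q) * z)**2) + sigma_b**2
    return q

def chi1(sigma_w, sigma_b):
    # chi_1 = sigma_w^2 * E_z[ phi'(sqrt(q*) z)^2 ],  phi'(h) = 1 - tanh(h)^2
    dphi = 1.0 - np.tanh(np.sqrt(q_star(sigma_w, sigma_b)) * z)**2
    return sigma_w**2 * np.sum(w * dphi**2)

for sw in (0.5, 1.0, 1.5, 2.0, 3.0):
    print(sw, chi1(sw, sigma_b=0.3))   # chi_1 < 1: ordered; chi_1 > 1: chaotic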
Riemannian geometry of manifold propagation

[Figure: a one-dimensional input manifold $x^0(\theta)$ propagating through successive layers as $x^1(\theta)$, $x^2(\theta)$, $x^3(\theta)$.]

At the fixed point, $\chi_1$ acts like a local stretching factor for the metric on the manifold: the expected squared length element obeys $\mathbb{E}\!\left[g^l(\theta)\right] = \chi_1\, \mathbb{E}\!\left[g^{l-1}(\theta)\right]$. Tracking the local stretch, the local curvature, and the global curvature (Grassmannian length) across layers gives:
                    Local stretch        Local curvature      Grassmannian length
Ordered (χ₁ < 1)    Exponential decay    Exponential growth   Constant
Chaotic (χ₁ > 1)    Exponential growth   Constant             Exponential growth
In the chaotic regime, deep networks with random weights create a space-filling
curve that exponentially expands in length without decreasing local curvature,
leading to an exponential growth in the global curvature.
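This length growth is easy to observe directly. The sketch below (our illustration; all parameter values are arbitrary) injects a great circle into a chaotic random tanh network and prints the Euclidean length of the resulting curve at each layer:

import numpy as np

rng = np.random.default_rng(0)
N, depth, sigma_w, sigma_b = 1000, 10, 4.0, 0.3
theta = np.linspace(0, 2 * np.pi, 2000, endpoint=False)

# Two random orthonormal directions spanning the input circle.
u = np.linalg.qr(rng.normal(size=(N, 2)))[0]
h = np.sqrt(N) * (np.outer(u[:, 0], np.cos(theta)) + np.outer(u[:, 1], np.sin(theta)))

def curve_length(h):
    # Sum of Euclidean segment lengths around the closed curve.
    d = np.diff(np.concatenate([h, h[:, :1]], axis=1), axis=1)
    return np.sum(np.linalg.norm(d, axis=0))

print(0, curve_length(h))
for l in range(depth):
    W = rng.normal(0.0, sigma_w / np.sqrt(N), size=(N, N))
    b = rng.normal(0.0, sigma_b, size=(N, 1))
    h = W @ np.tanh(h) + b      # propagate the whole curve one layer
    print(l + 1, curve_length(h))   # grows geometrically in the chaotic regime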
Experiment: random deep nets are more expressive than wide shallow nets

[Figure: approximation error of wide shallow networks trained to match functions computed by deep random networks.]

References
M. Raghu, B. Poole, J. Kleinberg, S. Ganguli, J. Sohl-Dickstein. On the expressive power of deep neural networks. arXiv:1606.05336
S. Schoenholz, J. Gilmer, S. Ganguli, J. Sohl-Dickstein. Deep Information Propagation. arXiv:1611.01232
Code to reproduce all results: github.com/ganguli-lab/deepchaos
Acknowledgements: BP is supported by NSF IGERT and SIGF. SG and SL thank the Burroughs-Wellcome,
Sloan, McKnight, James S. McDonnell, and Simons Foundations, and the Office of Naval Research for support.