Maximum Entropy Models in Neuroscience
Matthias Hennig
Today's topics
1. Maximum entropy principle
2. Binary representations of neural activity
3. The independent and pairwise binary MaxEnt models
4. Fitting the models
5. Assessing the models and some applications
Aim
- We have activity recorded from a population of neurons.
- Assumption: some aspects of the activity depend on interactions between neurons.
- But we can't access the detailed parameters of the neurons. Hence there are insufficient constraints for detailed modelling.
- Question: Can we infer some properties of the system by using models solely constrained by the available data?
The MaxEnt principle
Informally:
- We seek the probability distribution with the highest entropy given some constraints.
- Constraints: probabilistic quantities (e.g. distribution moments estimated from data).
- MaxEnt preserves all invariances not subject to these constraints.
- MaxEnt models what we know (from the data), but no more.
- MaxEnt helps to see what else we might want to know.
Relation to physics:
- Entropy in statistical mechanics and information theory are the same thing (Jaynes, 1957a,b).
- Second law of thermodynamics: physical systems move towards states of higher entropy as they approach equilibrium.
- MaxEnt corresponds to the state with the most microstates.
MaxEnt: range constraint
We have N different states of our system with unknown probabilities p_i.
We know nothing except that the system is finite!
What is our best guess for the p_i's?
Write the constraint as a Lagrangian:

L = -\sum_{i=1}^{N} p_i \log p_i + \lambda \left( \sum_{i=1}^{N} p_i - 1 \right)

Now find the maximum:

\frac{\partial L}{\partial p_j} = -\log p_j - 1 + \lambda = 0

Solution: the uniform distribution

p_j = e^{\lambda - 1} = \frac{1}{N}
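As a quick numerical sanity check (a sketch, not from the original slides), we can verify that no distribution over N states exceeds the entropy log N of the uniform distribution:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8

def entropy(p):
    """Shannon entropy in nats, with 0 log 0 := 0."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

uniform = np.full(N, 1.0 / N)
# random distributions from a Dirichlet prior never beat the uniform
for _ in range(1000):
    p = rng.dirichlet(np.ones(N))
    assert entropy(p) <= entropy(uniform) + 1e-12

print(entropy(uniform))  # log(8) ≈ 2.0794
```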
MaxEnt: constraint on mean
A measurable quantity r has a mean \nu_r:

\sum_{i=1}^{N} r_i p_i = \nu_r

L = -\sum_{i=1}^{N} p_i \log p_i + \lambda_1 \left( \sum_{i=1}^{N} p_i - 1 \right) + \lambda_2 \left( \sum_{i=1}^{N} r_i p_i - \nu_r \right)

\frac{\partial L}{\partial p_j} = -\log p_j - 1 + \lambda_1 + \lambda_2 r_j = 0

p_j = e^{\lambda_1 + \lambda_2 r_j - 1}

Using the mean constraint, e^{\lambda_1 - 1} = \nu_r / \sum_{i=1}^{N} r_i e^{\lambda_2 r_i}, hence

p_j = \frac{\nu_r \, e^{\lambda_2 r_j}}{\sum_{i=1}^{N} r_i e^{\lambda_2 r_i}}

with \lambda_2 then fixed by normalisation.
MaxEnt: constraint on mean
A measurable quantity r has a mean \nu_r:

\int_0^\infty r \, p(r) \, dr = \nu_r

L = -\int_0^\infty p(r) \log p(r) \, dr + \lambda_1 \left( \int_0^\infty p(r) \, dr - 1 \right) + \lambda_2 \left( \int_0^\infty r \, p(r) \, dr - \nu_r \right)

\frac{\partial L}{\partial p} = -\log p(r) - 1 + \lambda_1 + \lambda_2 r = 0

p(r) = e^{\lambda_1 + \lambda_2 r - 1} = \frac{e^{\lambda_2 r}}{\int_0^\infty e^{\lambda_2 r} \, dr} = \frac{1}{\nu_r} e^{-r/\nu_r}

[n.b. constraints on mean and variance yield the Gaussian distribution.]
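As a consistency check (not on the original slide), the exponential solution satisfies both constraints; substituting u = r/\nu_r:

```latex
\int_0^\infty \frac{1}{\nu_r} e^{-r/\nu_r}\, dr
   = \int_0^\infty e^{-u}\, du = 1, \qquad
\int_0^\infty r \,\frac{1}{\nu_r} e^{-r/\nu_r}\, dr
   = \nu_r \int_0^\infty u\, e^{-u}\, du = \nu_r .
```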
MaxEnt: general solution
Constraints on M measurable quantities f_i(r):

\int_0^\infty f_1(r) \, p(r) \, dr = \nu_1
\int_0^\infty f_2(r) \, p(r) \, dr = \nu_2
\vdots
\int_0^\infty f_M(r) \, p(r) \, dr = \nu_M

L = -\int_0^\infty p(r) \log p(r) \, dr + \lambda_0 \left( \int_0^\infty p(r) \, dr - 1 \right) + \sum_{i=1}^{M} \lambda_i \left( \int_0^\infty f_i(r) \, p(r) \, dr - \nu_i \right)

p(r) = e^{\lambda_0 + \sum_{i=1}^{M} \lambda_i f_i(r)}
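For example, taking f_1(r) = r and f_2(r) = r^2 over the whole real line (the mean and variance constraints mentioned above) instantiates the general solution as a Gaussian:

```latex
p(r) = e^{\lambda_0 + \lambda_1 r + \lambda_2 r^2}
     = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(r-\mu)^2 / (2\sigma^2)},
\quad \text{with } \lambda_2 = -\frac{1}{2\sigma^2},\;
\lambda_1 = \frac{\mu}{\sigma^2},
```

and \lambda_0 absorbing the normalisation.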
Multi neuron recordings
[data from Schneidman et al. (2006)]
Multi neuron recordings
Pairwise correlations in network activity tend to be weak!
How can we investigate the role of interactions between neurons?
[data from Schneidman et al. (2006)]
Binary representation of spiking activity
The state of each neuron is modelled as a spin variable:

\sigma_i(t) = \begin{cases} 1 & \text{spike in } [t, t + \Delta t] \\ -1 & \text{no spike in } [t, t + \Delta t] \end{cases}

Activity patterns of N neurons: \{\sigma\} = \{\sigma_1, \sigma_2, \ldots, \sigma_N\}
There are 2^N possible patterns.
We now look for models to reproduce this stationary distribution.
[picture from Ohiorhenuan et al. (2010)]
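To make the discretisation concrete, here is a minimal sketch (function name and toy data are illustrative, not from the slides) that bins spike times into ±1 patterns:

```python
import numpy as np

def spikes_to_patterns(spike_times, t_max, dt):
    """Convert per-neuron spike times (list of arrays, in seconds) into a
    (n_bins, N) array of +/-1 spins: +1 = at least one spike in the bin."""
    n_bins = int(np.ceil(t_max / dt))
    sigma = -np.ones((n_bins, len(spike_times)), dtype=int)
    for i, times in enumerate(spike_times):
        bins = (np.asarray(times) // dt).astype(int)
        sigma[bins[bins < n_bins], i] = 1
    return sigma

# two neurons, 1 s of data, 20 ms bins -> 50 patterns of length 2
sigma = spikes_to_patterns([np.array([0.005, 0.5]), np.array([0.021])], 1.0, 0.02)
print(sigma.shape)  # (50, 2)
```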
The independent model
Constraint: average activity for each of N neurons recorded over time T:

\langle \sigma_i \rangle_t = \frac{1}{T} \sum_{t=1}^{T} \sigma_i(t)

Maximum entropy distribution:

p(\{\sigma\}) = \frac{e^{\sum_{i=1}^{N} h_i \sigma_i}}{\prod_{i=1}^{N} 2 \cosh(h_i)}

This is a model of a population of independent neurons with constant random activity.
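Fitting the independent model needs no optimisation: the single-neuron MaxEnt distribution gives \langle \sigma_i \rangle = \tanh(h_i), so the fields follow in closed form. A minimal numerical sketch (toy data, not the recordings discussed later):

```python
import numpy as np

rng = np.random.default_rng(1)
# toy data: 5 neurons, 10000 time bins of +/-1 spins (illustrative)
sigma = np.where(rng.random((10000, 5)) < 0.2, 1, -1)

m = sigma.mean(axis=0)   # <sigma_i>_t
h = np.arctanh(m)        # independent-model fields, h_i = tanh^-1(<sigma_i>_t)

# check: model means under p(sigma_i) ~ exp(h_i sigma_i) are tanh(h_i)
assert np.allclose(np.tanh(h), m)
print(h)
```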
The pairwise model
Average firing rates for each of N neurons: \langle \sigma_i \rangle_t = \frac{1}{T} \sum_{t=1}^{T} \sigma_i(t)
Pairwise (equal time) correlations: \langle \sigma_i \sigma_j \rangle = \frac{1}{T} \sum_{t=1}^{T} \sigma_i(t) \sigma_j(t)
Maximum entropy distribution:

p(\{\sigma\}) = \frac{1}{Z} e^{\sum_{i=1}^{N} h_i \sigma_i + \frac{1}{2} \sum_{i \neq j} J_{ij} \sigma_i \sigma_j}

Z = \sum_{\{\sigma\}} e^{\sum_{i=1}^{N} h_i \sigma_i + \frac{1}{2} \sum_{i \neq j} J_{ij} \sigma_i \sigma_j}

Lagrange multipliers, chosen such that the constraints are satisfied:
h_i: bias of neuron i
J_{ij} = J_{ji}: symmetric coupling strength between neurons i and j
Ising model in statistical physics
This model is equivalent to the spin glass model with Hamiltonian (energy):

H = -\sum_{i=1}^{N} h_i \sigma_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} J_{ij} \sigma_i \sigma_j

\sigma_i are the magnetic moments or spins
\langle \sigma_i \rangle is the magnetisation
h_i is the local field
J_{ij} is the symmetric coupling strength between spins
The maximum entropy distribution is the Boltzmann distribution:

P \propto e^{-\frac{H}{k_B T}}

In our MaxEnt models, the temperature T is absorbed into the fields and couplings.
Strategy to find MaxEnt distributions
Independent model:

p(\{\sigma\}) = \frac{e^{\sum_{i=1}^{N} h_i \sigma_i}}{\prod_{i=1}^{N} 2 \cosh(h_i)}

h_i = \tanh^{-1}(\langle \sigma_i \rangle_t)

Pairwise model:

\delta h_i = \eta \, (\langle \sigma_i \rangle_{data} - \langle \sigma_i \rangle_{model})
\delta J_{ij} = \eta \, (\langle \sigma_i \sigma_j \rangle_{data} - \langle \sigma_i \sigma_j \rangle_{model})

- Gradient ascent: Boltzmann learning with learning rate \eta
- Costly: Z has to be evaluated at every optimisation step
- Approximation: evaluate Z by Monte Carlo sampling (the size of a Monte Carlo run should be similar to the data size), or use mean-field approximations (see e.g. Roudi et al., 2009).
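The update rules above can be sketched for a small population, where Z and the model moments are evaluated exactly by enumerating all 2^N states (only feasible for small N; names and toy data are illustrative, not from the lecture):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(2)
N, T, eta = 4, 5000, 0.1

# toy binary data (illustrative, not the retinal recordings)
data = np.where(rng.random((T, N)) < 0.3, 1, -1)
m_data = data.mean(0)
C_data = (data[:, :, None] * data[:, None, :]).mean(0)

states = np.array(list(product([-1, 1], repeat=N)))  # all 2^N patterns

def model_moments(h, J):
    # exact model expectations <sigma_i> and <sigma_i sigma_j>
    E = states @ h + 0.5 * np.einsum('si,ij,sj->s', states, J, states)
    p = np.exp(E - E.max())
    p /= p.sum()   # normalisation = evaluating Z
    return p @ states, np.einsum('s,si,sj->ij', p, states, states)

h, J = np.zeros(N), np.zeros((N, N))
for _ in range(2000):
    m, C = model_moments(h, J)
    h += eta * (m_data - m)
    dJ = eta * (C_data - C)
    np.fill_diagonal(dJ, 0.0)   # only off-diagonal couplings are free
    J += dJ

m, C = model_moments(h, J)
print(np.abs(m - m_data).max())  # small after convergence
```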
Independent and pairwise models
Data from 10 neurons simultaneously recorded in the salamander retina. For example, the pattern 1011001010 occurs about once per minute in the data, but the independent model predicts it once every ~3 years.
from Schneidman et al. (2006)
Evaluating the model
from Shlens et al. (2006)
Evaluating the model
Likelihood of the data under a given model (where c_i are the observed pattern counts):

L = \log \left[ \frac{\left( \sum_i c_i \right)!}{\prod_i c_i!} \prod_i P(\{\sigma\}_i)^{c_i} \right]

This is related to the Kullback-Leibler divergence between data (d) and model (m):

D_{KL}(P_m \| P_d) = \sum_{i \in \{\sigma\}} p_m(\{\sigma\}_i) \log \frac{p_m(\{\sigma\}_i)}{p_d(\{\sigma\}_i)}
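A small sketch of the divergence between a model distribution and empirical pattern frequencies (the counts and probabilities are hypothetical):

```python
import numpy as np

def kl_model_data(p_model, counts):
    """D_KL(P_m || P_d) in bits; counts are empirical pattern counts,
    p_model the model probabilities for the same patterns (all observed)."""
    p_data = counts / counts.sum()
    mask = p_model > 0
    return np.sum(p_model[mask] * np.log2(p_model[mask] / p_data[mask]))

# toy example: a model that matches the empirical frequencies exactly
counts = np.array([50, 30, 15, 5])
p_model = np.array([0.5, 0.3, 0.15, 0.05])
print(kl_model_data(p_model, counts))  # 0.0
```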
Likelihood tests
from Shlens et al. (2006)
Inferring functional connectedness
from Schneidman et al. (2006)
Multi-information in an interacting system
Idea: there is a hierarchy of entropies

S_1 \geq S_2 \geq S_3 \geq \ldots \geq S_N = S(\{\sigma\})

Joint entropy:

S(\{\sigma\}) = -\sum_i p(\{\sigma\}_i) \log p(\{\sigma\}_i) \leq \sum_j S(\sigma_j)

Multi-information: the entropy difference between the independent model and the full system, or equivalently the D_{KL} between the joint distribution and the independent model produced from the marginals \prod_j p(\sigma_j):

I(\{\sigma\}) = \sum_j S(\sigma_j) - S(\{\sigma\}) = \sum_i p(\{\sigma\}_i) \log \frac{p(\{\sigma\}_i)}{\prod_j p(\sigma_j)}
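The multi-information can be estimated directly from binarised data with plug-in entropies (a sketch with toy data; the plug-in estimate is biased for small sample sizes):

```python
import numpy as np

def multi_information(sigma):
    """Plug-in estimate of I = sum_j S(sigma_j) - S({sigma}) in bits,
    from a (T, N) array of +/-1 spins."""
    def H(x):
        _, counts = np.unique(x, return_counts=True, axis=0)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))
    return sum(H(sigma[:, [j]]) for j in range(sigma.shape[1])) - H(sigma)

# perfectly coupled pair: I equals the single-neuron entropy (~1 bit here)
x = np.where(np.random.default_rng(3).random(1000) < 0.5, 1, -1)
print(multi_information(np.column_stack([x, x])))
```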
Decomposition of multi-information

I(\{\sigma\}) = S\!\left[ \prod_i p(\sigma_i) \right] - S(p(\{\sigma\})) = \sum_i p(\{\sigma\}_i) \log \frac{p(\{\sigma\}_i)}{\prod_j p(\sigma_j)} = \sum_{k=2}^{N} I_C^{(k)}(\{\sigma\})

With the connected information:

I_C^{(k)}(\{\sigma\}) = S\!\left[ \tilde{p}^{(k-1)}(\{\sigma\}) \right] - S\!\left[ \tilde{p}^{(k)}(\{\sigma\}) \right]

where \tilde{p}^{(k)}(\{\sigma\}) is the maximum entropy distribution consistent with all k-th order marginals (single, pair, triplet etc.).
This makes it possible to measure the contributions of conditionally independent, second and higher order interactions to the total entropy difference between the independent model and the data. See also Schneidman et al. (2003).
Multi-information in retinal recordings
from Schneidman et al. (2006)
But note: relative measures such as I_2/I_N improve with very small time bins, low firing rates or small groups (Roudi et al., 2009). It is therefore not automatically possible to extrapolate network properties from small models.
Pairwise MaxEnt as Null model
This shows that higher order interactions are relevant at the scale
of cortical microcolumns, but not beyond. Data from macaque V1,
Ohiorhenuan et al. (2010)
Summary
- Neural activity can be cast into binary representations which can be analysed with MaxEnt models.
- Pairwise MaxEnt models provide a surprisingly good description of multi-neuron spike pattern statistics.
- This works even though no temporal dependencies are modelled (for such approaches, see e.g. Marre et al., 2009; Roudi and Hertz, 2011).
- MaxEnt models can be used to probe functional connectivity.
- MaxEnt models can be useful null models for pairwise vs. higher-order interactions.
References
Cover and Thomas, Chapter 11
Jaynes E (1957a) Information Theory and Statistical Mechanics. Physical Review 106:620-630.
Jaynes E (1957b) Information Theory and Statistical Mechanics. II. Physical Review 108:171-190.
Marre O, El Boustani S, Frégnac Y, Destexhe A (2009) Prediction of Spatiotemporal Patterns of Neural Activity from Pairwise Correlations. Physical Review Letters 102:58.
Ohiorhenuan IE, Mechler F, Purpura KP, Schmid AM, Hu Q, Victor JD (2010) Sparse coding and high-order correlations in fine-scale cortical networks. Nature 466:617-621.
Roudi Y, Tyrcha J, Hertz J (2009) Ising model for neural data: model quality and approximate methods for extracting functional connectivity. Phys Rev E Stat Nonlin Soft Matter Phys 79:051915.
Roudi Y, Hertz J (2011) Mean Field Theory for Nonequilibrium Network Reconstruction. Physical Review Letters 106:048702.
Roudi Y, Nirenberg S, Latham PE (2009) Pairwise maximum entropy models for studying large biological systems: when they can work and when they can't. PLoS Computational Biology 5:e1000380.
Schneidman E, Berry MJ, Segev R, Bialek W (2006) Weak pairwise correlations imply strongly correlated network states in a neural population. Nature 440:1007-1012.
Schneidman E, Still S, Berry MJ, Bialek W (2003) Network Information and Connected Correlations. Physical Review Letters 91:238701.
Shlens J, Field GD, Gauthier JL, Grivich MI, Petrusca D, Sher A, Litke AM, Chichilnisky EJ (2006) The structure of multi-neuron firing patterns in primate retina. J Neurosci 26:8254-8266.