Midterm Exam Solutions

Midterm Exam: Networks
Instructor: Xaq Pitkow
1 Perceptron
Recall that a single perceptron computes r = sgn(w · s − θ) for input activity pattern s ∈ {−1, +1}^N, synaptic weights w, and threshold θ. Construct a two-layer perceptron to compute the XOR function: The single output neuron must fire if one and only one of its two input neurons fires. (15 points)
One possible answer is shown in Figure 1.
Figure 1: One possible multilayer perceptron to calculate an XOR.
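As a numerical check, the short sketch below verifies that a two-layer perceptron of this kind computes XOR on ±1 inputs. The particular weights and thresholds are one illustrative choice (hidden unit 1 detects s = (+1, −1), hidden unit 2 detects s = (−1, +1), and the output fires if either hidden unit fires); they are not necessarily those drawn in Figure 1.

import numpy as np

def sgn(x):
    # sign nonlinearity onto {-1, +1}; ties broken toward +1
    return np.where(x >= 0, 1, -1)

# Hidden layer: unit 1 fires only for s = (+1, -1), unit 2 only for s = (-1, +1).
W_hid = np.array([[ 1, -1],
                  [-1,  1]])
theta_hid = np.array([1, 1])

# Output layer: fires if at least one hidden unit fires (they can never both fire).
w_out = np.array([1, 1])
theta_out = -1

for s1 in (-1, 1):
    for s2 in (-1, 1):
        s = np.array([s1, s2])
        h = sgn(W_hid @ s - theta_hid)            # hidden activities
        r = int(sgn(w_out @ h - theta_out))       # output activity
        print(f"s = ({s1:+d}, {s2:+d})  ->  r = {r:+d}")
# Prints r = +1 exactly when s1 != s2, i.e. the XOR of the two inputs.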
2 Efficient coding
If two inputs x = (x1, x2) have zero mean and covary according to Cov(x) = X = (1, c; c, 1), then what 2 × 2 matrix A would transform those inputs x to noisy outputs y = Ax + n that convey the most information about x? Interpret your solution in light of redundancy reduction. Assume the noise is drawn from independent zero-mean normal distributions with variance η, so n ∼ N(0, ηI), and the total output power of y is constrained to P.
Some step-by-step guidance:
1. What is the covariance Y of the output y, in terms of the matrices A, X, and ηI? (5 points)
Y = A X Aᵀ + ηI
2. It's easiest to work in the eigenvector basis.¹ What are the two eigenvalues λᵢ and eigenvectors uᵢ of the input covariance X? (5 points) Eigenvalues λ± = 1 ± c with eigenvectors u± = (1, ±1)/√2.
3. Assume that A is diagonal in the same basis of eigenvectors uᵢ, with eigenvalues αᵢ. What are the eigenvalues νᵢ of the output covariance Y? (5 points) ν± = α±² λ± + η
4. The information of our system is I(x; y) = Σᵢ ½ log(νᵢ/η), and the power is P = Σᵢ νᵢ. What eigenvalues αᵢ of A optimize information transmission under the power constraint? (You don't have to solve explicitly for the Lagrange multiplier.) (10 points)
Solve 0 = ∂/∂α± [I(x; y) − β(Σᵢ νᵢ − P)] with Lagrange multiplier β:

0 = ∂/∂α± [ ½ Σ_{i=±} log((αᵢ² λᵢ + η)/η) − β ( Σ_{i=±} (αᵢ² λᵢ + η) − P ) ]
  = α± λ± / (α±² λ± + η) − 2β α± λ± ,

so

α± = √[(1/(2β) − η)/λ±] = √[(1/(2β) − η)/(1 ± c)] .

In this case we can solve for β by observing that

P = α₊² (1 + c) + η + α₋² (1 − c) + η = 1/β ,

yielding β = 1/P, and therefore

α± = √[(P/2 − η)/(1 ± c)] .
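A quick numerical sanity check of this allocation (the values c = 0.5, η = 0.1, and P = 4 below are illustrative, not from the problem): the closed-form α± should satisfy the power constraint exactly and should not be beaten by other ways of splitting the same output power across the two modes.

import numpy as np

c, eta, P = 0.5, 0.1, 4.0              # illustrative values
lam = np.array([1 + c, 1 - c])          # eigenvalues of the input covariance X

def info_and_power(alpha):
    nu = alpha**2 * lam + eta           # output covariance eigenvalues
    return 0.5 * np.sum(np.log(nu / eta)), np.sum(nu)

alpha_opt = np.sqrt((P / 2 - eta) / lam)        # derived optimum
I_opt, P_opt = info_and_power(alpha_opt)
print(I_opt, P_opt)                             # P_opt equals the power budget P

# Compare against other allocations of the same total output power:
for frac in (0.3, 0.4, 0.5, 0.6, 0.7):
    nu = np.array([frac * P, (1 - frac) * P])   # uneven split of the output power
    alpha = np.sqrt((nu - eta) / lam)
    I_alt, _ = info_and_power(alpha)
    print(frac, I_alt, I_alt <= I_opt + 1e-12)  # the derived optimum is never beaten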
5. What is the matrix A in the original (neuron) basis? (5 points) Define U = (u₊, u₋) = (1/√2)(1, 1; 1, −1). Then

A = U (α₊, 0; 0, α₋) Uᵀ = ½ ( α₊ + α₋, α₊ − α₋ ; α₊ − α₋, α₊ + α₋ )
  = ½ √(P/2 − η) ( 1/√(1+c) + 1/√(1−c), 1/√(1+c) − 1/√(1−c) ; 1/√(1+c) − 1/√(1−c), 1/√(1+c) + 1/√(1−c) ) .
6. How does this solution relate to redundancy reduction? (5 points)
A X Aᵀ = (P/2 − η) I is proportional to the identity matrix: A has removed the correlations in X, and thus removed the redundancy in the different neurons y.
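The same point can be checked numerically (again with illustrative values of c, η, and P):

import numpy as np

c, eta, P = 0.5, 0.1, 4.0                              # illustrative values
X = np.array([[1, c], [c, 1]])                          # input covariance
U = np.array([[1, 1], [1, -1]]) / np.sqrt(2)            # eigenvector basis (columns u+, u-)
alpha = np.sqrt((P / 2 - eta) / np.array([1 + c, 1 - c]))
A = U @ np.diag(alpha) @ U.T                            # optimal transform in the neuron basis

print(A @ X @ A.T)                       # (P/2 - eta) * I: the signal part is decorrelated
print(A @ X @ A.T + eta * np.eye(2))     # output covariance Y, also proportional to I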
¹ For time- or space-invariant statistics, our eigenvectors were the Fourier modes.
3 Linear recurrent networks
Consider a three-dimensional firing-rate neural network,

ṙ = −r + Jr + I(t),

with the connectivity matrix

J = [  1   a  −a
      −a   1   a
       a  −a   1 ]

and external input I(t).
1. Find the eigenvalues of the matrix that determines the network dynamics. What is the eigenvector corresponding to the simplest eigenvalue? (5 points)
λ = {0, ±ia√3}
v₁ ∝ (1, 1, 1)
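A quick numerical check of these eigenvalues for the dynamics matrix −1 + J (the coupling strength a = 0.5 is arbitrary):

import numpy as np

a = 0.5                                        # arbitrary coupling strength
J = np.array([[ 1,  a, -a],
              [-a,  1,  a],
              [ a, -a,  1]])
M = -np.eye(3) + J                             # matrix governing dr/dt = M r + I(t)

vals, vecs = np.linalg.eig(M)
print(vals)                                     # 0 and +/- i a sqrt(3)
print(vecs[:, np.argmin(np.abs(vals))])         # eigenvector for lambda = 0: prop. to (1, 1, 1)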
For the next two parts, you do not need to solve the equations explicitly. Qualitative answers based on the eigenvalues and eigenvectors are sufficient.
2. What happens to the network if every neuron is driven by the same constant input, I(t) = (1, 1, 1),
for two initial conditions: r(0) = (0, 0, 0) or (1, 0, 0)? Sketch the dynamics in both cases. (10
points)
The input lies wholly in the direction of v₁, which has λ = 0, so the network perfectly integrates this constant over time, generating r(t) = (1, 1, 1)t for the initial condition of (0, 0, 0). For the initial condition of (1, 0, 0), the network oscillates with angular frequency a√3 in the orthogonal directions, and again grows linearly along (1, 1, 1).
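A short Euler simulation illustrates both behaviours (a sketch; the coupling a = 0.5, step size, and duration are arbitrary choices):

import numpy as np

a, dt, T = 0.5, 0.01, 20.0
J = np.array([[ 1,  a, -a],
              [-a,  1,  a],
              [ a, -a,  1]])
M = -np.eye(3) + J
I_ext = np.array([1.0, 1.0, 1.0])              # same constant input to every neuron

def simulate(r0, steps=int(T / dt)):
    r = np.array(r0, dtype=float)
    for _ in range(steps):
        r = r + dt * (M @ r + I_ext)           # Euler step of dr/dt = -r + J r + I(t)
    return r

print(simulate([0.0, 0.0, 0.0]))   # approx (T, T, T): pure linear integration along (1, 1, 1)
print(simulate([1.0, 0.0, 0.0]))   # linear ramp along (1, 1, 1) plus an undamped oscillation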
3. What happens to the network if every neuron is also driven by independent noise, in addition to
the constant input? Sketch the dynamics. (10 points)
The mean obeys the same dynamics as before. The noise is integrated in the (1, 1, 1) direction, yielding a random walk on top of the linear ramp. In the orthogonal directions, which are undamped (purely imaginary eigenvalues), the noise also gradually increases the variance.
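Adding independent noise to the same simulation shows the random walk along (1, 1, 1) (noise amplitude and all other parameters are again arbitrary):

import numpy as np

rng = np.random.default_rng(0)
a, dt, T, sigma = 0.5, 0.01, 20.0, 0.5
J = np.array([[1, a, -a], [-a, 1, a], [a, -a, 1]])
M = -np.eye(3) + J
u = np.ones(3) / np.sqrt(3)                    # unit vector along (1, 1, 1)
steps = int(T / dt)

finals = []
for _ in range(200):                            # independent noise realizations
    r = np.zeros(3)
    for _ in range(steps):
        noise = sigma * np.sqrt(dt) * rng.standard_normal(3)
        r = r + dt * (M @ r + np.ones(3)) + noise
    finals.append(r @ u)                        # component along (1, 1, 1)

# Mean ramps as sqrt(3)*T while the variance grows in proportion to T (a random walk).
print(np.mean(finals), np.var(finals))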
4 Nonlinear dynamics
Consider a network of two identical populations that are coupled through excitation,

τ Ė1 = −E1 + S(3E2) + I
τ Ė2 = −E2 + S(3E1) + I
You can assume that τ = 20. Let

S(x) = 100 x² / (120² + x²) for x ≥ 0, and S(x) = 0 for x < 0.
Figure 2: Linear dynamics. (1) Sketch of network. (3) Dynamics with constant input and r(0) = (0, 0, 0)
(left) or r(0) = (1, 0, 0) (right). (4) Same, but with additive noise.
1. Assume that the input to both populations is equal, with I = 0. Give the equation for E1 = E2 = E when both population activities are equal. Sketch or plot the right-hand side and indicate the fixed points and their stability if you start with initial conditions E1(0) = E2(0). (5 points)
When the activities are equal, the dynamics reduce to τ Ė = −E + S(3E) + I. See Figure 3.
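A numerical sketch of the fixed-point structure at I = 0 (the grid range and resolution are arbitrary):

import numpy as np

def S(x):
    # population nonlinearity from the problem
    return np.where(x >= 0, 100 * x**2 / (120**2 + x**2), 0.0)

def f(E, I=0.0):
    # right-hand side of tau*dE/dt along the diagonal E1 = E2 = E
    return -E + S(3 * E) + I

E = np.linspace(-1.0, 100.0, 100001)
vals = f(E)
for i in np.where(vals[:-1] * vals[1:] < 0)[0]:        # sign changes = fixed points
    slope = (vals[i + 1] - vals[i]) / (E[1] - E[0])
    print(f"fixed point near E = {E[i]:6.2f}, slope {slope:+.2f}",
          "(stable)" if slope < 0 else "(unstable)")
# Prints stable points near E = 0 and E = 80 and an unstable point near E = 20.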
2. Suppose that the system starts with I = 0 at the lower state. Given this equation, how large must the input to the two populations be so that the activity jumps to the higher level? That is, at what value of I does the lower stable fixed point disappear? (When I = 0, the function −E + S(3E) has a minimum at E ≈ 8.79.) (10 points)
Denoting the position of the minimum by E* ≈ 8.79: if I > −(−E* + S(3E*)) ≈ 4.18, then the minimum rises above 0, so the left two fixed points disappear, and Ė(0) > 0, so E will increase until reaching the right (now the only) fixed point.
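The threshold input can be confirmed by locating that minimum numerically (a simple grid-search sketch):

import numpy as np

def S(x):
    return np.where(x >= 0, 100 * x**2 / (120**2 + x**2), 0.0)

E = np.linspace(0.0, 20.0, 200001)
f0 = -E + S(3 * E)                      # right-hand side at I = 0, restricted to E1 = E2 = E
i_min = np.argmin(f0)
print(E[i_min], f0[i_min])              # approx 8.79 and -4.18
print("lower fixed point vanishes for I >", -f0[i_min])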
3. Suppose that the system is noisy. Consider inputs I = 1 and I = −1. For which value is it easier to escape the lower stable state? For which value is it easier to escape the higher stable state? You do not need to compute anything, but explain your answer. (10 points)
The larger input I = 1 makes it easier (more probable) for noise to exceed the value of 4.18 − 1 = 3.18 needed for the lower fixed point to disappear transiently. The smaller input I = −1 makes it harder (less probable) for noise to exceed the value of 4.18 + 1 = 5.18 needed to destabilize the lower fixed point.
The reverse is true for the upper state.
Figure 3: Equilibrium is reached for E = E1 = E2 when Ė = 0 = −E + S(3E). The plot here shows that there are three roots. The slopes are −, +, −, respectively, implying that the outer points are stable (small perturbations from these values shrink), whereas the middle point is unstable (small perturbations grow). Starting from E(0) = 0 the system will remain there.
The negative input I = −1 pushes the curve and the upper local maximum down, leading to derivatives Ė with larger negative values, which are needed to decrease the activity E enough to switch to the lower state.