COMPUTATIONAL BIOLOGY

Detailed Parallel Simulation of a Biological Neuronal Network

Dean Brettle and Ernst Niebur
California Institute of Technology
The human brain, that most complex of computing architectures, has an estimated 10^11 nerve cells with 10^14 interconnections. Computer scientists have tried to mimic its structure, creating artificial neural networks to perform varied tasks and to further the pursuit of machine intelligence. Our interest here is different. The computational neurobiology work we do focuses on studying the original for its own sake: we want to model the real brain and learn how it works. This requires a simulator that in a strict sense is both "artificial" and a "neural network," but is so unlike most ANNs that it is best thought of as an entirely different tool.
Researchers simulate biological neural networks for the same reasons
they simulate other complex physical systems. It is difficult and impractical, sometimes impossible, to do certain kinds of experiments on living
neural systems. For instance, though one of the most direct ways to observe neuronal activity is by microelectrodes, drawbacks to this method
abound. Living tissue can only support so many electrodes, limiting the
number of cells that can be probed simultaneously and independently.
Thus, only a small part of a complex neural system can be monitored at a
given time. Connections between cells are also hard to learn about experimentally. Although some techniques can determine where synapses are,
knowledge of their strengths is very scarce. The theorist is left to guess, or at best infer, such information based on few (and sometimes contradictory) experimental results.
Simulation provides theorists with a testbed
for new theories. Change the assumptions, rerun the model, and examine the results; conclusions can be more quickly reached about the
rightness of both the assumptions and the underlying theory. Simulation can also help experimentalists in designing experiments to test hypotheses. Simulation thus helps advance both
theoretical and experimental neurobiology and
may also lead to improved artificial neural network models.
The simulator we describe can be used to study a large class of biological neural networks, whose characteristics we will specify in more detail a little later. Several research groups are currently using it to study how the visual system of the cat processes information (see the accompanying sidebar). The cat's visual system is one of the most studied nervous systems, and the wealth of experimental data can be used both in determining simulation parameters and in evaluating simulation results. Such results help to
shed light on the validity of competing theories
of the functions performed by the first stages of
the visual system.
Our simulator differs from previous work in
several respects. Simulations of single cells or
small networks of cells, though common, fail to
show the effect of large-scale interconnections.
A few researchers have implemented large neural networks on massively parallel machines,^1-3 but they have invariably used highly simplified formal neuron models. It is unclear how appropriate these models are for understanding biological neural networks. More biologically realistic simulations of networks with a neuron count comparable to what we use in our work have been done on sequential computers. These studies have modeled several brain substructures on several computing platforms: area 17 of the cat brain on Unix workstations^4 and on Cray supercomputers,^5 piriform cortex on Unix workstations,^6 and hippocampus on IBM mainframes.^7 The high computational demands lead
to very long execution times, however.
Our simulator is appropriate for modeling
large networks of biological neurons, using a
fairly realistic single-neuron model and biologically plausible connectivity. It is unusual in that
it exploits the full computational power of a
massively parallel architecture, the Thinking Machines CM-2, not only to solve the neuronal voltage equations, but also for the communication between neurons, which is by far the more demanding task.
A preliminary version of some of our results was published in a conference volume.^8
Biological neuronal networks
Animals have varied levels of neural complexity. We know that the tiny worm Caenorhabditis elegans, for instance, has exactly 302 nerve cells. Moreover, every single worm of this species has the same neuronal structure. We even know the types and locations of all connections between these neurons (again, they are identical from one worm to the next), which makes it possible to simulate a substantial fraction of this simple nervous system in detail.^9
As we climb the phylogenetic tree to the
higher animals, though, the numbers of neurons
and connections increase dramatically. No simulation has begun to reach the complexity of a
mammalian brain. There is hope, however, that
we will be able to understand its function without
constructing a system of equal complexity, by
identifying certain governing principles. Such
principles are clearly at work in brain development: the plans for constructing more than 10^11 cells and 10^14 connections in a human brain are defined by fewer than 10^5 genes. We can thus expect a certain scalability in the complexity of
neural networks. That is, an artificial network
which emulates in principle the fundamental
structure and function of real networks should be
able to reproduce the essence of their behavior.
One governing principle of brain structure
seems to be the aforementioned large number
of connections between neurons. Without going too deeply into neuroanatomy, let's briefly
examine this connectivity in the cerebral cortex
and especially the neocortex, the outer parts of
the lobed surface of the brain.
How are neocortical neural networks
connected?
To efficiently simulate large neural networks in cortical and related subcortical structures, we must know how they are put together in nature. In general, the neocortex is massively but sparsely connected. "Massive" means that each neuron receives input from and provides output to a large number of other cortical cells, typically on the order of a few thousand (the number of connections with subcortical areas is comparatively small). "Sparse" means that the probability of two randomly chosen neurons being connected by a direct synapse is very small: less than one in a million in all of the cortex; higher for two neurons in the same cortical area. (The probability that they are connected by indirect synapses of a low order is, however, high: a given neuron is indirectly connected to practically any other via no more than five or so intermediate neurons.)
Another property of the cortex is that, although the majority of connections are relatively local, this is not by any means the case for all, and scientists expect that more distant connections play a crucial role in cortical function. Furthermore, the connection scheme is not well ordered (that is, not "crystalline"). Thus far nobody has been able to give simple deterministic rules about the details of the connection schemes, and it may well be impossible to do so. This poses problems for an efficient implementation of such networks on machines with locally connected architectures.
Simulating How a Cat Sees
This article discusses how to efficiently implement simulations of biological neural networks, but how is this implementation actually being used?
The prime example is in simulating
the primary visual pathway of the cat.^1-3
This pathway includes the retina of the
eye, a brain structure called the lateral
geniculate nucleus, and the brain region
called area 17. Layer IV of this area is
the principal target layer for signals
from the LGN.
Our model of the cat visual system is
based mostly on work by Christof Koch
and Florentin Worgotter at Caltech who
implemented it on Unix workstations
but had to fight very long execution
times (several days for the simulation of
a simple experiment). This prompted us
to develop the parallel simulator, which
can increase execution speed by a factor of 100 or more while generating results of higher accuracy. The model
consisted of a 5 x 5-degree patch of the
peripheral visual field. All retinal ganglion cells and all geniculate relay cells
in this patch were explicitly simulated,
as were a significant fraction of all inhibitory cells in layer IV of the primary
visual cortex. (Since the study was
about intracortical inhibition, simulation
of excitatory cells was considered less
important.)
Each neuron in the simulation was
characterized in terms of its membrane
parameters and by the connections it
makes with other neurons. All synaptic
parameters were chosen randomly from
Gaussian distributions with means and
variances in agreement with known ex-
perimental results. Input to the simulator
was generated by a simulated retina that
is based on a multistage filter model. In
principle, any kind of visual stimulus can
be delivered to this simulated retina. In
the simulations performed thus far, we
restricted ourselves to stimuli similar to
those used in electrophysiological experiments, like moving and flashing light
bars and moving and stationary sinusoidal intensity gratings.
Connections between retina and LGN and between LGN and cortex were constructed in agreement with established
facts from anatomy and physiology.
These facts are, however, by no means
strict constraints. They allow for many
different implementations. The detailed
structure of the “real” connectivity is
not known. One of the important questions is, what level of specificity is necessary for the different cortical functions?
On the one hand, networks that are
connected in a completely random
manner will produce useful results only
very rarely. On the other hand, if the
proper functioning of the network relies
upon the exact fulfillment of too many
conditions, it will not only be difficult to
construct but it will react drastically to
small perturbations. This would be inconsistent with the observed fault tolerance of the brain.
Together with Koch and Worgotter,
we studied the required degree of specificity using the example of intracortical
inhibitory connections. One function of
inhibition is certainly gain control: if the
cortex becomes overactive, inhibition
kicks in and dampens the overexcitation. (Suppression of inhibition by drugs
prevents gain control, inducing epilep-
tic seizures.) Other functions have been
postulated for intracortical inhibition,
which contribute to shaping the response of individual cells. Researchers
formerly believed that such specific
functions require very precisely crafted
connectivity schemes. What nobody understood, though, was how nature
could generate such schemes during
the organism’s development.
To test hypotheses, we constructed
several different connectivity schemes.
All of them were consistent with what
we know about cortical anatomy, but
some we meticulously crafted for the
specific purpose while the others
(though not completely random) were
quite unspecific, requiring only a few
parameters to construct the whole network. When we studied system behavior in the different cases, we found that
high specificity was not only unnecessary but in fact harmful: the less specific
inhibition schemes actually performed
better than the “hand-crafted” ones.
Nature seems to be opportunistic and
resourceful.
References

1. F. Worgotter and C. Koch, "A Detailed Model of the Primary Visual Pathway in the Cat: Comparison of Afferent Excitatory and Intracortical Inhibitory Connection Schemes for Orientation Selectivity," J. Neuroscience, Vol. 11, No. 7, 1991, pp. 1,959-1,979.
2. E. Niebur and F. Worgotter, "Circular Inhibition: A New Concept in Long-Range Interactions in the Mammalian Visual Cortex," Proc. Int'l Joint Conf. Neural Networks--San Diego, IEEE, Piscataway, N.J., 1990, pp. II-367-II-372.
3. F. Worgotter, E. Niebur, and C. Koch, "Isotropic Connections Generate Functional Asymmetrical Behavior in Visual Cortical Cells," J. Neurophysiology, Vol. 66, No. 2, 1991, pp. 444-459.
How do neurons communicate?

Neocortical neurons communicate mostly by means of electrical impulses called action potentials, or more informally, "spikes." (Our model neglects other means of neuronal communication, which are much less studied than spikes.) When communicating, neurons are essentially "binary": the cell is either on or off. This binary character does not extend to the computation of the state of
the neuron itself, of course. Excitation in the neuron builds up to a critical point and it fires; then
and only then does it generate a spike and induce
a flow of current in all neurons connected to it.
Details can be found in any book on neurophysiology and in many basic biology texts; we will not go into the fine points here. Suffice it to say that
the electrical properties of neurons that we will
discuss, including capacitance, current, voltage,
and conductance, for instance, follow the laws of
physics, and must be modeled in some detail to
simulate the system well.
Recent data indicate that the temporal fine structure of spiking patterns, on the order of milliseconds or even less,^10 may in fact be a very important information carrier and that any model that ignores this structure (and instead uses average spike rates) may miss important aspects of neuronal information processing. Therefore, our model provides realistic timing schemes of spike activity.
Action potentials are comparatively rare events in neocortical neurons. In the cortical area of the primary visual system, which neurobiologists designate as area V1, spike rates are relatively high, but even there a cortical cell will rarely sustain a firing rate of more than 100 spikes per second. Typical responses to natural stimuli have rates around 10 to 20 per second. At a given time, many neurons will not get significant external stimulation, and will only fire at a spontaneous rate on the order of one spike per second. Since a spike lasts about 1 millisecond and the arrival of a presynaptic spike can change the cell potential significantly, numerical accuracy requires that the discretization time be smaller than 1 ms. We have typically chosen 0.1 ms as a time step. Therefore, a neuron will generate a spike only once in hundreds of time steps even when it is highly stimulated, and considerably less often when it is not.
In summary, the most important characteristics of the neuronal systems we want to simulate are these:

+ Cells are numerous and of different types.
+ Cells are highly interconnected.
+ Connections do not follow simple deterministic rules (as, for example, nearest-neighbor connections).
+ Connections can be short-range or long-range.
+ Cells communicate with each other via delayed spikes, which are binary events ("all-or-nothing").
+ Such communication events are infrequent.
The mathematical model
Before we describe our model in some technical detail, it will help to understand a few neurobiological terms. Neurons are electrically polarized while at rest, which means their interior has an electrical potential (called the resting potential) of about -70 millivolts relative to the exterior. A spike reverses this polarity for about a millisecond. After a spike, afterhyperpolarization restores the cell to the resting state. Connections between neurons, called synapses, come in two general categories. When a signal is transmitted across an excitatory synapse the likelihood that the cell on the receiving end will spike is increased; with an inhibitory synapse the effect is just the opposite. In both cases, it takes some time, usually a few milliseconds, between the moment the spike is generated in the spiking neuron and the moment the effect is felt in the receiving neuron. This introduces a delay in the equations.
Single-cell model
We modeled the biological network as a set of improved integrate-and-fire neurons that communicate with each other via delayed impulses (spikes). We neglect variations of the membrane potential across the membrane; that is, we model the voltage responses by ordinary differential equations, not by partial differential equations, and therefore disregard space as an independent variable. In the classification of Segev,^11 ours is a neuron model of medium complexity and realism, in between very simple formal neuron models and very complex compartmental models with hundreds or thousands of compartments. We assume that the firing threshold V_i^thresh is different for every neuron i. All other time-independent cell parameters are constant within any given neural population (but the synaptic parameters, discussed in the next section, are in general different for each synapse). These parameters are the membrane capacitance C, the conductance g_leak and the reversal potential E_leak for the resting potential, the reversal potentials E_exc and E_inh for the excitatory and inhibitory synapses, and E_AHP for the afterhyperpolarization after a spike (see below). More general models can be implemented by obvious modifications. Each of our neurons has two time-dependent state variables, the membrane voltage V_i and the output state O_i (for the ith neuron), as well as several time-independent and time-dependent conductances: g_leak for the leakage conductance, g_exc(t) and g_inh(t) for excitatory and inhibitory synapses, respectively, and g_AHP(t) for the afterhyperpolarization. As a function of these variables, the membrane potential V_i(t) of the ith cell is given by the first-order, nonlinear differential equation

C \frac{dV_i(t)}{dt} = \sum_{j=1}^{k} g_{ij,\mathrm{exc}}(t)\,[E_\mathrm{exc} - V_i(t)] + \sum_{j=1}^{l} g_{ij,\mathrm{inh}}(t)\,[E_\mathrm{inh} - V_i(t)] + g_\mathrm{leak}\,[E_\mathrm{leak} - V_i(t)] + g_{i,\mathrm{AHP}}(t)\,[E_\mathrm{AHP} - V_i(t)] \qquad (1)

where k and l are the total number of excitatory and inhibitory synapses providing input to this cell. The contributions in the two sums stem from synaptic input and will be discussed later. The third term represents leakage currents and causes V_i to asymptotically approach the resting potential E_leak = -70 millivolts in the absence of synaptic input. The last term contributes to the resetting of the cell's potential after an action potential, by the mechanism described after Equation 2. If V_i(t) exceeds the threshold V_i^thresh and the time since the last spike is larger than the refractory period \tau_r, an action potential is generated and relayed, with the appropriate delay, to all postsynaptic cells. The output state O_i(t) of neuron i is determined by

O_i(t) = \begin{cases} \delta(t) & \text{if } V_i(t) > V_i^\mathrm{thresh} \text{ and } O_i(t') = 0 \text{ for all } t' > t - \tau_r \\ 0 & \text{otherwise} \end{cases} \qquad (2)

Besides conveying the synaptic interaction with other neurons (discussed in the next section), the output state exerts an important influence on the voltage of the spiking neuron itself, insofar as this neuron experiences an afterhyperpolarization after producing an action potential. This function is realized by increasing the membrane conductance g_{i,AHP} in Equation 1 with a reversal potential negative to the cell's resting potential; E_AHP = -90 millivolts < E_leak. The shape of the conductance functions is modeled using so-called \alpha functions, which are defined by

\alpha(t) = \begin{cases} (t/\tau)\, e^{-t/\tau} & \text{if } t > 0 \\ 0 & \text{otherwise} \end{cases} \qquad (3)

where \tau is the time constant. The afterhyperpolarization is then given by the convolution of O_i(t) with an \alpha function, that is, by

g_{i,\mathrm{AHP}}(t) = O_i(t) * \alpha(t) \qquad (4)

Synaptic communication

The output O_i(t) of the ith cell is a train of impulses (Equation 2) which is translated into a release of neurotransmitter substances at the location of chemical synapses, and then into conductance changes g_{ij,exc}(t) and g_{ij,inh}(t) in the postsynaptic membrane (see Equation 1). Each excitatory connection has a weight w_{ij,exc}, delay d_{ij,exc}, and source cell s_{ij,exc} associated with it (analogously for inhibitory synapses). Since there can be more than one connection from the same source cell to the same destination cell i, the sums in Equation 1 are over connections, not source cells. The conductances are again modeled by convolving presynaptic spike events with \alpha functions. For excitatory synapses this means

g_{ij,\mathrm{exc}}(t) = w_{ij,\mathrm{exc}}\; O_{s_{ij,\mathrm{exc}}}(t - d_{ij,\mathrm{exc}}) * \alpha(t) \qquad (5)

An analogous formula holds for the inhibitory synapses.

To efficiently implement Equation 1, we first note that because convolution is a linear operation, we can regroup terms and lump together all contributions with common reversal potential. Therefore we define the current I (for the excitatory synapses) thus:

I_{i,\mathrm{exc}}(t) = \tilde{g}_{i,\mathrm{exc}}(t)\,[E_\mathrm{exc} - V_i(t)], \qquad \tilde{g}_{i,\mathrm{exc}}(t) = \sum_{j=1}^{k} g_{ij,\mathrm{exc}}(t) \qquad (6)

We will refer to conductance terms with the same reversal potential as being of the same "kind" and, using Equation 6, we can rewrite Equation 1 as

C \frac{dV_i(t)}{dt} = \sum_{n=1}^{K} \tilde{g}_{i,n}(t)\,[E_n - V_i(t)] + g_\mathrm{leak}\,[E_\mathrm{leak} - V_i(t)] \qquad (7)

where n runs over the kinds (for instance, n = exc, n = inh, ...). We will assume that the number of kinds is K.

In passing, we note that although convolving with \alpha(t) at every time step of the simulation may look computationally intensive, it can be accomplished in only three multiply-adds by taking the so-called z-transform of \alpha(t).^12
The program needs to simulate the above cell
model for each neuron with hundreds of connections per cell. In most of our simulations of
the primary visual pathway of the cat, there
were three kinds of connections (two inhibitory
and one excitatory). The differential equation is solved using fourth-order Runge-Kutta methods, usually with a time step of 0.1 ms. We typically used a time constant of 1 ms for the \alpha functions. Usually we also set the refractory period \tau_r at 1 ms. More details on typical parameter values are given elsewhere.^4
Simulator design
The central tasks necessary to simulate the neural network are, at every time step, the solution of one differential equation for every simulated neuron (Equation 1 or Equation 7) and, if the right-hand side of Equation 2 is nonzero, the communication of the occurrence of a spike to all cells postsynaptic to spiking neurons.
The large number of computationally intensive neurons suggests that each processor simulate one neuron. (Note that although there are many more connections than neurons, the connections perform essentially no computation.) Performing the fourth-order Runge-Kutta integration of Equation 1 with K = 3 requires approximately 100 floating-point operations per cell per time step. In straightforward implementations, in which each processor communicates the occurrence of a spike either with its presynaptic or postsynaptic partners, the time required to do this computation tends to be swamped by the time required for interneuron communication. Later, under Spike Propagation, we discuss more complex and efficient data representations and algorithms, which reduce to an acceptable level the time required for this task. The essential observation is that interprocessor communication is the most demanding task and that maximum resources must be devoted to it. Before coming to this central point, we want to discuss the computer architecture used and the reasons we chose it.
Hardware considerations
The characteristics of the connectivity scheme we described earlier, which are probably essential for the computational power of biological nervous systems, provide a problem for its
technical implementation. After each iteration
step, the output state of each neuron (“did it
spike or not?”) has to be communicated to all its
postsynaptic partners, but not necessarily to all
neurons in the network. We believe that, for
this purpose, a “smart” hypercube architecture
is superior to other architectures, such as planar
or toroidal. By “smart” hypercube we mean a
hypercube structure in which routing is handled
flexibly and efficiently by dedicated hardware,
like the router units in the CM-2.
Such an architecture makes optimal use of the
available connections in propagating the information in few steps over long distances, while
local connection schemes must pass the information stepwise from one processor to the next.
While local connections can efficiently spread information about the occurrence of large numbers of events over short distances and in regular (spatial and temporal) patterns, they are inefficient for propagating relatively small quantities of information over long distances in irregular patterns. This was one of the decisive arguments leading to our choice of the Connection Machine 2 in 1990. Since that time, other parallel architectures have supplanted the CM-2.
However, the techniques we used are applicable
to almost all other massively parallel architectures, as we discuss later.
The CM-2 is a single instruction, multiple data (SIMD) massively parallel computer. The machine we used has 65,536 one-bit processors, each with 1 Mbit of local memory, and is divided into four partitions of 16,384 processors each. (We usually used only one of the partitions; note that these hardware partitions have nothing to do with the finer software partitions introduced in the next section.) A message router for each set of 16 processors handles long-distance communications, and the routers are connected together in a Boolean 12-cube. That is, each router is connected to 12 other routers. Each processor can send a message to any other processor by sending it through at most 12 routers, in the absence of network congestion. There is also one 32-bit floating-point unit for every 32 processors.
The CM-2 is connected to a front-end sequential machine, in our case a Unix workstation. The workstation executes the control structures, does file access, and broadcasts the instructions to the CM-2 processors. (Also, while the simulation of the network is implemented on the massively parallel machine, the construction of the network, which we do not discuss in this article, is performed on a Unix workstation.) Only those CM-2 processors that are activated by setting of an "activation flag" carry out the instructions. All others sit idle. Facilities are provided for fast floating-point operations, as well as reading and writing large blocks of data to and from the CM-2.

We did all programming for the simulator using C/Paris 6.0, the C interface to the CM-2 Parallel Instruction Set. This parallel, assembler-like language provided for more efficient implementation than the available high-level languages.
Spike propagation
By spike propagation we mean the process by
which a neuron communicates the occurrence
of an action potential to all its postsynaptic
partners. We will use the following notation for representing the connections (synapses) between neurons. Let S be the set of all defined synapses. Then for every synapse s in the set S let source(s), dest(s), weight(s), delay(s), and kind(s) be the source cell number, destination cell number, weight, delay, and kind of synapse associated with synapse s. Furthermore, let proc_i[j] represent the jth element of an array of synapses stored on processor i. Which synapse s is stored in each array location proc_i[j], and what algorithm is used to perform the spike propagation, is of decisive importance for the implementation of an efficient spike propagation mechanism. Let's look at a few different possibilities.
First, consider the following simple implementation of communication between neurons,
which is close to biological theories but unfortunately inefficient. Let us define R_i as the set of all synapses s for which dest(s) equals i. Then source(R_i) is called the "receptive field" of cell i. It is the set of all cells that can send spikes to cell i, that is, the set of neurons presynaptic to this cell. (For convenience we are somewhat stretching the meaning of the term receptive field, which in neurobiology usually refers to the part of the sensory field a given neuron responds to, but this should not be confusing in the present context.) Many neurobiological theories are phrased in terms of the receptive field of a cell, and so it is a natural way of thinking about connection schemes. Let the elements of R_i be numbered starting at 0, and let r_ij refer to the jth element of R_i.
So, a simple method of propagating spikes is the following. Let proc_i[j] = r_ij. Then, at each time step t, each processor i simultaneously sends a request to processor source(proc_i[0]) to determine if that cell spiked. Then a request is sent to source(proc_i[1]), and so on, until requests have been sent to all presynaptic cells (that is, to source(proc_i[j]) for all j). Processor source(proc_i[j]) sends a binary (yes-or-no) reply. If the reply is yes, weight(proc_i[j]) is added to the input to the \alpha function term for the kind specified by kind(proc_i[j]) in Equation 7 for V_i at time t + delay(proc_i[j]).

The time per integration time step required for communication in this algorithm is t_request x max(size(R_i)), where t_request is the time required to make the request and receive a reply, and max(size(R_i)) is the maximum number of synapses in the receptive field of any neuron.
[Figure 1. Direct transmittive method for storing synaptic connections. Columns represent storage space for each of the T processors. A small part of this space ("cell data") holds the time-dependent variables describing each neuron. Most of the space stores the addresses, weights, and delays of the synapses represented by the indices i, j. Here, "1, 1" and "1, 2" are the first and second synapses of neuron 1, and so on. If neuron i spikes, it will sequentially send messages (arrows) to all neurons at the receiving end of synapses (i, 1), (i, 2), ..., (i, M).]
The magnitude of size(R_i) is typically on the order of 100 to 1,000 synapses per cell. On the CM-2, t_request is about 30 times longer than a floating-point multiply.^13
A more efficient solution consists of the following. Let O_i be the set of synapses s for which source(s) equals i. Then dest(O_i) is called the "transmittive field" of cell i, that is, the set of neurons postsynaptic to neuron i. As before, number the elements of O_i starting at 0, and let O_ij refer to the jth element of O_i. Then, let proc_i[j] = O_ij. This method of storing synapses is depicted in Figure 1. Now, at each time step t, all processors i that represent spiking cells--and only these processors--simultaneously loop over all synapses proc_i[j] and send a message to dest(proc_i[j]) consisting of weight(proc_i[j]), delay(proc_i[j]), and kind(proc_i[j]). All other processors sit idle. Receiving processors add the synaptic weight, with the appropriate delay, to the \alpha function term.
With this algorithm, the time per integration time step required for communication is t_send x max(size(O_i)), where t_send is the time required to send a message (indicating a spike) to a postsynaptic processor. The time t_send is less than half of t_request because the latter involves two "sends" (a request and a reply). Another advantage of this method is that there is considerably less communication traffic, since only spiking cells perform the sends. Nonetheless, this method does not solve the central problem that spike propagation takes time proportional to max(size(O_i)). The fact that all nonspiking cells sit idle in this method provides the first hint at a solution.
The solution lies in using the idle processors to propagate spikes from cells that they do not necessarily represent. How can we devise a scheme to do this? Again, for simplicity, consider a single spike from cell i. Conceptually, what we want to do is modify Figure 1 somehow, so that instead of a long sequence of arrows in one column, we have lots of columns with short sequences of arrows in them. First, let O_i and O_ij be defined as above. For convenience define a quantity M to equal max(size(O_i)). Also define i div M to mean i divided by M with the result rounded down to the nearest integer. Now make the columns of synapse data in Figure 1 into rows, and vice versa, by the following method. Instead of letting proc_i[j] = O_ij, let

proc_{M \cdot (i\ \mathrm{div}\ M) + j}[i \bmod M] = O_{ij}

where i mod M is the remainder when i is divided by M. As shown in Figure 2, this amounts to a periodic transposition of the two-dimensional array proc into T/M partitions, where T is the total number of neurons. To see this, take the special case of 0 <= i < M. Then the above expression reduces to proc_j[i] = O_ij. Similarly, for M <= i < 2M, we would have proc_{j+M}[i - M] = O_ij.

[Figure 2. A transposed transmittive storage method for synapse connections helps balance the communication load. The storage space of processor i no longer holds the synapses of neuron i. If neuron i generates a spike, all M processors propagate it (arrows). (The "cell data" section still contains the same data as in Figure 1, because each processor computes the new state of one neuron and thus needs the cell parameters in its local storage.) Only one partition of M processors is shown.]

For each partition of M processors the two-dimensional array proc is transposed. The synapses in O_i are distributed over processors M(i div M) through M(i div M) + M - 1.
The advantage to the representation in Figure 2 is that a spike on cell i can be propagated to all destination cells in dest(O_i) in parallel. However, for this algorithm to work, all processors in the same partition as cell i must know that cell i spiked. At first glance, this would appear to take M sends, and the method would present no advantage. However, the CM-2, like many other parallel platforms, is capable of performing communication that occurs in a regular pattern very quickly. By organizing the hypercube connectivity into a binary tree structure, spreading the number of a spiking cell to all other cells in the same partition of M processors requires a time t_spread proportional to log2 M. For M = 1,024, as in our implementation, this time is only a small fraction of M x t_send.
The only other concern is what happens when multiple cells in the same partition spike during the same time step. These spikes must be propagated sequentially. The CM-2 Paris instruction set provides several instructions that combine the spreading operation described above with various logical operations (such as minimum and maximum). By using this feature, it is easy to propagate spikes sequentially from different cells in the same partition.

The algorithm runs as follows within each partition (all partitions perform the algorithm in parallel). For each cell i' that spikes within a partition, the number i' mod M is spread to all other processors in the same partition, and then each processor i in the partition simultaneously sends the information associated with synapse proc_i[i' mod M] to cell dest(proc_i[i' mod M]). This information is then available to all destination cells for the next time step in the integration of Equation 7.
The average time per iteration required for communication in this algorithm is approximately

t_{\mathrm{send}} \cdot r \,\Delta t\, M \qquad (8)

where \Delta t is the time step in seconds, and r is the average spike rate for one cell in impulses per second. Note that the time complexity is still linear in M. However, the prefactor r\Delta t may be very small. In simulations of cortical circuits, r is on the order of 10 impulses per second, \Delta t is on the order of 10^-4 seconds, and M is on the order of 10^3. In this case, r\Delta t M is approximately one, and this method of spike propagation is
several hundred times faster than the methods
described earlier!
Note that this algorithm will not solve the
load-balancing problem entirely, since many
more spikes may occur in one partition than in
another. However, in actual simulations, this
method reduces the time required for spike
propagation to a level comparable to the time
required for integration in our simulations.
The "transposed" algorithm is not specific to the CM-2. Although our implementation takes advantage of the "partitioned broadcast" capability of the CM-2 hardware, this feature can be emulated efficiently in software on architectures that do not support it in hardware. On a binary hypercube architecture, a partitioned broadcast to M processors can be done in O(log2 M) parallel local sends. On a grid architecture, the broadcast will take O(sqrt(M)) parallel local sends. On grid architectures with little or no penalty for nonlocal sends, it may be better to broadcast in O(log2 M) parallel nonlocal sends by doing nonlocal sends to the processors that would be connected if they were in a hypercube.
Although it is possible to implement partitioned broadcast (and thus the transposed algorithm) on almost any massively parallel architecture, the decision to use this algorithm for a
particular problem depends on several factors.
Most importantly, the problem should be communication-bound. If the bulk of the time is
spent doing computation, faster communication
will not help much. Second, the frequency of
communication must be low, and the “fan-out”
high. Third, this method is most useful if the
communication patterns are irregular and over
long distances. Also, if all processors need to
send at the same time or if a message need only
be sent to a few processors, overhead will make
the transposed algorithm somewhat slower than
the direct algorithm. Last, the user should analyze the time needed for partitioned broadcast
to M processors versus the time required to do
M nonlocal sends.
Performance analysis
In order to accurately compare the performance
of the last two spike propagation algorithms described, we implemented both the direct transmittive field algorithm (direct algorithm for
short) and the transposed transmittive field algorithm (transposed algorithm) in the same
program. Because we were only interested in
comparing the time required for spike propagation, we did not simulate the neuronal equations in this timing program.
[Figure 3. Time required for the direct algorithm as a function of p, the probability of a cell spiking at each time step. For p less than about .02, the time is nearly constant. For larger values of p, network congestion increases the time moderately.]
We introduced spikes into this pseudosimulation as independent Poisson processes on each neuron (that is, all neurons had the same probability of spiking at each time step). We randomized the proc_i[j] synaptic matrix and set the number of neurons M in the transmittive field to 1,024. All experiments were performed on a 16K time-sharing partition of a 64K CM-2. We measured the time required for each algorithm to propagate all spikes occurring in the simulation in one time step for different probabilities p of spiking at each time step. These timings include all overhead associated with each algorithm (such as the time required for the spread operation in the transposed algorithm).

Each neuron determined whether or not to spike by choosing a random number x between 0 and 1 and comparing it to the probability p. If x was less than p, the neuron spiked. Next, we measured the time required to propagate all spikes via the direct algorithm. Finally, we measured the time required to propagate all spikes via the transposed algorithm. These two times were recorded and the procedure repeated for a new value of p. We used several safeguards to ensure accuracy in timing.
Figure 3 shows the time required for the direct algorithm as a function of p. Note that the time required does increase significantly (from 1.4 to 5.4 microseconds) as p increases. This is because as more cells spike at the same time, more processors must send at the same time and the CM-2 communication network gets congested. This problem is even worse for the transposed algorithm because at least M = 1,024 processors send at the same time. This increase in network traffic results in t_send for the transposed algorithm being about 1.5 times longer than t_send for the direct algorithm.
Figure 4 compares the speeds of the direct algorithm and the transposed algorithm as a function of p. The upper (direct) curve is identical to the curve in Figure 3. Note that while the time required rises much more rapidly for the transposed algorithm than the direct algorithm, it starts from a much lower value for small p and it takes significantly less time for p < .5. The transposed algorithm gave a peak speedup of 454 at p = .00012 (or 1.2 impulses per second at a time step of 0.1 ms, corresponding approximately to spontaneous spike rates in cortex). Figures 3 and 4 correspond to N = 16 partitions of M = 1,024 neurons each, for a total of 16,384 neurons on 16,384 processors.

[Figure 4. Execution time of the direct algorithm as a function of p varies much less than that of the transposed algorithm. If all cells fire at each time step, the transposed algorithm is actually at a disadvantage. Thus the two curves cross just below p = 1. As expected, they differ most at the smallest values of p.]
The highest theoretically possible speedup, obtained if there is exactly one spike in every partition at every time step, is equal to M. On average, the speedup is determined by the maximum number of spiking neurons per time step in any partition, since the processors in all partitions have to wait until the last partition has propagated all of its spikes. For example, suppose Partition 2 contains five spiking cells. If all other partitions contain fewer than five, they will sit idle (after propagating their spikes) until Partition 2 has propagated all its spikes. Thus the speedup is driven by the expected maximum number of simultaneously spiking neurons in any partition of a system consisting of N partitions, each with M neurons. In the accompanying sidebar we briefly derive this quantity.

How Many Cells Will Spike at Once?

Let \pi(k) be the probability that a given partition has exactly k spikes. We denote the probability that a given partition has less than k spikes by

\Pi(k) = \sum_{i=0}^{k-1} \pi(i) \qquad (A)

The probability that m partitions have exactly k spikes and the remaining N - m partitions have less than k spikes is

\binom{N}{m} \, \pi(k)^m \, \Pi(k)^{N-m} \qquad (B)

The probability that k is the maximum number of spikes in any partition is obtained by summing the above expression over all m. This probability can then be used to obtain the expected maximum by a weighted sum over all k:

N_{\max} = \sum_{k} k \sum_{m=1}^{N} \binom{N}{m} \, \pi(k)^m \, \Pi(k)^{N-m} \qquad (C)

For independent neurons with a spiking probability of p, \pi(k) is the binomial distribution of k events out of M,

\pi(k) = \binom{M}{k} p^k (1-p)^{M-k} \qquad (D)

For low spike rates, this can be replaced by the Poisson distribution with mean \lambda = pM, for which \Pi(k) = \Gamma(k, \lambda)/\Gamma(k), where \Gamma(k, \lambda) is the incomplete \Gamma function \Gamma(k, \lambda) = \int_\lambda^\infty t^{k-1} e^{-t}\, dt.
Figure 5 shows the expected time-averaged maximum number of spikes, computed from Equation C in the sidebar, for M = 1,024 and different values of p. It is a mildly growing function of the number of partitions, which shows that performance will not be limited crucially by changing the number of partitions. Therefore, we expect that the algorithm scales well with increasing network size and that the performance-limiting factor is the activity level in the network rather than the size of the network. This is also evident in Figure 6, which shows the effective speedup we obtained in simulations with 16, 32, and 64 partitions, compared to the highest theoretical (ideal) speedup. The latter would be obtained if the transposed algorithm were limited only by Equation C and required no additional communication or other overhead. Using N_max(p, M, N) from Equations C and D, it is clear that for independent neurons with spiking probability p this ideal speedup is given by

\frac{M}{N_{\max}(p, M, N)} \qquad (9)

where, in the Poisson approximation of the sidebar, \lambda = pM. The increase of t_send due to network congestion mentioned above explains the largest part
of the difference between theory and experiment. The rest we can attribute to the time required for the spread operation and other additional overhead associated with the transposed algorithm. Looking at the graph of the 16-partition simulation in Figure 6, we can see that at p = 0.0010 (or 10 impulses per second per cell) the speedup is a factor of 106, and at p = 0.0081 (or 81 impulses per second per cell) the speedup is a factor of 24. Thus, even at the highest realistic sustained spike rates for every single neuron, the speedup is still quite substantial. Note that usually the average activity of a simulated network is much less than 10 ips even during stimulation, since at any given time, only a fraction of all neurons fire at an elevated rate.
The CM-2 allows a single physical processor to emulate several virtual processors. The ratio of virtual processors to physical processors is called the VP ratio. By repeating the timing comparisons described above at different VP ratios, we began to investigate how speedup is affected as the number of neurons increases. Figure 6 plots speedup as a function of p for three different VP ratios (that is, different numbers of neurons) with M = 1,024 neurons per partition. In all cases, the performance is essentially limited by the expected maximum number of spikes in a partition, whose effect is expressed by Equation 9, and the additional overhead originating from the higher VP ratio is small. This corroborates our earlier conclusion that the algorithm scales well with the size of the simulated system.
[Figure 5. Speedup of the transposed over the direct algorithm depends on the average number of spikes per time step in the partition most active during each step. Here this average maximum is graphed as a function of the number of partitions, for three values of the spiking probability p with M = 1,024 neurons per partition.]
[Figure 6. Speedup of the transposed algorithm over the direct algorithm decreases only moderately when more neurons, corresponding to higher VP ratios (see text), are simulated. The ideal speedup from Equation 9 (shown for N = 16 partitions only; graphs for larger N would be slightly lower) essentially determines the observed speedup. As expected, speedup approaches unity as p approaches 1 in all cases.]

Constructing numerical models of nervous systems is a promising approach to understanding them better. We have modeled one of the most complicated of all nervous systems, the neocortex of higher mammals. Obviously, our model is very simplified: it comprises only about 16,000 neurons, a few different ionic currents in each neuron, and so forth. Nevertheless, we believe that this implementation has the potential to support what are in some respects the most realistic models of neocortical networks. In particular we are referring to the realism of the connection schemes, which we have only discussed in passing in this article but which are treated in more detail elsewhere (see the references for the sidebar "Simulating How a Cat Sees"). The price paid for this realism is that considerable computing power is required. A very similar simulation has been run previously on Unix workstations,^4 but even the simulation of one simple electrophysiological experiment required a CPU time on the order of several days. As we showed, our implementation of essentially the same model on the Connection Machine reduces this time significantly. (Moreover, our simulator solves the neuronal dynamics in Equation 1 with a fourth-order Runge-Kutta algorithm instead of the simple first-order explicit Euler algorithm used in the sequential simulation. Therefore the influence of numerical errors, although we did not systematically compare them, should be much reduced.)
Using the "virtual processor" capability of the CM-2, which allows each processor to simulate more than one neuron, and the full local memory of the machine, the largest number of neurons we can simulate without any change of algorithm is 4,194,304, or 4M. (This is for a maximum of 1,024 connections per neuron.) Although we did not study systematically how total execution time scales with the size of the simulated system for more than 16K processors, we expect it to be basically independent of the number of neurons, as long as additional neurons are distributed on additional physical processors. We have shown that adding neurons on virtual processors increases the required execution time at a rate only slightly higher than proportional to the number of neurons.
Although our simulator has been used to
model a particular biological neural network,
the primary visual pathway of the cat, it has
broader possibilities. It is suitable for modeling
all neural networks with the six characteristics
we initially described, just by using the appropriate connection schemes and single-neuron
characteristics.
Speculating further, although we devised the
system to model biological neural networks, it
might have use beyond this field. Consider that
an important ingredient of our implementation
is the change of role of the processors from
computing to communicating. This might be
applicable even more generally, namely in systems in which communication is (a) rare, (b)
nonlocal, and (c) irregular. Problems like digital
circuit simulation and communication network
simulation often satisfy all three of these criteria. Other network simulations, like analog circuit simulation, may seem to require constant
communication, thereby violating (a). However,
in many such cases communication can be made
more rare by using methods that only communicate “significant” changes.
In a recent editorial in this magazine [CS&E
Summer '94, p. 1], Francis Sullivan suggested
that “for applications requiring extensive unpredictable communication ... massive parallelism
may not be helpful [but] some applications ...
might be reformulated to use regular communication." We feel that the simulation of large biological neural networks fits this description well. Such a simulation requires extensive unpredictable communication, but we have shown
how the problem can be translated into one using regular communication structures. This has
allowed us to reduce communication time to be
on a par with computation time and has thus let
us make better use of massive parallelism.
+
Acknowledgments
We thank Udo Wehmeier and Florentin Worgotter, who
provided us with the code for generating the connections,
and Gary Holt for his retina simulator. Discussions with
Christof Koch and Florentin Worgotter were very helpful.
We would like to thank Christof Koch for his continuing
support and for providing a stimulating research atmosphere. Some of the numerical work was performed at the
Advanced Computing Laboratory of Los Alamos National
Laboratory. This work was supported by the National Science Foundation, the Office of Naval Research, and the Air
Force Office of Scientific Research.
References

1. Y. Fujimoto, N. Fukuda, and T. Akabane, "Massively Parallel Architectures for Large Scale Neural Network Simulations," IEEE Trans. Neural Networks, Vol. 3, No. 6, 1992, pp. 876-888.
2. F.J. Nunez and J.A.B. Fortes, "Performance of Connectionist Learning Algorithms on 2-D SIMD Processor Arrays," in Advances in Neural Information Processing Systems 2, D.S. Touretzky, ed., Morgan Kaufmann, San Francisco, 1990, pp. 810-817.
3. X. Zhang et al., "An Efficient Implementation of the Back-Propagation Algorithm on the Connection Machine CM-2," in Advances in Neural Information Processing Systems 2, D.S. Touretzky, ed., Morgan Kaufmann, San Francisco, 1990, pp. 801-809.
4. F. Worgotter and C. Koch, "A Detailed Model of the Primary Visual Pathway in the Cat: Comparison of Afferent Excitatory and Intracortical Inhibitory Connection Schemes for Orientation Selectivity," J. Neuroscience, Vol. 11, No. 7, 1991, pp. 1,959-1,979.
5. P. Patton, E. Thomas, and R.E. Wyatt, "Computational Model of Vertical Signal Propagation in the Primary Visual Cortex," Biological Cybernetics, Vol. 68, No. 1, 1992, pp. 43-52.
6. M. Wilson and J.M. Bower, "Cortical Oscillations and Temporal Interactions in a Computer Simulation of Piriform Cortex," J. Neurophysiology, Vol. 67, No. 4, 1992, pp. 981-995.
7. R.D. Traub and R. Miles, Neuronal Networks of the Hippocampus, Cambridge Univ. Press, New York, 1991.
8. E. Niebur and D.W. Brettle, "Efficient Simulation of Biological Neural Networks on Massively Parallel Supercomputers with Hypercube Architecture," in Advances in Neural Information Processing Systems 6, J. Cowan, G. Tesauro, and J. Alspector, eds., Morgan Kaufmann, San Francisco, 1994, pp. 904-910.
9. E. Niebur and P. Erdös, "Modeling Locomotion and Its Neural Control in Nematodes," Comments in Theoretical Biology, Vol. 3, No. 2, 1993, pp. 109-139.
10. W. Softky, "Sub-Millisecond Coincidence Detection in Active Dendritic Trees," Neuroscience, Vol. 58, No. 1, 1994, pp. 13-41.
11. I. Segev, "Single Neuron Models: Oversimple, Complex and Reduced," Trends in Neurosciences, Vol. 15, No. 11, 1992, pp. 414-421.
12. A.V. Oppenheim and A.S. Willsky, Signals and Systems, Prentice-Hall, Englewood Cliffs, N.J., 1983.
13. R. Stevens and P. McDonough, "Instruction Timings and Message Passing Performance on the Connection Machine 2," tech. report, Argonne Nat'l Lab, Argonne, Ill., 1990.
Dean Brettle is a senior consultant in
the Advanced Computational Technologies Practice at Booz, Allen &
Hamilton. He received his BS in engineering and applied science from the
California Institute of Technology in
1992. His interests include pattern
recognition, parallel processing, image
processing, and simulation. At Booz
Allen, he has been involved in developing an object recognition system and implementing it on
the CM-2, CM-5, and networks of TI C40 DSPs and T800
transputers. Brettle is a member of the International Neural
Network Society and SPIE (the Society of Photo-optical Instrumentation Engineers).
Ernst Niebur received his MSc in
physics from the University of Dortmund, Germany, and his PhD in theoretical physics from the University of
Lausanne, Switzerland. In 1989 he
joined the faculty of Caltech, where he
is presently a senior research fellow in
the Computation and Neural Systems
Program. Niebur’s research interests
are in computational neuroscience and
related areas. His current main interest is the neuronal implementation of selective visual attention in humans and
other higher animals, and its application to the construction
of advanced artificial vision systems. Niebur is a member of
the Society for Neuroscience, the International Neural Network Society, and the German Physical Society.
Brettle may be reached at Booz, Allen & Hamilton, 8283 Greensboro Drive, McLean, VA 22102; e-mail [email protected]. Niebur is at Computation and Neural Systems 139-74, California Institute of Technology, Pasadena, CA 91125; e-mail [email protected].