Neural Networks 11 (1998) 441–447
Contributed article
Two-dimensional Gabor-type receptive field as derived by mutual
information maximization
K. Okajima*
NEC Corporation, Tsukuba, Japan
* Requests for reprints should be sent to Dr K. Okajima at Exploratory Research Laboratory, Fundamental Research Laboratories, 34 Miyukigaoka, Tsukuba 305, Japan.
Received 19 May 1997; accepted 25 December 1997
Abstract
Two-dimensional receptive fields are investigated from an information theoretic viewpoint. It is known that the spatially localized and
orientation- and spatial-frequency-tuned receptive fields of simple cells in the visual cortex are well described by Gabor functions. This paper
shows that the Gabor functions are derived as solutions for a certain mutual-information maximization problem. This means that, in a low
signal-to-noise ratio limit, the Gabor-type receptive field can extract the maximum information from input local images with regard to their
categories. Accordingly, this suggests that the receptive fields of simple cells are optimally designed from an information theoretic viewpoint. © 1998 Elsevier Science Ltd. All rights reserved.
Keywords: Gabor function; Information; Visual cortex; Simple cell; Receptive field
1. Introduction
The spatially localized and spatial-frequency (and orientation)-tuned receptive field (RF) of a simple cell in the
visual cortex is well described by the Gabor function
(Marcelja, 1980; Daugman, 1980), which is defined as a
plane wave, or a complex exponential function, localized
by a Gaussian envelope $G_\sigma(x)$ (Gabor, 1946; Daugman,
1985):

$$
\phi(x; k_0) \equiv G_\sigma(x)\exp(i k_0 \cdot x) \tag{1}
$$
Its Fourier transform $\tilde{\phi}(k; k_0)$ is also localized around $k_0$ in
the frequency domain. Therefore, as a filter it shows
(broadly tuned) band-pass characteristics (Pollen and Ronner, 1983). An example of the (two-dimensional) Gabor
function is shown in Fig. 1. The problem considered in
this paper is why the visual system in the brain adopts
such a function to analyse its input.

Fig. 1. An example of the Gabor function. (a) The real part of the function (or the cosine-type Gabor function). (b) The imaginary part of the function (or the sine-type Gabor function).
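As a concrete illustration (my own, not from the paper), the following Python sketch evaluates the two-dimensional Gabor function of Eq. (1) on a discrete grid; its real and imaginary parts correspond to panels (a) and (b) of Fig. 1. The grid size, envelope width $\sigma$ and frequency $k_0$ below are arbitrary choices.

```python
import numpy as np

# Illustrative sketch of the 2D Gabor function of Eq. (1):
#   phi(x; k0) = G_sigma(x) * exp(i k0 . x)
# Grid size, sigma and k0 are assumptions, not values from the paper.

N = 64
y, x = np.mgrid[-N//2:N//2, -N//2:N//2]
sigma = 8.0
k0 = (0.5, 0.2)                        # preferred spatial frequency (rad/pixel)

envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))   # Gaussian envelope G_sigma
gabor = envelope * np.exp(1j * (k0[0] * x + k0[1] * y))

cosine_gabor = gabor.real              # cf. Fig. 1(a)
sine_gabor = gabor.imag                # cf. Fig. 1(b)
```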
Okajima (1997) showed that the well-known property of
the Gabor function, that it is maximally localized in the
space and frequency domains, is closely related to its
information theoretic capability. Following Linsker’s
approach (Linsker, 1988, 1993), he showed that the
Gabor function is derived as a solution for a certain
mutual-information maximization problem. This means
that, under rather general assumptions, it can extract the
maximum information from input local signals. Thus, he
suggested that the RFs of simple cells must be optimally
designed from an information theoretic viewpoint.
To explain the two-dimensional RFs of simple cells in
more detail, one has to investigate RF functions in two-dimensional cases. Okajima (1997) did not, however, deal
explicitly with the two-dimensional cases, and there is some
difficulty in them, as we shall see in Section 3. Thus, the
purpose of this paper is to investigate two-dimensional RF
functions from an information theoretic viewpoint. The next
section describes two objective functions to be investigated
in this paper. The first one, which is basically the same as
that used by Okajima (1997), will be analysed in Section 3.
It will be shown that in two-dimensional cases, when the
input signal statistics are isotropic, maximization of this
objective function does not result in the orientation-tuned
Gabor functions. In contrast, it will be shown in Section 4
that the other objective function leads to the Gabor function
as an optimal RF function even in two-dimensional cases.
Thus, the result still suggests that the Gabor-type RFs of
simple cells are optimally designed under a certain information theoretic criterion which is, however, slightly different
from that used in Okajima (1997). Preliminary results are
presented in Okajima (1996).
By using computer simulations, Kohonen (1994) demonstrated that the Gabor-function-like RFs are self-organized
through a learning algorithm called the adaptive-subspace
self-organizing feature map (ASSOM). Olshausen and Field
(1996) also used computer simulations to demonstrate that
the Gabor-function-like RFs can emerge by learning a
sparse code for natural images. The results obtained in
this paper might be related to these simulation results,
although the relationship is not yet clear.
2. Objective functions
2.1. Feature extraction and the information obtained by
measuring the feature
Shift invariance of the input signal statistics is assumed
throughout this paper. That is, it is assumed that if a certain
image $f_g(x)$ is presented, its displaced images $f_g(x - x_0)$ will
also be presented with the same probability. In this case, any
image can be formally specified by two parameters: $g$,
which specifies its category, and $x_0$, which specifies its
position.
Now suppose we extract a feature $a$ from an input image
$f(x)$ using an RF function $\phi(x)$:

$$
a = (\phi, f + n) \equiv \sum_x \phi(x)\{f(x) + n(x)\} \tag{2}
$$

Here $n$ represents additive noise, assumed to be uncorrelated Gaussian noise independent of the image. We obtain a
certain amount of information by measuring the feature.
This paper considers the following two measures for this
amount of information:

$$
MI_1[\phi] = H[a] - H[a|f] = H[a] - H[a|\Gamma, X_0] \tag{3}
$$

$$
MI_2[\phi] = H[a] - H[a|\Gamma] \tag{4}
$$
Here $MI_1$ is the mutual information between $a$ and $f$,
where $H[a] = -\sum_a P(a)\log P(a)$ and $H[a|f] = -\sum_f P(f)\sum_a P(a|f)\log P(a|f)$ are respectively the entropy
and the conditional entropy, which are defined using the
probability $P$. On the other hand, $MI_2$ represents the
mutual information between $a$ and $g$; that is, the average
amount of information obtained about the category of
the image by measuring the feature $a$. In Eq. (4), $H[a|\Gamma]$
denotes the conditional entropy defined by $H[a|\Gamma] = -\sum_g P(g)\sum_a P(a|g)\log P(a|g)$.

Below, I shall determine the optimal RF function $\phi$ by
maximizing the mutual information $MI_1$ or $MI_2$. Before that,
however, a 'localization term' will be introduced into the
objective functions.

2.2. Localization term

Empirically, most of the features that have been useful for
analysing images are localized ones. Accordingly, Okajima
(1997) confined himself to localized features, and determined the RF function by maximizing $MI_1$ by considering
local signals as signals (the local signals are obtained by
seeing the original signals through a certain window function $w(x)$). Through such a procedure, he derived the 'localization term',

$$
-\mu\sum_x u(x)|\phi(x)|^2 / \|\phi\|^2 \tag{5}
$$

in the objective function to be maximized. Here, $\mu$ is a
Lagrange multiplier and $u(x) \equiv 1/w(x)^2$ is a 'localization
potential' defined by the window function, which is
assumed to be well approximated by the second-order
expansion around its minimum as

$$
u(x) \approx u_0 + \beta x^2 \tag{6}
$$
Meanwhile, we can show that we obtain almost the same
result as in Okajima (1997) if we start by explicitly assuming this 'localization term' (5) and maximize

$$
\ell_1 = MI_1[\phi] - \mu\sum_x u(x)|\phi(x)|^2 / \|\phi\|^2 \tag{7}
$$
instead of maximizing Eq. (3) for local signals. In this case,
however, $\mu$ is not a Lagrange multiplier, but a fixed parameter determining the relative weight of the 'localization
term', or a penalty term for non-localized RF functions. In
this paper I also confine myself to localized RF functions
and, for simplicity, I shall take the latter approach. That is, I
shall determine the optimal RF function by maximizing $\ell_1$
in Eq. (7) or $\ell_2$ in Eq. (8):

$$
\ell_2 = MI_2[\phi] - \mu\sum_x u(x)|\phi(x)|^2 / \|\phi\|^2 \tag{8}
$$
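To make the quantities introduced so far concrete, here is a minimal numerical sketch (my own construction, not code from the paper) of the feature measurement of Eq. (2) and the localization term of Eq. (5); all sizes and parameter values are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of Eq. (2), a = (phi, f + n), and the localization
# penalty sum_x u(x)|phi(x)|^2 / ||phi||^2 of Eq. (5) (without -mu).
# All sizes and parameters are illustrative assumptions.

rng = np.random.default_rng(0)
N = 32                                    # image is N x N pixels
y, x = np.mgrid[-N//2:N//2, -N//2:N//2]

# RF function phi: a 2D Gabor of frequency k0 (cf. Eq. (1))
sigma, k0 = 4.0, (0.8, 0.0)
phi = np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.exp(1j * (k0[0]*x + k0[1]*y))

f = rng.standard_normal((N, N))           # a random input image
n = 0.5 * rng.standard_normal((N, N))     # additive Gaussian noise

a = np.sum(phi * (f + n))                 # feature, Eq. (2)

u = 1e-3 * (x**2 + y**2)                  # localization potential, u ~ u0 + beta x^2
penalty = np.sum(u * np.abs(phi)**2) / np.sum(np.abs(phi)**2)

print(abs(a), penalty)
```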
3. RF functions maximizing the objective function $\ell_1$
Okajima (1997) analysed a mutual-information maximization problem whose objective function is basically the
same as Eq. (7), and showed that when the power spectrum
of the signals has a peak at frequency $k_0$, a Gabor function of
frequency $k_0$ becomes the optimal RF function under rather
general assumptions. This section first summarizes the
results obtained by Okajima (1997), some of which will
be used in Section 4. Then, a problem in two-dimensional
cases will be discussed.
Okajima (1997) assumed a low signal-to-noise ratio
(SNR) limit in his analysis (when the input signal statistics
are Gaussian, this assumption is not necessary). In a low
SNR limit, the mutual information $MI_1$ can be approximated as

$$
MI_1 \approx \frac{(\phi, C\phi)}{2\sigma_n^2\|\phi\|^2} \tag{9}
$$
where $C$ is the covariance matrix of $f$ and $\sigma_n^2$ is the variance
of the noise. Therefore, we can maximize Eq. (7) by
maximizing

$$
\ell_1' = (\phi, C\phi) - \mu'\sum_x u(x)|\phi(x)|^2 = \sum_k p(k)|\tilde{\phi}(k)|^2 - \mu'\sum_x u(x)|\phi(x)|^2 \tag{10}
$$
under the constraint of $\|\phi\|^2 = \sum_x|\phi(x)|^2 = 1$. Here $\mu' = 2\sigma_n^2\mu$
is a constant, $p(k)$ is the power spectrum of the input signals,
and $\tilde{\phi}(k)$ is the Fourier transform of the RF function $\phi(x)$. In
deriving the right-hand side of Eq. (10), I used the equality
$(\phi, C\phi) = \sum_k p(k)|\tilde{\phi}(k)|^2$, which is valid when $C$ is shift
invariant.

Then, by expanding $p(k)$ and $u(x)$ around their maximum
or minimum up to the second order as $p(k) \approx p_0 - \alpha(k - k_0)^2$
and $u(x) \approx u_0 + \beta x^2$, Eq. (10) is rewritten as

$$
\ell_1' \approx -\left[\alpha\sum_k (k - k_0)^2|\tilde{\phi}(k)|^2 + \mu'\beta\sum_x x^2|\phi(x)|^2\right] + (p_0 - \mu' u_0) \tag{11}
$$

Fig. 2. Schematic representations of the power spectrum $p(k)$ and an 'effective power spectrum' $p'(k)$, i.e., the remaining frequency components of $p(k)$ that are not extracted by the Gabor function $\phi(x; k_0)$.
Okajima (1997) showed that the Gabor function $\phi(x; k_0)$ of
frequency $k_0$ becomes the optimal RF function that exactly
maximizes objective function (11), by using the well-known
property that the Gabor function is maximally localized in
the space and frequency domains.
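This localization property can be checked numerically. The following hedged one-dimensional sketch (not from the paper) evaluates the bracketed cost in Eq. (11), which penalizes the joint spread of $\phi$ in space and in frequency around $k_0$, for a Gabor function and for a boxcar-windowed wave of comparable extent; the Gabor function attains the smaller cost, in line with its maximal joint localization. The parameters $\alpha$, $\beta$, $k_0$ and the grid are assumptions.

```python
import numpy as np

# 1D illustration of the cost inside the brackets of Eq. (11):
#   alpha * sum_k (k - k0)^2 |phi~(k)|^2  +  beta * sum_x x^2 |phi(x)|^2
# A Gaussian-windowed wave (Gabor) beats a boxcar-windowed wave.
# alpha, beta, k0 and the grid are illustrative assumptions.

N, alpha, beta, k0 = 256, 1.0, 1.0, 1.0
x = np.arange(N) - N // 2
k = 2 * np.pi * np.fft.fftfreq(N)

def cost(phi):
    phi = phi / np.linalg.norm(phi)          # enforce ||phi||^2 = 1
    phi_k = np.fft.fft(phi) / np.sqrt(N)     # unitary DFT
    freq_spread = alpha * np.sum((k - k0)**2 * np.abs(phi_k)**2)
    space_spread = beta * np.sum(x**2 * np.abs(phi)**2)
    return freq_spread + space_spread

gabor = np.exp(-x**2 / (2 * 8.0**2)) * np.exp(1j * k0 * x)
boxcar = (np.abs(x) < 12) * np.exp(1j * k0 * x)

print(cost(gabor), cost(boxcar))             # Gabor gives the smaller cost
```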
3.1. ‘Effective power spectrum’
Since the Gabor function $\phi(x; k_0)$ thus obtained is localized in space, we actually have to prepare a set of Gabor
functions $\phi(x - x_0; k_0)$, whose centers $x_0$ are located at
various positions in space, in order to analyse input signals
all over space. Now, suppose that we have measured
features $a(x_0; k_0) = \sum_x \phi(x - x_0; k_0) f(x)$ at positions $x_0$,
which are sampled in space with an interval satisfying the
sampling theorem. Then, what will be the next optimum RF
function, provided we already know these $a(x_0; k_0)$? Intuitively speaking, the frequency components of signals
around $k_0$ are already extracted by the set of Gabor functions
$\phi(x - x_0; k_0)$. Accordingly, the power spectrum $p'(k)$ of the
remaining frequency components will have the profile
depicted schematically in Fig. 2, with a new peak at frequency $k_0'$ (see Okajima, 1997). Then, by considering this
$p'(k)$ to be a new 'effective power spectrum', the procedure
described in the previous subsection can be repeated to
obtain a new Gabor function of frequency $k_0'$. By repeating these procedures again and again, we can obtain Gabor
functions of various frequencies in sequence.
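The following toy sketch (my own schematic, not the paper's algorithm) mimics this repeated 'effective power spectrum' argument in one dimension: the band around the current peak $k_0$ is suppressed by an assumed Gaussian factor, and the next peak $k_0'$ is then located. The suppression factor, bandwidth and initial spectrum are all assumptions.

```python
import numpy as np

# Schematic iteration of Section 3.1: extract the band around the current
# spectral peak, then look for the next peak of the remaining spectrum.
# The low-pass spectrum, suppression factor and bandwidth are assumptions.

k = np.linspace(0, np.pi, 512)
p = 1.0 / (1.0 + 10.0 * k**2)            # assumed low-pass power spectrum
bandwidth = 0.3

for step in range(4):
    k0 = k[np.argmax(p)]                 # peak of the current 'effective' spectrum
    print(f"step {step}: extract band around k0 = {k0:.2f}")
    p = p * (1.0 - np.exp(-(k - k0)**2 / (2 * bandwidth**2)))   # remaining p'(k)
```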
3.2. Two-dimensional isotropic cases

In two-dimensional cases, however, objective function
(7) does not give a Gabor function (of nonzero frequency)
when the input image statistics are isotropic. This is
because, in two-dimensional isotropic cases, the power
spectrum (or the 'effective power spectrum') cannot have
an isolated peak, except when it has a peak at zero frequency. Meanwhile, in order to derive a Gabor function as
a solution for the maximization problem, we need a power
spectrum having an isolated peak.
For example, suppose the power spectrum has a profile
with a peak at zero frequency. Then, under appropriate
assumptions, a Gabor function of zero frequency (i.e., a
Gaussian) is obtained as the optimal RF function. However,
when we extract the frequency components around zero frequency by the set of such RF functions, the remaining 'effective power spectrum' might have a crater-like profile, which
does not have an isolated peak (see Fig. 3). Thus we see that,
in two-dimensional isotropic cases, one cannot obtain Gabor
functions of nonzero frequencies by maximizing the objective function (7).

Fig. 3. A crater-like two-dimensional isotropic power spectrum (schematic representation).
Let us approximate the 'localization potential' as $u(x) \approx u_0 + \beta x^2$, and consider the problem of maximizing

$$
\sum_k p(k)|\tilde{\phi}(k)|^2 - \mu'\beta\sum_x x^2|\phi(x)|^2 \tag{12}
$$
under the constraint of $\|\phi\|^2 = 1$. By using the equality

$$
\sum_x x^2|\phi(x)|^2 = -\sum_k \tilde{\phi}^*(k)\,\partial_k^2\,\tilde{\phi}(k) \tag{13}
$$

we obtain the following eigenequation from the objective
function (12):

$$
\left[-p(k) - \mu'\beta\,\partial_k^2\right]\tilde{\phi}(k) = \lambda\tilde{\phi}(k) \tag{14}
$$
The solution which maximizes (12) is the eigenfunction corresponding to the lowest eigenvalue. Meanwhile,
when $p(k)$ is isotropic, the lowest 'energy state' of Eq. (14)
is an 's-state', which is also isotropic. Accordingly, a Gabor
function of nonzero frequency cannot be the solution, since
it has an oriented profile and is not isotropic.
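This 's-state' argument can be illustrated numerically. The sketch below (my construction, under assumed parameters) discretizes the operator of Eq. (14) on a two-dimensional frequency grid with an isotropic crater-like $p(k)$ and computes its lowest 'energy state'; the resulting ground state is isotropic rather than an oriented Gabor profile.

```python
import numpy as np
from scipy.sparse import diags, eye, kron
from scipy.sparse.linalg import eigsh

# Solve [-p(k) - mu'*beta*Laplacian_k] phi~ = lambda phi~ (Eq. (14)) on a
# 2D frequency grid with an isotropic crater-like p(k).  Grid size,
# mu'*beta and the crater shape are illustrative assumptions.

n = 41
kx = np.linspace(-3, 3, n)
KX, KY = np.meshgrid(kx, kx)
r2 = KX**2 + KY**2
p = r2 * np.exp(-r2)                      # isotropic crater: peak on a ring

h = kx[1] - kx[0]
lap1 = diags([1, -2, 1], [-1, 0, 1], shape=(n, n)) / h**2
lap = kron(lap1, eye(n)) + kron(eye(n), lap1)   # 2D Laplacian in k-space
mu_beta = 0.05

H = diags(-p.ravel()) - mu_beta * lap     # operator of Eq. (14)
val, vec = eigsh(H.tocsc(), k=1, which='SA')    # lowest 'energy state'
ground = np.abs(vec[:, 0].reshape(n, n))
print(np.allclose(ground, ground.T, atol=1e-6)) # isotropic (symmetric) profile
```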
4. RF functions maximizing the objective function $\ell_2$
Next, let us move on to objective function (8). It will be
shown that maximization of this objective function results in
a Gabor function even in two-dimensional isotropic cases.
Here let us also assume a low SNR limit. Then the first
term in $MI_2$ (see Eq. (4)) is expanded in $1/\sigma_n$ ($\sigma_n^2$ being the
variance of the noise) as

$$
\begin{aligned}
H[a] &\approx \tfrac{1}{2} + \tfrac{1}{2}\log 2\pi + \tfrac{1}{2}\log\left(\sigma_n^2 + \sigma_s^2\right)\\
&\approx \tfrac{1}{2} + \tfrac{1}{2}\log 2\pi + \tfrac{1}{2}\log\sigma_n^2 + \sum_k \frac{|\tilde{\phi}(k)|^2\langle|F(k)|^2\rangle}{2\sigma_n^2} - \sum_{k,k'} \frac{|\tilde{\phi}(k)|^2|\tilde{\phi}(k')|^2\langle|F(k)|^2\rangle\langle|F(k')|^2\rangle}{4\sigma_n^4}
\end{aligned}
\tag{15}
$$

(see Appendix A). Here $\sigma_s^2$ is defined as

$$
\sigma_s^2 = (\phi, C\phi) = \sum_k p(k)|\tilde{\phi}(k)|^2 = \sum_k \langle|F(k)|^2\rangle|\tilde{\phi}(k)|^2
$$

where $F(k)$ denotes the Fourier transform of the input image
and '$\langle\ \rangle$' denotes an averaging operation.

Similarly, the second term in $MI_2$ is expanded as

$$
\begin{aligned}
H[a|\Gamma] &\approx \tfrac{1}{2} + \tfrac{1}{2}\log 2\pi + \left\langle \tfrac{1}{2}\log\left[\sigma_n^2 + \sum_k |\tilde{\phi}(k)|^2|F_g(k)|^2\right]\right\rangle_g\\
&\approx \tfrac{1}{2} + \tfrac{1}{2}\log 2\pi + \tfrac{1}{2}\log\sigma_n^2 + \sum_k \frac{|\tilde{\phi}(k)|^2\left\langle|F_g(k)|^2\right\rangle_g}{2\sigma_n^2} - \sum_{k,k'} \frac{|\tilde{\phi}(k)|^2|\tilde{\phi}(k')|^2\left\langle|F_g(k)|^2|F_g(k')|^2\right\rangle_g}{4\sigma_n^4}
\end{aligned}
\tag{16}
$$

Here, I explicitly write the averaging operation over $g$ as
'$\langle\ \rangle_g$'. However, since the variables inside the bracket do not
depend on $x_0$, this averaging operation can be replaced by
that over $g$ and $x_0$ (i.e., over $f$), '$\langle\ \rangle$'. Thus, Eq. (8) is
rewritten as

$$
\ell_2 = MI_2 - \mu\sum_x u(x)|\phi(x)|^2 \approx \frac{1}{4\sigma_n^4}\sum_{k,k'}\left[\left\langle|F(k)|^2|F(k')|^2\right\rangle - \left\langle|F(k)|^2\right\rangle\left\langle|F(k')|^2\right\rangle\right]|\tilde{\phi}(k)|^2|\tilde{\phi}(k')|^2 - \mu\sum_x u(x)|\phi(x)|^2, \qquad \|\phi\|^2 = 1 \tag{17}
$$

It should be noted that the first term on the right-hand side of
Eq. (17) is second order in $|\tilde{\phi}|^2$, while that in Eq. (10) is
linear in $|\tilde{\phi}|^2$. This will cause the localization of the function
in the frequency domain even when the power spectrum has
a crater-like profile and its maximum is 'degenerated' over
every orientation.

Suppose here, for example, that the input image statistics
are Gaussian. Then, using

$$
\left\langle|F(k)|^2|F(k')|^2\right\rangle = \left\langle|F(k)|^2\right\rangle\left\langle|F(k')|^2\right\rangle \quad (k \neq \pm k'), \qquad \left\langle|F(k)|^4\right\rangle = 2\left\langle|F(k)|^2\right\rangle^2 = 2p(k)^2 \quad \text{otherwise} \tag{18}
$$

(see Appendix B), Eq. (17) is rewritten as

$$
\ell_2 = \frac{1}{4\sigma_n^4}\sum_k p(k)^2|\tilde{\phi}(k)|^4 - \mu\sum_x u(x)|\phi(x)|^2 \tag{19}
$$

We see that the first term in Eq. (19) tends to localize the RF
function in the frequency domain around the region where
the power spectrum takes its maximum, while the second
term tends to localize the function in the space domain. This
is the same as in objective function (10). An important point
here, however, is that even when the power spectrum has a
crater-like profile as shown in Fig. 3, the first term is larger
when the function is localized around a certain frequency on
the ridge of the crater than when it is extended all over the
ridge.

To see this, we rewrite Eq. (19) as

$$
\ell_2 = \sum_k p''(k)|\tilde{\phi}(k)|^2 - \mu\sum_x u(x)|\phi(x)|^2 \tag{20}
$$

where $p''(k)$ is defined as

$$
p''(k) = \frac{1}{4\sigma_n^4}\,p(k)^2|\tilde{\phi}(k)|^2 \tag{21}
$$

As described in Section 3, we already know that a Gabor
function is the solution maximizing $\ell_2$ in Eq. (20) when
$p''$ has a peak at a certain frequency. In this case, however,
since $p''$ depends on the solution $\phi$ itself, Eqs. (20) and (21)
must be solved self-consistently. Let us suppose temporarily
that a Gabor function $\phi(x; k_0)$ of frequency $k_0$ ($k_0$ being a
certain frequency on the ridge of the crater-like power
spectrum) is a solution. Then, from Eq. (21), $p''$ will have a
peak at the frequency $k_0$, because the Fourier transform of
the Gabor function $\tilde{\phi}(k; k_0)$ becomes a Gaussian with its
center at $k_0$. Meanwhile, we already know that when $p''$
has a peak at $k_0$, the solution maximizing Eq. (20) is the
Gabor function of frequency $k_0$. Accordingly, we see that
this Gabor function is indeed a self-consistent solution.

Fig. 4. Numerical solutions that maximize objective function (19). The power spectrum of an uncorrelated random signal filtered through an isotropic two-dimensional DoG (Difference of Gaussians) filter is adopted for $p(k)$, and an exponential function, $\exp(cx^2)$, is adopted for the localization potential $u(x)$.
Fig. 4 shows a numerical solution obtained by solving the
maximization problem in Eq. (19) with a computer. We see
that even when the image statistics are isotropic, an
oriented, Gabor-like RF function is obtained from the
objective function (8).
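In the spirit of Fig. 4, the following hedged sketch (not the paper's code) maximizes the Gaussian-case objective of Eq. (19) by plain projected gradient ascent over a unit-norm RF function, with an isotropic DoG-shaped power spectrum for $p(k)$ and a quadratic localization potential; the step size, $\mu$ and all shape parameters are assumptions.

```python
import numpy as np

# Maximize l2 = sum_k p(k)^2 |phi~(k)|^4 / (4 s^4) - mu * sum_x u(x)|phi(x)|^2
# (Eq. (19), with 1/(4 s^4) absorbed into mu) over unit-norm phi on an
# N x N grid.  All parameter values are illustrative assumptions.

rng = np.random.default_rng(1)
N = 32
x = np.arange(N) - N // 2
X, Y = np.meshgrid(x, x)
KX, KY = np.meshgrid(2*np.pi*np.fft.fftfreq(N), 2*np.pi*np.fft.fftfreq(N))
K2 = KX**2 + KY**2

p = np.exp(-K2 / 2.0) - np.exp(-K2 / 0.5)   # isotropic DoG spectrum (crater)
u = 1e-2 * (X**2 + Y**2)                    # localization potential
mu, lr = 1.0, 0.05

phi = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
phi /= np.linalg.norm(phi)

for _ in range(2000):
    phik = np.fft.fft2(phi) / N             # unitary 2D DFT
    # gradients of the two terms with respect to conj(phi)
    grad_freq = N * np.fft.ifft2(2 * p**2 * np.abs(phik)**2 * phik)
    grad_space = -mu * u * phi
    phi += lr * (grad_freq + grad_space)
    phi /= np.linalg.norm(phi)              # keep ||phi|| = 1

# The converged phi is typically an oriented, Gabor-like profile even
# though p(k) and u(x) are both isotropic.
```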
5. Discussion
We started from the objective function (8) and found that,
in a low SNR limit, it is rewritten as the objective function in
Eq. (17). We also found that objective function (17) leads to
an oriented, two-dimensional Gabor function as an optimal
RF function even when the input image statistics are
isotropic.
In the previous section, we considered a case where the
input image statistics are Gaussian. I think, however, that
the result may be general, and not confined to Gaussian
cases. Let us consider here another simple example.
Suppose a sine-wave grating of frequency $k$ and of a constant amplitude is shown as an input image, each presented
with a probability $P(k)$. In this case, Eq. (17) is calculated as

$$
\ell_2 = \sum_k p''(k)|\tilde{\phi}(k)|^2 - \mu\sum_x u(x)|\phi(x)|^2 \tag{22}
$$

where

$$
p''(k) \propto P(k)\left(|\tilde{\phi}(k)|^2 - \overline{|\tilde{\phi}|^2}\right), \qquad \overline{|\tilde{\phi}|^2} \equiv \sum_k P(k)|\tilde{\phi}(k)|^2 \tag{23}
$$

(note that $\left\langle|F(k)|^2|F(k')|^2\right\rangle - \left\langle|F(k)|^2\right\rangle\left\langle|F(k')|^2\right\rangle \propto P(k)\delta_{k,k'} - P(k)P(k')$ holds in this case). Therefore, we can
follow the same procedure as described in Section 4 to see
that a Gabor function becomes a self-consistent solution
maximizing Eq. (22), even when the probability $P(k)$ is
isotropic, having a crater-like profile.
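A small numeric check (my own, under this grating model) makes the point explicit: with $P(k)$ uniform over a ring of frequencies, the first term of Eq. (22) is larger for a $|\tilde{\phi}(k)|^2$ concentrated at one ring frequency than for one spread evenly over the ring.

```python
import numpy as np

# With P(k) uniform on a ring ('crater'), compare the first term of
# Eq. (22), sum_k P(k)(w(k) - mean(w)) w(k) with w = |phi~|^2 on the ring,
# for a concentrated versus a uniformly spread w.  M is an assumption.

M = 64                                    # grating frequencies on the ring
P = np.ones(M) / M                        # isotropic probability over the ring

def first_term(w):                        # w = |phi~(k)|^2, with sum w = 1
    mean = np.sum(P * w)
    return np.sum(P * (w - mean) * w)

concentrated = np.zeros(M); concentrated[0] = 1.0
spread = np.ones(M) / M
print(first_term(concentrated), first_term(spread))   # concentrated wins
```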
Neurons in the visual cortex receive their input signals as
spike trains, whose frequency is thought to code the strength
of the signal. A trade-off must then exist between the SNR
and processing speed: if we require a higher SNR, the
neurons will need a longer averaging time to evaluate
the frequency. On the other hand, if we require a high processing speed, the neurons must deal with signals of low
SNR. In the latter case, the low SNR assumption made in
this paper might be justified.
Even when we look at a single object, its retinal image
might be projected at slightly different positions from time
to time. Accordingly, we expect a neuron in the visual cortex to 'see' the same feature (such as a line or an edge of a
certain orientation) at various positions within its receptive
field. Objective function (7) requires the neurons to discriminate a feature from one presented at a displaced position
even when the feature itself is the same. Objective function
(8), however, requires the neurons to discriminate a feature
from others only when their categories (such as orientations)
are different. The result obtained in this paper suggests that
the RFs of simple cells in the visual cortex are designed
based on the latter strategy.
It should be emphasized that objective function (8) leads
to a Gabor function even when the input signal statistics are
Gaussian. Let us imagine that the visual system is equipped
with a certain learning mechanism which self-organizes the
RFs of simple cells in such a way that objective function (8)
is maximized. If this is the case, from the results in Section
4, we expect that the learning mechanism will self-organize
Gabor-function-like oriented receptive fields even
before animals open their eyes, driven by the random spontaneous firing of retinal ganglion cells. This expectation is in
accordance with some self-organization simulation results
(Miyashita and Tanaka, 1992; Miyashita et al., 1997) as well
as the experimental finding that the orientation selectivity of
visual cortical neurons is observed even before the animals
have visual experiences (Wiesel and Hubel, 1974).
Acknowledgements
This work was performed under the management of
FED as part of the MITI R&D of Industrial Science and
Technology Frontier programme (Bioelectronic Devices
project) supported by NEDO.
Appendix A. Derivation of Eqs. (15)–(17)

Let us rewrite Eq. (2) as $a = a_s + a_n$, where $a_s = (\phi, f)$
represents the signal component and $a_n = (\phi, n)$ represents
the noise component. We are assuming that $a_s$ and $a_n$ are
mutually independent, and that $a_n$ is a Gaussian variable.
Then, in a low SNR limit, we may regard $P(a)$ and $P(a|g)$ as
almost Gaussian, whose variances are respectively written as
$\sigma^2 = \langle a_s^2\rangle + \langle a_n^2\rangle = \sigma_s^2 + \sigma_n^2$ and
$\sigma'^2 = \langle a_s^2\rangle_{P(a|g)} + \langle a_n^2\rangle = \sigma_{s,g}^2 + \sigma_n^2$, where $\sigma_s^2$ and
$\sigma_{s,g}^2$ are given by $\sigma_s^2 = \sum_k |\tilde{\phi}(k)|^2\langle|F(k)|^2\rangle$ and
$\sigma_{s,g}^2 = \sum_k |\tilde{\phi}(k)|^2\langle|F_g(k)|^2\rangle$.

If $P(a)$ is exactly Gaussian, the entropy $H[a]$ is exactly
calculated as $\frac{1}{2}\log 2\pi e\sigma^2$ (see Rieke et al., 1997). When
$P(a)$ and $P(a|g)$ are almost Gaussian, $H[a]$ and $H[a|\Gamma]$ are
respectively expressed as

$$
H[a] = \frac{1}{2}\log 2\pi e\sigma^2 + h, \qquad
H[a|\Gamma] = \left\langle \frac{1}{2}\log 2\pi e\sigma'^2 + h' \right\rangle_g \tag{A1}
$$

where $h$ and $h'$ represent correction terms whose order is, as
shown below, $o(1/\sigma_n^6)$. Accordingly, by expanding Eq. (A1)
in $1/\sigma_n$ up to the fourth order, we obtain Eq. (17).

The order of magnitude of the correction term $h$ is
estimated below. For simplicity, let us assume $\langle a_s\rangle = 0$
and $\langle a_n\rangle = 0$. Let us also assume that the characteristic
function $\Phi(y) = \int P(a)\exp(iya)\,da$ and the cumulant function $W(y) = \log\Phi(y)$ exist, and that the following cumulant
expansion is possible:

$$
W(y) = \sum_{m\neq 0}\frac{(iy)^m\langle a^m\rangle_c}{m!} \tag{A2}
$$

where $\langle a^m\rangle_c$ is the $m$-th cumulant (Kubo, 1962). Since $a_s$ and
$a_n$ are independent and $a_n$ is a Gaussian variable, the expansion (A2) is rewritten as

$$
W(y) = -\frac{\sigma_s^2 + \sigma_n^2}{2}y^2 - \frac{i\epsilon_3}{6}y^3 + \frac{\epsilon_4}{24}y^4 + \cdots \tag{A3}
$$

where $\epsilon_3, \epsilon_4, \ldots$ are the cumulants for the signal component
$a_s$, i.e., $\epsilon_3 = \langle a_s^3\rangle_c$, $\epsilon_4 = \langle a_s^4\rangle_c$ and so on. In deriving Eq.
(A3), we used $\langle a^2\rangle_c = \sigma^2$ and $\langle a_n^m\rangle_c = 0$ ($m \geq 3$) for a
Gaussian variable $a_n$ (Kubo, 1962).

Now, let us expand the correction term $h$ in these cumulants. Three points should be made here. First, we see that
when all the higher order cumulants $\epsilon_3, \epsilon_4, \ldots$ are zero, the
correction term $h$ also becomes zero, since in this case $P(a)$
becomes Gaussian (Kubo, 1962). Second, the expansion
begins from the second-order terms. This is because, when
the variance is constant, the entropy $H[a]$ takes its maximum
when all the higher order cumulants $\epsilon_3, \epsilon_4, \ldots$ are zero; i.e.,
when $P(a)$ is Gaussian (see Rieke et al., 1997). Third, we see
that $h$ depends on $\epsilon_3, \epsilon_4, \ldots$ only in the form of $(\epsilon_3/\sigma^3), (\epsilon_4/\sigma^4), \ldots$. Therefore, using the above properties, we see that the
most dominant term in $h$ is $(\epsilon_3/\sigma^3)^2$, whose order is $o(1/\sigma_n^6)$.

To see the third point stated above, let us formally write
the probability distribution as $P(a) = \frac{1}{2\pi}\int \exp(W(y))\exp(-iya)\,dy$. By substituting Eq. (A3) and introducing $y' = \sigma y$,
this is rewritten as

$$
P(a) = \frac{1}{2\pi}\int \exp\left\{-\frac{y'^2}{2} - \frac{i(\epsilon_3/\sigma^3)}{6}y'^3 + \frac{(\epsilon_4/\sigma^4)}{24}y'^4 + \cdots\right\}\exp(-iy'a/\sigma)\,\frac{dy'}{\sigma} \tag{A4}
$$

which indicates that $P(a)$, and accordingly $H[a]$, also
depend on $\epsilon_3, \epsilon_4, \ldots$ only in the form of $(\epsilon_3/\sigma^3), (\epsilon_4/\sigma^4), \ldots$.

Finally, I show that, when the variance is constant, the
entropy $H[a]$ takes its maximum when the probability distribution $P(a)$ is Gaussian. Suppose a stochastic variable $x$
takes a value $x_i$ with a probability $P_i$. Then the entropy is
written as $H[x] = -\sum_i P_i\log P_i$. We want to maximize
$H$ under the constraint $\sum_i x_i^2 P_i = \sigma^2$ (the mean is assumed to
be zero for simplicity). Since we also have another obvious
constraint $\sum_i P_i = 1$, we obtain

$$
\delta\left[-\sum_i P_i\log P_i - \mu_1\sum_i P_i - \mu_2\sum_i x_i^2 P_i\right] = 0 \tag{A5}
$$

$$
-\sum_i \delta P_i\left[\log P_i + 1 + \mu_1 + \mu_2 x_i^2\right] = 0
\;\Rightarrow\; P_i = \mathrm{const}\cdot\exp\left(-\mu_2 x_i^2\right)
$$

The Lagrange multiplier $\mu_2$ is determined by the constraint
as $\mu_2 = 1/(2\sigma^2)$.

The order of $h'$ can also be estimated as $o(1/\sigma_n^6)$ by the
same procedure.
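The maximum-entropy result around Eq. (A5) is easy to confirm numerically; the following sketch (mine, not from the paper) compares the differential entropy of a Gaussian and of a Laplace density with the same variance on a fine grid, and the Gaussian comes out larger.

```python
import numpy as np

# For a fixed variance, the Gaussian density maximizes the differential
# entropy H = -int p log p (cf. Eq. (A5)).  Compare a Gaussian and a
# Laplace density with equal variance on a fine grid.

a = np.linspace(-20, 20, 20001)
da = a[1] - a[0]

def entropy(p):
    p = p / np.sum(p * da)               # normalize on the grid
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]) * da)

var = 1.0
gauss = np.exp(-a**2 / (2 * var)) / np.sqrt(2 * np.pi * var)
b = np.sqrt(var / 2)                     # Laplace scale: variance = 2 b^2
laplace = np.exp(-np.abs(a) / b) / (2 * b)

print(entropy(gauss), entropy(laplace))  # Gaussian entropy is the larger
```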
Appendix B. Derivation of Eq. (18)

Let us define the Fourier transform of an image $f$ as
$F(k) = \frac{1}{\sqrt{N}}\sum_x f(x)\exp(-ikx)$, where $N = \sum_x 1$. Then,
the first term in Eq. (18) is written as

$$
\left\langle|F(k)|^2|F(k')|^2\right\rangle = \frac{1}{N^2}\sum_{x_1,x_2,x_3,x_4}\left\langle f(x_1)f(x_2)f(x_3)f(x_4)\right\rangle\exp\left(-ik(x_1 - x_2) - ik'(x_3 - x_4)\right) \tag{B1}
$$

Let us assume here for simplicity $\langle f\rangle = 0$. Then we have
(Kubo, 1962)

$$
\begin{aligned}
\left\langle f(x_1)f(x_2)f(x_3)f(x_4)\right\rangle &= \left\langle f(x_1)f(x_2)f(x_3)f(x_4)\right\rangle_c\\
&\quad + \left\langle f(x_1)f(x_2)\right\rangle\left\langle f(x_3)f(x_4)\right\rangle\\
&\quad + \left\langle f(x_1)f(x_3)\right\rangle\left\langle f(x_2)f(x_4)\right\rangle\\
&\quad + \left\langle f(x_1)f(x_4)\right\rangle\left\langle f(x_2)f(x_3)\right\rangle
\end{aligned}
\tag{B2}
$$

When the statistics of $f$ are Gaussian, the first term on the
right-hand side of Eq. (B2) becomes zero because, in this
case, all the cumulants higher than the second order
become zero (Kubo, 1962). By substituting Eq. (B2) into
Eq. (B1), we obtain Eq. (18) (note that, for example,
$\langle f(x_1)f(x_2)\rangle = c(x_1 - x_2)$ holds, where $c(x)$ is the correlation
function that satisfies $\sum_x c(x)\exp(-ikx) = p(k) = \langle|F(k)|^2\rangle$).
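The factorization used here can also be checked by simulation; the following Monte Carlo sketch (my own) estimates the fourth-order moments of $|F(k)|^2$ for white Gaussian signals and reproduces the two cases of Eq. (18). The grid size, trial count and frequency bins are arbitrary.

```python
import numpy as np

# Monte Carlo check of Eq. (18) for a stationary Gaussian signal:
#   <|F(k)|^2 |F(k')|^2> = <|F(k)|^2><|F(k')|^2>   for k != +-k'
#   <|F(k)|^4>           = 2 <|F(k)|^2>^2          for k' = k

rng = np.random.default_rng(2)
N, trials = 64, 20000
F = np.fft.fft(rng.standard_normal((trials, N)), axis=1) / np.sqrt(N)
P = np.abs(F)**2

k1, k2 = 5, 11                            # two unrelated frequency bins
print(np.mean(P[:, k1] * P[:, k2]), np.mean(P[:, k1]) * np.mean(P[:, k2]))
print(np.mean(P[:, k1]**2), 2 * np.mean(P[:, k1])**2)
```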
References

Daugman, J.G. (1980). Two-dimensional spectral analysis of cortical receptive field profiles. Vision Research, 20, 847–856.
Daugman, J.G. (1985). Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. Journal of the Optical Society of America A, 2, 1160–1169.
Gabor, D. (1946). Theory of communication. Journal of the Institution of Electrical Engineers, 93, 429–457.
Kohonen, T. (1994). Self-Organizing Feature Map. Springer-Verlag, New York.
Kubo, R. (1962). Generalized cumulant expansion method. Journal of the Physical Society of Japan, 17, 1100–1120.
Linsker, R. (1988). Self-organization in a perceptual network. Computer, 21(3), 105–117.
Linsker, R. (1993). Deriving receptive fields using an optimal encoding criterion. In: Hanson, S.J., Cowan, J.D., & Giles, C.L. (Eds.), Advances in Neural Information Processing Systems, vol. 5. Morgan Kaufmann, San Mateo, CA, pp. 953–960.
Marcelja, S. (1980). Mathematical description of the responses of simple cortical cells. Journal of the Optical Society of America, 70, 1297–1300.
Miyashita, M., & Tanaka, S. (1992). A mathematical model for the self-organization of orientation columns in visual cortex. NeuroReport, 3, 69–72.
Miyashita, M., Kim, D.-S., & Tanaka, S. (1997). Cortical direction selectivity without directional experience. NeuroReport, 8, 1187–1192.
Okajima, K. (1996). The Gabor-type RF as derived by the mutual-information maximization. Extended Abstracts of the International Workshop on Brainware, Tokyo, Japan, pp. 119–121.
Okajima, K. (1997). The Gabor function extracts the maximum information from input local signals. Neural Networks, 11, 435–439.
Olshausen, B.A., & Field, D.J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381, 607–609.
Pollen, D.A., & Ronner, S.F. (1983). Visual cortical neurons as localized spatial frequency filters. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13, 907–916.
Rieke, F., Warland, D., de Ruyter van Steveninck, R., & Bialek, W. (1997). Spikes: Exploring the Neural Code. MIT Press, Cambridge, MA.
Wiesel, T.N., & Hubel, D.H. (1974). Ordered arrangement of orientation columns in monkeys lacking visual experience. Journal of Comparative Neurology, 158, 307–318.