Finite-state computation in analog neural networks: steps towards biologically
plausible models?
Mikel L. Forcada and Rafael C. Carrasco
Departament de Llenguatges i Sistemes Informàtics,
Universitat d'Alacant,
E-03071 Alacant, Spain.
E-mail: {mlf,[email protected]}
(Work supported through grant TIC97-0941 of the Spanish CICyT.)
The fields of neural networks and finite-state computation started
simultaneously: when McCulloch & Pitts (1943) formulated mathematically the
behaviour of ensembles of neurons (after a number of simplifying assumptions
such as the discretization of time and signals), they defined what we currently
know as a finite-state machine (FSM). Later, Kleene (1956) formalized the sets
of input sequences that led a McCulloch-Pitts network to a given state and
defined what we currently know as regular sets or languages. Minsky (1967)
showed that any FSM can be simulated by a discrete-time recurrent neural
network (DTRNN) using McCulloch-Pitts units; the construction uses a number of
neurons proportional to the number m of states in the automaton; more recently,
Alon et al. (1991) and Horne & Hush (1996) have established better bounds on
the number of discrete neurons necessary to simulate an m-state machine.
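To make the flavour of these constructions concrete, the following sketch
(ours; the toy DFA, the one-hot state coding, and the use of one weight matrix
per input symbol are illustrative assumptions rather than Minsky's exact
construction) simulates a two-state DFA with McCulloch-Pitts threshold units,
one unit per state:

    import numpy as np

    # Toy 2-state DFA over {0, 1}: symbol '1' toggles the state, '0' keeps it.
    delta = {(0, '0'): 0, (0, '1'): 1, (1, '0'): 1, (1, '1'): 0}
    n_states, symbols = 2, ['0', '1']

    # One weight matrix per input symbol: W[a][i, j] = 1 iff delta(j, a) = i.
    W = {a: np.zeros((n_states, n_states)) for a in symbols}
    for (j, a), i in delta.items():
        W[a][i, j] = 1.0

    def step(x, a):
        # McCulloch-Pitts (threshold) units: unit i fires iff its net input >= 1
        return (W[a] @ x >= 1.0).astype(float)

    x = np.array([1.0, 0.0])    # start in state 0, one-hot coded
    for a in '1101':
        x = step(x, a)
    print(int(np.argmax(x)))    # state reached after reading "1101" (here, 1)

The number of units equals the number of states, in line with the
proportionality mentioned above.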
But such discrete neurons (taking values, for example, in {0, 1}) are a very
rough model of neural activity, and, on the other hand, error functions for
discrete networks are not continuous with respect to the values of the weights,
which is crucial for the application of learning algorithms such as those based
on gradient descent. Researchers have therefore also shown interest in neural
networks containing analog units with continuous, real-valued activation
functions such as the logistic sigmoid g_L(x) = 1/(1 + exp(-x)). In principle,
neural networks having neurons with real-valued states should be able to
perform not only finite-state computation but also more advanced computational
tasks.
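For reference, a generic first-order sigmoid DTRNN state update may be written
as follows (a schematic formulation in our own notation; the names Wxx, Wxu and
b are illustrative):

    import numpy as np

    def g_L(x):
        # logistic sigmoid g_L(x) = 1 / (1 + exp(-x))
        return 1.0 / (1.0 + np.exp(-x))

    def dtrnn_step(x_prev, u, Wxx, Wxu, b):
        # x[t] = g_L(Wxx x[t-1] + Wxu u[t] + b): states are real-valued, so
        # the map is differentiable and gradient-based learning is possible
        return g_L(Wxx @ x_prev + Wxu @ u + b)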
Under this intuitive assumption, a number of researchers set out to test
whether sigmoid DTRNN could learn FSM behaviour from samples (Cleeremans
et al. 1989; Pollack 1991; Giles et al. 1992; Watrous & Kuhn 1992; Maskara &
Noetzel 1992; Sanfeliu & Alquézar 1994; Manolios & Fanelli 1994; Forcada &
Carrasco 1995; Tino & Sajda 1995; Ñeco & Forcada 1996; Gori et al. 1998). The
results show that, indeed, DTRNN can learn FSM-like behaviour from samples,
but some problems persist: after learning, FSM-like behaviour is observed only
for short input sequences and degrades with sequence length (a behaviour often
called instability); also, when the task to be learned has long-term
dependencies, that is, when late outputs depend on very early inputs,
gradient-descent algorithms have trouble relating late contributions to the
error to small changes in the state of neurons in early stages of the
processing (Bengio et al. 1994).
Once trained, specialized algorithms may extract FSM from the dynamics of the
DTRNN; some use a straightforward equipartition of neural state space followed
by a branch-and-bound algorithm (Giles et al. 1992), or a clustering algorithm
(Cleeremans et al. 1989; Manolios & Fanelli 1994; Gori et al. 1998). Very
often, the finite-state automaton extracted behaves correctly for strings of
any length. Automaton extraction algorithms have been criticised (Kolen &
Pollack 1995; Kolen 1994) on the grounds that FSM extraction may not reflect
the actual computation performed by the DTRNN. More recently, Casey (1996) has
shown that DTRNN can indeed "organize their state space to mimic the states in
the [...] state machine that can perform the computation" and be trained or
programmed to behave as FSM. Also recently, Blair & Pollack (1997) presented an
increasing-precision dynamical analysis that identifies those DTRNN that have
actually learned to behave as FSM.
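The equipartition idea behind some of these extraction procedures may be
sketched as follows (our simplification; actual algorithms such as Giles et
al.'s (1992) add a search over partition granularities and automaton
minimization):

    import numpy as np

    def quantize(x, q=2):
        # map an analog state vector in [0, 1]^n to a discrete cell label by
        # splitting each coordinate into q equal intervals
        return tuple(np.minimum((np.asarray(x) * q).astype(int), q - 1))

    def extract_transitions(step, x0, strings, q=2):
        # record cell-to-cell transitions observed while the trained network
        # (given by its step function and initial state x0) processes strings;
        # the result is a candidate finite-state transition table
        transitions = {}
        for s in strings:
            x = x0
            for a in s:
                src = quantize(x, q)
                x = step(x, a)
                transitions[(src, a)] = quantize(x, q)
        return transitions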
Finally, some researchers have set out to study whether it is possible to
program a sigmoid-based DTRNN so that it behaves as a given FSM, that is, to
formulate sets of rules for choosing the weights and initial states of the
DTRNN based on the transition function and the output function of the
corresponding FSM. Omlin & Giles (1996) have proposed an algorithm for encoding
deterministic finite-state automata (DFA, a class of FSM) in second-order
recurrent neural networks which is based on a study of the fixed points of the
sigmoid function. Alquézar & Sanfeliu (1995) have generalized Minsky's (1967)
result to show that DFA may be encoded in Elman (1990) nets with rational (not
real) sigmoid transfer functions. Kremer (1996) has recently shown that a
single-layer first-order sigmoid DTRNN can represent the state transition
function of any finite-state automaton. Frasconi et al. (1996) have shown
similar encodings for radial-basis-function DTRNN. All of these constructions
use a number of hidden units proportional to the number of states in the FSM.
More recently, Šíma (1997) has shown that the behaviour of any discrete DTRNN
may be stably emulated by another DTRNN using activation functions in a very
general class which includes sigmoid functions. In a more recent paper, Šíma &
Wiedermann (1998) show that any regular language may be more efficiently
recognized by a DTRNN having threshold units. Combining both results, one
concludes that sigmoid DTRNN can act as DFA accepting any regular language.
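The common idea behind several of these encodings can be illustrated with a
small sketch (ours; it is not the exact construction of any of the papers
above): states are one-hot coded, one sigmoid unit is assigned to each DFA
state, and a sufficiently large gain H keeps the activations close to 0 and 1,
so that the transition function is reproduced for strings of any length:

    import numpy as np

    def g_L(x):
        return 1.0 / (1.0 + np.exp(-x))

    def encode_dfa(delta, n_states, symbols, H=10.0):
        # one second-order weight matrix per symbol: +H where delta(j, a) = i,
        # -H elsewhere; H must be large enough for activations to stay near 0/1
        W = {a: -H * np.ones((n_states, n_states)) for a in symbols}
        for (j, a), i in delta.items():
            W[a][i, j] = +H
        return W

    def step(W, x, a, H=10.0):
        # the bias -H/2 places the sigmoid threshold between "no supporting
        # state" and "support from the single active state unit"
        return g_L(W[a] @ x - H / 2.0)

How small H can be made while preserving stable behaviour is precisely the kind
of question addressed by the encodings reviewed next.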
Recently, Carrasco et al. (1998) have expanded the current results on stable
encoding of FSM in DTRNN to a larger family of sigmoids, a larger variety of
DTRNN (including first- and second-order architectures), and a wider class of
FSM architectures (DFA and Mealy and Moore FSM), by establishing a simplified
procedure to prove the stability of a devised encoding and to obtain weights as
small as possible. Small weights are of interest if encoding is used to inject
partial a priori knowledge into the DTRNN before training it through gradient
descent. Carrasco et al.'s (1998) constructions, including recent extensions
such as Ñeco et al.'s (1999) and Carrasco et al.'s (1999), will be briefly
reviewed in the talk.
All of the encodings discussed are for finite-state machines in discrete-time
recurrent neural networks, which assume the existence of a non-neural external
clock which times their behaviour and a non-neural storage or memory for the
previous state of the network, which is needed to compute the next state from
the inputs. However, real neural networks are physical systems that operate in
continuous time and should contain, if involved in the emulation of
finite-state behaviour, neural mechanisms for synchronization and memory. A
more natural model would be a continuous-time recurrent neural network (CTRNN),
whose inputs and outputs are functions of a continuous-time variable and whose
neurons have a temporal response that is described by a differential equation
in time (for an excellent review, see Pearlmutter (1995)); we are not aware of
any attempt to describe the finite-state computational behaviour of CTRNN.
DTRNN and CTRNN are usually formulated in terms of sigmoid neurons; the
computational capabilities of recurrent neural networks containing other
biologically motivated neuron models, such as spiking or integrate-and-fire
neurons, have yet to be explored, which opens a wide field for future
interaction between theoretical computer scientists and neuroscientists.
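For concreteness, one common leaky-integrator CTRNN formulation (along the
lines reviewed by Pearlmutter (1995); the forward-Euler discretization and the
parameter names below are our own illustrative choices) takes
tau_i dx_i/dt = -x_i + g_L(sum_j W_ij x_j + sum_k Win_ik u_k(t) + b_i):

    import numpy as np

    def g_L(x):
        return 1.0 / (1.0 + np.exp(-x))

    def ctrnn_euler_step(x, u, W, Win, b, tau, dt=0.01):
        # forward-Euler step of tau dx/dt = -x + g_L(W x + Win u + b)
        dx = (-x + g_L(W @ x + Win @ u + b)) / tau
        return x + dt * dx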
References
Alon, N., A. K. Dewdney, & T. J. Ott. 1991. Efficient simulation of finite
automata by neural nets. Journal of the Association for Computing Machinery
38(2):495–514.

Alquézar, R., & A. Sanfeliu. 1995. An algebraic framework to represent finite
state automata in single-layer recurrent neural networks. Neural Computation
7(5):931–949.

Bengio, Y., P. Simard, & P. Frasconi. 1994. Learning long-term dependencies
with gradient descent is difficult. IEEE Transactions on Neural Networks
5(2):157–166.

Blair, A., & J. B. Pollack. 1997. Analysis of dynamical recognizers. Neural
Computation 9(5):1127–1142.

Carrasco, Rafael C., Mikel L. Forcada, M. Ángeles Valdés-Muñoz, & Ramón P.
Ñeco. 1998. Stable encoding of finite-state machines in discrete-time
recurrent neural nets with sigmoid units. Technical report, Departament de
Llenguatges i Sistemes Informàtics, Universitat d'Alacant, Alacant, Spain.
Submitted to Neural Computation.

Carrasco, Rafael C., José Oncina, & Mikel L. Forcada. 1999. Efficient
encodings of finite automata in discrete-time recurrent neural networks. In
Proceedings of ICANN'99, International Conference on Artificial Neural
Networks. (In press).

Casey, M. 1996. The dynamics of discrete-time computation, with application to
recurrent neural networks and finite state machine extraction. Neural
Computation 8(6):1135–1178.

Cleeremans, A., D. Servan-Schreiber, & J. L. McClelland. 1989. Finite state
automata and simple recurrent networks. Neural Computation 1(3):372–381.

Elman, J. L. 1990. Finding structure in time. Cognitive Science 14:179–211.

Forcada, M. L., & R. C. Carrasco. 1995. Learning the initial state of a
second-order recurrent neural network during regular-language inference.
Neural Computation 7(5):923–930.

Frasconi, Paolo, Marco Gori, Marco Maggini, & Giovanni Soda. 1996.
Representation of finite-state automata in recurrent radial basis function
networks. Machine Learning 23:5–32.

Giles, C. L., C. B. Miller, D. Chen, H. H. Chen, G. Z. Sun, & Y. C. Lee. 1992.
Learning and extracting finite state automata with second-order recurrent
neural networks. Neural Computation 4(3):393–405.

Gori, Marco, Marco Maggini, E. Martinelli, & G. Soda. 1998. Inductive
inference from noisy examples using the hybrid finite state filter. IEEE
Transactions on Neural Networks 9(3):571–575.

Horne, B. G., & D. R. Hush. 1996. Bounds on the complexity of recurrent neural
network implementations of finite state machines. Neural Networks
9(2):243–252.

Kleene, S. C. 1956. Representation of events in nerve nets and finite
automata. In Automata Studies, ed. by C. E. Shannon & J. McCarthy, 3–42.
Princeton, N.J.: Princeton University Press.

Kolen, J. F. 1994. Fool's gold: Extracting finite state machines from
recurrent network dynamics. In Advances in Neural Information Processing
Systems 6, ed. by J. D. Cowan, G. Tesauro, & J. Alspector, 501–508, San Mateo,
CA. Morgan Kaufmann.

Kolen, J. F., & Jordan B. Pollack. 1995. The observer's paradox: apparent
computational complexity in physical systems. Journal of Experimental and
Theoretical Artificial Intelligence 7:253–277.

Kremer, Stefan C. 1996. A Theory of Grammatical Induction in the Connectionist
Paradigm. Edmonton, Alberta: Department of Computer Science, University of
Alberta dissertation.

Manolios, P., & R. Fanelli. 1994. First order recurrent neural networks and
deterministic finite state automata. Neural Computation 6(6):1154–1172.

Maskara, Arun, & Andrew Noetzel. 1992. Forcing simple recurrent neural
networks to encode context. In Proceedings of the 1992 Long Island Conference
on Artificial Intelligence and Computer Graphics.

McCulloch, W. S., & W. H. Pitts. 1943. A logical calculus of the ideas
immanent in nervous activity. Bulletin of Mathematical Biophysics 5:115–133.

Minsky, M. L. 1967. Computation: Finite and Infinite Machines. Englewood
Cliffs, NJ: Prentice-Hall, Inc. Ch.: Neural Networks. Automata Made up of
Parts.

Ñeco, Ramón P., & M. L. Forcada. 1996. Beyond Mealy machines: Learning
translators with recurrent neural networks. In Proceedings of the World
Conference on Neural Networks '96, 408–411, San Diego, California.

Ñeco, Ramón P., Mikel L. Forcada, Rafael C. Carrasco, & M. Ángeles
Valdés-Muñoz. 1999. Encoding of sequential translators in discrete-time
recurrent neural nets. In Proceedings of the European Symposium on Artificial
Neural Networks ESANN'99, 375–380.

Omlin, C. W., & C. L. Giles. 1996. Constructing deterministic finite-state
automata in recurrent neural networks. Journal of the ACM 43(6):937–972.

Pearlmutter, B. A. 1995. Gradient calculations for dynamic recurrent neural
networks: a survey. IEEE Transactions on Neural Networks 6(5):1212–1228.

Pollack, Jordan B. 1991. The induction of dynamical recognizers. Machine
Learning 7:227–252.

Sanfeliu, A., & R. Alquézar. 1994. Active grammatical inference: a new
learning methodology. In Shape and Structure in Pattern Recognition, ed. by
Dov Dori & A. Bruckstein, Singapore. World Scientific. Proceedings of the IAPR
International Workshop on Structural and Syntactic Pattern Recognition SSPR'94
(Nahariya, Israel).

Šíma, Jiří. 1997. Analog stable simulation of discrete neural networks. Neural
Network World 7:679–686.

Šíma, Jiří, & Jiří Wiedermann. 1998. Theory of neuromata. Journal of the ACM
45(1):155–178.

Tino, Peter, & Jozef Sajda. 1995. Learning and extracting initial Mealy
automata with a modular neural network model. Neural Computation 7(4).

Watrous, R. L., & G. M. Kuhn. 1992. Induction of finite-state languages using
second-order recurrent networks. Neural Computation 4(3):406–414.