Finite-state computation in analog neural networks: steps towards biologically plausible models?

Mikel L. Forcada and Rafael C. Carrasco
Departament de Llenguatges i Sistemes Informàtics, Universitat d'Alacant, E-03071 Alacant, Spain.
E-mail: {mlf,[email protected]}

(Work supported through grant TIC97-0941 of the Spanish CICyT.)

The fields of neural networks and finite-state computation started simultaneously: when McCulloch & Pitts (1943) formulated mathematically the behaviour of ensembles of neurons (after a number of simplifying assumptions such as the discretization of time and signals), they defined what we currently know as a finite-state machine (FSM). Later, Kleene (1956) formalized the sets of input sequences that led a McCulloch-Pitts network to a given state and defined what we currently know as regular sets or languages. Minsky (1967) showed that any FSM can be simulated by a discrete-time recurrent neural network (DTRNN) using McCulloch-Pitts units; the construction used a number of neurons proportional to the number m of states in the automaton; more recently, Alon et al. (1991) and Horne & Hush (1996) have established better bounds on the number of discrete neurons necessary to simulate an m-state machine. But such discrete neurons (taking values, for example, in {0, 1}) are a very rough model of neural activity, and, on the other hand, error functions for discrete networks are not continuous with respect to the values of the weights, which is crucial for the application of learning algorithms such as those based on gradient descent. Researchers have therefore also shown interest in neural networks containing analog units with continuous, real-valued activation functions such as the logistic sigmoid g_L(x) = 1/(1 + exp(-x)).

In principle, neural networks having neurons with real-valued states should be able to perform not only finite-state computation but also more advanced computational tasks. Under this intuitive assumption, a number of researchers set out to test whether sigmoid DTRNN could learn FSM behaviour from samples (Cleeremans et al. 1989; Pollack 1991; Giles et al. 1992; Watrous & Kuhn 1992; Maskara & Noetzel 1992; Sanfeliu & Alquézar 1994; Manolios & Fanelli 1994; Forcada & Carrasco 1995; Tiňo & Sajda 1995; Ñeco & Forcada 1996; Gori et al. 1998). The results show that, indeed, DTRNN can learn FSM-like behaviour from samples, but some problems persist: after learning, FSM-like behaviour is observed only for short input sequences and degrades with sequence length (this behaviour is often called instability); also, when the task to be learned has long-term dependencies, that is, when late outputs depend on very early inputs, gradient-descent algorithms have trouble relating late contributions to the error to small changes in the state of neurons in early stages of the processing (Bengio et al. 1994).

Once trained, specialized algorithms may extract FSM from the dynamics of the DTRNN; some use a straightforward equipartition of neural state space followed by a branch-and-bound algorithm (Giles et al. 1992), or a clustering algorithm (Cleeremans et al. 1989; Manolios & Fanelli 1994; Gori et al. 1998). Very often, the finite-state automaton extracted behaves correctly for strings of any length. Automaton extraction algorithms have been criticised (Kolen & Pollack 1995; Kolen 1994) in the sense that FSM extraction may not reflect the actual computation performed by the DTRNN.
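To make the extraction step concrete, the following is a minimal sketch in Python of equipartition-based automaton extraction from a first-order sigmoid DTRNN. It is only illustrative: the network weights are arbitrary random values, the exploration is a plain breadth-first search over the visited regions rather than the branch-and-bound procedure of Giles et al. (1992), and the function names and parameters (make_dtrnn, extract_automaton, q, max_states) are ours, not taken from the cited papers.

    import numpy as np
    from collections import deque

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def make_dtrnn(n_states=3, n_symbols=2, seed=0):
        """A first-order sigmoid DTRNN x[t] = g(W x[t-1] + V u[t] + b), with arbitrary weights."""
        rng = np.random.default_rng(seed)
        W = rng.normal(0.0, 2.0, (n_states, n_states))
        V = rng.normal(0.0, 2.0, (n_states, n_symbols))
        b = rng.normal(0.0, 1.0, n_states)
        def step(x, symbol):
            u = np.zeros(n_symbols); u[symbol] = 1.0   # one-hot input
            return sigmoid(W @ x + V @ u + b)
        return step, np.full(n_states, 0.5)            # update function and initial state

    def extract_automaton(step, x0, n_symbols, q=4, max_states=10000):
        """Quantize each unit's activation in [0, 1] into q equal intervals (an
        equipartition of state space) and record the transitions between the regions
        actually visited, keeping one representative continuous state per region."""
        region = lambda x: tuple(np.minimum((x * q).astype(int), q - 1))
        start = region(x0)
        delta, agenda, seen = {}, deque([(start, x0)]), {start}
        while agenda and len(seen) < max_states:
            r, x = agenda.popleft()
            for a in range(n_symbols):
                x_next = step(x, a)
                r_next = region(x_next)
                delta[(r, a)] = r_next
                if r_next not in seen:
                    seen.add(r_next)
                    agenda.append((r_next, x_next))
        return start, delta      # initial region and transition table of the extracted FSM

    step, x0 = make_dtrnn()
    start, delta = extract_automaton(step, x0, n_symbols=2)
    states = {r for r, _ in delta} | set(delta.values())
    print(len(states), "states and", len(delta), "transitions extracted")

On a trained network, rather than the random one used here, the extracted transition table is where FSM-like behaviour (or its absence) becomes visible, which is exactly the point on which the criticisms of Kolen & Pollack bear.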
More recently, Casey (1996) has shown that DTRNN can indeed "organize their state space to mimic the states in the [...] state machine that can perform the computation" and be trained or programmed to behave as FSM. Also recently, Blair & Pollack (1997) presented an increasing-precision dynamical analysis that identifies those DTRNN that have actually learned to behave as FSM.

Finally, some researchers have set out to study whether it is possible to program a sigmoid-based DTRNN so that it behaves as a given FSM, that is, to formulate sets of rules for choosing the weights and initial states of the DTRNN based on the transition function and the output function of the corresponding FSM. Omlin & Giles (1996) have proposed an algorithm for encoding deterministic finite-state automata (DFA, a class of FSM) in second-order recurrent neural networks which is based on a study of the fixed points of the sigmoid function. Alquézar & Sanfeliu (1995) have generalized Minsky's (1967) result to show that DFA may be encoded in Elman (1990) nets with rational (not real) sigmoid transfer functions. Kremer (1996) has recently shown that a single-layer first-order sigmoid DTRNN can represent the state transition function of any finite-state automaton. Frasconi et al. (1996) have shown similar encodings for radial-basis-function DTRNN. All of these constructions use a number of hidden units proportional to the number of states in the FSM. More recently, Šíma (1997) has shown that the behaviour of any discrete DTRNN may be stably emulated by another DTRNN using activation functions in a very general class which includes sigmoid functions. In a more recent paper, Šíma & Wiedermann (1998) show that any regular language may be more efficiently recognized by a DTRNN having threshold units. Combining both results, one concludes that sigmoid DTRNN can act as DFA accepting any regular language.

Recently, Carrasco et al. (1998) have expanded the current results on the stable encoding of FSM in DTRNN to a larger family of sigmoids, a larger variety of DTRNN (including first- and second-order architectures), and a wider class of FSM (DFA and Mealy and Moore machines), by establishing a simplified procedure to prove the stability of a devised encoding and to obtain weights as small as possible. Small weights are of interest if the encoding is used to inject partial a priori knowledge into the DTRNN before training it through gradient descent. Carrasco et al.'s (1998) constructions, including recent extensions such as Ñeco et al.'s (1999) and Carrasco et al.'s (1999), will be briefly reviewed in the talk. (A minimal numerical sketch of this kind of encoding is given below.)

All of the encodings discussed are for finite-state machines in discrete-time recurrent neural networks, which assume the existence of a non-neural external clock which times their behaviour and a non-neural storage or memory for the previous state of the network, which is needed to compute the next state from the inputs. However, real neural networks are physical systems that operate in continuous time and should contain, if involved in the emulation of finite-state behaviour, neural mechanisms for synchronization and memory. A more natural model would be a continuous-time recurrent neural network (CTRNN), whose inputs and outputs are functions of a continuous-time variable and whose neurons have a temporal response described by a differential equation in time (for an excellent review, see Pearlmutter (1995)); we are not aware of any attempt to describe the finite-state computational behaviour of CTRNN.
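Returning to the discrete-time encodings announced above, here is a minimal numerical sketch in Python that programs a second-order sigmoid DTRNN to emulate a two-state DFA. It follows the spirit of the constructions of Omlin & Giles (1996) and Carrasco et al. (1998), with one sigmoid unit per DFA state, one-hot inputs, a single weight strength H and biases -H/2; the toy DFA (parity of a's) and the value H = 10 are our own illustrative choices, not parameters taken from those papers.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Toy DFA over {a, b}: reading 'a' (symbol 0) toggles the state, 'b' (symbol 1) keeps it.
    n_states, n_symbols = 2, 2
    delta = {(0, 0): 1, (1, 0): 0, (0, 1): 0, (1, 1): 1}   # (state, symbol) -> next state

    H = 10.0                                           # weight strength (illustrative)
    W = np.full((n_states, n_states, n_symbols), -H)   # second-order weights W[i, j, k]
    for (j, k), i in delta.items():
        W[i, j, k] = H                                 # excite the unit of the target state
    b = np.full(n_states, -H / 2)                      # bias drives all other units towards 0

    def run(symbols, q0=0):
        """State update: x_i[t] = g( sum_{j,k} W[i,j,k] x_j[t-1] u_k[t] + b_i )."""
        x = np.zeros(n_states); x[q0] = 1.0            # one-hot encoding of the initial state
        for k in symbols:
            u = np.zeros(n_symbols); u[k] = 1.0        # one-hot encoding of the current input
            x = sigmoid(np.einsum('ijk,j,k->i', W, x, u) + b)
        return x

    # Even after very long inputs the high/low pattern of the units still tracks the
    # DFA state: the large weights keep the activations near stable values close to 0 and 1.
    print(run([0] * 1000))   # even number of a's: unit 0 high, unit 1 low
    print(run([0] * 1001))   # odd number of a's:  unit 0 low,  unit 1 high

With H this large, the state units remain close to fixed points of the sigmoid near 0 and 1, so the encoded machine behaves stably for strings of any length; making H as small as possible while preserving this stability is precisely the concern addressed by the Carrasco et al. (1998) constructions mentioned above.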
DTRNN and CTRNN are usually formulated in terms of sigmoid neurons; the computational capabilities of recurrent neural networks containing other biologically motivated neuron models, such as spiking or integrate-and-fire neurons, have yet to be explored, which opens a wide field for future interaction between theoretical computer scientists and neuroscientists.

References

Alon, N., A. K. Dewdney, & T. J. Ott. 1991. Efficient simulation of finite automata by neural nets. Journal of the Association for Computing Machinery 38(2):495–514.
Alquézar, R., & A. Sanfeliu. 1995. An algebraic framework to represent finite state automata in single-layer recurrent neural networks. Neural Computation 7(5):931–949.
Bengio, Y., P. Simard, & P. Frasconi. 1994. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks 5(2):157–166.
Blair, A., & J. B. Pollack. 1997. Analysis of dynamical recognizers. Neural Computation 9(5):1127–1142.
Carrasco, Rafael C., Mikel L. Forcada, M. Ángeles Valdés-Muñoz, & Ramón P. Ñeco. 1998. Stable encoding of finite-state machines in discrete-time recurrent neural nets with sigmoid units. Technical report, Departament de Llenguatges i Sistemes Informàtics, Universitat d'Alacant, Alacant, Spain. Submitted to Neural Computation.
Carrasco, Rafael C., José Oncina, & Mikel L. Forcada. 1999. Efficient encodings of finite automata in discrete-time recurrent neural networks. In Proceedings of ICANN'99, International Conference on Artificial Neural Networks. (in press).
Casey, M. 1996. The dynamics of discrete-time computation, with application to recurrent neural networks and finite state machine extraction. Neural Computation 8(6):1135–1178.
Cleeremans, A., D. Servan-Schreiber, & J. L. McClelland. 1989. Finite state automata and simple recurrent networks. Neural Computation 1(3):372–381.
Elman, J. L. 1990. Finding structure in time. Cognitive Science 14:179–211.
Forcada, M. L., & R. C. Carrasco. 1995. Learning the initial state of a second-order recurrent neural network during regular-language inference. Neural Computation 7(5):923–930.
Frasconi, Paolo, Marco Gori, Marco Maggini, & Giovanni Soda. 1996. Representation of finite-state automata in recurrent radial basis function networks. Machine Learning 23:5–32.
Giles, C. L., C. B. Miller, D. Chen, H. H. Chen, G. Z. Sun, & Y. C. Lee. 1992. Learning and extracting finite state automata with second-order recurrent neural networks. Neural Computation 4(3):393–405.
Gori, Marco, Marco Maggini, E. Martinelli, & G. Soda. 1998. Inductive inference from noisy examples using the hybrid finite state filter. IEEE Transactions on Neural Networks 9(3):571–575.
Horne, B. G., & D. R. Hush. 1996. Bounds on the complexity of recurrent neural network implementations of finite state machines. Neural Networks 9(2):243–252.
Kleene, S. C. 1956. Representation of events in nerve nets and finite automata. In Automata Studies, ed. by C. E. Shannon & J. McCarthy, 3–42. Princeton, N.J.: Princeton University Press.
Kolen, J. F. 1994. Fool's gold: Extracting finite state machines from recurrent network dynamics. In Advances in Neural Information Processing Systems 6, ed. by J. D. Cowan, G. Tesauro, & J. Alspector, 501–508, San Mateo, CA. Morgan Kaufmann.
Kolen, J. F., & Jordan B. Pollack. 1995. The observer's paradox: apparent computational complexity in physical systems. Journal of Experimental and Theoretical Artificial Intelligence 7:253–277.
Kremer, Stefan C. 1996. A Theory of Grammatical Induction in the Connectionist Paradigm. Edmonton, Alberta: Department of Computer Science, University of Alberta dissertation.
Manolios, P., & R. Fanelli. 1994. First order recurrent neural networks and deterministic finite state automata. Neural Computation 6(6):1154–1172.
Maskara, Arun, & Andrew Noetzel. 1992. Forcing simple recurrent neural networks to encode context. In Proceedings of the 1992 Long Island Conference on Artificial Intelligence and Computer Graphics.
McCulloch, W. S., & W. H. Pitts. 1943. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics 5:115–133.
Minsky, M. L. 1967. Computation: Finite and Infinite Machines. Englewood Cliffs, NJ: Prentice-Hall, Inc. Ch.: Neural Networks. Automata Made up of Parts.
Ñeco, R. P., & M. L. Forcada. 1996. Beyond Mealy machines: Learning translators with recurrent neural networks. In Proceedings of the World Conference on Neural Networks '96, 408–411, San Diego, California.
Ñeco, Ramón P., Mikel L. Forcada, Rafael C. Carrasco, & M. Ángeles Valdés-Muñoz. 1999. Encoding of sequential translators in discrete-time recurrent neural nets. In Proceedings of the European Symposium on Artificial Neural Networks ESANN'99, 375–380.
Omlin, C. W., & C. L. Giles. 1996. Constructing deterministic finite-state automata in recurrent neural networks. Journal of the ACM 43(6):937–972.
Pearlmutter, B. A. 1995. Gradient calculations for dynamic recurrent neural networks: a survey. IEEE Transactions on Neural Networks 6(5):1212–1228.
Pollack, Jordan B. 1991. The induction of dynamical recognizers. Machine Learning 7:227–252.
Sanfeliu, A., & R. Alquézar. 1994. Active grammatical inference: a new learning methodology. In Shape and Structure in Pattern Recognition, ed. by Dov Dori & A. Bruckstein, Singapore: World Scientific. Proceedings of the IAPR International Workshop on Structural and Syntactic Pattern Recognition SSPR'94 (Nahariya, Israel).
Šíma, Jiří. 1997. Analog stable simulation of discrete neural networks. Neural Network World 7:679–686.
Šíma, Jiří, & Jiří Wiedermann. 1998. Theory of neuromata. Journal of the ACM 45(1):155–178.
Tiňo, Peter, & Jozef Sajda. 1995. Learning and extracting initial Mealy automata with a modular neural network model. Neural Computation 7(4).
Watrous, R. L., & G. M. Kuhn. 1992. Induction of finite-state languages using second-order recurrent networks. Neural Computation 4(3):406–414.