Preventing Catastrophic Interference in Multiple-Sequence Learning Using Coupled Reverberating Elman Networks
Bernard Ans, Stéphane Rousset, Robert M. French & Serban Musca
(European Commission grant HPRN-CT-1999-00065)

The Problem of Multiple-Sequence Learning
• Real cognition requires the ability to learn sequences of patterns (or actions). (This is why SRNs, i.e., Elman networks, were originally developed.)
• But learning sequences really means being able to learn multiple sequences without the most recently learned ones erasing the previously learned ones.
• Catastrophic interference is a serious problem for the sequential learning of individual patterns. It is far worse when multiple sequences of patterns have to be learned consecutively.

The Solution
• We have developed a "dual-network" system using coupled Elman networks that completely solves this problem.
• The two separate networks exchange information by means of "reverberated pseudopatterns."

Pseudopatterns
• Assume a network-in-a-box learns a series of patterns produced by a function f(x). The original patterns are no longer available. How can you approximate f(x)?
• Feed a random input (e.g., 1 0 0 1 1) through the trained network and collect the associated output (e.g., 1 1 0). This creates a pseudopattern:
  1: 10011 → 110
• A large enough collection of these pseudopatterns,
  1: 10011 → 110
  2: 11000 → 011
  3: 00010 → 100
  4: 01111 → 000
  etc.,
  will approximate the originally learned function.

Transferring information from Net 1 to Net 2 with pseudopatterns
• A random input (10011) is fed to Net 1, which produces an associated output (110).
• That same input is then presented to Net 2 with Net 1's output as the target, so Net 2 learns the pseudopattern 10011 → 110.

Information transfer by pseudopatterns in dual-network systems
• New information is presented to one network (Net 1).
• Pseudopatterns are generated by Net 2, where previously learned information is stored.
• Net 1 then trains not only on the new pattern(s) to be learned, but also on the pseudopatterns produced by Net 2.
• Once Net 1 has learned the new information, it generates (lots of) pseudopatterns that train Net 2.
• This is why we say that information is continually transferred between the two networks by means of pseudopatterns.

Are all pseudopatterns created equal?
• No. Even though the simple dual-network system (i.e., new learning in one network; long-term storage in the other) using simple pseudopatterns does eliminate catastrophic interference, we can do better using "reverberated" pseudopatterns.

Building a network that uses "reverberated" pseudopatterns
• Start with a standard backpropagation network: input layer, hidden layer, output layer.
• Add an autoassociator: extra output nodes trained to reproduce the input.
• A new pattern to be learned, P: Input → Target, is then learned by associating Input with both Target and a copy of Input itself on the autoassociative nodes.

What are "reverberated pseudopatterns" and how are they generated?
• We start with a random input î0, feed it through the network, and collect the output on the autoassociative side of the network.
• This output is fed back into the input layer ("reverberated") and, again, the output on the autoassociative side is collected: î0 → î1 → î2 → î3 → … This is done R times.
• After R reverberations, we associate the reverberated input îR with the "target" output t̂ (a minimal sketch of this procedure is given below).
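The reverberation loop just described can be made concrete with a short sketch. This is not the authors' code: the layer sizes, the sigmoid activation, and the random stand-in weights (used in place of a genuinely trained network) are assumptions made only for illustration; what it takes from the text above is the î0 → î1 → … → îR pass through the autoassociative side and the final pairing of îR with the target output t̂.

```python
# Minimal sketch of generating one reverberated pseudopattern from a network
# whose output layer has a "target" head and an autoassociative head that
# reproduces the input.  The random weights below stand in for a network that
# has already been trained; sizes and activation are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_target = 100, 50, 100                    # assumed layer sizes

W_hid = rng.normal(scale=0.1, size=(n_in, n_hid))       # input  -> hidden
W_tgt = rng.normal(scale=0.1, size=(n_hid, n_target))   # hidden -> target head
W_auto = rng.normal(scale=0.1, size=(n_hid, n_in))      # hidden -> autoassociative head

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x):
    """One pass; returns (target-side output, autoassociative-side output)."""
    h = sigmoid(x @ W_hid)
    return sigmoid(h @ W_tgt), sigmoid(h @ W_auto)

def reverberated_pseudopattern(R=5):
    """Start from a random input î0, reverberate it R times through the
    autoassociative side, then pair the final input îR with its target t̂."""
    i_hat = rng.integers(0, 2, size=n_in).astype(float)   # î0
    for _ in range(R):
        _, i_hat = forward(i_hat)                          # î(k) -> î(k+1)
    t_hat, _ = forward(i_hat)                              # t̂ associated with îR
    return i_hat, t_hat

pseudo_input, pseudo_target = reverberated_pseudopattern(R=5)
```

In the full architecture the pseudopattern's teacher includes both t̂ and a copy of îR on the autoassociative nodes, as summarized next.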
This forms the reverberated pseudopattern: input îR, targets (t̂, îR).

This dual-network approach, using reverberated-pseudopattern information transfer between the two networks (Net 1: new-learning network; Net 2: storage network), effectively overcomes catastrophic interference in multiple-pattern learning.

But what about multiple-sequence learning?
• Elman networks are designed to learn sequences of patterns. But they forget catastrophically when they attempt to learn multiple sequences.
• Can we generalize the dual-network, reverberated-pseudopattern technique to dual Elman networks and eliminate catastrophic interference in multiple-sequence learning? Yes.

Elman networks (a.k.a. Simple Recurrent Networks)
• The input at time t consists of the standard input S(t) plus a context layer H(t-1), a copy of the hidden-unit activations from the previous time step.
• The network is trained to produce S(t+1) on its output layer, and in this way learns a sequence S(1), S(2), …, S(n).

A "Reverberated Simple Recurrent Network" (RSRN): an Elman network with an autoassociative part
• The input layer receives the standard input S(t) and the context H(t-1); the hidden layer computes H(t).
• The output layer contains "target" nodes, trained against the teacher S(t+1), plus "autoassociative" nodes trained to reproduce the input; the error on both parts is backpropagated.

RSRN technique for sequentially learning two sequences A(t) and B(t) (a sketch of this schedule follows the test method below)
• Net 1 learns A(t) completely.
• Reverberated pseudopattern transfer to Net 2.
• Net 1 makes one weight-change pass through B(t).
• Net 2 generates a few "static" reverberated pseudopatterns.
• Net 1 does one learning epoch on these pseudopatterns from Net 2.
• Continue until Net 1 has learned B(t).
• Test how well Net 1 has retained A(t).

Two sequences to be learned: A(0), A(1), …, A(10) and B(0), B(1), …, B(10).

Transferring the learning to Net 2
• Net 1 first learns sequence A(0), A(1), …, A(10) completely.
• Net 1 then produces 10,000 reverberated pseudopatterns (e.g., input 010110100110010, teacher 1110010011010).
• For each of the 10,000 pseudopatterns produced by Net 1, Net 2 makes one feedforward-backpropagation (FF-BP) pass.

Learning B(0), B(1), …, B(10) by Net 1
1. Net 1 does ONE learning epoch on sequence B(0), B(1), …, B(10).
2. Net 2 generates a few pseudopatterns.
3. Net 1 does one FF-BP pass on each pseudopattern from Net 2.
Repeat steps 1-3 until Net 1 has learned B(0), B(1), …, B(10).

Sequences chosen
• Twenty-two distinct random binary vectors of length 100 are created.
• Half of these vectors are used to produce the first ordered sequence of items, A, denoted by A(0), A(1), …, A(10).
• The remaining 11 vectors are used to create a second sequence of items, B, denoted by B(0), B(1), …, B(10).
• In order to introduce a degree of ambiguity into each sequence (so that a simple BP network would not be able to learn them), we modify each sequence so that A(8) = A(5) and B(5) = B(1).

Test method
• First, sequence A is completely learned by the network.
• Then sequence B is learned.
• During the course of learning, we monitor at regular intervals how much of sequence A has been forgotten by the network.
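The dual-RSRN schedule above can be summarized in a compact, self-contained sketch. This is not the authors' implementation: the layer sizes, learning rate, number of reverberations R, epoch counts, and the number of pseudopatterns interleaved per epoch are illustrative assumptions, and a plain one-hidden-layer backpropagation network stands in for the RSRN used in the poster. What it preserves is the structure of the experiment: two ambiguous sequences of 100-bit items (A(8) = A(5), B(5) = B(1)), complete learning of A by Net 1, transfer of A to Net 2 through 10,000 reverberated pseudopatterns, and pseudopatterns from Net 2 interleaved while Net 1 learns B.

```python
# Illustrative sketch of the dual-RSRN training schedule (all hyperparameters
# below are assumptions; the schedule itself follows the description above).
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RSRN:
    """Elman-style network whose output layer is split into a next-item
    ("target") part and an autoassociative part that reproduces its input."""
    def __init__(self, n_item=100, n_hid=60, lr=0.1):
        self.n_item, self.n_hid, self.lr = n_item, n_hid, lr
        n_in = n_item + n_hid                     # item S(t) + context H(t-1)
        n_out = n_item + n_in                     # S(t+1) + copy of the input
        self.W1 = rng.normal(scale=0.1, size=(n_in + 1, n_hid))   # +1 = bias
        self.W2 = rng.normal(scale=0.1, size=(n_hid + 1, n_out))

    def forward(self, x):
        """One feedforward pass; returns (hidden activations, full output)."""
        h = sigmoid(np.append(x, 1.0) @ self.W1)
        y = sigmoid(np.append(h, 1.0) @ self.W2)
        return h, y

    def train_pattern(self, x, target):
        """One feedforward-backpropagation (FF-BP) pass on a single pattern."""
        xb = np.append(x, 1.0)
        h = sigmoid(xb @ self.W1)
        hb = np.append(h, 1.0)
        y = sigmoid(hb @ self.W2)
        dy = (y - target) * y * (1.0 - y)
        dh = (self.W2[:-1] @ dy) * h * (1.0 - h)
        self.W2 -= self.lr * np.outer(hb, dy)
        self.W1 -= self.lr * np.outer(xb, dh)

    def epoch_on_sequence(self, seq):
        """One pass through S(0..n): learn to predict S(t+1) and reproduce the input."""
        context = np.zeros(self.n_hid)
        for t in range(len(seq) - 1):
            x = np.concatenate([seq[t], context])
            target = np.concatenate([seq[t + 1], x])
            context, _ = self.forward(x)          # context for the next step
            self.train_pattern(x, target)

    def pseudopattern(self, R=5):
        """Reverberate a random input R times through the autoassociative
        side, then pair the result with the network's full output."""
        x = np.concatenate([rng.integers(0, 2, self.n_item).astype(float),
                            np.zeros(self.n_hid)])
        for _ in range(R):
            _, y = self.forward(x)
            x = y[self.n_item:]                   # autoassociative part -> new input
        _, y = self.forward(x)
        return x, y

# Two ambiguous sequences of eleven 100-bit items, as in the experiment.
items = rng.integers(0, 2, size=(22, 100)).astype(float)
A, B = list(items[:11]), list(items[11:])
A[8] = A[5].copy()                                # A(8) = A(5)
B[5] = B[1].copy()                                # B(5) = B(1)

net1, net2 = RSRN(), RSRN()

for _ in range(200):                              # 1. Net 1 learns A completely
    net1.epoch_on_sequence(A)
for _ in range(10_000):                           # 2. transfer A to Net 2
    x, y = net1.pseudopattern()
    net2.train_pattern(x, y)
for _ in range(400):                              # 3. Net 1 learns B ...
    net1.epoch_on_sequence(B)
    for _ in range(3):                            # ... interleaving a few
        x, y = net2.pseudopattern()               #     pseudopatterns from Net 2
        net1.train_pattern(x, y)
```

In an actual run one would also test Net 1 at regular intervals during stage 3 and count, for each serial position of A, how many of the 100 output units are incorrect; that is the measure plotted in the figures that follow.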
Normal Elman networks: Catastrophic forgetting
[Figure: incorrect output units (%) as a function of serial position of the sequence items and number of learning epochs for sequence B.]
(a) Recall of Sequence B: learning of sequence B (after having previously learned sequence A). By 450 epochs (an epoch corresponds to one pass through the entire sequence), sequence B has been completely learned.
(b) Recall of Sequence A: the number of incorrect units (out of 100) for each serial position of sequence A during learning of sequence B. After 450 epochs, the SRN has, for all intents and purposes, completely forgotten the previously learned sequence A.

Dual-RSRNs: Catastrophic forgetting is eliminated
[Figure: recall performance for sequences B and A during learning of sequence B by a dual-network RSRN, plotted as above.]
(a) Recall of Sequence B: by 400 epochs, the second sequence B has been completely learned.
(b) Recall of Sequence A: the previously learned sequence A shows virtually no forgetting. Catastrophic forgetting of the previously learned sequence A has been completely overcome.

[Summary figure: % error on sequence A while sequence B is being learned. Normal Elman network: massive forgetting; dual RSRN: no forgetting of sequence A.]

Cognitive/Neurobiological plausibility?
• The brain, somehow, does not forget catastrophically.
• Separating new learning from previously learned information seems necessary.
• McClelland, McNaughton & O'Reilly (1995) have suggested that the hippocampal-neocortical separation may be Nature's way of solving this problem.
• Pseudopattern transfer is not so far-fetched if we accept results claiming that neocortical memory consolidation is due, at least in part, to REM sleep.

Conclusions
• The RSRN reverberating dual-network architecture (Ans & Rousset, 1997, 2000) can be generalized to the sequential learning of multiple temporal sequences.
• When learning multiple sequences of patterns, interleaving simple reverberated input-output pseudopatterns, each of which reflects the entire previously learned sequence(s), reduces (or entirely eliminates) forgetting of the initially learned sequence(s).