Neural network model for reaching a goal state

(19) Europäisches Patentamt
European Patent Office
Office européen des brevets
(11) Publication number: 0 462 916 A3
(12) EUROPEAN PATENT APPLICATION
(21) Application number: 91480079.2
(22) Date of filing : 22.05.91
(30) Priority: 21.06.90 US 541570
(43) Date of publication of application
27.12.91 Bulletin 91/52
(84) Designated Contracting States:
BE CH DE ES FR GB IT LI NL SE
(51) Int. Cl.5: G06F 15/80, G05B 13/00, G06F 15/18
(72) Inventor : Kenton, Jerome Lynne
611 Cortland Lane S.W.
Rochester, Minnesota 55902 (US)
(74) Representative: Vekemans, Andre
Compagnie IBM France Departement de
Propriete Intellectuelle
F-06610 La Gaude (FR)
(88) Date of deferred publication of search report :
18.03.92 Bulletin 92/12
(71) Applicant: International Business Machines
Corporation
Old Orchard Road
Armonk, N.Y. 10504 (US)
(54) Neural network model for reaching a goal state.
An object, such as a robot, is located at an initial state in a finite state space area and moves under the
control of an unsupervised neural network model. The network instructs the object to move in one of
several directions from the initial state. Upon reaching another state, the model again instructs the
object to move in one of several directions. These instructions continue until either: a) the object has
completed a cycle by ending up back at a state it has been to previously during this cycle, or b) the
object has completed a cycle by reaching the goal state. If the object ends up back at a state it has been
to previously during this cycle, the neural network model ends the cycle and immediately begins a new
cycle from the present location. When the object reaches the goal state, the neural network model
learns that this path is productive towards reaching the goal state, and is given delayed reinforcement in
the form of a "reward". Upon reaching a state, the neural network model calculates a level of
satisfaction with its progress towards reaching the goal state. If the level of satisfaction is low, the
neural network model is more likely to override what has been learned thus far and deviate from a path
known to lead to the goal state to experiment with new and possibly better paths. If the level of
satisfaction is high, the neural network model is much less likely to experiment with new paths. The
object is guaranteed to eventually find the best path to the goal state from any starting location,
assuming that the level of satisfaction does not exceed a threshold point where learning ceases.
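
The cycle structure described above can be illustrated with a minimal sketch. It is not the patented model: the abstract does not say how the level of satisfaction is computed, how the reward is distributed, or where the object goes after reaching the goal, so the grid layout, the weight table, the exploration formula, the learning rate and decay constants, and every function name below are assumptions made only for illustration. A lookup table stands in for the neural network.

```python
import random

# All names and numeric choices below are illustrative assumptions,
# not details taken from the patent.
MOVES = {"N": (0, -1), "S": (0, 1), "E": (1, 0), "W": (-1, 0)}


def step(state, move, size):
    """Apply a move on a size x size grid; walls leave the state unchanged."""
    x, y = state
    dx, dy = MOVES[move]
    nx, ny = x + dx, y + dy
    return (nx, ny) if 0 <= nx < size and 0 <= ny < size else state


def run_cycle(start, goal, weights, size, rng):
    """Run one cycle: stop at the goal or on returning to a visited state."""
    state, visited, path = start, {start}, []
    while True:
        prefs = {m: weights.get((state, m), 0.0) for m in MOVES}
        best = max(prefs, key=prefs.get)
        # Assumed satisfaction measure: strongest learned weight, in [0, 1].
        satisfaction = prefs[best]
        # Low satisfaction: experiment often. High satisfaction: mostly follow
        # the learned path, but never stop experimenting (floor of 0.1).
        if rng.random() < 1.0 - 0.9 * satisfaction:
            move = rng.choice(list(MOVES))
        else:
            move = best
        path.append((state, move))
        state = step(state, move, size)
        if state == goal:
            return state, path, True
        if state in visited:
            return state, path, False
        visited.add(state)


def train(start, goal, size=4, cycles=2000, seed=0):
    """Run many cycles and return the learned weights and the goal count."""
    rng = random.Random(seed)
    weights, state, goals_reached = {}, start, 0
    for _ in range(cycles):
        # A cycle that ended on a revisited state restarts from that state.
        state, path, reached = run_cycle(state, goal, weights, size, rng)
        if reached:
            goals_reached += 1
            # Delayed reinforcement: reward each step of the successful path,
            # with steps nearer the goal rewarded more (assumed decay of 0.9).
            reward = 1.0
            for s, m in reversed(path):
                old = weights.get((s, m), 0.0)
                weights[(s, m)] = old + 0.5 * (reward - old)
                reward *= 0.9
            state = start  # assumed: return to the initial state after a goal
    return weights, goals_reached
```

Calling `train((0, 0), (3, 3))` returns the weight table and the number of cycles that ended at the goal. The exploration floor reflects the abstract's condition that satisfaction must stay below the point where learning ceases; the specific value of 0.1 is arbitrary.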
Jouve, 18, rue Saint-Denis, 75001 PARIS
EP 0 462 916 A3
European Patent Office
EUROPEAN SEARCH REPORT
Application Number: EP 91 48 0079

DOCUMENTS CONSIDERED TO BE RELEVANT

Citation of document with indication, where appropriate, of relevant passages:
NEURAL NETWORKS, vol. 2, no. 2, 1989, pages 79-102, Elmsford, NY, US; S. GROSSBERG et al.: "Neural Dynamics of Adaptive Timing and Temporal Discrimination During Associative Learning"
* abstract *
Relevant to claim: 1,11

Citation of document with indication, where appropriate, of relevant passages:
RUMELHART ET AL.: "PARALLEL DISTRIBUTED PROCESSING", vol. 1, chapter 7, 1986, MIT Press, Cambridge, Massachusetts, US
* the whole document *
Relevant to claim: 1-16

CLASSIFICATION OF THE APPLICATION (Int. Cl.5):
G 06 F 15/80
G 05 B 13/00
G 06 F 15/18

TECHNICAL FIELDS SEARCHED (Int. Cl.5):
G 06 F
G 05 B

The present search report has been drawn up for all claims
Place of search: BERLIN
Date of completion of the search: 35-12-1991
Examiner: [illegible]CHOLLS J

CATEGORY OF CITED DOCUMENTS
X : particularly relevant if taken alone
Y : particularly relevant if combined with another document of the same category
A : technological background
O : non-written disclosure
P : intermediate document
T : theory or principle underlying the invention
E : earlier patent document, but published on, or after the filing date
D : document cited in the application
L : document cited for other reasons
& : member of the same patent family, corresponding document