An Experiment and a Bio-Constrained Model of
Children's Action-Outcome Learning
Driven by Intrinsic Motivations
Gianluca Baldassarre, Francesco Mannella,
Vincenzo Fiore, Valerio Sperati,
Daniele Caligiore, Marco Mirolli,
Laboratory of Computational Embodied
Neuroscience,
Istituto di Scienze e Tecnologie della
Cognizione,
Consiglio Nazionale delle Ricerche,
Via San Martino della Battaglia 44,
I-00185 Roma, Italy
{gianluca.baldassarre, francesco.mannella,
vincenzo.fiore, valerio.sperati,
daniele.caligiore, marco.mirolli}@istc.cnr.it
Patricia Shaw,
James Law,
Mark Lee,
Intelligent Robotics
Group,
Department of
Computer Science,
Aberystwyth
University,
Ceredigion, Wales
SY23 3DB, UK,
{phs, jxl,
[email protected]}
Fabrizio Taffoni,
Domenico Formica,
Flavio Keller,
Eugenio Guglielmelli,
Biomedical Robotics and
Biomicrosystem Lab,
Università Campus
Bio-Medico di Roma,
Via Álvaro del Portillo 21,
I-00128 Roma, Italy
{f.taffoni, d.formica, f.keller,
e.guglielmelli}@unicampus.it
Abstract—We illustrate research carried out within the EU project
IM-CLeVeR (“Intrinsically Motivated Cumulative Learning
Versatile Robots”) that investigates the learning of
action-outcome associations based on intrinsic motivations, and action
recall based on outcome re-activation. The research is based on:
(a) the implementation of a “mechatronic board” suitable for
investigating learning based on intrinsic motivations; (b) empirical
experiments run with children; (c) a system-level bio-constrained
computational embodied model aiming to interpret the results
and to formulate a hypothesis on the brain mechanisms
generating them. Intrinsic motivations differ from extrinsic
motivations, which are based on events such as food intake and sex,
in that they are related to drives to explore, the novelty of stimuli,
success in achieving one's own goals, etc., and are maximally
apparent in children at play. The experiments show how intrinsic
motivations can drive the learning of action-outcome associations
that, at a later stage, allow children to readily recall the execution
of an action if its outcome becomes desirable (i.e., becomes a “goal”).
The model furnishes the first system-level operational hypothesis
on the brain mechanisms that might underlie action-outcome
learning based on intrinsic motivations. The model architecture
and functioning are constrained on the basis of known relevant
system-level neuroscientific evidence, and the model is validated
with tests run with both the simulated and the real humanoid robot iCub.
Index Terms—Goal-directed behaviour, intrinsic motivations,
mechatronic board, motor babbling.
I. INTRODUCTION
Intrinsic motivations (IM) were first described by
psychologists [1] to explain motivational and learning
processes that could not be accounted for on the basis of the
behaviorist framework based on homeostatic regulations,
drives, and extrinsic rewards (e.g., food, pain, sex). For
example, IM can explain why animals persevere in solving
puzzles in the absence of extrinsic rewards [1], why they
engage longer with complex, unexpected, or in general
“surprising” objects [2], or why they can be motivated to
perform actions that have a strong impact (“effectance”) on
the environment [3]. In general, as argued in detail in [4], IM
have the function of driving the acquisition of general-purpose
knowledge and skills that can later be used to accomplish
fitness-enhancing useful tasks (impacting the visceral body
and its homeostatic regulations), although these fitness
enhancements are not present at the moment of the acquisition
of the skills and knowledge themselves. Machine learning and
modeling research has recently started to pay attention to IM, as
they have the potential to support autonomous
cumulative learning in intelligent machines and robots [5-9].
Notwithstanding the importance of IM, it is still poorly
understood how, in detail, they drive the acquisition of
new skills and knowledge and how these are exploited at a
later stage. Here we introduce a new experimental set-up, a
new experimental protocol, and a new computational model
with a high potential to shed new light on these phenomena.
II. A MECHATRONIC BOARD FOR STUDYING INTRINSIC
MOTIVATION-DRIVEN LEARNING IN CHILDREN AND ROBOTS
This section introduces a novel experimental device, called
the mechatronic board, specifically designed to investigate
intrinsically motivated cumulative learning in children and
robots. The basic idea behind the tool is to provide a
standardized experimental set-up where both intrinsic and
extrinsic rewards can be delivered during the execution of a
set of predefined actions. To this purpose, the platform
embeds a set of non-intrusive modular “smart” physical
interfaces (“mechatronic modules” based on gears, buttons,
levers, etc.) that, when manipulated, can generate several
surprising/novel audio-visual effects (based on light patterns
and sounds), and the opening of three boxes. In particular, the
boxes, which have a transparent front allowing their contents
to be visible, are used to deliver extrinsic rewards (e.g.,
stickers in the case of children). Quantitative kinematic data
can be automatically recorded by the board during the
participant's free interactions with it. The design of the board
allows different and new mechatronic modules to be inserted
into three slots, and allows the board's responses to their
manipulation to be freely programmed. All these features
make the mechatronic board a highly innovative
experimental tool that can be used to study cumulative
learning based on intrinsic motivations in both children and
robots.
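The programmable relation between manipulations and effects described above can be illustrated with a minimal sketch. It is not the board's actual firmware or software interface: the class name Board, its methods, and the effect strings such as "open_box:1" are hypothetical names introduced here only for illustration. The sketch assumes three slots, three boxes, and a simple timestamped log standing in for the recorded interaction data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Board:
    """Toy stand-in for the mechatronic board (all names are hypothetical)."""
    n_slots: int = 3
    n_boxes: int = 3
    modules: Dict[int, str] = field(default_factory=dict)        # slot -> module kind
    responses: Dict[Tuple[int, str], List[str]] = field(default_factory=dict)
    box_open: List[bool] = field(default_factory=list)
    log: List[tuple] = field(default_factory=list)               # recorded interactions

    def __post_init__(self) -> None:
        self.box_open = [False] * self.n_boxes

    def insert_module(self, slot: int, kind: str) -> None:
        """Plug a module (e.g. "button", "lever") into one of the slots."""
        if not 0 <= slot < self.n_slots:
            raise ValueError(f"slot must be in 0..{self.n_slots - 1}")
        self.modules[slot] = kind

    def program(self, slot: int, event: str, effects: List[str]) -> None:
        """Define which effects follow a given manipulation of a module."""
        self.responses[(slot, event)] = list(effects)

    def manipulate(self, t: float, slot: int, event: str) -> List[str]:
        """Apply a manipulation at time t, trigger its effects, and log it."""
        if slot not in self.modules:
            raise KeyError(f"no module in slot {slot}")
        effects = list(self.responses.get((slot, event), []))
        for effect in effects:
            if effect.startswith("open_box:"):
                self.box_open[int(effect.split(":")[1])] = True
        self.log.append((t, slot, event, tuple(effects)))
        return effects


# Example: pressing the button in slot 0 lights a lamp, beeps, and opens box 1.
board = Board()
board.insert_module(0, "button")
board.program(0, "press", ["light:red", "sound:beep", "open_box:1"])
effects = board.manipulate(t=0.5, slot=0, event="press")
```

Keeping the action-to-effect mapping in a table that can be reprogrammed, rather than hard-wiring it, is what would let the same device serve different experimental conditions.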
III. EXPERIMENTS AND QUANTITATIVE ASSESSMENT OF
INTRINSICALLY MOTIVATED LEARNING IN CHILDREN
This section introduces the results of a novel experimental
protocol conceived to study intrinsically motivated learning in
children using the mechatronic board illustrated in Sect. II.
The proposed protocol consists of two different phases. In the
first “learning phase” the participant can freely explore the
board and its functionalities to learn which action leads to
opening the reward boxes (these are always empty in this
phase). In the second “test phase” a reward (e.g., an animal
sticker) is put inside one of the closed boxes and the child is
asked to retrieve the sticker without any suggestion on which
action causes the box to open. This experimental paradigm
aims at investigating how the action-outcome relations
acquired during the learning phase on the basis of IM can be
exploited by children during the test phase to accomplish a
valuable task (getting the sticker). The results show how
intrinsic motivations operating during
the first phase drive children to acquire important skills and
knowledge that can be readily re-used in the second phase to
accomplish extrinsically rewarded tasks. The study
helps to clarify the adaptive importance of intrinsic
motivations in driving learning in higher organisms, and the
specific mechanisms underlying them. The results also show
how the mechatronic board can be used as a flexible research
tool to quantitatively assess intrinsically motivated learning.
IV. A SYSTEM-LEVEL BIO-CONSTRAINED MODEL OF ACTION-OUTCOME
LEARNING BASED ON INTRINSIC MOTIVATIONS
This section introduces the bio-constrained computational
embodied model built to study the experiment run with
children illustrated in Sect. III. The model aims to furnish a
system-level operational hypothesis on the brain mechanisms
that underlie IM-based learning, and in particular to explain:
(a) how IM can guide learning of the context where certain
actions can produce a certain outcome, and the associations
between such actions and outcomes; (b) how the action-outcome
associations so learned can later be used to quickly
achieve outcomes if these become desirable (e.g., the opening
of a box can become desirable for a child if a toy is inserted
into it). The architecture and functioning of the model were
constrained on the basis of relevant neuroscientific evidence.
A first sensorimotor component of the model is formed by
sensorimotor cortex mappings, trained on the basis of motor
babbling before the actual experiment starts, which implement
a repertoire of actions such as “look at the red button” or
“reach-and-press the button you are looking at”. This mimics
the fact that when children face the experiment they already
possess a repertoire of actions acquired in previous
experiences. The mapping is based on connections that link
neural field maps representing visual perception,
proprioception, or motor spaces. A second decision-making
component of the model, receiving information from and
controlling the previous one, is formed by three basal ganglia-cortical
loops: (a) a first loop, involving the motor cortex,
selects arm actions; (b) a second loop, involving the parietal
cortex and the frontal-eye-field cortex, selects eye movements
(attention); (c) a third loop, involving the prefrontal cortex,
encodes and selects outcomes. The model is also informed by
a dopamine-neuron component (substantia nigra pars
compacta), activated by surprising events such as the sudden
opening of a box, that during the first phase generates learning
signals driving the learning of basal ganglia-cortical loops.
The learning process based on dopamine has a transient nature
so that the system focuses its activity on novel outcomes,
learns the related action-outcome associations, “gets bored of
them”, focuses on other surprising outcomes, and so on.
During this process the model also learns, with a Hebbian
learning rule, action-outcome associations within the cortico-cortical
connections involving prefrontal cortex and
parietal/frontal-eye-field cortex. In the second phase of the
experiment these connections allow triggering suitable actions
if the corresponding outcomes are activated. The model was
validated with tests run with both a simulated and a real iCub
humanoid robot to test its scalability in a realistic scenario.
Although preliminary, the results of the tests show the
computational soundness of the model and, together with the
system-level biological constraints imposed on its macro-architecture,
show that it represents a promising operational
hypothesis on action-outcome learning mechanisms based on
intrinsic motivations.
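The learning cycle described in this section (a transient surprise signal, Hebbian action-outcome learning, and later recall of an action from a desired outcome) can be illustrated with a deliberately simplified sketch. This is not the model itself: it has no neural fields or basal ganglia dynamics, and the update rules, constants, and function names (outcome_of, learning_phase, recall_action) are assumptions chosen only to make the idea concrete. The sketch assumes three actions, each opening one box.

```python
import random

N = 3  # three actions (e.g. pressing one of three buttons) and three outcomes (boxes)


def outcome_of(action):
    """Assumed board programming for this toy example: action i opens box i."""
    return action


def learning_phase(steps=300, eta=0.2, lr_pred=0.1, seed=0):
    """Free exploration driven by a surprise signal that fades with experience."""
    rng = random.Random(seed)
    pred = [0.0] * N                    # how well each action's outcome is predicted
    pref = [0.0] * N                    # current preference for each action
    W = [[0.0] * N for _ in range(N)]   # outcome -> action association weights
    surprise_trace = []
    for _ in range(steps):
        # Actions with recently surprising outcomes are selected more often.
        a = rng.choices(range(N), weights=[0.1 + p for p in pref])[0]
        o = outcome_of(a)
        surprise = 1.0 - pred[a]        # dopamine-like signal, large for unexpected outcomes
        pred[a] += lr_pred * surprise   # the outcome becomes predictable, so surprise fades
        pref[a] += 0.5 * (surprise - pref[a])  # preference tracks surprise ("boredom")
        W[o][a] += eta * surprise       # Hebbian update gated by the surprise signal
        surprise_trace.append(surprise)
    return W, pred, surprise_trace


def recall_action(W, goal_outcome):
    """Test phase: activating a desired outcome recalls the most associated action."""
    row = W[goal_outcome]
    return max(range(N), key=lambda a: row[a])


W, pred, surprise_trace = learning_phase()
recalled = [recall_action(W, goal) for goal in range(N)]
```

Because the surprise signal shrinks as an outcome becomes predictable, the toy agent first dwells on an action whose effect is new, then loses interest and moves to the others, while the association weights retain what was learned for later recall.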
ACKNOWLEDGMENTS
This research was funded by the European Community's
Seventh Framework Programme FP7/2007-2013, “Challenge
2 - Cognitive Systems, Interaction, Robotics”, grant agreement
No. ICT-IP-231722, project “IM-CLeVeR - Intrinsically
Motivated Cumulative Learning Versatile Robots”.
REFERENCES
[1] H. F. Harlow, “Learning and satiation of response in intrinsically
motivated complex puzzle performance by monkeys,” Journal of
Comparative and Physiological Psychology, vol. 43, pp. 289–294, 1950.
[2] D. E. Berlyne, “Curiosity and exploration,” Science, vol. 153,
pp. 25–33, 1966.
[3] R. W. White, “Motivation reconsidered: The concept of competence,”
Psychological Review, vol. 66, pp. 297–333, 1959.
[4] G. Baldassarre, “What are intrinsic motivations? A biological
perspective,” ICDL-EPIROB2011, 2011.
[5] M. Schembri, M. Mirolli, and G. Baldassarre, “Evolving childhood’s
length and learning parameters in an intrinsically motivated
reinforcement learning robot,” in EpiRob2007, Lund: Lund University,
2007, vol. 134, pp. 141–148.
[6] M. H. Lee, “Intrinsic activity: from motor babbling to play,”
ICDL-EPIROB2011, 2011.
[7] J. Schmidhuber, “Formal theory of creativity, fun, and intrinsic
motivation (1990–2010),” IEEE Transactions on Autonomous Mental
Development, vol. 2, no. 3, pp. 230–247, 2010.
[8] S. Singh, R. Lewis, A. Barto, and J. Sorg, “Intrinsically motivated
reinforcement learning: An evolutionary perspective,” IEEE Transactions
on Autonomous Mental Development, vol. 2, no. 2, pp. 70–82, 2010.
[9] P. Oudeyer, F. Kaplan, and V. Hafner, “Intrinsic motivation systems for
autonomous mental development,” IEEE Transactions on Evolutionary
Computation, vol. 11, no. 2, pp. 265–286, 2007.