Poster - Irisa

Think Aloud Imitation Learning
Fast & easy learning of complex sequences
Simulation Theory & Imitation
Simulation theory:
Get in the “target’s shoes”
to know what it thinks or will do.
Agent 2 faces task T.
Agent 1:
“If I were A2, I would do action C.
Then probably A2 will do C”.
Machine Learning:
Neural Networks
Psychology:
Think Aloud protocols
Temporal tasks with
long-term dependencies,
e.g., with Recurrent Neural Networks
Internal recurrences, hidden states,
internal memories
Example: (ZZ)* (an even number of Z's) is difficult to learn
with gradient descent!
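To make the example task concrete, here is a minimal Python sketch of the (ZZ)* classification problem. The function name `in_zz_star` and the choice of string lengths are illustrative assumptions, not taken from the poster; only the language (ZZ)* and the class-1 / class-0 labelling come from it.

```python
import re

def in_zz_star(s: str) -> int:
    """Return 1 if s belongs to (ZZ)*, i.e. contains an even number of Z's, else 0."""
    return int(re.fullmatch(r"(ZZ)*", s) is not None)

# Labelled strings of the kind a learner would be trained on
# (lengths 1..8 chosen arbitrarily for illustration).
examples = [("Z" * n, in_zz_star("Z" * n)) for n in range(1, 9)]

for string, label in examples:
    print(string, "-> class", label)
```

The label depends on the whole string, not on any single symbol, which is why a learner has to carry some state (here, the parity) across the sequence.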
Learning by imitation:
Agent 2 faces task T.
Agent 1:
“If I were A2, I wouldn’t know what to do.
A2 does action C and receives a reward.
Then I learn that I could do C in task T
to get a reward”.
Agent 1
Much information is hidden inside the agent’s
head.
Supervisor may not know how to solve the task
→ Simulation theory not applicable.
Agent expresses aloud what it thinks.
→ Externalize hidden states:
Internal states are forced to become external,
no internal recurrence.
Agent: “In this problem, I can see an analogy
with another problem I solved earlier…”
Supervisor only listens and does not interact.
(The agent needs only very little training to think
aloud.)
Agent 1
Agent 2
(imitator)
Understand what people are thinking
while solving a given task.
Agent 1: ZZZZZZZZZZZZ → class 1
Agent 1: ZZZZZZZZZZ → class 1
Agent 1: ZZZZZZZ → class 0
Agent 2: ZZZZZZZZ → ??? (too hard)
Supervisor
Agent 1 develops NEW hidden states.
Agent 2 watches A1 to imitate it,
but cannot see these hidden states.
Environment
→ Task is as difficult for A2 as for A1!
Very fast Learning
Imitation alone does not help!
Agent, no internal recurrence
Thinks aloud in environment
Think Aloud Imitation Learning
Force recurrence to be external
Agent 1 “thinks aloud”:
no internal states (no internal recurrence),
all states are externalized.
Agent 2, no internal recurrence
→ External feedback
Agent 2 can hear what Agent 1 thinks
and can reproduce its thoughts.
→ Learning complex temporal tasks
with long-term dependencies
becomes easy!
Agent 1, no internal recurrence
Environment
Agent 1: Z – odd – Z – even – … – Z – odd – Z – even
Agent 2: … – Z – odd – Z – even
Easy and fast learning!
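The trace above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the initial thought is assumed to be "even" (no Z seen yet), and a simple lookup table stands in for whatever learner the imitating agent actually uses. The point it shows is that once Agent 1 says "odd" / "even" aloud, Agent 2 needs no internal memory: it only maps (last thought heard, current symbol) to the next thought.

```python
from collections import Counter, defaultdict

def think_aloud_trace(n):
    """Hypothetical Agent 1: reads n Z's and says 'odd' or 'even' aloud after each one."""
    trace = []
    thought = "even"  # assumption: before any Z, the count is even
    for _ in range(n):
        trace.append("Z")
        thought = "odd" if thought == "even" else "even"
        trace.append(thought)
    return trace

def learn_from_trace(trace, start="even"):
    """Hypothetical Agent 2: memoryless imitator.

    Learns a table (last thought heard, symbol) -> next thought
    from the alternating symbol/thought trace.
    """
    counts = defaultdict(Counter)
    prev = start
    for symbol, thought in zip(trace[0::2], trace[1::2]):
        counts[(prev, symbol)][thought] += 1
        prev = thought
    # Keep the most frequently observed next thought for each situation.
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}

def classify(policy, s, start="even"):
    """Run the learned table; the state lives in the spoken thought, not in the agent."""
    thought = start
    for symbol in s:
        thought = policy[(thought, symbol)]
    return 1 if thought == "even" else 0

# A single short demonstration is enough to cover both transitions.
policy = learn_from_trace(think_aloud_trace(4))
print(policy)
print(classify(policy, "Z" * 8), classify(policy, "Z" * 7))
```

In this sketch a four-symbol demonstration already yields a table that generalizes to strings of any length, because every situation the imitator can be in is visible in the trace.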
Laurent ORSEAU, PhD Student
Supervisor : P.-Y. Glorennec
INSA / IRISA, Rennes, France