Think Aloud Imitation Learning
Fast & easy learning of complex sequences

Laurent ORSEAU, PhD Student
Supervisor: P.-Y. Glorennec
[email protected]
INSA / IRISA, Rennes, France

Simulation Theory & Imitation

Simulation theory: get in the "target's shoes" to know what it thinks or will do.
Agent 2 faces task T. Agent 1: "If I were A2, I would do action C. Then A2 will probably do C."

Learning by imitation:
Agent 2 faces task T. Agent 1: "If I were A2, I wouldn't know what to do. A2 does action C and receives a reward. So I learn that I could do C in task T to get a reward."

Machine Learning: Neural Networks

Temporal tasks with long-term dependencies are handled, e.g., with Recurrent Neural Networks: internal recurrences, hidden states, internal memories.
Example: (ZZ)* is difficult to learn with gradient descent!

Agent 1 : ZZZZZZZZZZZZ class 1
Agent 1 : ZZZZZZZZZZ class 1
Agent 1 : ZZZZZZZ class 0
Agent 2 : ZZZZZZZZ ??? (too hard)

[Diagram labels: Agent 1; Agent 2 (imitator); Environment]

Much information is hidden inside the agent's head. Agent 1 develops NEW hidden states. Agent 2 watches A1 to imitate it, but cannot see these hidden states. Simulation theory is not applicable, and the task is as difficult for A2 as for A1. Imitation alone does not help!

Psychology: Think Aloud protocols

Goal: understand what people are thinking while solving a given task.
The agent expresses aloud what it thinks. Agent: "In this problem, I can see an analogy with another problem I solved earlier…"
The supervisor only listens and does not interact. The supervisor may not know how to solve the task.
(The agent needs only very little learning to think aloud.)

[Diagram labels: Supervisor; Agent, no internal recurrence; thinks aloud in the environment]

Think Aloud Imitation Learning

Force recurrence to be external.
Externalize hidden states: internal states are forced to become external. Agent 1 "thinks aloud": it has no internal states (no internal recurrence), and all of its states are externalized.
Agent 2 can hear what Agent 1 thinks and can reproduce its thoughts.
Learning complex temporal tasks with long-term dependencies becomes easy!

[Diagram labels: Agent 1, no internal recurrence; Agent 2, no internal recurrence; Environment; External feedback]

Agent 1 : Z – odd – Z – even – … – Z – odd – Z – even
Agent 2 : …………………………...– Z – odd – Z – even

Easy and very fast learning!
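The "Z – odd – Z – even" trace can be illustrated with a small sketch. This is not the poster's implementation: the lookup-table learner, the initial thought "even", and all function names below are assumptions chosen for illustration. The point it tries to show is that once Agent 1 speaks its state aloud, Agent 2 only has to learn a one-step mapping from (last thought heard, current symbol) to the next thought, with no internal recurrence.

```python
from collections import Counter, defaultdict

START = "even"  # assumed initial thought: zero symbols read so far


def teacher_step(last_thought, symbol):
    """Agent 1: memoryless rule; its only 'memory' is the thought spoken last."""
    return "odd" if last_thought == "even" else "even"


def think_aloud(step, length, symbol="Z"):
    """Run an agent on `length` symbols and return the transcript of
    (symbol, thought) pairs it leaves in the environment."""
    thought, transcript = START, []
    for _ in range(length):
        thought = step(thought, symbol)
        transcript.append((symbol, thought))
    return transcript


def imitate(transcripts):
    """Agent 2: learn (previous thought, symbol) -> next thought by counting
    the transitions it hears. A table stands in for any one-step learner."""
    counts = defaultdict(Counter)
    for transcript in transcripts:
        prev = START
        for symbol, thought in transcript:
            counts[(prev, symbol)][thought] += 1
            prev = thought
    table = {key: c.most_common(1)[0][0] for key, c in counts.items()}
    return lambda last_thought, symbol: table[(last_thought, symbol)]


def classify(step, length):
    """Class 1 if the string of `length` Z's is in (ZZ)*, i.e. the final
    thought is 'even'; class 0 otherwise."""
    transcript = think_aloud(step, length)
    final = transcript[-1][1] if transcript else START
    return 1 if final == "even" else 0


# Agent 2 hears one short demonstration, then handles longer strings.
student_step = imitate([think_aloud(teacher_step, 4)])
print([classify(student_step, n) for n in (12, 10, 7, 8)])
```

Under these assumptions, a single demonstration of length 4 already contains both transitions (even to odd, odd to even), so the imitator classifies the strings of length 12, 10, 7 and 8 from the example above as 1, 1, 0 and 1.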