
Holland and Goodman – Caltech – Banbury 2001
When an autonomous embodied system, with a difficult animal-like mission in a difficult environment, has a sufficiently high level of intelligence (i.e. is able to achieve that mission well), then it may exhibit consciousness, either as a necessary component for achieving the mission, or as a by-product.
Robots
A Simple Robot
• The Khepera miniature robot (5.5 cm across)
• Features:
  • 8 IR sensors which allow it to detect objects
  • Two independently controlled motors
Webots – Khepera Embodied Simulator
Simulators allow faster operation than real robots – particularly if learning is involved.
Simulator complexity is manageable for a simple robot like the Khepera, but for more complex robots the simulator may become too complex, or may not simulate the real world accurately.
A Generic Robot Controller Architecture
[Diagram: a recurrent neural machine with INPUT, HIDDEN, STATE, and OUTPUT units. Sensory inputs (including feedback from motors and effectors) feed the input units; controller outputs drive the motors and effectors.]
• The controller of the robot is an artificial neural network with recurrent feedback, capable of forming internal representations of sensory information in the form of a neural state machine.
• Sensory inputs (vision, sound, smell, etc.) from sensors are fed to this structure.
• Sensory inputs also include feedback from the motors and effectors.
• Controller outputs drive the locomotion and manipulators of the robot.
• The neural controller learns to perform a task, using neural network and genetic algorithm techniques.
• But: the internal model of the controller is implicit, and therefore hidden from us.
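The architecture above can be sketched in a few lines. This is a minimal illustration, not the actual controller from the talk: weight matrices are random placeholders (the slides train them with neural-network and genetic-algorithm techniques), and all sizes are arbitrary assumptions.

```python
import numpy as np

class RecurrentController:
    """Sketch of a recurrent neural state machine: sensory inputs plus
    fed-back state units drive hidden units, which produce motor outputs
    and the next internal state. Weights are illustrative placeholders."""

    def __init__(self, n_in, n_state, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(0, 0.1, (n_hidden, n_in + n_state))
        self.W_state = rng.normal(0, 0.1, (n_state, n_hidden))
        self.W_out = rng.normal(0, 0.1, (n_out, n_hidden))
        self.state = np.zeros(n_state)

    def step(self, sensors):
        # Concatenate sensory input with the recurrent state feedback.
        x = np.concatenate([sensors, self.state])
        h = np.tanh(self.W_in @ x)               # hidden units
        self.state = np.tanh(self.W_state @ h)   # next internal state
        return np.tanh(self.W_out @ h)           # motor/effector drive
```

The internal representation lives in `self.state` and the hidden activations, which is exactly why it is implicit and hidden from an outside observer.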
Understanding the internal model
• Introduce a second recurrent neural network, separate from the first system, which learns the inverse relationship between the internal activity of the controller and the sensory input space.
[Diagram: the forward recurrent neural machine (INPUT, HIDDEN, STATE, and OUTPUT units; sensory inputs including feedback from motors and effectors; motor and effector drive outputs) paired with an INVERSE recurrent neural machine. The inverse’s outputs, which we can observe, lie in the same sensory space as the forward controller’s inputs.]
• This mechanism will allow us to represent the hidden internal state of the controller in terms of the sensory inputs that correspond to that state.
• Thus we may claim to know something of “what the robot is thinking”.
• We assume that the controller is learned first, and that, once this is learned and reasonably stable, the inverse can be learned.
Simplified Inverse
• In this experiment, we use a controller model which is much less powerful than the recurrent controllers described above, but which allows us to illustrate the principle, and in particular makes “inversion” of the forward controller extremely simple.
• The crucial simplification we make is that the controller will learn its representation directly in the input space. Thus there is no inverse to learn - the internal representation learned by the robot is directly visible as an input-space vector.
• The first phase is to learn or program the forward model or robot controller. In this simple experiment we program in a simple reactive wall-following behavior, rather than learning a complex behavior. The robot starts with no internal model, and adaptively learns its internal representation in an unsupervised manner as it performs its wall-following behavior.
The Learning Algorithm
(based on Linaker and Niklasson 2000 ARAVQ algorithm)
• A 10-dimensional feature space is formed from the 8 Khepera IR sensor signals plus the 2 motor drive signals.
• Clusters feature vectors by change detection, to form prototype feature-vector “models”.
• Unsupervised.
• Adds new models based on two criteria:
  • Novelty: large distance from existing models
  • Stability: low variance in buffered history of features
• Adapts existing models over time.
• We program in a simple “wall following” behavior to act as a “teacher”.
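The novelty/stability logic above can be sketched as follows. This is an illustrative ARAVQ-style clusterer, not the exact Linaker and Niklasson algorithm; the buffer length, thresholds, and learning rate are invented for the example.

```python
import numpy as np
from collections import deque

class SimpleARAVQ:
    """Sketch of ARAVQ-style unsupervised clustering. Buffers recent
    10-D feature vectors (8 IR sensors + 2 motor signals) and adds a
    new prototype "model" only when the buffered input is both stable
    (low variance) and novel (far from every existing model).
    All thresholds are illustrative assumptions."""

    def __init__(self, buf_len=5, novelty=0.3, stability=0.05, lr=0.01):
        self.buf = deque(maxlen=buf_len)
        self.novelty, self.stability, self.lr = novelty, stability, lr
        self.models = []          # learned prototype feature vectors

    def observe(self, feature):
        self.buf.append(np.asarray(feature, float))
        if len(self.buf) < self.buf.maxlen:
            return
        mean = np.mean(self.buf, axis=0)
        # Stability criterion: the buffered history must have settled.
        if np.mean([np.linalg.norm(f - mean) for f in self.buf]) > self.stability:
            return
        if self.models:
            dists = [np.linalg.norm(mean - m) for m in self.models]
            j = int(np.argmin(dists))
            if dists[j] <= self.novelty:
                # Known situation: adapt the winning model slowly.
                self.models[j] += self.lr * (mean - self.models[j])
                return
        # Novel AND stable input: add a new prototype model.
        self.models.append(mean.copy())
```

Driving this with wall-following sensor streams would yield one prototype per situation (wall on the right, corridor, corner, etc.), which is what the colored concepts on the next slide show.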
Learning in action
Colors show learned concepts:
Black – right wall
Blue – ahead wall
Green – 45-degree right wall
Red – corridor
Light Blue – outside corner
Running with the model
• Switch off the wall follower
• The robot “sees” features as it moves
• Choose the closest learned model vector at each tick.
• Use the model vector’s motor drive values to actually drive the motors.
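The closest-model selection step can be sketched directly. This assumes the 10-D layout described earlier (first 8 entries sensory, last 2 motor drives); the function name and return convention are invented for the example.

```python
import numpy as np

def drive_from_models(sensors, models):
    """Sketch of "running with the model": given the current 8 IR
    sensor readings, find the learned 10-D prototype whose sensory
    part is closest, and reuse that prototype's stored motor-drive
    components (the last two entries) to actually drive the motors.
    Returns (model_index, left_drive, right_drive)."""
    sensors = np.asarray(sensors, float)
    dists = [np.linalg.norm(sensors - m[:8]) for m in models]
    j = int(np.argmin(dists))            # current "best" model
    return j, models[j][8], models[j][9]
```

The returned index `j` is what the simulator colors on the next slide: which learned concept is currently "best".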
Running with the model
Color indicates which is the current “best” model feature.
Run the model in the real robot
Invert the motor signals back to sensory signals to
infer an egocentric “map” of the environment as
“seen” by the robot.
Keeping it Real
• Mapping with the real robot
Manipulating the model “mentally”
to make a decision - “planning”
• Take the sequence of learned model feature vectors and cluster sub-sequences into higher-level concepts
• For example:
• Blue-Green-black = Left Corner
• Red = Corridor
• Black = right wall
• At any instant ask the robot to go to “home”
• Run the model forwards mentally to decide if it is
shorter to go ahead or to go back
• Take appropriate action
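The "run the model forwards mentally" decision can be sketched as a search over the learned sequence of concepts. This assumes the robot's world model is a cyclic route of higher-level concepts as described above; the function name, route, and concept labels are illustrative.

```python
def plan_home(route, position, home="Left Corner"):
    """Sketch of the mental planning step: from the current position
    in a learned cyclic sequence of concepts, run the model forwards
    and backwards to see which direction reaches "home" in fewer
    steps. Returns "ahead" or "behind"."""
    n = len(route)
    targets = [i for i, c in enumerate(route) if c == home]
    ahead = min((i - position) % n for i in targets)    # steps forward
    behind = min((position - i) % n for i in targets)   # steps backward
    return "ahead" if ahead <= behind else "behind"
```

The resulting choice then drives the observable action on the next slide: rotate if home is behind, flash the LEDs if home is ahead.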
Decision Time
Corridor corner is home.
Rotate = home is behind me.
Flash LEDs = home is ahead of me.
Inverse Predictor Architecture
• We now allow the inverse to be fed back into the controller via the switch.
• Thus the controller has an image of its internal hidden state or “self” in the same feature space as its real sensory inputs.
• Thus it can “see” what it “itself” is thinking.
• As before, “we” can also observe what the machine is “thinking”.

[Diagram: a switch selects between real-world sense signals and the INVERSE’s model-world sense signals as the input to the CONTROLLER; the controller’s motor signals go to the real robot.]
Consequences of the architecture
• In “normal” mode, the controller produces motor signals based on the sensory input it “sees” (including motor/effector feedback). Normally we expect to see what it is seeing. The inverse allows for detecting mismatch between a predicted and an actual sensory input - thus indicating a novel experience, which in turn could focus attention and learning in the main controller. Noisy, ambiguous, and partial inputs can be “completed”.
• In “thinking or planning” mode, the real world is disconnected from the controller input, and the mental images being output by the inverse are input to the controller instead. Thus sequences of planned action towards a goal can take place in mental space, and then be executed as real action. Note that by switching between normal mode and “thinking” mode in some way, we can emulate the robot doing both reactive control and thinking at the same (really multiplexed) time - that is, as humans do when driving a car on “automatic” while “thinking” of something else.
• In “sleeping” mode we shut off the sensory input and allow noise to be input. Then the inverse will output “mental images”, which themselves can be fed back into the input (because they have the same representation), producing a complex series of “imagined” mental images or “dreams”. Note that we can use this “sleeping” mode to actually learn (or at least update) the inverse: the input noise vector is a “sensory input” vector like any other (whether it is structured accordingly or not), thus the inverse should be able to output this vector like any other from the state and motor signals, and we can use the error to update the inverse.
• If we do not disconnect the motors during “dreaming” we will have “sleepwalking” or “twitching”. If we assume that the controller is continually learning, then the inverse must be continually updated. If they get too far out of synchronization, we could get irrational sequences in “thinking” - or worse, in execution mode - an analog of “madness”.
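The input switch that distinguishes these three modes can be sketched in a few lines. The mode names follow the slides; the function name and the noise scale are assumptions made for the example.

```python
import numpy as np

def controller_input(mode, real_senses, inverse_output, rng=None):
    """Sketch of the input switch: "normal" feeds real sensory
    signals to the controller; "thinking" substitutes the inverse's
    mental images for the world; "sleeping" injects noise so the
    inverse free-runs, producing "dreams"."""
    if mode == "normal":
        return real_senses
    if mode == "thinking":
        return inverse_output    # mental image replaces the world
    if mode == "sleeping":
        rng = rng or np.random.default_rng()
        return rng.normal(0, 0.1, size=len(real_senses))  # noise drive
    raise ValueError(f"unknown mode: {mode}")
```

Alternating rapidly between "normal" and "thinking" on successive ticks is the multiplexing mentioned above: reactive control and planning sharing one controller.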
Where’s the Consciousness?
• Not there yet
• More complex robots
• More complex environments
• More complex architecture
SONY DREAM ROBOT
Head: 2 degrees of freedom
Body: 2 degrees of freedom
Arms: 4 degrees of freedom (×2)
Legs: 6 degrees of freedom (×2)
(Total of 24 degrees of freedom)
Increasing complexity
Environment:
• Fixed environment
• Moving objects
• Movable objects
• Objects with different values
• Other agents – prey
• Other agents – predators
• Other agents – competitors
• Other agents – collaborators
• Other agents – mates
• Etc.

Agent:
• Movable body
• More sensors
• Effectors
• Articulated body
• Metabolic state
• Acquired skills
• Tools
• Imitative learning
• Language
• Etc.
Multi-stage planning
At each step:
- what actions could it take?
- what actions should it take?
- what actions would it take?
The planning system needs:
- a good and current model of the world
- a good and current model of the agent’s abilities, expressible in terms of their effects on the model world
- an associated executive system to use the information generated by the planning system
A framework?
[Diagram: a Self Model and an Environment Model, each receiving updates; their linked output goes to the executive.]
Speculation…
There may be something it is like to be
such a self-model linked to such a
world model in a robot with a mission