HUMANOBS – Humanoids That Learn Socio-Communicative Skills By Observation

HUMANOBS Reasoning Process
Version 1.3

Authors:
Carlos Hernández (1), Manuel Rodríguez (1), Eric Nivel (2)
(1) Universidad Politécnica de Madrid – Autonomous Systems Laboratory
(2) RU-CADIA

Deliverable: D12 Reasoning Process, Release 1
Belongs to WP: WP3 – Model-driven Implementers and Integrators
WP lead: UPM-ASLab
WP participants: UPM-ASLab
ID #: 12, WP: 3, Orig. date: M19, Actual date: M25
Table of Contents

1 Introduction
1.1 Testbed Scenario
1.2 Reference Material
2 The Reasoning Process
2.1 The Reasoning Process Applied to the Base Scenario
3 The Implementation Process. Using Knowledge
3.1 Reaction System
3.2 High Level Reasoning
3.3 Schema
3.3.1 Schema Definition
3.3.2 Schema Implementation
4 The Integration Process. Revising & Injecting Knowledge
4.1 Progress Monitoring
4.2 Model Revision
5 References
6 Glossary
1 Introduction
This document describes the reasoning process (implementation & integration) in the HUMANOBS project. The main goal of the project is to develop a system that is able to learn socio-communicative skills by observation. Observation is itself defined as a behaviour and is described in the same fashion; in fact, every action the system can take is considered a behaviour and is modelled accordingly.
In line with the project's goal, the HUMANOBS architecture is model-based and model-driven. It runs two processes in parallel, one devoted to learning and the other devoted to acting. Thus, the developed system will be running and interacting with the environment while learning new behaviours (as models) and updating and improving the existing ones.
In this document both processes are thoroughly described. The first one constitutes the Learning Cycle and the second one the Operation Cycle. The Learning Cycle is focused on building new models and updating old ones, and the Operation Cycle is focused on using these models. The reasoning process encompasses not only the use of the models but also the process of improving these models and requesting new ones. The following sections explain both contributions to the reasoning process. Model acquisition is briefly depicted, but there is no specific section describing it in detail.
In order to illustrate the reasoning process a base scenario is defined next. This scenario will be used
throughout the document as a leitmotif to ease the understanding of the different explanations.
1.1 TESTBED SCENARIO
This scenario is based on the classic Pong videogame, and will hereafter be referred to as the Pong game.
There is a board and two paddles, each one at one end of the court. Each player handles one of the
paddles. The main goal of the game is to score more points than the opponent in a given time period. A
player scores when a ball passes the opponent’s paddle. To make things more complex, in our testbed,
there can be more than one ball at a time. Balls are of two different colours, and depending on the colour
the scoring is different. The system, implemented according to the HUMANOBS architecture, will take the
role of one of the players. The other role will be performed by a human player.
Base scenario specification:

I/O data available:
• Ball position and velocity (vectors) at every sampling time (for every ball)
• Paddle position at every sampling time (for each paddle)
• Scoreboard at every sampling time (points of both players)

Player's controls (available actions for the system and its opponent):
• Self paddle position increment: up, down or rest (if no action)

Game characteristics:
• The white ball scores 1 point
• The red ball scores 3 points
• Ball speed is constant
• Paddles have a width
• The walls that define the court are fixed
• Balls rebound on walls with perfect reflection
• The duration of the game is known and fixed
• The paddle position increments by a fixed amount on each keyboard stroke

Goal:
• Score more points than the other player in the game time

Masterplan (initial knowledge):
• Ball position model: the next ball position is the sum of the current one plus the current velocity vector (a sketch of this model is given right after these lists)
• How to put the self paddle at a certain coordinate: by commanding up or down
• Inverse model for scoring more points than the opponent:
o Score
o Avoid being scored

Knowledge to be learned:
• When scoring happens
• Balls bounce on walls and paddles
• Differently coloured balls score differently
• Scoring means being closer to the goal (and, as a consequence, a strategy to score is also to be learned)
• Avoiding being scored means being closer to the goal (and, as a consequence, a strategy to avoid being scored is also to be learned)
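To make the initial Masterplan models concrete, here is a minimal Python sketch (illustrative only, not Replicode); the sampling period dt and the paddle step size are assumptions made for this example:

# Minimal Python sketch of the two Masterplan models; dt and step are assumed values.
def next_ball_position(position, velocity, dt=1.0):
    # Forward model: next ball position = current position + current velocity vector.
    return (position[0] + velocity[0] * dt, position[1] + velocity[1] * dt)

def paddle_command(paddle_y, target_y, step=1.0):
    # Inverse model: command up, down or rest to drive the paddle towards target_y.
    if abs(target_y - paddle_y) < step / 2:
        return "rest"
    return "up" if target_y > paddle_y else "down"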
1.2 REFERENCE MATERIAL

The remaining content of this document uses concepts explained in detail in other documents. In order to follow the next chapters, some reference material should be read in advance. The main reference documents are:
• Replicode language specification v1.1a. Available at: https://projects.humanobs.org/projects/language/documents
• HUMANOBS Architecture v1.0. Available at: https://projects.humanobs.org/projects/language/documents
• Model Proposer V1.0. Available at: https://projects.humanobs.org/projects/wp2/documents
• Pezzulo, G. and Calvi, G. Schema-based design and the AKIRA Schema Language: An Overview. In M.V. Butz, O. Sigaud, G. Pezzulo, G. Baldassarre (Eds.), Anticipatory Behavior in Adaptive Learning Systems: Advances in Anticipatory Processing. Springer, LNAI 4520. 2007.
• Sanz, R. An Integrated Control Model of Consciousness. Proceedings of the conference Toward a Science of Consciousness. 2002.
2 The reasoning process
The HUMANOBS meta-architectural approach is based on the exploitation of explicit executable models to drive all system action. There are three main HUMANOBS architectural elements: 1) the Model Acquisition module, devoted to generating models that explain the observed reality; 2) the Reaction System module, the agent controller, devoted to performing as a real human would and in charge of computing the proper action; and, finally, 3) the Model Revision module, devoted to increasing the performance of the system by interacting with the previous elements to enhance the models.
Figure 1 shows a detailed description of the architectural elements and how the different elements share
information. There are two cycles that run in parallel: the Learning Cycle and the Operation Cycle
(explicitly shown in Figure 2).
The overall learning cycle is as follows: data observed from the external world (without any interaction with
the system) go into an Attention Control module, which is in charge of filtering data in order to select only
the relevant data for the system in the pursuit of its specified goals. Some information (from the
Masterplan) is initially included to allow for the first selection.
Filtered data (called Area of Interest) are fed to the Model Acquisition module, whose main function is to
correlate the data in order to obtain a (reusable) model that explains some behaviour related to some
specific goal. The Model Acquisition is composed of three elements: the event detector, the correlator and
the model builder. The event detector identifies a subset of the Area of Interest (filtered data) that triggered
the production of useful data. This subset is used by the next element that correlates the data. This
correlated data constitutes a model. The final element of this module translates the obtained correlation
into a Replicode format model. These models go to the Integration module, which injects them into the (running) Reaction System module.
The overall Operation Cycle is as follows: sensory input data from the environment are fed to another instance of the Attention Control module. This instance filters the data; once filtered, they become inputs to the Reaction System module. Notice that there is a distinction between Observation (and learning) and Operation (exploiting the acquired knowledge). These two functionalities are run in parallel.
The Reaction System is the module in charge of producing the desired system’s behaviour in the domain.
It uses the already filtered data and produces actions (system behaviour) to achieve the system’s specified
goals. The basic control structure in this module is the schema. Schemas are composed of interacting
inverse and forward models that allow the generation of the system output action to be executed on the
external world and the prediction of its result on the world, respectively. The Reaction System can also run the schemas in simulation mode; this mode is controlled by an additional element called Simulation Control, which is in charge of setting the parameters of the simulation. Besides, there is another element that interacts with the Reaction System: the Model Selection, which is in charge of selecting the proper action when competing models are available to fulfil the same goal.
HUMANOBS Reasoning Process
V1.2
Reaction System’s performance has to be measured and evaluated. The Progress Monitoring is in charge
of that measurement. This module receives information from the Reaction System (RS) regarding its
current state and goals. It uses this information together with the (initially provided) domain metrics to
generate measurements of the RS performance. Besides, the Progress Monitoring identifies relevant
information for a given goal and sends it to the Attention Control module.
Finally, the Model Revision module is in charge of analysing and evaluating the RS performance. As a
consequence of the analysis and evaluation it updates the existing models or asks the Model Acquisition
module for new ones.
Figure 1: Main interaction flows between HUMANOBS architectural elements. (The original figure depicts the Attention Control, Model Acquisition (Event Detector, Correlator, Model Builder), Integration, Reaction System (with Simulation Control and Model Selection), Progress Monitoring and Model Revision modules, the data exchanged between them, and a legend distinguishing Replicode constructs and data from non-Replicode ones.)
As indicated in the legend of the figure, red boxes denote Replicode constructs (programs, models, etc.) and red arrows denote Replicode-format information. Black boxes denote non-Replicode constructs (built using any other programming language) and black arrows denote data in an undefined format. Notice that eventually some non-Replicode parts may be embedded using Replicode.
Figure 2: Learning and Operation cycles running in parallel in the HUMANOBS implementation & integration processes. (The original figure repeats the elements of Figure 1, highlighting which modules belong to the Learning Cycle and which to the Operation Cycle.)
2.1 THE REASONING PROCESS APPLIED TO THE BASE SCENARIO
In what follows we briefly sketch how the HUMANOBS architecture would work in our testbed scenario. Suppose the initial Masterplan given in 1.1, and the following initial considerations: the system is playing against a human, and the initial Attention Control (AC) modules for both the learning and operation cycles create an Area of Interest including the balls' and paddles' positions. Given the initial Masterplan, the Progress Monitoring will assign a null performance to the subgoals of scoring and not getting scored present in the Reaction System, since it does not contain any model for them. The Model Revision would take this information and instantiate one Correlator to seek a model for each subgoal, and would command the AC to extend the AoI to include scoring data.
The Correlator charged with the subgoal of not getting scored would eventually come up with a correlation
between the paddle intercepting the ball and that player not being scored. The Model Builder would
convert that into a Replicode model that would be injected in the Reaction System. Once in execution, this
model would produce the goal to put the paddle in the ball’s coordinate. This goal would be achieved by a
model already present in the Masterplan.
The Progress Monitoring would assign a good performance index to all the schemas and to the overall system, so the Model Revision keeps commanding the Model Acquisition to seek a model of scoring (which is hard to find), while there is only one ball in the game. As soon as there are two or more balls at a time, the model for not being scored starts producing conflicting goals, since the system cannot command the paddle to move to the different balls' coordinates at the same time. Performance indexes would degrade and the Model Revision would command the Model Acquisition to find a better model for not being scored, which could probably be found by considering only the closest ball for interception.
Sometimes different actions for the same goal can be proposed; for example, if two balls (of different colours) are approaching the Reaction System paddle, one above the paddle and the other one below it, two (conflicting) actions are proposed (moving up and moving down). This situation is detected by the Model Revision, and the Model Selection would take the decision about which action to implement.
If predicting the behaviour ahead in time is of interest, the Simulation Control element would start a simulation. The Model Revision is the module in charge of deciding whether or not to run the simulation mode.
The remaining part of the document explains in detail the described reasoning process. It proceeds module by module, establishing the needed requirements and commenting on how each module is deployed in the system. It is divided into two main sections: the first one is related to the implementation process, and the second one describes the integration process. These two processes, "implementation" and "integration", serve to structure the document and are orthogonal, as indicated in Figure 3.
The implementation consists of the construction of the models and their use by the Reaction System, which is the goal-driven schema assembly hierarchy. There is a goal dependency between the layers of this hierarchy.
The integration, on the other hand, consists of identifying the need for new models and controlling the mechanism to integrate them with the existing ones. It is carried out by the Model Revision, which controls the Reaction System according to the schema control dimension (integrated cognitive control approach), and is implemented through the Integration module.
Figure 3: System architecture showing the specification to implementation (bottom-up hierarchy) and integration (integrated cognitive control, ICC).
3 The Implementation Process. Using Knowledge
This section is dedicated to explaining the specification-to-implementation process. It is related to the process of relating models and goals at different levels of abstraction. This will be a goal-based hierarchy of active elements (schemas), conforming to what we call in the HUMANOBS architecture the Reaction System. The
process of generating an implementation is designed as an inference process, performing the backward
chaining of inverse models: the specification of a behaviour defines origin and target states of the system
and the environment. These states constitute inputs for the inverse models controlling the transitions along
the chain leading to the achievement of the target state.
Since the architecture is expanding, incomplete implementations shall be expected when the components (models) necessary for a new skill have not been developed yet (as they have to be learnt). Due to the modular nature of the architecture (through the use of schemas), the implementation process will be allowed to produce implementations with partially undefined components, for which only a partial model would be computed, i.e. a model expressing its effects on the system (forward models) and the states it is supposed to have an impact on.
In what follows, a description of the Reaction System will clarify its functionality and components. After that, a description of the main component of both the Reaction System and the reasoning process, the schema, will be provided. Finally, the reasoning process procedure (how to use the forward and inverse models) will be explained, together with how it is implemented using the elements described above.
3.1 REACTION SYSTEM
This component of the architecture is in charge of producing the best available action (given the existing models) according to the specified goals. It is the decision system. It will hold one or several models of the environment (real world) to interact with. These models are forward and inverse models embedded in schemas. It ought to be noted that the Reaction System is not exclusively reactive and may exhibit anticipatory behaviour as well, as indicated in the requirements specification described next.
Figure 4: Requirements decomposition for the Reaction System module. (The original figure decomposes the overall requirement, providing a proper action according to goals, into requirements R1 to R4 (behave reactively, predict outcomes of actions, select between actions, provide measurements of performance), each refined into activities with their technical solutions and implementation details.)
Figure 4 shows the functional decomposition of the Reaction System. Next, a description of the specific requirements needed to fulfil the overall one is presented. Each requirement is further refined into several activities that are described in detail.

Overall requirement
The reaction system shall:
• Provide a proper action in response to the perceived data from the environment according to its goals (it shall imitate human behaviour in a controlled environment)
Specific Requirements
The reaction system shall:
R1. Behave reactively (as indicated in the compute an action activity)
R2. Predict the outcomes of its actions (exhibit anticipatory behaviour)
R3. Select between different courses of action
R4. Provide measurements of its performance (metrics to evaluate how far from its goals it is)
R5. Provide commands for the actuation devices
R1. Requirement: Behave reactively
Activities:
A1. Get current data from the external world
A2. Compute an action
A3. Send the action to the external world
A1. Activity: Get data from the external world
Data from the external world will have the Replicode format (some other element of the
architecture will pre-process the data from the sensor and will translate them to Replicode). The
reaction system will provide an interface to receive these data. Data coded as Replicode will be markers (mks). The interface will be a Replicode program (pgm), or a set of programs. The received data will be relevant data filtered by the Attention Control module.
Technical solution
Not applicable. If the translation from the sensor data to the Replicode language is not done elsewhere, then an API should be developed.
Implementation
Data will be Replicode markers (mks). These markers will be accessed by all the existing
programs to check if they match their input patterns.
Example
In the pong game input data will be ball(s) position(s), ball velocity (vector), ball(s)
colour(s), paddle A position, paddle B position, score player A and score player B.
A2. Activity: Compute an action
The reaction system will use schemas (defined in the following section) to compute an action.
Basically, a schema is a feedback model-based controller (it complies with the internal model control (IMC) representation). It has two main elements: the inverse and the forward model.
The forward model provides an estimation of the system state given a set of inputs. It performs
forward chaining. Given a goal and the estimation of the system’s state, the inverse model
computes an action following backward chaining reasoning. The inverse model element of the
schema outputs the (control) action to the actual system (external world) as well as to the forward
model element. The forward model is executed and its output is compared with that of the real
world (to account for disturbances and model mismatch). The result of the comparison is the input
to the inverse model that computes (related to a goal) the (ideally perfect) control action to be
applied to the external world.
Schemas are organized hierarchically. The hierarchy will follow a functional decomposition. The overall (more generic) goal will be at the top of the hierarchy and will be refined and decomposed into subgoals going down the hierarchy. The output of a schema (its inverse model element) will be either a direct actuation on the external world or a goal for another schema. Schema inputs will be (most of the time) data (in Replicode form) from the external world, or estimations and predictions (outputs of the forward model elements) from other schemas.
Forward models as well as inverse models are implemented using specific Replicode constructs
(fmd,imd), used in reduction groups (defined later in this chapter). This activity is related to the
reactive behaviour of the system, anticipatory behaviour is included in the following requirement.
Technical solution
The technical solution to decide an action is a “hierarchical internal model control”
approach. Each controller will be deployed as a schema.
Implementation
Schemas will be coded using reduction groups (rgrp); models (forward and inverse) will be coded as a special type of programs (fmd, imd); the comparison (between actual state and estimated state) will be implemented as a program (pgm).
Example
A schema in the pong game will be the one responsible to compute the next required
position of the paddle. It will take as inputs the ball position, velocity and colour and the
paddle position. The forward model will estimate the difference (in the vertical axis)
between the ball and the paddle (the same input goes into the actual system and its
output is compared with the estimated state). The inverse model will take as input this
difference, the velocity of the ball and the goal (make the difference equal to zero), and will output the position of the paddle as a goal for other schemas further down the hierarchy.
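To make this concrete, the following Python sketch (illustrative only, not Replicode; the names, the scalar state and the sampling period dt are assumptions made for this example) shows the data flow between the forward model, the comparator and the inverse model of the schema described above:

# Illustrative Python sketch of the Pong schema above (not Replicode); names,
# the scalar state and the sampling period dt are assumptions for this example.
class PaddleBallSchema:
    def forward_model(self, ball_y, ball_vy, paddle_y, dt=1.0):
        # Estimate the vertical difference between the ball and the paddle at the next step.
        return (ball_y + ball_vy * dt) - paddle_y

    def inverse_model(self, corrected_diff, paddle_y, goal_diff=0.0):
        # Output the paddle position that makes the difference equal to the goal (zero);
        # this position becomes a goal for schemas further down the hierarchy.
        return paddle_y + (corrected_diff - goal_diff)

    def step(self, ball_y, ball_vy, paddle_y, observed_diff):
        predicted_diff = self.forward_model(ball_y, ball_vy, paddle_y)
        mismatch = observed_diff - predicted_diff      # disturbances / model error
        return self.inverse_model(predicted_diff + mismatch, paddle_y)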
A3. Activity: Send the action to the external world
The reaction system needs an interface to translate the output of the lowest schemas in the hierarchy, which are Replicode commands, into signals for the actuation devices that operate on the environment.
Technical solution
mBrane I/O modules will be responsible for translating the commands generated by the RS into device-executable actions.
Implementation
Each Replicode command is uniquely identified (using an opcode). This identification is
used to select the appropriate response. Additionally, each entity part of the command is
also uniquely identified (using a number), and these identifiers are kept consistent
between the two sides of the interaction (devices and executive). Given these rules, the
implementation consists of a set of parsers triggered by opcodes. Hand-coded.
Example
In the pong game output data (actions) will typically be direction of paddle position
increment in the next game time step (up, down or none if no command).
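As a rough illustration of the opcode-based parser idea (a Python sketch; the opcodes, command layout and device call below are hypothetical, not the actual mBrane interface):

# Hypothetical opcode dispatch in Python; opcodes and the device function are assumptions.
def move_paddle(direction):
    print("device: move paddle", direction)   # stands in for the actual device call

PARSERS = {
    0x01: lambda args: move_paddle("up"),
    0x02: lambda args: move_paddle("down"),
    0x03: lambda args: None,                   # rest: no actuation
}

def execute(command):
    opcode, *args = command                    # each command is uniquely identified by its opcode
    PARSERS[opcode](args)                      # the opcode selects the hand-coded parser

execute((0x01,))                               # example: command the paddle to move up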
R2. Requirement: Predict the outcomes of its actions
Activities:
A1. Simulate the evolution of the system in response to actions
A2. Control of the simulation
A1. Activity: Simulate the evolution of the system
The main purpose of the simulation is to have a rigorous ground for anticipatory behaviour, i.e. it
adds predictive capabilities to the system. A simulation of the consequences of the current action
in the future could lead to a different control action than the one devised by the reactive behaviour
module (if this initial action is not good enough, the effects of the alternative actions have to be
simulated (to check if they are expected to perform better) which means that a number (and this
number can be huge) of simulations may be needed. At the end a combination of both (reactive
and anticipatory) will be implemented like in feedforward/feedback control configurations.
When running in simulation mode there is no feedback from the external world. Initial actions are applied to the model of the system, and the newly calculated actions are applied again to it (as if the predicted states had been reached) in order to advance in time. Notice that during a simulation all the disturbances are considered constant or are assumed to follow some a priori dynamics (as there is no feedback from the external world).
Technical solution
Initially the technical solution will be the same as the one applied for the reactive behaviour. The system will operate exactly the same, although the data are not coming from the external world (except at the initial time).
Implementation
The implementation will use Replicode constructs that allow the system to “know” that it
is running in simulation mode (which will be executed in parallel to the reactive system).
These constructs are: hypotheses, assumptions and predictions markers (for instance,
when a pgm receives at least one input amongst its input set that is a simulation marker,
then the results are tagged with simulation markers).
Example
Let’s suppose the Pong game with two balls at the same time. One ball is white and one
is red (both have different scoring values, the white one 1 point and the red one 3
points). Let’s suppose that the white ball is closer (in y coordinate) to the paddle (with a
lower y value) and that the red ball has a higher y coordinate. In this case the reactive
behaviour would command a new paddle position (move down). The simulation model
will take as inputs the balls' positions and velocities at the initial time (t0) as well as the paddle positions and current scores. After that time, an action is computed (the next position of the paddle) and it is fed back as input to the simulation cycle (along with the predicted positions and velocities) at time t0+1. This can be useful as the simulation may show that
it is better to move up (even at the cost of the white ball scoring) in order to be able to
reach the red ball (which wouldn’t be possible if moving down first).
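A minimal Python sketch of such a rollout (illustrative only; the court geometry, names and the policy interface are assumptions made for this example), in which computed actions are fed back into the model instead of the environment:

# Illustrative simulation rollout in Python (not Replicode): actions are applied to the
# model of the system and fed back, so time advances without feedback from the world.
def simulate(balls, paddle_y, policy, steps, dt=1.0, paddle_half_width=1.0, goal_line=0.0):
    # balls: list of [x, y, vx, vy, points]; policy: (balls, paddle_y) -> paddle increment.
    conceded = 0
    for _ in range(steps):
        paddle_y += policy(balls, paddle_y)                  # simulated action
        remaining = []
        for x, y, vx, vy, points in balls:
            x, y = x + vx * dt, y + vy * dt                  # predicted ball position
            if x <= goal_line:                               # ball reached our end of the court
                if abs(y - paddle_y) > paddle_half_width:    # paddle did not intercept it
                    conceded += points
            else:
                remaining.append([x, y, vx, vy, points])
        balls = remaining
    return conceded   # predicted points conceded, used to compare alternative actions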
A2. Activity: Control the simulation
The extent of the simulation (in depth and in time), as well as the number of simulations, has to be fixed according to criteria concerning the amount of resources needed, the resources available and the expected utility (in goals' terms) of doing the simulation. Most of the time the in-depth extent will likely be the whole system, as it is very interrelated (meaning simulating from the lowest level of the hierarchy to the top one). The in-time extent will depend upon the time-response requirements or upon whether a stationary state has already been reached. The simulation control module (implemented as a program, or set of programs, in Replicode) will receive the criteria to start/stop the simulation from the Progress Monitoring. It will also receive the parameters needed to run the simulation (which schemas will be simulated, the simulation time span, etc.).
Technical solution
The system will basically run in pure reactive mode; if the results are not acceptable then an activity will be triggered: a message will be issued (by the Model Revision) in order to start the simulation. Time constraints are problem dependent and shall be provided before executing the system. When the available time is reached, a new message is issued to stop the simulation. The computed action is sent with additional information regarding whether the simulation was completed in time.
Implementation
A group will be created for each simulation. All the schemas running a simulation are
projected into this group. The simulation control will activate/deactivate the group (and
consequently the schemas) in order to start/stop the simulation. Simulation parameters
will be data available in that group (provided by a program) so all the schemas attached
to this group will have access to the parameters.
Example
In the former case of the pong game, a simulation of the initial action (moving down the
paddle) would lead to a final result of 3 points (because the red ball would have scored).
The simulation of an alternative action (moving up the paddle) would lead to a better
result (1 point, white ball scoring). In this case the simulation would run in time until all
the balls have disappeared (stationary state) unless the time required to run the
simulation would be greater than the one required for the next paddle response
(sampling time).
R3. Requirement: Select between different courses of action
Activities:
A1. Identify different actions for the same goal
A2. Select between conflicting actions

A1. Activity: Identify different actions for the same goal
The running system will have a set of models implemented and active. There can be different models for the same goal; these models are all activated and executed in parallel. This means that after receiving new data, the system can eventually propose more than one action (for the
same goal and external data). The reaction system will provide a mechanism to detect if some of
the different actions proposed correspond to the same goal. Mainly, it will be done using
backtracking from the commands (final actions) until a final common goal is reached. Another
source of different actions for the same goal comes from the anticipatory (simulation) behaviour.
Technical solution
Using backtracking, i.e. starting from the final actions (those to be sent to the devices)
their related goals are identified. These goals come from actions of upper (in the
hierarchy) schemas. These actions are identified and subsequently their goals, and so
on. This procedure ends when a final goal is reached. If this goal is the same for the
different actions, then those actions are marked as conflicting. A technical solution to identify the anticipatory behaviour is not necessary, as such an action will always be marked as conflicting because there will always be a reactive counterpart.
Implementation
The backtracking will be implemented as a set of programs, it will use the notification
information that is available after a Replicode program or model is reduced (which
means that the data match the required inputs and the program or model is then
executed).
Example
In the pong game we could have different schemas to address the goal of minimising
being scored: one based on simply intercepting the closer ball, and another that takes into
account the different balls present at that moment and optimises the catching order so as
to minimise the scoring for a certain horizon. These two schemas could eventually
produce opposite movement commands.
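The backtracking idea can be sketched as follows in Python (illustrative only; the goal and action objects below are assumptions standing in for the Replicode notification information):

# Illustrative backtracking sketch (Python, not Replicode): each proposed action keeps a
# reference to the goal it serves, and each goal to the higher-level goal that spawned it.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Goal:
    name: str
    parent: Optional["Goal"] = None        # the higher-level goal that produced this subgoal

@dataclass(frozen=True)
class Action:
    command: str
    goal: Goal                             # the goal this proposed action serves

def root_goal(goal: Goal) -> Goal:
    while goal.parent is not None:         # walk up the chain produced by backward chaining
        goal = goal.parent
    return goal

def conflicting_groups(actions):
    # Group proposed actions by their final (root) goal; groups with more than one action
    # are marked as conflicting and handed over to the Model Selection.
    groups = {}
    for a in actions:
        groups.setdefault(root_goal(a.goal), []).append(a)
    return {g: acts for g, acts in groups.items() if len(acts) > 1}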
A2. Activity: Select between conflicting actions
The reaction system will decide on which action to command to the actual device. The reaction
system will have some established criteria to make the decision (these may be success rate, resource consumption, fast execution, etc.). These criteria come from the Model Revision.
Technical solution
The system will be initially endowed with some predefined mechanisms for conflict
resolution. These will be adapted by the metasystem according to the attainment of goals, using the performance measures provided in the domain ontology.
Implementation
The system will be initially endowed with some predefined mechanisms for conflict
resolution. The Masterplan contains predefined models that are built like the attention
control models: these models are hand-crafted to identify (a) present states, (b) expected
outcomes and, (c) costs of execution. They take as input the simulation results of the
conflicting goals (more accurately: the simulation results of each goal being a member of
the set of conflicting goals). Notice also that the patterns on expected outcomes may
include (whe appropriate) a priority. As for the present version, said models control the
resilience of the conflicting goals, leaving only one alive. In case of a tie, one goal is
selected randomly.
Example
Let’s take the Pong example with the existence of two schemas for minimising being
scored: one that moves the paddle towards the closer ball, and another one that takes
into account the different balls present at that moment and optimises the catching order
so as to minimise the scoring for a certain horizon. The Reaction System will decide
which one to select by considering the simulation results of both.
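A minimal Python sketch of such a selection (illustrative only; the outcome, priority and cost fields are assumptions standing in for the hand-crafted Masterplan models):

# Illustrative selection among conflicting goals (Python, not Replicode); fields are assumed.
import random

def select_goal(candidates):
    # candidates: dicts with the simulated outcome, a priority and an execution cost for each
    # conflicting goal; only the best-ranked goal is kept alive, ties are broken randomly.
    key = lambda c: (c["priority"], c["expected_outcome"], -c["cost"])
    best_key = max(key(c) for c in candidates)
    return random.choice([c for c in candidates if key(c) == best_key])

# Usage: the two Pong schemas above, compared on their simulation results (points conceded).
print(select_goal([
    {"goal": "intercept closest ball", "priority": 1, "expected_outcome": -3, "cost": 1},
    {"goal": "optimise catching order", "priority": 1, "expected_outcome": -1, "cost": 2},
]))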
R4. Requirement: Provide measurements of its performance
Activities:
A1. Provide a performance metric associated to each goal (or goal type).
A2. Send performance metrics, current states and the desired states (variables/signals/messages) to the
progress monitoring.
A1. Activity: Establish a performance metric according to each type of goal
The reaction system will provide a (measurable, numeric) metric for each goal of the system.
Technical solution
Not applicable. These metrics will be provided by the domain ontology.
Implementation
Metrics are "a priori" knowledge and as such should be included in the Masterplan (this is the minimum essential information needed to drive the whole system). They should be
“attached” to each goal as an attribute, using markers (mks).
Example
In the Pong game some metrics are quite straightforward. Related to the goal of “paddle
and ball have the same y coordinate”, the metric will be the scalar difference of the
vertical position of both objects.
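For illustration, this metric can be written as a one-line function (a Python sketch with assumed names, not the Replicode marker):

# Metric attached to the goal "paddle and ball have the same y coordinate" (illustrative).
def interception_metric(paddle_y, ball_y):
    return abs(paddle_y - ball_y)   # 0 means the goal is achieved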
A2. Activity: Send performance metrics, current states and goals (desired states) to the
progress monitoring.
The reaction system will make available, for each schema the current goal (desired state), the
metrics associated to that type of goal, and the current state.
Technical solution
Send a signal with the requested information using Replicode.
Implementation
These data can be sent to a group that will receive the different schema’s performance.
Any other module that needs this information (mainly the progress monitoring) will be
projected onto that group to have access to the data.
Example
3.2 HIGH LEVEL REASONING
High-level logical reasoning is performed through forward and backward chaining. Following is a
description of the Replicode language components available to implement the reasoning process.
Predicates
Facts Facts are objects that point to other objects and indicate a timestamp (the time of their occurrence), a frequency that indicates how often the fact has been injected (in [0,1]) and a confidence value (also in [0,1]) indicating how reliable the fact is. Replicode also provides a construct to indicate the absence of a fact (|fact, with the same members as for a fact).
Variable Objects Such objects are objects that point to other objects: this pointer can be reassigned
dynamically during chaining: the variable is said to become bound to an actual value, a pointer to an
object.
Assumptions Such objects, as many other predicates, point to other objects and express that they result
from an inference, as opposed to established facts, predictions, goals, etc.
They are encoded as follows:
(mk.asmp a-fact a-source confidence-value)
where a-fact is a pointer to a fact whose existence the executive has assumed, a-source is the
component of the system which has performed the assumption (it can be either a model or a composite
state, see below), and confidence-value is an indication of the reliability of the assumption.
Goals Goals are objects either produced by programs or by the executive as the result of backward
chaining. A goal is the specification of a target state, in the form of a reference to a fact.
The syntax is:
(mk.goal a-fact an-actor)
where a-fact is a pointer to the target fact and an-actor is a pointer to the system that pursues the goal (i.e.
self or any other active entity in the world).
Predictions Predictions are objects either produced by programs or by the executive as the result of
forward chaining.
The syntax is:
(mk.pred a-fact confidence-value)
where a-fact is a pointer to the predicted fact and confidence-value is an indication of the reliability of the
prediction. The predicted time of the occurrence of the fact is the timestamp member of the fact.
Hypotheses Hypotheses are objects that indicate to the executive that they shall be processed in
simulation runs.
The syntax is:
(mk.hyp a-fact)
where a-fact is the hypothetical fact (its members are defined as for facts).
Simulation Results Such objects indicate that the predicate they refer to has been computed as the result
of a simulation, following the injection of hypotheses.
The syntax is:
(mk.sim a-fact a-source)
where a-fact is the result of the simulation and a-source is either the model or the composite state that
produced it.
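For readers more comfortable with a conventional notation, the following Python sketch mirrors how these predicate objects relate (an illustration only, not the Replicode encoding):

# Illustration only (Python, not the Replicode encoding) of how the predicate objects relate.
from dataclasses import dataclass

@dataclass
class Fact:
    payload: object          # the object the fact points to
    timestamp: float         # time of occurrence
    frequency: float = 1.0   # in [0, 1]
    confidence: float = 1.0  # in [0, 1]

@dataclass
class Goal:                  # mk.goal: a target state pursued by an actor
    fact: Fact
    actor: str

@dataclass
class Prediction:            # mk.pred: a fact expected to occur, with a confidence value
    fact: Fact
    confidence: float

@dataclass
class Assumption:            # mk.asmp: an inferred fact, with its source and confidence
    fact: Fact
    source: str
    confidence: float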
High-level active predicates
Composite States A composite state is a structure meant to encapsulate a set of patterns – for example,
the position of the hand of a robot and the fact that the hand is actually a hand, belonging to a particular
entity (the robot). A composite state is an active object, i.e. it performs like a program that matches the
occurrence of all the patterns it contains and outputs an instantiation of itself, that is a version of itself
where some of the variables are bound to actual values. A composite state is defined by the following
construct:
(cst objects output_groups time_scope)
where objects is a set of abstracted objects (objects that may contain variables), output_groups is a set of
groups where the executive is to inject the productions and time_scope is the time window whose
semantics are exactly the same as for programs – inputs are matched together during the time window
called time_scope.
Models Models are structures that contain a directed pair of patterns – essentially, F(x,y,...) → G(x,y,...). Models are also active objects, that is, they catch input objects and attempt to match them against their left-side pattern, bind some of their variables and produce an instance of their right-side pattern. The
construct is:
(mdl objects output_groups time_scope)
with the same members as for composite states. When read from left to right, models realize the function
of forward models, when read from right to left, the function of inverse models (see reduction below).
Composite State Instances Composite state instances are specific constructs that indicate that a
composite state has been instantiated with some values, i.e. that it has matched all of its patterns. The
construct is:
(csi c-state arguments)
where c-state is the composite state that has matched its patterns and arguments is the set of values
assigned to the composite state's variables.
Model instances Model instances are to models what composite state instances are to composite states.
The construct is:
(imdl a-model arguments)
where a-model is the model that has been instantiated with the values specified in arguments.
A model is instantiated as the result of matching its left-side pattern.
Construction Composite states and models are either constructed by hand or produced by the Correlator.
Regardless of their origin, composite states and models contain objects in the form of facts. Models can
be interpreted as belonging to two types:
A – predictors: (fact F(x,y, …) ) → (fact G(x,y, …)) which reads “F(x,y, …) entails G(x,y, ...)”. If G is an
instance of a model M, this reads “ F(x,y, …) entails the success of M(x,y, ...)”
B – requirements: (|fact F(x,y, …) ) → (|fact G(x,y, …)) which reads “not F(x,y, …) entails not G(x,y, ...)”. If
G is an instance of a model M this reads “without F(x,y, …) M(x,y, ..) will fail”.
Reduction Composite states and models produce objects (facts) from the matching of their patterns.
Replicode performs inferences using models bi-directionally, i.e. matches either left-side patterns and
produces predictions (forward chaining), or matches the right-side patterns and produces goals (backward
chaining). For composite states, chaining is slightly different (as there is no directionality encoded in composite states): when all patterns of a composite state are matched within the given time scope, an instance is produced (forward chaining). Backward chaining: (a) when a goal matches one pattern of the composite state, the executive produces goals targeting each of the remaining patterns; (b) when a
goal targets an instance of a composite state then one goal is generated for each of the patterns (if these
are achieved, forward chaining will instantiate the composite state thus fulfilling the initial goal). Each
reduction triggers the production of a runtime execution notification (mk.rdx) as for programs. Each
production, be it a prediction, a goal or an assumption is monitored, i.e. the executive keeps track of any
subsequent object that matches (on time) the production and reports the success or failure. We give below
the reduction rules implemented in the executive:
I – Composite States For a composite state CS containing the patterns: A(x,y, …), B(x,y, …), C(x,y, …)
IF – forward chaining:
IF1 – receiving a(xa,ya, ...) matching A(x,y, …) spawns an overlay (defined as for programs), this means
all combinations of inputs are scanned for during the specified time scope; when all patterns are matched,
an instance of the composite state is produced and injected in the specified output groups. The instance
lists all the values to which the variables were bound during the matching process.
IB – backward chaining:
IB1 – receiving a goal targeting an instance of CS with a set of arguments [xa,ya, …].
IB1a – there exists at least one active requirement (as defined above): then a goal is produced (and injected in the output groups) that targets the model encoding the requirement; this goal actually targets
an instance of said model to which the incoming set of arguments ([xa,ya, ...]) is passed – notice that some
of these arguments may be unbound. If this goal is satisfied (this is detected by monitors managed by the
executive), an instance of CS is produced and contains another set of arguments – the initial set of
arguments, some of which may have been bound by the model that responded to the requirement. At this
point, the executive issues another set of goals, each targeting one of the patterns found in CS, bound to
the latest set of arguments. If these goals are matched, then the forward chaining will eventually produce
the instance that was targeted by the initial goal.
IB1b – there is no active requirement: then the executive proceeds by issuing one goal per pattern in CS
(as in the last step of rule IB1a).
IB2 – receiving a goal targeting one of the patterns of CS – say, A(x,y, …). The variables of CS are bound
to the values provided by the goal and one goal is issued for each of the remaining patterns in CS (in our
case, for B and C).
IB3 – receiving a goal targeting the negation of one of the patterns of CS. Then the executive produces a
goal targeting the negation of an instance of CS (with the values provided by the goal).
IB4 – receiving a goal targeting the negation of an instance of CS.
IB4a – there exists at least one active requirement. Then a goal is produced targeting the negation of an
instance of the model that encodes the requirement. If this goal is satisfied, then the executive produces
one goal for the negation of each of the patterns of CS.
II – Models For a model M containing the patterns A(x,y, …) → B(x,y, ...)
IIF – forward chaining
IIF1 – receiving a(xa,ya, ...) matching A(x,y, …) produces a prediction targeting B(xa,ya, …) - injected as
usual in the specified output groups.
IIB – backward chaining
IIB1 – receiving a goal targeting b(xb,yb, ...)
IIB1a – there exists at least one active requirement: then a goal targeting an instance of M with the
incoming arguments is issued. If this goal is satisfied, then a goal targeting a(xb,yb, ..) is injected.
IIB1b – there is no active requirement: then a goal targeting a(xb,yb, …) is produced.
IIB3 – receiving a goal targeting the negation of b(xb,yb, …)
IIB3a - there exists at least one active requirement on M. Then a goal targeting an instantiation of M is
issued. If this goal is satisfied then the executive produces an assumption on the negation of a(xb,yb,
…).
IIB3b - there is no active requirement: then a goal targeting the negation of a(xb,yb, …) is produced.
III General Rules
IIIA – All goals and predictions are monitored. These are monitored by the models/composite states that
produced them. This means that shall these active constructs die or become inactive, the monitoring
will cease or be suspended – respectively. A prediction/goal is monitored by the executive as follows:
if an object matches the prediction/goal on time, a success object (fact (mk.success the-prediction-or-goal)) is injected in the output groups, and the prediction/goal is deleted;
if by the predicted/target time, no object has matched the prediction/goal, a failure object is
injected (|fact (mk.success the-prediction-or-goal)) and the prediction/goal is deleted.
IIIB – In case a goal or a prediction fails, the negation of the targeted/predicted fact is injected (N.B.:
negating the negation of a fact gives a fact).
IIIC – Each time a match occurs, a reduction marker is injected (as for programs).
IIID – hypothesis, simulation, assumption and prediction markers are retained during chaining, i.e. goals
and predictions are tagged with the same markers as the inputs that triggered their production (as for
programs).
IIIE – a goal targeting a command to an I/O device triggers the execution of said command by said device.
This is performed only once, i.e. the first time the command object gets saliency (as defined by the
Replicode language). Once executed, the command object remains in the memory until its time to live
has expired (the latter is an argument of the command).
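As a very rough illustration of rules IIF1 and IIB1b, the following Python sketch shows forward and backward chaining over a single model (the pattern representation and binding mechanism are simplifications assumed for this example, not the executive's implementation):

# Rough Python illustration of model reduction (rules IIF1 and IIB1b above);
# patterns are simplified to (name, variables) pairs and only equality binding is shown.
def match(pattern, obj):
    # Return variable bindings if obj matches the pattern (same name, same arity), else None.
    name, variables = pattern
    if obj[0] != name or len(obj[1]) != len(variables):
        return None
    return dict(zip(variables, obj[1]))

def forward(model, obj):
    # IIF1: an input matching the left-side pattern produces a prediction on the right side.
    left, right = model
    bindings = match(left, obj)
    if bindings is not None:
        return ("prediction", (right[0], [bindings.get(v, v) for v in right[1]]))

def backward(model, goal_obj):
    # IIB1b (no active requirement): a goal matching the right side produces a goal on the left.
    left, right = model
    bindings = match(right, goal_obj)
    if bindings is not None:
        return ("goal", (left[0], [bindings.get(v, v) for v in left[1]]))

# Example: M: F(x, y) -> G(x, y)
M = (("F", ["x", "y"]), ("G", ["x", "y"]))
print(forward(M, ("F", [1, 2])))    # -> ('prediction', ('G', [1, 2]))
print(backward(M, ("G", [3, 4])))   # -> ('goal', ('F', [3, 4]))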
Current Limitations Models and composite states do not encode functional correlations between variables yet. Future work planned for year three of the project will be dedicated to implementing reversible functions (fun v0 v1 v0-to-v1 v1-to-v0), where v0-to-v1 and v1-to-v0 are algorithms that bind v1 with the value v0-to-v1(v0) and, reciprocally, v0 with v1-to-v0(v1). As of year two of the project, variables are compared using equality checks, and time ordering in the case of temporal variables.
3.3 SCHEMA
The Reaction System has been described so far in general terms and in relation to specific requirements. This module (the RS) contains the model(s) of the system and uses them to output an action as a response to some (external) data and according to some (established, given) goals. The main component used in the Reaction System is the schema [Alarcon 1994, Pezzulo 2007, Lyons 1989]; in this section it will be defined, and an explanation of how it will be implemented and used will be provided.
3.3.1 Schema Definition
The schema is the basic control structure of the HUMANOBS foreground architecture. It is essentially an
internal-model-based controller [Morari 1986]. It is composed of different elements, the forward and inverse model elements being the main ones.
The forward model (fmd) takes as inputs the actual state of the world and predicts the target state at the next time step. The prediction is compared with the actual value of the target state in order to have a measure of how good the model is, or an estimation of the disturbances to the system. The result of this comparison is, together with the desired state (goal), the input to the inverse model (imd), which derives the proper action to fulfil the goal.
Figure 5: Schema structure and its relation with the actual system. (The original figure shows the forward model (FMD) and inverse model (IMD) of the schema, the goal (set point), the modelled system and the actual system subject to disturbances, with the comparators between them.)
During the second year of the project, we have been using a simplified version of the schema above – a
schema without the comparator in the forward model. The functioning is as follows: the forward model
predicts the state of the controlled entity, and this is the input – along with the goal – of the inverse model.
The main difference with the fully featured schema is that the forward model does not adjust to
disturbances.
Figure 6: Simplified schema
3.3.2 Schema Implementation
As mentioned earlier, Replicode models are executed forwards and backwards. The qualifiers "forward" and "inverse" thus just describe an arrangement – pattern-matching-wise – of said models and their inputs: should the inputs match the right-side pattern, the model operates as an inverse model, and as a forward model otherwise. It follows that the functional composition of a schema (a forward model whose outputs match the input of an inverse model) is – implementation-wise – realized by two models. The general form of the implementation is as follows:
Let us assume a model M0: (fact F(x,y, …) t0) → (fact G(x,y, …) t1) and another model M1: (|fact G(x,y, …) t0) → (|fact iM0(x,y, …) t1). M1 is a requirement for the instantiation of M0. When an input comes in the form of a goal targeting G(xg,yg, ...), a subgoal targeting the instantiation of M0 is produced. According to the reduction rules enforced by the Replicode executive, a goal targeting (fact G(xg,yg, …)) is produced as the expression of the requirement. This goal may be fulfilled by the current state G(xc,yc, …), thus triggering the release of the goal F(xc,yc, …).
Schemas are naturally assembled constituting a hierarchical structure: this structure is determined at
runtime by the pattern-matching affordances – in other terms, when a component is represented by a box
with one input A and one output B, and another sees its output C connected to the former's input A, this is
realized by having outputs from C matching the input A. There is no hard (or static) coupling in Replicode.
This means actually that since goals trigger the production of subgoals, models are coupled at runtime
depending on their matching abilities. This effectively forms a hierarchy since subgoals are issued on the
basis of functional dependencies (see for example reduction rules IB1a, IB4a and IIB1a presented above).
Notice also that since models are shared (overlaid reduction, as mentioned earlier), a model can be part of
several reduction hierarchies.
As mentioned earlier, future work planned for the year three of the project will implement invertible
functions, which will yield schema implementations according to the most elaborate version of the schema
(depicted in figure 5 above); however, the principle will remain the same: one single model coupled with a
requirement model will still implement a schema.
4 The Integration Process. Revising & Injecting Knowledge
The process of integrating new implementations into the existing architecture will also be designed as a production process, working this time at a global scale.
As has been elaborated in the previous section, the implementation process is in charge of taking actions to achieve the system's goals (based on the current knowledge) and of acquiring new models. The former is addressed by the Reaction System module, while the latter is addressed by the Model Acquisition
module. The integration process is in charge of deciding if new or better models are needed (based on the
current system’s performance) and inserting the new acquired models in the Reaction System without
disrupting its behaviour.
This integration process is related to the Integrated Cognitive Control, which is specifically addressed with
the Progress Monitoring and Model Revision modules of the architecture. This part explains how to "rank" a model and how to ask the Model Acquisition module for a better one (if the model's performance is below a threshold). Finally, it also explains how to ask for a new model (when a goal without an attached model is detected).
Although the Progress Monitoring and the Model Revision are described in the integration process
chapter, they also contribute to the implementation process, mainly acting on the Attention Control module
as it was shown in Figure 1 at the beginning of the document.
4.1 PROGRESS MONITORING
The Progress Monitoring (PM) module is in charge of measuring the Reaction System's performance. It will
measure (according to some provided metrics) the local and global performance of the system and it will
identify relevant information related to a goal (which is important in order to drive the Attention Control
module).
Figure 7: Requirements decomposition for the Progress Monitoring module. (The original figure decomposes the overall requirement, evaluating the global performance, into requirements R1 to R5: receive the metrics and states, measure local performance, measure global performance, identify relevant information for each goal, and estimate the resources needed (establish and communicate a WCET), each refined into activities.)
The above figure shows the functional decomposition of the Progress Monitoring module. What follows is a description of the specific requirements needed to fulfil the overall one. Each requirement is further refined into several activities that are described in detail.

Overall requirement
The progress monitoring system shall:
• Monitor the progress of the (reaction) system (towards the achievement of the goals)
Specific Requirements
The progress monitoring system shall:
R1. Receive the metrics, goals and states from the Reaction System
R2. Measure the local performance based on the states and metrics
R3. Measure the global performance based on the states and metrics
R4. Identify relevant information associated to each goal
R5. Estimate the resources (i.e. computation time) needed (for choosing an action)
R1.Requirement: Receive the metrics and states from the reaction
system.
Activities:
A1. To have the proper interface for the states and metrics
A1. Activity: To have the proper interface for the states and metrics
The progress monitoring system will have a program dedicated to receive the messages from the
Reaction system concerning metrics, states and goals.
Technical solution
The Progress Monitoring will have access to a signal from the Reaction System
containing, for every active schema, its current goal and associated metrics and its
current state.
Implementation
The implementation depends on how the reaction system publishes these data. If they are placed in a
group then the progress monitoring (which will be a set of programs) will have a program
subscribed to that group in order to have access to the data.
Example
R2.Requirement: Measure the local performance based on the states and
metrics
Activities:
A1. Compute for each goal (schema) the actual performance (initially off-line and in further steps
in real time), i.e. monitor the progress (of performance) of each schema.
A1. Activity: Develop for each goal the actual performance
The progress monitoring system will build, from the information received, a measure of the
performance of each schema. Basically, it will process the current and desired states with the
related metrics. The result will be a numerical index that will indicate the current performance
(these indexes are all commensurate with each other, i.e. they are all referred to the same
general scale, i.e. normalized). This performance index will be updated as new data arrive, and
the performance evaluation will take into account the temporal evolution of the performance. This
element will also receive instantiated goals for which there is no model developed; these arrive
with a null state as the current state and are therefore assigned a null performance. If a goal has a
large and stationary error it will be marked with a zero index, indicating that the model is so bad
that a new one is needed (this evaluation is done by the Model Revision module). Initially we
will distinguish between non-existing models and very bad models (null vs. zero index). Several
issues will have to be taken into account, such as: the delay between the action and its effect on
the actual system; several schemas contributing to achieve a state (goal); being in time (a model
not only has to be good enough, it has to produce an output in time); and time evolution (not only
the final value will be considered but also the trend of the state over time).
Technical solution
The module computes a difference (error) between the actual and desired states using the metrics
provided, and outputs a final performance index depending on the error values and their
trends (increasing, decreasing or stationary error). Some standard criteria will be
applied to compute the performance index, such as sum of squared errors, integral error,
etc.
Implementation
The program that has received the metrics and goal (its inputs) can output a
message (the local index, which is the production of the program) for the program in charge
of processing this information. The program is completed with a model to calculate the
index following the chosen criteria. There will be an evaluation program per goal type.
These programs will output the performance indexes (implemented as markers), which will
be made available in a new group (the performance group) to be accessed by those
programs that may need them as inputs.
Example
In the Pong game a goal will be “having the paddle and the ball at the same vertical (y
coordinate) position”. In this case the error will simply be the difference between the
paddle and ball y coordinates. With the goal defined this way, the time
evolution of the error may be important, as most likely this error will be decreasing in
time.
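A minimal Python sketch of this local performance computation, assuming a sum-of-squared-errors
criterion over a sliding window and an exponential mapping to a normalised [0, 1] index (the
scale, window size and mapping are assumptions, not part of the specification; the real
implementation would be a Replicode evaluation program per goal type):

    import math
    from collections import deque
    from typing import Deque, Optional

    class LocalPerformance:
        """Computes a normalised performance index for one goal (schema)."""

        def __init__(self, scale: float = 1.0, window: int = 20) -> None:
            self.scale = scale                       # normalisation scale for the error
            self.errors: Deque[float] = deque(maxlen=window)

        def update(self, current: Optional[float], desired: float) -> Optional[float]:
            # A null current state means no model exists yet: report a null index.
            if current is None:
                return None
            self.errors.append(abs(desired - current))
            # Sum-of-squared-errors criterion over the recent window, mapped to [0, 1].
            sse = sum(e * e for e in self.errors) / len(self.errors)
            index = math.exp(-sse / (self.scale ** 2))
            # A large and stationary error collapses the index to zero (new model needed).
            if len(self.errors) == self.errors.maxlen and min(self.errors) > self.scale:
                index = 0.0
            return index

    # Pong example: goal "paddle at the same y coordinate as the ball".
    perf = LocalPerformance(scale=0.1)
    print(perf.update(current=0.80, desired=0.35))   # large error -> index close to 0
    print(perf.update(current=0.40, desired=0.35))   # error shrinking -> index improves

A null current state (no model yet) yields a null index, while a persistently large error collapses
the index to zero, matching the null vs. zero distinction made above.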
R3.Requirement: Measure the global performance based on the states
and metrics
Activities:
A1. Obtain the performance of the global system (initially off-line and at a further stage in real
time), based on the local performance and on the overall goal fulfilment, i.e. monitor the
progress (of performance) of the system.
A1. Activity: Measure global system’s performance
Based on the local performance of each schema and on an evaluation of the achievement of the
overall goal (performed using the provided metrics), a numerical index indicating the global
performance will be calculated. In some cases there may be no overall, general goal; in that
case the global performance will be based only on the reported local indexes. This global index
will be updated as the local performance indexes are updated. Again, some issues have to be
considered, such as checking whether the model is usually on time.
Technical solution
The module computes a difference (error) between the actual and desired states using the metrics
provided, and outputs a final global performance index. In order to take goal time
constraints into consideration, these can be included in the goal definition: the goal then
specifies not only the desired state but also the time limit to reach that state. This is
applied to every goal.
Implementation
Example
In the Pong testbed, the global performance will be computed based on the local indexes
obtained for the different schemas (for moving the paddle, for catching a ball, for not
being scored...), weighted according to the relevance of the goals they pursue.
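A possible sketch of the global index computation, assuming a simple relevance-weighted average of
the local indexes (the weights and the treatment of null indexes are illustrative assumptions):

    from typing import Dict, Optional

    def global_performance(local_indexes: Dict[str, Optional[float]],
                           weights: Dict[str, float]) -> float:
        """Relevance-weighted combination of local indexes; null (None) indexes count as 0."""
        total_weight = sum(weights.get(goal, 1.0) for goal in local_indexes)
        if total_weight == 0.0:
            return 0.0
        weighted = sum(weights.get(goal, 1.0) * (index if index is not None else 0.0)
                       for goal, index in local_indexes.items())
        return weighted / total_weight

    # Pong example: local indexes for three schemas, weighted by goal relevance.
    print(global_performance(
        {"move_paddle": 0.9, "catch_ball": 0.7, "avoid_being_scored": 0.4},
        {"move_paddle": 1.0, "catch_ball": 2.0, "avoid_being_scored": 3.0}))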
R4. Requirement: Identify relevant information associated to each goal.
Activities:
A1. Identify the data (patterns) that have triggered the achievement of a goal.
A1. Activity: Identify the data that have triggered the achievement of a goal.
Schemas are composed of forward and inverse models, which are implemented as programs in
reduction groups. When a model is executed it is because some patterns have matched its input
section. Every schema has an associated goal, so the patterns that have triggered that schema
are related to the goal. This information has to be notified to the Model Acquisition module, as it is
relevant for building the Area of Interest.
Technical solution
When a goal is achieved the input data will be marked as relevant to this goal.
Implementation
When the inverse model that achieves a goal is executed, its input data (patterns) are
marked as relevant. These markers are published (notified) to the Model Acquisition
module.
Example
When performing in the Pong scenario, input data concerning balls moving in one direction would
eventually trigger predictions of scoring by a scoring model, if there is any. This could
be used to make the Attention Control include in its Area of Interest balls moving in that
direction (and discard those moving in the opposite one).
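A sketch of this marking mechanism, using Python dictionaries in place of Replicode markers and
groups (all names are hypothetical):

    from typing import Any, Dict, List

    def mark_relevant_inputs(goal_id: str,
                             triggering_inputs: List[Dict[str, Any]],
                             markers_group: List[Dict[str, Any]]) -> None:
        """When a goal is achieved, mark the input patterns that triggered its schema as
        relevant to that goal and publish the markers to a group read by Model Acquisition
        and Attention Control."""
        for pattern in triggering_inputs:
            markers_group.append({"relevant_to": goal_id, "pattern": pattern})

    # Pong example: data about a ball moving towards our side triggered a scoring prediction.
    markers: List[Dict[str, Any]] = []
    mark_relevant_inputs("avoid_being_scored",
                         [{"ball_id": 1, "vx": -0.3, "vy": 0.1}], markers)
    print(markers)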
R5. Requirement: Estimate the resources needed
Activities:
A1. Establish a WCET (Worst Case Execution Time)
A2. Communicate WCET
A1. Activity: Establish a WCET
The Reaction System will establish a worst-case execution time (WCET) in order to have a
response (the best one available at that specific moment) in time.
Technical solution
Some exploration of the schema connections (although schemas are not directly
connected, they are related through their goals and inverse models) can be performed in
order to estimate how many schemas are going to be activated and thus predict,
approximately, the response time.
Implementation
A specific program will be devoted to perform this connection exploration. Each schema
may be loaded with an estimate of its execution cost and those costs can be employed in
the determination of the global execution time.
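The cost-summing exploration could look like the following Python sketch (the per-schema cost
estimates and the traversal over goal / inverse-model links are illustrative assumptions):

    from typing import Dict, List

    def estimate_wcet(schema_costs: Dict[str, float],
                      goal_links: Dict[str, List[str]],
                      start: str) -> float:
        """Worst-case execution time: sum the cost estimate of every schema reachable from
        the starting schema through its goal / inverse-model links."""
        total = 0.0
        visited = set()
        stack = [start]
        while stack:
            schema = stack.pop()
            if schema in visited:
                continue
            visited.add(schema)
            total += schema_costs.get(schema, 0.0)
            stack.extend(goal_links.get(schema, []))
        return total

    # Pong example with made-up per-schema cost estimates (in milliseconds).
    costs = {"predict_ball": 2.0, "intercept_ball": 3.5, "move_paddle": 1.0}
    links = {"predict_ball": ["intercept_ball"], "intercept_ball": ["move_paddle"]}
    print(estimate_wcet(costs, links, "predict_ball"))   # 6.5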
A2. Activity: Communicate WCET
The WCET will have to be notified to all the elements that could make use of it (most likely the
simulation control program and the final schemas that output the commands to the actuation
devices).
Technical solution
Not applicable.
Implementation
This can be a parameter (marker) for the simulation control program. Then this program
will stop the simulation (injecting that message into the simulation group).
Example
4.2
MODEL REVISION
This module is in charge of developing a generic information processing procedure. This procedure will
provide actions to increase the performance of the system. Figure 8 shows the functional decomposition of
the Model Revision module.
[Diagram omitted.]
Figure 8: Requirements decomposition for the Model Revision module. The overall requirement (provide the actions to increase the system's performance) is refined into requirements R1 (analyse local/system performance), R2 (update model parameters), R3 (ask for new/better models) and R4 (identify new information needed), each further refined into activities.
What follows is a description of the specific requirements needed to fulfil the overall one. Each requirement
is further refined into several activities that are described in detail.
Overall requirement
The model revision system shall:
•
Provide the actions necessary to increase the performance of the (reaction) system.
Specific Requirements
The model revision system shall:
R1. Analyse and evaluate the local/system performance
R2. Update parameters of models (schemas) in the reaction system
R3. Ask for new/better models (to the model acquisition system)
R4. Identify new information needed (for some models/goals)
R1. Requirement: Analyse the local/system performance.
Activities:
A1. Decide, upon the information received, whether the model needs to be updated (which means
changing its parameters), whether it needs to be changed, or whether a new model is needed for a
missing goal.
A1. Activity: Decide, upon the information received, whether the model parameters can be
updated, the model needs to be changed, or a new model is needed for a missing goal
The Model Revision will receive all the performance indexes at any given moment (on a time basis
or an event basis) from the Progress Monitoring system. The Model Revision will process all this
information in order to establish what to do. All the information received is stored in a new
module (the historian module), as further processing can be done off-line with the information
recorded over a larger time span. Each schema (or structure associated with a goal) will provide at
least two types of performance indexes: the time performance index, which evaluates whether the
action is on time (i.e. it complies with the time specs of the goal), and the functional
performance index, which evaluates whether the action is the one required to fulfil the goal.
Besides, additional information such as resource consumption could be useful. The time performance
index will be null if the goal has no time constraints. The result of this analysis will fall into one of
the following cases: a model performs well (above a specified threshold); a model performs badly
(with two subcases: either there are more models in the Reaction System pursuing the same goal, or
there are no more models for the same goal or all of them perform badly); and, finally, there is no
model for a given goal.
Technical solution
Using the available indexes and taking into account all the relevant issues (time and
quality, performance trends, robustness, etc.), models will be classified as pertaining to
one of the above defined cases: acceptable, bad or non-existing. In the first case the
performance will be sent to the historian module; in the second case, first subcase, the
action is to change parameters (next specific requirement), while in the second subcase
the actions are related to requirements 3 and 4; the last case has to do with the third
requirement.
Implementation
Example
In the Pong game the Masterplan initially does not take into account that the colour of the
ball matters (balls of different colours score differently). The schema (model) that estimates
the score change will therefore have a permanent error, which means a low performance index,
making it a candidate for replacement.
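A compact sketch of this classification step in Python (the 0.6 threshold is an arbitrary
illustrative value; the actual thresholds would be part of the provided metrics):

    from enum import Enum
    from typing import Optional

    class ModelStatus(Enum):
        ACCEPTABLE = "acceptable"       # above threshold: send performance to the historian
        BAD = "bad"                     # below threshold: change parameters or ask for a better model
        NON_EXISTING = "non-existing"   # null index: ask Model Acquisition for a new model

    def classify_model(functional_index: Optional[float],
                       time_index: Optional[float],
                       threshold: float = 0.6) -> ModelStatus:
        """Classify a model from its functional and time performance indexes."""
        if functional_index is None:
            return ModelStatus.NON_EXISTING
        # A goal without time constraints reports a null time index; it is then ignored.
        on_time = time_index is None or time_index >= threshold
        if functional_index >= threshold and on_time:
            return ModelStatus.ACCEPTABLE
        return ModelStatus.BAD

    # Pong example: the score-estimation schema ignores ball colour and performs badly.
    print(classify_model(functional_index=0.1, time_index=0.9))   # ModelStatus.BAD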
R2. Requirement: Update parameters of models (schemas) in the reaction
system.
Activities:
A1. Change parameters of bad models and send them to reaction system
A1. Activity: Change parameters of bad models and send them to the reaction system
Changing parameters in the RS means switching from the current model used for a given
goal to another available one. This new model will override the old one. In normal
operation all the models for a single goal are activated and reduced, although only one of
the proposed actions is implemented. The remaining, not implemented, actions can be
evaluated against the real value, and this difference can be used to choose the new
model that overrides the old one.
Technical solution
Implementation
Example
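As an illustrative example of the selection rule described in the activity above (a Python
sketch; in the architecture the comparison would be performed on the productions of the
competing models):

    from typing import Dict

    def choose_replacement(proposed_actions: Dict[str, float],
                           observed_value: float) -> str:
        """Pick the model whose (not implemented) proposed action was closest to the value
        actually observed in the environment."""
        return min(proposed_actions,
                   key=lambda model_id: abs(proposed_actions[model_id] - observed_value))

    # Pong example: three candidate models proposed a paddle position for intercepting the
    # ball; it actually crossed our side at y = 0.42, so model_b overrides the current one.
    print(choose_replacement({"model_a": 0.10, "model_b": 0.40, "model_c": 0.75}, 0.42))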
R3. Requirement: Ask for new/better models (to the model acquisition
system).
Activities:
A1. Send notification to the model acquisition of the model (and goal associated) needed.
A1. Activity: Send notification to the model acquisition of the model needed
For all the goals having an index 0 (or null) a request will be sent to the model acquisition for a
new model.
Technical solution
Not applicable.
Implementation
There will be a group with the goals that are not achieved, goals that need a new model.
This group will be projected on the model acquisition system, so it knows which new
models are needed.
Example
In the Pong game, the schema in charge of estimating the scoring needs a replacement.
For this schema the related information will be placed in the ("new models") group:
basically, the input data of the schema and its goal, as these can serve as a reference for
searching for and acquiring the new model.
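A sketch of this request, again using a Python list in place of the Replicode "new models" group
(names are hypothetical):

    from typing import Any, Dict, List

    def request_new_model(goal_id: str,
                          input_data: List[Dict[str, Any]],
                          new_models_group: List[Dict[str, Any]]) -> None:
        """Place the goal and the schema's input data in the 'new models' group, which is
        projected onto the Model Acquisition module."""
        new_models_group.append({"goal": goal_id, "inputs": input_data})

    # Pong example: the score-estimation schema needs a replacement.
    new_models: List[Dict[str, Any]] = []
    request_new_model("estimate_score",
                      [{"ball_id": 2, "colour": "red", "vx": -0.3}], new_models)
    print(new_models)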
R4.Requirement: Identify new information needed (for some models/goals).
Activities:
A1. Identify new information needed for some models.
A2. Send notification to the attention control.
A1. Activity: Identify new information needed for some models.
Models that are not performing properly may not be complete and thus fail to provide good
actions in certain cases (inverse models) or fail to predict correctly (forward models). They need new
data in order to account for the unexplained (or badly explained) behaviours. In this case a better
model is needed, and information regarding the models (current inputs and goals) is sent to the
Attention Control and to the Model Acquisition. The first one will try to expand the Area of Interest
in order to search for new data from which a better model can be built. The second one will use
this information to build the new model. (NB: models are identified using neural networks, so a
model could be enhanced without adding any more information, simply because more data are
available for the neural networks when it is reviewed.)
Technical solution
Implementation
Example
A2. Activity: Send notification to the attention control.
The information will be sent to the Attention Control module so that it can generate a new Area of
Interest, which will be used to create a new model that takes the new information into
consideration.
Technical solution
Not applicable.
Implementation
A Replicode program.
Example
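As an illustration of the intended effect of this notification (a Python sketch with hypothetical
names; in the architecture the Attention Control filter would be adjusted by Replicode programs):

    from typing import Dict, Tuple

    def expand_area_of_interest(aoi: Dict[str, Tuple[float, float]],
                                variable: str,
                                factor: float = 1.5) -> None:
        """Widen the accepted range of one input variable so that new data pass the
        Attention Control filter and reach the Model Acquisition module."""
        low, high = aoi[variable]
        centre = (low + high) / 2.0
        half = (high - low) / 2.0
        aoi[variable] = (centre - factor * half, centre + factor * half)

    # Pong example: the scoring model fails for some balls, so widen the range of ball
    # velocities (x component) that the Area of Interest lets through.
    aoi = {"ball_vx": (-0.2, 0.0)}
    expand_area_of_interest(aoi, "ball_vx")
    print(aoi)   # roughly (-0.25, 0.05)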
6 GLOSSARY
Action. Signal sent from the Reaction System to an output device that actuates on the environment.
Area of Interest. Subset of the input data that results from applying some filter.
Attention control. Module that applies a filter on raw data in order to have the Area of Interest.
Backward chaining. Reasoning in reverse from a hypothesis (a potential conclusion to be proven) to the
facts that support the hypothesis (starting from the desired states (goals), deduce the input states that
produce those goals).
Backtracking. Algorithm that allows following a chain of events back to its origin, like going back to the
origin node of a graph, given the final goal reached.
Forward chaining. Reasoning from facts to the conclusion resulting from these facts (starting from the
input states, arrive at some final states, i.e. goals).
Forward model. It is a model that implements forward chaining.
Goal. Desired state.
(Replicode) Group. “The global workspace is organized as graphs of sub-workspaces called groups.
Groups are objects that contain views on other objects. What distinguishes between objects and views is
essentially that objects contain code (e.g. algorithms or data) whereas views contain data that qualify the
code. Code is qualified by control values such as activation (controls whether or not a program can run)
and saliency (controls whether or not code can be an input for a program).” [from Replicode language
specification v1.1a]
Implementation process. It consists of the construction of the models and their use by the Reaction
System.
Integration process. It consists of identifying the need for new models and controlling the mechanism to
integrate them with the existing ones.
Inverse model. It is a model that implements backward chaining.
Masterplan. It is the initial knowledge about the environment the system is going to interact with, given to
the proper architecture modules (Reaction System, Attention Control).
Metrics. They are a set of values that can be used in order to determine the performance of the Reaction
System.
Model. It is a simplified representation of the behaviour of the environment. In this project there are only
forward and inverse models.
Model Acquisition. It is the module of the architecture in charge of constructing Replicode models from
the data supplied by the Area of Interest.
Model Revision. It is the module of the architecture in charge of revising the behaviour of the current
models, and deciding on the need for better models or new models (to explain new behaviours).
Performance. It is a measure of how well the Reaction System (or the models in it) interacts with the environment.
Global performance. It is a measure of the performance of the Reaction System.
Local performance. It is a measure of the performance of models in the Reaction System.
Program. It is a Replicode construct. “Programs react to incoming objects. A program defines a pattern of
time-constrained object sequences, including patterns and guards (conditions) on the individual incoming
objects and produces new objects built from the incoming ones, said new objects being called productions.
Programs are reactive, i.e. they perform as described whenever incoming objects match the time-pattern
and guards. The transformation of inputs into productions is called reduction.
Programs are state-less and have no side-effects.” [from Replicode language specification v1.1a]
Progress Monitoring. It is a module of the architecture in charge of measuring global and local
performances. It also provides relevant information to the Attention Control module.
Reaction System. It is a module of the architecture in charge of interacting with the environment.
Reactively. Computing an action given the inputs and goals.
Reduction. It is the procedure of executing a program or model; it happens when the input conditions are
satisfied.
Schema. It is a model based controller, composed of a pair of coupled forward and inverse models.
Simulation. It is the execution of models/schemas given an initial input and goals, removing the
connection to the environment. Actions computed in the simulation are the new inputs for subsequent
simulation cycles.
State. It is the value of a system variable.
System. It is a set of interrelated variables in a defined boundary.
WCET (Worst case execution time). It is an estimate of how long the execution of a program would take
in the worst-case scenario.