HUMANOBS Reasoning Process V1.2

HUMANOBS – Humanoids That Learn Socio-Communicative Skills By Observation

HUMANOBS REASONING PROCESS
Version 1.3

Authors:
Carlos Hernández (1), Manuel Rodríguez (1), Eric Nivel (2)
(1) Universidad Politécnica de Madrid – Autonomous Systems Laboratory
(2) RU-CADIA

Deliverable D12: Reasoning Process Release 1
BELONGS TO WP: WP3 – Model-driven Implementers and Integrators
WP LEAD: UPM-ASLab
WP PARTICIPANTS: UPM-ASLab
ID #: 12 | WP: 3 | Orig. Date: M19 | Actual Date: M25

Table of Contents
1 INTRODUCTION
  1.1 TESTBED SCENARIO
  1.2 REFERENCE MATERIAL
2 THE REASONING PROCESS
  2.1 THE REASONING PROCESS APPLIED TO THE BASE SCENARIO
3 THE IMPLEMENTATION PROCESS. USING KNOWLEDGE
  3.1 REACTION SYSTEM
  3.2 HIGH LEVEL REASONING
  3.3 SCHEMA
    3.3.1 SCHEMA DEFINITION
    3.3.2 SCHEMA IMPLEMENTATION
4 THE INTEGRATION PROCESS. REVISING & INJECTING KNOWLEDGE
  4.1 PROGRESS MONITORING
  4.2 MODEL REVISION
5 REFERENCES
6 GLOSSARY

1 Introduction

This document describes the reasoning process (implementation and integration) in the HUMANOBS project. The main goal of the project is to develop a system that is able to learn socio-communicative skills by observation. Observation is itself defined as a behaviour and is described in the same fashion; in fact, every action the system can take is considered a behaviour and is modelled accordingly. In line with the project's goal, the HUMANOBS architecture is model-based and model-driven. It allows two processes to run in parallel, one devoted to learning and the other devoted to acting. The developed system will thus run and interact with the environment while learning new behaviours (as models) and updating and improving the existing ones. Both processes are thoroughly described in this document: the first constitutes the Learning Cycle and the second the Operation Cycle.
The Learning Cycle is focused on building new models and updating old ones; the Operation Cycle is focused on using these models. The reasoning process encompasses not only the use of the models but also the process of improving them and requesting new ones. The following sections explain both contributions to the reasoning process. Model acquisition is briefly depicted, but there is no specific section describing it in detail. To illustrate the reasoning process, a base scenario is defined next. This scenario will be used throughout the document as a leitmotif to ease the understanding of the different explanations.

1.1 TESTBED SCENARIO

This scenario is based on the classic Pong videogame and will be referred to hereafter as the Pong game. There is a board and two paddles, one at each end of the court. Each player handles one of the paddles. The main goal of the game is to score more points than the opponent in a given time period. A player scores when a ball passes the opponent's paddle. To make things more complex, in our testbed there can be more than one ball at a time. Balls come in two different colours, and the scoring depends on the colour. The system, implemented according to the HUMANOBS architecture, will take the role of one of the players. The other role will be performed by a human player.

Base scenario specification:

I/O data available
• Ball position and velocity (vectors) at every sampling time (for every ball)
• Paddle position at every sampling time (for each paddle)
• Scoreboard at every sampling time (points of both players)

Player's controls (available actions for the system and its opponent):
• Self paddle position increment: up, down or rest (if no action).
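As a concrete picture of the I/O data and paddle controls just listed, the following Python sketch models the game state and the three available commands. All names and the step size are illustrative assumptions, not part of the actual testbed interface.

```python
from dataclasses import dataclass, field

@dataclass
class Ball:
    x: float; y: float        # position at the current sampling time
    vx: float; vy: float      # velocity vector
    colour: str               # "white" or "red"

@dataclass
class GameState:
    balls: list = field(default_factory=list)
    self_paddle_y: float = 0.0
    opp_paddle_y: float = 0.0
    score_self: int = 0
    score_opp: int = 0

PADDLE_STEP = 1.0  # fixed increment per keyboard stroke (assumed value)

def apply_command(state: GameState, command: str) -> GameState:
    """Apply one of the three available actions: 'up', 'down' or 'rest'."""
    if command == "up":
        state.self_paddle_y += PADDLE_STEP
    elif command == "down":
        state.self_paddle_y -= PADDLE_STEP
    # 'rest' leaves the paddle where it is
    return state
```

This is only a reading aid for the scenario specification; the real system receives these data as Replicode markers.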
Game characteristics
• A white ball scores 1 point
• A red ball scores 3 points
• Ball speed is constant
• Paddles have a width
• The walls that define the court are fixed
• Balls rebound on walls with perfect reflection
• The duration of the game is known and fixed
• The paddle position increments a fixed amount on each keyboard stroke

Goal
• Score more points than the other player within the game time

Masterplan (initial knowledge)
• Ball position model (the next ball position is the sum of the current one plus the current velocity vector)
• How to put the self paddle at a certain coordinate: by commanding up or down
• Inverse model for scoring more points than the opponent:
  o Score
  o Avoid being scored

Knowledge to be learned
• When scoring happens
• Balls bounce on walls and paddles
• Differently coloured balls score differently
• Scoring means being closer to the goal (and, as a consequence, a strategy to score is also to be learned)
• Avoiding being scored means being closer to the goal (and, as a consequence, a strategy to avoid being scored is also to be learned)

1.2 REFERENCE MATERIAL

The remaining content of this document uses concepts explained in detail in other documents. In order to follow the next chapters, some reference material should be read in advance. The main reference documents are:
• Replicode language specification v1.1a. Available at: https://projects.humanobs.org/projects/language/documents
• HUMANOBS Architecture v1.0. Available at: https://projects.humanobs.org/projects/language/documents
• Model Proposer V1.0. Available at: https://projects.humanobs.org/projects/wp2/documents
• Pezzulo, G. and Calvi, G. Schema-based design and the AKIRA Schema Language: An Overview. In M.V. Butz, O. Sigaud, G. Pezzulo and G. Baldassarre (Eds.), Anticipatory Behavior in Adaptive Learning Systems: Advances in Anticipatory Processing. Springer, LNAI 4520. 2007.
• Sanz, R. An Integrated Control Model of Consciousness.
Proceedings of the conference Toward a Science of Consciousness. 2002.

2 The reasoning process

The HUMANOBS meta-architectural approach is based on the exploitation of explicit executable models to drive all system action. There are three main HUMANOBS architectural elements: 1) one devoted to generating models that explain the observed reality, the Model Acquisition module; 2) the agent controller, devoted to performing as a real human, the Reaction System module, which is in charge of computing the proper action; and 3) one devoted to increasing the performance of the system by interacting with the previous elements to enhance their models, the Model Revision module. Figure 1 shows a detailed description of the architectural elements and how they share information. Two cycles run in parallel: the Learning Cycle and the Operation Cycle (explicitly shown in Figure 2). The overall Learning Cycle is as follows: data observed from the external world (without any interaction with the system) go into an Attention Control module, which is in charge of filtering the data in order to select only those relevant to the system in the pursuit of its specified goals. Some information (from the Masterplan) is initially included to allow for the first selection. The filtered data (called the Area of Interest) are fed to the Model Acquisition module, whose main function is to correlate the data in order to obtain a (reusable) model that explains some behaviour related to some specific goal. Model Acquisition is composed of three elements: the event detector, the correlator and the model builder. The event detector identifies the subset of the Area of Interest (filtered data) that triggered the production of useful data. This subset is used by the next element, which correlates the data. The correlated data constitute a model.
The final element of this module translates the obtained correlation into a Replicode-format model. These models go to the Integration module, which injects them into the (running) Reaction System module. The overall Operation Cycle is as follows: sensory input data from the environment are fed to another instance of the Attention Control module. This instance filters the data; once filtered, they become inputs to the Reaction System module. Notice that there is a distinction between observation (and learning) and operation (exploiting the acquired knowledge). These two functionalities run in parallel. The Reaction System is the module in charge of producing the desired system behaviour in the domain. It uses the already filtered data and produces actions (system behaviour) to achieve the system's specified goals. The basic control structure in this module is the schema. Schemas are composed of interacting inverse and forward models that allow, respectively, the generation of the system's output action to be executed on the external world and the prediction of its result on the world. The Reaction System also allows schemas to run in simulation mode; to control this mode there is an additional element called Simulation Control, which is in charge of setting the parameters of the simulation. Besides, another element interacts with the Reaction System: Model Selection, which is in charge of selecting the proper action when competing models are available to fulfil the same goal. The Reaction System's performance has to be measured and evaluated. Progress Monitoring is in charge of that measurement. This module receives information from the Reaction System (RS) regarding its current state and goals. It uses this information, together with the (initially provided) domain metrics, to generate measurements of the RS performance.
Besides, Progress Monitoring identifies relevant information for a given goal and sends it to the Attention Control module. Finally, the Model Revision module is in charge of analysing and evaluating the RS performance. As a consequence of this analysis and evaluation, it updates the existing models or asks the Model Acquisition module for new ones.

Figure 1: Main interaction flows between HUMANOBS architectural elements

As indicated in the legend of the figure, red boxes denote Replicode constructs (programs, models, etc.) and red arrows denote Replicode-format information. Black boxes denote non-Replicode constructs (built using any other programming language), and black arrows denote data in an undefined format. Notice that eventually some non-Replicode parts can be embedded using Replicode.
Figure 2: Learning and operating cycles running in parallel in the HUMANOBS implementation & integration processes

2.1 THE REASONING PROCESS APPLIED TO THE BASE SCENARIO

In the following we briefly sketch how the HUMANOBS architecture would work in our testbed scenario. Suppose the initial Masterplan given in 1.1 and the following initial considerations: the system is playing against a human, and the initial Attention Control (AC) modules for both the learning and operation cycles create an Area of Interest including the balls' and paddles' positions. Given the initial Masterplan, Progress Monitoring will assign a null performance to the subgoals of scoring and not getting scored present in the Reaction System, since it does not contain any model for them. Model Revision would take this information, instantiate one Correlator to seek a model for each subgoal, and command the AC to extend the AoI to include scoring data.
The Correlator charged with the subgoal of not getting scored would eventually come up with a correlation between the paddle intercepting the ball and that player not being scored. The Model Builder would convert that into a Replicode model, which would be injected into the Reaction System. Once in execution, this model would produce the goal of putting the paddle at the ball's coordinate. This goal would be achieved by a model already present in the Masterplan. Progress Monitoring would assign a good performance index to all the schemas and to the overall system, so Model Revision keeps commanding Model Acquisition to seek a model of scoring (which is hard to find), while there is only one ball in the game. As soon as there are two or more balls at a time, the model for not being scored starts producing conflicting goals, since the system cannot command the paddle to move to the different balls' coordinates at the same time. Performance indexes would degrade, and Model Revision would command Model Acquisition to find a better model for not being scored, which could probably be found by considering only the closest ball for interception. Sometimes different actions can be proposed for the same goal: for example, if two balls (of different colours) are approaching the Reaction System's paddle, one above the paddle and the other below it, two (conflicting) actions are proposed (moving up and moving down). This situation is detected by Model Revision, and Model Selection would take the decision about which action to implement. When the system is interested in predicting behaviour ahead in time, the Simulation Control element would start the simulation. Model Revision is the module in charge of deciding whether or not to run the simulation mode. The remaining part of the document explains the described reasoning process in detail.
It proceeds module by module, establishing the needed requirements and commenting on how each module is deployed in the system. It is divided into two main sections: the first is related to the implementation process, and the second describes the integration process. These two processes, "implementation" and "integration", are identified and explained beforehand because they serve to structure the document. The two processes are orthogonal, as indicated in Figure 3. Implementation consists of the construction of the models and their use by the Reaction System, which is the goal-driven schema assembly hierarchy. There is a goal dependency between the layers of this hierarchy. Integration, on the other hand, consists of identifying the need for new models and controlling the mechanism to integrate them with the existing ones. It is carried out by Model Revision, which controls the Reaction System along the schema control dimension (integrated cognitive control approach) and is implemented through the Integration module.

Figure 3: System architecture showing the specification-to-implementation (bottom-up hierarchy) and integration (integrated cognitive control, ICC) dimensions.

3 The implementation process. Using knowledge

This section explains the specification-to-implementation process. It is related to the process of relating models and goals at different levels of abstraction. This will be a goal-based hierarchy of active elements (schemas) conforming to what we call, in the HUMANOBS architecture, the Reaction System. The process of generating an implementation is designed as an inference process performing the backward chaining of inverse models: the specification of a behaviour defines origin and target states of the system and the environment. These states constitute inputs for the inverse models controlling the transitions along the chain leading to the achievement of the target state.
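The backward chaining of inverse models described above can be sketched as follows: starting from a target state, each inverse model proposes the subgoal (or primitive action) that should bring it about, until an executable command is reached. The names and the chain representation below are illustrative assumptions, not Replicode.

```python
def backward_chain(goal, inverse_models, primitive_actions):
    """Return the chain of subgoals from `goal` down to a primitive action."""
    chain = [goal]
    current = goal
    while current not in primitive_actions:
        model = inverse_models.get(current)
        if model is None:
            # Incomplete implementation: the needed model has not been
            # learned yet, so the chain stays partially undefined.
            chain.append(None)
            break
        current = model(current)
        chain.append(current)
    return chain

# Toy Pong-style knowledge: to avoid being scored, intercept the ball;
# to intercept, move the paddle; moving the paddle is a primitive command.
inverse_models = {
    "avoid_being_scored": lambda g: "intercept_ball",
    "intercept_ball": lambda g: "move_paddle",
}
print(backward_chain("avoid_being_scored", inverse_models, {"move_paddle"}))
# ['avoid_being_scored', 'intercept_ball', 'move_paddle']
```

The `None` branch mirrors the incomplete implementations discussed next, where a required model has not yet been learned.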
Since the architecture is expanding, incomplete implementations are to be expected when the components (models) necessary for a new skill have not been developed yet (as they have to be learnt). Due to the modular nature of the architecture (through the use of schemas), the implementation process is allowed to produce implementations with partially undefined components, for which only a partial model would be computed, i.e. a model expressing its effects on the system (forward models) and the states it is supposed to have an impact on. In what follows, a description of the Reaction System clarifies its functionality and components. After that, the main component of both the Reaction System and the reasoning process, the schema, is described. Finally, the reasoning process procedure (how to use the forward and inverse models) is explained, together with how it is implemented using the previously described elements.

3.1 REACTION SYSTEM

This component of the architecture is in charge of producing the best available action (with regard to the existing models) according to the specified goals. It is the decision system. It holds one or several models of the environment (real world) to interact with. These models are forward and inverse models embedded in schemas. It should be noted that the Reaction System is not exclusively reactive: it may exhibit anticipatory behaviour as well, as indicated in the requirements specifications described next.
Figure 4: Requirements decomposition for the Reaction System module (overall requirement OR, requirements Ri, activities Ai, technical solutions TS, implementation details I and examples Ex).

Figure 4 shows the functional decomposition of the Reaction System. Next, a description of the specific requirements needed to fulfil the overall one is presented. Each requirement is further refined into several activities that are described in detail.

Overall requirement

The Reaction System shall:
• Provide a proper action in response to the data perceived from the environment, according to its goals (it shall imitate human behaviour in a controlled environment)

Specific requirements

The Reaction System shall:
R1. Behave reactively (as indicated in the "compute an action" activity)
R2. Predict the outcomes of its actions (exhibit anticipatory behaviour)
R3. Select between different courses of action
R4. Provide measurements of its performance (metrics to evaluate how far from its goals it is)
R5. Provide commands for the actuation devices

R1. Requirement: Behave reactively

Activities:
A1. Get current data from the external world
A2. Compute an action
A3. Send the action to the external world

A1. Activity: Get data from the external world

Data from the external world will have the Replicode format (some other element of the architecture will pre-process the data from the sensors and translate them to Replicode). The Reaction System will provide an interface to receive these data. Data coded as Replicode will be markers (mks).
The interface will be a Replicode program (pgm) or set of programs. The received data will be the relevant data filtered by the Attention Control module.

Technical solution
Not applicable. If the translation from the sensor data to the Replicode language is not done elsewhere, then an API should be developed.

Implementation
Data will be Replicode markers (mks). These markers will be accessed by all the existing programs to check whether they match their input patterns.

Example
In the Pong game, input data will be the ball(s) position(s), ball velocity (vector), ball(s) colour(s), paddle A position, paddle B position, player A score and player B score.

A2. Activity: Compute an action

The Reaction System will use schemas (defined in the following section) to compute an action. Basically, a schema is a feedback model-based controller (it complies with the internal model control (IMC) representation). It has two main elements: the inverse model and the forward model. The forward model provides an estimation of the system state given a set of inputs; it performs forward chaining. Given a goal and the estimation of the system's state, the inverse model computes an action following backward-chaining reasoning. The inverse model element of the schema outputs the (control) action to the actual system (external world) as well as to the forward model element. The forward model is executed and its output is compared with that of the real world (to account for disturbances and model mismatch). The result of the comparison is the input to the inverse model, which computes (relative to a goal) the (ideally perfect) control action to be applied to the external world. Schemas are organized hierarchically. The hierarchy follows a functional decomposition: the overall (more generic) goal is at the top of the hierarchy and is refined and decomposed into subgoals going down the hierarchy.
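The schema loop just described (forward model predicting the state, a comparison accounting for disturbances and model mismatch, and an inverse model driving the state towards the goal) can be sketched as a one-dimensional IMC-style controller on the Pong vertical axis. All names and signatures below are illustrative assumptions, not the actual Replicode constructs (fmd, imd).

```python
class Schema:
    """A feedback model-based controller: forward model + inverse model."""

    def __init__(self, goal_y=0.0):
        self.goal_y = goal_y      # desired ball/paddle vertical difference
        self.predicted = 0.0      # forward model's last estimate
        self.mismatch = 0.0       # last comparison with the real world

    def forward_model(self, paddle_y, ball_y, action):
        # Predict the vertical difference after applying the action
        # (forward chaining: inputs -> estimated next state).
        return ball_y - (paddle_y + action)

    def inverse_model(self, error):
        # Compute the control action that should cancel the error
        # (backward chaining: goal -> action).
        return error

    def step(self, paddle_y, ball_y):
        measured = ball_y - paddle_y              # state from the world
        self.mismatch = measured - self.predicted # disturbance / model error
        error = measured - self.goal_y
        action = self.inverse_model(error)
        self.predicted = self.forward_model(paddle_y, ball_y, action)
        return action

schema = Schema(goal_y=0.0)
action = schema.step(paddle_y=2.0, ball_y=5.0)
print(action)   # 3.0 -> move the paddle up by 3 units
```

In the hierarchy, such an action would not go to a device directly but would become the goal of a schema lower down, as the text explains next.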
The output of a schema (its inverse model element) will be either a direct actuation on the external world or a goal for another schema. Schema inputs will be (most of the time) data (in Replicode form) from the external world, or estimations and predictions (outputs of the forward model element) from other schemas. Forward models as well as inverse models are implemented using specific Replicode constructs (fmd, imd), used in reduction groups (defined later in this chapter). This activity is related to the reactive behaviour of the system; anticipatory behaviour is included in the following requirement.

Technical solution
The technical solution to decide an action is a "hierarchical internal model control" approach. Each controller will be deployed as a schema.

Implementation
Schemas will be coded using reduction groups (rgrp); models (forward and inverse) will be coded as a special type of program (fmd, imd); the comparison (between actual state and estimated state) will be implemented as a program (pgm).

Example
A schema in the Pong game will be the one responsible for computing the next required position of the paddle. It will take as inputs the ball position, velocity and colour, and the paddle position. The forward model will estimate the difference (in the vertical axis) between the ball and the paddle (the same input goes into the actual system, whose output is compared with the estimated state). The inverse model will take as inputs this difference, the velocity of the ball and the goal (make the difference equal to zero), and will output the position of the paddle as a goal for other schemas further down the hierarchy.

A3. Activity: Send the action to the external world

The Reaction System needs an interface to translate the outputs of the lowest schemas in the hierarchy, which are Replicode commands, into signals for the actuation devices that operate on the environment.
Technical solution
mBrane I/O modules will be responsible for translating the commands generated by the RS into device-executable actions.

Implementation
Each Replicode command is uniquely identified (using an opcode). This identification is used to select the appropriate response. Additionally, each entity that is part of the command is also uniquely identified (using a number), and these identifiers are kept consistent between the two sides of the interaction (devices and executive). Given these rules, the implementation consists of a set of hand-coded parsers triggered by opcodes.

Example
In the Pong game, output data (actions) will typically be the direction of the paddle position increment in the next game time step (up, down, or none if there is no command).

R2. Requirement: Predict the outcomes of its actions

Activities:
A1. Simulate the evolution of the system in response to actions
A2. Control the simulation

A1. Activity: Simulate the evolution of the system

The main purpose of the simulation is to provide a rigorous ground for anticipatory behaviour, i.e. it adds predictive capabilities to the system. A simulation of the future consequences of the current action could lead to a control action different from the one devised by the reactive behaviour module. If this initial action is not good enough, the effects of the alternative actions have to be simulated (to check whether they are expected to perform better), which means that a number of simulations (and this number can be huge) may be needed. In the end, a combination of both behaviours (reactive and anticipatory) will be implemented, as in feedforward/feedback control configurations. When running in simulation mode there is no feedback from the external world. Initial actions are applied to the model of the system, and the newly calculated actions are applied again to the model (as if the predicted states had been reached) in order to advance in time.
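The closed-loop rollout just described, where actions are applied to the model only and the predicted state is fed back as the next input, can be sketched as follows. The function and the toy one-dimensional example are illustrative assumptions.

```python
def simulate(initial_state, forward_model, policy, horizon):
    """Roll the model `horizon` steps ahead with no feedback from the world."""
    state, trace = initial_state, [initial_state]
    for _ in range(horizon):
        action = policy(state)                 # e.g. the reactive controller
        state = forward_model(state, action)   # predicted, not observed
        trace.append(state)
    return trace

# Toy 1-D example: the state is the ball/paddle vertical gap, and the
# policy closes half of the gap at every simulated step.
trace = simulate(8.0,
                 forward_model=lambda s, a: s - a,
                 policy=lambda s: s / 2,
                 horizon=3)
print(trace)   # [8.0, 4.0, 2.0, 1.0]
```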
Notice that during a simulation all the disturbances are considered constant or are assumed to follow some a priori dynamics (as there is no feedback from the external world).

Technical solution
Initially, the technical solution will be the same as the one applied for the reactive behaviour. The system will operate in exactly the same way, although the data do not come from the external world (except at the initial time).

Implementation
The implementation will use Replicode constructs that allow the system to "know" that it is running in simulation mode (which will be executed in parallel to the reactive system). These constructs are hypothesis, assumption and prediction markers (for instance, when a pgm receives at least one input in its input set that is a simulation marker, the results are tagged with simulation markers).

Example
Let us suppose the Pong game with two balls at the same time, one white and one red (they have different scoring values: the white one 1 point and the red one 3 points). Suppose that the white ball is closer (in the y coordinate) to the paddle (with a lower y value) and that the red ball has a higher y coordinate. In this case the reactive behaviour would command a new paddle position (move down). The simulation model will take as inputs the balls' positions and velocities at the initial time (t0), as well as the paddle positions and current scores. After that time, an action is computed (the next position of the paddle) and fed again as input to the simulation cycle (along with the predicted positions and velocities) at time t0+1. This can be useful, as the simulation may show that it is better to move up (even at the cost of the white ball scoring) in order to be able to reach the red ball (which would not be possible if moving down first).

A2.
Activity: Control the simulation

The extent of the simulation (in depth and in time), as well as the number of simulations, has to be fixed according to criteria concerning the amount of resources needed, the resources available, and the expected utility, in terms of goals, of running the simulation. The in-depth extent will most of the time be the whole system, as it will be highly interrelated (meaning simulating from the lowest level of the hierarchy to the top one). The in-time extent will depend on the time-response requirements or on whether a stationary state has already been reached. The Simulation Control module (implemented as a program, or set of programs, in Replicode) will receive the criteria to start/stop the simulation from Progress Monitoring. It will also receive the parameters needed to run the simulation (which schemas will be simulated, the simulation time span, etc.).

Technical solution
The system will basically run in pure reactive mode; if the results are not acceptable, an activity will be triggered: a message will be issued (by Model Revision) in order to start the simulation. Time constraints are problem dependent and shall be provided before executing the system. When the available time is reached, a new message is issued to stop the simulation. The computed action is sent along with additional information regarding whether the simulation was completed in time.

Implementation
A group will be created for each simulation. All the schemas running a simulation are projected into this group. The Simulation Control will activate/deactivate the group (and consequently the schemas) in order to start/stop the simulation. Simulation parameters will be data available in that group (provided by a program), so all the schemas attached to the group will have access to them.
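The time-budgeted start/stop behaviour described above can be sketched as follows: the rollout is cut off when the time allowed for the next response is spent, and the result is tagged with whether the simulation completed. The function and its parameters are illustrative assumptions.

```python
import time

def controlled_simulation(step_fn, initial_state, is_stationary, budget_s):
    """Run `step_fn` until a stationary state or until `budget_s` elapses."""
    deadline = time.monotonic() + budget_s
    state, completed = initial_state, True
    while not is_stationary(state):
        if time.monotonic() >= deadline:
            completed = False          # out of time: report a partial result
            break
        state = step_fn(state)
    return state, completed

# Toy run: counting down to zero counts as reaching the stationary state.
result, done = controlled_simulation(lambda s: s - 1, 5,
                                     is_stationary=lambda s: s == 0,
                                     budget_s=1.0)
print(result, done)   # 0 True
```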
Example
In the former Pong case, a simulation of the initial action (moving the paddle down) would lead to a final result of 3 points (because the red ball would have scored). The simulation of an alternative action (moving the paddle up) would lead to a better result (1 point, the white ball scoring). In this case the simulation would run in time until all the balls have disappeared (stationary state), unless the time required to run the simulation were greater than the time available for the next paddle response (sampling time).

R3. Requirement: Select between different courses of action

Activities:
A1. Identify different actions for the same goal
A2. Select between conflicting actions

A1. Activity: Identify different actions for the same goal

The running system will have a set of models implemented and active. There can be different models for the same goal; these models are all activated and executed in parallel. This means that, after receiving new data, the system can eventually propose more than one action (for the same goal and the same external data). The Reaction System will provide a mechanism to detect whether some of the different proposed actions correspond to the same goal. Mainly, this will be done by backtracking from the commands (final actions) until a common final goal is reached. Another source of different actions for the same goal is the anticipatory (simulation) behaviour.

Technical solution
Backtracking is used: starting from the final actions (those to be sent to the devices), their related goals are identified. These goals come from actions of schemas higher up the hierarchy. Those actions are identified, then their goals, and so on. The procedure ends when a final goal is reached. If this goal is the same for the different actions, then those actions are marked as conflicting.
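The backtracking procedure can be sketched as follows: starting from each final action, walk up the goal hierarchy; actions that meet at the same root goal are flagged as conflicting. The parent map and all names are illustrative assumptions.

```python
def root_goal(action, parent):
    """Follow goal links upward until a goal with no parent is reached."""
    node = action
    while node in parent:
        node = parent[node]
    return node

def find_conflicts(actions, parent):
    """Group proposed actions by their root goal; groups of >1 conflict."""
    by_root = {}
    for a in actions:
        by_root.setdefault(root_goal(a, parent), []).append(a)
    return {g: acts for g, acts in by_root.items() if len(acts) > 1}

# Pong-style toy hierarchy: both movement commands trace back to the same
# goal of not being scored, so they are flagged as conflicting.
parent = {
    "move_up": "intercept_red_ball",
    "move_down": "intercept_white_ball",
    "intercept_red_ball": "avoid_being_scored",
    "intercept_white_ball": "avoid_being_scored",
}
print(find_conflicts(["move_up", "move_down"], parent))
# {'avoid_being_scored': ['move_up', 'move_down']}
```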
A technical solution to identify the anticipatory behaviour is not necessary, as this action will always be marked as conflicting because there will always be the reactive counterpart.

Implementation
The backtracking will be implemented as a set of programs. It will use the notification information that is available after a Replicode program or model is reduced (which means that the data match the required inputs and the program or model is then executed).

Example
In the Pong game we could have different schemas to address the goal of minimising being scored: one based on simply intercepting the closest ball, and another that takes into account the different balls present at that moment and optimises the catching order so as to minimise the scoring over a certain horizon. These two schemas could eventually produce opposite movement commands.

A2. Activity: Select between conflicting actions

The Reaction System will decide which action to command to the actual device. It will have some established criteria to make the decision (these may be success rate, resource consumption, fast execution, etc.). These criteria come from Model Revision.

Technical solution
The system will be initially endowed with some predefined mechanisms for conflict resolution. These will be adapted by the metasystem according to the attainment of goals, using the performance measures provided in the domain ontology.

Implementation
The system will be initially endowed with some predefined mechanisms for conflict resolution. The Masterplan contains predefined models that are built like the attention control models: these models are hand-crafted to identify (a) present states, (b) expected outcomes and (c) costs of execution. They take as input the simulation results of the conflicting goals (more accurately: the simulation results of each goal in the set of conflicting goals).
Notice also that the patterns on expected outcomes may include (where appropriate) a priority. In the present version, said models control the resilience of the conflicting goals, leaving only one alive. In case of a tie, one goal is selected randomly.

Example

Take the Pong example with two schemas for minimising being scored: one that moves the paddle towards the closest ball, and another that takes into account all the balls present at that moment and optimises the catching order so as to minimise the scoring over a certain horizon. The Reaction System will decide which one to select by considering the simulation results of both.

R4. Requirement: Provide measurements of its performance

Activities:
A1. Provide a performance metric associated to each goal (or goal type).
A2. Send performance metrics, current states and desired states (variables/signals/messages) to the progress monitoring.

A1. Activity: Establish a performance metric according to each type of goal

The reaction system will provide a (measurable, numeric) metric for each goal of the system.

Technical solution

Not applicable. These metrics will be provided by the domain ontology.

Implementation

Metrics are “a priori” knowledge and as such should be included in the Masterplan (this is the minimum essential information needed to drive the whole system). They should be “attached” to each goal as an attribute, using markers (mks).

Example

In the Pong game some metrics are quite straightforward. For the goal “paddle and ball have the same y coordinate”, the metric will be the scalar difference of the vertical positions of both objects.

A2. Activity: Send performance metrics, current states and goals (desired states) to the progress monitoring

The reaction system will make available, for each schema, the current goal (desired state), the metrics associated to that type of goal, and the current state.
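A toy sketch of the kind of per-schema record this activity makes available, using the Pong metric mentioned above (the field names and the dictionary encoding are assumptions; the actual data are Replicode markers published to groups):

```python
def same_y_metric(state):
    """Pong metric for the goal "paddle and ball have the same y
    coordinate": the scalar difference of the vertical positions."""
    return abs(state["paddle_y"] - state["ball_y"])

def schema_status(schema, goal, state, metric):
    """Assemble the per-schema record sent to the progress monitoring:
    current goal (desired state), current state, and the metric value."""
    return {
        "schema": schema,
        "goal": goal,            # desired state
        "state": state,          # current state
        "error": metric(state),  # metric applied to the current state
    }
```

For example, with the paddle at y = 0.8 and the ball at y = 0.3, the record carries an error of 0.5.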
Technical solution

Send a signal with the requested information using Replicode.

Implementation

These data can be sent to a group that will receive the different schemas' performance. Any other module that needs this information (mainly the progress monitoring) will be projected onto that group to have access to the data.

Example

3.2 HIGH LEVEL REASONING

High-level logical reasoning is performed through forward and backward chaining. What follows is a description of the Replicode language components available to implement the reasoning process.

Predicates

Facts

Facts are objects that point to other objects and indicate a timestamp (the time of their occurrence), a frequency indicating how often the fact has been injected (in [0,1]) and a confidence value (also in [0,1]) indicating how reliable the fact is. Replicode also provides a construct to indicate the absence of a fact (|fact, with the same members as a fact).

Variable Objects

Such objects point to other objects; this pointer can be reassigned dynamically during chaining: the variable is said to become bound to an actual value, a pointer to an object.

Assumptions

Such objects, like many other predicates, point to other objects and express that they result from an inference, as opposed to established facts, predictions, goals, etc. They are encoded as follows:

(mk.asmp a-fact a-source confidence-value)

where a-fact is a pointer to a fact whose existence the executive has assumed, a-source is the component of the system that has made the assumption (either a model or a composite state, see below), and confidence-value is an indication of the reliability of the assumption.

Goals

Goals are objects produced either by programs or by the executive as the result of backward chaining. A goal is the specification of a target state, in the form of a reference to a fact.
The syntax is:

(mk.goal a-fact an-actor)

where a-fact is a pointer to the target fact and an-actor is a pointer to the system that pursues the goal (i.e. self or any other active entity in the world).

Predictions

Predictions are objects produced either by programs or by the executive as the result of forward chaining. The syntax is:

(mk.pred a-fact confidence-value)

where a-fact is a pointer to the predicted fact and confidence-value is an indication of the reliability of the prediction. The predicted time of the occurrence of the fact is the timestamp member of the fact.

Hypotheses

Hypotheses are objects that indicate to the executive that they shall be processed in simulation runs. The syntax is:

(mk.hyp a-fact)

where a-fact is the hypothetical fact.

Simulation Results

Such objects indicate that the predicate they refer to has been computed as the result of a simulation, following the injection of hypotheses. The syntax is:

(mk.sim a-fact a-source)

where a-fact is the result of the simulation and a-source is either the model or the composite state that produced it.

High-level active predicates

Composite States

A composite state is a structure meant to encapsulate a set of patterns – for example, the position of the hand of a robot and the fact that the hand is actually a hand, belonging to a particular entity (the robot). A composite state is an active object, i.e. it performs like a program that matches the occurrence of all the patterns it contains and outputs an instantiation of itself, that is, a version of itself where some of the variables are bound to actual values.
A composite state is defined by the following construct:

(cst objects output_groups time_scope)

where objects is a set of abstracted objects (objects that may contain variables), output_groups is a set of groups where the executive is to inject the productions, and time_scope is a time window whose semantics are exactly the same as for programs – inputs are matched together during the time window called time_scope.

Models

Models are structures that contain a directed pair of patterns – essentially, F(x,y,...) → G(x,y,...). Models are also active objects, that is, they catch input objects and attempt to match them against their left-side pattern, bind some of their variables and produce an instance of their right-side pattern. The construct is:

(mdl objects output_groups time_scope)

with the same members as for composite states. Read from left to right, models realize the function of forward models; read from right to left, the function of inverse models (see reduction below).

Composite State Instances

Composite state instances are specific constructs that indicate that a composite state has been instantiated with some values, i.e. that it has matched all of its patterns. The construct is:

(csi c-state arguments)

where c-state is the composite state that has matched its patterns and arguments is the set of values assigned to the composite state's variables.

Model instances

Model instances are to models what composite state instances are to composite states. The construct is:

(imdl a-model arguments)

where a-model is the model that has been instantiated with the values specified in arguments. A model is instantiated as the result of matching its left-side pattern.

Construction

Composite states and models are either constructed by hand or produced by the Correlator. Regardless of their origin, composite states and models contain objects in the form of facts.
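The bi-directional use of such a model can be sketched in Python (a toy analogue under assumed pattern encodings, not Replicode itself): matching the left side yields a prediction (forward chaining), while a goal on the right side yields a subgoal on the left (inverse use).

```python
# Toy analogue of a model F($x,$y) -> G($x,$y). Patterns are tuples whose
# '$'-prefixed strings are variables; matching binds them to values.

def unify(pattern, term):
    """Return variable bindings if term matches pattern, else None."""
    if pattern[0] != term[0] or len(pattern) != len(term):
        return None
    env = {}
    for p, t in zip(pattern[1:], term[1:]):
        if isinstance(p, str) and p.startswith("$"):
            env[p] = t
        elif p != t:
            return None
    return env

def apply_env(pattern, env):
    """Substitute bound variables into a pattern."""
    return tuple(env.get(p, p) if isinstance(p, str) else p for p in pattern)

# Illustrative model: position(ball,$x,$y) -> predicted_position(ball,$x,$y)
LEFT = ("position", "ball", "$x", "$y")
RIGHT = ("predicted_position", "ball", "$x", "$y")

def forward(fact):          # left-side match produces a prediction
    env = unify(LEFT, fact)
    return ("pred", apply_env(RIGHT, env)) if env else None

def backward(goal_fact):    # right-side match produces a subgoal
    env = unify(RIGHT, goal_fact)
    return ("goal", apply_env(LEFT, env)) if env else None
```

Here `forward(("position", "ball", 3, 4))` yields a prediction on `("predicted_position", "ball", 3, 4)`, and a goal on that same fact yields, backwards, a goal on `("position", "ball", 3, 4)`.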
Models can be interpreted as belonging to two types:

A – predictors: (fact F(x,y, …)) → (fact G(x,y, …)), which reads “F(x,y, …) entails G(x,y, ...)”. If G is an instance of a model M, this reads “F(x,y, …) entails the success of M(x,y, ...)”.

B – requirements: (|fact F(x,y, …)) → (|fact G(x,y, …)), which reads “not F(x,y, …) entails not G(x,y, ...)”. If G is an instance of a model M, this reads “without F(x,y, …), M(x,y, ...) will fail”.

Reduction

Composite states and models produce objects (facts) from the matching of their patterns. Replicode performs inferences using models bi-directionally, i.e. it matches either the left-side patterns and produces predictions (forward chaining), or the right-side patterns and produces goals (backward chaining). For composite states, chaining is slightly different (as there is no directionality encoded in composite states): when all patterns of a composite state are matched within the given time scope, an instance is produced (forward chaining). Backward chaining: (a) when a goal matches one pattern of the composite state, the executive produces as many goals, each targeting one of the remaining patterns; (b) when a goal targets an instance of a composite state, one goal is generated for each of the patterns (if these are achieved, forward chaining will instantiate the composite state, thus fulfilling the initial goal).

Each reduction triggers the production of a runtime execution notification (mk.rdx), as for programs. Each production, be it a prediction, a goal or an assumption, is monitored, i.e. the executive keeps track of any subsequent object that matches (on time) the production and reports the success or failure. We give below the reduction rules implemented in the executive:

I – Composite States

For a composite state CS containing the patterns A(x,y, …), B(x,y, …), C(x,y, …):

IF – forward chaining:

IF1 – receiving a(xa,ya, ...)
matching A(x,y, …) spawns an overlay (defined as for programs); this means all combinations of inputs are scanned for during the specified time scope. When all patterns are matched, an instance of the composite state is produced and injected in the specified output groups. The instance lists all the values to which the variables were bound during the matching process.

IB – backward chaining:

IB1 – receiving a goal targeting an instance of CS with a set of arguments [xa,ya, …].

IB1a – there exists at least one active requirement (as defined above): then a goal is produced (and injected in the output groups) that targets the model encoding the requirement; this goal actually targets an instance of said model to which the incoming set of arguments ([xa,ya, ...]) is passed – notice that some of these arguments may be unbound. If this goal is satisfied (this is detected by monitors managed by the executive), an instance of CS is produced that contains another set of arguments – the initial set of arguments, some of which may have been bound by the model that responded to the requirement. At this point, the executive issues another set of goals, each targeting one of the patterns found in CS, bound to the latest set of arguments. If these goals are matched, then forward chaining will eventually produce the instance that was targeted by the initial goal.

IB1b – there is no active requirement: then the executive proceeds by issuing one goal per pattern in CS (as in the last step of rule IB1a).

IB2 – receiving a goal targeting one of the patterns of CS – say, A(x,y, …). The variables of CS are bound to the values provided by the goal and one goal is issued for each of the remaining patterns in CS (in our case, for B and C).

IB3 – receiving a goal targeting the negation of one of the patterns of CS. Then the executive produces a goal targeting the negation of an instance of CS (with the values provided by the goal).
IB4 – receiving a goal targeting the negation of an instance of CS.

IB4a – there exists at least one active requirement. Then a goal is produced targeting the negation of an instance of the model that encodes the requirement. If this goal is satisfied, then the executive produces one goal for the negation of each of the patterns of CS.

II – Models

For a model M containing the patterns A(x,y, …) → B(x,y, ...):

IIF – forward chaining

IIF1 – receiving a(xa,ya, ...) matching A(x,y, …) produces a prediction targeting B(xa,ya, …), injected as usual in the specified output groups.

IIB – backward chaining

IIB1 – receiving a goal targeting b(xb,yb, ...)

IIB1a – there exists at least one active requirement: then a goal targeting an instance of M with the incoming arguments is issued. If this goal is satisfied, then a goal targeting a(xb,yb, ...) is injected.

IIB1b – there is no active requirement: then a goal targeting a(xb,yb, …) is produced.

IIB3 – receiving a goal targeting the negation of b(xb,yb, …)

IIB3a – there exists at least one active requirement on M. Then a goal targeting an instantiation of M is issued. If this goal is satisfied, then the executive produces an assumption on the negation of a(xb,yb, …).

IIB3b – there is no active requirement: then a goal targeting the negation of a(xb,yb, …) is produced.

III – General Rules

IIIA – All goals and predictions are monitored by the models/composite states that produced them. This means that should these active constructs die or become inactive, the monitoring will cease or be suspended, respectively.
A prediction/goal is monitored by the executive as follows: if an object matches the prediction/goal on time, a success object (fact (mk.success the-prediction-or-goal)) is injected in the output groups, and the prediction/goal is deleted; if by the predicted/target time no object has matched the prediction/goal, a failure object is injected (|fact (mk.success the-prediction-or-goal)) and the prediction/goal is deleted.

IIIB – In case a goal or a prediction fails, the negation of the targeted/predicted fact is injected (N.B.: negating the negation of a fact gives a fact).

IIIC – Each time a match occurs, a reduction marker is injected (as for programs).

IIID – Hypothesis, simulation, assumption and prediction markers are retained during chaining, i.e. goals and predictions are tagged with the same markers as the inputs that triggered their production (as for programs).

IIIE – A goal targeting a command to an I/O device triggers the execution of said command by said device. This is performed only once, i.e. the first time the command object gets saliency (as defined by the Replicode language). Once executed, the command object remains in memory until its time to live has expired (the latter is an argument of the command).

Current Limitations

Models and composite states do not yet encode functional correlations between variables. Work planned for year three of the project will implement reversible functions (fun v0 v1 v0-to-v1 v1-to-v0), where v0-to-v1 and v1-to-v0 are algorithms that bind v1 with the value v0-to-v1(v0) and, reciprocally, v0 with v1-to-v0(v1). As of year two of the project, variables are compared using equality checks, and time ordering in the case of temporal variables.

3.3 SCHEMA

The Reaction System has been described so far in general terms and in relation with specific requirements.
This module (the RS) contains the model/s of the system and uses them to output an action as a response to some (external) data and according to some (established, given) goals. The main component used in the Reaction System is the schema [Alarcon 1994, Pezzulo 2007, Lyons 1989]; in this section it will be defined and an explanation of how it will be implemented and used will be provided.

3.3.1 Schema Definition

The schema is the basic control structure of the HUMANOBS foreground architecture. It is essentially an internal-model based controller [Morari 1986]. It is composed of different elements, the forward and inverse model elements being the main ones. The forward model (fmd) takes as input the actual state of the world and predicts the target state at the next time step. The prediction is compared with the actual value of the target state in order to have a measure of how good the model is, or an estimation of the disturbances to the system. The result of this comparison is, together with the desired state (goal), the input to the inverse model (imd), which derives the proper action to fulfil the goal.

Figure 5: Schema structure and its relation with the actual system.

During the second year of the project, we have been using a simplified version of the schema above – a schema without the comparator in the forward model. The functioning is as follows: the forward model predicts the state of the controlled entity, and this is the input – along with the goal – of the inverse model. The main difference with the fully featured schema is that the forward model does not adjust to disturbances.

Figure 6: Simplified schema

3.3.2 Schema implementation

As mentioned earlier, Replicode models are executed forwards and backwards.
The qualifiers “forward” and “inverse” thus just describe an arrangement – pattern-matching-wise – of said models and their inputs: should the latter match the right side, the model operates as an inverse model, and as a forward model otherwise. It follows that the functional composition of a schema (a forward model that outputs data matching the input of an inverse model) is – implementation-wise – realized by two models. The general form of implementation is as follows. Let us assume a model M0: (fact F(x,y, …) t0) → (fact G(x,y, …) t1), and another model M1: (|fact G(x,y, …) t0) → (|fact iM0(x,y, …) t1). M1 is a requirement on the instantiation of M0. When an input comes in the form of a goal targeting G(xg,yg, ...), a subgoal targeting the instantiation of M0 is produced. According to the reduction rules enforced by the Replicode executive, a goal targeting (fact G(xg,yg, …)) is produced as the expression of the requirement. This goal may be fulfilled by the current state G(xc,yc, …), thus triggering the release of the goal F(xc,yc, …). Schemas are naturally assembled into a hierarchical structure: this structure is determined at runtime by the pattern-matching affordances – in other terms, when a component is represented by a box with one input A and one output B, and another sees its output C connected to the former's input A, this is realized by having outputs from C matching the input A. There is no hard (or static) coupling in Replicode. This means that, since goals trigger the production of subgoals, models are coupled at runtime depending on their matching abilities. This effectively forms a hierarchy, since subgoals are issued on the basis of functional dependencies (see for example reduction rules IB1a, IB4a and IIB1a presented above). Notice also that since models are shared (overlaid reduction, as mentioned earlier), a model can be part of several reduction hierarchies.
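This goal-driven arrangement of M0 and M1 can be traced in toy form (a Python sketch under assumed encodings, not Replicode; the tuple representation and the `imdl-M0` label are illustrative):

```python
# Toy trace of the schema arrangement above. M0: F(x) -> G(x), with M1 a
# requirement on M0's instantiation. A goal on G(xg) spawns a subgoal on
# the instantiation of M0; the requirement is expressed as a goal on a
# fact G(...), which the current state G(xc) may satisfy, releasing the
# goal F(xc) handed down to lower schemas or devices.

def backward_schema(xg, current_state):
    """xg: goal value for G; current_state: observed values, e.g. {'G': xc}.
    Returns the sequence of goals the executive would produce."""
    trace = [("goal", "G", xg),
             ("goal", "imdl-M0", xg)]   # subgoal on M0's instantiation
    if "G" in current_state:            # requirement fulfilled by the state
        trace.append(("goal", "F", current_state["G"]))
    return trace
```

For instance, a goal G(5) with current state G(2) yields the goal chain G(5) → imdl(M0) → F(2).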
As mentioned earlier, future work planned for year three of the project will implement invertible functions, which will yield schema implementations according to the most elaborate version of the schema (depicted in Figure 5 above); however, the principle will remain the same: one single model coupled with a requirement model will still implement a schema.

4 The Integration Process. Revising & injecting knowledge

The process of integrating new implementations in the existing architecture will also be designed as a production process, working this time at a global scale. As has been elaborated in the previous section, the implementation process is in charge of taking actions to achieve the system's goals (based on the current knowledge) and of acquiring new models. The former is addressed by the Reaction System module, while the latter is addressed by the Model Acquisition module. The integration process is in charge of deciding whether new or better models are needed (based on the current system's performance) and of inserting the newly acquired models in the Reaction System without disrupting its behaviour. This integration process is related to Integrated Cognitive Control, which is specifically addressed by the Progress Monitoring and Model Revision modules of the architecture. This part explains how to "rank" a model and how to ask the Model Acquisition module for a better one (if model performance is below a threshold). Finally, it also explains how to ask for a new model (when a goal without an attached model is detected). Although the Progress Monitoring and the Model Revision are described in the integration process chapter, they also contribute to the implementation process, mainly acting on the Attention Control module as shown in Figure 1 at the beginning of the document.

4.1 PROGRESS MONITORING

The Progress Monitoring (PM) module is in charge of measuring the Reaction System's performance.
It will measure (according to some provided metrics) the local and global performance of the system, and it will identify relevant information related to a goal (which is important in order to drive the Attention Control module).

Figure 7: Requirements decomposition for the Progress Monitoring module.

The figure above shows the functional decomposition of the Progress Monitoring module. What follows is a description of the specific requirements needed to fulfil the overall one. Each requirement is further refined into several activities that are described in detail.

Overall requirement

The progress monitoring system shall:
• Monitor the progress of the (reaction) system (towards the achievement of the goals)

Specific Requirements

The progress monitoring system shall:
R1. Receive the metrics, goals and states from the Reaction System
R2. Measure the local performance based on the states and metrics
R3. Measure the global performance based on the states and metrics
R4. Identify relevant information associated to each goal
R5. Estimate the resources (i.e. computation time) needed (for choosing an action)

R1. Requirement: Receive the metrics and states from the reaction system.

Activities:
A1. To have the proper interface for the states and metrics

A1. Activity: To have the proper interface for the states and metrics

The progress monitoring system will have a program dedicated to receiving the messages from the Reaction System concerning metrics, states and goals.
Technical solution

The Progress Monitoring will have access to a signal from the Reaction System containing, for every active schema, its current goal with the associated metrics, and its current state.

Implementation

This depends on how the reaction system publishes these data. If they are placed in a group, then the progress monitoring (which will be a set of programs) will have a program subscribed to that group in order to have access to the data.

Example

R2. Requirement: Measure the local performance based on the states and metrics

Activities:
A1. Compute for each goal (schema) the actual performance (initially off-line and in further steps in real time), i.e. monitor the progress (of performance) of each schema.

A1. Activity: Develop for each goal the actual performance

The progress monitoring system will build, from the information received, a measure of the performance of each schema. Basically, it will process the current and desired states with the related metrics. The result will be a numerical index indicating the current performance (these indexes are all commensurate with each other, i.e. they are all referred to the same general scale – normalized). This performance index will be updated as new data arrive. The performance evaluation will take into account the temporal evolution of the performance. This element will also receive instantiated goals for which there is no model developed; these arrive with a null state as the current state and are therefore assigned a null performance. If a goal has a large and stationary error, it will be marked with a zero index (indicating that the model is so bad that a new model is needed; this evaluation is done by the Model Revision module). Initially we will distinguish between non-existing models and very bad models (null vs. zero index).
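One possible shape for such an index, sketched in Python (the normalization formula and the trend test are assumptions; the document only requires normalized, commensurate indexes and attention to the time evolution of the error):

```python
def local_performance(errors, scale=1.0):
    """errors: recent error samples for one goal (empty list = no model /
    no state). Returns (index in [0,1] or None, trend label)."""
    if not errors:
        return None, "no-data"           # null index: no model developed
    # Mean sum of squared errors, squashed into [0,1]: 1 = perfect,
    # approaching 0 as the error grows (assumed normalization).
    sse = sum(e * e for e in errors) / len(errors)
    index = 1.0 / (1.0 + sse / scale)
    # Crude trend estimate from the first and last samples (assumption).
    if len(errors) >= 2 and abs(errors[-1]) < abs(errors[0]):
        trend = "decreasing"
    elif len(errors) >= 2 and abs(errors[-1]) > abs(errors[0]):
        trend = "increasing"
    else:
        trend = "stationary"
    return index, trend
```

A schema with errors [4, 2, 1] would report a decreasing trend and a modest index, while a zero-error history reports an index of 1.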
There are several issues that will have to be taken into account, such as: the delay between the action and its effect on the actual system; several schemas contributing to achieve a state (goal); being on time (a model not only has to be good enough, it has to produce an output in time); and time evolution (not only the final value will be considered, but also the trend of the state over time).

Technical solution

Compute a difference (error) between the actual and desired states using the metrics provided, and output a final performance index depending on the error values and the trends (increasing/decreasing/stationary error). Some of the standard criteria will be applied to compute the performance index, such as the sum of squared errors, the integral error, etc.

Implementation

The program that has received the metrics and goal (its inputs) can output a message (the local index, which is the production of the program) for the program in charge of processing this information. The program is completed with a model to calculate the index following the chosen criteria. There will be an evaluation program per goal type. These programs will output the performance indexes (implemented as markers), which will be available in a new group (the performance group) to be accessed by those programs that may need them as inputs.

Example

In the Pong game a goal will be "paddle and ball at the same vertical (y coordinate) position". In this case the error will simply be the difference between the paddle and ball y coordinates. In this case (and with the goal defined this way) the time evolution of the error may be important, as most likely this error will be decreasing in time.

R3. Requirement: Measure the global performance based on the states and metrics

Activities:
A1. Obtain the performance of the global system (initially off-line and at a further stage in real time), based on the local performance and on the overall goal fulfilment, i.e.
monitor the progress (of performance) of the system.

A1. Activity: Measure global system's performance

Based on the local performance of each schema and on an evaluation of the achievement of the overall goal (performed using the provided metrics), a numerical index indicating the global performance will be calculated. In some cases there may be no overall, general goal; in that case the global performance will be based only on the reported local indexes. This global index will be updated as the local performance indexes are updated. Again, some issues have to be considered, such as checking whether the model is usually on time.

Technical solution

Compute a difference (error) between the actual and desired states using the metrics provided, and output a final global performance index. In order to take goal time constraints into consideration, these can be included in the goal definition: the goal then specifies not only the desired state but also the time limit to reach that state. This is applied to every goal.

Implementation

Example

In the Pong testbed, the global performance will be computed based on the local indexes obtained for the different schemas – for moving the paddle, for catching a ball, for not being scored... – weighted according to the relevance of the goals they pursue.

R4. Requirement: Identify relevant information associated to each goal.

Activities:
A1. Identify the data (patterns) that have triggered the achievement of a goal.

A1. Activity: Identify the data that have triggered the achievement of a goal

Schemas are composed of forward and inverse models, which are implemented as programs in reduction groups. When a model is executed, it is because some patterns have matched its input section. Every schema has an associated goal, so the patterns that have triggered that schema are related to the goal.
This information has to be notified to the Model Acquisition module, as it is relevant to build the Area of Interest.

Technical solution

When a goal is achieved, the input data will be marked as relevant to this goal.

Implementation

When the inverse model that achieves a goal is executed, its input data (patterns) are marked as relevant. These markers are published (notified) to the Model Acquisition module.

Example

When performing in the Pong scenario, input data concerning balls moving in one direction would eventually trigger predictions of scoring by a model of scoring, if there is any. This could be used to make the Attention Control include in its Area of Interest balls with that direction (and discard the ones moving in the opposite one).

R5. Requirement: Estimate the resources needed

Activities:
A1. Establish a WCET (Worst Case Execution Time)
A2. Communicate the WCET

A1. Activity: Establish a WCET

The reaction system will establish a worst-case execution time in order to guarantee a response in time (the best available at that specific moment: the WCET).

Technical solution

Some exploration of the connections between schemas (although schemas are not directly connected, they are related through their goals and inverse models) can be performed in order to estimate how many schemas are going to be activated, and thus to predict (in some way) the response time.

Implementation

A specific program will be devoted to performing this connection exploration. Each schema may be loaded with an estimate of its execution cost, and those costs can be employed in the determination of the global execution time.

A2. Activity: Communicate the WCET

The WCET will have to be notified to all the elements that could make use of it (most likely the simulation control program and the final schemas that output the commands to the actuation devices).

Technical solution

Not applicable.

Implementation

This can be a parameter (marker) for the simulation control program.
Then this program will stop the simulation (injecting that message into the simulation group).

Example

4.2 MODEL REVISION

This module is in charge of developing a generic information processing procedure. This procedure will provide actions to increase the performance of the system.

Figure 8: Requirements decomposition for the Model Revision module.

Figure 8 shows the functional decomposition of the Model Revision module.

Overall requirement

The model revision system shall:
• Provide the actions necessary to increase the performance of the (reaction) system.

Specific Requirements

The model revision system shall:
R1. Analyse and evaluate the local/system performance
R2. Update parameters of models (schemas) in the reaction system
R3. Ask for new/better models (to the model acquisition system)
R4. Identify new information needed (for some models/goals)

R1. Requirement: Analyse the local/system performance.

Activities:
A1. Decide upon the information received whether the model needs to be updated (which means changing its parameters), needs to be changed, or a new model is needed for a missing goal.

A1.
Activity: Decide upon the information received if the model parameters can be updated, needs to be changed or a new model is needed for a missing goal The Model Revision will receive all the performance indexes at any given moments (it can be on a time based or event based) from the Progress Monitoring system. The model revision will process all the information in order to establish what to do. All the information received is stored in a new module (historian module) as further processing can be done off-line with the information recorded in a larger time span. Each schema (or structure associated with a goal) will provide at least two types of performance indexes. These two types are the time performance index, which evaluates if the action is on time (it complies with the time specs of the goal) and the functional HUMANOBS Reasoning Process V1.2 performance index, which evaluates if the action is the one required to fulfil the goal. Besides additional information could be useful as resources consumption, etc. The time performance index will be null if the goal has not time constraints. The result of this analysis will fall in one of the following cases: a model performs well (over a specified threshold), a model performs bad (this is composed of two additional subcases, there are more models in the reaction system pursuing the same goal or there are no more models for the same goal or all the models perform badly) and, finally, there is no model for a given goal. Technical solution Using the available indexes and taking into account all the relevant issues (time and quality, performance trends, robustness, etc.) models will be classified as pertaining to one of the above defined cases: acceptable, bad or non-existing. 
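A minimal sketch of this classification, assuming scalar indexes in [0, 1], a single quality threshold, and illustrative names (none of which are prescribed by the architecture):

```python
# Sketch: classify each goal's models as "acceptable", "bad" or
# "non-existing" from their performance indexes (illustrative only).

THRESHOLD = 0.7  # assumed quality threshold

def combined_index(time_idx, func_idx):
    """Combine the two indexes each schema provides; the time index is
    None (null) when the goal has no time constraints."""
    return func_idx if time_idx is None else min(time_idx, func_idx)

def classify(goal_models):
    """goal_models maps model name -> (time_idx, func_idx) for one goal.
    Returns the case the analysis falls into for that goal."""
    if not goal_models:
        return "non-existing"          # no model for this goal
    scores = {m: combined_index(t, f) for m, (t, f) in goal_models.items()}
    if any(v >= THRESHOLD for v in scores.values()):
        return "acceptable"
    return "bad"                       # all models for the goal perform badly

print(classify({"m1": (0.9, 0.8), "m2": (None, 0.4)}))  # acceptable
```

How the two indexes are combined (here, pessimistically via `min`) is a design choice of the sketch, not something the document fixes.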
In the first case the performance data will be sent to the historian module. In the second case, first subcase, the action is to change parameters (next specific requirement); in the second subcase the actions are those related to requirements R3 and R4. The last case corresponds to requirement R3.

Implementation

Example
In the Pong game, the Masterplan initially does not take into account that the colour of the ball matters (i.e. that balls score differently depending on their colour). The schema (model) that estimates the score change will therefore have a permanent error, and hence a low performance index, making it a candidate for replacement.

R2. Requirement: Update parameters of models (schemas) in the reaction system.
Activities:
A1. Change parameters of bad models and send them to the reaction system

A1. Activity: Change parameters of bad models and send them to the reaction system
Changing parameters in the Reaction System means switching from the model currently used for a given goal to another available one. The new model will override the old one. In normal operation all the models for a single goal are activated and reduced, although only one of the proposed actions is implemented. The remaining, not implemented, actions can be evaluated against the real value, and this difference can be used to choose the new model that overrides the old one.

Technical solution

Implementation

Example

R3. Requirement: Ask for new/better models (to the model acquisition system).
Activities:
A1. Send a notification to the model acquisition system identifying the model (and associated goal) needed.

A1. Activity: Send a notification to the model acquisition system identifying the model needed
For every goal with an index of 0 (or null), a request for a new model will be sent to the model acquisition system.

Technical solution
Not applicable.

Implementation
There will be a group containing the goals that are not achieved, i.e. the goals that need a new model. This group will be projected onto the model acquisition system, so that it knows which new models are needed.
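As an illustration of this group-based mechanism (the names, the input patterns and the plain-list stand-in for a Replicode group are assumptions of the sketch):

```python
# Sketch: a "new models" group holding, per unachieved goal, the
# reference information (goal + current input patterns) that the model
# acquisition system needs in order to look for a replacement model.

new_models_group = []   # stand-in for the Replicode group projected
                        # onto the model acquisition system

def request_new_model(goal, input_patterns, index):
    """Goals whose performance index is 0 (or null) get a request."""
    if not index:  # covers both 0 and None
        new_models_group.append({"goal": goal, "inputs": input_patterns})

request_new_model("estimate_score", ["ball_pos", "ball_vel"], 0)
request_new_model("move_paddle", ["paddle_pos"], 0.85)
print(new_models_group)
# [{'goal': 'estimate_score', 'inputs': ['ball_pos', 'ball_vel']}]
```

Only the failing goal ends up in the group; well-performing goals are never projected onto the model acquisition system.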
Example
In the Pong game, the schema in charge of estimating the scoring needs a replacement. The information related to this schema, basically its input data and its goal, will be placed in the (“new models”) group, as it can serve as a reference for finding and acquiring the new model.

R4. Requirement: Identify new information needed (for some models/goals).
Activities:
A1. Identify new information needed for some models.
A2. Send a notification to the attention control.

A1. Activity: Identify new information needed for some models
Models that are not performing properly may be incomplete, and thus fail to provide good actions in certain cases (imd models) or fail to predict correctly (fmd models). They need new data in order to account for the unexplained (or badly explained) behaviours. In this case a better model is needed: information regarding the models (current inputs and goals) is sent to the attention control and to the model acquisition system. The former will try to expand the Area of Interest in order to search for new data from which a better model can be built. The latter will correlate the new model. (NB: models are identified using neural networks, so a model could be enhanced without adding any new information, simply because more data are available to the neural networks when the model is revised.)

Technical solution

Implementation

Example

A2. Activity: Send a notification to the attention control
The information will be sent to the Attention Control module so that it can generate a new Area of Interest, which will be used to create a new model that takes the new information into consideration.

Technical solution
Not applicable.

Implementation
A Replicode program.

Example

5 References

[Alarcon et al., 1994] Alarcon, I., Rodriguez-Marin, P., Almeida, L., Sanz, R., Fontaine, L., Gomez, P., Alaman, X., Nordin, P., Bejder, H., and de Pablo, E. (1994).
Heterogeneous integration architecture for intelligent control systems. IEEE Intelligent Systems Engineering, 3(3):138–152.

[Lyons and Arbib, 1989] Lyons, D. and Arbib, M. (1989). A formal model of computation for sensory-based robotics. IEEE Transactions on Robotics and Automation, 5(3):280–293.

[Garcia and Morari, 1986] Garcia, C. E. and Morari, M. (1986). Internal model control. 1. A unifying review and some new results. Industrial & Engineering Chemistry Process Design and Development, 25:252–265.

[Pezzulo and Calvi, 2007] Pezzulo, G. and Calvi, G. (2007). Schema-based design and the AKIRA schema language: An overview. Anticipatory Behavior in Adaptive Learning Systems, pages 128–152.

[Wolpert et al., 1995] Wolpert, D. M., Ghahramani, Z., and Jordan, M. I. (1995). An internal model for sensorimotor integration. Science, 269:1179–1182.

6 GLOSSARY

Action. Signal sent from the Reaction System to an output device that actuates on the environment.

Area of Interest. Subset of the input data that results from applying some filter.

Attention control. Module that applies a filter on raw data in order to obtain the Area of Interest.

Backward chaining. Reasoning in reverse, from a hypothesis (a potential conclusion to be proven) to the facts that support it: starting from the desired states (goals), deduce the input states that produce those goals.

Backtracking. Algorithm that follows a chain of events back to its origin, e.g. tracing back from the final goal reached to the origin node of a graph.

Forward chaining. Reasoning from facts to the conclusion that results from those facts: starting from the input states, arrive at some final states (goals).

Forward model. It is a model that implements forward chaining.

Goal. Desired state.

(Replicode) Group. “The global workspace is organized as graphs of sub-workspaces called groups. Groups are objects that contain views on other objects. What distinguishes between objects and views is essentially that objects contain code (e.g.
algorithms or data) whereas views contain data that qualify the code. Code is qualified by control values such as activation (controls whether or not a program can run) and saliency (controls whether or not code can be an input for a program).” [from Replicode language specification v1.1a]

Implementation process. It consists of the construction of the models and their use by the Reaction System.

Integration process. It consists of identifying the need for new models and controlling the mechanism to integrate them with the existing ones.

Inverse model. It is a model that implements backward chaining.

Masterplan. It is the initial knowledge about the environment the system is going to interact with, given to the relevant architecture modules (Reaction System, Attention Control).

Metrics. They are a set of values that can be used to determine the performance of the Reaction System.

Model. It is a simplified representation of the behaviour of the environment. In this project there are only forward and inverse models.

Model Acquisition. It is the module of the architecture in charge of constructing Replicode models from the data supplied by the Area of Interest.

Model Revision. It is the module of the architecture in charge of revising the behaviour of the current models and deciding on the need for better models or new models (to explain new behaviours).

Performance. It is the way the Reaction System (or the models in it) interacts with the environment.

Global performance. It is a measure of the performance of the Reaction System.

Local performance. It is a measure of the performance of individual models in the Reaction System.

Program. It is a Replicode construct. “Programs react to incoming objects. A program defines a pattern of time-constrained object sequences, including patterns and guards (conditions) on the individual incoming objects, and produces new objects built from the incoming ones, said new objects being called productions.
Programs are reactive, i.e. they perform as described whenever incoming objects match the time-pattern and guards. The transformation of inputs into productions is called reduction. Programs are state-less and have no side-effects.” [from Replicode language specification v1.1a]

Progress Monitoring. It is a module of the architecture in charge of measuring global and local performances. It also provides relevant information to the Attention Control module.

Reaction System. It is a module of the architecture in charge of interacting with the environment.

Reactively. Computing an action given the inputs and goals.

Reduction. It is the procedure of executing a program or model; it happens when the input conditions are satisfied.

Schema. It is a model-based controller, composed of a pair of coupled forward and inverse models.

Simulation. It is the execution of models/schemas given an initial input and goals, with the connection to the environment removed. Actions computed in the simulation are the new inputs for subsequent simulation cycles.

State. It is the value of a system variable.

System. It is a set of interrelated variables within a defined boundary.

WCET (Worst Case Execution Time). It is an estimation of how long the execution of a program would take in the worst-case scenario.