Of Dogs and Robots
Where the learning states of adaptive agents are modelled using a set of automata

Francesco Figari ([email protected])
supervised by Michael Rovatsos and Ed Hopkins

Abstract

In a multi-agent environment we can expect to encounter agents that are not only autonomous, but also capable of learning from the behaviour of others. In this setting, the way one chooses to act can influence the decisions of other agents, suggesting solutions or exposing weaknesses. One's learning process therefore cannot be decoupled from what one teaches others through one's own actions, even without explicit communication. We present a learning algorithm that models the phases of the interaction with a wide class of agents, including adaptive ones, and suggests the best strategy for achieving the design goal of our agent, in the setting of stochastic games.

Return to the Vacuum World

A small example of the task we face: an extension of the Vacuum World.

Our Hero

The World is a (quite dirty) grid, inhabited by a Cleaning Robot that can move one cell at a time, or clean the square that it currently occupies. Cleaning is, however, a noisy task, and it disturbs the Sleepy Dog that lives on the grid. When the Robot is vacuuming in a neighbouring square, the dog wakes up and wanders off looking for a peaceful corner. But if the Robot tries to clean the square where the Dog is resting, the scared pet will run around the grid, leaving dirt everywhere along its path! Definitely counter-productive for our Robot...

Moreover, the Dog is an adaptive agent, and it might develop a new strategy to achieve its goal (and sleep) at any time during the "game". For example, it could start chasing the Robot around the grid, spreading dirt of course, until the Robot stops in a square for some time.

If our Robot wants to clean the grid efficiently, it needs to learn the different reactions of the dog. If it succeeds, it might even find a way to clean under the dog! Additionally, the Robot should never let its guard down, and it should be ready to react to unexpected behaviours.

The question

In front of such adaptive opponents, how can an agent learn to act best, in order to achieve its design objective? Would it be possible to model the adaptive process of another agent, use this knowledge to develop a corresponding game strategy, and possibly also to cooperate with, or exploit, specific behaviours?

Model-based learning

When facing agents with bounded resources, we can try to infer a model of the strategy they employ in the game, and compute the best response to their actions that will bring our agent closer to its goal. If necessary, we can update the model after each step to ensure that it is still consistent with the development of the game. Models have limited expressive power, however, and can only represent certain classes of agents: no general model exists yet for adaptive agents.

Our approach

We use a hierarchical model to collect information on the way the opponent plays and adapts to our agent's actions. On the bottom layer of the hierarchy, we model the visible states of the opponent: the stationary behaviours that it exhibits while playing. On top of these, we build a graph of the relations among these states, and of the cause-effect dependencies from our actions.

Building the model

The method that we are currently researching uses finite state machines to model the states of the opponent. We compute a best response to the model while we play, until we can consider the machine complete, i.e. until we cannot improve our strategy by modifying it. Complete machines are added to the set of visible states of the opponent.
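As a purely illustrative reading of the two boxes above, the sketch below shows how an inferred finite state machine and a finite-horizon best response against it might look in Python. The names (OpponentFSM, best_response) and the toy Vacuum World payoff are assumptions made for this example, not the actual implementation, which is still under development.

# Minimal sketch, assuming a deterministic machine whose transitions are driven
# by our agent's actions and whose current state fixes the opponent's next move.
# OpponentFSM, best_response and the toy payoff below are hypothetical names.

class OpponentFSM:
    """Hypothesised finite state machine model of the opponent."""

    def __init__(self):
        self.transitions = {}  # (opponent state, our action) -> next opponent state
        self.outputs = {}      # opponent state -> action the opponent plays there

    def update(self, state, our_action, opp_action, next_state):
        """Record one observed step; return False if it contradicts the model."""
        consistent = (self.transitions.get((state, our_action), next_state) == next_state
                      and self.outputs.get(state, opp_action) == opp_action)
        self.transitions[(state, our_action)] = next_state
        self.outputs[state] = opp_action
        return consistent


def best_response(fsm, payoff, our_actions, start_state, horizon=6):
    """Finite-horizon best response against the current machine model."""
    cache = {}

    def value(state, depth):
        if depth == 0:
            return 0.0, None
        if (state, depth) in cache:
            return cache[(state, depth)]
        best_v, best_a = float("-inf"), None
        for a in our_actions:
            opp_a = fsm.outputs.get(state)
            reward = payoff(a, opp_a) if opp_a is not None else 0.0
            next_state = fsm.transitions.get((state, a), state)
            future, _ = value(next_state, depth - 1)
            if reward + future > best_v:
                best_v, best_a = reward + future, a
        cache[(state, depth)] = (best_v, best_a)
        return best_v, best_a

    return value(start_state, horizon)[1]


# Toy usage with made-up Vacuum World actions and dog states.
fsm = OpponentFSM()
fsm.update("asleep", "clean", "sleep", "awake")   # cleaning nearby wakes the dog
fsm.update("awake", "wait", "wander", "asleep")   # waiting lets the dog settle down

def payoff(our_action, dog_action):
    return (1.0 if our_action == "clean" else 0.0) - (0.5 if dog_action == "wander" else 0.0)

print(best_response(fsm, payoff, ["clean", "wait"], "asleep"))

In the real setting the machine would of course be inferred from play, and the best response recomputed whenever the model changes, as described above.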
Inference

If at some point the current model no longer reflects the behaviour of the opponent, we start looking for a new machine. This might already be in our set, or we might need to restart the inference procedure above. We save information about this transition from one state to another, and we also record our actions preceding it: in the future we might execute the same sequence again, trying to reproduce the same reaction in the opponent! As the interaction develops and we infer more machines and actions, we should be able to build a complete set of behaviours of the other agent, and a set of macro-actions that we can use to influence its state.

We can assign a value to these states, depending on their utility toward our goal and on their stability. We will then be able to learn which macro-actions are effective, and to discover the best states in the model. If we are lucky, we might find an equilibrium: a state from which neither player has an incentive to deviate. (A small sketch of this bookkeeping appears after the concluding section below.)

Concluding (buzz)words

The specific context of our research is multi-agent reinforcement learning in repeated games. Both the model and the algorithm are still under development. Extensive empirical evaluation will be necessary to ascertain whether the method achieves at least best response against the target class of relatively stationary opponents, and avoids being exploited.
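As referenced in the Inference box, here is a minimal sketch of the top layer of the hierarchy. The class name, its fields and the toy machine names are assumptions made for illustration; the actual data structures are still being designed.

# Minimal sketch of the top-layer graph (hypothetical names). Nodes are the
# opponent machines inferred so far; edges remember which of our action
# sequences preceded a switch between machines, so that they can be replayed
# later as macro-actions; each machine also carries a learned value.

from collections import defaultdict

class BehaviourGraph:
    def __init__(self):
        self.machines = {}                       # name -> inferred opponent machine
        self.macro_actions = defaultdict(list)   # (from, to) -> recorded action sequences
        self.values = {}                         # name -> estimated value of that state

    def add_machine(self, name, machine):
        self.machines[name] = machine

    def record_switch(self, from_machine, to_machine, preceding_actions):
        """Store the actions we played just before the opponent changed behaviour."""
        self.macro_actions[(from_machine, to_machine)].append(list(preceding_actions))

    def update_value(self, machine, utility, stability, alpha=0.1):
        """Running estimate combining utility toward our goal with state stability."""
        target = utility * stability
        old = self.values.get(machine, 0.0)
        self.values[machine] = old + alpha * (target - old)

    def best_target(self):
        """The opponent state we would most like to steer the interaction into."""
        return max(self.values, key=self.values.get) if self.values else None


# Toy usage: three consecutive cleaning actions made the dog switch from sleeping
# to chasing, and the sleeping state looks both useful and stable.
graph = BehaviourGraph()
graph.record_switch("sleepy", "chasing", ["clean", "clean", "clean"])
graph.update_value("sleepy", utility=1.0, stability=0.9)
graph.update_value("chasing", utility=-0.5, stability=0.4)
print(graph.best_target())  # -> sleepy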