Using Reinforcement Learning Techniques to Select the Best Action in Setplays with Multiple Possibilities in RoboCup Soccer Simulation Teams*

João A. Fabro, Luis P. Reis and Nuno Lau

Abstract— Setplays are predefined, collaborative, coordinated actions that players of any sport can use to gain an advantage over their adversaries. Recently, a complete framework for the creation and execution of this kind of coordinated behavior by teams of multiple independent agents was released as free software (the Setplay Framework). In this paper, an approach based on Reinforcement Learning (RL) is proposed that uses accumulated experience to choose the best course of action in setplays with multiple choices. Simulation results show that the proposed approach allows a team of simulated agents to improve its performance against a known adversary team, achieving better results than previously proposed RL-based approaches.

I. INTRODUCTION

Robotic soccer [1][2] is a research problem proposed as a challenge to the robotics community, using the soccer game as a standard comparison platform. In order to tackle this challenge, several "leagues" have been proposed, each one using a set of specific, simplified rules to evaluate the different proposed approaches. One of the first of these leagues was the simulation league, which focuses on Artificial Intelligence problems such as how to obtain collaborative behavior in an environment with multiple agents, limited communication capability, and noisy sensing of the environment. The simulation category thus abstracts away problems such as computer vision and mechanical and electronic construction, in order to focus on the development of distributed control algorithms. Eleven simulated soccer-playing agents (plus a coach agent) have to successfully execute complex cooperative behavior in order to play according to the rules and outmatch adversarial teams (by defending, passing the ball, and ultimately scoring more goals than the adversary). Each agent must situate itself on the field based on incomplete information sensed from the simulated environment, plan and execute its actions, cooperate with teammates, and try to prevent the adversary team from doing the same. Since the challenge was proposed [1], learning techniques have been considered for these skills [3], and in recent years their use has been evaluated in several specific situations with relative success [4-7].

One way to obtain complex collaborative behavior in an environment with limited communication is to "predefine" the behavior. This is what human soccer teams accomplish by using set pieces (also known as setplays)1. Setplays are organized sequences of moves and actions that, when executed cooperatively and in a synchronized way by a set of players, can enable the team to achieve certain objectives during the match.

* J. A. Fabro is with UTFPR – Federal University of Technology – Paraná (UTFPR), Av. Sete de Setembro, 3165, CEP 80230-901, Curitiba, Paraná, Brazil (corresponding author: +55 (41) 3310-4743; e-mail: [email protected]). L. P. Reis is with DSI/School of Engineering, University of Minho – Guimarães, and is also with LIACC – Artificial Intelligence and Computer Science Lab., University of Porto, Portugal (e-mail: [email protected]). N. Lau is with IEETA – Institute of Electronics and Telematics Engineering of Aveiro, University of Aveiro, Portugal (e-mail: [email protected]).

1 See "http://www.professionalsoccercoaching.com/free-kicks/soccerfree kicks2" for a description and example of a complete setplay.
Setplays are commonly used in direct or indirect free kicks, after fouls. By executing a setplay, the players of a team can surprise the adversary, reaching advantageous situations and opportunities to score goals. Mota et al. [8] have recently developed a set of tools to specify and execute setplays in the context of RoboCup soccer-playing agents, testing these tools both in simulation and on real robots [9-11]. In this paper, a new form of learning is proposed that brings together the advantages of pre-planned setplay behavior with the adaptability provided by a learning approach.

Recently, a complete framework for the specification, execution and graphical design of setplays was introduced [12]. Setplays are sequences of actions that should be executed cooperatively by a set of players in order to achieve an objective during the match. The most common setplays are used for corner kicks and direct free kicks (for example, after a foul).

After this brief introduction, the remainder of this paper is organized as follows. Section 2 presents the Setplay Framework, a set of free software libraries and tools that allow the graphical definition of setplays and their later execution by RoboCup 2D simulation teams. In Section 3, a Reinforcement Learning approach to learn the best selection of actions in setplays with multiple options is presented and detailed. Section 4 presents some experimental results in simulated games, and Section 5 presents a discussion of the results and some conclusions.

II. THE SETPLAY FRAMEWORK

The Setplay Framework is a set of free software tools, recently made available, that is composed of a library of classes (fcportugalsetplays)2, a graphical specification tool (FCPortugal SPlanner3 [13]), and a complete example team4, based on Agent2D [14], that can execute setplays (FCPortugalSetplaysAgent2D). The FCPortugalSetplaysAgent2D team was developed using the Agent2D source code (version 3.1.1), available at [14], as its base.

In order to use the Setplay Framework, the first step is to use the graphical tool (SPlanner) to specify the coordinated behavior. A screenshot of the tool is presented in Figure 1. Using this tool, it is possible to develop complete specifications of coordinated behavior from any possible starting point of a setplay: throw-ins after the ball leaves the field over a sideline, corner kicks, and direct or indirect free kicks starting at any specified position of the field [13]. The tool can export all the relevant information to a standardized XML specification file, which can be parsed and executed by the Setplay library [15]. This library provides all the functions necessary to coordinate the actions of different robotic agents, within any robot soccer league.

2 http://www.sourceforge.org/projects/fcportugalsetplays
3 http://www.sourceforge.org/projects/fcportugalSPlanner
4 http://www.sourceforge.org/projects/fcportugalsetplaysagent2d

The library has already been used successfully to provide setplay capabilities in three different leagues: the 2D [11] and 3D simulation leagues, and a RoboCup Middle Size League team [15]. The second step in the execution of a setplay is the start signal, which can be given by any soccer-playing robot. The integration of the player with the Setplay library occurs when the player uses the library to evaluate whether a setplay is available for execution, and then uses the library to obtain the actions that should be executed in order to carry on the setplay execution.
This is done by implementing a series of functions that inform the library about the state of the environment (for example, returning the estimated position of the player, the position of the ball, the positions of nearby teammates, and the possibility of executing a pass or a shot at the goal). It is also necessary to implement communication functions (allowing the exchange of simple messages among teammates) and action functions (implementing actions such as dribbling, passing and kicking). When a player identifies the necessary conditions to start a setplay (by calling the function "feasibleSetplays" from the library), it can decide to start it by using the library to select the players that will participate (according to their distance to the positions required at the beginning of the setplay - step 0), and then send messages instructing them to move to their initial positions. After the "wait time" of the first step, the execution starts, and each player receives a message informing the current step and, by consulting the Setplay library, fetches and executes its action for this step. Only the player currently in possession of the ball can send the message informing the transition to another step. The possible actions that this player can execute are: a direct pass to another teammate; a forward pass to an advanced position, to be intercepted by another player; a dribble, carrying the ball towards a predefined position; or a shot at the goal. At each step, there can be more than one possible action.

Figure 1 presents a complete example of a setplay with multiple options, as graphically described with the SPlanner tool. After a direct free kick inside the central region of the field, the player in possession of the ball (the kicker) kicks it towards a companion (receiver 1), while two other players (receivers 2 and 3) run to predefined positions, preparing themselves for a possible pass in the second step. When receiver 1 receives the ball, there are two options for the continuation of the setplay: to pass to receiver 2, or to pass to receiver 3. This exact situation appears in the graph description of Figure 2. In this step, the player in possession of the ball must choose the best option, but this decision depends on a series of conditions: the positions of the players, clear space for the passes, and the positions of the receivers. Since the game is fluid, this decision is hard to make. The approach proposed in this paper is to use a Reinforcement Learning technique to "learn" which decision works better.
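As a minimal illustration of the per-step execution logic described above, the following C++ sketch shows how the player in possession of the ball might pick one of the possible transitions of the current step and announce it to its teammates. All types and function names here are hypothetical stand-ins (only the existence of a feasibility check such as "feasibleSetplays" and the rule that the ball owner broadcasts the step transition come from the framework description); this is not the actual fcportugalsetplays API.

```cpp
// Hypothetical sketch of the ball owner's per-step setplay logic.
// Types and names are illustrative stand-ins, not the real library API.
#include <iostream>
#include <string>
#include <vector>

struct Transition {
    int toStep;          // step reached if this transition is taken
    std::string action;  // "pass", "forward pass", "dribble" or "shoot"
};

struct Step {
    int id;
    std::vector<Transition> options;  // one or more possible continuations
};

// Placeholder for the team's policy; the default framework behavior is
// "first enabled transition", which Section IV replaces with Q-learning.
static const Transition& chooseTransition(const Step& step) {
    return step.options.front();
}

// Placeholder for the say/hear communication channel of the simulator.
static void broadcastStepMessage(int setplayId, int nextStep) {
    std::cout << "(say setplay " << setplayId << " step " << nextStep << ")\n";
}

int main() {
    // Step 2 of the example setplay: receiver 1 may pass to receiver 2
    // (leading to step 3) or to receiver 3 (leading to step 7).
    Step current{2, {{3, "pass to receiver 2"}, {7, "pass to receiver 3"}}};

    const Transition& chosen = chooseTransition(current);
    std::cout << "Executing: " << chosen.action << "\n";
    // Only the player in possession of the ball sends this message.
    broadcastStepMessage(/*setplayId=*/1, chosen.toStep);
    return 0;
}
```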
In the next section, a brief explanation of the reinforcement learning technique used (based on the Q-Learning algorithm) is presented, and after that the complete approach is detailed.

III. REINFORCEMENT LEARNING

Reinforcement Learning [16] was introduced as a formalism that allows agents to learn directly from their interaction with the environment, through feedback about the results of their actions. The technique is based on an algorithm that seeks to find the best action, among a set of possible ones, for a given situation. The mapping between situations and actions is called a "policy". The selection of actions is random at first, but a "reward" function is used to adjust this selection, giving higher "strength" to actions that result in bigger rewards. Learning takes place during the interaction between the agent and its environment, using evaluations of the outcomes of decisions as feedback: decisions are reinforced when their results are considered "good".

There are several such techniques, but the simplest is the Q-learning algorithm, which is extensively used in related work on simulated robot soccer in RoboCup [4-7].

A. Q-Learning

The Q-learning method is best suited to situations where the interaction between the agent and its environment can be modeled in a finite and discrete way. In the specific case of RoboCup soccer simulation, the interaction between the soccer-playing agents and the simulated environment is executed one step at a time, in a discrete and synchronized way: at each iteration, the agent receives local information about the environment in its vicinity through a set of simulated sensors (for example, the positions of players, the ball and relevant field markers within its field of vision, auditory messages from nearby teammates, and commands from the automated referee). In order to act on the environment, the agent must send commands to the simulator; these vary according to their category, but usually correspond to an action to be executed during the next time step. By discretizing the observed state of the world, and by choosing a single action from a finite set of available ones, it is possible to model the mapping between world states and actions, and then use a reinforcement learning technique to find a policy, provided there is some way of assigning a reward proportional to the results of the actions [17].

Each step of the Q-learning update is defined by the following expression:

Q(s_t, a_t) = Q(s_t, a_t) + α [ r_t + γ max_a Q(s_{t+1}, a) - Q(s_t, a_t) ]    (1)

where:
- s_t is the current state;
- a_t is the action taken in state s_t;
- r_t is the reward received for taking action a_t in state s_t;
- s_{t+1} is the next state;
- γ (gamma) is the discount factor (0 < γ < 1);
- α (alpha) is the learning rate (0 <= α < 1).

The function Q(s_t, a_t) is the value associated with the state-action pair (s_t, a_t) and represents how good the choice of this action is for maximizing the cumulative return. The action-value function Q(s_t, a_t), which stores the reinforcements received, is updated from its current value for each state-action pair. In this way, part of the reinforcement received in a state is transferred to the state that preceded it.

Figure 1: Setplay definition tool (SPlanner) and an example setplay.

In the Q-learning algorithm, the choice of the action to be performed in a given state of the environment can be made with any exploration/exploitation criterion, including randomly. A widely used policy is called ε-greedy, in which the agent chooses either a random action or the action with the largest value of Q(s_t, a_t) [17]. The choice under the ε-greedy policy is made as follows (equation 2):

a_t = a_random,              if q ≤ ε
a_t = argmax_a Q(s_t, a),    otherwise    (2)

If the value of q, drawn at random, is less than the threshold ε, a random action is selected; otherwise, the action with the largest reinforcement value in the table Q(s_t, a_t) is selected.
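The following self-contained C++ sketch illustrates equations (1) and (2): a tabular Q-function updated with learning rate α and discount γ, and an ε-greedy action choice. The table sizes, parameter values, reward and state transition shown here are illustrative assumptions, not the values used by the team.

```cpp
// Minimal sketch of tabular Q-learning with an ε-greedy policy,
// following equations (1) and (2). All numeric values are illustrative.
#include <algorithm>
#include <iostream>
#include <random>
#include <vector>

int main() {
    const int nStates = 10, nActions = 11;
    const double alpha = 0.1;    // learning rate (assumed value)
    const double gamma = 0.9;    // discount factor (assumed value)
    const double epsilon = 0.2;  // exploration rate

    std::vector<std::vector<double>> Q(nStates, std::vector<double>(nActions, 0.0));
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    std::uniform_int_distribution<int> anyAction(0, nActions - 1);

    auto egreedy = [&](int s) {
        if (unit(rng) <= epsilon) return anyAction(rng);   // explore
        int best = 0;
        for (int a = 1; a < nActions; ++a)                 // exploit: argmax_a Q(s,a)
            if (Q[s][a] > Q[s][best]) best = a;
        return best;
    };

    // One illustrative interaction: in state s, take action a, observe
    // reward r and next state sNext, then apply equation (1).
    int s = 2;
    int a = egreedy(s);
    double r = 100.0;            // e.g. a successfully completed pass
    int sNext = 3;

    double maxNext = *std::max_element(Q[sNext].begin(), Q[sNext].end());
    Q[s][a] += alpha * (r + gamma * maxNext - Q[s][a]);

    std::cout << "Q(" << s << "," << a << ") = " << Q[s][a] << "\n";
    return 0;
}
```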
In Section IV, the application of this algorithm to the selection of Transitions in multiple-choice Steps of a Setplay is presented.

IV. THE PROPOSED APPROACH - APPLYING MACHINE LEARNING TO ACTION SELECTION IN MULTIPLE-CHOICE SETPLAYS

Based on the complete team available at [14], a new decision-making procedure was implemented, allowing the agents inside the attack area to take action based on previous experience, through a reinforcement learning technique. Only the player in possession of the ball takes action based on previous experience, using the reinforcement learning approach (the Q-Learning algorithm) to select the best action while in possession of the ball. The reinforcement approach "reinforces" actions that lead to goals scored. Details of the learning procedure used can be found in [18]. The approach proposed for the team FCP_GPR_2014 is to enable the selection of the next Transition, when in a given Step of a Setplay, using Machine Learning. The example setplay presented in Fig. 1 has several States in which there are multiple possible Transitions (Steps 2, 3, 7 and 9). The graphical representation of the setplay as a directed graph, shown in the lower left corner of the figure, is detailed in Fig. 2.

Figure 2: Graph of the setplay with multiple Transition options - example.

In the original Setplay Framework, the selection of which Transition to execute was defined by conditions, i.e., if the receiving player was positioned, or if the pass had a low probability of being intercepted, the Transition was chosen. If more than one Transition was "enabled", usually the first one was executed. The proposed approach is to let an adaptive procedure select, by reinforcement learning, the best Transition to execute when there is more than one possibility. In the graph of Figure 2, this happens in state 2 (with possible transitions to states 3 or 7), state 3 (possible transitions to states 4, 5 or 6), state 7 (possible transitions to 8 or 9) and state 9 (possible transitions to 10 or 11). Using the Q-Learning algorithm to evaluate the success rate of each decision, it is possible to infer the option with the highest chance of success, given enough opportunities for learning to take place.

To obtain this kind of adaptive behavior, a matrix correlating Steps and Transitions is proposed, as exemplified in Table I. For the setplay shown in Figure 1, and detailed in Figure 2, 10 states and 11 possible actions (transitions) were defined, leading to a 10x11 matrix. This matrix is then used by the Q-learning algorithm to infer the best policy, i.e., which is the best transition to choose in each state. If there is only one option, as in states 0 and 1, the algorithm has no effect on the decision. But in the multi-option states, it is possible to infer the most rewarding options, simply by providing reinforcement for every successfully completed action. Thus, during the execution of the setplay, the player currently in possession of the ball evaluates the possible actions, choosing the one with the best accumulated reward. If the action is performed correctly (for example, the pass is correctly received by the teammate), this option is rewarded, providing reinforcement (the reinforcement could be different for each action, but in the current approach every reinforcement is constant and equal to 100). In order to evaluate every possible action, an ε-greedy policy allows a random action to be chosen with a defined probability (in our case, 20%).
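A minimal sketch of this selection procedure is shown below: for the current Step, only the Transitions allowed by the setplay graph are considered, one is chosen ε-greedily (ε = 0.2) from the Step x Transition matrix, and a successful action receives the constant reinforcement of 100. The names and the simplified update used here (learning rate only, without the discounted next-state term of equation (1)) are assumptions for illustration, not the team's exact code.

```cpp
// Sketch of ε-greedy transition selection restricted to the transitions
// that are possible from the current step of the setplay graph (Fig. 2).
#include <iostream>
#include <map>
#include <random>
#include <vector>

int main() {
    const double epsilon = 0.2;   // 20% exploration, as described above
    const double reward  = 100.0; // constant reinforcement on success
    const double alpha   = 0.1;   // learning rate (assumed value)

    // Steps with multiple options in the example graph of Fig. 2.
    std::map<int, std::vector<int>> possible = {
        {2, {3, 7}}, {3, {4, 5, 6}}, {7, {8, 9}}, {9, {10, 11}}};

    // Step x Transition matrix (steps 0..10, transitions numbered 1..11).
    std::vector<std::vector<double>> Q(11, std::vector<double>(12, 0.0));

    std::mt19937 rng(7);
    std::uniform_real_distribution<double> unit(0.0, 1.0);

    int step = 2;
    const std::vector<int>& options = possible[step];

    // ε-greedy choice among the possible transitions only.
    int chosen = options.front();
    if (unit(rng) <= epsilon) {
        std::uniform_int_distribution<int> pick(0, (int)options.size() - 1);
        chosen = options[pick(rng)];
    } else {
        for (int t : options)
            if (Q[step][t] > Q[step][chosen]) chosen = t;
    }

    bool actionSucceeded = true;  // e.g. the pass was received by the teammate
    if (actionSucceeded)
        // Simplified form of eq. (1), omitting the γ·max term.
        Q[step][chosen] += alpha * (reward - Q[step][chosen]);

    std::cout << "Step " << step << ": chose transition to " << chosen
              << ", Q = " << Q[step][chosen] << "\n";
    return 0;
}
```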
TABLE I. Q-LEARNING MATRIX FOR THE SETPLAY OF FIG. 2

Step \ Transition to:    1    2    3    4    5    6    7    8    9   10   11
        0              100    -    -    -    -    -    -    -    -    -    -
        1                -  100    -    -    -    -    -    -    -    -    -
        2                -    -   75    -    -    -   20    -    -    -    -
        3                -    -    -   55   85   35    -    -    -    -    -
        4                -    -    -    -    -    -    -    -    -    -    -
        5                -    -    -    -    -    -    -    -    -    -    -
        6                -    -    -    -    -    -    -    -    -    -    -
        7                -    -    -    -    -    -    -   20   60    -    -
        8                -    -    -    -    -    -    -    -    -    -    -
        9                -    -    -    -    -    -    -    -    -   70   40
       10                -    -    -    -    -    -    -    -    -    -    -

As can be seen in Table I, there is only one used cell in each column of the matrix. This is due to a restriction defined by the Setplay Framework: the "graph" representing the execution of the setplay must not have cycles, and is therefore better represented as a "tree". In this case, it is possible to represent all the reinforcement elements of the Q-Learning matrix in one single line, where each position represents the "quality" of the transition to that state. In Table II, the same information present in Table I is rearranged into a single line of the matrix. It is then necessary to use some additional information about the "tree" in order to execute the correct calculations (i.e., to choose the best transition in the matrix of Table I, it suffices to find the index of the column with the highest value; to execute the same calculation on the single line of Table II, it is also necessary to know which transitions are "possible" from each state, and to choose the highest value among them).

TABLE II. Q-LEARNING MATRIX (REPRESENTED AS A SINGLE LINE) FOR THE COMPLETE SETPLAY OF FIG. 2

Setplay \ Transition to:    1    2    3    4    5    6    7    8    9   10   11
           0              100  100   75   55   85   35   20   20   60   70   40

By applying this optimized representation, it is possible to accommodate a complete set of different setplays in one single learning matrix, using the unique setplay ID number as the line index, and as many columns as needed by the biggest setplay (in terms of number of steps). Usually, a team should have one setplay for each of the following situations: kick-off, goalkeeper catch, goal kick, and corner kick (although it is possible to have two for each situation, if different behavior is needed depending on the side of the field - left or right; usually just allowing the setplay to be invertible is enough). For the following situations, however, there can be one setplay for each position where it begins: throw-ins and direct and indirect free kicks. There are 6 different positions where a throw-in can start (our back, our middle, our front, their front, their middle and their back). In the case of direct or indirect free kicks, the initial position can be any combination of these 6 positions with 6 transverse areas (far left, mid left, center left, center right, mid right and far right), giving a total of up to 36 possibilities. Despite this large number of possible setplays, the framework limits the number of different setplays that can be used simultaneously by a team to 63; thus the biggest matrix needed would have 63 lines and as many columns as the number of states of the biggest setplay.

During the "training phase" of the Q-Learning algorithm, this matrix has to be continuously updated every time a setplay action is successfully executed. Since only the player in possession of the ball is responsible for this update, there are no "race conditions" that could lead to corruption of the matrix, so it is saved to and read from a standard text file, stored on a shared file system to which all players have read and write access.
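The sketch below illustrates, under assumed names and an assumed file layout (one line of space-separated values per setplay ID), how such a single-line matrix could be saved to and loaded from a plain text file, and how choosing the best transition still requires knowing which transitions are possible from the current step. The team's actual file format may differ.

```cpp
// Sketch of the single-line-per-setplay representation of Table II with
// plain-text persistence. File layout and names are illustrative assumptions.
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // [setplayId][transition]

// Save: one text line per setplay, values separated by spaces.
void save(const Matrix& q, const std::string& path) {
    std::ofstream out(path);
    for (const auto& row : q) {
        for (double v : row) out << v << ' ';
        out << '\n';
    }
}

// Load the matrix back from the shared text file.
Matrix load(const std::string& path) {
    Matrix q;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream ss(line);
        std::vector<double> row;
        double v;
        while (ss >> v) row.push_back(v);
        if (!row.empty()) q.push_back(row);
    }
    return q;
}

int main() {
    // Setplay 0, transitions 1..11 stored at indices 0..10 (values of Table II).
    Matrix q = {{100, 100, 75, 55, 85, 35, 20, 20, 60, 70, 40}};
    save(q, "setplay_q_matrix.txt");

    Matrix loaded = load("setplay_q_matrix.txt");

    // Choosing the best transition from step 2 of setplay 0 still needs the
    // tree: only transitions 3 and 7 are possible, so only those are compared.
    std::vector<int> possibleFromStep2 = {3, 7};
    int best = possibleFromStep2.front();
    for (int t : possibleFromStep2)
        if (loaded[0][t - 1] > loaded[0][best - 1]) best = t;
    std::cout << "Best transition from step 2: " << best << "\n";
    return 0;
}
```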
After the learning reaches a stabilization point, the matrix can be "frozen"; it then only needs to be read once, during the initialization of each agent, and used to select the best action in each state of any setplay.

V. EXPERIMENTS

In this section, the experiments performed to evaluate the proposed approach are described. To provide a first baseline, the FCPortugalSetplaysAgent2D code was evaluated. Using the default set of setplays provided by its author, 100 games were simulated against the well-known and widely used Agent2D 3.1.1. From these experiments, it was concluded that the FCPortugalSetplaysAgent2D code, by making use of coordinated behavior, obtained an improvement over Agent2D, with 62 victories and only 32 defeats during regular play time (6 games finished in a draw; in the penalty shootouts each team won 3 games). Converting these results to percentages, it can be clearly seen that the use of setplays led to 65% of victories, against the 50% expected for two teams with the same level of gameplay.

After these simulations, the setplays were modified to allow multiple-choice options, and 1000 games were simulated between this new version (called FCP_Setplays_RL from now on) and the Agent2D 3.1.1 code, in order to let the Q-Learning matrix stabilize. After that, another 100 games were simulated, this time with the matrix learned in the previous simulations frozen. The results improved, with FCP_Setplays_RL obtaining 91 victories against only 7 defeats (and two draws), resulting in an impressive winning percentage of 93%. Against another team that also uses Reinforcement Learning, GPR2013, the results were 52 victories and 21 defeats, with 27 draws, and a winning percentage of 71%. Against the three best-placed teams, however, the results, although promising, were not as good. Against WrightEagle2013 (the current champion), there were 8 victories and 88 defeats (6 draws), resulting in an 8.5% chance of winning. Against Helios2013 (the vice-champion), there were 6 victories and 84 defeats (10 draws), a 6.67% chance of winning; and against Yushan2013 (third place), 18 victories and 51 defeats (31 draws), a 26% chance of winning a match.

VI. CONCLUSIONS AND DISCUSSION

The approach proposed in this paper joins the coordinated behavior provided by FCPortugalSetplaysAgent2D with a reinforcement learning algorithm used to decide the best actions according to specific situations during the game. The results obtained in the simulation experiments demonstrate the improvement obtained by allowing adaptive behavior in the decision-making process when setplays have more than one possible action to choose from. In order to evaluate the proposed approach, setplays with multiple options were created (using a toolset recently developed and made available as free software) and evaluated using a simple Q-Learning algorithm. The simulation experiments show that the results are promising, but more experiments must be executed, with new multi-choice setplays, in order to fully evaluate the advantages that can be obtained with the proposed approach (in this evaluation, only the setplay presented as an example was used to evaluate the improvement of the team after the use of Q-Learning to select the "best" actions).
In comparison with Agent2D 3.1.1, the inclusion of Reinforcement Learning in the team with setplay capability increased the winning percentage from 65% (using only a "static" setplay) to 93% (using the "multiple-choice" setplay and learning to select the best option). Against intermediate teams, such as FCPortugal2013 and GPR2D2013, the use of multiple-choice setplays and RL also produced good results, with winning chances of 75% (against FCPortugal2013) and 71% (against GPR2D2013). Against the best teams, however, the results still leave a lot of room for improvement, with less than a 10% chance of winning against both the current champion (WrightEagle2013) and the vice-champion (Helios2013), but with a 25% chance of winning against the third-placed team, Yushan2013.

In the near future, it is planned to use other approaches to improve the adaptability of the method, allowing training against several different adversary teams and using an automatic procedure to identify the adversary's style of play, in order to select the matrix that provides the best results against that specific adversary. Other ideas include the use of some kind of heuristic search to automatically adjust the positions of the players during the execution of each setplay.

ACKNOWLEDGMENT

The first author would like to thank CAPES for his scholarship, process No. BEX 9292/13-6. All the authors would also like to acknowledge the RoboCup community for its support, especially team Helios [14], and Luís Mota, for the development of the Setplay Framework and for his support in the development of this work.

REFERENCES

[1] Kitano, H.; Asada, M.; Kuniyoshi, Y.; Noda, I. "The RoboCup Synthetic Agent Challenge", International Joint Conference on Artificial Intelligence (IJCAI), Nagoya, Japan, 1997.
[2] Asada, M. and Kitano, H. "The RoboCup Challenge", Robotics and Autonomous Systems, Vol. 29, Issue 1, pp. 3-12, October 1999.
[3] Stone, P. and Veloso, M. "A Layered Approach to Learning Client Behaviors in the RoboCup Soccer Server", Applied Artificial Intelligence, 12:165-188, 1998.
[4] Farahnakian, F.; Mozayani, N. "Reinforcement Learning for Soccer Multi-agents System", International Conference on Computational Intelligence and Security (CIS '09), Beijing, China, pp. 50-52, December 11-14, 2009.
[5] Xiong, L.; Wei, C.; Jing, G.; Zhenkun, Z.; Zekai, H. "A New Passing Strategy Based on Q-learning Algorithm in RoboCup", International Conference on Computer Science and Software Engineering, pp. 524-527, December 12-14, 2008.
[6] Rabiee, A. and Ghasem-Aghaee, N. "A Scoring Policy for Simulated Soccer Agents Using Reinforcement Learning", 2nd International Conference on Autonomous Robots and Agents, Palmerston North, New Zealand, December 13-15, 2004.
[7] Leng, J.; Fyfe, C.; Jain, L. "Simulation and Reinforcement Learning with Soccer Agents", Multiagent and Grid Systems, Vol. 4, No. 4, pp. 415-436, 2008.
[8] Mota, L.; Lau, N.; Reis, L.P. "Co-ordination in RoboCup's 2D Simulation League: Setplays as Flexible, Multi-robot Plans", 2010 IEEE Conf. on Robotics, Automation and Mechatronics (RAM 2010), pp. 362-367, 2010.
[9] Mota, L.; Reis, L.P. "An Elementary Communication Framework for Open Co-operative RoboCup Soccer Teams", in Sapaty, P.; Filipe, J. (Eds.), 4th Int. Conf. on Informatics in Control, Automation and Robotics (ICINCO 2007), pp. 97-101, Angers, France, May 9-12, 2007.
[10] Mota, L.; Reis, L.P. "Setplays: Achieving Coordination by the Appropriate Use of Arbitrary Pre-defined Flexible Plans and Inter-robot Communication", RoboComm 2007 - First Int. Conf. on Robot Communication and Coordination, Athens, Greece, October 15-17, 2007.
[11] Lau, N.; Reis, L. P.; Mota, L.; Almeida, F. "FC Portugal 2D Simulation: Team Description Paper", online, available at: http://staff.science.uva.nl/~arnoud/activities/robocup/RoboCup2013/Symposium/TeamDescriptionPapers/SoccerSimulation/Soccer2D/, consulted Jan/2014.
[12] Mota, L.; Reis, L.P. "A Common Framework for Cooperative Robotics: an Open, Fault Tolerant Architecture for Multi-league RoboCup Teams", Int. Conf. on Simulation, Modeling and Programming for Autonomous Robots (SIMPAR), Springer, LNCS/LNAI series, pp. 171-182, Venice, Italy, Nov. 2008.
[13] Cravo, J. G. B. "SPlanner: a Graphical Application for the Flexible Definition of Setplays in RoboCup" (in Portuguese), MSc Dissertation, Integrated Master in Computer and Informatics Engineering, Faculty of Engineering, University of Porto, 2012. Online, available at: http://repositorioaberto.up.pt/bitstream/10216/62120/1/000149781.pdf, consulted Jan/2014.
[14] Akiyama, H. "Helios RoboCup Simulation League Team", online, available at: http://rctools.sourceforge.jp/pukiwiki/, accessed Jan/2014.
[15] Mota, L. "Multi-robot Coordination using Flexible Setplays: Applications in RoboCup's Simulation and Middle-Size Leagues", PhD Thesis, LIACC - Artificial Intelligence and Computer Science Lab., Faculty of Engineering, University of Porto. Advisors: L. P. Reis and N. Lau, 2012.
[16] Sutton, R. S.; Barto, A. G. Reinforcement Learning: An Introduction, MIT Press, Cambridge, Massachusetts, 1998.
[17] Dayan, P. "Technical Note: Q-learning", Centre for Cognitive Science, University of Edinburgh, Scotland, 1992.
[18] Neri, J.R.F.; Zatelli, M.R.; Farias dos Santos, C.H.; Fabro, J.A. "A Proposal of Q-Learning to Control the Attack of a 2D Robot Soccer Simulation Team", 2012 Brazilian Robotics Symposium and Latin American Robotics Symposium (SBR-LARS), pp. 174-178, October 16-19, 2012.
[19] Fabro, J. A.; Botta, A. L. C.; Parra, G. A. P.; Neri, J. R. F. "The GPR-2D 2013 Team Description Paper", online, available at: http://staff.science.uva.nl/~arnoud/activities/robocup/RoboCup2013/Symposium/TeamDescriptionPapers/SoccerSimulation/Soccer2D/, consulted Jan/2014.
[20] Lau, N.; Lopes, L. S.; Corrente, G.; Filipe, N. "Multi-Robot Team Coordination Through Roles, Positioning and Coordinated Procedures", Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS 2009), St. Louis, USA, Oct. 2009.
[21] Lau, N.; Lopes, L. S.; Corrente, G.; Filipe, N. "Roles, Positionings and Set Plays to Coordinate a MSL Robot Team", Proc. 14th Portuguese Conference on Artificial Intelligence (EPIA 2009), Aveiro, LNAI 5816, Springer, pp. 323-337, October 12-15, 2009.