Using Reinforcement Learning techniques to select the best Action in Setplays with multiple possibilities in Robocup Soccer Simulation teams *

João A. Fabro, Luis P. Reis and Nuno Lau
Abstract— Setplays are predefined collaborative coordinated actions that players of any sport can use to gain advantage over their adversaries. Recently, a complete framework for the creation and execution of this kind of coordinated behavior by teams composed of multiple independent agents was released as free software (the Setplay Framework). In this paper, an approach based on Reinforcement Learning (RL) is proposed that allows the use of experience to devise the best course of action in setplays with multiple choices. Simulation results show that the proposed approach allows a team of simulated agents to improve its performance against a known adversary team, achieving better results than previously proposed approaches using RL.
I. INTRODUCTION
Robotic soccer [1][2] is a research problem proposed as a challenge to the robotics community, using the soccer game as a standard comparison platform. In order to tackle this challenge, several "leagues" have been proposed, each one using a set of specific, simplified rules to evaluate the different proposed approaches. One of the first of these "leagues" was the simulation league, which focuses on Artificial Intelligence problems such as how to obtain collaborative behavior in an environment with multiple agents, limited communication capability, and noisy sensing of the environment. The simulation category thus abstracts away problems such as computer vision and mechanical and electronic construction, in order to focus on the development of distributed control algorithms. Eleven simulated soccer-playing agents (plus a coach agent) have to successfully execute complex cooperative behavior in order to play according to the rules and outmatch adversarial teams (by defending, passing the ball, and ultimately scoring more goals than the adversary). Each agent must situate itself in the field based on incomplete information sensed from the simulated environment, plan and execute its actions, cooperate with teammates, and try to prevent the adversary team from doing the same. Since the challenge was proposed [1], learning techniques have been considered for these skills [3], and in recent years their use has been evaluated in several specific situations with relative success [4-7].
One way to obtain complex collaborative behavior in an environment with limited communication is to "predefine" the behavior. This is what human soccer teams accomplish by using set pieces (also known as Setplays)1. Setplays are organized sequences of moves and actions that, when executed cooperatively in a synchronized way by a set of players, can enable the team to achieve certain objectives during the match.

* J. A. Fabro is with UTFPR – Federal University of Technology – Paraná (UTFPR), Av. Sete de Setembro, 3165, CEP 80230-901, Curitiba, Paraná, Brazil (corresponding author: +55 (41) 3310-4743; e-mail: [email protected]).
L. P. Reis is with DSI/School of Engineering, University of Minho – Guimarães, and is also with LIACC – Artificial Intelligence and Computer Science Lab., University of Porto, Portugal (e-mail: [email protected]).
N. Lau is with IEETA, UA – Inst. Eng. Elect. and Telematics of Aveiro, University of Aveiro, Portugal (e-mail: [email protected]).
1 See "http://www.professionalsoccercoaching.com/free-kicks/soccerfreekicks2" for a description and example of a complete Setplay.
Commonly, setplays are used in direct or indirect free kicks, after fouls. By executing a setplay, players of the team can surprise the adversary and reach advantageous situations and opportunities to score goals.
Mota et al. [8] have recently developed a set of tools to
specify and execute setplays in the context of Robocup
soccer-playing agents, testing the tools in both simulation
and real robots [9-11].
In this paper, a new form of learning is proposed that brings together the advantages of pre-planned setplay behavior with the adaptability provided by a learning approach.
Recently, a complete framework for the specification,
execution and graphic design of Setplays was introduced
[12]. Setplays are sequences of actions that should be
executed cooperatively by a set of players in order to achieve
an objective during the match. The most common setplays are used for corner kicks and direct free kicks (for example, after a foul in the game).
After this brief introduction, the remainder of this paper is organized as follows. Section 2 presents the Setplay Framework, a set of free software libraries and tools that allow the graphical definition of setplays and their subsequent execution by 2D simulation Robocup teams. In Section 3, a Reinforcement Learning approach to learning the best selection of actions in setplays with multiple options is presented and detailed. Section 4 presents some experimental results in simulated games, and Section 5 presents a discussion of the results and some conclusions.
II. THE SETPLAY FRAMEWORK
The Setplay Framework is a set of free software tools recently made available, composed of a library of classes (fcportugalsetplays)2, a graphic specification tool (FCPortugal SPlanner3 [13]), and a complete example team4, based on Agent2D [14], that can execute Setplays (FCPortugalSetplaysAgent2D). The FCPortugalSetplaysAgent2D team was developed using as its base the Agent2D source code (version 3.1.1), available at [14].
In order to use the Setplay Framework, the first step is to use the graphic tool (SPlanner) to specify the coordinated behavior. In Figure 1, a screenshot of the tool is presented. Using this tool, it is possible to develop complete specifications of coordinated behavior from any possible starting point of a setplay: a throw-in after the ball leaves the field over the sideline, corner kicks, and direct or indirect free kicks starting at any specified position of the field [13]. The tool can export all the relevant information to a standardized XML specification file, which can be parsed and executed by the Setplay Library [15]. This library provides all the functions necessary to coordinate actions among different robotic agents, within any robot soccer league.
2 http://www.sourceforge.org/projects/fcportugalsetplays
3 http://www.sourceforge.org/projects/fcportugalSPlanner
4 http://www.sourceforge.org/projects/fcportugalsetplaysagent2d
This library has already been successfully used to provide setplay ability in three different leagues: the 2D [11] and 3D simulation leagues, and a Middle Size League team in RoboCup [15].
The second step in the execution of a setplay is the start signal, which can be given by any soccer-playing robot. The integration of the player with the setplay library occurs when the player uses the library to evaluate the availability of a setplay for execution, and then uses the library to obtain the actions that should be executed in order to carry on the setplay execution. This is done by implementing a series of functions that inform the library about the state of the environment (for example, returning the estimated position of the player, the position of the ball, the positions of nearby teammates, and the possibility of executing a pass or a kick to the goal). It is also necessary to implement communication functions (allowing the exchange of simple messages among teammates) and action functions (implementing actions such as dribble, pass and kick). When a player identifies the necessary conditions to start a setplay (calling the function "feasibleSetplays" from the library), it can decide to start it by using the library to select the players that will participate (according to their distance to the positions in the beginning of the setplay - step 0), and then send messages informing them to get to their initial positions. After the "wait time" of the first step, the execution starts, and each player receives a message informing the current step and, by consulting the setplay library, fetches and executes its action for this step. Only the player currently in possession of the ball can send the message informing the transition to another step. The possible actions that this player can execute are: a direct pass to another teammate, a forward pass to an advanced position, to be intercepted by another player, a dribble, carrying the ball towards a predefined position, or a shot at the goal. At each step, there can be more than one possible action.
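The cycle just described can be summarized in a short sketch. Except for feasibleSetplays, which is the only function named in the text, every identifier below is a hypothetical stand-in for the world-state, communication and action callbacks a team must implement; this is an illustrative sketch, not the actual library interface.

```cpp
#include <cstdio>
#include <vector>

struct Action { const char* kind; };     // stand-in for the pass / dribble / shot actions

// Toy stand-in for the team's wrapper around the Setplay library; apart from
// feasibleSetplays (cited in the text), every member name here is hypothetical.
struct SetplayClientStub {
    std::vector<int> feasibleSetplays() const { return {3}; }   // e.g. one feasible setplay, id 3
    void start(int id) const            { std::printf("starting setplay %d\n", id); }
    Action actionForCurrentStep() const { return {"pass"}; }
    void announceStep(int step) const   { std::printf("announce transition to step %d\n", step); }
};

int main() {
    SetplayClientStub sp;
    bool ballOwner = true;                       // would come from the world-state callbacks
    std::vector<int> feasible = sp.feasibleSetplays();
    if (ballOwner && !feasible.empty()) {
        sp.start(feasible.front());              // select participants and send them to step 0
        Action a = sp.actionForCurrentStep();    // e.g. the kick that opens the setplay
        std::printf("execute action: %s\n", a.kind);
        sp.announceStep(1);                      // only the ball owner triggers the next step
    }
    return 0;
}
```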
In Figure 1 a complete example of a setplay with multiple options is presented, as graphically described with the SPlanner graphical tool. After a direct free kick inside the central region of the field, the player in possession of the ball (kicker) kicks it towards a teammate (receiver 1), while two other players (receivers 2 and 3) run to predefined positions, preparing themselves for a possible pass in the second step. When receiver 1 receives the ball, there are two options for the continuation of the setplay: to pass to receiver 2, or to pass to receiver 3. This exact situation appears in the graph description of Figure 2. In this step, the player in possession of the ball must choose the best option, but this decision depends on a series of conditions: the positions of the players, clear space for the passes, and the positions of the receivers. Since the game is fluid, this decision is hard to make. The approach proposed in this paper is to use a Reinforcement Learning technique to "learn" which decision works better.
In the next section, a brief explanation of the reinforcement learning technique (based on the Q-Learning mechanism) is presented, and after that the complete approach is detailed.
III. REINFORCEMENT LEARNING
Reinforcement Learning [16] was introduced as a formalism to allow agents to learn directly from their interaction with the environment, through feedback received about the results of actions. The technique is based on an algorithm that seeks to find the best action among a set of possible ones, given a specific situation. The mapping between situations and actions is called a "policy", and the selection of actions is random at first, but a "reward" function is used to adjust this selection, giving higher "strength" to actions that result in bigger rewards. Learning takes place during the interaction between the agent and its environment, using evaluations of the outcome of decisions to provide the feedback. The outcomes of decisions are reinforced if the results are considered "good". There are several such techniques, but the simplest is the Q-learning algorithm, which is extensively used in related work on simulated robot soccer in RoboCup [4-7].
A. Q-Learning
The Q-learning method is best suited to situations where the interaction between the agent and its environment can be modeled in a finite and discrete way. In the specific case of RoboCup soccer simulation, the interaction between the soccer-playing agents and the simulated environment is executed one step at a time, in a discrete and synchronized way: at each iteration, the agent receives local information about the environment in its vicinity from a set of simulated sensors (for example, the positions of players, the ball and relevant markers of the world that are in its field of vision, auditory messages from teammates in its vicinity, and commands from the automated referee). In order to execute any action in the environment, the agent must send commands to the simulator, which vary according to the category, but usually consist of an action to be executed during the next time step. By discretizing the observed state of the world, and by choosing just one action among a finite set of available ones, it is possible to model the mapping between world states and actions, and then use a reinforcement learning technique to find a policy, if there is any way of providing a reward that is proportional to the results of the actions [17].
Each step of the Q-learning update algorithm is defined by the following expression:

Q(s_t, a_t) = Q(s_t, a_t) + α [ r_t + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]     (1)
where:
- s_t corresponds to the current state;
- a_t is the action taken at state s_t;
- r_t is the reward received for taking action a_t at state s_t;
- s_{t+1} is the next state;
- γ (gamma) is the discount factor (0 < γ < 1);
- α (alpha) is the learning rate (0 <= α < 1).
The function Q(s_t, a_t) is the value associated with the state-action pair (s_t, a_t) and represents how good the choice of this action is for maximizing the cumulative return function. The action-value function Q(s_t, a_t), which stores the reinforcements received, is updated from its current value for each state-action pair. Thus, a part of the reinforcement received in a state is transferred to the state prior to it.
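As a concrete illustration of equation (1), the sketch below applies one Q-learning update to a small table. The table dimensions and the values of α and γ are illustrative assumptions, not parameters reported in this paper.

```cpp
#include <algorithm>
#include <array>

// Illustrative sizes; they are not taken from the paper.
constexpr int kStates  = 12;
constexpr int kActions = 12;
using QTable = std::array<std::array<double, kActions>, kStates>;

// One application of equation (1): move Q(s_t, a_t) towards the received reward
// plus the discounted best value reachable from the next state.
void qUpdate(QTable& Q, int s, int a, double reward, int sNext,
             double alpha = 0.1, double gamma = 0.9) {
    double bestNext = *std::max_element(Q[sNext].begin(), Q[sNext].end());
    Q[s][a] += alpha * (reward + gamma * bestNext - Q[s][a]);
}

int main() {
    QTable Q{};                                   // all values start at zero
    qUpdate(Q, /*s=*/2, /*a=*/3, /*reward=*/100.0, /*sNext=*/3);
    return 0;
}
```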
Figure 1: Setplays definition tool - SPlanner - and example of setplay.
In the Q-learning algorithm, the choice of the action to be performed in a given state of the environment can be made with any exploration/exploitation criterion, including randomly. A widely used policy is called ε-greedy, where the agent can choose a random action, or the action that has the largest value in Q(s_t, a_t) [17]. The choice with the ε-greedy policy occurs as follows (equation 2):

a_t = a_random, if q ≤ ε; a_t = argmax_a Q(s_t, a), otherwise     (2)

If the value of q, chosen at random, is less than the value ε that was set, a random action is selected; otherwise, the action in the table Q(s_t, a_t) with the largest reinforcement value assigned is selected.
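A minimal sketch of this selection rule is given below. The Q values and the ε of 0.2 used in the example are illustrative (0.2 is the exploration rate adopted later for the setplay experiments).

```cpp
#include <algorithm>
#include <cstdio>
#include <random>
#include <vector>

// ε-greedy choice of equation (2): with probability ε pick a random action,
// otherwise the action with the largest accumulated value Q(s_t, a).
int epsilonGreedy(const std::vector<double>& qRow, double epsilon, std::mt19937& rng) {
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    if (coin(rng) <= epsilon) {                                  // explore
        std::uniform_int_distribution<int> pick(0, static_cast<int>(qRow.size()) - 1);
        return pick(rng);
    }
    return static_cast<int>(std::distance(qRow.begin(),          // exploit
                            std::max_element(qRow.begin(), qRow.end())));
}

int main() {
    std::mt19937 rng(7);
    std::vector<double> qRow = {0.0, 55.0, 85.0, 35.0};          // illustrative Q values for one state
    std::printf("chosen action: %d\n", epsilonGreedy(qRow, 0.2, rng));
    return 0;
}
```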
Based on a complete team available in [14], a new decision-making procedure was implemented allowing the agents inside the attack area to take action based on previous experience – a reinforcement learning technique. Only the player in possession of the ball would take action based on previous experience using the reinforcement learning approach – the Q-Learning algorithm – which allows the simulated agents to select the best action when in possession of the ball. The reinforcement approach "reinforces" actions that lead to goals scored. Details of the learning procedure used can be found in [18]. In Section 4, the application of this algorithm to the selection of Transitions in multiple-choice Steps of a Setplay is presented.
IV. THE PROPOSED APPROACH – APPLYING MACHINE LEARNING TO ACTION SELECTION ON MULTIPLE-CHOICE SETPLAYS
The approach proposed for the team FCP_GPR_2014 is to enable the selection of the next Transition, when on a given Step of a Setplay, using Machine Learning. The example Setplay presented in Fig. 1 has several Steps in which there are multiple possible Transitions (Steps 2, 3, 7 and 9). The graphical representation of the setplay as a directed graph, shown in the lower left corner of the figure, is detailed in Fig. 2.
Figure 2: Graph of the Setplay with multiple options of Transitions - example.
In the original Setplay framework, the selection of which transition to execute was defined by conditions, i.e., if the receiving player is positioned, or if the pass has a low probability of being intercepted, the Transition was chosen. If more than one Transition was "enabled", usually the first one was executed.
The proposed approach is to let an adaptive procedure select the best transition to execute (when there is more than one possibility) by reinforcement learning. In the graph
of figure 2, this happens in states 2 (with possible transitions
to states 3 or 7), state 3 (possible transitions to states 4, 5 or
6), state 7 (possible transitions to 8 or 9) and state 9 (possible
transitions to 10 or 11). Using the Q-Learning algorithm to
evaluate the rate of success of each decision, it is possible to
infer the option with higher chance of success, given enough
opportunities for the learning to take place. To obtain this
kind of adaptive behavior, a matrix correlating Steps and
Transitions is proposed, as exemplified in Table 1. For the
setplay shown in figure 1, and detailed in figure 2, 10 states
were defined, and 11 possible actions (transitions) leading to
a 10x11 matrix. This matrix is then used by the Q-learning algorithm to infer the best policy, i.e., which transition is the best to choose in each state. Currently, if there is only one option, such as in states 0 and 1, the algorithm has no effect on the decision. But in the multi-option states, it is possible to infer the most rewarding options, simply by providing reinforcement to every successfully completed action. Thus, during the execution of the Setplay, the player currently in possession of the ball evaluates the possible actions, choosing the one with the best accumulated reward.
If the action is correctly performed (for example, the pass is
correctly received by the teammate), this option is rewarded,
providing reinforcement (the reinforcement could be different for each action, but in our current approach every reinforcement is constant and equal to 100). In order to explore every possible action, an ε-greedy policy allows a random action to be chosen with a defined probability (in our case, 20%).
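The sketch below ties the ε-greedy selection to the Step x Transition table of Table I: the ball owner picks among the transitions enabled in the current Step (random with probability 0.2, as stated above) and, when the chosen action succeeds, credits the constant reinforcement of 100 to that cell. The data layout and the simplified accumulation of the reward (instead of the full update of equation (1)) are assumptions made only for illustration.

```cpp
#include <random>
#include <vector>

// Sketch of the Step-by-Transition table of Table I. choose() applies the ε-greedy
// rule to the transitions enabled in the current Step; rewardSuccess() adds the
// constant reinforcement of 100 mentioned in the text.
struct SetplayQ {
    std::vector<std::vector<double>> q;          // q[step][transition]
    std::mt19937 rng{42};

    SetplayQ(int steps, int transitions)
        : q(steps, std::vector<double>(transitions, 0.0)) {}

    int choose(int step, const std::vector<int>& enabled, double eps = 0.2) {
        std::uniform_real_distribution<double> coin(0.0, 1.0);
        if (coin(rng) < eps) {                   // explore one of the enabled transitions
            std::uniform_int_distribution<int> pick(0, static_cast<int>(enabled.size()) - 1);
            return enabled[pick(rng)];
        }
        int best = enabled.front();              // exploit: best accumulated reward so far
        for (int t : enabled)
            if (q[step][t] > q[step][best]) best = t;
        return best;
    }

    void rewardSuccess(int step, int transition) { q[step][transition] += 100.0; }
};

int main() {
    SetplayQ table(11, 12);                      // steps 0..10, transitions 1..11 (index 0 unused)
    int t = table.choose(2, {3, 7});             // Step 2 of Fig. 2: pass towards state 3 or state 7
    table.rewardSuccess(2, t);                   // e.g. the chosen pass was correctly received
    return 0;
}
```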
TABLE I. Q-LEARNING MATRIX FOR THE SETPLAY OF FIG. 2.

              Transition to→
Step:↓      1    2    3    4    5    6    7    8    9   10   11
   0      100    -    -    -    -    -    -    -    -    -    -
   1        -  100    -    -    -    -    -    -    -    -    -
   2        -    -   75    -    -    -   20    -    -    -    -
   3        -    -    -   55   85   35    -    -    -    -    -
   4        -    -    -    -    -    -    -    -    -    -    -
   5        -    -    -    -    -    -    -    -    -    -    -
   6        -    -    -    -    -    -    -    -    -    -    -
   7        -    -    -    -    -    -    -   20   60    -    -
   8        -    -    -    -    -    -    -    -    -    -    -
   9        -    -    -    -    -    -    -    -    -   70   40
  10        -    -    -    -    -    -    -    -    -    -    -
As can be seen in Table I, there is only one used cell in each column of the matrix. This is due to a restriction defined by the Setplay Framework: the "graph" representing the execution of the setplay must not have cycles, and is thus better represented as a "tree". In this case, it is possible to represent all the reinforcement elements of the Q-Learning matrix in one single line, where each position represents the "quality" of the transition to that state. In Table II, the same information presented in Table I is re-adjusted to use only one line of the matrix. It is now necessary to use some additional information about the "tree" in order to execute the correct calculations (i.e., in order to choose the best transition in the matrix represented in Table I, it suffices to find the index of the column with the highest value; to execute the same calculation on the single line of Table II, it is also necessary to know which transitions are "possible" from each state, and choose the highest among them).
TABLE II. Q-LEARNING MATRIX (REPRESENTED AS A SINGLE LINE) FOR THE COMPLETE SETPLAY OF FIG. 2.

              Transition to→
Setplay:↓   1    2    3    4    5    6    7    8    9   10   11
   0      100  100   75   55   85   35   20   20   60   70   40
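To make the single-line selection concrete, the sketch below stores the row of Table II together with a small map, read off the graph of Fig. 2, of which transitions are possible from each multi-option state; the data structures themselves are assumptions, chosen only to illustrate the restricted argmax described above.

```cpp
#include <cstdio>
#include <map>
#include <vector>

// Picks, among the transitions reachable from the current state, the one with the
// highest accumulated value in the single-line representation of Table II.
int bestTransition(const std::vector<double>& row, const std::vector<int>& possible) {
    int best = possible.front();
    for (int t : possible)
        if (row[t] > row[best]) best = t;
    return best;
}

int main() {
    // Index 0 unused; transitions 1..11 hold the values shown in Table II.
    std::vector<double> row = {0, 100, 100, 75, 55, 85, 35, 20, 20, 60, 70, 40};
    // Possible transitions from each multi-option state, taken from the graph of Fig. 2.
    std::map<int, std::vector<int>> possibleFrom = {
        {2, {3, 7}}, {3, {4, 5, 6}}, {7, {8, 9}}, {9, {10, 11}}};
    std::printf("best transition from state 3: %d\n",
                bestTransition(row, possibleFrom[3]));   // prints 5 (value 85)
    return 0;
}
```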
By applying this optimized representation, it is possible to accommodate a complete set of different setplays in one single learning matrix, using the unique setplay ID number as the index of the line, and as many columns as needed by the biggest setplay (in terms of number of steps). Usually, a team should have one setplay for each of the following situations: kick-off, keeper catch, goal kick, and corner kick (although it is possible to have two for each situation, if it is necessary to have different behavior depending on the side of the field – left or right; usually just allowing the setplay to be invertible is enough). But for the following situations, there can exist one setplay for each position where it begins: throw-ins and direct and indirect free kicks. There are 6 different positions where a throw-in can start (our back, our middle, our front, their front, their middle and their back). In the case of direct or indirect free kicks, the initial position can be any combination of these 6 positions and 6 transverse areas (far left, mid left, center left, center right, mid right and far right), giving a total of up to 36 possibilities. Despite this large number of possible setplays, the framework limits the number of different setplays that can be used simultaneously by a team to only 63, so the biggest matrix needed would be 63 lines by the maximum number of states of the biggest setplay. During the "training phase" of the Q-Learning algorithm, this matrix has to be continuously updated, every time a setplay action is successfully executed. But since only the player in possession of the ball is responsible for this update, there are no "race conditions" that can lead to corruption of the matrix, so it is saved to and read from a standard text file, stored on a shared file system to which all players have reading and writing access. After the learning reaches a stabilization point, the matrix can be "frozen", and then needs only to be read once, during the initialization of each agent, and used to select the best action in each state of any setplay.
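A sketch of this plain-text persistence is given below; the paper does not specify the file layout, so the whitespace-separated format, the file name and the matrix dimensions are assumptions used only to illustrate the save/load cycle.

```cpp
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Writes one line per setplay, with whitespace-separated values (one per transition).
void saveMatrix(const Matrix& m, const std::string& path) {
    std::ofstream out(path);
    for (const std::vector<double>& row : m) {
        for (double v : row) out << v << ' ';
        out << '\n';
    }
}

// Reads the matrix back, one row per line.
Matrix loadMatrix(const std::string& path) {
    Matrix m;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream ss(line);
        std::vector<double> row;
        for (double v; ss >> v; ) row.push_back(v);
        if (!row.empty()) m.push_back(row);
    }
    return m;
}

int main() {
    Matrix q(63, std::vector<double>(12, 0.0));   // up to 63 setplays, one column per transition
    saveMatrix(q, "setplay_q.txt");               // written only by the ball owner, after a success
    Matrix frozen = loadMatrix("setplay_q.txt");  // read once at agent start-up when "frozen"
    (void)frozen;
    return 0;
}
```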
V. EXPERIMENTS
In this section, some experiments that were carried out to evaluate the proposed approach are presented. To provide a first baseline, the FCPortugalSetplaysAgent2D code was initially evaluated. Using the default set of setplays provided by the author, 100 games were simulated against the well-known and widely used Agent2D 3.1.1. During these experiments, it was concluded that the FCPortugalSetplaysAgent2D code, by providing the use of coordinated behavior, obtained an improvement over Agent2D, with 62 victories and only 32 defeats during regular play time (6 games finished in a draw, and during the penalty shootouts each team won 3 games). By converting these results to probabilities, it can be clearly seen that the use of setplays leads to 65% of victories, against the expected 50% in the case of teams with the same level of gameplay. After these simulations, the setplays were modified to allow multiple-choice options, and 1000 games were simulated between this new version (called FCP_Setplays_RL from now on) and the Agent2D 3.1.1 code, in order to accomplish the stabilization of the Q-Learning matrix. After that, another 100 games were simulated, but with the frozen matrix learned in the previous 1000 simulations. The results improved, with FCP_Setplays_RL obtaining 91 victories against only 7 defeats (and two draws), resulting in an impressive winning percentage of 93%. Against another team that also uses Reinforcement Learning, GPR2013, the results were 52 victories and 21 defeats, with 27 draws, and a percentage of victories of 71%. Against the three best-placed teams, however, the results, although promising, were not as good. Against WrightEagle2013 (the current champion), there were 8 victories and 88 defeats (6 draws), resulting in an 8.5% chance of winning. Against Helios2013 (vice-champion), 6 victories and 84 defeats (10 draws), with a 6.67% chance of winning; and against Yushan2013 (third place), 18 victories and 51 defeats (31 draws), with a 26% chance of winning a match.
VI. CONCLUSIONS AND DISCUSSION
The approach proposed in this paper joins the coordinated behavior proposed by FCPortugalSetplaysAgent2D with a reinforcement learning algorithm used to decide the best actions according to specific situations during the game. The results obtained during the simulation experiments demonstrate the improvement obtained by allowing adaptive behavior during the decision-making process, when setplays have more than one possible option of action to choose from. In order to evaluate the proposed approach, several setplays with multiple options were created (using a toolset recently developed and made available as free software) and evaluated using a simple Q-Learning algorithm.
The experimental simulations show that the results are promising, but more experiments must be executed, with new multi-choice setplays, in order to fully evaluate the advantages that can be obtained by the use of the proposed approach (in this evaluation, only one setplay, presented as an example, was used in order to evaluate the improvement of team performance after the use of Q-Learning to select the "best" choices of actions). In comparison with Agent2D 3.1.1, the inclusion of Reinforcement Learning in the team with Setplay capability increased the performance from 65% (using only a "static" setplay) to 93% (with the use of the "multiple-choice" setplay and learning to select the best option).
Against intermediate teams, such as FCPortugal2013 and GPR2D2013, the use of multiple-choice setplays and RL also presented good results, with winning chances of 75% (against FCPortugal2013) and 71% (against GPR2D2013). But against the best teams, the results still have a lot of room for improvement, with less than a 10% chance of winning against both the current champion (WrightEagle2013) and the vice-champion (Helios2013), but with a 25% chance of winning against the third-placed team, Yushan2013.
In the near future, it is planned to use other approaches to improve the adaptability of the team, allowing training against several different adversary teams, and using an automatic procedure to identify the style of play of the adversary, in order to select the matrix that provides the best results against that specific adversary. Other ideas include the use of some kind of heuristic search to automatically adjust the positions of the players during the execution of each setplay.
ACKNOWLEDGMENT
The first author would like to thank CAPES for his
scholarship, process No. BEX 9292/13-6. All the authors
would also like to acknowledge the Robocup community for
its support, specially team Helios [14], and Luís Mota, for
the development of the Setplay Framework and support in
the development of this work.
REFERENCES
[1] Kitano, H.; Asada, M.; Kuniyoshi, Y.; Noda, I. "The RoboCup Synthetic Agent Challenge", International Joint Conference on Artificial Intelligence (IJCAI), Nagoya, Japan, 1997.
[2] Asada, M.; Kitano, H. "The RoboCup Challenge", Robotics and Autonomous Systems, Volume 29, Issue 1, pp. 3-12, October 1999.
[3] Stone, P.; Veloso, M. "A Layered Approach to Learning Client Behaviors in the RoboCup Soccer Server", Applied Artificial Intelligence, 12:165-188, 1998.
[4] Farahnakian, F.; Mozayani, N. "Reinforcement Learning for Soccer Multi-agents System", International Conference on Computational Intelligence and Security (CIS '09), Beijing, China, pp. 50-52, December 11-14, 2009.
[5] Xiong, L.; Wei, C.; Jing, G.; Zhenkun, Z.; Zekai, H. "A new passing strategy based on Q-learning algorithm in RoboCup", International Conference on Computer Science and Software Engineering, pp. 524-527, December 12-14, 2008.
[6] Rabiee, A.; Ghasem-Aghaee, N. "A Scoring Policy for Simulated Soccer Agents using Reinforcement Learning", 2nd International Conference on Autonomous Robots and Agents, Palmerston North, New Zealand, December 13-15, 2004.
[7] Leng, J.; Fyfe, C.; Jain, L. "Simulation and reinforcement learning with soccer agents", Multiagent and Grid Systems, Vol. 4, No. 4, pp. 415-436, 2008.
[8] Mota, L.; Lau, N.; Reis, L.P. "Co-ordination in RoboCup's 2D simulation league: Setplays as flexible, multi-robot plans", 2010 IEEE Conf. on Robotics, Automation and Mechatronics (RAM 2010), pp. 362-367, 2010.
[9] Mota, L.; Reis, L.P. "An Elementary Communication Framework for Open Co-operative RoboCup Soccer Teams", in Sapaty, P.; Filipe, J. (Eds.), 4th Int. Conf. on Informatics in Control, Automation and Robotics (ICINCO 2007), pp. 97-101, Angers, France, May 9-12, 2007.
[10] Mota, L.; Reis, L.P. "Setplays: Achieving Coordination by the appropriate Use of arbitrary Pre-defined Flexible Plans and inter-robot Communication", RoboComm 2007 - First Int. Conf. on Robot Communication and Coordination, Athens, Greece, October 15-17, 2007.
[11] Lau, N.; Reis, L.P.; Mota, L.; Almeida, F. "FC Portugal 2D Simulation: Team Description Paper", online, available at: http://staff.science.uva.nl/~arnoud/activities/robocup/RoboCup2013/Symposium/TeamDescriptionPapers/SoccerSimulation/Soccer2D/, consulted Jan/2014.
[12] Mota, L.; Reis, L.P. "A Common Framework for Cooperative Robotics: an Open, Fault Tolerant Architecture for Multi-league RoboCup Teams", Int. Conf. on Simulation, Modeling and Programming for Autonomous Robots (SIMPAR), Springer, LNCS/LNAI series, pp. 171-182, Venice, Italy, November 2008.
[13] Cravo, J. G. B. "SPlanner: a graphical application for the Flexible definition of Setplays in Robocup" (in Portuguese), MSc Dissertation, Integrated Master in Computer and Informatics Engineering, Faculty of Engineering, University of Porto, 2012. Online, available at: http://repositorio-aberto.up.pt/bitstream/10216/62120/1/000149781.pdf, consulted Jan/2014.
[14] Akiyama, H. "Helios RoboCup Simulation League Team", online, available at: http://rctools.sourceforge.jp/pukiwiki/, accessed Jan/2014.
[15] Mota, L. "Multi-robot Coordination using Flexible Setplays: Applications in RoboCup's Simulation and Middle-Size Leagues", PhD Thesis, LIACC - Artificial Intelligence and Computer Science Lab., Faculty of Engineering, University of Porto. Advisors: L. P. Reis and N. Lau, 2012.
[16] Sutton, R. S.; Barto, A. G. Reinforcement Learning: An Introduction. MIT Press, Cambridge, Massachusetts, 1998.
[17] Dayan, P. "Technical Note: Q-learning", Centre for Cognitive Science, University of Edinburgh, Scotland, 1992.
[18] Neri, J.R.F.; Zatelli, M.R.; Farias dos Santos, C.H.; Fabro, J.A. "A Proposal of QLearning to Control the Attack of a 2D Robot Soccer Simulation Team", 2012 Brazilian Robotics Symposium and Latin American Robotics Symposium (SBR-LARS), pp. 174-178, October 16-19, 2012.
[19] Fabro, J. A.; Botta, A. L. C.; Parra, G. A. P.; Neri, J. R. F. "The GPR-2D 2013 Team Description Paper", online, available at: http://staff.science.uva.nl/~arnoud/activities/robocup/RoboCup2013/Symposium/TeamDescriptionPapers/SoccerSimulation/Soccer2D/, consulted Jan/2014.
[20] Lau, N.; Lopes, L. S.; Corrente, G.; Filipe, N. "Multi-Robot Team Coordination Through Roles, Positioning and Coordinated Procedures", Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS 2009), St. Louis, USA, October 2009.
[21] Lau, N.; Lopes, L. S.; Corrente, G.; Filipe, N. "Roles, Positionings and Set Plays to Coordinate a MSL Robot Team", Proc. 14th Portuguese Conference on Artificial Intelligence (EPIA 2009), Aveiro, LNAI 5816, Springer, pp. 323-337, October 12-15, 2009.