Framework for Classical Conditioning in a Mobile Robot:
Development of Pavlovian Model and Development of Reinforcement
Learning Algorithm to Avoid and Predict Noxious Events

Project Report

Quentin Delahaye

Studies from the Department of Technology at Örebro University
Örebro 2014

Supervisors: Dr. Andrey Kiselev, Dr. Amy Loutfi
Examiner: Prof. Franziska Klügl

© Quentin Delahaye, 2014
Abstract
Nowadays, robots have more and more sensors, and current technologies allow
using them with fewer constraints than before. Sensors are important for learning
about the environment, but they can also be used for classical conditioning to
create behaviors for the robot. One of the behaviors developed in this thesis is
avoiding and predicting obstacles.
The goal of this thesis is to propose a model which consists of developing a
specific behavior to avoid noxious events such as obstacles.
Contents

1 Introduction
2 Background and Related Works
  2.1 Ultimate Scenario and Tools
  2.2 Types of Obstacles
  2.3 Methods to Detect Obstacles
3 Method
  3.1 Reinforcement Learning and Model of Classical Conditioning
  3.2 Different Models to Compute the Associative Strength
  3.3 Rescorla-Wagner Model of Pavlovian Conditioning and Reinforcement Learning
4 Implementation
  4.1 Global Architecture
  4.2 Storing the V-Values on the Map of the Environment
  4.3 Estimation of the Value of the Constants
  4.4 Rescorla-Wagner Implementation
  4.5 Code Implementation
    4.5.1 Loop Algorithm
  4.6 Computing the Position of Obstacles
    4.6.1 Implementation of the Algorithm to Compute the Position of Events
5 Evaluations
  5.1 Observation Results of Computing the Position of Events
  5.2 Observation Results
    5.2.1 Experiment 1
    5.2.2 Experiment 2
6 Discussion and Future Works
References
List of Figures

2.1 TurtleBot [2]
3.1 Representation of Pavlovian conditioning
4.1 Global Architecture
4.2 Different modules implemented and used for the Conditioning Unit
4.3 Picture of the matrix drawn with the value of V in each cell
4.4 Associative strength value after 30 trials
4.5 Associative strength value after 30 trials
4.6 Algorithm of the loop
4.7 Position of obstacles according to the event
5.1 Matrix of the environment after each forward bumper hit an obstacle (cell size: 40 x 40 cm)
5.2 Picture of experiment 1
5.3 Schema explaining the different paths taken by the robot
5.4 V-value displayed on the matrix with the position of the robot in green
5.5 Picture of the measurement with a box
5.6 Plan of the movement of the TurtleBot to measure the position of the box in the first hour
5.7 Evolution of the V-value of each cell from step 1 to 8 (red and blue lines correspond to walls)
5.8 Evolution of the V-value of each cell from step 1 to 11 (red and blue lines correspond to walls)
5.9 Evolution of the V-value (from step 3 to 11) of the cell which corresponds to the hit with the box at step 3
List of Algorithms

1 Pseudo code to increase the associative strength V-value in the matrix
2 Pseudo code to decrease the associative strength V-value in the matrix
Chapter 1
Introduction
The notion of reflex was introduced by Thomas Willis in the 17th century [11]. A
reflex is an involuntary response to a stimulus; for example, we quickly withdraw
the hand when it touches something scorching. Ivan P. Pavlov distinguished two
different types of reflexes: the unconditioned reflex (UR) and the conditioned reflex
(CR), which is acquired individually [11]. The UR is a reaction to an unconditioned
stimulus (US) and the CR is a reaction to a conditioned stimulus (CS). This
physiologist demonstrated that after a few trials in which the CS and US occurred
simultaneously, the CS alone was enough to trigger the reflex response. Later,
Skinner showed that the response to a CS can be reinforced by its consequences [11]
and thereby modify behavior; this is operant conditioning.
In neuroscience, many models have been developed which allow a mathematical
approach to classical conditioning [5]. In this thesis I am inspired by the work of
Robert A. Rescorla and Allan R. Wagner [15]. A few other algorithms inspired by
Rescorla-Wagner have also been developed, such as Temporal-Difference (TD)
learning [13] or Q-Learning, which "is a method for solving reinforcement learning
problems" [7]. Each method has advantages and drawbacks; we will see later in
the thesis which one corresponds to our goal.
In our case, the time and the "recognition of place" of the robot have an
important impact on the development of conditioned reflexes. Time can be a
significant aspect in developing conditioned reflexes. Indeed, in the animal world,
the effect of a US is reinforced when there is regularity in time; the time then acts
as a CS. This is the case in a study on the effect of drugs on rats, which demonstrated
that the effect of the drug depended on the periodicity and the hour at which the
rats got the injection of amphetamine [3]. The more regular the injection, the
bigger the effect. This notion of periodicity could be used to implement
reinforcement learning in a robot: the robot avoids obstacles or people in a crowded
public area at a specific time because it had already met them the day before at the
same hour.
The goal of this thesis is to propose a model which consists of developing a
specific behavior to avoid noxious events such as obstacles.
This thesis reports the work which has been done in cooperation with
two other students: Kaushik Raghavan and Rohith Rao Chennamaneni. Their
works are respectively about "Integration of Various Kinds of Sensors and Giving
Reliable Information About the State of the Environment" and "Behavior
and Path Planning". This thesis is focused on the implementation of an algorithm
to develop conditioned reflexes with reinforcement learning on obstacles.
The report is organized as follows. Chapter 2 defines the goal. Chapter 3
describes our proposed method in detail. The implementation is given in
Chapter 4. The evaluation of the experiments is given in Chapter 5 and a discussion
of the thesis in Chapter 6.
Chapter 2
Background and Related Works
2.1 Ultimate Scenario and Tools
Nowadays it is easy for a robot to store the map of a building and to move while
determining its current position.
In the project we use a TurtleBot [2] (Fig 2.1) – a differential-drive robot
which has multiple sensors and runs ROS [1].
In this project, we consider the ultimate scenario of a guide robot for blind
people which is capable of offering the best possible route for a person to reach
a destination. The small size of this robot makes it convenient for guiding blind
people through an unknown building where directions are only displayed on signs.
To guide people, the robot has to elaborate a strategy and find the most comfortable
path for the person. When assisting a blind person, it will have to generate a
comfortable route by already having knowledge of the path and being able to
predict the position of possible obstacles. To fit this example, the best approach is
a robot with a behavior similar to a dog, which has some basic conditioned and
unconditioned reflexes and a memory to recognize the place. We were inspired by
this type of situation to develop useful tools using the sensors, motor driver and
other components of the robot to analyze the environment, avoid collisions, and
improve reliability.
We use the ROS middleware to communicate with the robot hardware and build
our application in such a way that it can easily be ported to other robots. The
different tools we used are:
• RGB image from the Kinect with a field of view of ±45° in diagonal
• Depth cloud from the Kinect with a field of view of ±45° in diagonal
• Three Forward bumpers
• The velocity
• The wheel drop
Figure 2.1: TurtleBot [2]
• Motor driver
• Goal trajectory (local planning between two close points)
• TF odometry
• 2 degrees of freedom of the TurtleBot
• Battery Info
Sensor data is pre-processed, combined and interpreted to provide input
events for developing conditioned reflexes.
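As an illustration of how such an input event could be produced, the sketch below is a minimal ROS node (not the project's actual code) that listens to bumper events and prints the corresponding sentences. It assumes a Kobuki-based TurtleBot where bumper events are published on the topic /mobile_base/events/bumper as kobuki_msgs/BumperEvent; the topic name and message type may differ on other bases.

// Minimal sketch, assuming a Kobuki-based TurtleBot: subscribe to bumper
// events and print the sentence used as an unconditioned stimulus.
#include <ros/ros.h>
#include <kobuki_msgs/BumperEvent.h>

void bumperCallback(const kobuki_msgs::BumperEvent::ConstPtr& msg)
{
    if (msg->state != kobuki_msgs::BumperEvent::PRESSED)
        return;  // ignore bumper release events
    switch (msg->bumper)
    {
        case kobuki_msgs::BumperEvent::LEFT:
            ROS_INFO("I hit something on the left");
            break;
        case kobuki_msgs::BumperEvent::CENTER:
            ROS_INFO("I hit something in front of me");
            break;
        case kobuki_msgs::BumperEvent::RIGHT:
            ROS_INFO("I hit something on the right");
            break;
    }
}

int main(int argc, char** argv)
{
    ros::init(argc, argv, "bumper_listener");
    ros::NodeHandle nh;
    ros::Subscriber sub = nh.subscribe("/mobile_base/events/bumper", 10, bumperCallback);
    ros::spin();  // process incoming bumper events until shutdown
    return 0;
}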
2.2 Types of Obstacles
In dynamic environments there are two types of obstacles: static and dynamic
obstacles [4]. However, another type of obstacle is the crowd flow. It can occupy
the same position every day during the same time period. This is the case for traffic
in buildings during busy hours. Most traffic areas are crowded at different hours
(such as arrival, lunch and departure). There might be some fixed patterns of
crowd flow at some places at the same time, and most of the time this crowd is
congested in the corners or before a turn [9].
2.3 Methods to Detect Obstacles
SLAM (Simultaneous Localization and Mapping) is implemented in many robots.
It consists in building a map while estimating the position of the robot. Each
detected obstacle is directly placed in the map [10]. According to [4], SLAM detects
static and dynamic obstacles, but only the dynamic obstacles are analyzed and
filtered from the data generated by the SLAM.
According to Amund Skavhaug and Adam L. Kleppe there are three ways to
store a description of an obstacle in a map: "the vector approach and the grid
approach" [10], and particle filters.
The vector approach is used in GraphSLAM [20]. In this case the obstacle is
represented as a vector which contains different parameters describing it. But
according to Skavhaug and Kleppe it is hard to find the parameters that describe
the obstacle.
The grid map represents the world as a two-dimensional map split into cells of
equal size. According to [6], it is used for indoor applications. Each cell corresponds
to an area and contains a value which estimates the probability of the cell being
occupied or empty. However, this system does not store any information about
the obstacles themselves.
The last method is a mix between the two previous ones. Particle filters are
floating points randomly drawn on the map [19]. When an obstacle comes into
contact with the particles, a point is marked on the SLAM map to represent this
obstacle, and the positions of these marked points are stored. "The advantage
of this method is that it is multimodal" [10].
There are different methods to detect dynamic obstacles. In the article about
"Real-time Detection of Dynamic Obstacle Using Laser Radar" [6], the authors
used spatial and temporal differencing with a grid map to detect dynamic obstacles.
They determine dynamic obstacles by comparing in real time the cost of each cell
with three different grid maps at three different times.
In our case we propose a solution close to the grid map, and we use the classical
conditioning model to develop a cost for each cell.
Chapter 3
Method
3.1 Reinforcement Learning and Model of Classical
Conditioning
"Reinforcement learning is learning by interacting with an environment" [21].
It is one branch of operant conditioning and is subdivided into two further branches:
positive and negative reward [18]. In our case we use negative reinforcement
learning in order to avoid noxious rewards. For example, the robot can avoid
hitting a person, which for the robot means avoiding data from the bumper (the
bumper is considered a noxious reward).
To allow the robot to learn the position and the time of crowded areas,
we propose to create associations between one conditioned stimulus and a reward,
and between one unconditioned stimulus and a reward. The result consists in
predicting the noxious reward event by using only the CS.
In this thesis, we create a connection between position/time (CS) and the
reward, and another connection between the events interpreted from the sensors
(US) and the reward (see figure 3.1). After the CS has occurred a few times, the
associative strength between it and the reward becomes stronger and stronger.
Figure 3.1: Representation of Pavlovian conditioning [8]
3.2 Different Models to Compute the Associative Strength
There are different methods to compute the strength value of the association
(between reward and CS):
• Rescorla-Wagner method (RW)
• TD: Temporal-Difference learning
• Q-learning
The particularity of TD is that it uses the time between CS and US [13, 5]; it is
called a real-time model [16]. In this thesis we do not use this method because we
directly link the CS and the reward; we do not compute the time between the two
agents (CS and US).
Q-learning models the delay of the prediction of a reward ("immediate
reward", "delayed reward", and "pure-delayed reward" [7]). In our situation
we use only one type of reward, which is periodic and not delayed. Indeed, a
delayed reward is linked to the time between an event and a reward, not to the
periodicity of the event.
The Rescorla-Wagner model has the advantage of being simple to implement,
and it allows developing another aspect which is close to animals: the inhibition of
a conditioned stimulus [17]. To simplify the implementation, we develop Pavlovian
conditioning by using the Rescorla-Wagner formula [14] and we propose a model
of reinforcement learning adapted to our subject.
3.3 Rescorla-Wagner Model of Pavlovian Conditioning and Reinforcement Learning
According to Rescorla-Wagner [15], equations 3.1 and 3.2 give "the associative
strength of a given stimulus" [17]. The variables used in the equations are:
• λ "is the maximum conditioning US" [15]: 100 if a US occurred and 0 otherwise
• α is a "rate parameter dependent" [15] on the CS
• β is a "rate parameter dependent" [15] on the US
• V: strength of the association between the CS and the reward

∆V = αβ(λ − V_total)   (3.1)
V_{n+1} = V_n + αβ(λ − V_n)   (3.2)
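For illustration, with α = 1, β = 0.3 and an initial value V_0 = 0 (the values used later in Chapter 4), a first trial where the US occurs (λ = 100) gives V_1 = 0 + 0.3(100 − 0) = 30, and a second such trial gives V_2 = 30 + 0.3(100 − 30) = 51. A trial where the US does not occur (λ = 0) then lowers the value to 51 + 0.3(0 − 51) = 35.7.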
As shown in the thesis about the effect of amphetamine on rats [3], the time
factor amplifies the effect and thus the link between CS and CR. We tried to
represent this aspect, inspired by the movie "Groundhog Day" starring Bill Murray.
It tells the story of a man who lives the same day, with the same events, every day.
As the events are the same every day, he tries to avoid or predict them. We can
interpret this movie for the robot as follows: if there is an obstacle at the same
position at the same hour as yesterday, the associative strength (V-value) will
increase. The robot works every day from Monday to Friday; as the robot is off on
weekends, we do not include them.
If the robot is at the same place and someone touches it within an interval of
1 second, it considers that a new conditioning situation. So we compute the RW
equation again with the previous value of the associative strength between the
position/time and the CR.
Chapter 4
Implementation
4.1 Global Architecture
Fig 4.1 shows the three modules we developed: the Input Unit, the Conditioning
Unit and the Behavior Unit.
The World Model contains the data about the environment and the internal
states of the robot and is used for information sharing with the other modules. The
data contained by the World Model are:
• Clock: simulation of a clock which contains the hour and the day.
• Result of the Conditioning Unit, stored in matrices (Section 4.2)
• Current position of the robot
The Interpreter Unit analyses the data from the different sensors of the robot
(bumper, RGB Kinect, ...) and interprets the sensor data according to the situation.
It defines the type of each event that occurred by using some basic sentences like
"I hit something on the left", "I hit something on the right" or "I approach
something". These sentences correspond to the unconditioned stimuli. The
Interpreter Unit sends all interpreted events to the Conditioning Unit and the
Behavior Unit.
The Conditioning Unit computes the value of the associative strength (V) between
the position and the time and the reward (which is here a punishment).
For instance, if a dynamic obstacle occurs a few times at the same hour and the
same position, conditioning develops such that the robot will predict and avoid it
the next time. This module reads the type of events (sent by the Interpreter Unit)
and updates the V-value (Fig 4.2). It also reads the time and the position of the
robot (figure 4.2).
The Behavior Unit generates the best path for the robot. It is composed of two
parts: the Behavior planner and the Path planner. The Behavior planner allows the
robot to develop basic action reflexes by reading the data sent by the Interpreter
Unit; for example, the robot moves back when someone hits a bumper.
Figure 4.1: Global Architecture
The Path planner computes the best path. It reads the data from the Conditioning
Unit and generates the path while comparing the V-value of each cell.
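As a rough sketch of the data described above, the World Model could be represented as follows (the type and field names are hypothetical, not the project's actual identifiers):

// Hypothetical sketch of the World Model contents; names are illustrative only.
#include <string>
#include <vector>

struct SimulatedClock {
    int day;   // 0 = Monday ... 4 = Friday (the robot is off on weekends)
    int hour;  // 0..23
};

struct Cell {
    double v = 0.0;                   // associative strength V (0..100)
    std::vector<std::string> events;  // types of events already met in this cell
};

typedef std::vector<std::vector<Cell>> Grid;  // one grid of cells for one hour

struct WorldModel {
    SimulatedClock clock;      // simulated clock (hour and day)
    std::vector<Grid> grids;   // result of the Conditioning Unit: one matrix per hour
    double robotX, robotY;     // current position of the robot
};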
4.2 Storing the V-Values on the Map of the Environment
To store the data we use matrices, a model similar to the grid map. The matrices
are used for storing the resulting V-values and the types of events sent by the
Interpreter Unit. Several matrices are used, one matrix for each hour.
We do not need to store the V-values using many small cells. Indeed, to analyze
crowd flows it is preferable to use a cell size of 40 cm, because it corresponds to
human anatomy [12] and it is larger than the size of the TurtleBot, which is 36 cm.
Events are also stored in the matrix and linked to the cell and hour to which they
correspond. The same event cannot be saved more than twice in one cell at a
specific hour. To save the data of each matrix, a text file is generated.
Fig 4.3 is a picture of the matrix coded in Java, showing the cost of each cell
(from 0 to 100) and the position of the robot in green.
We compute the position of each type of event as described in Section 4.6.
However, to update the V-value of a cell, we need to know the number of the cell
which contains the event (or obstacle). That is why we have a function which
converts an X/Y position into a cell number.

Figure 4.2: Different modules implemented and used for the Conditioning Unit
Figure 4.3: Picture of the matrix drawn with the value of V in each cell
Figure 4.4: Associative strength value after 30 trials [14] with α = 1 and λ = 100
Figure 4.5: Associative strength value after 30 trials [14] with α = 1 and λ = 0 after the 6th trial
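A minimal sketch of this conversion is given below; it assumes (hypothetically) that the map origin is at (0, 0), that the grid uses square 40 cm cells and that cells are numbered row by row:

// Hypothetical sketch: convert an X/Y position (in metres) into the number of
// the 40 cm x 40 cm cell that contains it, with cells numbered row by row.
#include <cmath>

const double CELL_SIZE = 0.40;  // cell size in metres

int cellNumber(double x, double y, int cellsPerRow)
{
    int col = static_cast<int>(std::floor(x / CELL_SIZE));
    int row = static_cast<int>(std::floor(y / CELL_SIZE));
    return row * cellsPerRow + col;  // index of the cell containing (x, y)
}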
4.3 Estimation of the Value of the Constants
To determine the value of β, which depends on the event, Fig 4.4 shows the
evolution of the associative strength (V) over 30 trials using the RW algorithm.
As we can see, with β = 0.3 the associative strength V-value is above 40 after two
trials. This value means that the association is strong enough to create a direct link
between the CS and the reward.
Figure 4.5 shows that when no events are met after conditioning, with β = 0.3
the curve falls below 40 after one trial. So the link between the CS and the reward
becomes weak after only a couple of failures.
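The curves of Fig 4.4 and Fig 4.5 can be reproduced with a small simulation such as the sketch below (illustrative code, not the project's plotting tool): it applies the RW update for 30 trials with α = 1 and β = 0.3, using λ = 100 for the first six trials and λ = 0 afterwards, as in Fig 4.5.

// Illustrative simulation of the associative strength V over 30 trials with
// alpha = 1 and beta = 0.3: lambda = 100 for the first 6 trials (conditioning),
// then lambda = 0 (no US met), as in Fig 4.5.
#include <cstdio>

int main()
{
    const double beta = 0.3;
    double v = 0.0;
    for (int trial = 1; trial <= 30; ++trial)
    {
        double lambda = (trial <= 6) ? 100.0 : 0.0;  // US present only in the first 6 trials
        v = v + beta * (lambda - v);                 // Rescorla-Wagner update with alpha = 1
        std::printf("trial %2d: V = %6.2f\n", trial, v);
    }
    return 0;
}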
4.4 Rescorla-Wagner Implementation
We simplified the Rescorla-Wagner equation seen in Section 3.3 by setting α = 1:

V_now = V_previous + β(λ − Σ V_previous)   (4.1)
The value of β is set according to the importance of the type of event sent by the
Interpreter Unit. Indeed, some events like "I am in danger" are more important
and have more impact than "something approaches me". To express this difference
we changed β so as to reduce the number of trials needed to reach a V-value above
40, as seen in Section 4.3:
• β = 0.3 (2 trials to reach V > 40): "I hit on the left", "I hit on the right",
"I hit in front of me", "I approach something" or "something approaches me"
• β = 0.4 (1 trial to reach V > 40): "I am in danger"
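A minimal sketch of this mapping from the interpreted event sentence to its β value could look as follows (the function name is hypothetical):

// Hypothetical sketch: choose beta according to the importance of the event.
#include <string>

double betaForEvent(const std::string& event)
{
    if (event == "I am in danger")
        return 0.4;  // most important event: V > 40 after a single trial
    // "I hit on the left", "I hit on the right", "I hit in front of me",
    // "I approach something", "something approaches me": V > 40 after two trials
    return 0.3;
}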
4.5 Code Implementation
4.5.1 Loop Algorithm
For the experiments, the main algorithm (Fig 4.6) is computed every second.
The V-value increases if new events are met. First we read the buffer which
contains all events met during the last second. After that, for each event we
compute (see the details in Algorithm 1):
• the position of the event in the matrix,
• the V-value of the cell concerned, retrieved from the matrix,
• the new V-value, computed with λ = 100.
The V-value decreases if no events are met. First, we read the position of the
robot in the matrix. After that we test whether the position of the robot is different
from the previous one or whether the hour has changed. If so, for each event stored
in the cell we compute the new V-value with λ = 0 (Algorithm 2).
Algorithms 1 and 2 are coded in C++. The pseudo code below summarizes the
main lines of these two algorithms, which increase and decrease the V-value:
Figure 4.6: Algorithm of the loop
Algorithm 1 Pseudo code to increase the associative strength V-value in the matrix
Require: Type of Event.
Require: Current Cell.
Require: Hour.
Ensure: V.
  for all types of Event met during the last second do
    Get the cell number of the event
    Get V from the cell of the Matrix
    Compute the β value according to the type of Event
    Compute V with λ = 100
  end for
  if the Type of Event is new for this cell number and hour then
    Store the Type of Event in the Matrix data
  end if
Algorithm 2 Pseudo code to decrease the associative strength V-value in the matrix
Require: Current V.
Require: Current Cell.
Require: Hour.
Ensure: V.
  for all types of Event stored in the matrix for the current hour and cell number do
    Compute the β value according to the type of Event
    Compute V with λ = 0
  end for
  if V < 10 then
    Erase all events met and stored for this hour and cell number
  end if
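The functions below are a simplified C++ sketch of these two updates on an illustrative cell structure (the names and data layout are hypothetical, not the exact project code): the first one raises V with λ = 100 for each event received during the last second, the second one lowers V with λ = 0 for the events already stored in the current cell.

// Illustrative sketch of Algorithms 1 and 2 on a simplified cell structure.
#include <algorithm>
#include <string>
#include <vector>

struct Cell {
    double v = 0.0;                   // associative strength V (0..100)
    std::vector<std::string> events;  // event types stored for this cell and hour
};

// beta according to the importance of the event (see Section 4.4)
static double betaForEvent(const std::string& event)
{
    return (event == "I am in danger") ? 0.4 : 0.3;
}

// Algorithm 1: increase V for every event met during the last second.
void increaseV(Cell& cell, const std::vector<std::string>& newEvents)
{
    for (const std::string& e : newEvents)
    {
        double beta = betaForEvent(e);
        cell.v = cell.v + beta * (100.0 - cell.v);  // RW update with lambda = 100
        if (std::find(cell.events.begin(), cell.events.end(), e) == cell.events.end())
            cell.events.push_back(e);               // store the event type if it is new
    }
}

// Algorithm 2: decrease V when the robot revisits the cell and no event occurs.
void decreaseV(Cell& cell)
{
    for (const std::string& e : cell.events)
    {
        double beta = betaForEvent(e);
        cell.v = cell.v + beta * (0.0 - cell.v);    // RW update with lambda = 0
    }
    if (cell.v < 10.0)
        cell.events.clear();                        // forget the events stored for this cell
}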
4.6 Computing the Position of Obstacles
The TurtleBot has to compute the position of obstacles according to the type of
event. Indeed, the TurtleBot has three forward bumpers:
• Bumper left
• Bumper front
• Bumper right
Figure 4.7: Position of obstacles according to the event
The position of the obstacle must be computed according to the bumper which was
activated. In figure 4.7 we can see two obstacles which are in different cells (cell 4
and cell 6).
We compute the position of obstacles by using trigonometry:

X_obstacle = X_robot + cos(θ_robot + θ_obstacle) × d   (4.2a)
Y_obstacle = Y_robot + sin(θ_robot + θ_obstacle) × d   (4.2b)

where the variables in equations 4.2 are:
• θ_robot: the heading angle of the robot with respect to the origin
• θ_obstacle: the angle of the obstacle, defined according to the type of event
• X_robot, Y_robot: the position of the robot with respect to the origin
• d: the distance between the center of the robot and the extremity of the sensor
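As a sketch, equations 4.2 translate into a small function such as the following (names hypothetical). It takes the angles in degrees, as they are given in the text and in Table 4.1 below, and converts them to radians before calling the trigonometric functions:

// Hypothetical sketch of equations 4.2: position of an obstacle from the robot
// pose, the angle associated with the event and the distance d to the sensor.
#include <cmath>

struct Point { double x; double y; };

Point obstaclePosition(double xRobot, double yRobot,
                       double thetaRobotDeg,     // heading of the robot w.r.t. the origin, in degrees
                       double thetaObstacleDeg,  // angle of the event (e.g. 45, -45 or 0 degrees)
                       double d)                 // distance from the robot centre to the sensor extremity
{
    const double PI = 3.14159265358979323846;
    double theta = (thetaRobotDeg + thetaObstacleDeg) * PI / 180.0;  // convert to radians
    Point p;
    p.x = xRobot + std::cos(theta) * d;
    p.y = yRobot + std::sin(theta) * d;
    return p;
}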
4.6.1 Implementation of the Algorithm to Compute the Position of Events
To compute the position of the event, we first compute the orientation of the
TurtleBot with respect to the origin axis. The TF package from ROS makes it easy
to get this angle. It returns a value from 0 to 3 (i.e. in radians), which corresponds
to 0° to 180° from the forward direction towards the left of the robot, and
symmetrically towards the right using negative values (0 to -3 corresponds to 0° to -180°).
After the conversion to degrees is done, we apply the method seen in Section 4.6.
We define for each type of event the theoretical position that the robot deduces
from its sensors. Table 4.1 gives the angle associated with each type of event:

Type of event                              Angle
obstacle on the left                       45°
obstacle on the right                      -45°
obstacle in front / someone approaches     0°
wheel drop                                 0°

Table 4.1: Angle according to the type of event
Chapter 5
Evaluations
5.1 Observation Results of Computing the Position of Events
We tested the computation of the position of the obstacles each time the robot
met an obstacle. Fig 5.1 shows the results we obtained for the three forward
bumpers (the red point is the center point of the base of the TurtleBot, given in
Table 5.1).
Table 5.1 shows the three positions given by the program which contains the
algorithm to compute the position of events. We used a distance of 25 cm between
the center position of the robot and the event.
With a cell size of 40 cm, we can see that some obstacles detected by the bumper
sensors can be placed in the same cell. This is the case for the events detected by
the right and front bumpers in Fig 5.1.
Figure 5.1: Matrix of the environment after each forward bumper hit an obstacle (size
of cell: 40x40cm)
Table 5.1: Positions of the objects and events computed by the algorithm and the matrix function

Object/Event     Position X, Y, θ       Position of the center of the cell concerned
Robot            0.91, 0.79, 85.56°     1.00, 0.60
Bumper LEFT      0.75, 0.98             0.60, 1.00
Bumper FRONT     0.93, 1.04             1.00, 1.00
Bumper RIGHT     1.10, 0.95             1.00, 1.00
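As a quick check against equations 4.2, the left-bumper row of Table 5.1 follows from the robot pose and d = 0.25 m: X = 0.91 + cos(85.56° + 45°) × 0.25 ≈ 0.75 and Y = 0.79 + sin(85.56° + 45°) × 0.25 ≈ 0.98, which matches the position computed by the program.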
Figure 5.2: Picture of the experiment 1
5.2 Observation Results
5.2.1 Experiment 1
The following experiment consists in testing all modules implemented together on
the TurtleBot. The robot hits an obstacle during its course: the Input Interpreter,
the Conditioning Unit and the Behavior planner have to communicate with each
other to develop a new behavior after this event. Fig 5.3 displays the different
paths taken by the TurtleBot.
• Trial 1: The TurtleBot goes directly to the goal; there is no obstacle on its way.
Figure 5.3: Schema explaining the different paths taken by the robot
• Trial 2: The TurtleBot hits an obstacle on its path and generates another
path to avoid the obstacle.
• Trial 3: The TurtleBot generates a new path which does not include the cell
where the obstacle was in trial 2.
Fig 5.4 shows the result at the end of the experiment, at trial 3. The TurtleBot
hit an obstacle (here a foot, Fig 5.2) which generated events from its sensors. Some
V-values increased according to the position of the obstacle and were stored at a
specific hour. So from the results of experiment 1, we can see that the Path planner
developed a new behavior which consists in avoiding some cells and taking another
path.
5.2.2 Experiment 2
As discussed in Section 2.3, crowd flow is concentrated in the corners. We decided
to create a similar situation by using a box of size 48 × 32 cm. We put this box at
the entrance of a small corridor of width 132 cm. The box has the advantage of
having a size close to the cell size used by the matrix program (Section 4.2). Fig 5.5
shows the entrance with the box, which takes up only a fourth of the entrance
width.
The box here represents a person who suddenly appears in the corner next to
the wall. Only the bumper of the TurtleBot detected this person, after the collision,
because the field of view of the Kinect sensor did not allow seeing the person
behind the wall.
Figure 5.4: V-value displayed on the matrix with the position of the robot in green
Figure 5.5: Picture of the measurement with a box
Figure 5.6: Plan of the movement of the TurtleBot to measure the position of the box during the first hour

Fig 5.6 shows the path of the TurtleBot for the measurement of the V-value.
At each return to the start point we change the position of the box, in order to do
three trials with three different positions of the box. The position of the box changes
after each return to the start point; thanks to that, we obtain the result of the
movement of a crowd flow in two directions. The numbers in Fig 5.6 represent the
different steps of the robot. The first goal is situated on the left (step 2), the second
on the right (step 6) and the last goal is on the left (step 10). To control the
TurtleBot, a program was executed which allows moving it manually with the
keyboard teleop. We decided that the TurtleBot would take the shortest path to
each point; that is to say, it has to go close to the corners. But if the cells situated
at the corners already have a V-value, we make the TurtleBot move to the cells
where the V-value is the smallest. The goal of this scenario is to demonstrate:
• The increase of V-value at different cells
• The decrease of the V-value at one precise cell when the robot moves a few
times over this cell (blue path between steps 9 and 10)
• The detection of events with the three forward bumpers
Fig 5.7 displays the results from step 1 to 8: the cost of the cells increased in each
corner. After the scenario is over, Fig 5.8 displays the results from step 1 to 11.
The TurtleBot hits an obstacle in front of it. At this moment the best way to reach
step 11 is to go to the right. As there is no obstacle in the cell corresponding to the
old position of the box at trial 1, its V-value decreases to 24.99 after one return to
the start point. We can observe the evolution of the V-value in Fig 5.9 according to
the number of trials.

Figure 5.7: Evolution of the V-value of each cell from step 1 to 8 (red and blue lines correspond to walls)
Every second, the TurtleBot computes the associative strength V-value according
to the events met. The V-value is then assigned to a cell according to the current
position of the robot and the type of event. Table 5.2 shows that the different
positions of the box in the real environment are in agreement with the
corresponding cells where the V-value is different from 0. In Fig 5.8 we can observe
three areas; moreover, according to Table 5.2, these areas are linked to the different
positions of the box at the different trials within the same hour.
Figure 5.8: Evolution of the V-value of each cell from step 1 to 11 (red and blue lines correspond to walls)
Table 5.2: Comparison between the box positions and the cells where the V-value increased

BOX          Center position X, Y of the     Center position of the cells according       Bumper used
             box in the real environment     to the different steps of the measurement
BOX trial 1  1.45, 2.29                      Step 1: 1.40, 1.80; Step 3: 1.80, 2.20       Bumper Left, Bumper Right (twice)
BOX trial 2  2.47, 2.29                      Step 5: 2.60, 1.80; Step 7: 2.60, 2.20       Bumper Right, Bumper Left
BOX trial 3  1.95, 2.29                      Step 9: 2.2, 1.80                            Bumper Front
Figure 5.9: Evolution of V-value (from step 3 to 11) of the cell which corresponds to the
hit with the box at step 3.
Chapter 6
Discussion and Future Works
In this thesis, we proposed a model to develop a specific behavior which allows
avoiding noxious events such as obstacles.
On the classical conditioning side, the Rescorla-Wagner model has been used
to increase or decrease the associative strength V-value and has been tested
(Fig 5.9). The reinforcement learning has been implemented by using a matrix
to store every event met, together with its position and the hour. Fig 4.6 shows
the solution we implemented in the robot to predict the events.
The position of obstacles is computed using the three forward bumpers.
For that, we used the method seen in Section 4.6; the event is then assigned to the
corresponding cell of the matrix. The results displayed in Table 5.2 show that this
method is correct.
Experiment 1 shows a situation of reinforcement learning through the
development of a new behavior (avoiding the cell which contained the obstacle in
the previous trial). The result in Fig 5.8 illustrates the use of this model. This
scenario is limited by the fact that the robot is teleoperated instead of using the
path planner, and it does not take the hour into account. However, experiment 2
demonstrates the evolution of the V-value according to the movement of the crowd
and the position of the event.
To return to the ultimate scenario of a guide robot for blind people, it would be
advantageous to integrate the hour and not only the position. Even though this has
not been achieved, the basic tools have been developed to pursue this goal
(reinforcement learning and the Rescorla-Wagner model).
To store the V-values, we used a model based on the grid map. This method has
the advantage of saving resources, but its precision is too low to detect the shape
of a static obstacle. Another way could be to use the particle filter method.
The β-values have been chosen arbitrarily in this thesis so that only two trials
are needed before creating a direct link between the CS and the reward.
The events get stacked and are published every second, so some events are
published with a delay. This is one of the limitations of this system.
Moreover, the V-value can be increased once or twice for the same obstacle, because
we do not compute the time between events.
According to the experimental results, the following improvements can be applied
in order to upgrade the proposed model:
• Compute the time between events occurring in the same cell
• Add other sensors (Kinect, sound, ...)
• Develop the inhibition of the conditioned stimulus
References

[1] ROS. http://www.ros.org/. Accessed: 2014-05-13.

[2] TurtleBot. http://www.turtlebot.com/. Accessed: 2014-05-13.

[3] A. Arvanitogiannis, J. Sullivan, and S. Amir. Time Acts as a Conditioned Stimulus to Control Behavioral Sensitization to Amphetamine in Rats. PhD thesis, Concordia University, Montreal, Quebec, 2000.

[4] Baifan Chen, Lijue Liu, Zhirong Zou, and Xiyang Xu. A Hybrid Data Association Approach for SLAM in Dynamic Environments. pages 1–7, 2012.

[5] Christian Balkenius. Computational models of classical conditioning: a comparative study. 1998.

[6] Baifan Chen, Zixing Cai, Zheng Xiao, Jinxia Yu, and Limei Liu. Real-time detection of dynamic obstacle using laser radar. In Young Computer Scientists, 2008. ICYCS 2008. The 9th International Conference for, pages 1728–1732, Nov 2008.

[7] Chris Gaskett. Q-Learning for Robot Control. 1:21–27, 2002.

[8] P. Gaussier. Sciences cognitives et robotique: le défi de l'apprentissage autonome. pages 35–39.

[9] K. Katabira, T. Suzuki, H. Zhao, Y. Nakagawa, and R. Shibasaki. An analysis of crowds flow characteristics by using laser range scanners. page 955.

[10] Adam Leon Kleppe and Amund Skavhaug. Obstacle Detection and Mapping in Low-Cost, Low-Power Multi-Robot Systems using an Inverted Particle Filter. pages 1–15, 2013.

[11] Dominique Lecourt. Dictionnaire d'histoire et philosophie des sciences. Presses universitaires de France, 2003.

[12] Masakuni Muramatsu, Tunemasa Irie, and Takashi Nagatani. Jamming transition in pedestrian counter flow. Physica A: Statistical Mechanics and its Applications, 267:487–498, 1999.

[13] Yael Niv. Reinforcement learning in the brain. pages 1–38, 1997.

[14] Michael J. Renner. Learning the Rescorla-Wagner Model of Pavlovian Conditioning: An Interactive Simulation. 2004.

[15] R. Rescorla. Rescorla-Wagner model. 3(3):2237, 2008. revision #91711.

[16] Richard S. Sutton and Andrew G. Barto. A Temporal-Difference Model of Classical Conditioning.

[17] Jean Marc Salotti and Florent Lepretre. Classical and operant conditioning as roots of interaction for robots. 2013.

[18] J. E. R. Staddon and Y. Niv. Operant conditioning. 3(9):2318, 2008. revision #91609.

[19] S. Thrun. Particle filters in robotics. In Proceedings of the 17th Annual Conference on Uncertainty in AI (UAI), 2002.

[20] S. Thrun and M. Montemerlo. The GraphSLAM algorithm with applications to large-scale mapping of urban structures. International Journal on Robotics Research, 25(5/6):403–430, 2005.

[21] F. Woergoetter and B. Porr. Reinforcement learning. 3(3):1448, 2008. revision #91704.