
Learning-Based Automatic
Generation of Collision Avoidance
Algorithms for Multiple
Autonomous Mobile Robots
Yukiyoshi Fujita
Ichiro Suzuki
Satoshi Fujita
Hajime Asama
Masafumi Yamashita
Ami Shakked
Abstract
• This talk discusses the automatic
generation of a collision avoidance
algorithm:
– An effective algorithm for two robots that
simulates human trial and error
– Use of a reward function that is also
learned by the robots
– Reliance solely on the sensors' output
- 236805Seminar in CS (Robotics)
Abstract (cont.)
• How a robot can use its gained
“experience” in a more complex
environment
• Use of a reduced state space
• Use of omni-directional robots
• Comparison of theoretical results with
the actual results
Introduction
• An autonomous multi-robot system is one in
which:
– No fixed “leader” - each robot is self-driven
only by its own design & data
– Each robot adjusts itself independently.
• This is an advantage when it comes to
failures, scalability, communication
overhead etc.
• On the other hand the design of algorithms
is more difficult
Introduction (cont.)
• The discussed robots have eight
sensors
• Each sensor can detect:
– A nearby object (robot or wall)
– Direction of the object’s motion (out of eight)
– Speed of the object (out of three)
• This yields a state space of sensor
outputs consisting of $(8 \cdot 3 + 2)^8$
states
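As a quick check of how large this state space is, the count can be evaluated directly. This is only an arithmetic sketch: the variable names are illustrative, and the `+ 2` term is taken from the slide as is.

```python
# Per-sensor outputs: 8 directions x 3 speeds, plus the 2 extra outputs from the slide.
outputs_per_sensor = 8 * 3 + 2

# One such output for each of the eight sensors.
num_sensors = 8
state_count = outputs_per_sensor ** num_sensors

print(outputs_per_sensor, state_count)
```

The result is on the order of 2 × 10^11 states, far too many to learn one action table per state.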
Introduction (cont.)
• This motivates a combined
study of:
– A collision avoidance algorithm
– Automatic reduction of the state space (by
automatic state merging)
Introduction (cont.)
• A robot in an unknown environment
repeatedly evaluates its performance
• The more successful actions (from the
past) are more likely to be chosen
• We will investigate how the above robots
autonomously organize the state space
& generate a collision avoidance
algorithm based on reduced state space
Introduction (cont.)
• We will examine a learning algorithm that
simulates naive human trial & error
and see that it gives relatively good
results
• All algorithm parameters are adjusted
without any external intervention
• We will discuss how robots can use
their experience in a more complicated
environment (three robots)
Introduction (cont.)
• In addition to the theoretical discussion
and simulations we will present physical
experiments as well
• Results will show a very high probability of
collision avoidance, especially for two
robots
• The algorithm works reasonably well for
the case of three robots
The Model of the Robots
• The omni-directional robots discussed
have 8 infra-red sensors
(transmitter and receiver) and can detect the
position of robot i in a relative
movement angle j.
• For convenience we will disregard
the other detection possibilities (like
detecting a wall)
The Model of the Robots (cont.)
• Let $\Sigma$ be the set of distinct outputs of
the sensors, where each $\sigma \in \Sigma$ is a vector of
the sensors' outputs
• A state space is a partition $Q$ of $\Sigma$
• For each state $q \in Q$ we prepare an
action table $S_q$ whose $k$th element $S_q(k)$ is
the probability that a robot in state $q$ will
move in direction $k$ ($\sum_{k=0}^{7} S_q(k) = 1$)
The Model of the Robots (cont.)
• Each robot decides how to move according
to its sensor output $\sigma$: when $\sigma \in q$ for a
state $q \in Q$, it moves in direction $k$ with
probability $S_q(k)$
• The task of the robots is to
autonomously build a partition $Q$ and an
action table $S_q$ for each $q \in Q$
• From here on, $a_k$ denotes the action of
moving in direction $k$
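Choosing a direction from an action table amounts to sampling from a discrete distribution over the eight directions. The following is a minimal sketch under that reading; the function name `choose_direction` and the list representation of the table are assumptions made for illustration, not part of the original work.

```python
import random


def choose_direction(action_table, rng=random):
    """Sample a direction k in 0..7 with probability action_table[k].

    action_table is assumed to be a list of 8 non-negative numbers
    that sum to 1 (the action table of the robot's current state).
    """
    return rng.choices(range(8), weights=action_table, k=1)[0]


# An unbiased table picks every direction with probability 1/8.
uniform_table = [1 / 8] * 8
print(choose_direction(uniform_table))
```

A table that has converged to a single action, such as `[0, 0, 0, 1, 0, 0, 0, 0]`, always yields that action.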
The Model of the Robots (cont.)
• An example view of the robots
and how their positioning is marked:
[Figure: Robot A and Robot B, each with positions numbered 0 to 7 and its direction of motion marked; Robot A is labeled S(1,6) and Robot B is labeled S(6,1)]
Collision Avoidance for Two
Robots
Construction of Action
Tables by Learning
Action Tables
• We start with the case of two robots
• The state $\sigma = (i,j)$ denotes that sensor $i$ is
facing sensor $j$ of the other robot, so
$|\Sigma_2| = 64$
• $Q_2 = \{\{\sigma\} \mid \sigma \in \Sigma_2\}$ is a partition of $\Sigma_2$
• $p_{ijk}$ is the value of the $k$th element of
action table $S(i,j)$: the probability that a
robot will take action $a_k$ when the
sensors' output is $(i,j)$
Action Tables (cont.)
• To create an unbiased system we
assign $p_{ijk} = 1/8$ for all $k$
• The influence of $a_k$ is evaluated
by:
$\text{reward} = \beta\,(\alpha\,(f_t - f_{t+1}) + (1 - \alpha)(d_{t+1} - d_t))$
• $f_t$: distance between the robot and the
target at time $t$
• $d_t$: distance between the two robots at
time $t$
Action Tables (cont.)
• $0 \le \alpha \le 1$ and $\beta > 0$ will be determined by
the robots
• The reward reflects the need to get as
close as possible to the target without
getting near another robot
• A robot that takes action $a_k$ from state
$(i,j)$ updates the action table $S(i,j)$ by:
– $p_{ijk} = \max\{p_{ijk} + \text{reward},\, 0\}$ while $\sum_{k=0}^{7} S_q(k) = 1$ holds for $q = (i,j)$
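The reward and the table update can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, and dividing by the sum to restore the constraint that the probabilities add up to 1 is an assumption, since the slides only state that the constraint must hold.

```python
def reward(f_t, f_next, d_t, d_next, alpha=0.5, beta=0.05):
    """Reward for one step: progress toward the target, weighted by alpha,
    plus the gain in distance from the other robot, weighted by 1 - alpha."""
    return beta * (alpha * (f_t - f_next) + (1 - alpha) * (d_next - d_t))


def update_action_table(table, k, r):
    """Return a new table with p_k = max(p_k + r, 0), then renormalized.

    Renormalizing by the sum is an assumption made for this sketch.
    """
    updated = list(table)
    updated[k] = max(updated[k] + r, 0.0)
    total = sum(updated)
    if total == 0:
        # Degenerate case: fall back to the unbiased table.
        return [1 / len(updated)] * len(updated)
    return [p / total for p in updated]


# One step that moves 0.5 closer to the target and 0.5 away from the other robot.
table = [1 / 8] * 8
r = reward(f_t=10.0, f_next=9.5, d_t=2.0, d_next=2.5)
table = update_action_table(table, k=0, r=r)
print(r, table)
```

A positive reward raises the probability of the action just taken and, through renormalization, lowers all the others slightly.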
Action Tables (cont.)
• Simulation: with $\alpha = 0.5$ and $\beta = 0.05$, place the
robots at distance $d_0 = 1.0$ in a state $(i,j)$
• Move them one step and update the
action table
• Repeat this 64k times, so that each of the 64
states is updated about 1k times
• The vector $S(i,j)$ converges to
$p_{ijk} = 1.0$ for a single $k$ for most states $(i,j)$
Action Tables (cont.)
• The following table shows the learned action k for all i & j
• Actions in parentheses are the ones with the
highest probability in states where convergence
to a single action didn’t occur
i \ j    0    1    2    3    4    5    6    7
0        1    6    7    7    0    0    1   (6)
1       (1)   7    7    7    0    1    1    1
2        6    6    0    0    0    1    1    1
3        6    7    7    0    0    1    1    1
4        2    7    7    0    0    0    1    2
5        2    7    7    7    0    0    1    1
6        2    7    7    7    7    0    1    2
7        2    6    7    7    0    0    1    1
Action Tables (cont.)
• Let us test the algorithm’s
performance (in a simulation):
– Each robot is a disc of radius 1.0
– A sensor has a detection range of 2.0
– A robot can move in steps of 0.5
– The initial distance between the robots is
2.0
– The target for each robot is at a distance
of 10.0 in direction 0
Action Tables (cont.)
• CASE(i,j) denotes an experiment in
which the initial state of one of the
robots is (i,j)
• Each robot moves according to
the action selection table unless
it can move directly towards its
target
Action Tables (cont.)
• Results of the simulation show
success in all 64 cases
• Below are the more difficult cases:
– CASE(0,0), CASE(1,6), CASE (1,7),
CASE (2,7)
Action Tables (cont.)
• For comparison we will simulate
a heuristic algorithm in which
the robot chooses the first free
direction (0,1, ... ,7)
• There is no difference in
performance between the two
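The heuristic used for comparison is simple enough to state as code. The sketch below assumes the robot already knows which directions are blocked; the function name and the `blocked` set are hypothetical, and how "free" is determined from the sensors is not specified in the slides.

```python
def first_free_direction(blocked):
    """Return the first direction in 0, 1, ..., 7 that is not blocked.

    blocked is a set of direction indices considered unsafe.
    Returns None when every direction is blocked.
    """
    for k in range(8):
        if k not in blocked:
            return k
    return None


# Directions 0 and 1 are blocked, so the robot takes direction 2.
print(first_free_direction({0, 1}))
```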
Tuning $\alpha$ and $\beta$
• $\alpha$, which is used to update the
probability table, determines the robots'
collision avoidance policy:
– A greater $\alpha$ - move forward in direction
0 (less avoidance)
– A smaller $\alpha$ - stronger avoidance
Tuning $\alpha$ and $\beta$ (cont.)
• $\beta$ determines the “strength” of the last
experience:
– A larger $\beta$ - stronger consideration of
the last experience
– A smaller $\beta$ - a slower learning
process
• An ideal learning process should run
without human assistance
Tuning $\alpha$ and $\beta$ (cont.)
• We use the following $\alpha$ tuning process
(assuming the robots reach their target
within 30 steps without a collision),
starting with $\alpha = 1.0$ (and a fixed value of
$\beta$):
– With the current $\alpha$ value, build the 64 action
tables $S(i,j)$ as in the previous section, using
30k updates for random states $(i,j)$
– Evaluate the algorithm for CASE(0,0) to
CASE(7,7) while changing $\alpha$ until the
robots reach their target in 30 steps or less
Tuning $\alpha$ and $\beta$ (cont.)
• The rules for changing $\alpha$ by a step $\delta$:
– If a collision occurs in one of the 64
possibilities, decrease $\alpha$ by $\delta$
– If no collision occurs in all 64
possibilities but the robots can’t reach
their target in 30 steps, increase $\alpha$ by $\delta$
• $\delta$ begins as 0.1 and is halved every
time $\alpha$ returns to its previous value
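The tuning rules above can be sketched as a small loop. Several things here are assumptions made for illustration: `evaluate` is a hypothetical callback that rebuilds the action tables for a given weight and reports `"collision"`, `"timeout"` or `"ok"` over the 64 cases; the halving rule is read as "halve the step whenever the weight returns to a value it has already taken"; and `toy_evaluate` uses made-up thresholds, not simulation results.

```python
def tune_alpha(evaluate, alpha=1.0, delta=0.1, max_rounds=50):
    """Adjust alpha until evaluate(alpha) reports "ok" or rounds run out."""
    seen = [alpha]
    for _ in range(max_rounds):
        outcome = evaluate(alpha)
        if outcome == "ok":
            return alpha
        if outcome == "collision":
            alpha -= delta  # too little avoidance
        else:
            alpha += delta  # too much avoidance: target not reached in time
        # Assumed reading of the halving rule: alpha revisits an earlier value.
        if any(abs(alpha - previous) < 1e-9 for previous in seen):
            delta /= 2
        seen.append(alpha)
    return alpha


def toy_evaluate(alpha):
    """Made-up stand-in for the simulation, used only to exercise the loop."""
    if alpha > 0.48:
        return "collision"
    if alpha < 0.42:
        return "timeout"
    return "ok"


print(tune_alpha(toy_evaluate))
```

With this toy stand-in the weight overshoots from 0.5 to 0.4, returns to 0.5, and the halved step then lets it settle between the two.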
Tuning $\alpha$ and $\beta$ (cont.)
• Figure 3 shows the results of this
experiment
• $\alpha$ eventually stabilizes at 0.4
Tuning $\alpha$ and $\beta$ (cont.)
• Assumption:
– The robots “want” to create the set of
64 action tables $S(i,j)$ within 20k to 30k
updates
• We start with $\beta = 1.0$ (and a fixed
value of $\alpha$)
• If more than 30k updates occur,
$\beta = \beta/2$, and if fewer than 20k updates
occur, $\beta = 2\beta$
Automatic state space creation
• Reminder:
– $Q_2 = \{\{\sigma\} \mid \sigma \in \Sigma_2\}$ is a state space
– $S(i,j)$ and the action selection table for all $i$ and $j$ have been built
– $Q_2^*$ can be created by merging two adjacent
states with the same action, giving $|Q_2^*| = 24$
• The algorithm based on $Q_2^*$ has the same
performance as the original
• $Q_2^*$ can be built automatically at the end of
the learning process of the action tables
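Merging adjacent states that share an action can be sketched as grouping runs of equal values. The sketch below makes two assumptions that the slides do not spell out: adjacency is taken along one sensor index only, and it wraps around from 7 back to 0. The function name and the example values are illustrative, not taken from the learned table.

```python
def merge_adjacent(actions):
    """Group circularly adjacent indices that share the same action.

    actions[j] is the learned action for index j (0..7).
    Returns a list of groups, each a list of indices in circular order.
    """
    n = len(actions)
    if all(a == actions[0] for a in actions):
        return [list(range(n))]
    # Start at an index whose predecessor has a different action,
    # so that no group is split across the wrap-around point.
    start = next(j for j in range(n) if actions[j] != actions[j - 1])
    groups = []
    for step in range(n):
        j = (start + step) % n
        if groups and actions[groups[-1][-1]] == actions[j]:
            groups[-1].append(j)
        else:
            groups.append([j])
    return groups


# Illustrative values: indices 7, 0, 1, 2, 3 share an action and merge into one state.
print(merge_adjacent([1, 1, 1, 1, 2, 1, 2, 1]))
```

Here eight original states collapse into four merged ones.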
Collision avoidance for three
robots
• A similar approach can be used for a
more complex environment, based on
the results from a simpler environment
• We will compare the method from the
previous sections to a simpler
learning method
Direct learning algorithm
• For three robots, a robot's sensor output $\sigma$ is
$((i_1,j_1),(i_2,j_2))$
• $(i_k,j_k)$, where $k = 1,2$, means sensor $i_k$ is facing
sensor $j_k$ of another robot; $(i_2,j_2)$ is undefined if only one robot is
visible
• Assume $Q_3$ is a partition of $\Sigma_3$ with
$Q_3 = \{\{\sigma\} \mid \sigma \in \Sigma_3\}$, and write $(i_1,j_1,i_2,j_2)$ instead of
$((i_1,j_1),(i_2,j_2))$
• We will concentrate on cases with two robots in
sight, since in the case of one we can adopt the
previous action tables
Direct learning algorithm (cont.)
• We choose a state $((i_1,j_1),(i_2,j_2))$ and update
$S(i_1,j_1,i_2,j_2)$ after a single step with the
previously described reward
• The process is repeated 1,792k times
(each table is updated ~1k times)
• $\alpha = 0.5$, $\beta = 0.05$
• From the results we can derive an
action selection table similar to the one
we saw
Direct learning algorithm (cont.)
• Cases shown: CASE(0,0,1,0),
CASE(0,1,1,6), CASE(1,0,7,0)
• The second figure
compares the first
(heuristic) algorithm with
the learning-based one
we just saw
• We clearly see that the
latter performs better, while
the heuristic can’t handle
some of the cases well
Reduced state learning for 3 robots
• We adopt $Q_2^*$ from our previous
discussion and use $Q_2^* \times Q_2^*$ as a
state space for three robots
• We get 300 states instead of 1792
states (including the case where a single robot is visible)
• Repeat the learning process of the
direct learning algorithm with the
reduced state space
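The two state counts can be reproduced with a short calculation. The derivation is an assumption, since the slides only give the totals: the reduced count is taken as the 24 merged single-robot states plus every unordered pair of them, and the direct count as every unordered pair of distinct sensors combined with all 8 × 8 values of the facing sensors.

```python
from math import comb

# Reduced state space: 24 single-robot states plus unordered pairs of them (assumed).
single_robot_states = 24
reduced_states = single_robot_states + comb(single_robot_states, 2)

# Direct state space with two robots in sight: unordered pairs of distinct
# sensors (i1, i2), times 8 x 8 choices for (j1, j2) (assumed).
direct_two_robot_states = comb(8, 2) * 8 * 8

print(reduced_states, direct_two_robot_states)
```

Under these assumptions the counts match the 300 and 1792 quoted above.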
Reduced state learning for 3
robots (cont.)
• The table shows, for each state class,
the action whose probability converged
• Parentheses mark the action with the
highest probability where no convergence
took place
class   i1   j1    i2   j2    action
1       0    0                6
2       0    1                (6)
3       0    2-3              6
4       0    4-7              2
...
21      7    7-3              1
22      7    4                2
23      7    5                1
24      7    6                2
25      0    0     0    1     (6)
26      0    0     0    2-3   (2)
27      0    0     0    4-7   2
28      0    0     1    0-1   6
...
295     7    7-3   7    4     1
296     7    7-3   7    5     (1)
297     7    7-3   7    6     1
298     7    4     7    5     2
299     7    4     7    6     1
300     7    5     7    6     (0)
Reduced state learning for 3
robots (cont.)
• As we can see, the reduced state space has
an advantage
• We believe the advantage arises because a
single update in the reduced state space can be
equivalent to many updates in the original one
Experiment with Physical Robots
• We installed the obtained algorithms on
the omni-directional robots
• Two robots:
– 10 experiments for each of CASE(0,0),
CASE(1,7) and CASE(1,6)
– 8, 6 & 7 avoidances, respectively
• Three robots:
– Results are not shown; the robots avoided
collisions in 5 out of 10 experiments, but in some
cases the algorithm didn’t perform well when
compared to the reduced state algorithm (time-wise)
Experiment with Physical
Robots (cont.)
• We attribute the differences to the
following:
– Non-discrete movement of the robots
– Non-synchronized movement of the robots
Conclusions
• The robots built (without any intervention)
a collision avoidance algorithm
• We demonstrated how good algorithms
can be used for a more complex
environment containing more than three
robots
• Most of the time the resulting algorithm
gives good results
• We didn’t discuss the memory demands
Future Study
• A more complex state space
• Copying methods from a simple to a
complex environment
• Improve the simulation model