
Scaleable Multi Agent Systems
Master Thesis of:
Dirk-Jan van Dijk
October 2006
Scaleable MASs
Dirk-Jan van Dijk
7/13/2017
Preface
This report is my M.Sc. thesis in Computer Science at Delft University of Technology. It
is an overview of the work I did at TNO Defensie en Veiligheid (DenV).
TNO DenV provides innovative contributions to the advance of comprehensive security.
TNO DenV is a strategic partner of the Dutch Ministry of Defence and has about a
thousand employees and three research locations.
During my M.Sc. project I've studied multi agent systems and frameworks, with the main
purpose of modifying the Spyse agent framework to allow the creation of scaleable
MASs.
During my M.Sc. project I've been advised by Dr. A. Meyer and Ir. H. Geers, whom I
would like to thank here.
Dirk-Jan van Dijk
Den Haag
24 October 2006
Table of Contents
1. Introduction
2. Spyse
  2.1. Terminology
  2.2. Modules of Spyse
  2.3. Base classes of Spyse
    2.3.1. Agent and Behaviours
    2.3.2. AID and HAP
    2.3.3. AMS
    2.3.4. DF and Service
    2.3.5. MTS and ACLMessage
    2.3.6. App and Platform
  2.4. Environment model of Spyse
3. Scaling Spyse – Multi tasking
  3.1. Multi Task Techniques
    3.1.1. Threads
    3.1.2. Micro threads
  3.2. Multi Task Methods Research
    3.2.1. Expectations
    3.2.2. Count to X Results
    3.2.3. Cooperative Results
    3.2.4. Conclusions
4. Scaling Spyse – Distribution
  4.1. Single System
    4.1.1. Initialization
    4.1.2. Starting an Agent
    4.1.3. Sending a Message
  4.2. Distribution
    4.2.1. What to distribute
    4.2.2. Distributed AMS
    4.2.3. Directory Facilitator
    4.2.4. Environment
    4.2.5. Libraries
    4.2.6. Test Results
    4.2.7. Conclusion
5. Semantic networks and SNE
  5.1. Semantic Network Model
  5.2. Current system - SNE
6. Proposed System - IBAS
  6.1. Terminology
  6.2. Analysis
    6.2.1. Use Cases
    6.2.2. Functional Requirements
    6.2.3. Non functional requirements
  6.3. Design
    6.3.1. Fipa Request Protocol
    6.3.2. IBAS Agents
    6.3.3. IBAS Objects
    6.3.4. Class model
    6.3.5. Sequence Diagrams
  6.4. Suspending Agents
  6.5. Load Balancing
  6.6. Modifications to Spyse
7. IBAS Prototype
  7.1. Test Scenarios
    7.1.1. Scenario 1
    7.1.2. Scenario 2
    7.1.3. Scenario 3
8. Conclusion
References
Figure 2.1 Distributed Spyse
Figure 2.2 Modules of Spyse
Figure 2.3 Important Spyse Classes
Figure 2.4 Agent life cycle
Figure 2.5 Environment Model
Figure 3.1 Count to 100
Figure 3.2 Count to 1000
Figure 3.3 Count to 1 million
Figure 3.4 Count to 1 million
Figure 4.1 Initialization of Spyse
Figure 4.2 Starting an agent with different scheduling methods
Figure 4.3 Sending a message
Figure 4.4 Broadcast Update - Creating an agent
Figure 4.5 Broadcast Retrieve - Finding an agent
Figure 4.6 Central - Creating and finding an agent
Figure 4.7 Test 1 Absolute Time
Figure 4.8 Test 1 Time Compared to Empty Message
Figure 4.9 Test 2
Figure 5.1 Example of names
Figure 5.2 Example of a statement
Figure 5.3 Statement with inverse
Figure 5.4 Statement with attribute
Figure 5.5 SNE Overview
Figure 6.1 IBAS Use Case Diagram
Figure 6.2 Agents in IBAS
Figure 6.3 FIPA Request Interaction Protocol
Figure 6.4 Node Class Diagram
Figure 6.5 NodeIndex Class Diagram
Figure 6.6 IBASApp Class Diagram
Figure 6.7 Class Model of IBAS
Figure 6.8 Sequence diagram: Search agent
Figure 6.9 Sequence diagram: View agent
Figure 6.10 Sequence diagram: Add node
Figure 7.1 IBA life cycle 1
Figure 7.2 IBA Life Cycle 2
Figure 7.3 IBAS Scenario 1 Comparison between behaviours
Figure 7.4 IBAS Scenario 1 Comparison of different runs
Figure 7.5 IBAS Scenario 2 Response times
Figure 7.6 IBAS Load balancing
Table 1 Linear Regression - Count to 100
Table 2 Linear Regression - Count to 1000
Table 3 Linear Regression - Cooperative count
Table 4 Linear Regression - Cooperative Runtime
Table 5 IBAS Scenario 1 Average response times
Table 6 IBAS Scenario 2 Average response times
Equation 3.1
Equation 3.2
Equation 3.3
Equation 3.4
Equation 3.5
Equation 3.6
Equation 6.1 Selfish load balancing algorithm
Equation 6.2 Selfish load balancing upper bound
1. Introduction
Computer software is getting larger and more complex every day. The demands on
present-day software are so high that it's becoming very difficult to create a single
program that encapsulates all the functionality.
Apart from the increasing complexity, computer software also keeps requiring more
computing power. Many programs require more computing power than a single computer
system can offer.
To solve this problem computer software is distributed across multiple computer systems.
This again increases the complexity of the software.
It all results in programs that are complex, have taken a long time to develop and still
contain many bugs.
An increasingly popular approach is to split up functionality over smaller programs; thus
not only spreading computing power, but also program logic. The small programs can be
developed and operated separately.
A Multi Agent System (MAS) is a clear example of this approach. An agent can be seen
as a small computer program that operates autonomously and is designed to fulfill a
specific task.
Multi agent systems use a collection of agents to accomplish a goal that's larger than
simply the sum of all the tasks of the agents.
One of the problems of MASs is that the number of agents can grow rapidly and again
may be too large for a single computer system to handle. Therefore the MAS needs to be
distributed amongst multiple computer systems.
Distributing a MAS is in general easier than distributing an arbitrary computer program,
because the functionality is already divided into small parts. The only thing that has to be
taken care of is maintaining the links between these small parts, even when they run on
different computer systems.
Different MASs often have a lot of functionality in common. Agents need to be able to
find each other, send messages to each other, share some global information, etc.
To support these common tasks, agent frameworks have been created. An agent
framework provides all common functionality for a MAS. An agent developer can create
his MAS using the tools the framework offers, not having to worry about how messages
are sent from A to B, but only caring about the functionality of his MAS.
Spyse is such an agent framework that is developed at TNO DenV and DECIS Lab.
Throughout this research project we will examine different aspects of agent frameworks,
like how to efficiently run many agents on one or multiple computer systems. Solutions
we find will be implemented in Spyse.
To conclude this study we will look at a software project that applies an agent approach,
namely a semantic network. Semantic networks have the tendency to become very large
and are in need of a distributed approach.
We will describe what agent technology can do for such a semantic network and will
implement a prototype of an agent based semantic network.
2. Spyse
Spyse stands for Smart Python Simulation Environment. Spyse is a software framework
and platform for building multi-agents systems (MAS [1]) that are compliant with FIPA
[2] and the Semantic Web [3].
Spyse tries to be as simple as possible in the spirit of ‘power through simplicity’. With
Spyse it should be easy to create an advanced MAS.
As the name implies, Spyse is created using the Python programming language. Python
was chosen for several reasons. The ideology of keeping things simple matches the ideals
of Python, which was also developed with the goal of being easy and fast to learn and
use.
Furthermore, Python is an open source project, just like Spyse. Developing an open source
agent framework with an open source programming language, using freely available
standards, should greatly help the acceptance of the Spyse framework.
Python is, like Java, a platform independent programming language; therefore the Spyse
framework runs on different platforms as well. It's interpreted at runtime (no compilation
is needed before running), which makes it a nice language for quick software development.
In order to create Spyse agents one can use the Python language, but Spyse also has
support for higher level languages like 3Apl [4]. Using a high level programming
language should enable people that are not software development experts to easily create
a MAS.
At the moment Spyse can be used to create a small MAS. The limits of the current design
however are quickly reached. Without giving exact numbers, as these are dependent on
the type of MAS, a standard computer system can reach its limit while running only a
moderate number of agents. Some MASs exhaust a system's resources while running tens
of agents, while it would be desirable to run thousands of agents.
To allow Spyse to deal with more agents we will undertake two steps. The first step will
be to look at how the agents can be executed in an efficient way on a single computer,
knowing that no matter how powerful a computer might be, its limits will eventually be
reached when enough agents are deployed on it.
Our second step will consist of distributing the agents across multiple computer systems.
We will have a look at methods that we can use to accomplish this and will implement
those that are most suited.
The internals of Spyse are explained on the following pages. We will introduce the
basic terminology, then the different parts of the framework, and explain how the
framework operates on a single computer system. Later on, the changes that need to
be made to make Spyse operate distributed across multiple systems will be explained.
2.1. Terminology
Some terms used in the literature about distributed systems and multi agent systems are
ambiguous. We'll give a definition for the main terms that will be used throughout the
remainder of this document.
- System: a single computer system and all peripherals, like mouse, keyboard,
  monitor, etc. that may be attached to it.
- (Spyse) Framework: the entire collection of libraries, tools, etc. that can be used
  to create a MAS.
- (Spyse) Container: an initialized and running instance of the framework on one
  single system. It's called a container as it 'contains' agents. Note that it's
  technically possible to run multiple containers on one system, but usually only
  one container is executed per system.
- (Spyse) Platform: a collection of containers running on multiple systems, but
  acting as a single entity to other programs and/or users.
Figure 2.1 Distributed Spyse
The figure above shows the relation between systems, agents, containers and platforms.
Agents always run on top of containers. Multiple containers can form a single
platform. The (Spyse) containers that form the platform are executed on different
systems.
2.2. Modules of Spyse
Figure 2.2 Modules of Spyse
In the picture above the different modules of Spyse are shown, the core module being the
most important.
- aaapl: contains classes for parsing 3Apl files that can be used to implement agents.
- app: contains classes that assist in creating a Spyse application.
- demo: contains some sample applications that show some of the Spyse features.
- util: contains utility classes for all the other modules, for instance a vector or
  matrix class.
The core module contains the most important classes and consists of several other
modules.
Agents
Agents contains the agent class which defines the main attributes of the agent. This
module also contains some specializations of the agent class like the wxAgent which
uses the wx library for visualization.
Behaviours
Behaviours are used to implement tasks, roles and protocols that an agent has. Apart
from the default behaviour class, some specializations like SendBehaviour,
ReceiveBehaviour, TickerBehaviour, etc. are present.
Content
This module deals with the content of messages and their encoding in some specific
form like XML or binary.
MTS
MTS stands for Message Transport System; it takes care of transporting messages.
Platform
Contains all the managing classes like an Agent Management System (AMS) to
handle agents, a Directory Facilitator (DF) to handle services that are offered by
agents and a platform class that takes care of initializing the Spyse platform.
Protocols
The protocols module has some advanced behaviours that implement FIPA protocols
like the Contract Net Interaction Protocol [5].
Semant
The semant module takes care of the environment in which the agents reside. It
offers classes to view and/or change the environment.
2.3. Base classes of Spyse
Figure 2.3 Important Spyse Classes
In the picture above some of the more important classes of the Spyse framework are
shown. Since it would take too much room to show and discuss all classes in Spyse, we
will limit ourselves to the ones shown here. Starting from the Agent class, we will
discuss all the classes, ending with the App class.
2.3.1. Agent and Behaviours
The agent class, naturally, represents a single agent. An agent has five different states:
Figure 2.4 Agent life cycle
- Initiated: The agent has just been created and is ready to start running.
- Active: The agent has been invoked and is running.
- Suspended: The agent is (temporarily) paused and waiting to be activated again.
  Agents can choose to enter this state, or the AMS can enforce it on the agent.
- Waiting: The agent is (temporarily) paused and waiting to be activated again.
  Only agents themselves can choose to enter this state.
- Transit: The agent is in a state where it's safe to move it to another
  (Spyse) container or platform.
An agent can be destroyed from any state.
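The life cycle above can be sketched as a tiny state machine. The transition set below is a simplified assumption based on the state descriptions, not Spyse's actual implementation; destruction from any state is not modelled.

```python
# Hypothetical sketch of the five life-cycle states and their
# (simplified, assumed) transitions.
INITIATED, ACTIVE, SUSPENDED, WAITING, TRANSIT = (
    "initiated", "active", "suspended", "waiting", "transit")

TRANSITIONS = {
    INITIATED: {ACTIVE},                      # start running
    ACTIVE: {SUSPENDED, WAITING, TRANSIT},    # pause or migrate
    SUSPENDED: {ACTIVE},                      # resumed by agent or AMS
    WAITING: {ACTIVE},                        # resumed by the agent itself
    TRANSIT: {ACTIVE},                        # arrived at the new container
}

def move(state, new_state):
    """Perform one life-cycle transition, rejecting illegal ones."""
    if new_state not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {new_state}")
    return new_state

s = move(INITIATED, ACTIVE)
s = move(s, WAITING)
print(move(s, ACTIVE))  # → active
```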
The agent class only contains the attributes that define the identity of the agent. The
actions of an agent are delegated to the collection of behaviours which specify how the
agent is acting. Behaviours are used to implement tasks, roles or protocols that an agent
can use.
A behaviour class is used to implement any kind of action the agent should perform:
from simply letting an agent increment a counter, to looking around for agents that can
offer a certain service, or even more complex tasks like reasoning about given facts.
A set of common behaviours are already implemented in Spyse. Some examples of these
behaviours are:
- ReceiveBehaviour: Task that takes care of receiving incoming messages.
- TickerBehaviour: Task that executes a predefined action after a fixed interval.
- CompositeBehaviour: A behaviour that's created by combining several other
  behaviours.
- AAAPLBehaviour: Allows the agent to perform behaviour that's specified in a
  3Apl script.
- ContractNetInitiatorBehaviour: Lets the agent contact other agents to negotiate
  about a contract.
A developer can subclass these behaviours for more specialized tasks. The
ReceiveBehaviour can be extended by adding instructions on what to do when a message
is received.
Behaviours are executed in steps. A single step in the TickerBehaviour would be to check
whether enough time has elapsed to execute its predefined action. The behaviours that are
added to an agent are scheduled in a round robin fashion. The agent will execute a single
step of a behaviour and then move on to the next one.
A behaviour can be a one shot action. One could add a ReceiveBehaviour to an agent
with the goal of only receiving one single message. The instruction on what to do when
this message is received should take care of ending this behaviour. An agent will
continue to execute as long as it has at least one active behaviour remaining.
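This stepwise, round-robin scheduling can be sketched as follows. The class and method names are illustrative assumptions, not the actual Spyse API.

```python
# Hypothetical sketch of round-robin behaviour scheduling.
class Behaviour:
    """One unit of agent work, executed one step at a time."""
    def __init__(self):
        self.done = False

    def step(self):
        raise NotImplementedError

class CountBehaviour(Behaviour):
    """Counts to a target, one increment per scheduled step."""
    def __init__(self, target):
        super().__init__()
        self.count = 0
        self.target = target

    def step(self):
        self.count += 1
        if self.count >= self.target:
            self.done = True          # the behaviour ends itself

class Agent:
    """Runs its behaviours round robin, one step each per pass."""
    def __init__(self):
        self.behaviours = []

    def add_behaviour(self, b):
        self.behaviours.append(b)

    def run(self):
        # The agent keeps executing while at least one behaviour is active.
        while self.behaviours:
            for b in list(self.behaviours):
                b.step()
                if b.done:
                    self.behaviours.remove(b)

agent = Agent()
a, b = CountBehaviour(3), CountBehaviour(5)
agent.add_behaviour(a)
agent.add_behaviour(b)
agent.run()
print(a.count, b.count)  # → 3 5
```

Note how the agent stops as soon as its last behaviour finishes, matching the rule that an agent continues only while it has an active behaviour remaining.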
2.3.2. AID and HAP
Every agent has an Agent Identifier (AID) to identify the agent. The AID is used
primarily for sending messages to agents. The AID of the receiving agent is used to
indicate the receiver for the message.
An AID has, among others, a unique name which usually consists of a shortname (the
name of the agent inside the platform) and a Home Agent Platform (HAP, the address of
the platform on which the agent is created), separated by an @ sign (shortname@HAP).
The HAP is composed of the hostname or IP address, followed by the port number at
which the platform is listening for incoming messages, separated by a colon
(hostname:port).
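The naming scheme above can be illustrated with a small helper; the function name is hypothetical, not part of Spyse.

```python
# Hypothetical helper composing a full agent name as described above.
def make_aid(shortname, host, port):
    """Compose a full agent name as shortname@HAP, with HAP = host:port."""
    hap = f"{host}:{port}"
    return f"{shortname}@{hap}"

print(make_aid("bookseller", "192.168.0.5", 9000))
# → bookseller@192.168.0.5:9000
```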
2.3.3. AMS
The agents are managed by the Agent Management System (AMS) which takes care of
creating, running and destroying agents. The AMS is a special subclass of the Agent
class. There can be only one AMS present on a single platform even if the platform is
distributed across multiple systems.
In a distributed situation multiple AMS objects will act as a single AMS. The AMS has a
unique reserved name of ‘ams@hap_name’.
Besides managing the agents the AMS also offers a white page service to search for an
agent, using the name of the agent.
2.3.4. DF and Service
Agents can offer services to other agents. One agent for instance can be capable of selling
books and might be willing to offer this service to other agents.
Another agent might have the desire to purchase a book. If the book buyer agent knows
a book seller agent, it could send it a message saying it wants to purchase a book.
In most cases however an agent that needs a service will not know any agent offering this
service. In order to let agents find other agents offering a specific service a special agent,
the Directory Facilitator (DF), is introduced.
Services offered by the agents are managed by the DF. The DF is, just like the AMS, a
special subclass of the Agent class. Unlike the AMS, there can be more than one DF
running on one platform. The default DF has the reserved name 'df@hap_name'.
In Spyse a service is described by a collection of parameters:
- title: Name of the service.
- lease: How long the service will be offered.
- ontology: An ontology¹ describing the service.
- language: Content language² in which the service is offered.
- protocol: Protocol that is used to offer the service.
One is however free to omit all of these parameters, except for the title, which is the
minimal information to describe a service, and to add extra parameters if needed.
As an example we can specify a book selling service as follows:
Title: Book selling
Lease: Until 17:00 22 November 2006
Ontology: http://semantic-web.com/ontologies/bookselling.rdf
Language: FIPA SL
Protocol: FIPA Contract Net Interaction Protocol
The DF provides a yellow page service, meaning it can keep track of services offered by
agents and agents can query the DF to find agents based on the services they offer. A new
agent has to register itself at this facility if it wants to offer its services to other agents.
¹ A schematic representation to specify a part of the world.
² Content languages are used by agents to describe the things they talk about. Examples
of content languages are: FIPA Semantic Language, Knowledge Interchange Format,
Resource Description Framework and the Web Ontology Language.
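Registering and finding the book selling service described above might look as follows. The dictionary keys follow the listed parameters, but the DF interface shown here is a toy stand-in, not the actual Spyse API.

```python
# Hypothetical sketch of a yellow-page service: register a service
# description under an agent's AID and query by title.
class DirectoryFacilitator:
    def __init__(self):
        self.services = []   # list of (aid, description) pairs

    def register(self, aid, description):
        """A new agent registers here to offer its services."""
        self.services.append((aid, description))

    def search(self, title):
        """Return the AIDs of all agents offering a service by title."""
        return [aid for aid, d in self.services if d["title"] == title]

df = DirectoryFacilitator()
df.register("seller@host:9000", {
    "title": "Book selling",
    "lease": "Until 17:00 22 November 2006",
    "ontology": "http://semantic-web.com/ontologies/bookselling.rdf",
    "language": "FIPA SL",
    "protocol": "FIPA Contract Net Interaction Protocol",
})
print(df.search("Book selling"))  # → ['seller@host:9000']
```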
2.3.5. MTS and ACLMessage
The Message Transport System is responsible for delivering messages from one agent to
another.
Messages follow the Agent Communication Language (ACL) specification [6] from the
FIPA, which is based on the speech act theory [7], and may contain 13 different
parameters. Only one of these is mandatory, namely the performative parameter. Most
messages will however also have a sender, receiver and content parameter.
The performative parameter specifies the intention of the message like ‘accept’, ‘reject’,
’inform’, etc. [8]. It is one of the most used parameters for agents to decide on how to
handle a certain message.
An agent can contact the MTS and ask it to send a message. All incoming messages will
be processed by the MTS; the MTS is responsible for delivering the messages to the
agent on its own platform. In order to do this the MTS may access the information
provided by the AMS and DF if needed.
For transporting messages between different platforms the MTS is able to receive
messages through the HTTP and IIOP protocol. These messages can be encoded in XML,
string or binary form. For communication within a platform the MTS is free to choose its
own techniques.
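A message with the common parameters could be built as sketched below. The class is an illustrative stand-in for Spyse's ACLMessage; only the mandatory-performative rule is taken from the text.

```python
# Hypothetical ACL-style message: only 'performative' is mandatory,
# the other parameters are optional.
class ACLMessage:
    def __init__(self, performative, sender=None, receiver=None, content=None):
        if not performative:
            raise ValueError("performative is the only mandatory parameter")
        self.performative = performative   # e.g. 'accept', 'reject', 'inform'
        self.sender = sender
        self.receiver = receiver
        self.content = content

msg = ACLMessage("inform",
                 sender="buyer@hostA:9000",
                 receiver="seller@hostB:9000",
                 content="price(book, 10)")
print(msg.performative)  # → inform
```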
2.3.6. App and Platform
These two classes make the Spyse framework accessible to other developers.
The platform class takes care of initializing the three major services, AMS, DF and MTS,
and all supporting libraries that they may use.
The App class should be extended when someone wants to create his own Spyse
application. On initialization the class will process any arguments that are given to it,
like a port number to use for incoming messages or the distribution mode that should be
used. The platform class will then be initialized and the main 'run' method called. This
method can be overridden by a custom application and should be the starting point of the
user's own program.
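The initialization order described above (process arguments, set up the platform, then hand control to 'run') can be sketched like this; the class shapes are assumptions, not the real Spyse classes.

```python
# Hypothetical sketch of the App/Platform start-up sequence.
class Platform:
    """Stands in for the class that initializes AMS, DF and MTS."""
    def __init__(self, port):
        self.port = port   # e.g. where the MTS listens for messages

class App:
    def __init__(self, port=9000):
        # Process start-up arguments, initialize the platform,
        # then call the user's run() method.
        self.platform = Platform(port)
        self.run()

    def run(self):
        pass               # overridden by the custom application

class MyApp(App):
    def run(self):
        print(f"starting agents on port {self.platform.port}")

MyApp(port=9001)  # → starting agents on port 9001
```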
2.4. Environment model of Spyse
Besides communicating through message exchange, agents may also have other methods
of interaction. Agent definitions often mention the agent being surrounded by an
environment, physical or only in software.
Spyse offers an environment model that agents can use to keep track of multiple entities.
An entity can represent an agent, but also objects in its environment. In the case of a MAS
that's used for simulating the evacuation of a building, the environment could consist
of doors, benches, hallways, etc.; agents represent the people in the building.
Agents can exchange information using the environment. An agent can change the
environment, for example by opening a door. An agent can also collect information about
the number of people that are inside a hallway.
The environment can also be used as an alternative way for agents to find each other. An
agent may not know the name of another agent, but could retrieve its address from the
environment model. This can be useful when it notices another agent standing nearby.
Figure 2.5 Environment Model
The environment is formed by a collection of entities. An entity can have a position, size,
speed, etc.
The entities are managed by the Model class; this is the only class that has direct access
to the entities. The agents are not supposed to use this class; instead, they should use
either the View class, which can be used to take a look at the environment, or the
Controller class, which can be used to make changes to the environment.
The Environment class is used to setup these three classes, simply by creating an
Environment object.
The default environment doesn’t provide much functionality apart from adding entities
to and removing them from the model. An example of a more advanced environment is the
plane environment, which provides functionality for moving entities around in a two
dimensional plane.
Because the Model class is the only class that has direct access to the entities, and the
View and Controller depend on it for their functions, this class is the most important one
for the distribution of the environment.
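The split between Model, View and Controller can be sketched as follows; everything beyond the class names Model, View, Controller and Environment (the entity fields, the method names) is an illustrative assumption, not Spyse’s actual API.

```python
class Entity:
    """An entity in the environment: an agent, a door, a bench, etc."""
    def __init__(self, name, position):
        self.name = name
        self.position = position

class Model:
    """The only class with direct access to the entities."""
    def __init__(self):
        self._entities = {}

    def add(self, entity):
        self._entities[entity.name] = entity

    def remove(self, name):
        del self._entities[name]

    def get(self, name):
        return self._entities[name]

class View:
    """Read-only access: agents use this to look at the environment."""
    def __init__(self, model):
        self._model = model

    def position_of(self, name):
        return self._model.get(name).position

class Controller:
    """Write access: agents use this to change the environment."""
    def __init__(self, model):
        self._model = model

    def move(self, name, new_position):
        self._model.get(name).position = new_position

class Environment:
    """Sets up the three classes, simply by creating one object."""
    def __init__(self):
        self.model = Model()
        self.view = View(self.model)
        self.controller = Controller(self.model)

env = Environment()
env.model.add(Entity('door-1', (3, 4)))
env.controller.move('door-1', (3, 5))   # an agent opens/moves a door
print(env.view.position_of('door-1'))   # (3, 5)
```

Because all reads and writes funnel through the Model, distributing the environment later only requires changing that one class.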
3. Scaling Spyse – Multi tasking
In order to make Spyse capable of running more agents some modifications have been
made. We’ve first taken a look at possibilities to run agents concurrently on a single
system. Later we’ve made the necessary modifications to make a distributed system of
Spyse.
3.1. Multi Task Techniques
Before we look at the different multi-tasking methods we have tested in Spyse, we give a
quick introduction to the techniques used for them.
3.1.1. Threads
The oldest way of executing multiple tasks at the same time on a single computer system
is by running multiple processes simultaneously. If a system has multiple processors then
true concurrency can be achieved by running each process on a separate processor. The
number of processes, however, is usually greater than the number of processors available.
To cope with this lack of processors, concurrency is often simulated by letting a single
processor execute multiple processes. A processor will execute a process for a small
amount of time and then switch its focus to another process. By switching very fast it
looks like the processes are running concurrently.
One of the disadvantages of this approach is that every process needs its own state
information, address space, execution stack, etc. Switching from one process to another
results in what is called a context switch. When switching from process A to process B,
all information for process A — address space, execution stack, etc. — has to be stored
somewhere in order to load it again when switching back to process A.
Communication between different processes is hindered because they don’t share any
memory. So techniques have to be used that either create a shared memory or allow
processes to send messages to each other.
Threads overcome these problems. Threads are used to create multiple tasks
inside a single process, thus allowing these tasks to use the same address space and
resources. A separate execution stack is still needed for every thread. When switching
between threads only this execution stack and the processor registers have to be saved,
allowing for much faster switches compared to switching between processes.
The different threads do have to be very careful with the memory space they share.
Different threads may work with the same variables. The time at which the focus is
switched from one thread to another might determine the outcome of the process.
Switching between threads is a task of the operating system. The operating system can
switch threads anytime it wants; this is called preemptive scheduling. The developer
therefore cannot know when a switch will occur. This can result in unexpected random
behaviour, known as race conditions [9].
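The problem and the usual remedy can be illustrated with a small plain-Python example (not Spyse code): several threads increment a shared counter, and a lock makes the read-modify-write step atomic so the preemptive switches cannot lose updates.

```python
import threading

counter = 0
lock = threading.Lock()

def increment(times):
    global counter
    for _ in range(times):
        # Without the lock, 'counter += 1' is a read-modify-write that the
        # OS may interrupt halfway, so two threads can read the same old
        # value and one update is lost (a race condition).
        with lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(100000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000 -- guaranteed only because of the lock
```

Removing the `with lock:` line makes the final value unpredictable, which is exactly the kind of behaviour described above.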
3.1.2. Micro threads
Just like threads are sometimes called lightweight processes, micro threads are light-
weight threads. Other names are also used for micro threads; a few examples are fibers,
greenlets, user threads and tasklets. These names come from different implementations
of micro threads. Different implementations all have their own specific features, but they
are all based on the same ideas.
Micro threads, just like ‘normal’ threads, share address space and resources. On top of
that they also share the same execution stack. Sharing of the execution stack enables even
faster context switches, when compared to threads.
Micro thread implementations also let the developer take care of switching between
tasks. Inside a micro thread one puts a mark where it is safe to switch to another micro
thread. By taking the randomness of preemptive scheduling out, no race conditions
should occur.
The scheduling in a micro thread implementation is not done by the operating system but
by a custom scheduler. Once a process is started the main routine is added to this
scheduler as the first, and at that point only, micro thread. More routines can be added to
the scheduler. When a routine reaches the mark indicating that it’s safe to switch, the
scheduler decides, usually with a round-robin algorithm, which micro thread to run
next.
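The idea can be sketched with Python generators, where each `yield` is the mark at which switching is safe; this is an illustrative sketch of cooperative scheduling, not the API of Stackless tasklets or any other micro thread library.

```python
from collections import deque

def scheduler(routines):
    """Round-robin scheduler for generator-based micro threads.

    Each routine yields at the points where it is safe to switch."""
    queue = deque(routines)
    while queue:
        routine = queue.popleft()
        try:
            next(routine)          # run until the next safe-switch mark
            queue.append(routine)  # reschedule at the back of the queue
        except StopIteration:
            pass                   # routine finished, drop it

trace = []

def worker(name, steps):
    for i in range(steps):
        trace.append((name, i))
        yield  # the mark: safe to switch to another micro thread

scheduler([worker('a', 2), worker('b', 2)])
print(trace)  # [('a', 0), ('b', 0), ('a', 1), ('b', 1)]
```

Because switches happen only at the `yield` marks, the interleaving is fully deterministic, in contrast with the preemptive threads above.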
3.2. Multi Task Methods Research
Agents operate autonomously and in parallel to each other. To determine which method to
implement in our agent framework, four different multi-tasking methods have been
studied. As main criterion we’ve taken the execution time needed to execute our test
scenarios. We also tested the hypothesis that the time scales linearly with the number of
agents.
Tasklets [10] are part of the Stackless Python library and provide a micro thread
implementation.
- M1 Threads: Each agent is run in a separate thread, executing its behaviours until
  all of them are finished.
- M2 Tasklets: Each agent is run in a separate tasklet, executing its behaviours until
  all of them are finished; after executing a behaviour the tasklet scheduler is called.
- M3 Pooling: A specified number of workers, each running in a separate thread, is
  created. Every worker asks a scheduler to provide an agent, executes one step of a
  behaviour of the agent and then reschedules the agent if there are behaviours left.
- M4 Custom method: A scheduler schedules the agents in a round-robin pattern.
  The scheduler takes one agent, executes one step of a behaviour of the agent and
  then reschedules the agent if there are behaviours left.
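The pooling method M3 can be sketched with a thread-safe queue of agents served by a fixed number of worker threads; the `Agent` class and its `step` method are simplified stand-ins, not Spyse’s actual interfaces.

```python
import threading
import queue

class Agent:
    """Toy agent: its behaviour consists of a fixed number of one-step
    increments (illustrative stand-in for a real Spyse agent)."""
    def __init__(self, steps):
        self.steps = steps
        self.counter = 0

    def step(self):
        """Execute one step of a behaviour; True while behaviours are left."""
        self.counter += 1
        self.steps -= 1
        return self.steps > 0

def run_pool(agents, n_workers):
    work = queue.Queue()
    for agent in agents:
        work.put(agent)

    def worker():
        while True:
            try:
                agent = work.get_nowait()  # ask the scheduler for an agent
            except queue.Empty:
                return
            if agent.step():               # execute one step of a behaviour
                work.put(agent)            # reschedule: behaviours left

    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

agents = [Agent(steps=100) for _ in range(50)]
run_pool(agents, n_workers=10)
print(sum(a.counter for a in agents))  # 5000
```

The number of OS threads stays fixed at the pool size no matter how many agents exist, which is what lets this method sidestep the per-thread limits discussed later.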
Two types of test scenarios are created. In the first scenario each agent increments a local
counter, starting from zero, until a specified value has been reached. We call this the
‘Count to X’ scenario, where X is the value that needs to be reached.
In the second scenario the agents share a common counter that will be incremented,
starting from zero, until a specified value has been reached. We call this the
‘Cooperative’ scenario. In all our tests we’ve taken one million as our end value.
All tests have been performed on a system with the following specifications:
- Pentium 4 CPU at 1.6 GHz
- 512MB RAM
- Windows XP Professional Service Pack 2
- Python 2.4.2
Two values have been measured during the tests:
- Creation time: The time needed to create an agent.
- Runtime: The time an agent is running.
3.2.1. Expectations
Before the results of the experiments are presented we will make predictions about the
expected behaviour.
Count to ‘X’
In the ‘Count to X’ scenario we expect the time needed for creating and running a group
of agents to increase when the number of agents is increased. This is obvious, because
more agents result in more work to be done.
The M2 and M4 methods should perform about the same. Both don’t use preemption, and
the context switches should be fast enough to have no significant influence. After an
agent has incremented its local counter the scheduler will schedule the next agent. When
using a simple round-robin policy the time needed for deciding on which agent to take
next won’t be influenced by the number of agents present, O(1). The next one in line
will simply be chosen. We can define:
- n: number of agents
- c: amount to count to
- a: time needed to increment the local counter once
- s: time needed to schedule the next agent
- r: time needed to create the agent
- t: total time needed
Equation 3.1
t = nca + ncs + nr = n(ca + cs + r)
The total time, t, will be linearly dependent on n as all other variables are constant.
With methods M1 and M3 the scheduling will be different. Because the OS takes care
of it we can’t predict when the scheduler will become active and how often. The
Windows operating system uses a strict priority queue to schedule threads; the thread
with the highest priority will always run if possible. There are 32 priority levels; the
highest 15 are reserved for real-time threads and behave a bit differently.
After a normal thread has waited and becomes active it will get a priority boost based on
the time it has waited. A long wait will result in a large boost and vice versa. Such a
boost will never put a thread at the highest real-time priority levels. After a thread has
finished running for a quantum its priority will be decreased by a certain amount until it
has reached its base priority. This increasing and decreasing of priority levels does not
happen for real-time threads.
All of this won’t be of much influence for any of our scenarios, because the agents all
have very similar tasks and it doesn’t matter in which order they are executed; eventually
they will all be executed. So it doesn’t matter which agent gets selected.
Selecting from a priority queue can be as fast as O(1) if the number of priority levels isn’t
changed. If no interrupts occur, every thread should be able to finish its quantum before
the scheduler interrupts it. So the total time needed for the scheduling does not depend
on the number of threads we have, but on the time our program is running. We will
assume that most of the time our program is busy incrementing a local counter, and we
will let the scheduling time depend on the number of times this action occurs.
We again introduce:
- st: time needed to schedule the threads
Equation 3.2
t = nca + ncst + nr = n(ca + cst + r)
For the pool scenario we also need to introduce the number of workers in the pool and the
time a worker needs to select the next agent to work with. This is done in round-robin
fashion, just like our custom scheduler, and is thus not directly influenced by the number
of agents we have, but by the number of times we have to select a new agent:
- sp: time needed for the pool worker to select a new agent
Equation 3.3
t = nca + ncst + ncsp + nr = n(ca + cst + csp + r)
So in the end, for all our methods the only variable, n, will introduce a linear increase in
time. We do expect the slope to be higher for the methods that use threads, as the
Windows thread scheduler is a lot more complicated than the round-robin schedulers of
the other solutions. Also, when scheduling a new thread a context switch will occur.
Cooperative
In the ‘Cooperative’ scenario we should be able to see more clearly how much time the
scheduling takes. The amount of work will be the same no matter how many agents
are present. Having more agents should have no influence on the runtime other than
the extra time needed for scheduling more agents. So for M2 and M4 we get:
- n: number of agents
- c: global amount to count to
- a: time needed to increment the local counter once
- s: time needed to schedule the next agent
- t: total time needed
Equation 3.4
t = ncs + ca
This should still result in a linear dependency; however, since the schedule time, s, should
be near zero, we expect a very gradual slope.
The same holds for the threading methods, if our assumption is correct that the number
of threads does not influence the scheduling time needed. We get:
- st: time needed to schedule the threads
Equation 3.5
t = ca + cst = c(a + st)
This should result in no dependency at all of the needed time on the number of agents.
For our pool solution we get:
- sp: time needed for the pool worker to select a new agent
Equation 3.6
t = ca + cst + csp = c(a + st + sp)
Again we expect no dependency of the time on the number of agents.
General
The difference between threads, tasklets, pooling and our custom method will mostly be
determined by the difference in context switching. This difference should become larger
when the program runs longer, so when more work needs to be done. Tasklets and our
custom method should be faster than the threads and pooling methods, both in creation
and in runtime.
Pooling will probably be slower than threads because of the extra mechanisms needed
to supply the agents to the workers.
3.2.2. Count to X Results
Figure 3.1 Count to 100 (time in seconds versus number of agents, 100-1000, for the
Threads, Tasklets, Custom, Pool10 and Pool100 methods)
y=cx+b    c           b       r2     r
threads   4,60x10^-3  -0,011  1,000  1,000
tasklets  1,48x10^-3   0,007  1,000  1,000
custom    1,53x10^-3  -0,004  0,999  0,999
pool10    4,37x10^-3  -0,029  1,000  1,000
pool100   5,70x10^-3  -0,116  1,000  0,999
Table 1 Linear Regression - Count to 100
Figure 3.2 Count to 1000 (time in seconds versus number of agents, 100-1000, for the
Threads, Tasklets, Custom, Pool10 and Pool100 methods)
y=cx+b    c           b       r      r2
threads   3,05x10^-2  -0,110  0,999  0,999
tasklets  1,04x10^-2   0,251  0,998  0,995
custom    1,10x10^-2  -0,118  1,000  1,000
pool10    3,74x10^-2   0,224  1,000  1,000
pool100   5,14x10^-2  -0,040  1,000  1,000
Table 2 Linear Regression - Count to 1000
We can see from the figures that time grows linearly for all methods when we increase the
number of agents. The graphs above don’t show more than 1000 agents, as we’re limited
to creating 1034 threaded agents when using Python on the Windows XP platform. We
have tested with thousands of agents, up to 90,000, for the M2 and M4 methods, which
don’t use threads (see Figure 3.4 Count to 1 million).
When we apply linear regression to these results, the first thing we notice is a very high
correlation value between 0.99 and 1.00 for each graph. So based on the graphs and the
high correlation factor it’s safe to conclude that the relationship between the needed time
and the number of agents is linear. The relationship is more obvious when we increase the
amount to count to and see the slope increase.
We also see that for small tasks, ‘Count to 100’, thread pooling is faster than normal
threads, while for bigger tasks, ‘Count to 1000’, normal threading is faster. This can be
the result of the extra time needed to create and destroy a thread, which happens more
frequently when using a thread for every agent. It is also possible that an agent finishes
its task before it has used its entire time slice. When this happens in a worker thread from
a thread pool, a new agent can be selected and executed in the same time slice; when
every agent runs in its own thread, a new thread has to be selected and thus another
context switch is needed.
Furthermore we clearly see the extra time needed for the scheduling and context
switching while using any of the threading methods. Tasklets and our custom method are
more efficient. The more agents there are, the bigger the difference becomes.
3.2.3. Cooperative Results
Figure 3.3 Count to 1 million (cooperative scenario: time in seconds versus number of
agents, 100-1000, for the Threads, Tasklets, Custom, Pool10 and Pool100 methods)
y=cx+b    c           b       r      r2
threads   11,1x10^-3  20,779  0,989  0,979
tasklets  0,41x10^-3  10,461  0,781  0,609
custom    1,34x10^-3  10,196  0,885  0,783
pool10    1,43x10^-3  38,472  0,589  0,347
pool100   2,60x10^-3  50,970  0,967  0,934
Table 3 Linear Regression - Cooperative count
In the above figure we again see the difference between methods that use threads and
those that don’t. For M2 and M4 we notice that the time needed to execute the procedure
remains about the same if we increase the number of agents, which makes sense as the
amount of work is constant in this scenario and switching between agents takes virtually
no time. Only in a close-up view can we see an increase in time.
The thread method shows a greater increase in time when we add more agents, compared
to the other methods. This is most likely due to the threads fighting over the common
variable. The common value is protected by a locking mechanism; it can happen that one
thread is trying to acquire the lock when it has already been given to another thread. The
more threads are running, the bigger the chance this event occurs and the more time is
wasted.
With the pool mechanism the number of threads remains the same, and thus the chance of
failing to acquire a lock is the same no matter how many agents are running.
We see the unpredictable nature of threads as the amount of time sometimes increases
and sometimes decreases, possibly due to the fighting over the common variable.
Figure 3.4 Count to 1 million (cooperative scenario: runtime and creation time in seconds
versus number of agents, 10,000-90,000, for the Tasklet and Custom methods)
run=cx+b  c           b       r      r2
tasklets  7,79x10^-3  10,028  0,998  0,996
custom    7,48x10^-5  10,035  0,994  0,988
Table 4 Linear Regression - Cooperative Runtime
Even at very high numbers of agents we need only little extra time to schedule the agents;
the creation time becomes dominant in our scenario. The extra runtime needed to
complete the scenario could come from disposing of all the agents rather than from the
actual calculation itself.
3.2.4. Conclusions
Having more agents active in the Spyse framework will introduce little overhead. It
doesn’t matter much if we need to schedule 10 or 1000 agents. This can be seen best in
our cooperative scenario. Particularly when using a method without threads we see that
even when running thousands of agents we still don’t need much extra time to perform
the same amount of work.
We are, however, limited to about 1,000 agents with the threading method on a Windows
PC. This is due to the stack size that is reserved for each thread. The stack size cannot be
changed from within the Python programming language, except by changing the source
files of the interpreter, which results in a loss of compatibility.
Threads have an advantage over the other methods because they are preemptively
scheduled by the OS. This is more in line with the autonomous behaviour that agents
should have.
Pooling is a good alternative to overcome the thread limit of the operating system. The
agents are still scheduled in a preemptive manner, but only those that are active in the
worker threads. Pooling does not introduce much overhead and the performance is about
the same as with threads.
If performance is critical one could use tasklets or a custom scheduling method. Both
perform almost the same. Because the tasklets require the use of an additional library, the
custom method is preferred. The preemptive behaviour can be simulated by scheduling
the agents in a random instead of a round-robin fashion. This could introduce starvation,
and precautions should be taken if one wants to avoid this.
4. Scaling Spyse – Distribution
Now that we’ve taken a look at how to run multiple agents efficiently on a single system
we’ll move on to making a distributed system of Spyse.
4.1. Single System
We will now take a closer look at some of the processes that are going on inside the
Spyse framework. We’ll especially look at how these processes operate on a single system.
When the framework is run on multiple systems the implementation needs to be adjusted.
4.1.1. Initialization
Figure 4.1 Initialization of Spyse
Here we see the initialization phase of Spyse. A user starts an application with a number
of arguments. The arguments are processed in an App object, and a Platform object is
initialized. The Platform object takes care of initializing the MTS, AMS and DF.
After all that is done, the run method of the UserApp is called.
4.1.2. Starting an Agent
Both the AMS and DF are agents themselves, all agents should be started by the AMS.
The AMS itself can’t be started through the AMS, it’s the only agent that’s started by the
Platform object during initialization. So in figure 3.5 we see the first agent to be started
by the AMS is the DF. We’ll take a look at the steps that are taken to start an agent.
Figure 4.2 Starting an agent with different scheduling methods
A request to start a new agent can come from the Platform object, from the application (a
request from the application will be forwarded to the Platform object), or from a message
from another agent to the AMS requesting to start one.
The start_agent method passes the class and name arguments to the create_agent
function; any other arguments are passed to the agent’s init method. The class argument
is a so-called Python type [11] and is used to create the appropriate Agent (sub)class.
After the agent is created it will be registered and invoked. Depending on which multi-
tasking method is used, the agent is either run in its own thread, added to the runnable list
of the AMS and scheduled by the run_agents method, or scheduled to the thread pool.
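The role of the Python type argument can be sketched as follows; the `AMS`, `start_agent` and `create_agent` shown here are simplified stand-ins for the real Spyse classes, not their actual code.

```python
# Sketch of starting an agent from a Python type (simplified stand-in,
# not the actual Spyse implementation).

class Agent:
    def __init__(self, name, **kwargs):
        self.name = name
        self.args = kwargs

class AMS:
    def __init__(self):
        self.registry = {}

    def create_agent(self, agent_class, name, **kwargs):
        # agent_class is a Python type: calling it constructs the right
        # Agent (sub)class instance; extra arguments go to its init.
        agent = agent_class(name, **kwargs)
        self.registry[name] = agent
        return agent

    def start_agent(self, agent_class, name, **kwargs):
        agent = self.create_agent(agent_class, name, **kwargs)
        # ... here the agent would be invoked: run in its own thread,
        # added to the runnable list, or scheduled to the thread pool.
        return agent

class CounterAgent(Agent):
    pass

ams = AMS()
agent = ams.start_agent(CounterAgent, 'counter-1', target=100)
print(type(agent).__name__, agent.args)  # CounterAgent {'target': 100}
```

Passing the class object itself, rather than a class name string, means no lookup table is needed: any Agent subclass the application defines can be started this way.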
4.1.3. Sending a Message
Figure 4.3 Sending a message
When sending a message we have two options: either the receiver is on the same
platform or it is not. If the receiver is on the same platform we can use a number of
methods for delivering the message. The implementation that has been chosen uses a
remote invocation library to call the receive_message method of the receiving MTS.
In the latter case the message has to be sent through either HTTP or IIOP, according to
the FIPA specification. The receiving MTS listens for any incoming messages that are
sent this way and delivers them to the receiving agent.
The MTS may use any methods of the AMS or DF to find the receiving agent. Usually
it can simply ask the AMS for an agent with a given name. In some cases the name of the
receiving agent is not specified and another method for finding an appropriate receiver
must be used.
4.2. Distribution
In order to run large-scale multi-agent systems on the Spyse framework, a single computer
system may not be able to provide enough computing power. Therefore it would be
desirable to run the MAS on multiple systems.
The usual way to achieve this is by letting the agents on different systems communicate
with each other through platform-external channels, for example by using HTTP to let
agents send messages to each other. These channels, however, are optimized for
compatibility and not for performance.
If message passing is the only way for agents on different systems to work together, the
creator of the MAS has to take care of the distribution process himself.
An improvement is to let the framework take care of the distribution. We can then use
internal channels for communication and take a lot of work away from anyone who wants
to create a distributed MAS.
4.2.1. What to distribute
Agents themselves are self-contained; no changes should be made to them. So what are
the main differences between agents working together on multiple platforms and agents
working together on a distributed platform?
Yellow and white page searches should span multiple systems (DF and AMS). When
someone is looking for an agent that offers a specific service the yellow page service
should be able to look not only on its own system for such an agent, but should also be
able to find an agent on another system.
The same environmental model applies for all agents and changes made on one system
should be synchronized with all other systems.
The message sending mechanism should also be extended. It already uses different
methods for sending messages within the same platform or between platforms, but for
messages within the same platform we should now also distinguish between messages
that have to be delivered on a different system.
We’ll first have a look at the possibilities we have for distributing the different
framework components. Our goal is to make a Spyse application able to run distributed
without changing the program. The framework should take care of all complications for
the user.
Then we’ll have a look at what internal communication mechanism we can use to
optimize the communication performance between the different systems.
4.2.2. Distributed AMS
Agents on the different systems need to be able to find each other and send messages to
each other. Agents use the AMS and DF for finding each other. These services have to be
adapted so that they can find agents that are not running on the local system. For the
AMS we present three different scenarios to achieve this.
Broadcast Update Scenario
In this scenario every Spyse container runs its own AMS, though together they act as a
unit. Whenever a container registers a new agent, an update is broadcast to all other
containers, enabling them to update their own registries. The broadcast should contain
the AID of the agent, to enable the other containers to store it in their own registry.
Figure 4.4 Broadcast Update - Creating an agent
Methods that need to be adapted:
- create_agent: Register the Agent locally and broadcast an update to all other
  containers.
- unregister_agent: Unregister the Agent locally and broadcast an update to all
  other containers.
Also, when a new Spyse container is added to the platform, the AMS of this container
should request all AIDs from another AMS.
The advantage of this solution is that all information is always locally available and
searching for an AID can be done efficiently. The downside is that registering the AID at
all the other containers can be time-inefficient.
Creating and registering an agent will be an O(n) operation, with n being the number of
containers. A local search will be an O(1) operation, not dependent on the number
of containers.
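The bookkeeping of this scenario can be sketched as follows, with plain in-process objects standing in for the containers and their broadcast channel; this is an illustration of the idea, not Spyse code.

```python
class BroadcastAMS:
    """Sketch of the broadcast-update scenario: every container keeps a
    full registry and pushes every change to all peers."""

    def __init__(self):
        self.registry = {}   # agent name -> AID
        self.peers = []

    def connect(self, peer):
        """Join the platform via an existing container."""
        self.peers.append(peer)
        peer.peers.append(self)
        # A new container requests all AIDs from an existing AMS.
        self.registry.update(peer.registry)

    def create_agent(self, name, aid):
        self.registry[name] = aid        # register locally
        for peer in self.peers:          # broadcast update: O(n)
            peer.registry[name] = aid

    def unregister_agent(self, name):
        del self.registry[name]
        for peer in self.peers:          # broadcast update: O(n)
            peer.registry.pop(name, None)

    def find_agent(self, name):
        # All information is locally available: an O(1) lookup.
        return self.registry.get(name)

a, b = BroadcastAMS(), BroadcastAMS()
b.connect(a)
a.create_agent('alice', 'alice@container-a')
print(b.find_agent('alice'))  # alice@container-a
```

With more than two containers each one would need to know all peers (a full mesh), which is exactly where the O(n) registration cost comes from.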
Broadcast Retrieve Scenario
In this scenario each Spyse container also runs its own AMS. Whenever a container can’t
find an agent it sends a broadcast to all other containers. The container that contains the
agent will provide the AID of the agent.
Figure 4.5 Broadcast Retrieve - Finding an agent
Methods that need to be adapted:
- find_agent: Broadcast a request to all other containers whenever it can’t find an
Agent locally.
The advantage of this scenario is that registering a new agent only needs to be done at the
local container and is an O(1) operation, but obviously searching for an agent now
becomes less efficient. In the best case the agent is available locally and we still have an
O(1) operation, but in the worst case all other containers need to be queried before the
agent is found, O(n).
Depending on whether it is more common to (un)register a new agent or to find an
existing one, this solution can perform better or worse than the first one.
Central Scenario
Every Spyse container runs its own AMS, but one specific container will act as a global
registry. Whenever a container registers a new Agent it should inform this global registry
about the new Agent. Whenever an Agent can’t be found locally it should ask the global
registry to provide the AID of the Agent.
Figure 4.6 Central - Creating and finding an agent
Methods that need to be adapted:
- find_agent: Check if the Agent is registered on the local container and if not
  request the AID from the global registry.
- register_agent: Register the Agent locally and send an update to the global
  registry.
- unregister_agent: Unregister the Agent locally and send an update to the global
  registry.
The AMS that is going to act as a server should be the first one to be started.
In this scenario both finding and (un)registering agents can be handled reasonably fast;
in both cases only one other container needs to be informed, which results in an O(1)
operation. The central AMS should however be capable of handling all requests that all
other containers will make.
4.2.3. Directory Facilitator
The changes that need to be made to the DF are almost the same as the changes for the
AMS. We can again choose to broadcast all updates or all retrievals, or create a central
point containing a global registry.
The only difference is that a service can be offered by more than one agent. Thus, when
trying to find agents that offer a certain service in the ‘broadcast retrieve’ scenario,
one cannot stop when an agent has been found (even if the agent is locally available),
which is possible in the AMS case. There is no best case in which the search
is O(1); all search operations will be O(n).
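The consequence for the search can be sketched as follows (illustrative names; peer objects stand in for remote containers): even when a provider is found locally, the other containers must still be queried and the results merged.

```python
class DF:
    """Sketch of a distributed Directory Facilitator. A service can be
    offered by more than one agent, so a search must visit every
    container and merge the results: O(n) in the number of containers."""

    def __init__(self):
        self.services = {}   # service name -> set of local agent names
        self.peers = []

    def register_service(self, service, agent):
        self.services.setdefault(service, set()).add(agent)

    def find_local(self, service):
        return self.services.get(service, set())

    def find_service(self, service):
        # Unlike the AMS case we cannot stop at the first hit:
        # other containers may hold more providers of the service.
        result = set(self.find_local(service))
        for peer in self.peers:
            result |= peer.find_local(service)
        return result

df1, df2 = DF(), DF()
df1.peers.append(df2)
df1.register_service('print', 'printer-agent-1')
df2.register_service('print', 'printer-agent-2')
print(sorted(df1.find_service('print')))
# ['printer-agent-1', 'printer-agent-2']
```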
4.2.4. Environment
Agents that are distributed are still part of the same platform and therefore should all use
the same environment model. Just as we did with the AMS we will provide three
possibilities on how to make this model available on multiple containers.
Local Scenario
In this scenario the entire environment is available at every Spyse container. All retrievals
are done locally through a View that accesses the Model, which has all the entities
available. Any updates that the Model makes on behalf of the Controller should be
broadcast to all other containers.
Just as with the broadcast update scenario of the AMS, this can be efficient as long as not
many changes need to be broadcast. It’s perfectly plausible that for some MASs the
environment remains static; in that case this would be an ideal scenario.
Central Scenario
The environment is only available at a central server and all updates and retrievals should
take place on this server only. So anything the Model might want to do with the entities,
read or write, has to be redirected to the server.
Just as in the central scenario for the AMS, this gives a reasonable result for both
retrievals and updates, as long as the central server doesn’t become a bottleneck.
Distributed Scenario
In this scenario parts of the environment will be distributed among the Spyse containers.
Information on which part of the environment is available at which container should be
available everywhere. So whenever a new container is created or when the borders of the
environment are changed this information should be updated.
When updating/retrieving information from the environment, a check should be made to
see whether the action can be handled locally or whether a retrieval/update method should
be called on another container.
In the ideal case an agent and the part of the environment it wants to access are always on
the same system, because then everything can be handled locally and very little
communication is needed.
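A minimal sketch of that local-or-remote decision, assuming a shared map that records which container owns which part of the environment (the region names and the `handle` helper are illustrative, not Spyse API):

```python
# Shared knowledge: which container owns which part of the environment.
region_owner = {"north": "container-a", "south": "container-b"}

def handle(region, local_container, local_op, remote_op):
    """Run the operation locally when this container owns the region,
    otherwise forward it to the owning container."""
    owner = region_owner[region]
    if owner == local_container:
        return local_op()            # no communication needed
    return remote_op(owner)          # remote call to the owning container

# Ideal case: the agent's container owns the region it accesses.
result = handle("north", "container-a",
                local_op=lambda: "handled locally",
                remote_op=lambda owner: f"forwarded to {owner}")
```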
4.2.5. Libraries
For communication within the different parts of the Spyse framework we can either
create our own system on top of the default Python libraries for TCP/UDP
communication or we can use a third party library.
Five different libraries and a custom solution based on the default Python libraries have
been tested and compared with each other. The most important selection criterion is the
speed at which we can send a message, but we will also look at other aspects such as
stability, documentation and the support offered.
In the following paragraphs we discuss the libraries we have looked at.
XML-RPC [12]
eXtensible Markup Language Remote Procedure Call. XML-RPC has been part of the
default Python distribution since Python version 2.2. XML-RPC allows calling functions
of remote applications. In order to do this a function call is encoded in XML and sent to
the remote application using the HyperText Transfer Protocol (HTTP). Using XML and
HTTP makes it possible to call functions of applications that are programmed in
different languages.
Both HTTP and especially XML, because of the encoding and decoding that needs to be
done, add extra overhead. Despite its limitations, XML-RPC is broadly used across
different platforms and programming languages, and there is a lot of documentation available.
XML-RPC can be used to transfer messages by calling a function of a remote host that
should deal with incoming messages, supplying the actual message as a parameter to
that function.
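A minimal sketch of this pattern using Python's standard library (shown with the modern Python 3 module names; the thesis-era code would use the Python 2 equivalents `SimpleXMLRPCServer` and `xmlrpclib`):

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

inbox = []                                  # messages received by the remote side

def deliver(message):
    """Remote function that deals with incoming messages."""
    inbox.append(message)
    return True

# The 'remote host': serve a single request on an ephemeral local port.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(deliver)
port = server.server_address[1]
threading.Thread(target=server.handle_request, daemon=True).start()

# The sender: the call is encoded in XML and shipped over HTTP.
proxy = ServerProxy(f"http://127.0.0.1:{port}")
proxy.deliver("hello from another container")
```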
RPyC [13]
Remote Python Call enables the user to control a remote Python interpreter. The remote
interpreter can be used just like the local one; objects can be created on the remote
interpreter and the remote interpreter can be instructed to execute specific functions.
Any errors that may occur on the remote interpreter by instructions from the local
interpreter are propagated back to the local interpreter and can be handled there.
Asynchronous calls are also possible, so one can ask a remote interpreter to execute a
function and receive a signal when the remote interpreter has finished.
On the downside we have the extra time that RPyC needs to offer so much transparency.
A lot of messaging is done between a local and a remote interpreter and thus RPyC is
certainly not the fastest of the tested libraries. The documentation is also sparse and
there’s no information about projects using this library. The library is still in active
development.
RPyC can be used to deliver messages in a similar way to XML-RPC: a function of a
remote Python interpreter is called, supplying the message as a parameter.
PyLinda [14]
PyLinda is not really a communication library, but an implementation of a tuple space. A
tuple space can be seen as a bag that contains tuples. A tuple is a typed and ordered list of
objects; for instance, the tuple (1, ‘A’, 3) consists of two integers and a character, in the
order integer, character, integer. In a tuple space one can then ask for all tuples that are of
the form integer, character, integer. This bag can be stored on one system and be accessed
by another system.
We’ve looked at PyLinda mainly because it can be useful for the implementation of the
environmental model, but it can also be used for communication by letting a client insert
messages into the bag and a server grab them out of the bag.
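The matching idea can be illustrated with a toy in-memory tuple space (PyLinda's real API differs, and Python has no separate character type, so a one-character string stands in):

```python
bag = [(1, "A", 3), ("x", "y"), (7, "B", 9)]   # the 'bag' of tuples

def matches(template, tup):
    """A template is a tuple of types; a tuple matches when every
    element is an instance of the corresponding type."""
    return len(template) == len(tup) and all(
        isinstance(value, typ) for value, typ in zip(tup, template))

def read_all(template):
    """Return all tuples in the bag that fit the template."""
    return [t for t in bag if matches(template, t)]

hits = read_all((int, str, int))   # all tuples of form integer, character, integer
```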
The current version number of PyLinda is 0.6, indicating it’s still in its beta phase;
nevertheless the library looks stable, there is enough documentation available and new
versions are released on a regular basis.
PyRO [15]
Python Remote Objects is a distributed object technology, much like Java RMI [16],
which makes objects accessible for remote use. In order to achieve this, a server can create
an object and register it. Once an object is registered it is available for other programs to
access. A client can then create a proxy for the object. A proxy mimics the object and
will forward all method calls to the remote object.
In order to create a proxy one needs a Uniform Resource Identifier (URI) for the
object. The URI format looks like:
PYROLOC://<hostname>:<portnumber>/<objectname>
A name server can be used to register URIs with a certain name and get a URI by
providing the name.
Pyro is feature rich: besides accessing remote objects it also has features for security,
authorization, encryption, mobile code and more.
Pyro is still under active development and currently at version 3.5; there is good
documentation available and Pyro is used in many other projects.
Again the strategy for delivering a message is to call the function of a remote object and
supply the message as parameter.
SPyRO [17]
Simple Python Remote Objects. As the name implies, this library closely resembles the
Pyro library. It is not as rich as Pyro and is meant to be easy to use. In Spyro one can
access remote objects as if they were residing on the local system. Spyro is designed to
be used from multiple languages, but at the moment it is only implemented in Python.
The current version number, 0.9.8, indicates that it is at the end of its beta phase and
should be fairly stable. Apart from the API there is some minimal documentation
available, but this should be enough since there are not many features that need to be
documented.
Sockets [18]
Using the standard TCP socket library from Python we have also implemented our own
solution for sending messages. All message objects are serialized using Python’s pickle
module, which can serialize objects into a character or byte string. We have chosen to
send the message as a byte string, as this results in much smaller payloads.
At the server side the incoming byte string is de-serialized back into objects.
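A minimal sketch of this transport (the length-prefix framing is an assumption, and a socketpair stands in for a real TCP connection):

```python
import pickle
import socket
import struct

def send_msg(sock, obj):
    """Serialize a message object with pickle and send it length-prefixed."""
    data = pickle.dumps(obj)                           # object -> byte string
    sock.sendall(struct.pack(">I", len(data)) + data)  # 4-byte length prefix

def recv_msg(sock):
    """Read one length-prefixed message and de-serialize it."""
    size = struct.unpack(">I", sock.recv(4))[0]
    buf = b""
    while len(buf) < size:                             # read until complete
        buf += sock.recv(size - len(buf))
    return pickle.loads(buf)

client, server = socket.socketpair()                   # stand-in for a TCP link
send_msg(client, {"performative": "request", "content": "ping"})
msg = recv_msg(server)
```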
4.2.6. Test Results
Two tests have been performed. In the first test we send 1,000 messages from a client to a
server. The message size is varied, from empty messages up to messages containing
100,000 characters.
In the second test we send messages of 50,000 characters and vary the number of
messages sent, starting with 100 messages and increasing to 1,000 messages.
Figure 4.7 Test 1 Absolute Time (chart ‘Send 1000 Msgs’: time in seconds against message size in 1k chars, 0 to 100, for pyro, linda, rpyc, sockets, spyro and xml-rpc)
Figure 4.8 Test 1 Time Compared to Empty Message (chart ‘Send 1000 Msgs’: extra time in seconds compared to an empty message, against message size in 1k chars, 0 to 100, for the same six libraries)
Figure 4.9 Test 2 (chart ‘Send X 50k messages’: time in seconds against the number of messages sent, 100 to 1000, for the same six libraries)
4.2.7. Conclusion
From Figure 4.7 we can clearly see that for small messages there is little difference
between PyLinda, Spyro, Pyro and Sockets. When the message size increases, Pyro and
Sockets continue to stay close together, followed by Spyro; this can also be seen in the
second diagram.
In Figure 4.8 the extra time needed to send a message, compared to an empty message, is
shown. For Pyro and Sockets we see that the extra time needed is about the same, while
for PyLinda and especially XML-RPC we see a considerable increase in the time needed
as the message size increases.
RPyC shows some interesting behaviour. Based on the first diagram it looks like one of
the slowest libraries we have tested, but the second diagram shows that increasing the
message size has the least influence on RPyC. So for large messages RPyC performs
relatively better than for small ones, and for very big messages it might even outperform
the other methods. In a multi agent system, however, messages tend not to be very large;
they usually contain small requests or status updates.
In Figure 4.9 we see that all libraries show a linear increase in time as we increase the
number of equally sized messages that need to be sent. RPyC and XML-RPC again show
up as the slowest libraries and perform about the same. Had we chosen a smaller message
size, the graph would have favored XML-RPC over RPyC, and for a larger message size
RPyC would gain the upper hand. Unless a very large message size is chosen, neither
XML-RPC nor RPyC can keep up with the others. Pyro is the fastest library in this graph,
but it is closely followed by the others.
Chosen library
In both tests Pyro and Sockets perform very well, followed by PyLinda and Spyro, whose
performance drops fast for larger messages.
As already mentioned, PyLinda is not designed for message delivery, but despite that fact
the library performs quite well, making it an interesting library to consider for the
environmental model.
Spyro offers no special features that Pyro does not, apart from being slightly easier to
use, and performance is the more important factor.
Sockets are very hard to use and will introduce a lot of extra implementation effort. This
extra effort would only be justifiable if they showed a great performance gain, but the
results show that in most cases Sockets are not even the fastest solution.
Neither of the tests favors XML-RPC, and since it is a very simple method, allowing only
remote function calls without any extra useful features, this library is not the best
candidate either.
RPyC does have some nice features that are not available in other libraries. It is the most
transparent library that we’ve tested, meaning remote objects are truly just accessed as
local objects and very little extra has to be done to work with them. Its lack of
documentation and (relatively) bad performance for small messages are not favorable
aspects.
Pyro looks to be the best solution for our specific case. There’s a lot of documentation
available explaining the many features and giving lots of samples. The list of projects that
are using Pyro is also very convincing. More importantly the library outperforms all other
ones no matter which test we look at.
5. Semantic networks and SNE
Semantic networks are schemes used to represent knowledge. They are structured in such
a way that they can easily be processed by computers. The knowledge in a semantic
network is stored in nodes. Every node represents a single object or concept. Connections
are made between the nodes to represent relations between them. There are many
different ways of creating such connections [19].
Semantic networks lend themselves perfectly to a distributed approach. The number of
nodes in a semantic network can be very large; because of that, the nodes can be
distributed over multiple computer systems to improve the performance of the network.
We will first have a look at a semantic network system that’s developed at TNO. Based
on some of its shortcomings we’ll propose a system that makes use of agent technology.
5.1. Semantic Network Model
Before we discuss the system that’s being developed at TNO and our agent based
solution we will first have a look at the semantic network model that’s used in both
systems.
The semantic network model that we’re using contains three elements: nodes, statements
and attributes.
Nodes
Nodes represent concepts. A concept can be anything; a person, a chair, a building, a
formula, etc.
The meaning of a concept can only be derived from the relations it has with other nodes,
which in turn have relations with even more nodes, and from the values of the attributes
that are assigned to the node.
A node contains the following properties:
- NID: Node Identification, a unique identifier within the semantic network.
- Name(s): describes the concept that is represented. A node has at least one name,
but can have more.
- Statements: associations between different nodes.
- Attributes: properties that can be added to nodes, statements or other attributes.
Names
A name consists of a label and a reference to a node which specifies the language of the
label. A node with three different names could, for example, have the following defined:
Figure 5.1 Example of names
Statements
A statement consists of a:
- Subject: the node containing the subjective concept.
- Predicate: the relation representing the meaning of the association.
- Object node: the node containing the objective concept.
For example:
Figure 5.2 Example of a statement
“Den Haag is a Town” has a subject: ‘Den Haag’, predicate: ‘is a’ and an object: ‘Town’.
Many predicates also have an inverse. An example is shown below.
Figure 5.3 Statement with inverse
Attributes
An attribute has a type and a value. For instance, if we have a town ‘Den Haag’ we can
add an attribute with type: ‘inhabitants’ and value: ‘480,000’, which indicates that the
town has 480,000 inhabitants.
Apart from nodes, attributes can also be connected to statements or other attributes. An
example of an attribute that’s connected to a statement is given below.
Figure 5.4 Statement with attribute
The ‘has seen the birth of’ statement has been given an attribute of type ‘birth date’ and
value ’22-12-1932’. This attribute can in turn be given another attribute, which could for
instance indicate what kind of date format is used.
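The three-element model described above can be sketched as plain data structures (the class and field names are illustrative, not the actual SNE types):

```python
from dataclasses import dataclass, field

@dataclass
class Attribute:
    type: str
    value: str
    attributes: list = field(default_factory=list)   # attributes can nest

@dataclass
class Statement:
    subject: int                                     # NID of the subject node
    predicate: str
    object: int                                      # NID of the object node
    attributes: list = field(default_factory=list)

@dataclass
class Node:
    nid: int
    names: list                                      # (label, language) pairs; at least one
    statements: list = field(default_factory=list)
    attributes: list = field(default_factory=list)

# The 'Den Haag' examples from the text:
den_haag = Node(nid=1, names=[("Den Haag", "nl")])
den_haag.statements.append(Statement(subject=1, predicate="is a", object=2))
den_haag.attributes.append(Attribute(type="inhabitants", value="480,000"))
```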
5.2. Current system - SNE
The Semantic Network Engine (SNE) has been developed at TNO - Defence, Security
and Safety. The SNE has been in development for many years. The development of its
predecessor, Notion System, started in 1990. The development of the current system
started in 2000.
Over the years the system has been extended and what first was the entire SNE is now
only the kernel of a much larger system. When referring to ‘SNE’ we mean the entire
system, when we talk about the ‘SNE engine’ or simply ‘engine’ we mean the central
component within SNE.
The semantic network system at TNO contains five different parts:
Storage
Provides the physical storage for the network. The nodes and links of the network can be
stored in any kind of data model and in any way: for instance in XML files, binary files,
an SQL database, etc.
Engine
The kernel of the system, sometimes simply called the engine, provides an interface
between the storage and applications. The SNE implements the semantic network model
and provides functionality to modify the network by creating, deleting or modifying
nodes, and also provides functionality to access the data, such as a search function.
Backend Objects
A layer between the storage and the SNE which converts the physical data to the semantic
network model and back. Any kind of backend object (BO) can be written to support any
kind of data model that may be used. For example, the semantic model of the SNE is
converted to a table structure, which can then easily be stored by a MySQL storage
provider.
Services
A layer between the SNE and an application that uses the network. The services can
provide additional functionality that the SNE doesn’t offer. Many of these small services
are implemented, like a service that finds a picture belonging to a node, or a service that
fetches information about the nodes in a given language.
Applications
Applications can be any application that uses the semantic network. One of the most used
ones is the web interface that allows the user to view the data of the semantic network in
a browser. The web interface makes use of many services, like the picture and language
service, to show the data in a way that’s easy for the user to understand.
The KGTE is another example of an application that visualizes data in the semantic
network. KGTE is used to visualize the knowledge available among the employees of
TNO DenV.
The SNE is mainly used for research projects aimed at solving a single problem, like the
Natural Language Question Parser (NLQP). The NLQP tries to answer questions posed
in natural language, like “What is the diameter of Mars?”. In order to find an answer
the NLQP accesses the data in the SNE.
There are numerous other research projects that make use of the SNE. The main goal for
most of them is information extraction.
Figure 5.5 SNE Overview
A good example of how the components are tied together is the web interface application
that can be used to access the semantic network. Through the web interface the nodes can
be searched for by a variety of input parameters. Nodes can also be added, edited, etc.
Once a search has been entered in the web interface a search service will be used to find a
list of nodes that comply with the input parameters. When a node from the list is selected
a view service is used which will try to find an appropriate way of showing the properties
of a node.
The search and view services rely on functionality provided by the SNE to fulfill
their tasks. The SNE in turn uses the underlying data model and storage to access the
raw data.
Having multiple layers in a system makes it easier to make modifications. It also
separates responsibilities: the engine, for instance, is only concerned with nodes and
never with any meaning they might have. However, since the data has to pass through
all layers, the layering has a negative influence on the speed of the system.
The SNE engine is a central component within the entire system. This means that the
engine will be called upon for all tasks, which introduces a single point of failure and a
bottleneck within the system.
It is possible to distribute the current system by running multiple engines each
responsible for their own part of the network, but a finer level of distribution would be
preferred.
Maintenance of the information in a large network can become cumbersome when using
SNE. The current network consists of 1.3 million concepts, 4 million statements and 9
million attributes.
A common task that one would like to perform while maintaining the information in a
semantic network is an availability check. During such a check the links of the nodes are
checked to see if the resources they are pointing at are still available. These resources will
mainly be other nodes in the network, but an attribute value can also represent a resource
outside of the semantic network: a person might have an attribute of type ‘personal
homepage’ with the URL of his homepage as attribute value.
An application that’s responsible for doing availability checks needs to browse through
the network, checking the nodes one piece at a time. This application would need to
know how to perform the check for each kind of node it can encounter: links to other
nodes need to be checked differently than links to, for instance, personal homepages.
When new nodes are added to the system, new types of links could be introduced, and the
availability check application would need to be updated to be able to deal with these new
types of links.
The application would also need to keep track of which nodes it has visited and would
need an algorithm to be sure to visit each and every node in the network, which would
result in a complexity of O(n²). Since semantic networks can consist of millions of nodes
this task can become too large to handle.
6. Proposed System - IBAS
IBAS stands for Information Bearing Agent System. IBAS is an attempt to recreate the
SNE system using agent technology.
The nodes within a semantic network are represented by agents. An Information Bearing
Agent (IBA) is an agent that’s responsible for the data it’s carrying. With these agents
IBAS creates a semantic network, every agent representing a single node and links
representing relations.
Using software agents to implement a semantic network enables the nodes of the network
to behave in an intelligent manner and act on their own. This should have advantages
over the way things are handled within the SNE.
First of all the core of the system would no longer consist of a central component, but of
a collection of agents. Instead of having a central component taking care of any actions
that would involve retrieving or modifying a node, one can talk directly to the agent
that’s responsible for the specific node. As soon as the NID of a node is known one can
send a message to the agent that’s representing this node using the MTS (Message
Transport System: part of Spyse which is responsible for message delivery).
This would allow any application to skip the layers that we find in the SNE as soon as the
specific node is known. Some services could still be used to find a specific node or make
the handling of common tasks easier, but the access to the actual data would be more
direct.
Because each IBA operates independently there is no longer a single point of failure or
bottleneck. The amount of computing needed could still exceed the capacity of a single
system, but since Spyse offers functionality to run the IBAs on multiple systems we can
easily share the load.
Agents can also provide a powerful security method. When someone tries to access a
secured node in a traditional semantic network, he sends a request to an authorization
instance, usually in some encrypted form (for instance over HTTPS). If this instance
grants access it sends the information back, again in encrypted form. After the user’s
browser receives the response, the information is decrypted and shown to the user.
This scheme has the disadvantage of a central authority instance and provides no way to
secure the data once it has entered the user’s system.
Agents can solve both of these problems. A secured agent can contain information about
who is allowed to access its data; no central authority instance is needed to give access to
its contents. The secured agent can be transferred to the user and can remain on the
user’s system for as long as needed without giving away the secured data, until the user
authenticates himself to the agent.
The main advantage is the possibility of letting the data in the network evolve by itself
without the need for external intervention.
Maintenance of IBAS, for example, can be done by the individual IBAs. This makes an
availability check application superfluous: instead of a central application that browses
through all the nodes, the nodes can perform any availability checks themselves.
The logic for performing such a check can be implemented in an agent. An IBA could
ask this agent to check, for instance, whether a certain web page is still available. If this
is the case it could go idle for a while and check again in a few hours, days or weeks,
depending on parameters like how often it has checked the site in the past.
When the website is not available it could inform IBA agents that also have web pages on
the same server to perform the same check. Based on those results they can together
decide whether the entire server is down or only a single page has been removed.
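One possible re-check schedule, sketched as a hypothetical helper (the doubling policy, base delay and weekly cap are assumptions, not part of IBAS):

```python
def next_check_delay(consecutive_successes, base_hours=1, max_hours=24 * 7):
    """Wait longer after every successful availability check:
    double the base delay per success, capped at one week."""
    return min(base_hours * 2 ** consecutive_successes, max_hours)
```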
Another simple example involves deciding which nodes to keep in memory so they
can be accessed faster. Because of the large number of nodes it is not possible to keep
them all available at all times. IBA agents can keep track of how often they are accessed.
Depending on this value and the values of the surrounding nodes, the IBA can decide on
its relative ‘importance’.
A more difficult topic involves grouping similar IBA agents. Based on the contents of
their nodes, two IBA agents could conclude that they represent the same kind of concept.
They could decide to form a group and only move to other systems as a group. An even
more advanced implementation might allow similar IBA agents to teach new actions to
each other. Two IBA agents might represent the same concept and try to perform the
same action in a different way. They could compare their efforts and from then on only
use the most efficient one.
Our agent based semantic network, IBAS, will use the same semantic network model as
the one used in SNE. This allows us to easily use the data inside SNE to fill our own
semantic network and allows for a good comparison between the two systems. Of the
advantages stated above we will implement a simple availability check for web pages,
and we will allow the individual agents to decide whether they should stay in memory.
6.1. Terminology
Before we investigate the details of IBAS we will first define some terms that will be
used:
- IBA: Information Bearing Agent, an agent representing a node in the semantic network.
- IBAS: Information Bearing Agent System, the collection of IBAs that form the
semantic network.
- IBAS Application: an application built upon the Spyse framework that contains a
semantic network. Applications can be combined, using Spyse, to share nodes over
multiple systems.
6.2. Analysis
A closer look will be taken at the requirements for IBAS. We will start with some use
cases to illustrate which functionality IBAS offers from a user’s point of view. The
functional and non-functional requirements will then be extracted from these.
6.2.1. Use Cases
Figure 6.1 IBAS Use Case Diagram
Use case name: Search Node
Participating Actor: User or Admin
Entry condition: 1. The user activates the ‘Search’ function on his interface.
Flow of events: 2. The user is presented a form in which he can enter search
parameters. The user can search on:
 Names
 Attribute type
 Attribute value
 Combination of attribute type and value
 Statement predicate
 Statement attribute type
 Statement attribute value
 Any combination of statement predicate, attribute type and/or attribute value
3. The user submits the search request by pressing a button.
4. The request is submitted to the agent that searches through the network.
Exit condition: 5. The result of the search is presented to the user in the form of
a list of nodes that comply with the search parameters.
Use case name: View Node
Participating Actor: User or Admin
Entry condition: 1. This use case extends the Search Node use case. It is initiated
when the user is presented the results of the Search Node use case.
Flow of events: 2. The user selects a node from the result list by pressing the view
button next to it.
3. The request is submitted to an agent that will browse through the network to create
an appropriate representation.
Exit condition: 4. The user is presented a view of the contents of the selected node.
Use case name: Modify Node
Participating Actor: Administrator
Entry condition: 1. This use case extends the View Node use case. It is initiated
when the administrator is presented the results of the View Node use case.
Flow of events: 2. The administrator presses the modify button.
3. The administrator enters new values for the fields he wants to change.
4. The administrator presses the save button.
Exit condition: 5. The modified node is stored.
Use case name: Remove Node
Participating Actor: Administrator
Entry condition: 1. This use case extends the Search Node use case. It is initiated
when the administrator is presented the results of the Search Node use case.
Flow of events: 2. The administrator selects a node from the result list by pressing
the delete button next to it.
Exit condition: 3. The node is removed from the network.
Use case name: Add Node Manually
Participating Actor: Administrator
Entry condition: 1. The administrator activates the ‘Add’ function on his interface.
Flow of events: 2. The administrator is presented a form in which he can enter the
values for the new node.
3. The administrator fills in all values and presses the save button.
Exit condition: 4. A new node is added to the network.
Use case name: Add Node from an XML File
Participating Actor: Administrator
Entry condition: 1. The administrator activates the ‘Add’ function on his interface.
Flow of events: 2. The administrator is presented a form in which he can select a
file on his local system and presses the submit button.
3. The file is uploaded to the IBAS system.
4. The file is analyzed and a new node is created from the contents.
Exit condition: 5. A new IBA will be started representing the new node.
6.2.2. Functional Requirements
- A user should be able to select a default language in which he would like to view the
data of the nodes.
- A user can specify the maximum number of results to show when submitting a search
request.
- The interface will be accessible through a web browser.
6.2.3. Non-functional Requirements
- IBAS/Spyse is implemented in Python and should therefore be platform independent.
This will be tested on a Windows and a Linux system.
- Resistant to failure: the system should be able to handle failure of any of the systems
and continue to run with the nodes still available on the other systems.
- Precautions should be taken against actions that might put too much stress on IBAS,
such as searches that match almost every node in the system or the view operation on a
node with too many links.
- The interface should respond quickly enough to any request given by the user. For any
normal search/modify/delete request this should be within 5 seconds.
- The IBAS system should be capable of running with at least 5,000 nodes on a single
system with at least a Pentium 4 CPU operating at 1.6 GHz and 512 MB of RAM.
6.3. Design
Since IBAS is an agent based application, instead of defining different subsystems we
will identify the different agents that take care of the tasks in IBAS.
Figure 6.2 Agents in IBAS
At the center of IBAS we’ll have a collection of IBA agents. These agents represent the
nodes of the semantic network and are linked according to the names, statements and
attributes of these nodes. The IBA agents form the core of the IBAS system, like the
engine of SNE.
Where SNE uses services we introduce special agents that can perform managing tasks
for the user like searching or viewing nodes.
A special agent will act as a bridge between the agent and non-agent world.
6.3.1. FIPA Request Protocol
The FIPA Request Protocol [20] is used inside IBAS. Before we discuss the different
agents we will first explain this protocol. The interaction diagram [21] of this protocol is
shown below.
Figure 6.3 FIPA Request Interaction Protocol
As an example we’ll use the Interface Agent as initiator and the Search Agent as
participant. The initiator sends a request to a participant indicating what it wants done; in
our case the interface agent asks the search agent to perform a search.
The participant will first respond with either an agree or a refuse message, indicating
whether it will fulfill the request. The protocol ends when a refuse is sent. In case of an
agree the participant will start the requested operation.
If the participant for some reason fails performing the request he will send back a failure
message. An inform message is sent if the request is completed successfully. This
message will contain the results of the requested operation, in our example this result will
be a list of nodes. The inform-done message is used when no result is expected and only
indicates the participant has successfully completed the request.
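The branching above can be sketched as a small participant-side handler. This is an illustrative sketch, not the Spyse API; the names `Performative` and `handle_request` are our own.

```python
from enum import Enum

class Performative(Enum):
    REQUEST = "request"
    AGREE = "agree"
    REFUSE = "refuse"
    FAILURE = "failure"
    INFORM = "inform"            # carries a result
    INFORM_DONE = "inform-done"  # no result expected

def handle_request(can_accept, perform):
    """Participant side of the FIPA Request protocol.

    Returns the list of (performative, content) replies the
    participant sends back to the initiator.
    """
    if not can_accept:
        return [(Performative.REFUSE, None)]  # protocol ends here
    replies = [(Performative.AGREE, None)]
    try:
        result = perform()  # the requested operation
    except Exception as exc:
        replies.append((Performative.FAILURE, str(exc)))
    else:
        if result is None:
            replies.append((Performative.INFORM_DONE, None))
        else:
            replies.append((Performative.INFORM, result))
    return replies
```

In the search example, `perform` would run the search and the final inform message would carry the resulting list of nodes.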
6.3.2. IBAS Agents
We will take a more detailed look at the agents that are used in IBAS.
IBA Agent
The IBA is the core agent of IBAS. Each IBA contains a single node of the semantic
network. The name of an IBA will be the same as the NID of the node that it contains. This
will make it easy for any agent to contact an IBA once it knows the NID of the node it is
interested in.
An IBA provides a service that supplies the data of its node. To make use of this service
the request protocol is used, with the IBA taking the role of the participant in this
protocol. The content of the request message should specify which part of the data the
initiator is interested in: for instance the name, the attributes, the statements or everything.
Usually an agent registers its service at the DF (Directory Facilitator: the yellow pages
service of Spyse). However, an IBA will not register its service at the DF, because an
IBA should not be located through the DF: a search for every agent offering the data
provide service would make the DF return the list of all IBA agents on the platform,
since they all provide this service.
IBA agents should be located by using a search agent. Once another agent has located an
IBA with the use of a search agent it can assume that the IBA has the data provide
service.
Search Agent
The Search Agent’s sole purpose is to provide a search service for other agents. The
search agent takes the role of the participant in the request protocol to provide this
service. Based on the contents of the request message the search agent will execute a
search and return the result of this search in an inform message.
The search agent registers its service at the DF so other agents on the platform can find
it.
Interface Agent
The Interface Agent acts as a bridge between IBAS and other non-agent applications. It
offers functions to search, view and modify the semantic network to non-agent
applications using Pyro and uses message sending to communicate with the agents in
IBAS.
Once the interface agent is started it uses the Pyro daemon of Spyse to register itself.
Other applications can call the search node, view node, add node, delete node and modify
node functions of the interface agent by connecting to its location using Pyro.
Once, for instance, the search node function has been called, the interface agent will,
assuming it is already aware of the location of a search agent, add a new Request Initiator
Behaviour to itself. It sends a request message to the search agent with parameters
indicating what it wants to search for. The interface agent keeps waiting until the
newly added behaviour handles an inform message, which will contain the results of
the search.
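The "keep waiting until the inform arrives" step amounts to a blocking call wrapped around an asynchronous reply. A minimal sketch under our own naming; the real interface agent uses a Spyse Request Initiator Behaviour and Pyro, which are not modelled here.

```python
import threading

class SearchCall:
    """Blocking wrapper around an asynchronous inform reply.

    Illustrative only: the class name and methods are ours, not
    part of Spyse or Pyro.
    """

    def __init__(self):
        self._done = threading.Event()
        self._result = None

    def handle_inform(self, content):
        # Called when the inform message with the search results arrives.
        self._result = content
        self._done.set()

    def wait(self, timeout=5.0):
        # The requirements ask for an answer within 5 seconds.
        if not self._done.wait(timeout):
            raise TimeoutError("search agent did not reply in time")
        return self._result
```

The function exposed over Pyro would create such an object, send the request message, and return `wait()`'s result to the caller.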
View Agent
The View Agent can be asked to create a representation of an IBA's node that is more
understandable to humans. The view agent retrieves the data from the IBA and follows
the links to other IBAs in order to give this representation.
The view agent has to act both as request participant and as request initiator. It starts
by acting as a participant, waiting for incoming requests. Once it has received a
request to view a specific IBA, it starts a new Request Initiator Behaviour to first
retrieve the node data of that IBA. Once it has received the data it starts new
behaviours to get the names of all the NIDs it finds in the node's statements and
attributes.
The view agent can be limited to follow only a certain number of links, since some nodes
can have many references.
Node Manager Agent
The NodeManagerAgent can be used to modify the semantic network. Again the request
protocol is used: the node manager acts as participant and waits for incoming requests.
When a node is added to the semantic network, the node manager takes care of
creating a new IBA for the node by contacting the AMS (again using the request
protocol, this time acting as the initiator) and asking it to create the new IBA.
6.3.3. IBAS Objects
Besides agents, IBAS uses objects to support the agents in their tasks. We will give
UML class diagrams to describe these objects along with a brief explanation of their
uses.
Node
Node objects are used to represent the nodes of the semantic network. Every IBA will
have a single Node object as attribute.
Figure 6.4 Node Class Diagram
The figure above shows how a Node is represented within IBAS. The NID is stored as a
string and the names of a node are stored in a list of tuples. Every name tuple consists
of a string with the actual value of the name and the NID of the node that defines the
language of the name.
So the node representing the city of ‘The Hague’ could have the following three tuples:
- <’Den Haag’, NID-of-Dutch>
- <’The Hague’, NID-of-English>
- <’La Haie’, NID-of-French>
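The structure above can be sketched as a small Python class. The field and method names are illustrative; the actual class is defined in Figure 6.4.

```python
class Node:
    """A node of the semantic network, as held by a single IBA.

    Illustrative sketch: field and method names are our own.
    """

    def __init__(self, nid):
        self.nid = nid    # the NID, stored as a string
        self.names = []   # list of (value, language-NID) tuples
        self.statements = []
        self.attributes = []

    def add_name(self, value, language_nid):
        self.names.append((value, language_nid))

    def name_in(self, language_nid):
        """Return the node's name in the given language, if known."""
        for value, lang in self.names:
            if lang == language_nid:
                return value
        return None

# The 'The Hague' example from the text:
den_haag = Node("NID-TheHague")
den_haag.add_name("Den Haag", "NID-of-Dutch")
den_haag.add_name("The Hague", "NID-of-English")
den_haag.add_name("La Haie", "NID-of-French")
```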
NodeIndex
We’re using an index in order to keep track of all the nodes in the network. The index
can be used to search for nodes.
Figure 6.5 NodeIndex Class Diagram
The NodeIndex is primarily used by the search agent to find NIDs. Three search
functions are available, all of which return a list of NIDs.
The index_node function adds the supplied node to the index. The NodeIndex
browses through the node’s names, statements and attributes and adds the necessary
information.
NodeIndex uses PyLucene [22] to index the nodes. PyLucene is a Python version of
Lucene [23], a text search engine library that can be used for full text searches.
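To illustrate the interface (index_node plus a search function that returns NIDs) without depending on PyLucene, here is a deliberately simplified in-memory stand-in; the real NodeIndex delegates tokenising and matching to Lucene.

```python
from collections import defaultdict

class NodeIndex:
    """Simplified in-memory stand-in for the PyLucene-backed index.

    A plain token -> NIDs map is enough to show the interface;
    method names other than index_node are illustrative.
    """

    def __init__(self):
        self._by_token = defaultdict(set)

    def index_node(self, nid, names, statements=(), attributes=()):
        # Browse through the node's names, statements and attributes
        # and record every token under the node's NID.
        for text in list(names) + list(statements) + list(attributes):
            for token in text.lower().split():
                self._by_token[token].add(nid)

    def search(self, query):
        """Return the NIDs of all nodes matching every query token."""
        tokens = query.lower().split()
        if not tokens:
            return []
        hits = set.intersection(
            *(self._by_token.get(t, set()) for t in tokens))
        return sorted(hits)
```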
We are aware that by introducing an index we are creating a central component in our
otherwise entirely distributed system. There are a number of reasons why we have chosen
this approach.
The nodes, and thus our IBA agents, are linked according to their semantic meaning.
While these links can be used to find interesting properties about certain concepts, they
are useless for a global search for a node with a specific name.
In order to perform a global search we would need algorithms of the kind found in
P2P networks. This would require links to be made between nodes based on properties
other than their semantics, and the data that we use from SNE contains no such
information.
Creating such information and implementing a good distributed search algorithm would
take up far too much time and could be considered a separate project.
Therefore our prototype uses a central index, located outside the core of IBAS,
which uses the same technique as SNE.
In order to deal with the performance and bottleneck problems that central components
can introduce, we can make copies of the NodeIndex and store these copies on every
system that runs a part of IBAS. These copies can easily be kept up to date by assigning
a NodeManager agent to each index; when an update is needed the request is sent to
each manager. The managers can be found using the DF.
IBASApp
Figure 6.6 IBASApp Class Diagram
The IBASApp class is responsible for launching IBAS. Its super class is the Spyse
App class which, among other things, provides functions to create agents.
SNE can give XML representations of the nodes in the network, which can be used to
create IBAs. The IBASApp can be given a path argument on startup which points to a
directory containing XML representations of the nodes of the semantic network. If a
path is specified the IBASApp iterates through the XML files and parses them in order
to create nodes. Every node is added to the index and an IBA is launched.
IBASApp will also launch the special service agents.
6.3.4. Class model
Figure 6.7 Class Model of IBAS
The class diagram of IBAS shows the classes of the individual agents. Since agents
only communicate through message exchange, no associations between agents are
shown in the diagram.
The only associations that do occur are between agents and objects or between objects
and other objects.
6.3.5. Sequence Diagrams
We will first provide the sequence diagrams, which specify in more detail what the
communication between the different parts looks like.
Figure 6.8 Sequence diagram: Search agent
Different forms of communication are used in this sequence diagram. Between the User’s
Browser and the Web Server HTTP will be used for communication.
We will use Pyro between the Web Server and the Interface Agent³.
To contact the Search Agent we will use message sending; the Search Agent will call
methods of the Node Index to find the nodes.
The Interface Agent could ask the DF to provide it with a list of all Search Agents. This
step is optional: if the Interface Agent has already asked for such a list in the recent past
it could decide to use that list first.
If one Search Agent refuses to execute the request, for instance because it is already
working on someone else’s request, the interface agent can ask the next one on the list.
³ We have already tested the different communication libraries that are available for Python and concluded
that Pyro is one of the fastest solutions and the best for message sending within Spyse. The requirements
for message sending and for communication between a web server and our interface agent might differ,
but since Pyro is already present in Spyse it is one of the easiest solutions for us to use.
Figure 6.9 Sequence diagram: View agent
The first steps that need to be taken for viewing a node resemble the steps for searching
for a node. The difference starts where the search agent calls a method of the node
index to perform the search: in the case of viewing a node, the view agent requests the
data from multiple IBA agents to form a representation.
The View Agent will request the data from the IBA of which it wants to create a
representation. If needed, the View Agent will follow the links inside the data to form a
better representation. A view agent can decide to follow a link, for example, to get the
name of the node at the other side of the link, since the name is usually a better
representation than the bare NID string.
The view agent might also want to call upon the services of other agents. It might for
instance request another agent to get an image that might be associated with a certain
node.
Figure 6.10 Sequence diagram: Add node
When a new node is added to the semantic network, a new IBA is created and made
responsible for the node. New agents are created by the AMS, so the AMS is sent a
message to create the new IBA.
The new node is also added to the index so the search agent can find the newly created
IBA.
6.4. Suspending Agents
IBAs in general are not very busy. Most of the time they are waiting for a request to
provide data. Out of thousands of running IBAs only a handful will need to be actively
running. By letting the IBAs enter a suspended state we can save a lot of computing
power.
Suspending agents also enables us to work with more than a thousand agents while still
running each agent in its own thread, staying below the limit of 1034 threads on the
Windows operating system.
Agents can either choose to enter their suspended state or the AMS may force them to do
so. The AMS could choose to do this, for instance, when IBAS is shut down, and it may
resume them all when IBAS is started again.
An agent that chooses to enter its suspended state can tell the AMS at which time it
wishes to be activated again.
The AMS should keep track of all suspended agents and their wakeup times. This should
be done in a separate behaviour that checks whether there are any agents that should be
awoken.
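The bookkeeping for such a behaviour amounts to a priority queue ordered by wakeup time. A sketch under our own naming (WakeupSchedule is not a Spyse class):

```python
import heapq
import time

class WakeupSchedule:
    """Sketch of the state an AMS behaviour could keep to track
    suspended agents and their wakeup times (names are our own)."""

    def __init__(self):
        self._heap = []  # (wakeup_time, agent_name), earliest first

    def suspend(self, agent_name, wakeup_time):
        heapq.heappush(self._heap, (wakeup_time, agent_name))

    def due(self, now=None):
        """Pop and return all agents whose wakeup time has passed.
        The AMS behaviour would call this periodically."""
        now = time.time() if now is None else now
        awakened = []
        while self._heap and self._heap[0][0] <= now:
            _, name = heapq.heappop(self._heap)
            awakened.append(name)
        return awakened
```

Forcing a suspended IBA awake early, as described below, would simply mean resuming the agent without waiting for `due` to return it.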
It can happen that an agent requires a suspended agent to perform a request for it. A view
agent could, for instance, require the data of a suspended IBA. In this case the AMS
should awaken the IBA so that it can respond to the request.
When the IBA awakes it should check the current time and determine whether it was
awakened before its set wakeup time. The IBA can decide to remain active and perform
the operations it was planning to do if the current time is close enough to its set wakeup
time. It can also choose to be suspended again until its set wakeup time if it planned on
being suspended for a while.
While an agent is suspended, the class of that agent may be modified. For instance, in
the case of an image searcher, the functionality for devising a search strategy from
the node data may be improved. When such an agent is loaded again it should
detect this change and try to evolve to the new class.
6.5. Load Balancing
One of the advantages of using agents for creating IBAS with Spyse is the distributed
capabilities that Spyse offers. The different IBA agents can be run from multiple systems
spreading the needed computing power.
When operating IBAS from multiple systems one would like to distribute the agents in
such a way that the load is equally divided amongst the systems. This can be done by
moving the agents from one system to another.
In order to decide when to move which agent to which container we will use a load
balancing algorithm. The algorithm should allow agents to decide individually
whether to move or not; no global information should be needed. Furthermore, we would
like the system to reach a stable state after a while, so that it eventually stops sending
agents back and forth when there is no more gain.
We’ll be using the distributed selfish load balancing algorithm [24] to decide when
and where to move agents. The algorithm looks as follows for agents:

For each agent a do:
    Let Ca be the current container of agent a
    Choose Ci at random from the other available containers
    Let LCa(t) be the current load of container Ca
    Let LCi(t) be the current load of container Ci
    If LCa(t) > LCi(t) then
        Move agent a from Ca to Ci with probability 1 - LCi(t)/LCa(t)

Equation 6.1 Selfish load balancing algorithm
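A runnable Python rendition of one round of Equation 6.1. For simplicity the agents are handled sequentially and the loads are updated immediately, whereas the algorithm in [24] runs each round concurrently; the function and variable names are our own.

```python
import random

def selfish_step(agent_container, loads):
    """One round of the selfish load balancing rule of Equation 6.1.

    agent_container maps agent name -> container name; loads maps
    container name -> number of agents on that container.
    """
    containers = list(loads)
    for agent, current in list(agent_container.items()):
        # Choose another container at random.
        target = random.choice([c for c in containers if c != current])
        la, li = loads[current], loads[target]
        # Move with probability 1 - LCi(t)/LCa(t) when busier.
        if la > li and random.random() < 1.0 - li / la:
            agent_container[agent] = target
            loads[current] -= 1
            loads[target] += 1
```

Starting with every agent on one container, a handful of rounds spreads the agents roughly evenly over the containers, mirroring the behaviour seen later in Figure 7.6.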
This load balancing algorithm has a few important features:
- No global information is needed; in particular an agent does not need to know the
  total number of agents.
- The algorithm is strongly distributed and concurrent.
- The platform will converge if a new agent enters the platform.
- Every agent only needs to ask the load of a single container, keeping the amount
  of work constant in each round.
- The protocol is easy to implement.
Let T be the number of rounds taken by the protocol to reach equilibrium, m the total
number of agents and n the total number of containers. Then:

E[T] = O(log log m + n⁴)

Equation 6.2 Selfish load balancing upper bound
The proof of this theorem can be found in [24] chapter 3.
The load of a container can be defined in different ways. To name just a few of the
possibilities: one could let the load be decided by the number of active agents, the
total number of agents, or the ratio of the two. It is also possible to assign weights to
the agents and use those to calculate the load.
The number of active agents only gives a momentary indication of how busy a
container is; this number can change drastically in a short time span. We will therefore
only look at the total number of agents on a container.
Adding weights to the agents would complicate the algorithm, which we would like to
keep as simple as possible. The tasks of the agents are also roughly equal in size, so
adding weights would not yield a better distribution.
6.6. Modifications to Spyse
Spyse in its current state provides enough functionality to implement a system like
IBAS. However, functionality from IBAS that could be useful for agent applications in
general should, where possible, be added to Spyse.
Suspending and activating agents for instance can be implemented by the IBAS system
itself. A special agent could be responsible for suspending and activating agents. It would
be better to implement this functionality in Spyse so any future application can take
advantage of this system. Adding this functionality to the framework is also a more
logical choice as storing agents to a hard drive and loading them again is a low level
operation which should not be the responsibility of any agent application.
Load balancing is also a feature that can be useful for many applications and should be
supported by the framework if possible. Unfortunately, load balancing is a complicated
matter for which no universal solution exists for every possible application. Since
one of the goals of Spyse is to offer advanced features in a simple manner, we have chosen
to implement a basic load balancing algorithm that should provide reasonable results
for most applications. Some parameters to tweak the algorithm can be set by the
application; if a more specific algorithm is needed it should be implemented by the
developer of the agent application.
Systems can already be added to and removed from a running Spyse platform
dynamically. What should happen in case of a system failure is again application
dependent and should be the responsibility of the application itself. The Spyse framework
can only ensure that the other systems won’t crash together with one failing system and
that any messages sent to agents on the crashed system raise an appropriate error.
The agent application should take action when it realizes that some of the
agents are no longer available and, if possible, decide to let the work be taken over by
other agents that are still available.
7. IBAS Prototype
A prototype of IBAS has been built using Spyse.
Besides the participant behaviour that IBA agents use to deal with data requests, four
other behaviours have been created for the IBA agents:
- MoveBehaviour: performs load balancing between multiple systems.
- SuspendBehaviour: keeps an agent suspended whenever possible.
- ScheduleBehaviour: keeps the most used agents in memory and suspends the
  others.
- AvailabilityBehaviour: checks whether a website linked from one of the node’s
  attributes is still available.
The MoveBehaviour is added to every IBA, together with either the Suspend or the
Schedule Behaviour.
MoveBehaviour
The move behaviour is added to the IBA agents to let them make use of the load
balancing algorithm, which is implemented in Spyse. The IBA asks the AMS to
provide it with a target container to move to. The AMS executes the load
balancing algorithm and provides the IBA with either an appropriate target or the
information that it does not need to move to a new container.
SuspendBehaviour
The suspend behaviour tries to keep the agent suspended whenever possible. This
results in most agents being suspended at all times and only being activated briefly when
a data request is made.
With the addition of this behaviour the IBA life cycle could be pictured as follows:
Figure 7.1 IBA life cycle 1
When the IBA is created and invoked for the first time, the suspend behaviour
suspends the agent right away. The IBA is resumed as soon as it is needed to respond
to a request.
Once the response to the request has been sent, the IBA executes the load balancing
algorithm and either moves to a new container or enters its suspended state again.
As soon as the IBA arrives at the new container it enters its suspended state there.
ScheduleBehaviour
The schedule behaviour checks how often an IBA is accessed. The ‘importance’ of an
IBA is defined by the number of times it is accessed. If the IBA is not important it
enters its suspended state. If the IBA is important it stays in memory and a
timer is set for when to check its importance again. An IBA that is very important will
not need to check its importance again for a long time.
Our decision making algorithm is very basic: if the agent was accessed during the last 5
minutes it chooses to stay in memory.
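The decision rule can be sketched as follows. The class name and the `now` parameter are our own; the real behaviour reads the clock itself.

```python
import time

class ImportanceTimer:
    """Sketch of the schedule behaviour's decision rule: an IBA
    accessed during the last five minutes is considered important
    and stays in memory.
    """

    WINDOW = 5 * 60  # seconds

    def __init__(self):
        self.last_access = None

    def touch(self, now=None):
        # Record an access to the IBA.
        self.last_access = time.time() if now is None else now

    def should_stay_in_memory(self, now=None):
        now = time.time() if now is None else now
        return (self.last_access is not None
                and now - self.last_access <= self.WINDOW)
```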
With this behaviour the life cycle will look as follows:
Figure 7.2 IBA Life Cycle 2
AvailabilityBehaviour
The availability behaviour is added to agents that have an attribute node of type
‘URL’, indicating that the value of the attribute is the URL of a website. The check is
executed between the ‘respond to request’ and ‘check move’ states.
The outcome of the test will be written to a log. At the moment we have not implemented
any other actions the IBA could take based on whether the resource is still available or
not.
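A minimal sketch of such a check, assuming a plain HTTP fetch with the standard library and the standard logging module; the behaviour's actual internals are not shown in this chapter.

```python
import logging
import urllib.error
import urllib.request

def check_availability(url, timeout=5.0):
    """Check whether the website behind a node's 'URL' attribute is
    still reachable, logging the outcome as the behaviour does.
    Returns True when the server answers with a non-error status.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            ok = response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # Unresolvable host, refused connection, malformed URL, ...
        ok = False
    logging.info("availability of %s: %s", url, "up" if ok else "down")
    return ok
```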
7.1. Test Scenarios
We have tested three different scenarios to show the differences between the different
behaviours.
7.1.1. Scenario 1
In the first scenario we launch IBAS, request random nodes from the network using the
view agent, and measure the time needed for the operation to complete. We have
executed this scenario with the suspend behaviour and with the schedule behaviour,
using the same list of random nodes.
The view agent has been limited to following no more than twenty links. Both behaviours
were tested with 400, 1200 and 4000 nodes.
After initialization both behaviours will be in the same state, with all agents
suspended. After nodes are requested we expect the schedule behaviour to perform
faster than the suspend behaviour, because more nodes will be available in memory.
When using the suspend behaviour the agents are loaded and stored again during
every request. For this behaviour we expect the only variation to come
from the difference in the number of links the view agent needs to follow.
Examples of runs of the system:
Figure 7.3 IBAS Scenario 1 Comparison between behaviours (response time in seconds
against request number, 1200 nodes, schedule versus suspend)
Figure 7.4 IBAS Scenario 1 Comparison of different runs (response time in seconds
against request number, 1200 nodes, two runs of the suspend behaviour)
            400 Nodes   1200 Nodes   4000 Nodes   16000 Nodes
Schedule    0.5462      0.5468       0.5878       0.5656
Suspend     0.6040      0.8826       0.9760       0.9428
Table 5 IBAS Scenario 1 Average response times (seconds)
When we look at the table we see that the schedule behaviour performs better than the
suspend behaviour. When we look at the graphs we see that a request basically takes
either about half a second or a full second.
We didn’t expect the suspend behaviour’s performance to drop when increasing the
number of nodes. Apparently caching mechanisms of the hard drive and operating system
still make part of the data available quickly; this would explain why the suspend
behaviour is sometimes as fast as the schedule behaviour. Caching the data becomes more
difficult as we use more nodes, since that would require more cache to be available.
When we repeat the test multiple times we see some randomness in the results for the
suspend behaviour, but not for the schedule behaviour. We can explain this because
disk access times are more variable than memory access times.
7.1.2. Scenario 2
In the second scenario we repeatedly request the same node. By doing this we ensure
that the corresponding IBA agents remain available in memory at all times when using
the schedule behaviour, and we should see a larger performance difference between the
two methods.
We have performed this scenario requesting a small node, which has about 10
references, and a large node, which has about 700 references.
Figure 7.5 IBAS Scenario 2 Response times (seconds against response number, for the
schedule and suspend behaviours on a small and a big node)
            Small    Big
Schedule    0.5110   5.7877
Suspend     1.0563   10.4578
Table 6 IBAS Scenario 2 Average response times (seconds)
From both the table and the graph we clearly see the schedule behaviour performing
twice as fast as the suspend behaviour. Having agents idle in memory clearly has an
advantage over storing them on the hard disk all the time.
The real challenge would be to devise a smart algorithm that predicts which agents are
going to be needed in the near future.
7.1.3. Scenario 3
In the third scenario we tested the load balancing algorithm and the availability check.
We operated IBAS with three containers. All agents are created on the first container
and should spread over the other containers as they are activated.
We keep requesting random nodes from the network until we see no more agents
switching containers. We expect to see many agents jumping from one container
to another when the scenario has just started. As the agents become better distributed
amongst the containers they will be less likely to make the jump.
Figure 7.6 IBAS Load balancing (number of agents per container against number of
requests, for the main container and two remote containers)
We see the results matching our expectations. Many agents leave the main container
during the first requests: after 112 requests, 100 agents have moved to the other
two containers. Towards the end we see that many more requests are needed before an
agent is moved.
8. Conclusion
We started this research project with the goal of evaluating and applying techniques to
allow the creation of a scaleable agent framework. As an end target we defined an agent
based application on the framework that operates with at least 10,000 agents.
We started our trajectory by taking a closer look at the framework at hand, Spyse. We
discussed and explained how Spyse operated before we started.
Then we moved on to exploring techniques that we could use to run agents concurrently.
We explained the differences between using threads, micro threads, thread pooling
or no threads at all. From our tests we concluded that micro threads offer little
advantage for agent systems, and we decided to implement the other three techniques
so the user of Spyse can decide which one is most appropriate in their situation.
We also had a look at how to distribute the different components of Spyse across
multiple computer systems and discussed the changes that needed to be made. Extra
attention was given to the communication mechanism, and we compared different
libraries that could help us.
In order to put our changes to the test we created a prototype of a semantic network
application using agents to represent the nodes. We have successfully operated IBAS
with 16,000 IBA agents and thus succeeded in creating an agent based application that
runs with more than 10,000 agents.
By adding different small behaviours to the IBA agents, the overall system behaves
differently. This expresses the power of agent technology. IBAS is a good prototype to
show how simple, small agents can together form a complicated application. Further
development of IBAS should be aimed at developing more behaviours that deal with
the information in the semantic network.
Further research should be done to create more applications that make use of the new
features that are now available and try to find the limits of the current implementation.
The solutions we implemented rely on fast communication mechanisms. We use
primitive broadcasting algorithms to find the agents on the different containers. These
solutions work well on a local area network, where the communication lines are fast and
stable; they will not perform very well over internet links, which are much slower
and less stable.
In order for an agent framework to perform well when distributed over the internet, the
solution should be sought in peer-to-peer techniques.
The load balancing should also be aimed at grouping agents together based on
communication, not solely at spreading the work load.
Our techniques, which work well on a LAN, could be combined with techniques
optimized for the internet. Clusters of computer systems that are on the same LAN could
be formed using our techniques. These clusters could then be coupled together using
techniques optimized for the internet.
The entire project has been a real challenge for me. Spyse is an interesting research project
in which agent technology comes together with 'traditional' software. Having worked on
both sides has given me good insight into how these two worlds differ from each other.
There is still a lot of work left on Spyse before it can measure itself against agent
frameworks that have been around longer, like Jade. One of the difficulties I had was the
lack of proper documentation of the internals of Spyse.
Spyse has a very ambitious goal of allowing people to easily use very advanced
techniques. To do this Spyse uses Python because of its simplicity, but it also
supports even higher-level programming languages like 3APL. Having support for
higher-level languages eliminates the need to develop the framework itself using a
simple language.
While Python is a nice language, it was designed for features like readability and being
'fun to use'. During my research the goal was to run as many agents as possible, which is
first of all a performance issue, something Python was not designed for.
Furthermore, agents run concurrently and we need threads to do this. The threading
support in Python is not very good. The Python interpreter itself is not thread safe;
because of this, Python programs will not benefit from multiprocessor systems or
multicore processors. Threads have no priorities and cannot be stopped, suspended,
resumed or interrupted.
Despite these shortcomings we still managed to get an agent system running with 16,000
agents, and we should have no problem adding even more.
References
[1] Weiss, G. (Editor): Multiagent Systems: A Modern
Approach to Distributed Artificial Intelligence.
MIT Press (2000)
[2] The Foundation for Intelligent Physical Agents
http://www.fipa.org
[3] W3C Semantic Web
http://www.w3.org/2001/sw/
[4] 3APL An Abstract Agent Programming Language
http://www.cs.uu.nl/3apl/
[5] FIPA Contract Net Interaction Protocol Specification
http://www.fipa.org/specs/fipa00029/SC00029H.html
[6] FIPA ACL Message Structure Specification
http://fipa.org/specs/fipa00061/SC00061G.html
[7] Searle, J.: Speech Acts: An essay in the philosophy of language (1969)
[8] FIPA Communicative Act Library Specification
http://fipa.org/specs/fipa00037/SC00037J.html
[9] Wikipedia contributors, 'Race condition', Wikipedia, The Free Encyclopedia (27
September 2006)
http://en.wikipedia.org/wiki/Race_condition
[10] Stackless Python Wiki
http://www.stackless.com/wiki/Tasklets
[11] Chaturvedi, S.: Python Types and Objects (2005)
[12] XML-RPC Simple cross-platform distributed computing, based on the standards of
the Internet.
http://www.xmlrpc.com/
[13] Remote Python Call (RPyC)
http://rpyc.wikispaces.com/
[14] PyLinda – Distributed computing made easy
http://www-users.cs.york.ac.uk/aw/pylinda/
[15] Python Remote Objects
http://pyro.sourceforge.net/
[16] Java Remote Method Invocation
http://java.sun.com/products/jdk/rmi/
[17] Simple Python Remote Objects
http://lsc.fie.umich.mx/~sadit/spyro/spyro.html
[18] McMillan, G.: Socket Programming HOWTO
http://www.amk.ca/python/howto/sockets/
[19] Sowa, J.F.: Semantic networks. In: Shapiro, S.C. (ed.): Encyclopedia of Artificial
Intelligence, Wiley, New York, 1987; revised and extended for the second edition,
1992.
[20] FIPA Request Interaction Protocol Specification (2002/12/03)
http://fipa.org/specs/fipa00026/SC00026H.html
[21] Huget, M.P.: FIPA Modeling: Interaction Diagrams, Working Draft
Version 2003-07-02
[22] PyLucene
http://pylucene.osafoundation.org/
[23] Apache Lucene Overview
http://lucene.apache.org/java/docs/index.html
[24] P. Berenbrink, T. Friedetzky, L.A. Goldberg, P. Goldberg, Z. Hu, R. Martin:
Distributed Selfish Load Balancing (22 May 2006)