A proposition for exception handling in multi-agent systems

A proposition for exception handling
in multi-agent systems
Frédéric Souchon
Christophe Dony
C hristelle Urtado
L GI2P – Ecole des Mines
L GI2P – Ecole des Mines
LIRMM
d’Alès and L IRMM
d’Alès
161 rue Ada
Parc scientifique G. Besse 34 392 Montpellier - France Parc scientifique G. Besse
30 035 Nîmes - France
30 035 Nîmes - France
+33 4 67 41 85 33
+33 4 66 38 70 78
+33 4 66 38 70 30
[email protected]
[email protected]@siteeerie.ema.fr
eerie.ema.fr
ABSTRACT
Reliability is a major issue in new software architectures (such
as multi-agent systems or software component based
architectures) as they aim at integrating separately developed
software parts. In this paper, we focus on both multi-agent
systems and a crucial feature to ensure software reliability: an
appropriate exception handling system. We first introduce the
terminology and main concepts of exception handling. We
then present the particularities of exception handling in a
multi-agent system and identify three main criteria that an
exception handling systems for an agent platform should
fulfill. After having noticed the lacks of existing propositions
regarding these criteria, we present our solution (the SaGE
model): an exception handling system dedicated to multiagent systems that simultaneously addresses the previously
identified problems. It is inspired by exception handling
systems for concurrent programming that we have adapted and
extended to the agent domain. It has been implemented and
experimented in the multi-agent system MADKit.
Categories and Subject Descriptors
D.3.3 [Programming Languages]: Language Constructs and
Features – Control structures, Concurrent programming
structures.
General Terms
Reliability
Keywords
Reliability, Exception handling systems, Multi-agent systems,
Concurrency.
1. INTRODUCTION
New software architectures (such as multi-agent systems [4] or
software component based architectures [17]) are based on the
integration of numerous software entities being executed
concurrently and, in many cases, communicating
asynchronously [2][6]. We are interested in reliability
concerns in the context of these new architectures and our first
research, presented in this paper, focuses on exception
handling in the context of multi-agent programming. To our
opinion, exception handling capabilities are a must-have t o
enable the realization of reliable large scale agent systems. To
be adapted to multi-agent systems (MASs), an exception
handling system (EHS) must fulfill three main characteristics:
Sylvain Vauttier
L GI2P – Ecole des Mines
d’Alès
Parc scientifique G. Besse
30 035 Nîmes - France
+33 4 66 38 70 73
[email protected]
•
preservation of the agent paradigm (agent autonomy, need
for a specific EHS, context-dependant answers t o
exceptional situations),
•
taking concurrency and asynchronous communication
means into account,
•
taking the social organization of agents into account.
This paper describes an EHS dedicated to multi-agent
programming which addresses the three points identified
above and extends the MADKit platform [15].
The remainder of this paper is organized as follows.
Section 2 introduces the terminology and main concepts of
exception handling. Section 3 presents the problematic of
exception handling in the context of MASs and studies how
existing work responds. Section 4 describes our proposition:
the SaGE model. The description is illustrated throughout a
comprehensive example.
2. EXCEPTION HANDLING CONCEPTS
Exceptional events, often called exceptions [7], are
mechanisms to signal undesirable situations that hamper a
program standard execution to continue. When an exception
occurs, reliable software is able to react appropriately in order
to continue execution or, at least, to interrupt it properly while
preserving data integrity as much as possible. An exception
handling system [1][3][7][12][14][16][22] offers control
structures allowing the programmer to define such behaviors,
alternative to the standard behavior. This materializes by the
capability to s i g n a l exceptions, to define exceptional
continuations in exception handlers and, in the code of the
handler, to put the system back in a coherent state. This latter
objective can be achieved by:
•
continuing execution where it was interrupted after
having modified the context of the problem (resumption),
•
abandoning a part of the standard treatment and resuming
at a reliable point (termination),
•
or, delegating the treatment of the exception to another
entity by signaling a new one (delegation).
The signal of an exception provokes the interruption of what
is currently being executed and the search for an adequate
handler. The definition of a handler (for one or more type of
exceptions) results in associating code to determined types of
program units (i.e. the source code of a procedure [1], a class
definition [3], etc.). After the signal of an exception, an
adequate handler is searched among those associated to the
program unit in which the exception has occurred. If one i s
found, it is executed. If not, the search carries on recursively i n
the enclosing program unit. The set of program units the
exceptions of which can be treated by a given handler is called
the handler scope. Depending on the models, this scope can be
determined either statically (on the basis of the lexical
structure of the code) or, more often, dynamically at runtime
(on the basis of the history of execution contexts). This
history is being dynamically built as a program unit (the
called unit or the enclosed unit) is activated by another (the
calling unit or the enclosing unit). An execution context i s
then created and associated to the called unit ; it contains
information related to its execution (local data, return address,
parameters, etc.). This way, the calling unit precedes the unit
being currently executed in the history of execution contexts.
Systems that provide handlers with dynamic scope rely on this
history of contexts to determine the scope of a given handler
at runtime. For example, the scope of a handler associated to a
procedure covers the execution of this procedure and the
execution of all the procedure it calls (and so on). This type of
mechanism is used, for example, in the C++ and Java
languages.
contexts. Otherwise, handlers can only provide general
(context-independent) reactions.
In MASs, agents are often organized in social structures. In
Aalaadin for MADKit [5], for example, an agent is a member of a
group outside of which it cannot communicate. Inside a given
group, agents can play one or more roles, meaning they can
fulfill one or more group of services. The role then acts as a
common interface for a set of agents. When an agent needs t o
call a given service, it can address an agent directly to ask for
the service realization or address the set of agents that fulfill a
given role to ask them all for the service realization. In this
latter case, the role is a means to broadcast the demand to all
its member agents: The calling service asks the role to transmit
the request to its member agents while ignoring both their
identity and their number.
3.2 Preservation of the agent paradigm
This subsection discusses various points related to the
preservation of the agent paradigm by exception handling
systems:
1.
3. PROBLEMATIC AND EXISTING WORK
In this section, we first describe what we consider to be the
basis of agent models that we used in our study. We then
develop the three main issues we consider to be the
problematic of exception handling in an agent context.
3.1 Basis of multi-agent execution models
We have studied exception handling in the context multiagents platform, the basis of which are described in this
subsection.
Agents execute concurrently and autonomously pieces of code
fulfilling their (local) objectives. All agents in a MAS
contribute, without being aware of it, to the realization of a
global objective (what the MAS is dedicated to). The agents
interact with one another by requesting services to other
agents (we consider agents to provide and fulfill software
contracts [16]). Service requests are sent through an
asynchronous messaging protocol. Thus, service calls are
non-blocking: the sender can carry on its execution while the
recipient is treating the request, and send other parallel service
requests. This characteristic provides a means for advanced
execution patterns: for example, an agent can redundantly
request the same service to different agents in order to get
better performance and reliability guaranties. Another example
of such patterns is to calculate and consolidate partial results
as soon as other agents provide some piece of information i n
order to enforce quality of service policies (such as “provide
the result as soon it has reached a certain level of quality” or
“provide the best result without spending more than a given
amount of time”).
When a service has successfully been realized, the service
supplier sends back its response (using the messaging
protocol) to the service client. When an exception is signaled
during the realization of a service, the client of the service i s
the best entity to handle the exception because it knows the
purpose of the service-call (the intention) and how a failure i n
this call can be handled regarding the initial intention.
Exceptions can be adequately managed only if exception
handlers know and have access to the clients’ execution
Is there a need for a specific EHS ?
2 . Are supervisor-type architectures a
solution ?
satisfying
As argued in section 3.1, we consider agents from a software
engineering point of view (we do not consider their cognitive
capabilities) and, therefore, think that multi-agent systems,
just as software coded in other paradigms, need exceptional
situations to be dealt with adequate control structures.
Moreover, we believe that exception handling for MASs
requires specific mechanisms because:
1.
Agents are high abstraction concepts coded in a system
that transparently provides them with the capabilities t o
execute concurrently and in a distributed manner and t o
communicate throughout an asynchronous messaging
protocol. We think exception handling for MASs must be
tightly integrated into the agent paradigm (agents must
be aware of certain exceptions, exception between agents
must be carried by messages, exception handlers must be
associated to program units which are related to agents’
activities). This characteristic cannot be reached at the
programming language level: the EHS of the language
used to code the system cannot itself constitute an
appropriate EHS for a MAS. An EHS dedicated to MASs
must then be added ; it should be able to connect to the
underlying programming language EHS to constitute a
unified whole. In that case, it would become possible t o
treat all at a time all exception categories (as classified b y
[10][11]): low-level exceptions (exceptions related t o
implementation - code or virtual machine) and agent
specific exceptions.
2 . Agents are autonomous proactive entities that
integrally encapsulate their own behavior. Having
stated this, we do not consider solutions in which
exception handling is delegated to an exterior supervisor
agent in order to respect the agent behavioral autonomy.
Moreover delegating the handling of exceptions to some
exterior entity does not allow the handler to be written i n
the context of the entity that called the defective service
(as argued in section 3.1). Thus, handlers cannot be
contextual and can only code general reactions t o
exceptions.
Despite the statement made in point 1 (above), among the
MASs we surveyed [18], little are those who integrate a
specific EHS. Most of them limit their exception handling
capabilities to those of their underlying programming
language. For example, MADKit [15] is coded in Java and does
not have its proper EHS. The only exceptional treatments
possible are those allowed by Java. From the programming
language point of view, each agent behavior is executed in a
thread. Exceptions that occur during the execution of an
agent’s code are treated by handlers associated to pieces of its
code. When an exception cannot be handled, at the top-level of
the call stack, the thread in which the agent is being run i s
destroyed by the Java virtual machine. From the MAS point of
view, the destruction of the Java thread corresponds to the
accidental death of the concerned agent. The lack of an
appropriate agent-level EHS implies that no exception
signaling the agent's death will be signaled at the MAS level:
thus, no other living agent will be informed of the failure. In
this context, the only possible solution for the defective agent
to signal a failure to the agent demanding a service is to return
a special value that the demanding agent will have t o
distinguish from a “normal” return value. This ad hoc solution
violates software engineering “good practices” that
recommend the management of errors by the means of an
appropriate EHS. This solution is all the more unsatisfying
because it is unlikely that separately developed agents have a
concordant interpretation of such special values ; on the
contrary, the signal of an exception would be a reliable means
to at least be warned without ambiguity that a failure occurred.
Among the MASs we have surveyed, those which had their own
EHS were all based on supervisors [9][21] (also called
monitors). Supervisors are agents specialized in exception
handling for all (or groups of) agents of the system. They are
the program units to which those systems attach handlers. This
type of solution has the drawbacks we just pointed in point 2
(above). Furthermore, these models centralize the handling of
exception on a single (or sometimes a few) specialized entities
which become a bottleneck that might slow down the whole
system, decrease the overall system reliability, etc.
As a conclusion of this subsection, we can summarize b y
saying that the development of a specific EHS is essential t o
exception handling in MAS (the underlying language
exception capabilities are not sufficient for agents) and the
EHS must not solely rely on a ‘supervisor-type’ architecture.
3.3 Concurrency
A multi-agent system is made of software entities (agents) that
execute concurrently. Managing exceptions in such a system
necessarily implies considering how exceptions have to be
managed in a concurrent environment.
as an autonomous entity and carry out its proper activity.
Thus, in MASs, threads are not disincarnated. They are
associated to first class entities and the activities in the
system are the activities of these entities. The classification of
[19] is adapted here to this specificity of MASs.
First, disjoint concurrency is the kind of concurrency
provided by systems that actually provide no concurrency
management. Each agent is managed as if it was the only active
entity in the system. Regarding exception handling, this
means that an exception only regards the agent that caused its
occurrence (local behavior). There is no system level scheme t o
coordinate exception management between agents, as n o
global activity of the agents is considered. Second,
competitive concurrency is the kind of concurrency provided
by systems that manage the isolation of each active entity. The
system provides mechanisms to avoid inconsistencies caused
by concurrent uses of system resources (generally thanks t o
lock-based mechanisms). The aim of such system is to have
every agent act as if it was the only agent in the system, but i n
a disciplined way. The exception handling mechanisms
provided is coherent with this policy: Exception management
still is local to each agent. Finally, cooperative concurrency
is the kind of concurrency provided by systems that provide
some support to the management of collaborations between
active entities. Typically, an agent would request some service
to other agents, in order to carry out its activity. Thus, agents
participate the activities of one another, the completion of
which results from the collective action of the participating
agents. [19] claims that such a cooperative concurrency
management requires an execution model that allows such
collective activities to be explicitly represented. This is the
starting point to coordinated exception handling i n
concurrent systems.
Indeed, as global execution contexts are explicitly
represented, an EHS can be designed that allows handlers to be
associated with such global contexts, in order to manage the
impact of the failure of a participant to the collective activity.
Exception propagation schemes can thus be elaborated, from
the local execution context of the individual activity an agent,
to the global execution context of the collective activity t o
which it participates, and recursively, to a more global
execution context of an enclosing collective activity (the
former collective activity is part of which).
To summarize, MASs must provide support for cooperative
concurrency, as collaboration is a fundamental principle of
the agent paradigm (collective and distributed intelligence).
As a consequence, MASs’ execution models must provide a
means to explicitly represent and control the collective
activities of agents.
This section aims at presenting what must happen (and what
extra concepts will be necessary to correctly manage) i n
situations where a service demanded by a client cannot be
provided to him, because of the occurrence of an exception.
MASs that have their EHS based on supervisors [11][21]
provide a form of activity coordination. Indeed, a supervisor
can be associated to a group of collaborating agents t o
monitor the global activity of the group. While allowing
coordination, this category of EHSs lacks the qualities
described in section 3.2.
[19] provides a classification of three types of concurrency
and studies their impact on exception handling. This
classification is provided for classical concurrent objectoriented systems. In such systems, execution threads are
orthogonal to objects: objects are passive entities that are
executed by external threads. Conversely, agents are active
entities that own a thread and use this processing power to act
MASs that do not have their own specific EHS use the
exception handling capabilities of their underlying
programming language. Among those languages, Java is the
one that is mostly used. To coordinate the activities of agents
in such systems, MASs written in Java could use an advanced
language feature to coordinate threads: thread groups. When a
thread belonging to such a group cannot handle an exception,
3.3.1 A need for activity coordination
it is propagated to the thread's group. The exception can
therefore be caught and treated at the group level and, for
example, impact other threads of the group. By this means, the
group acts as a coordinator for the activities of its belonging
threads. Thread groups can in turn be included in larger groups
thus enabling the constitution of group hierarchies in order t o
control various nested levels of collective activities. Thread
groups are an interesting Java construct to coordinate
concurrent thread activities but:
1.
they are not used in the EHSs of the MASs we surveyed,
2.
they do not provide direct support for distribution or
asynchronous messaging. Therefore, they are not
sufficient, as is, to manage coordinated activities of
agents (cf. point 1 in section 3.2).
3.3.2 Concerted exception support
Once activity coordination is supported and integrates an
exception handling mechanism, a means is provided to collect
all minor exceptions (that do not need to be individually
handled) raised by the participants of a collective activity. A
second step is to define a management policy that determines
the impact of the co-occurrence of minor failures while
completing the global activity.
[8] suggests the integration of a concerted exception
mechanism in concurrent systems supporting cooperative
concurrency. Exceptions concurrently signaled by the entities
participating to a collective activity are composed together, b y
a resolution function, as a unique exception (called a
concerted exception), reflecting the global state of the
collective activity. This concerted exception is used instead of
the individual exceptions raised by the participants, to trigger
handlers at the collective activity level.
This is a general principle that can be adapted, through
different implementations of its basis concepts (concerted
exception and resolution function) to the exception handling
mechanism required by a specific system. We will discuss i n
the next section the adaptation of these concepts to a MAS, i n
order to manage concerted exception when dealing with the
collective activities of cooperating agents. The main
advantage of this scheme is the versatile support for exception
management policy definition it provides. A redefinition of
the resolution function associated with a collective activity b y
the programmer is enough to change its policy.
model presented in this section is the one we designed for
M ADKit agents because we needed, firstly, to be able to rely o n
a standard framework to manage concurrent activities among
and inside agents. Intra-agent concurrent activity management
is mandatory to preserve agents’ responsiveness to critical
events such as exception signaling. Though initially designed
for MADKit, the execution model presented here is generic and
we propose it as an execution model for managing the
coordination of concurrent activities that could be transposed
to any MAS.
4.1.1 Services
Exception handling in an agent system presuppose a rather
sophisticated execution model. Indeed, agents should always
remain responsive to messages, particularly critical messages
such as interruption calls (an agent is signaling to another
agent that it does not need anymore the service it requested)
and exception signaling (an agent is signaling to the agent
that requested a service that it cannot carry on its execution
due to some failure).
To ensure its responsiveness, any agent owns a thread that
actively scans its message box in order to be able to trigger
some actions as soon as a message is received. When a service
request is received and accepted, the execution of the
corresponding service is launched. In our model, services are
reified concepts and execute in their own thread. The execution
of services can explicitly be controlled by the agents in which
they are run (for example to interrupt them). Figure 1 show
how services are created in the SaGE model.
These services fall in two categories:
ß
atomic services, the execution of which does not
depend on other services’ call. These services do not
require asynchronous communication means.
ß
complex services, the execution of which need to ask
for other services to be completed. These services
rely on the asynchronous communication
capabilities of agents.
...
Service s = new Service ("Search for a travel",
this.getAddress(), serviceID)
{
public void live ()
{
// body of the service
}
public ServiceException concert (Vector subServicesInfo)
{
// body of the concerted exception handling function
}
4. SaGE: an exception handling system
dedicated to multi-agent systems
public void handle (NetworkException exc)
{
// body of a handler for NetworkException
}
This section presents our proposal: an EHS that tackles the
three issues stressed in the previous section: preservation of
the agent paradigm, concurrency and asynchronous messaging
support and agents’ social organization support.
4.1 Modifications of the agent execution
model to coordinate their concurrent
activities
Our objective is to conceive an EHS specific to MASs. The
execution models proposed by MASs are very different,
depending on the applications they target. As an example, the
MAS we use for experimentation, MADKit, is a generic MAS
that does not prescribe any predefined, fixed execution model:
it only provides a framework of versatile communication and
management mechanisms for agents. Thus, the execution
};
public void handle (NoProviderException exc)
{
// body of a handler for NoProviderException
}
...
Figure 1: Service definition and association of handlers to
services in SaGE
These two kinds of services are very different in nature. An
atomic service is implemented by a thread that directly
executes the treatment corresponding to the requested service.
A complex service is implemented by an internal agent t o
which the execution of the requested service is delegated.
Implementing complex services as agents is necessary and
natural regarding agent oriented programming. From a
functional point of view, a complex service must be able t o
request the execution of other services and to retrieve their
results through asynchronous messages: in other words, a
complex service must have the same communication
capabilities (mechanisms) as an agent.
In this execution model, every service defines the execution
context of a request and provides, in case of a complex service,
the control structures required for managing cooperative
concurrency among agents’ activities. Cascaded requests
result in cascaded service executions that form a
corresponding logical structure of execution contexts. Figure
3 shows the graph of execution contexts that result from the
cascaded calls of services. This structure is comparable to the
call stack in procedural or object-oriented programming.
Every time an agent receives a request and accepts it, it records
the agent to which it is providing a given service. Every time
an agent successfully requests a service, on behalf of one of its
services, this agent maintains a log containing the id of the
service, the id of the request, and the id of the recipient agent.
This log is the support of the termination policy for services.
2.
Handlers associated with an agent are designed t o
catch exceptions which are to be treated with generic
mechanisms at the agent level (treatments which are
common to all the agent’s services). For example, an
agents’ death or the coherence maintenance of agent
specific data (what is called the agent’s mental state)
can be dealt with exception handlers at the agent
level.
3.
Handlers associated with a role are designed t o
catch exceptions the treatment of which concern all
agents fulfilling a given role. For example, if
exceptions occurred after a broadcasted service
request, the building of a partial response to the
calling service or statistics to provide quality of
service measures can be dealt with exception
handlers at the role level.
These handlers are proper to the execution model of our
agents:
•
they are distinct from the handlers provided by the
underlying implementation language,
•
they are associated to the specific execution
constructs of the agent model (service, agent, role),
•
they are triggered by the signaling or the
propagation of a MAS level exception.
4.1.2 Adaptation of roles’ semantics
In order to make possible to address agents that fulfill a given
role with a single message send, the execution model had to be
completed with a concept to represent and manage roles. As for
services, roles had to be able to send and receive asynchronous
messages even if its message sending policy is original and
can be compared to a broadcast mechanism. In our proposition
we chose to implement the roles as specific agents (that
maintains a list of participating agents, communicates through
the asynchronous messaging protocol and redefine the
message sending capability to be able to broadcast a message
to all its participating agents).
The execution model described in this subsection provides a
means to coordinate the execution of agents’ activities that i s
exploited to integrate the exception handling system
presented in the following subsection. It can be illustrated b y
the example represented on Figure 3 which is the execution
model needed to realize travel agency requests in a small
agent-based application. In this example, Agency is an agent,
Provider
role is a role to which three Providers
participate, Contact the client is complex service being
executed by the Provider 3 agent and Validate is an
atomic service which is executed by the Client agent.
4.2 An exception handling system dedicated
to MASs
4.2.1 Definition of handlers
Exception handlers are naturally associated with the entities
that are responsible of the execution of agents’ activities, i.e.
services, agents and roles.
1.
Handlers associated with a service are designed t o
catch and treat the exceptions that could be raised,
directly or indirectly, by the execution of the service.
This enables a precise, contextual, definition of
handler: the goal of the service, its current state and
the impact of the exception on its completion can be
taken into account when defining the handler.
A handler is classically defined by the set of exception types i t
can catch and by an associated treatment that defines how
these exceptions are handled (as illustrated by Figure 1). A
handler can:
•
execute some treatments, in order to manage the
consequences of the abrupt interruption of a service
execution, such as restoring the agent in some
coherent state, sending some partial result, etc.
•
send in turn an exception that signals that the
handler has not been able to successfully manage the
exception.
•
re-launch a complete execution of the service, after
having possibly modified some elements in the
execution context in order to try to complete i t
successfully.
4.2.2 Exception signaling
Exceptions are signaled, during the execution of services or of
handlers, thanks to calls to a specific function of the MAS
infrastructure, distinct from the exception signaling primitive
of the underlying implementation language. This function
signals a MAS level exception (cf. Figure 2).
...
signal (new ServiceException ("Bad client address",
getOwnerAddress ());
...
Figure 2: Exception signaling in SaGE
Both exception systems are compatible: language level
exceptions, caught by language-level handlers, can be turned
into MAS-level exceptions by calling the MAS exception
signaling primitive within these language-level handlers.
Like other exception signaling primitives, ours takes as a
parameter the exception to be signaled. A call to this function
internally triggers SaGE ’s exception handling mechanism.
Travel agency
Search for
a travel
Select
an offer
Poll
providers
Provider 1
Contact
parties
Provider role
Get
price
Client
List
price
Provider 3
(selected)
Organize
a travel
Validate Contact the
provider
Get
price
Contact
the client
Establish Validate
the contract
Captions:
Agent
Role
Service
Provider 2
Get
price
Service call
Figure 3: Execution of a request to a travel agency
4.2.3 Handler search
The heart of an exception handling system is the way handlers
are searched for. As a general principle in our system, handlers
are searched for locally and then in the enclosing context.
When an exception is signaled, the execution of the defective
service is suspended. First, a handler for this type of exception
is searched for locally, in the list of handlers associated to the
service. If such a handler is found, its is executed. If not, the
search carries on among the handlers associated to the agent
that executed the defective service. In any case, the defective
service is terminated.
If no handler is found, associated with the service or the agent
to which it belongs, the handler is searched for in the entity
that requested the defective service: this is the scheme we
provide to propagate exception, back in the execution context
history. This entity can either be the calling service (as for the
service Contact parties of the agent Travel agency i n
the example of Figure 3) or the role entity that broadcasted a
message to a group of agents fulfilling a given role (as for the
service List prices of the Provider role in the example
of Figure 3). If no handler is found, the search process
reiterates (the whole process is illustrated in Figure 4). The
search process ends by either the execution of a convenient
handler or the agent’s death (if no handler is found until the
top-level service).
4.2.4 Concerted exception support
Inspired by [8] and [13], we provide SaGE user with concerted
exception support that has been integrated to the exception
propagation mechanism. This mechanism allows:
•
not to react to under-critical situations,
•
collect exceptions to reflect a collective or a global
defect.
The concerted exception mechanism is available both at the
service and the role level1.
Concerted exception support at the service level
An exception that is propagated to a calling service, to signal
the failure of a called service, is not always critical for this
calling service. Indeed, a service can redundantly ask for
another service to several agents, to increase reliability and
performance. In such a case, only the failure of all the called
services (or of a significant amount of them) might be critical
for the calling service.
1
No concerted exception handling is required at the agent
level. Indeed, the association of handlers to agents is
provided as a facility to define handlers that are common t o
all the services of the agent. As such, they are managed as
any handler associated with a service. Thus, the exceptions
that can trigger these handlers are concerted at the service
level by the concerted exception functions associated with
the services.
When a role is alerted that an exception occurred in one of the
services after a broadcast, the role logs the exception and does
not trigger any handler. When all services are done, if all
succeeded, the normal response is sent back to the demander. If
one or more services failed due to an exception the concerted
exception function is run. Its default behavior is to pack all
exceptions that occurred in a composite exception which i s
signaled to the demander. Of course, the programmer can
define its own concerted exception function to either raise a
concerted exception that would better reflect the situation or
provide partial results to the demanding service.
Calling service
Agent executing
Handler(s)
the called service
Called service
Handler(s)
(a) Standard request
Captions:
Calling service
Search direction
Role that broadcasted
the service request
Handler(s)
Agent executing
Handler(s)
the called service
Called service
Handler(s)
(b) Broadcasted request
Figure 4: Handler search process in SaGE
To enable concerted exception support, propagated exceptions
are not directly signaled to the calling service. Propagated
exceptions are stored in an log, associated to the service. This
log maintains an history of the so far propagated exceptions
(along with information such as the source of the exception).
Whenever a new propagated exception is logged, the concerted
exception function associated to a service is run to evaluate
the situation. This function acts as a filter and a composition
function. Regarding the nature or the number of logged
exceptions, this function determines if an exception is to be
effectively signaled to the calling service. If so, the signaled
exception can be the last propagated exception (in case it i s
critical enough to be signaled as is) or a new exception that i s
calculated from a set of logged exceptions, the conjunction of
which creates a critical situation (a concerted exception).
As for handlers, concerted exception functions are associated
with services. Each concerted exception function is specific t o
each service: its exception handling policy is to be defined b y
the developer of the service. For this purpose, the developer
has access to the exception log (that is automatically
maintained by the propagation mechanism), in order to decide
if an exception is to be signaled, and to the exception
signaling function, to effectively signal the chosen
coordinated exception.
Concerted exception support at the role level
Concerted exception at the role level allows the member agents
of the role to be managed as a set. The set of requests emitted
by a role to provide a collective service is transparent for the
calling service that sent a single request to a role. The role acts
as a collector for responses aiming to send back a single
(composite) response to the demander. We transposed this
behavior in exception handling at the role level.
This concerted exception support allowed by SaG E allows
acute criticality evaluation. On the one hand, the service Poll
Providers of the agent Travel agency broadcasts a
requests to get prices from Provider agents, none of these
request is individually critical. Thus, the concerted exception
function (see Figure 5) associated with the Provider role
will collect the exceptions signaling the failure of some
Provider agent without signaling any exception, unless a
chosen proportion of these agents fail. On the other hand,
services as Select an offer or Contact parties are
critical regarding the successful completion of the Search
for a travel service: their failure immediately results i n
the failure of the depending service (the concerted exception
mechanisms does not filter the exception nor delay its
handling).
5. CONCLUSION AND FURTHER WORK
In this paper we presented a proposition to handle exceptions
in a multi-agent system. This work has been motivated by the
fact that we did not find any fully satisfying support of
exceptions in agent platforms we surveyed and by our
conviction that large scale agent systems would benefit from
adequate exception handling constructs.
The proposed model is based on:
•
our vision of agents as service providers,
•
and, on our conviction that a service demander is the
best entity to handle an exception that occurred in a
demanded service because it knows why the service
has been requested and how a failure may be dealt
with.
Our proposition, the SaGE model, allows the programmer to
signal agent-level exceptions and to write handlers can be
associated either to services, agents or roles. It also enables the
programmer to configure the exception propagation policy b y
coding a concerted exception function allowing to handle
critical exceptions while logging under-critical ones until
their conjunction enables to diagnose a concerted exception.
Thus, SaGE adapts to specificities of MASs: it conforms to the
agent paradigm, deals with concurrency and asynchronous
communication capabilities of agents and integrates agents’
social organization concepts.
We plan to investigate in various directions as perspectives
for this work. We would like to extend EHS in order to be
capable of resuming the execution of a service to some chosen
p o i n t ,
a f t e r
a
successful treatment of an exception. We also plan t o
transpose
public ServiceException concert (Vector subServicesInfo)
{
int i=0;
// count the number of exception raised in subservices
for (int j=0; j<subServicesInfo.size (); j++)
if ((ServiceInfo)
(subServicesInfo.elementAt (j)).getRaisedException () != null) i++;
// if more than 30% failed, it must be due to a network failure
if (i> (0.3* subServicesInfo.size ())) return (new NetworkException ());
else return null;
}
Figure 5: SaGE concerted exception function associated with the Provider role
the EHS we provide for agents into a message-oriented
component based platform (for example, with J2EE / JMS
technologies) in order to see if the encountered needs and
difficulties are the same. We also wish to experiment our
proposition on a real application.
6. ACKNOWLEDGMENTS
This work is the result of a collaboration with Jacques Ferber,
creator of the MADKit system[4][5]. The authors thank him for
his contribution, ideas, and for the many fructuous
discussions .
7. REFERENCES
[1] ANONYMOUS, Preliminary Ada Reference Manual,
ACM SIGPLAN Notices, 14, 6A, 1979, 1-139.
[2] CAMPBELL R., RANDELL B., Error Recovery in
Asynchronous Systems, IEEE Transactions on
Software Engineering (SE), SE-12 number 8, 1986,
811-826.
[3] DONY C., Exception Handling and Object-Oriented
Programming: towards a synthesis, ACM SIGPLAN
Notices, 25, 10, 1990, 322-330, Proceedings
OOPSLA ECOOP '90, MEYROWITZ N. Ed
[4] FERBER J., Les systèmes multi-agents, vers une
intelligence artificielle distribuée, InterEditions, 1995.
(in french)
[5] FERBER J., GUTKNECHT O., A meta-model for the
analysis and design of organizations in multi-agent
systems, Third International Conference on MultiAgent Systems (ICMAS98), 1998, 128-135.
[6] GÄRTNER F. C., Fundamentals of Fault Tolerant
Distributed Computing in Asynchronous Environments, ACMCS, 31, 1, 1999, 1-26.
[7] GOODENOUGH J. B., Exception Handling: Issues
and a Proposed Notation, Communications of the
ACM, 18, 12, 1975, 683-696.
[8] ISSARNY V., Concurrent Exception Handling, In
[19], 2001.
[9] KLEIN M., DELLAROCAS C., Towards a systematic
repository of knowledge about managing multi-agent
system exceptions, ASES Working Paper ASES-WP2000-01, 2000.
[10] KLEIN M., RODRIGUEZ AGUILAR J. A., Using
Role Commitment Violation Analysis to identify
Exceptions in Open Multi-Agent Systems, ASES
Working Paper ASES-WP-2000-04, 2000.
[11]KLEIN M., DELLAROCAS C., Using DomainIndependent Exception Handling Services to Enable
Robust Open Multi-Agent Systems: The Case of
Agent Death, Journal for Autonomous Agents and
Multi-Agent Systems, 2001.
[ 1 2 ]KOENIG A. R., STROUSTRUP B., Exception
Handling for C++, Proceedings of the “C++ at Work”
Conference, 1989, 322-330..
[13]LACOURTE S., Exceptions in Guide, an ObjectOriented Language for Distributed Applications,
Springer Verlag,, ECOOP 91, 5-90 LNCS, Grenoble
(France), 1990, 268-287.
[14]LISKOV B., Distributed Programming in Argus,
Communications of the ACM, 31, 3, 1988, 300-312.
[15] MADKit, http://www.madkit.org.
[16] MEYER B., Disciplined Exceptions, Technical report
tr-ei-22/ex, 1988, Interactive Software Engineering,
Goleta, CA.
[17] PESCHANSKI F., MEURISSE T., BRIOT J.-P., Les
composants logiciels : évolution technique ou nouveau
paradigme ?, OCM 2000, 2000, 53-65. (in french)
[18] RICORDEL P.-M.DEMAZEAU Y., From analysis to
deployment: A multi-agent platform survey,
Engineering Societies in the Agents World, 1972
LNAI, Springer-Verlag, 2000, 93-105, 1st
International Workshop (ESAW'00), Berlin (Germany),
2000.
[19] ROMANOVSKY A., DONY C., KNUDSEN J. L.,
and TRIPATHI A. Eds, Advances in Exception
Handling Techniques, Lecture Notes in Computer
Science 2022, Springer-Verlag, 2001.
[20]ROMANOVSKY A., KIENZLE J., Action-Oriented
Exception Handling in Cooperative and Competitive
Concurrent Object-Oriented Systems, In [19], 2001,
147-164.
[21] TRIPATHI A., MILLER R., Exception Handling in
Agent Oriented Systems, In [19], 2001.
[ 2 2 WEINREB
]
D. L., Signaling and Handling
Conditions, Technical report, 1983, Symbolics, Inc.,
Cambridge, A.