A proposition for exception handling in multi-agent systems Frédéric Souchon Christophe Dony C hristelle Urtado L GI2P – Ecole des Mines L GI2P – Ecole des Mines LIRMM d’Alès and L IRMM d’Alès 161 rue Ada Parc scientifique G. Besse 34 392 Montpellier - France Parc scientifique G. Besse 30 035 Nîmes - France 30 035 Nîmes - France +33 4 67 41 85 33 +33 4 66 38 70 78 +33 4 66 38 70 30 [email protected] [email protected]@siteeerie.ema.fr eerie.ema.fr ABSTRACT Reliability is a major issue in new software architectures (such as multi-agent systems or software component based architectures) as they aim at integrating separately developed software parts. In this paper, we focus on both multi-agent systems and a crucial feature to ensure software reliability: an appropriate exception handling system. We first introduce the terminology and main concepts of exception handling. We then present the particularities of exception handling in a multi-agent system and identify three main criteria that an exception handling systems for an agent platform should fulfill. After having noticed the lacks of existing propositions regarding these criteria, we present our solution (the SaGE model): an exception handling system dedicated to multiagent systems that simultaneously addresses the previously identified problems. It is inspired by exception handling systems for concurrent programming that we have adapted and extended to the agent domain. It has been implemented and experimented in the multi-agent system MADKit. Categories and Subject Descriptors D.3.3 [Programming Languages]: Language Constructs and Features – Control structures, Concurrent programming structures. General Terms Reliability Keywords Reliability, Exception handling systems, Multi-agent systems, Concurrency. 1. INTRODUCTION New software architectures (such as multi-agent systems [4] or software component based architectures [17]) are based on the integration of numerous software entities being executed concurrently and, in many cases, communicating asynchronously [2][6]. We are interested in reliability concerns in the context of these new architectures and our first research, presented in this paper, focuses on exception handling in the context of multi-agent programming. To our opinion, exception handling capabilities are a must-have t o enable the realization of reliable large scale agent systems. To be adapted to multi-agent systems (MASs), an exception handling system (EHS) must fulfill three main characteristics: Sylvain Vauttier L GI2P – Ecole des Mines d’Alès Parc scientifique G. Besse 30 035 Nîmes - France +33 4 66 38 70 73 [email protected] • preservation of the agent paradigm (agent autonomy, need for a specific EHS, context-dependant answers t o exceptional situations), • taking concurrency and asynchronous communication means into account, • taking the social organization of agents into account. This paper describes an EHS dedicated to multi-agent programming which addresses the three points identified above and extends the MADKit platform [15]. The remainder of this paper is organized as follows. Section 2 introduces the terminology and main concepts of exception handling. Section 3 presents the problematic of exception handling in the context of MASs and studies how existing work responds. Section 4 describes our proposition: the SaGE model. The description is illustrated throughout a comprehensive example. 2. EXCEPTION HANDLING CONCEPTS Exceptional events, often called exceptions [7], are mechanisms to signal undesirable situations that hamper a program standard execution to continue. When an exception occurs, reliable software is able to react appropriately in order to continue execution or, at least, to interrupt it properly while preserving data integrity as much as possible. An exception handling system [1][3][7][12][14][16][22] offers control structures allowing the programmer to define such behaviors, alternative to the standard behavior. This materializes by the capability to s i g n a l exceptions, to define exceptional continuations in exception handlers and, in the code of the handler, to put the system back in a coherent state. This latter objective can be achieved by: • continuing execution where it was interrupted after having modified the context of the problem (resumption), • abandoning a part of the standard treatment and resuming at a reliable point (termination), • or, delegating the treatment of the exception to another entity by signaling a new one (delegation). The signal of an exception provokes the interruption of what is currently being executed and the search for an adequate handler. The definition of a handler (for one or more type of exceptions) results in associating code to determined types of program units (i.e. the source code of a procedure [1], a class definition [3], etc.). After the signal of an exception, an adequate handler is searched among those associated to the program unit in which the exception has occurred. If one i s found, it is executed. If not, the search carries on recursively i n the enclosing program unit. The set of program units the exceptions of which can be treated by a given handler is called the handler scope. Depending on the models, this scope can be determined either statically (on the basis of the lexical structure of the code) or, more often, dynamically at runtime (on the basis of the history of execution contexts). This history is being dynamically built as a program unit (the called unit or the enclosed unit) is activated by another (the calling unit or the enclosing unit). An execution context i s then created and associated to the called unit ; it contains information related to its execution (local data, return address, parameters, etc.). This way, the calling unit precedes the unit being currently executed in the history of execution contexts. Systems that provide handlers with dynamic scope rely on this history of contexts to determine the scope of a given handler at runtime. For example, the scope of a handler associated to a procedure covers the execution of this procedure and the execution of all the procedure it calls (and so on). This type of mechanism is used, for example, in the C++ and Java languages. contexts. Otherwise, handlers can only provide general (context-independent) reactions. In MASs, agents are often organized in social structures. In Aalaadin for MADKit [5], for example, an agent is a member of a group outside of which it cannot communicate. Inside a given group, agents can play one or more roles, meaning they can fulfill one or more group of services. The role then acts as a common interface for a set of agents. When an agent needs t o call a given service, it can address an agent directly to ask for the service realization or address the set of agents that fulfill a given role to ask them all for the service realization. In this latter case, the role is a means to broadcast the demand to all its member agents: The calling service asks the role to transmit the request to its member agents while ignoring both their identity and their number. 3.2 Preservation of the agent paradigm This subsection discusses various points related to the preservation of the agent paradigm by exception handling systems: 1. 3. PROBLEMATIC AND EXISTING WORK In this section, we first describe what we consider to be the basis of agent models that we used in our study. We then develop the three main issues we consider to be the problematic of exception handling in an agent context. 3.1 Basis of multi-agent execution models We have studied exception handling in the context multiagents platform, the basis of which are described in this subsection. Agents execute concurrently and autonomously pieces of code fulfilling their (local) objectives. All agents in a MAS contribute, without being aware of it, to the realization of a global objective (what the MAS is dedicated to). The agents interact with one another by requesting services to other agents (we consider agents to provide and fulfill software contracts [16]). Service requests are sent through an asynchronous messaging protocol. Thus, service calls are non-blocking: the sender can carry on its execution while the recipient is treating the request, and send other parallel service requests. This characteristic provides a means for advanced execution patterns: for example, an agent can redundantly request the same service to different agents in order to get better performance and reliability guaranties. Another example of such patterns is to calculate and consolidate partial results as soon as other agents provide some piece of information i n order to enforce quality of service policies (such as “provide the result as soon it has reached a certain level of quality” or “provide the best result without spending more than a given amount of time”). When a service has successfully been realized, the service supplier sends back its response (using the messaging protocol) to the service client. When an exception is signaled during the realization of a service, the client of the service i s the best entity to handle the exception because it knows the purpose of the service-call (the intention) and how a failure i n this call can be handled regarding the initial intention. Exceptions can be adequately managed only if exception handlers know and have access to the clients’ execution Is there a need for a specific EHS ? 2 . Are supervisor-type architectures a solution ? satisfying As argued in section 3.1, we consider agents from a software engineering point of view (we do not consider their cognitive capabilities) and, therefore, think that multi-agent systems, just as software coded in other paradigms, need exceptional situations to be dealt with adequate control structures. Moreover, we believe that exception handling for MASs requires specific mechanisms because: 1. Agents are high abstraction concepts coded in a system that transparently provides them with the capabilities t o execute concurrently and in a distributed manner and t o communicate throughout an asynchronous messaging protocol. We think exception handling for MASs must be tightly integrated into the agent paradigm (agents must be aware of certain exceptions, exception between agents must be carried by messages, exception handlers must be associated to program units which are related to agents’ activities). This characteristic cannot be reached at the programming language level: the EHS of the language used to code the system cannot itself constitute an appropriate EHS for a MAS. An EHS dedicated to MASs must then be added ; it should be able to connect to the underlying programming language EHS to constitute a unified whole. In that case, it would become possible t o treat all at a time all exception categories (as classified b y [10][11]): low-level exceptions (exceptions related t o implementation - code or virtual machine) and agent specific exceptions. 2 . Agents are autonomous proactive entities that integrally encapsulate their own behavior. Having stated this, we do not consider solutions in which exception handling is delegated to an exterior supervisor agent in order to respect the agent behavioral autonomy. Moreover delegating the handling of exceptions to some exterior entity does not allow the handler to be written i n the context of the entity that called the defective service (as argued in section 3.1). Thus, handlers cannot be contextual and can only code general reactions t o exceptions. Despite the statement made in point 1 (above), among the MASs we surveyed [18], little are those who integrate a specific EHS. Most of them limit their exception handling capabilities to those of their underlying programming language. For example, MADKit [15] is coded in Java and does not have its proper EHS. The only exceptional treatments possible are those allowed by Java. From the programming language point of view, each agent behavior is executed in a thread. Exceptions that occur during the execution of an agent’s code are treated by handlers associated to pieces of its code. When an exception cannot be handled, at the top-level of the call stack, the thread in which the agent is being run i s destroyed by the Java virtual machine. From the MAS point of view, the destruction of the Java thread corresponds to the accidental death of the concerned agent. The lack of an appropriate agent-level EHS implies that no exception signaling the agent's death will be signaled at the MAS level: thus, no other living agent will be informed of the failure. In this context, the only possible solution for the defective agent to signal a failure to the agent demanding a service is to return a special value that the demanding agent will have t o distinguish from a “normal” return value. This ad hoc solution violates software engineering “good practices” that recommend the management of errors by the means of an appropriate EHS. This solution is all the more unsatisfying because it is unlikely that separately developed agents have a concordant interpretation of such special values ; on the contrary, the signal of an exception would be a reliable means to at least be warned without ambiguity that a failure occurred. Among the MASs we have surveyed, those which had their own EHS were all based on supervisors [9][21] (also called monitors). Supervisors are agents specialized in exception handling for all (or groups of) agents of the system. They are the program units to which those systems attach handlers. This type of solution has the drawbacks we just pointed in point 2 (above). Furthermore, these models centralize the handling of exception on a single (or sometimes a few) specialized entities which become a bottleneck that might slow down the whole system, decrease the overall system reliability, etc. As a conclusion of this subsection, we can summarize b y saying that the development of a specific EHS is essential t o exception handling in MAS (the underlying language exception capabilities are not sufficient for agents) and the EHS must not solely rely on a ‘supervisor-type’ architecture. 3.3 Concurrency A multi-agent system is made of software entities (agents) that execute concurrently. Managing exceptions in such a system necessarily implies considering how exceptions have to be managed in a concurrent environment. as an autonomous entity and carry out its proper activity. Thus, in MASs, threads are not disincarnated. They are associated to first class entities and the activities in the system are the activities of these entities. The classification of [19] is adapted here to this specificity of MASs. First, disjoint concurrency is the kind of concurrency provided by systems that actually provide no concurrency management. Each agent is managed as if it was the only active entity in the system. Regarding exception handling, this means that an exception only regards the agent that caused its occurrence (local behavior). There is no system level scheme t o coordinate exception management between agents, as n o global activity of the agents is considered. Second, competitive concurrency is the kind of concurrency provided by systems that manage the isolation of each active entity. The system provides mechanisms to avoid inconsistencies caused by concurrent uses of system resources (generally thanks t o lock-based mechanisms). The aim of such system is to have every agent act as if it was the only agent in the system, but i n a disciplined way. The exception handling mechanisms provided is coherent with this policy: Exception management still is local to each agent. Finally, cooperative concurrency is the kind of concurrency provided by systems that provide some support to the management of collaborations between active entities. Typically, an agent would request some service to other agents, in order to carry out its activity. Thus, agents participate the activities of one another, the completion of which results from the collective action of the participating agents. [19] claims that such a cooperative concurrency management requires an execution model that allows such collective activities to be explicitly represented. This is the starting point to coordinated exception handling i n concurrent systems. Indeed, as global execution contexts are explicitly represented, an EHS can be designed that allows handlers to be associated with such global contexts, in order to manage the impact of the failure of a participant to the collective activity. Exception propagation schemes can thus be elaborated, from the local execution context of the individual activity an agent, to the global execution context of the collective activity t o which it participates, and recursively, to a more global execution context of an enclosing collective activity (the former collective activity is part of which). To summarize, MASs must provide support for cooperative concurrency, as collaboration is a fundamental principle of the agent paradigm (collective and distributed intelligence). As a consequence, MASs’ execution models must provide a means to explicitly represent and control the collective activities of agents. This section aims at presenting what must happen (and what extra concepts will be necessary to correctly manage) i n situations where a service demanded by a client cannot be provided to him, because of the occurrence of an exception. MASs that have their EHS based on supervisors [11][21] provide a form of activity coordination. Indeed, a supervisor can be associated to a group of collaborating agents t o monitor the global activity of the group. While allowing coordination, this category of EHSs lacks the qualities described in section 3.2. [19] provides a classification of three types of concurrency and studies their impact on exception handling. This classification is provided for classical concurrent objectoriented systems. In such systems, execution threads are orthogonal to objects: objects are passive entities that are executed by external threads. Conversely, agents are active entities that own a thread and use this processing power to act MASs that do not have their own specific EHS use the exception handling capabilities of their underlying programming language. Among those languages, Java is the one that is mostly used. To coordinate the activities of agents in such systems, MASs written in Java could use an advanced language feature to coordinate threads: thread groups. When a thread belonging to such a group cannot handle an exception, 3.3.1 A need for activity coordination it is propagated to the thread's group. The exception can therefore be caught and treated at the group level and, for example, impact other threads of the group. By this means, the group acts as a coordinator for the activities of its belonging threads. Thread groups can in turn be included in larger groups thus enabling the constitution of group hierarchies in order t o control various nested levels of collective activities. Thread groups are an interesting Java construct to coordinate concurrent thread activities but: 1. they are not used in the EHSs of the MASs we surveyed, 2. they do not provide direct support for distribution or asynchronous messaging. Therefore, they are not sufficient, as is, to manage coordinated activities of agents (cf. point 1 in section 3.2). 3.3.2 Concerted exception support Once activity coordination is supported and integrates an exception handling mechanism, a means is provided to collect all minor exceptions (that do not need to be individually handled) raised by the participants of a collective activity. A second step is to define a management policy that determines the impact of the co-occurrence of minor failures while completing the global activity. [8] suggests the integration of a concerted exception mechanism in concurrent systems supporting cooperative concurrency. Exceptions concurrently signaled by the entities participating to a collective activity are composed together, b y a resolution function, as a unique exception (called a concerted exception), reflecting the global state of the collective activity. This concerted exception is used instead of the individual exceptions raised by the participants, to trigger handlers at the collective activity level. This is a general principle that can be adapted, through different implementations of its basis concepts (concerted exception and resolution function) to the exception handling mechanism required by a specific system. We will discuss i n the next section the adaptation of these concepts to a MAS, i n order to manage concerted exception when dealing with the collective activities of cooperating agents. The main advantage of this scheme is the versatile support for exception management policy definition it provides. A redefinition of the resolution function associated with a collective activity b y the programmer is enough to change its policy. model presented in this section is the one we designed for M ADKit agents because we needed, firstly, to be able to rely o n a standard framework to manage concurrent activities among and inside agents. Intra-agent concurrent activity management is mandatory to preserve agents’ responsiveness to critical events such as exception signaling. Though initially designed for MADKit, the execution model presented here is generic and we propose it as an execution model for managing the coordination of concurrent activities that could be transposed to any MAS. 4.1.1 Services Exception handling in an agent system presuppose a rather sophisticated execution model. Indeed, agents should always remain responsive to messages, particularly critical messages such as interruption calls (an agent is signaling to another agent that it does not need anymore the service it requested) and exception signaling (an agent is signaling to the agent that requested a service that it cannot carry on its execution due to some failure). To ensure its responsiveness, any agent owns a thread that actively scans its message box in order to be able to trigger some actions as soon as a message is received. When a service request is received and accepted, the execution of the corresponding service is launched. In our model, services are reified concepts and execute in their own thread. The execution of services can explicitly be controlled by the agents in which they are run (for example to interrupt them). Figure 1 show how services are created in the SaGE model. These services fall in two categories: ß atomic services, the execution of which does not depend on other services’ call. These services do not require asynchronous communication means. ß complex services, the execution of which need to ask for other services to be completed. These services rely on the asynchronous communication capabilities of agents. ... Service s = new Service ("Search for a travel", this.getAddress(), serviceID) { public void live () { // body of the service } public ServiceException concert (Vector subServicesInfo) { // body of the concerted exception handling function } 4. SaGE: an exception handling system dedicated to multi-agent systems public void handle (NetworkException exc) { // body of a handler for NetworkException } This section presents our proposal: an EHS that tackles the three issues stressed in the previous section: preservation of the agent paradigm, concurrency and asynchronous messaging support and agents’ social organization support. 4.1 Modifications of the agent execution model to coordinate their concurrent activities Our objective is to conceive an EHS specific to MASs. The execution models proposed by MASs are very different, depending on the applications they target. As an example, the MAS we use for experimentation, MADKit, is a generic MAS that does not prescribe any predefined, fixed execution model: it only provides a framework of versatile communication and management mechanisms for agents. Thus, the execution }; public void handle (NoProviderException exc) { // body of a handler for NoProviderException } ... Figure 1: Service definition and association of handlers to services in SaGE These two kinds of services are very different in nature. An atomic service is implemented by a thread that directly executes the treatment corresponding to the requested service. A complex service is implemented by an internal agent t o which the execution of the requested service is delegated. Implementing complex services as agents is necessary and natural regarding agent oriented programming. From a functional point of view, a complex service must be able t o request the execution of other services and to retrieve their results through asynchronous messages: in other words, a complex service must have the same communication capabilities (mechanisms) as an agent. In this execution model, every service defines the execution context of a request and provides, in case of a complex service, the control structures required for managing cooperative concurrency among agents’ activities. Cascaded requests result in cascaded service executions that form a corresponding logical structure of execution contexts. Figure 3 shows the graph of execution contexts that result from the cascaded calls of services. This structure is comparable to the call stack in procedural or object-oriented programming. Every time an agent receives a request and accepts it, it records the agent to which it is providing a given service. Every time an agent successfully requests a service, on behalf of one of its services, this agent maintains a log containing the id of the service, the id of the request, and the id of the recipient agent. This log is the support of the termination policy for services. 2. Handlers associated with an agent are designed t o catch exceptions which are to be treated with generic mechanisms at the agent level (treatments which are common to all the agent’s services). For example, an agents’ death or the coherence maintenance of agent specific data (what is called the agent’s mental state) can be dealt with exception handlers at the agent level. 3. Handlers associated with a role are designed t o catch exceptions the treatment of which concern all agents fulfilling a given role. For example, if exceptions occurred after a broadcasted service request, the building of a partial response to the calling service or statistics to provide quality of service measures can be dealt with exception handlers at the role level. These handlers are proper to the execution model of our agents: • they are distinct from the handlers provided by the underlying implementation language, • they are associated to the specific execution constructs of the agent model (service, agent, role), • they are triggered by the signaling or the propagation of a MAS level exception. 4.1.2 Adaptation of roles’ semantics In order to make possible to address agents that fulfill a given role with a single message send, the execution model had to be completed with a concept to represent and manage roles. As for services, roles had to be able to send and receive asynchronous messages even if its message sending policy is original and can be compared to a broadcast mechanism. In our proposition we chose to implement the roles as specific agents (that maintains a list of participating agents, communicates through the asynchronous messaging protocol and redefine the message sending capability to be able to broadcast a message to all its participating agents). The execution model described in this subsection provides a means to coordinate the execution of agents’ activities that i s exploited to integrate the exception handling system presented in the following subsection. It can be illustrated b y the example represented on Figure 3 which is the execution model needed to realize travel agency requests in a small agent-based application. In this example, Agency is an agent, Provider role is a role to which three Providers participate, Contact the client is complex service being executed by the Provider 3 agent and Validate is an atomic service which is executed by the Client agent. 4.2 An exception handling system dedicated to MASs 4.2.1 Definition of handlers Exception handlers are naturally associated with the entities that are responsible of the execution of agents’ activities, i.e. services, agents and roles. 1. Handlers associated with a service are designed t o catch and treat the exceptions that could be raised, directly or indirectly, by the execution of the service. This enables a precise, contextual, definition of handler: the goal of the service, its current state and the impact of the exception on its completion can be taken into account when defining the handler. A handler is classically defined by the set of exception types i t can catch and by an associated treatment that defines how these exceptions are handled (as illustrated by Figure 1). A handler can: • execute some treatments, in order to manage the consequences of the abrupt interruption of a service execution, such as restoring the agent in some coherent state, sending some partial result, etc. • send in turn an exception that signals that the handler has not been able to successfully manage the exception. • re-launch a complete execution of the service, after having possibly modified some elements in the execution context in order to try to complete i t successfully. 4.2.2 Exception signaling Exceptions are signaled, during the execution of services or of handlers, thanks to calls to a specific function of the MAS infrastructure, distinct from the exception signaling primitive of the underlying implementation language. This function signals a MAS level exception (cf. Figure 2). ... signal (new ServiceException ("Bad client address", getOwnerAddress ()); ... Figure 2: Exception signaling in SaGE Both exception systems are compatible: language level exceptions, caught by language-level handlers, can be turned into MAS-level exceptions by calling the MAS exception signaling primitive within these language-level handlers. Like other exception signaling primitives, ours takes as a parameter the exception to be signaled. A call to this function internally triggers SaGE ’s exception handling mechanism. Travel agency Search for a travel Select an offer Poll providers Provider 1 Contact parties Provider role Get price Client List price Provider 3 (selected) Organize a travel Validate Contact the provider Get price Contact the client Establish Validate the contract Captions: Agent Role Service Provider 2 Get price Service call Figure 3: Execution of a request to a travel agency 4.2.3 Handler search The heart of an exception handling system is the way handlers are searched for. As a general principle in our system, handlers are searched for locally and then in the enclosing context. When an exception is signaled, the execution of the defective service is suspended. First, a handler for this type of exception is searched for locally, in the list of handlers associated to the service. If such a handler is found, its is executed. If not, the search carries on among the handlers associated to the agent that executed the defective service. In any case, the defective service is terminated. If no handler is found, associated with the service or the agent to which it belongs, the handler is searched for in the entity that requested the defective service: this is the scheme we provide to propagate exception, back in the execution context history. This entity can either be the calling service (as for the service Contact parties of the agent Travel agency i n the example of Figure 3) or the role entity that broadcasted a message to a group of agents fulfilling a given role (as for the service List prices of the Provider role in the example of Figure 3). If no handler is found, the search process reiterates (the whole process is illustrated in Figure 4). The search process ends by either the execution of a convenient handler or the agent’s death (if no handler is found until the top-level service). 4.2.4 Concerted exception support Inspired by [8] and [13], we provide SaGE user with concerted exception support that has been integrated to the exception propagation mechanism. This mechanism allows: • not to react to under-critical situations, • collect exceptions to reflect a collective or a global defect. The concerted exception mechanism is available both at the service and the role level1. Concerted exception support at the service level An exception that is propagated to a calling service, to signal the failure of a called service, is not always critical for this calling service. Indeed, a service can redundantly ask for another service to several agents, to increase reliability and performance. In such a case, only the failure of all the called services (or of a significant amount of them) might be critical for the calling service. 1 No concerted exception handling is required at the agent level. Indeed, the association of handlers to agents is provided as a facility to define handlers that are common t o all the services of the agent. As such, they are managed as any handler associated with a service. Thus, the exceptions that can trigger these handlers are concerted at the service level by the concerted exception functions associated with the services. When a role is alerted that an exception occurred in one of the services after a broadcast, the role logs the exception and does not trigger any handler. When all services are done, if all succeeded, the normal response is sent back to the demander. If one or more services failed due to an exception the concerted exception function is run. Its default behavior is to pack all exceptions that occurred in a composite exception which i s signaled to the demander. Of course, the programmer can define its own concerted exception function to either raise a concerted exception that would better reflect the situation or provide partial results to the demanding service. Calling service Agent executing Handler(s) the called service Called service Handler(s) (a) Standard request Captions: Calling service Search direction Role that broadcasted the service request Handler(s) Agent executing Handler(s) the called service Called service Handler(s) (b) Broadcasted request Figure 4: Handler search process in SaGE To enable concerted exception support, propagated exceptions are not directly signaled to the calling service. Propagated exceptions are stored in an log, associated to the service. This log maintains an history of the so far propagated exceptions (along with information such as the source of the exception). Whenever a new propagated exception is logged, the concerted exception function associated to a service is run to evaluate the situation. This function acts as a filter and a composition function. Regarding the nature or the number of logged exceptions, this function determines if an exception is to be effectively signaled to the calling service. If so, the signaled exception can be the last propagated exception (in case it i s critical enough to be signaled as is) or a new exception that i s calculated from a set of logged exceptions, the conjunction of which creates a critical situation (a concerted exception). As for handlers, concerted exception functions are associated with services. Each concerted exception function is specific t o each service: its exception handling policy is to be defined b y the developer of the service. For this purpose, the developer has access to the exception log (that is automatically maintained by the propagation mechanism), in order to decide if an exception is to be signaled, and to the exception signaling function, to effectively signal the chosen coordinated exception. Concerted exception support at the role level Concerted exception at the role level allows the member agents of the role to be managed as a set. The set of requests emitted by a role to provide a collective service is transparent for the calling service that sent a single request to a role. The role acts as a collector for responses aiming to send back a single (composite) response to the demander. We transposed this behavior in exception handling at the role level. This concerted exception support allowed by SaG E allows acute criticality evaluation. On the one hand, the service Poll Providers of the agent Travel agency broadcasts a requests to get prices from Provider agents, none of these request is individually critical. Thus, the concerted exception function (see Figure 5) associated with the Provider role will collect the exceptions signaling the failure of some Provider agent without signaling any exception, unless a chosen proportion of these agents fail. On the other hand, services as Select an offer or Contact parties are critical regarding the successful completion of the Search for a travel service: their failure immediately results i n the failure of the depending service (the concerted exception mechanisms does not filter the exception nor delay its handling). 5. CONCLUSION AND FURTHER WORK In this paper we presented a proposition to handle exceptions in a multi-agent system. This work has been motivated by the fact that we did not find any fully satisfying support of exceptions in agent platforms we surveyed and by our conviction that large scale agent systems would benefit from adequate exception handling constructs. The proposed model is based on: • our vision of agents as service providers, • and, on our conviction that a service demander is the best entity to handle an exception that occurred in a demanded service because it knows why the service has been requested and how a failure may be dealt with. Our proposition, the SaGE model, allows the programmer to signal agent-level exceptions and to write handlers can be associated either to services, agents or roles. It also enables the programmer to configure the exception propagation policy b y coding a concerted exception function allowing to handle critical exceptions while logging under-critical ones until their conjunction enables to diagnose a concerted exception. Thus, SaGE adapts to specificities of MASs: it conforms to the agent paradigm, deals with concurrency and asynchronous communication capabilities of agents and integrates agents’ social organization concepts. We plan to investigate in various directions as perspectives for this work. We would like to extend EHS in order to be capable of resuming the execution of a service to some chosen p o i n t , a f t e r a successful treatment of an exception. We also plan t o transpose public ServiceException concert (Vector subServicesInfo) { int i=0; // count the number of exception raised in subservices for (int j=0; j<subServicesInfo.size (); j++) if ((ServiceInfo) (subServicesInfo.elementAt (j)).getRaisedException () != null) i++; // if more than 30% failed, it must be due to a network failure if (i> (0.3* subServicesInfo.size ())) return (new NetworkException ()); else return null; } Figure 5: SaGE concerted exception function associated with the Provider role the EHS we provide for agents into a message-oriented component based platform (for example, with J2EE / JMS technologies) in order to see if the encountered needs and difficulties are the same. We also wish to experiment our proposition on a real application. 6. ACKNOWLEDGMENTS This work is the result of a collaboration with Jacques Ferber, creator of the MADKit system[4][5]. The authors thank him for his contribution, ideas, and for the many fructuous discussions . 7. REFERENCES [1] ANONYMOUS, Preliminary Ada Reference Manual, ACM SIGPLAN Notices, 14, 6A, 1979, 1-139. [2] CAMPBELL R., RANDELL B., Error Recovery in Asynchronous Systems, IEEE Transactions on Software Engineering (SE), SE-12 number 8, 1986, 811-826. [3] DONY C., Exception Handling and Object-Oriented Programming: towards a synthesis, ACM SIGPLAN Notices, 25, 10, 1990, 322-330, Proceedings OOPSLA ECOOP '90, MEYROWITZ N. Ed [4] FERBER J., Les systèmes multi-agents, vers une intelligence artificielle distribuée, InterEditions, 1995. (in french) [5] FERBER J., GUTKNECHT O., A meta-model for the analysis and design of organizations in multi-agent systems, Third International Conference on MultiAgent Systems (ICMAS98), 1998, 128-135. [6] GÄRTNER F. C., Fundamentals of Fault Tolerant Distributed Computing in Asynchronous Environments, ACMCS, 31, 1, 1999, 1-26. [7] GOODENOUGH J. B., Exception Handling: Issues and a Proposed Notation, Communications of the ACM, 18, 12, 1975, 683-696. [8] ISSARNY V., Concurrent Exception Handling, In [19], 2001. [9] KLEIN M., DELLAROCAS C., Towards a systematic repository of knowledge about managing multi-agent system exceptions, ASES Working Paper ASES-WP2000-01, 2000. [10] KLEIN M., RODRIGUEZ AGUILAR J. A., Using Role Commitment Violation Analysis to identify Exceptions in Open Multi-Agent Systems, ASES Working Paper ASES-WP-2000-04, 2000. [11]KLEIN M., DELLAROCAS C., Using DomainIndependent Exception Handling Services to Enable Robust Open Multi-Agent Systems: The Case of Agent Death, Journal for Autonomous Agents and Multi-Agent Systems, 2001. [ 1 2 ]KOENIG A. R., STROUSTRUP B., Exception Handling for C++, Proceedings of the “C++ at Work” Conference, 1989, 322-330.. [13]LACOURTE S., Exceptions in Guide, an ObjectOriented Language for Distributed Applications, Springer Verlag,, ECOOP 91, 5-90 LNCS, Grenoble (France), 1990, 268-287. [14]LISKOV B., Distributed Programming in Argus, Communications of the ACM, 31, 3, 1988, 300-312. [15] MADKit, http://www.madkit.org. [16] MEYER B., Disciplined Exceptions, Technical report tr-ei-22/ex, 1988, Interactive Software Engineering, Goleta, CA. [17] PESCHANSKI F., MEURISSE T., BRIOT J.-P., Les composants logiciels : évolution technique ou nouveau paradigme ?, OCM 2000, 2000, 53-65. (in french) [18] RICORDEL P.-M.DEMAZEAU Y., From analysis to deployment: A multi-agent platform survey, Engineering Societies in the Agents World, 1972 LNAI, Springer-Verlag, 2000, 93-105, 1st International Workshop (ESAW'00), Berlin (Germany), 2000. [19] ROMANOVSKY A., DONY C., KNUDSEN J. L., and TRIPATHI A. Eds, Advances in Exception Handling Techniques, Lecture Notes in Computer Science 2022, Springer-Verlag, 2001. [20]ROMANOVSKY A., KIENZLE J., Action-Oriented Exception Handling in Cooperative and Competitive Concurrent Object-Oriented Systems, In [19], 2001, 147-164. [21] TRIPATHI A., MILLER R., Exception Handling in Agent Oriented Systems, In [19], 2001. [ 2 2 WEINREB ] D. L., Signaling and Handling Conditions, Technical report, 1983, Symbolics, Inc., Cambridge, A.
© Copyright 2026 Paperzz