Mobile Agents in Mobile Data Access Systems

Yu Jiao and Ali R. Hurson
Computer Science and Engineering Department
Pennsylvania State University
220 Pond Laboratory, State College, PA 16802, U.S.A
{yjiao, Hurson}@cse.psu.edu
Abstract. The heterogeneity and geographical distribution of data sources in
the presence of autonomy make it a very difficult task to provide global
information sharing. When adding mobility and wireless medium to this mix
(Mobile Data Access System), the constraints on bandwidth, connectivity, and
resources worsen the problem. Application of mobile agent technology in a
global information-sharing environment releases the global mobile users from
the constraints imposed by the wireless medium and mobile devices.
This work applies the Mobile Agent technology in a Mobile Data Access
System framework (MAMDAS) using the Summary Schemas Model as the
underlying multidatabase platform. This approach provides better performance
by reducing the network traffic and a higher degree of autonomy by allowing
agents to execute without the owner’s interference. As witnessed by our
experimental results, the MAMDAS exhibits the following advantages
compared to the first SSM prototype:
• It is about 6 times faster.
• It supports a larger number of concurrent queries.
• It demonstrates greater scalability, portability, and robustness.
Keywords: Mobile agent, multidatabase, information retrieval, heterogeneous
data sources, mobile device, wireless communication
1 Introduction
In a distributed heterogeneous database environment, to overcome the obstacles
brought by the heterogeneity of local data sources, the literature has studied two
possible solutions:
• Redesign the existing databases to form a homogeneous information sharing
system, and
• Lay a global system on top of the heterogeneous local databases to provide a
uniform information access method (multidatabase system).
The high cost associated with the first choice prevents it from becoming a feasible
solution in many cases. On the other hand, the concept of a multidatabase system offers
a more practical solution to share information globally. Many multidatabase
systems maintain global meta-data that contains the local schema information,
called the global schema. Unfortunately, as the size and number of local databases grow,
the global schema may become too large to manage and maintain.
Footnote to the title: This work has been supported in part by the Office of the Naval
Support under contract N00014-02-1-0282.
R. Meersman, Z. Tari (Eds.): CoopIS/DOA/ODBASE 2002, LNCS 2519, pp. 144–162, 2002.
© Springer-Verlag Berlin Heidelberg 2002
The Summary Schemas Model (SSM) [3] was intended to alleviate problems
associated with global-schema multidatabase approaches by abstracting the
semantics of schemas. Instead of creating a global schema, the SSM relies on a
hierarchical meta-data in which a parent node maintains an abstract form of its
children’s schema, namely a summary schema. The hierarchical structure and the
schema abstraction significantly improve the robustness and provide dynamic
expansion capability to the system. Using an on-line thesaurus, the SSM also supports
imprecise queries. The Information Broker (IB) system, discussed later, is the first
prototype of the SSM. It proves that the SSM is a practical and efficient concept to
model multidatabase systems. The benefits and characteristics of the SSM are briefly
addressed in section 2.
As mobile communication technology advances and the price of mobile devices
decreases, mobile users become an important population among global information
system users. The Mobile Data Access System (MDAS) is an information-sharing
system that allows anywhere, anytime access to information. However, this
flexibility comes at the expense of more complicated solutions to database issues due
to the limitations imposed by wireless communication and mobile devices.
The client-server paradigm is a dominant computation model in today’s distributed
application design. Normally, clients communicate with servers via sockets or remote
procedure calls (RPC). However, neither of these communication methods is
suitable for the MDAS because they require network connectivity throughout a
session, which cannot be guaranteed in the wireless environment. Fortunately, the
agent-based programming paradigm can relax this restriction.
Agents are software entities that can move from one host to another over the
network, based on a certain itinerary setup, and execute designated tasks on their
owners’ behalf. When mobile agents are introduced into the system, mobile users
only need to maintain the communication connection during the agent submission and
retraction. Therefore, the use of mobile agents relaxes requirements on mobile users’
critical resources such as connectivity, bandwidth, and power. Moreover, reduced
communication also improves the system performance. In [13], the authors
showed that the performance improvement by introducing agent technology into the
IB system was as high as 63.3%.
A decision was made to use Mobile Agent technology in a Mobile Data Access
System platform (MAMDAS) enhanced by the Summary Schemas Model. This
decision was based on the following expectations:
• Achieve higher performance compared to previous SSM prototypes by reducing
the network traffic,
• Support anywhere, anytime access under the constraints imposed by mobility and
wireless communication, and
• Provide higher system scalability.
It is not the intention of this paper to enumerate the
advantages and characteristics of the SSM. The intention is to briefly outline the
initial SSM prototypes and the experience gained from these prototypes. We then
focus on the application of mobile agent technology to remedy the experienced
shortcomings. The rest of the paper is organized as follows: The necessary
background information is briefly covered in section 2. The related work is discussed
in section 3. Section 4 gives the details of the MAMDAS system. Section 5 addresses
and analyzes our experimental results. Finally, we conclude our work and point out
some future research plans in section 6.
2 Background
2.1 The Summary Schemas Model for Multidatabase Systems
Database systems serve critical functions in government, business applications, and
academic research. In many cases, application domains are required to perform
disjoint functions on shared distributed information sources using different
computational platforms and/or different Database Management Systems (DBMS).
Multidatabase technology has been proposed to provide a transparent uniform access
method to heterogeneous data sources with minimum cost.
The literature abounds with solutions to multidatabase system design [3].
Terms such as multidatabase language systems, global schema multidatabases,
federated databases, and interoperable systems have been frequently discussed in the
literature. Within the scope of multidatabase technology, the Summary Schemas
Model (SSM) [3] is an attempt to provide a transparent and uniform access to
heterogeneous data sources while preserving the local autonomy. It is designed to
support the identification of semantically similar/dissimilar data entities. The model
maintains a hierarchical meta-data based on access terms exported from underlying
local databases. This meta-data is used to intelligently resolve name differences using
word relationships defined in a standard dictionary such as Roget’s Thesaurus. Users
can submit imprecise queries at any site without knowing the location of requested
access terms and/or the local access terms. Based on the data semantics, the SSM
maps imprecise query terms to precise access terms found at local databases.
Figure 1 depicts the organization of the SSM model. A schema at each local node is
a list of access terms. Mapping access terms of lower level nodes to their hypernyms
and resolving semantic similarities among the hypernyms forms a summary schema.
As one can conclude, each summary schema is smaller and more abstract than the
union of its lower level schemas. The SSM model was simulated and its performance
was evaluated under various schema distributions, query complexity and network
topology [3]. The simulation results showed that both precise and imprecise queries
incur comparable cost, and hence have comparable performance. In certain cases, the
SSM imprecise query processing even outperforms a precise query processing. In
general, relative to other approaches of multidatabase implementation, the SSM is a
robust approach that preserves local autonomy, and offers higher performance and
scalability.
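The way a summary schema is formed from lower-level access terms can be illustrated with a minimal sketch. The hypernym map below is a made-up stand-in for the thesaurus, and the function and node names are hypothetical; the sketch only shows the idea of replacing access terms by their hypernyms and recording which child holds a hyponym, not the SSM implementation itself.

```python
# Toy hypernym map standing in for the on-line thesaurus (hypothetical entries).
HYPERNYM = {
    "car": "vehicle",
    "truck": "vehicle",
    "salary": "earnings",
    "wage": "earnings",
}


def summarize(child_schemas):
    """Map each child's access terms to hypernyms.

    Returns a dict: hypernym -> set of child node names holding a hyponym.
    """
    summary = {}
    for child, terms in child_schemas.items():
        for term in terms:
            # Terms without a known hypernym are kept unchanged.
            hypernym = HYPERNYM.get(term, term)
            summary.setdefault(hypernym, set()).add(child)
    return summary


local_schemas = {
    "local1": ["car", "salary"],
    "local2": ["truck", "wage"],
}
summary = summarize(local_schemas)
total_local_terms = sum(len(terms) for terms in local_schemas.values())
print(sorted(summary), "summarizes", total_local_terms, "local terms")
```

In this toy case four local access terms collapse into two summary terms, which is the sense in which a summary schema is smaller and more abstract than the union of its lower-level schemas.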
2.2 The Mobile Agent Technology
Advances in wireless communication provide great convenience for mobile users to
access information resources anywhere, at any time. However, wireless communication and mobility also bring some obstacles to information access. This work
mainly addresses issues related to mobile users who connect to the information
system through low quality, low bandwidth wireless networks.
The most common distributed application design model is the client-server model.
Many client-server based applications apply one of the two communication
mechanisms: Socket (essentially message passing) or Remote Procedure Call (RPC).
The socket allows two programs (the client and server) to communicate through a file
descriptor (a socket). The RPC hides the underlying distributed system. Programmers
treat all the operations as local operations while in fact the system may perform part
of the work on other machines. Unfortunately, within the domain of mobile
computing, the client-server model exhibits several disadvantages:
• The physical connection between the client and the server must be maintained
throughout the session. In case of disconnection, the whole communication
procedure needs to be started over again. In the MDAS this is not acceptable due to the
limited power source at the mobile unit and the low communication bandwidth of
the wireless medium.
• Network traffic becomes congested, since retransmissions might be necessary to
compensate for disconnected sessions.
Fig. 1. A Summary Schemas Model with M local nodes and N levels.
Based on these considerations, we concluded that the client-server paradigm is not
an ideal solution for applications in a mobile environment. The use of mobile agents
alleviates the problems imposed by wireless communication such as instability
(frequent disconnections) and unreliability (message losses). Clients can disconnect
from the network after submitting their mobile agents. Mobile agents can roam the
network and fulfill their tasks such as information retrieval from a multidatabase and
negotiation with other agents. Mobile agents also have the ability to make decisions
based on different situations on their owner’s behalf. Thus, the owner only needs to
maintain the physical connection during the agent submission and retraction.
Contemporary mobile agent system implementations fall into two main groups:
Java-based and non-Java-based. Systems such as IBM Aglet Workbench [9], Odyssey
[16], Concordia [4], and Voyager [15] choose Java as the implementation language.
Some other companies developed mobile agent systems using different languages,
such as Tcl, Scheme, and Python. Several examples are TACOMA [15], Ara [1], and
Agent Tcl [8]. We argue that Java-based agent systems are a better choice for mobile
agent application design because the Java language’s platform independence
makes it ideal for distributed application design. Thus, we chose the IBM Aglet
Workbench SDK 2.0 as our implementation tool.
3 Related Work
3.1 The Information Broker System
3.1.1 System Overview
The Applied Research Lab (ARL) of Pennsylvania State University proposed the
Information Broker (IB) system as a solution for remote electronic-mechanical
equipment maintenance, diagnosis, and prognosis. It is the first prototype of the SSM.
The IB system attempts to achieve three objectives: preserve the autonomy of all data
resources, offer a uniform data search interface that hides the differences among the
underlying databases, and support imprecise queries so that users do not need
detailed knowledge of the databases.
The IB system adopts the conventional client-server computation model. The
system consists of three servers: a Thesaurus Server, an SSM Administration Server,
and a Query Server. Each server has a Graphical User Interface (GUI) that eases the
use of the server. Local nodes and summary-schemas nodes run on a set of hosts
connected through a network. The administrator can start and stop a node (either
summary-schemas node or local node) by sending commands to the Daemon program
residing on each host. In this system, clients communicate with the servers via
datagram sockets. Figure 2 illustrates the architecture of the IB system.
The system administrator can construct the summary-schemas hierarchy through
the SSM Admin GUI. Users submit queries through the Data Search GUI. In order to
form a query, the user needs to supply the following information: the category
preference (to narrow down the search scope), the node to start the search with, the
keyword, and a preferred semantic distance (loose match or close match). The validated
Data Search GUI input is then submitted to the Query Server. After receiving a query, the
Query Server initiates the query resolution process from the originating node as
designated by the user. When presenting the results, the Query Server displays all the
terms satisfying the user’s preferred semantic distance.
3.1.2 Observations
The first SSM prototype (the IB system), as anticipated, allows novice users to submit
imprecise queries without any knowledge of the location and/or the structure of the
local data sources while preserving the local database autonomy. This demonstrated
the feasibility and practicality of the SSM as a tool to model multidatabase systems.
In addition, the IB system also allowed us to observe several shortcomings; some due
to the way the IB system was implemented:
• Lack of portability: This was due to the mix of C and Java languages used to
build the prototype system. The server and node programs were mainly
implemented in C, while the GUIs were written in Java. This resulted in a system
with poor portability due to C’s strong architectural dependence.
• Lack of stability: As mentioned before, the IB system used datagram sockets to
handle the communication. The datagram socket uses the User Datagram
Protocol (UDP) as the transport layer protocol, which does not guarantee error-free,
in-order transmission. It leaves the responsibility of ensuring a correct
transmission to the application. As a result, the IB system demonstrated
unpredictable behavior in the presence of incorrect transmissions.
• Network connectivity: Continuous network connectivity during a complete
session is one of the requirements of the socket communication mechanism.
Consequently, the IB system inherited this disadvantage.
3.2 The Enhanced IB System
To overcome the deficiencies of the IB system, a decision was made to extend the
scope of our prototype utilizing agent technology [13]. As a result, the client-server
computation model was replaced by the agent-based paradigm. Under the same
experimental conditions, the extended system resolved queries, on average,
nearly 3 times faster than the original system [13].
Fig. 2. An overview of the IB system architecture.
3.2.1 System Overview
Figure 3 gives an overview of the enhanced IB system. Compared to the original
design (Figure 2) the new system made the following major changes:
• Three types of mobile agents were introduced: the CategorySearcher agent, the
HierarchicalSearcher agent, and the QueryResolution agent. In addition, the new
design completely eliminated the Query Server module.
• The enhanced IB system authorizes the QueryResolution agent to start and stop
nodes when necessary. Thus, the Daemon programs were no longer necessary.
This reduced the overhead on individual computing resources.
The aforementioned enhancements also modified the operational flow of the system.
Initiation of the Data Search GUI by the user launches a CategorySearcher agent and
a HierarchicalSearcher agent. These two agents migrate to the Thesaurus Server and
the SSM Admin Server, respectively. The CategorySearcher agent brings back the
thesaurus category information and concurrently, the HierarchicalSearcher returns
with the summary-schemas hierarchy information to complete the data search GUI.
At this stage, the user fills in the search term(s), chooses desired search category,
starting node, semantic distance, and launches the QueryResolution agent (master)
destined to the user-designated starting node. Upon arrival at the destination, the
agent first activates the node and then performs the search. If there is no resolution,
the agent will recursively migrate to the parent of the current node and conduct the
same operation. Query resolution at a summary-schemas node allows the
QueryResolution agent (master) to create its clone(s) (slave) and direct them to the
proper child node(s). The clones then roam to the designated destination and try to
resolve the query, while the master agent stays at the current node and waits for
responses from them. After receiving all responses from the clones, the QueryResolution agent
(master) integrates and fuses the collected information, disposes of all the clones, and
returns to the originating node to display the result.
Fig. 3. An overview of the enhanced IB system architecture.
3.2.2 Lessons Learned from the Enhanced IB System
The improvement introduced by using mobile agent technology is three-fold [13].
• The enhanced IB system provided another layer of user authentication. The agent
servers authenticate agents before allowing them to execute.
• The enhanced system demonstrated better performance. The simulation results
indicated that the enhanced system could resolve queries nearly 3 times faster than
the original system.
• The enhanced system achieved higher resource utilization. By eliminating the
Daemon programs, the enhanced system frees machine resources such as CPU
time, memory, and ports.
These advantages came, however, at the expense of local autonomy violation.
Allowing a user to start and stop a node leaves the operational nodes dangerously
under the user’s control. As an example, if a local database is temporarily unavailable,
say for maintenance, in the original prototype, it was the duty of the system
administrator module to temporarily stop and remove the node from the
summary-schemas hierarchy. However, in the enhanced system, a malicious user could start this
node without even informing the system administrator module. In addition, each node
program needs a unique port number on the local machine for its execution. If the port
number is already occupied, other execution attempts of the same node program will
fail. Consequently, if an intruder starts a node and never stops it, no one else can
perform a search on that node.
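The port conflict described above can be reproduced in a few lines. This is only an illustration using Python's standard `socket` module and a UDP socket on the loopback interface; the helper name `try_start_node` is hypothetical and does not correspond to any program in the IB system.

```python
import socket


def try_start_node(port):
    """Try to claim a UDP port as a node program would; return the socket or None."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind(("127.0.0.1", port))
        return sock
    except OSError:
        # The port is already occupied by another running instance.
        sock.close()
        return None


first = try_start_node(0)              # port 0 lets the OS pick a free port
port = first.getsockname()[1]          # the port the first "node" now holds
second = try_start_node(port)          # second start attempt on the same port
print("second start succeeded:", second is not None)
first.close()
```

As long as the first holder keeps the port, every later attempt to start the same node fails, which is why a node that is started and never stopped blocks all other users.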
4 An Application of Mobile Agent Technology in Mobile Data Access System Design (MAMDAS)
4.1 Design Methodology
We chose Gaia, a general agent-oriented analysis and design methodology proposed
by Wooldridge et al. [14], as the MAMDAS design methodology. Gaia divides the
whole design process into two phases: the analysis phase (conceptual design) and the
design phase (concrete design). Each phase of the Gaia method leads the developer
one step further toward the final implementation. This top-down design keeps the
system clean (no redundant entities) and well organized while guaranteeing to satisfy
all user requirements.
During the analysis phase, two conceptual models are derived from the requirement
statement: the roles model and the interactions model. The roles model identifies the
key roles in the system and specifies the responsibilities and permissions associated
with them. The interactions model defines a set of protocols that describe the
interaction between each pair of roles.
Three concrete low-level models that can be directly implemented are generated at
the design phase using the information obtained during the analysis phase: the agent
model, the services model, and the acquaintance model. The agent model documents
different agent types that will make up the system and estimates the number of
instances of each agent type that will occur at run time. The service model briefly
describes the services associated with each agent type. The acquaintance model
captures the communication relationship among various agent types by directed
graphs.
4.2 Designing MAMDAS
As the result of applying the Gaia methodology, we obtained the agent model, the
service model, and the acquaintance model of MAMDAS. Tables 1 and 2 show the
agent model and the service model. Figure 4 captures the acquaintance relation among
agents in MAMDAS (arrows represent the communication direction).
4.3 System Overview
Based on the three concrete models obtained from the design process, we
implemented the MAMDAS using IBM Aglet Workbench SDK 2.0. The MAMDAS
consists of four major logical components: the host, the administrator, the thesaurus,
and the user. Figure 5 illustrates the overall architecture of the MAMDAS. In order to
avoid complication, we only demonstrate the most important agent types in this
figure. Some assisting agents are not shown.
The MAMDAS can accommodate an arbitrary number of hosts. A HostMaster agent
resides on each host. A host can maintain any number and any type of nodes (local
nodes or summary-schemas nodes) based on its resource availability. Each
NodeManager agent monitors and manipulates a node. The HostMaster agent is in
charge of all the NodeManagers on that host. Nodes are logically organized into a
summary-schemas hierarchy. The system administrators have full control over the
structure of the hierarchy. They can construct the structure by using the graphical
tools provided by the AdminMaster agent. In Figure 5, the solid lines depict a possible
summary-schemas hierarchy with the arrows indicating the hierarchical relation. The
ThesMaster agent acts as an interface between the thesaurus server and other agents.
The dashed lines with arrows indicate the communication between the agents. The
DataSearchMaster agent provides a query interface, the data search window, to the
user. It generates a DataSearchWorker agent for each query. The three dashed-dot-dot
lines depict a scenario in which three DataSearchWorkers are dispatched to different
hosts and work concurrently.
Table 1. The agent model of MAMDAS
(n = number of hosts in the system, m = number of nodes on a specific host).

Agent Name (Role Name)   Agent Mobility   Agent Instance Qualifier
HostMaster               Stationary       Occur n times
NodeManager              Stationary       Occur m times
NodeSynchronizer         Stationary       Occur n times
HostMessageHandler       Stationary       Occur one or more times
NodeMessenger            Mobile           Occur one or more times
AdminMaster              Stationary       Occur once
AdminMessenger           Mobile           Occur n times
ThesMaster               Stationary       Occur once
DataSearchMaster         Stationary       Occur zero or more times
DataSearchWorker         Mobile           Occur zero or more times
UserMessenger            Mobile           Occur zero or more times
Table 2. The service model of MAMDAS.

Service          Accept user queries
Inputs           Keyword, preferred semantic distance, category, starting node
Outputs          Query result
Pre-condition    The AdminMaster is ready, the ThesMaster is ready, and the
                 summary-schemas hierarchy is ready.
Post-condition   True
The summary-schemas hierarchy building process in MAMDAS is very similar to
the IB system and the enhanced IB system. Once the administrator decides the
summary-schemas hierarchy, commands will be sent out to each involved
NodeManager to build the structure. NodeManagers at the lower levels export their
schemas to their parents. Parent nodes contact the thesaurus and generate an abstract
version of their children’s schemas. When this process reaches the root, the
MAMDAS is ready to accept queries.
Fig. 4. The acquaintance model of MAMDAS.
The user can initiate a query by launching the DataSearchMaster agent on his/her
own device, which can be a computer attached to the network or a mobile device.
The DataSearchMaster sends out two UserMessenger agents (not shown in the figure)
to the AdminMaster and the ThesMaster, respectively. The UserMessengers will
return to the DataSearchMaster with the summary-schemas hierarchy and the
category information. The DataSearchMaster then creates a data search window that
shows the user the summary-schemas hierarchy and the tree structure of the category.
The user then enters the keyword(s), specifies the preferred semantic distance,
chooses a category, and selects a node to start the search. After the user clicks on the
“Submit” button, the DataSearchMaster packs the inputs, creates a DataSearchWorker
agent, and passes the inputs to it as parameters. Since the DataSearchMaster creates a
DataSearchWorker to handle each query, the user can submit multiple queries
concurrently.
Fig. 5. An overview of the MAMDAS system architecture.
A DataSearchWorker carries the search algorithm and can migrate from host to host.
Its first stop is the node designated by the user. Once dispatched, the
DataSearchWorker can intelligently and independently accomplish the search task by
making local decisions without the owner’s interference. The search process can be
described as follows:
• The DataSearchWorker contacts the NodeManager to obtain its schema and its
children and parent information.
• The DataSearchWorker performs the search algorithm with the help of the
ThesMaster. Note that this is the step that involves the most communication
among agents.
• If there is no resolution on the current node, then, based on the principle of the SSM, the
DataSearchWorker concludes that there is no resolution down this sub-tree.
In that case, if the current node is the root, the DataSearchWorker returns to its home
(where it was created) and displays “no result”. If the current node is not the root, the
worker agent migrates to the current node’s parent and conducts the
same search algorithm, repeating this step until it reaches the root or finds a result. The other
possibility is that the current node does indicate potential results. If the current
node is a leaf node, the DataSearchWorker gets all the local terms that satisfy
the semantic distance and goes home to display the results. If the
current node is a non-leaf node, the DataSearchWorker generates a clone of itself for
each child node that may have results. To distinguish the
DataSearchWorker from its clones, we name the clones DataSearchSlaves even
though they are essentially the same. The cloning process happens recursively
until the slaves finally reach the leaf nodes. Slaves perform the search algorithm on
their destinations in parallel. To reduce unnecessary network traffic, the slaves
only report the results to their originator and then die on the local host.
• When the final report reaches the DataSearchWorker, it knows that the task is done
and returns home to display the results. After the user clicks the “ok”
button or closes the result display window, the DataSearchWorker disposes of itself
and releases all the resources it occupies.
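The search process above can be summarized in a small sequential sketch. It is a simplification under stated assumptions: exact term membership stands in for the thesaurus-based semantic comparison, a recursive call stands in for cloning and dispatching a DataSearchSlave (so nothing runs in parallel), and the class and function names (`Node`, `link`, `resolve`) are hypothetical rather than taken from the MAMDAS code.

```python
class Node:
    """A node of the hierarchy; leaves have no children."""

    def __init__(self, name, schema=()):
        self.name = name
        self.schema = set(schema)   # terms visible at this node
        self.parent = None
        self.children = []


def link(parent, *children):
    """Attach children and fold their terms into the parent's schema."""
    for child in children:
        child.parent = parent
        parent.children.append(child)
        parent.schema |= child.schema


def matches(node, keyword):
    # Exact membership stands in for the semantic-distance comparison.
    return keyword in node.schema


def search_down(node, keyword):
    """Descend from a node that indicates potential results."""
    if not node.children:               # leaf node: report the local term
        return [(node.name, keyword)]
    results = []
    for child in node.children:         # one "slave" per promising child
        if matches(child, keyword):
            results.extend(search_down(child, keyword))
    return results


def resolve(start, keyword):
    """Climb toward the root until a node matches, then search downward."""
    node = start
    while node is not None and not matches(node, keyword):
        node = node.parent              # migrate to the parent node
    if node is None:                    # the root did not match: "no result"
        return []
    return search_down(node, keyword)


a, b, c = Node("a", {"engine"}), Node("b", {"pump"}), Node("c", {"engine", "valve"})
s1, s2, root = Node("s1"), Node("s2"), Node("root")
link(s1, a, b)
link(s2, c)
link(root, s1, s2)
print(resolve(b, "valve"))
```

Starting at leaf `b`, the query for `valve` climbs through `s1` to `root` before descending through `s2` to `c`, mirroring the upward migration followed by downward cloning described in the list.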
Comparing Figure 5 with Figures 2 and 3, one can conclude that the agent-based
system greatly simplifies the system architecture. It makes the system easy to
maintain and use. Moreover, we expect that the reduced communication will
significantly improve the average response time. Thanks to the agent’s independent
decision-making capability and execution autonomy, MAMDAS provides mobile
users a flexible and reliable data access environment.
4.4 Optimizing the SSM Search Algorithm
According to the SSM search algorithm implemented in the IB system and the
enhanced IB system, when a DataSearchWorker searches a node, it must compare
each global term in the node’s schema with the keyword. If the node is a local node,
the user-specified semantic distance is used as the criterion to determine whether the
term is of interest. If the node is a summary-schemas node, other criteria depending
on the implementation can be applied to determine whether a global term indicates
potential resolution or not.
Several characteristics of the SSM have drawn our attention. Observe the following
facts:
• When searching a local node, the DataSearchWorker must compare each global
term in the node’s local-global schema in order to obtain all local terms that satisfy
the user-specified semantic distance.
• When searching a summary-schemas node, the DataSearchWorker can stop as
soon as it finds that all the children of the current node contain potential
resolutions.
• If the search on summary-schemas node A indicates that there is no resolution in
this subtree, the DataSearchWorker moves to A’s parent node. If a global term
exists only on A (there is an entry that looks like “global term: <summary-schemas
node A>”), this global term does not need to be checked, because
we already know that there is no resolution on A.
• When the administrator organizes the summary-schemas hierarchy, he/she would
naturally prefer to cluster nodes based on their data semantics (connecting nodes
that contain similar contents to the same parent). Consequently, as we search down
the tree, it is likely that all the children of a node have terms that are of interest to us.
Based on these observations, we were able to optimize the SSM search algorithm.
• We represented the node’s summary schema as a two-dimensional array with node
names as row indices and global terms as column indices. If a global term’s
hyponym exists on a child node (as noted earlier, a summary schema’s global
terms are hypernyms of lower-level schemas’ global terms), the corresponding
array element is set to 1. Otherwise, it is set to 0. Table 3 shows an example of
such an array.
• By re-organizing the terms, we move the columns that have more 1’s to the left.
This allows us to examine the more populated, semantically similar data elements first.
Table 4 shows the re-organization of Table 3.
As a result, the search algorithm was modified as depicted in Figure 6.
Table 3. The array representation of a summary schema.

         Term1  Term2  Term3  Term4  Term5  Term6  Term7  Term8
Child1     1      0      1      0      1      1      0      1
Child2     0      1      0      1      0      0      0      1
Child3     1      0      0      1      1      0      1      0
Assuming that Term1 and Term4 in Table 4 indicate potential results in the subtree
rooted at the current node, the DataSearchWorker only needs to make two
comparisons before it proceeds to other nodes: “Term1, keyword” and “Term4,
keyword”. In contrast, the search algorithm used in the IB and the enhanced IB
systems will incur 8 comparisons.
The network traffic reduction of the algorithm depends on factors such as: the
structure of the summary-schemas hierarchy, the thesaurus implementation, the query
distribution, etc. Thus, a quantitative measurement of the reduction is difficult.
However, it is clear that the worst-case performance of the optimized
algorithm is the same as that of the original search algorithm used in the other two SSM
prototypes: every global term of the summary schema is compared with the keyword.
Table 4. Re-organization of Table 3.
          Term1  Term4  Term5  Term8  Term2  Term3  Term6  Term7
Child1      1      0      1      1      0      1      1      0
Child2      0      1      0      1      1      0      0      0
Child3      1      1      1      0      0      0      0      1
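The column re-organization that turns Table 3 into Table 4 can be sketched as follows. This is a minimal illustration rather than the MAMDAS implementation: the names TERMS, SCHEMA, and reorder_columns are hypothetical, and the tie-breaking rule (equally populated columns keep their original order) is an assumption that happens to reproduce Table 4.

```python
# Summary schema of one node as a 0/1 matrix (values copied from Table 3).
# Rows are child nodes, columns are the node's global terms.
TERMS = ["Term1", "Term2", "Term3", "Term4", "Term5", "Term6", "Term7", "Term8"]
SCHEMA = {
    "Child1": [1, 0, 1, 0, 1, 1, 0, 1],
    "Child2": [0, 1, 0, 1, 0, 0, 0, 1],
    "Child3": [1, 0, 0, 1, 1, 0, 1, 0],
}


def reorder_columns(terms, schema):
    """Move the columns that contain more 1's to the left.

    Assumption: ties keep their original left-to-right order.
    """
    counts = [sum(row[i] for row in schema.values()) for i in range(len(terms))]
    # sorted() is stable, so equally populated columns stay in their original order.
    order = sorted(range(len(terms)), key=lambda i: -counts[i])
    new_terms = [terms[i] for i in order]
    new_schema = {child: [row[i] for i in order] for child, row in schema.items()}
    return new_terms, new_schema


if __name__ == "__main__":
    new_terms, new_schema = reorder_columns(TERMS, SCHEMA)
    print(new_terms)
    for child, row in new_schema.items():
        print(child, row)
```

Sorting column indices instead of the columns themselves keeps the term labels and the matrix rows aligned with a single permutation.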
5 Experimental Results
The MAMDAS was evaluated based on four parameters: the average response time,
scalability, robustness, and portability.
5.1 Experimental Environment
We performed most of our experiments on Sun Ultra 5 workstations running Solaris
8. The machines were connected through a Fast Ethernet network that supports up to
100 Mbps. Some of our experiments were carried out on PCs with various processors
running different versions of the Windows operating system. In general, the
MAMDAS can be set up on any collection of machines that satisfy the following
requirements:
Set all child nodes to unmarked;
While (there exists an unexamined term)
    If (term is of interest)
        Mark all the child nodes that have its hyponym term;
    Else
        Continue;
    End If
    If (all the child nodes are marked)
        Break;
    End If
End While
If (no child node is marked)
    Go to the parent node of the current node and perform the same
    search algorithm (if a summary schema term of the parent node
    exists only on the current node, we can skip this term);
Else
    Create a data search slave for each marked child node;
    Dispatch the slaves to their destinations and perform the same
    search algorithm;
End If
Fig. 6. Optimized search algorithm.
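The per-node step of Figure 6 can be sketched in a few lines. This is a simplified, single-process illustration under stated assumptions: search_node, next_hops, and the is_of_interest callback (standing in for the thesaurus comparison of a term with the keyword) are hypothetical names, and no agents are actually created or dispatched.

```python
# Columns in Table 4 order; rows are the child nodes of the current node.
TERMS = ["Term1", "Term4", "Term5", "Term8", "Term2", "Term3", "Term6", "Term7"]
SCHEMA = {
    "Child1": [1, 0, 1, 1, 0, 1, 1, 0],
    "Child2": [0, 1, 0, 1, 1, 0, 0, 0],
    "Child3": [1, 1, 1, 0, 0, 0, 0, 1],
}


def search_node(terms, schema, is_of_interest):
    """Examine the terms in order; return (marked children, comparisons made)."""
    children = list(schema)
    marked = set()
    comparisons = 0
    for col, term in enumerate(terms):
        comparisons += 1  # one "term, keyword" comparison
        if is_of_interest(term):
            # Mark every child that holds a hyponym of this term.
            marked.update(c for c in children if schema[c][col] == 1)
        if len(marked) == len(children):
            break  # all children are marked; the remaining terms are skipped
    return marked, comparisons


def next_hops(marked, parent):
    """Go to the parent if nothing was marked, else to each marked child."""
    return [parent] if not marked else sorted(marked)


if __name__ == "__main__":
    marked, n = search_node(TERMS, SCHEMA, lambda t: t in {"Term1", "Term4"})
    print(sorted(marked), n)
    print(next_hops(marked, "ParentNode"))
```

With Term1 and Term4 of interest, the loop stops after two comparisons. When no term is of interest, all eight terms are compared and the search moves to the parent, which matches the worst case stated in the text.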
5.2 Average Response Time
We anticipated that the MAMDAS would improve the average response time owing to
reduced communication, the optimized search algorithm, and its ability to
exploit parallelism. Earlier research showed that the application of mobile agents
improves the query response time of the IB system by a factor of 3 [13]. To
demonstrate the effectiveness of the MAMDAS, we constructed the same
summary-schema hierarchy as reported in [13] and ran the same queries. Figure 7 plots
the average response time of the three SSM prototypes. The result shows that,
on average, the MAMDAS is twice as fast as the improved IB system and 6 times
faster than the IB system.
5.3 Impact of the SSM Configuration
The query response time highly depends on the SSM configuration. Therefore, how to
organize the summary-schemas hierarchy should be of interest to the global DBA.
Intuitively, the global DBA may apply the following configuration strategies:
• The Semantic-Aware Configuration: cluster the local databases based on their
semantic contents and assign semantically similar data sources to the same
entry-level summary-schemas node.
• The Non-Semantic-Aware Configuration: based on the physical connectivity of the
network, assign local data sources to the nearest entry-level summary-schemas
nodes. As a result, semantically similar data sources may end up distributed
across the summary-schemas hierarchy.
The first strategy reduces contention at higher-level summary-schemas nodes at the
expense of creating bottlenecks at certain hot nodes in the network. The second
approach distributes the workload among nodes and minimizes the communication
distance between nodes on adjacent levels at the cost of longer search time at
higher-level nodes and a possibly longer search path. It is a difficult task to form a
well-balanced summary-schemas hierarchy and optimize the performance. The purpose of
this experiment was to compare the effects of the two configuration strategies and
identify critical factors that affect the overall performance. The result can serve as a
hint to help DBAs make configuration decisions.
[Figure 7: bar chart of average response time (ms), y-axis from 0 to 8000, for the
SSM prototypes IB, Improved IB, and MAMDAS.]
Fig. 7. Comparative response time of three SSM prototypes.
5.3.1 Semantic-Aware Configuration vs. Non-Semantic-Aware Configuration
To demonstrate the impact of the aforementioned strategies, we designed an extreme
case of each configuration. The system was composed of 1 to 7 local nodes with
identical semantic contents. By manipulating the local-global schemas, we ensured
that, in each simulation run, the search result existed in all local nodes but one and
the query was always submitted to the node that could not resolve it. The purpose was
to force the agent to travel in order to find resolutions. Different SSM configurations
result in different agent travel paths and, consequently, in different average
response times.
The Semantic-Aware Configuration assigns all nodes to the same entry-level
summary-schemas node because they all have similar semantic content. The
Non-Semantic-Aware Configuration creates a new path starting at the root for each newly
added local node. Figure 8 illustrates structures of both configurations when the
number of local nodes is 3.
Fig. 8. An example of Semantic-Aware and Non-Semantic-Aware configurations.
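The two layouts for three local nodes can be modelled as child-to-parent maps to see how far an agent must climb after a search miss. The sketch below is an assumption-laden illustration: the node names (L1, E1, root) and the depth of each path are hypothetical, since the exact shape of the hierarchies in Figure 8 is not reproduced here, and levels_to_climb is not part of MAMDAS.

```python
def ancestors(node, parent):
    """Return the chain of ancestors of `node`, nearest first."""
    chain = []
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain


def levels_to_climb(start, parent, local_nodes):
    """Levels an agent climbs from `start` until the node it reaches
    has some other local node in its subtree."""
    others = [n for n in local_nodes if n != start]
    for level, anc in enumerate(ancestors(start, parent), start=1):
        if any(anc in ancestors(o, parent) for o in others):
            return level
    return None  # no other local node is reachable


# Hypothetical layouts (child -> parent); assumed, not taken from Figure 8.
LOCAL_NODES = ["L1", "L2", "L3"]
SEMANTIC_AWARE = {"L1": "E1", "L2": "E1", "L3": "E1", "E1": "root"}
NON_SEMANTIC_AWARE = {
    "L1": "E1", "L2": "E2", "L3": "E3",
    "E1": "root", "E2": "root", "E3": "root",
}

if __name__ == "__main__":
    print(levels_to_climb("L1", SEMANTIC_AWARE, LOCAL_NODES))
    print(levels_to_climb("L1", NON_SEMANTIC_AWARE, LOCAL_NODES))
```

Under these assumed layouts the agent climbs one level in the Semantic-Aware case and has to reach the root in the Non-Semantic-Aware case.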
Note that when no resolution is found at the first node (we forced a search miss), in
the Semantic-Aware configuration, the agent only needs to go up one level in order to
find other possible resolutions. In contrast, when the Non-Semantic-Aware
configuration is applied, the agent has to go all the way up to the root before it can
find any other potential resolutions. After potential resolutions are identified, both
configurations conduct searches in parallel. Intuitively, we anticipate that a shorter
search path will demonstrate better performance. Figure 9 shows the experimental
results.
As expected, the Semantic-Aware configuration outperforms the Non-Semantic-Aware
configuration. However, after a closer examination of this experimental result,
we noticed performance degradation when the number of local nodes searched in
parallel reaches 5 (the total number of local nodes is 7). This phenomenon raises a
question: from the performance point of view, is it a good idea to build a wide
summary-schemas hierarchy? In order to answer this question, we conducted the
following experiment.
5.3.2 Scalability of Parallel Searches
From the search algorithm introduced in section 3.4, one could conclude that the query
response time mainly depends on two factors: the thesaurus response time and the
agent creation and migration overhead. To identify the contribution of each factor, we
designed an experiment to separate the thesaurus response time from the system
response time. In this experiment, the Semantic-Aware configuration was applied and
the number of nodes searched in parallel ranged from 1 to 9. We also arranged for the
result to be found on every local node. All queries in this experiment were submitted
to the root. Figure 10 depicts the result.
Figure 10 shows the scalability of parallel searches: for configurations with fewer
than seven local nodes, the average response time is almost the same, regardless of the
number of local nodes (note that the local nodes have the same semantic contents). A
sudden increase in the response time occurs when the number of local databases
exceeds seven. The thesaurus server makes the major contribution to this
performance degradation. Although the thesaurus server supports multithreading, the
number of concurrent clients it can support without performance degradation is still
limited. When the number reaches a certain threshold (7 in this case), the server’s
performance degrades dramatically. Further analysis indicates that agent cloning
introduces a nearly fixed amount of overhead as the number of agent instances
increases from 1 to 10. The reason is that most of the agent migration and execution
time is overlapped. These results suggest that, based on the present MAMDAS
implementation, a fan-out in the range of 3 to 5 results in a desirable
summary-schemas hierarchy.
[Figure 9: average response time (ms), y-axis from 0 to 3000, versus number of local
nodes (1 to 7) for the Non-Semantic-Aware and Semantic-Aware configurations.]
Fig. 9. Impact of SSM configurations.
Figure 10 also implies that optimizing the thesaurus server’s performance
is very important, since the server accounts for almost 80% of the execution time. We
summarize possible improvements to its performance in section 6.
5.4 Robustness and Portability
As noted before, the IB system is vulnerable to message losses and exceptions. Thus,
the system is unstable and difficult to debug. The MAMDAS is much more stable
than the IB system for several reasons: the robustness of agents, the reduced
communication, and a good exception-handling mechanism. During the course of our
evaluation, we did not experience any crashes or stalls.
We intended to apply MAMDAS in a distributed environment and provide special
services to mobile users. In such an environment, physical heterogeneity of the
computing devices becomes a challenging issue. Thanks to the platform independence
of the Java language, our system was easily ported to any machine available
to us that supported JVM version 1.3. We successfully tested the system, without
any modification, on PCs that run different versions of the Windows operating
system.
Fig. 10. The scalability of parallel searches.
6 Conclusion and Future Work
The goal of our study was to address issues in multidatabase information retrieval
while providing special support for mobile users. By applying the Gaia agent-based
application design methodology [14], we successfully devised and implemented the
MAMDAS system – an application of Mobile Agent technology in Mobile Data
Access System design. The MAMDAS chooses the SSM as its multidatabase
organization model and the Java-based IBM Aglet Workbench SDK 2.0 as its
implementation tool.
Our experimental results showed that MAMDAS significantly improves the
average response time compared to the previous SSM prototypes. It is six times faster
than the original prototype and twice as fast as the enhanced prototype. The
MAMDAS demonstrated system scalability. The experimental results suggested that a
reasonable width of the SSM hierarchy ranges from 3 to 5. Moreover, MAMDAS’
platform-independent nature makes it an ideal choice for a distributed information
retrieval system.
The scope of this research can be extended in many directions. We intend to
investigate the following issues in the future:
• The thesaurus’s algorithm needs to be improved. Currently, it compares one pair of
terms at a time, which means a high degree of network traffic. Naturally, the
performance can be improved if the number of communications between the
ThesMaster and other agents is reduced by handling a list of query terms at a
time instead of a pair of terms at a time.
• The MAMDAS uses a centralized thesaurus server. This could become a
bottleneck. It is possible to replicate the thesaurus and/or distribute it across the
SSM. This change would significantly reduce the network traffic and hence reduce
the average response time of the thesaurus. Consequently, the overall average
response time would decrease.
• The current MAMDAS implementation does not take security issues into
consideration. Because security is critical in a distributed environment, such issues
within the scope of the software agents need to be investigated in depth and
incorporated into the MAMDAS.
• The issue of frequent disconnection in a wireless communication medium and its
effect on MAMDAS need further study.
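The first item above, sending a list of terms per request instead of one pair, can be illustrated by counting requests. The sketch is hypothetical throughout: CountingThesaurus, compare, and compare_many are illustrative names and do not describe the ThesMaster interface; the relatedness test is reduced to a set lookup.

```python
class CountingThesaurus:
    """Hypothetical stand-in for a thesaurus server that counts requests."""

    def __init__(self, related_pairs):
        self.related = set(related_pairs)
        self.requests = 0

    def compare(self, term, keyword):
        """Pairwise style: one request per (term, keyword) pair."""
        self.requests += 1
        return (term, keyword) in self.related

    def compare_many(self, terms, keyword):
        """Batched style: one request carrying a whole list of terms."""
        self.requests += 1
        return [(t, keyword) in self.related for t in terms]


if __name__ == "__main__":
    terms = [f"Term{i}" for i in range(1, 9)]
    related = {("Term1", "kw"), ("Term4", "kw")}

    pairwise = CountingThesaurus(related)
    answers_pairwise = [pairwise.compare(t, "kw") for t in terms]

    batched = CountingThesaurus(related)
    answers_batched = batched.compare_many(terms, "kw")

    print(pairwise.requests, batched.requests)
    print(answers_pairwise == answers_batched)
```

Both styles return the same answers; the batched call trades the early exit of the term-by-term loop for a single round trip, so which one wins would depend on message cost relative to comparison cost.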
References
1. Ara. http://wwwagss.informatik.uni-kl.de/Projekte/Ara/index_e.html
2. Beej’s Guide to Network Programming.
   http://www.ecst.csuchico.edu/~beej/guide/net/html/
3. Bright, M. W., Hurson, A. R., and Pakzad, S. H. Automated resolution of semantic
   heterogeneity in multidatabases. ACM Transactions on Database Systems, 19(2), 1994.
4. Concordia – Mobile Agents White Paper.
   http://www.meitca.com/HSL/Projects/Concordia/MobileAgentsWhitePaper.html
5. Crystaliz Inc., General Magic Inc., GMD FOKUS, IBM, TOG. OMG Joint Submission
   “Mobile Agent System Interoperability Facility”, November 1997.
6. Date, C. J. Relational Databases. Addison-Wesley, Reading, Massachusetts, 1986.
7. D’Atri, A. and Tarantino, L. From Browsing to Querying. Data Engineering 12,
   pp. 46-53, 1989.
8. Kotz, D., Gray, R., Nog, S., Rus, D., Chawla, S., and Cybenko, G. Agent tcl: Targeting
   the needs of mobile computing. IEEE Internet Computing, pp. 58-67, July/August 1997.
9. IBM. Aglets Workbench. http://www.trl.ibm.co.jp/aglets/index.html
10. Jiao, Y. Multidatabase Information Retrieval Using Mobile Agents. Master of Science
    Thesis. Department of Computer Science and Engineering, The Pennsylvania State
    University, 2002.
11. Lange, D. and Oshima, M. Programming and Developing Java Mobile Agents with Aglets.
    Addison Wesley Longman, Inc., Reading, Massachusetts, 1998.
12. Mobile Agents Bibliography. http://www.zurich.ibm.com/~spl/BibAgents.html
13. Montero, G. A Mobile Agent System within the Summary Schemas Model Multidatabase.
    Master of Engineering paper. Department of Computer Science and Engineering, The
    Pennsylvania State University, 2000.
14. Wooldridge, M., Jennings, N. R., and Kinny, D. The Gaia methodology for agent-oriented
    analysis and design. Journal of Autonomous Agents and Multi-Agent Systems, 2000.
15. ObjectSpace. Voyager: ORB 3.0 Developer Guide, 1999.
    http://www.objectspace.com
16. Odyssey Research Associates. http://www.atc-nycorp.com