A Methodology & Tool for Determining
Inter-component Dependencies Dynamically in
J2EE Environments
Umesh Bellur
[email protected]
School of Information Technology, IIT Bombay
Abstract— The emergence of server side component models
such as J2EE has considerably simplified the construction of
enterprise applications. However, the enterprise's dependence on these applications and their proliferation has shifted the complexity to managing the environment in which these applications operate. The difficulty in management is chiefly due to the inter-dependencies that exist between the components that make up these applications. Because of this coupling, changing one component can have unpredictable side effects on the performance and functional behavior of other components. Self-management, or autonomic computing, investigates techniques for self-configuration, self-healing and self-protection. We believe that in order for a middleware to be self-managing, it is first necessary to obtain and maintain its topology, which characterizes the components and inter-dependencies that make up this complex distributed environment. We present here
a methodology and a tool to non-intrusively and dynamically
extract and store the component topology from a distributed
execution environment. The techniques presented in this paper
have been evolved primarily for a J2EE environment although
they can be applied without loss of generality to any distributed
object system.
I. INTRODUCTION
eBusinesses are the norm today and this has led to a whole
host of advances to simplify the development of eBusiness
applications. Some of these include server side component
models such as J2EE and .NET along with IDEs that can
automatically generate all of the plumbing needed to build
such applications leaving the application developer with the
sole task of adding business logic. However the complexity of
the production IT environment hosting these applications has
skyrocketed and this coupled with the lack of availability of
human expertise to manage these environments has prompted
research into autonomic computing or the science of self
managed systems. These techniques have become even more
crucial in an atmosphere of constant change prompted by
frequent deployments introducing new application features. It
is interesting to note that most of this complexity originates
in the well understood concept of coupling which introduces
dependencies between components either within an application
or across applications. Hence in order to build systems that
can manage themselves, it is necessary to know oneself - the
components that make up the application and the dependencies
on other components. Design documentation such as message sequence charts rarely keeps up with changes in code and proves useless when it comes to documenting dependencies across applications. It is therefore imperative to have a means
by which we can dynamically determine the component map
or topology of such computing environments. In this paper
we present a way to model this topology and to extract it
dynamically from the production environment at runtime.
This work is being done in the context of a larger autonomic computing project at IITB named LAMDA¹ for Lights-out Automated Management of Distributed Applications. The project is a comprehensive effort at providing an integrated autonomic enterprise environment that includes self-configuration and self-healing [1].

¹ This project is sponsored in part by the Intel IT Research Council via grant 20393 and by an IBM Faculty Award in Autonomic Computing.
A. Topology
For our purposes we have assumed a component-based application environment, which is not unreasonable in today's enterprise. We are currently working with J2EE technologies
but are confident that these techniques can be extended without
loss of generality to competing technologies such as .NET.
The term 'Topology' is used here to represent the following:
• Physical infrastructure and its configuration. This covers the machines that make up a network along with the network map specifying the network topology. Configuration details include physical configuration such as CPU, memory, disk, etc. and logical configuration such as OS versions, installed packages, etc.
• The static and dynamic view of the application components. By static we refer to the interfaces supported by each component and its packaging details, while by dynamic we refer to the deployment view of components relative to one another. This is necessary to eventually create analytical or simulation models for the purposes of self-configuration.
• Dependencies that exist between the application components, as well as those between application components and infrastructure (software, hardware and network).
The topology therefore characterizes the application and its execution environment, and provides a language for a common understanding of what an application is and what it depends on. The topology can then be used for every facet of autonomic computing: self-configuration, self-healing and self-protection. We are currently developing a tool for fault localization in
J2EE applications that models the failure dependencies of the
application using its topology and uses it for failure diagnosis
once the application is deployed. The topology extracted
from these applications is also being used for performance
prediction using queuing network models.
The rest of this paper is organized as follows. Section II
presents a taxonomy for existing approaches to topology determination along with related work. We present our dynamic
topology determination methodology and the AutoTopology
tool for J2EE applications in section III. We then present a
case study for a sample J2EE application (Duke's Bank) and the resulting output generated by 'AutoTopology' in section
IV. We end the paper with directions for further research in
LAMDA towards building autonomic systems that use this
topology information.
II. TAXONOMY AND RELATED WORK
The most interesting issues in topology extraction revolve around determining dependencies between components, and
so we present below a taxonomy of the various dependency
determination approaches and a survey of existing systems that
use these approaches.
Dependency determination approaches can be classified
broadly as static or dynamic based on whether one needs to
execute the code or not. Static approaches are obviously useful
in that we don’t need to disturb a production environment to
place probes etc. Design representations as well as source code
can be used to find services and their inter-dependencies as
shown in [2] [3] [4]. For example UML class diagrams and
message sequence charts which can be obtained by reverse
engineering source code depict interconnection patterns in
designs. However this approach is useful only when analyzable source is available, which is often not the case. More importantly, static approaches do not allow us to determine cross-application dependencies, since packaging and deployment are usually done on a per-application basis.
Dynamic approaches can further be categorized based on
the level of instrumentation as intrusive, semi-intrusive and
non-intrusive approaches. Intrusive techniques are those that rely on code instrumentation, and dependencies are calculated by correlating data gathered during the flow of transactions through various components [5] [6] [7]. This approach will
be unsuitable where there are multi-vendor components due
to interoperability concerns and in places where code cannot
be inserted into the system for security, licensing or other
technical constraints. Semi-intrusive approaches do not need to instrument the application code. However, the middleware used for deploying the applications is instrumented to tag incoming requests and trace each request to determine the components used to serve a particular class of request [8] [9].
Faults could be injected into the system and the propagation
of these faults could be used to infer component dependencies
[10] as well. The ADD project [11] at IBM discovers dependencies by perturbing the system and measuring change in
the system response. Such approaches are still intrusive to the
middleware used for deployment. Non-intrusive approaches treat the system as a black box and involve tracing communications and later inferring causal paths and patterns from the trace. This method does not call for modifying either the application or the middleware code. In [12] these traces are obtained by sniffing on all the participating hosts (or by mirroring the communications taking place on all the ports to which the hosts are connected). The disadvantage of this method is that it cannot provide dependencies between components co-located on a single host or process. The outputs also depend on where in the communication stack the sniffer is placed.
In general, we feel that such black box approaches will
produce useful outputs only when they can extract data at
the level of individual components in the applications - not
just at the level of the container or process. Our approach
and tool can extract the topology of any J2EE application
without instrumenting the application or the J2EE application
server that it is deployed on. We make use of the JVMPI interface provided by all JVM implementations and the concept
of Interceptors in J2EE application servers to generate call
traces for the application. The topology for the application is
generated out of the call traces by executing the use-cases for
the application and combining the dependency information in
these call traces. The logging involved does incur overheads
during extraction but can be stopped whenever the topology
extractor is not in use without the need to restart the J2EE
application server or redeploy the application. Our approach
is therefore mostly non-intrusive and dynamic.
III. DYNAMIC TOPOLOGY EXTRACTION METHODOLOGY AND AUTOTOPOLOGY TOOL
The components in a J2EE application may be deployed
within the same JVM, different JVMs on the same machine
or may be distributed over multiple machines over a network.
In general, inter-component communication can be viewed as crossing a layered application stack, as depicted in Figure 1.
The component dependencies can be extracted by monitoring
the communication at one or more of these layers in the
application stack.
Fig. 1. Topology Extraction at various layers in the system.
Packet sniffing at the network layer can be used to monitor
traffic between processes on multiple hosts. For example, a
packet sniffer like Ethereal [13] can be used to capture the
marshalled remote invocation object being passed to the RMI
port of the destination machine, which contains information
about the remote object and the method to be invoked on
that remote object. However, this approach is not very useful
since we cannot determine the client-side object identity or the specific interface being invoked. At the JVM layer, the Java Virtual Machine Profiler Interface (JVMPI) can be used
to log all method calls in the JVM which can be used to
extract intra-JVM dependencies. This approach alone does not extend to calls that cross the JVM boundary. Remote Method Invocation (RMI) is used to invoke methods on remote Java objects. We could write a custom RMI socket factory for sending/receiving data that logs all details of each remote invocation, but the complexity of writing such a socket factory and the overhead involved in logging all RMI calls from the JVM make this approach unsuitable.
At the application server layer, calls made on EJBs² can be intercepted using the Interceptor mechanism provided by the J2EE application server. The interceptors at the client and server side log their respective call information, which can be correlated later and can also yield inter-JVM dependencies. It is clear from our research that more than one layer has to be handled individually and the results correlated to obtain the complete set of runtime dependencies. In particular, we use JVMPI at the JVM layer and J2EE interceptors at the application server layer to extract the application's runtime dependencies.
The remainder of this section describes our methodology to establish the topology, consisting of the following steps:
1) Profiling and logging from within the JVM using JVMPI.
2) Interceptors and logging at the container level.
3) Node analysis to establish traces using these logs at each node.
4) Merging call traces across nodes to establish a single trace.
5) Generating the topology from this trace.
A. JVMPI Profiler
The Java Virtual Machine Profiler Interface (JVMPI) [14] is a two-way function call interface between the Java Virtual Machine and an in-process profiler agent, as shown in Figure 2. A profiler agent can register for various events such as class loads/unloads, method entries/exits, JVM shutdown, etc. The JVM sends an event by calling NotifyEvent with a JVMPI_Event data structure as the argument. From this information we can determine both the client and target object identities involved in the invocation.
Fig. 2. JVMPI Profiler Agent.

² EJB stands for Enterprise Java Bean, which is a remotely invokable Java object in the J2EE standard.
JVMPI provides APIs to turn profiling on and off remotely and dynamically, so that we can avoid generating large logs as well as the need to shut down and restart the application server.
B. J2EE Interceptors
An interceptor is a piece of code that intercepts calls
made into an object of an intercepted class. J2EE application
servers [15] allow to define interceptors that are hooked into
method invocations and field accesses. The points where the
interceptors are to be inserted, can be specified in configuration
files of the application server.
Fig. 3. Server Interceptor in the JBoss Interceptor Stack.
When a method call is made on a remote component or object, the client container creates an instance of the invocation class that represents the client call. This object is passed through the list of configured client-side interceptors. We have written our own client-side interceptor that captures this invocation object to obtain the details of the call in progress, including the name of the target object, its location and the method to be invoked. Similarly, server-side interceptors process the invocations in the server-side container before routing them to the server component. In order to correlate the call information at the server and client side, we transmit a unique link identifier along with the originating machine details in the payload hashmap inside the invocation object.
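To illustrate this intercept-log-forward pattern in a self-contained way, the sketch below uses the standard java.lang.reflect.Proxy API rather than the JBoss-specific Interceptor and Invocation classes, whose exact signatures are not reproduced here; the identifier names and log format are illustrative.

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.UUID;

// Illustrative stand-in for the client-side interceptor described above.
public class TraceProxy implements InvocationHandler {
    private final Object target;

    private TraceProxy(Object target) { this.target = target; }

    @SuppressWarnings("unchecked")
    public static <T> T wrap(T target, Class<T> iface) {
        return (T) Proxy.newProxyInstance(iface.getClassLoader(),
                new Class<?>[] { iface }, new TraceProxy(target));
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        // A unique link identifier lets the server-side log entry for this
        // call be correlated with the client-side entry written here.
        String linkId = UUID.randomUUID().toString();
        System.out.printf("%s CALL %s.%s%n", linkId,
                target.getClass().getName(), method.getName());
        try {
            return method.invoke(target, args);
        } finally {
            System.out.printf("%s RETURN %s%n", linkId, method.getName());
        }
    }
}

Wrapping a component reference with TraceProxy.wrap(ref, SomeInterface.class) then logs a correlatable entry for every business method call; the actual tool does the equivalent inside the JBoss interceptor stack and additionally ships the link identifier to the server in the invocation payload.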
Once the JVMPI profiler and the client-side and server-side interceptors are deployed (these interceptors can be dynamically (un)deployed on J2EE app servers), we drive the system through each use case. The logs generated are then analyzed to infer runtime dependencies in the application in the form of a call trace, i.e. the sequence of events that occurred in the system while serving the request. The call trace models various execution sequences, viz. an unconditional sequential path, alternate paths based on some condition, concurrent paths on multiple threads, a join path for two or more concurrent paths, and a loop path for repeated calls to a set of interfaces which can be clubbed together (Figure 4).

Fig. 4. Execution path types in a call trace.
The analysis steps involved in extracting the topology from
the logs generated after the profiling are discussed below.
C. Node Analysis
The logs generated by the profiler and the interceptors
are analyzed together to determine the call trace for each
node in the system. To correlate the entries in the interceptor logs and the profiler agent logs, we maintain a unique object identifier variable in our client-side and server-side interceptors, which is set whenever these interceptor instances are constructed. In the profiler, when we encounter a call on the interceptor, we fetch the value of the object identifier from JVMPI_EVENT_OBJECT_DUMP and log this object id together with the method called. For calls that cross machine boundaries, we use a link identifier to connect the call trace from the end of the originating machine to the remote destination object. The JVMPI logs on the remote destination machine then give us the details of the dependencies of this remote object, completing the end-to-end call trace as shown in Figure 5.
Fig. 5. Profiler and Interceptor Logs.
The output of this step is a trace file which is generated
for each node in the distributed system. These trace files may
contain unresolved outward links to other machines which are
resolved in the next step to generate a single trace file for the
entire application.
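The sketch below shows the shape of this per-node correlation step under assumed log record formats (not specified here in full): interceptor records carry an object id, a link id and a method name, while profiler records carry an object id and a method name.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NodeAnalyzer {
    record InterceptorRecord(String objectId, String linkId, String method) {}
    record ProfilerRecord(String objectId, String method) {}

    // Join profiler events to intercepted calls via the shared object id,
    // yielding a per-node trace keyed by link id; outward links to other
    // hosts remain unresolved until the trace-merging step.
    static Map<String, List<ProfilerRecord>> correlate(
            List<InterceptorRecord> interceptorLog,
            List<ProfilerRecord> profilerLog) {
        Map<String, List<ProfilerRecord>> byObject = new HashMap<>();
        for (ProfilerRecord p : profilerLog) {
            byObject.computeIfAbsent(p.objectId(), k -> new ArrayList<>()).add(p);
        }
        Map<String, List<ProfilerRecord>> trace = new LinkedHashMap<>();
        for (InterceptorRecord i : interceptorLog) {
            trace.put(i.linkId(), byObject.getOrDefault(i.objectId(), List.of()));
        }
        return trace;
    }
}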
D. Merging call traces across nodes
The individual node traces generated by the Node Analyzer are then combined by the Trace Aggregator, which resolves inter-machine dependencies and generates the complete call trace for the use case executed (Algorithm 1). Algorithm 2 identifies redundant information in the form of sequences of repeated calls and represents each as a loop for simplicity. Alternate paths arising from branch conditions, or from different use cases sharing a common prefix trace, are also detected here and marked accordingly. The Trace Aggregator generates three XML files, namely Components.xml, Trace.xml, and ServerInfo.xml. Components.xml contains information on each application component such as its name, machine location, type, list of interfaces, and the contentionable nature of the component. Trace.xml contains the call trace of the application execution, and ServerInfo.xml lists the application-server-specific contentionable resources.

E. Generating topology from call traces
The call traces generated by the Trace Aggregator can be aggregated into a topology graph. The topology graph is a directed graph with nodes representing the components in the application and the direction of the edges indicating the usage dependency. For example, if a component A uses component B then the graph has an edge (A,B) representing this dependency. Each edge is also annotated with a dependency strength that represents the probability of component A using component B in the application.
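A minimal Java sketch of this structure (the class and method names are ours, not the tool's):

import java.util.HashMap;
import java.util.Map;

public class TopologyGraph {
    // adjacency: component -> (used component -> usage probability)
    private final Map<String, Map<String, Double>> edges = new HashMap<>();

    void addUsage(String from, String to, double probability) {
        edges.computeIfAbsent(from, k -> new HashMap<>()).put(to, probability);
    }

    double usageProbability(String from, String to) {
        return edges.getOrDefault(from, Map.of()).getOrDefault(to, 0.0);
    }
}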
The following cases in the original call trace need to be considered while transforming it into a topology graph:
1) Sequential path: A sequential call-edge is mapped directly to the corresponding usage edge in the topology graph, with the usage probability remaining the same as that before the call is made.
2) Alternate path: When the call trace has an AlternatePath node, the execution can take any of the available alternate paths and the probability of usage splits at this node. If no additional information regarding the probability distribution over these alternate paths is known, a uniform distribution is assumed. If the same call-edge occurs multiple times in a call trace, then the probabilities of making that call need to be accumulated into a single value for the usage probability. Consider the following interesting cases:
a) Nested alternate paths: If an alternate split is nested in some other alternate split, then the call-edge is taken only if both alternate splits select the favorable path. Hence, the total usage probability is the product of their individual probabilities.
b) Union across child subtrees: If the call-edge occurs multiple times (but not nested) in the call trace, then the total usage probability is the probability of the union of the events that either call-edge is taken. If they belong to two different paths of the same alternate split, then the union operation reduces to a simple ADD, since the intersection of the events is empty.
3) Concurrent path: Concurrent paths are executed in parallel, but are reduced similarly to the sequential path case.
4) Loop path: The loop path is treated as a single sequential path.
Algorithm 3 generates topology graphs for individual use cases. The AggregateAcrossAlternatePaths procedure aggregates the usage probability for common usage edges across child alternate paths through an ADD operation, while AggregateAcrossChildren aggregates usage probabilities across child sequential paths using a UNION operation.
Algorithm 1 DoTraceAnalysis(startingMachineIP)
1: Build the DOM tree for the trace of the starting machine
2: for each unresolved destination machine DEST_IP do
3:   Read the log and build the DOM tree from the trace of DEST_IP
4:   for all outward links from the starting machine to DEST_IP do
5:     Find the corresponding inward link by matching the OutwardTraceLinkID of the starting machine and the InwardTraceLinkID from the DOM of the destination machine
6:     Replace the outward link node of the starting machine with the children of the inward trace link node of DEST_IP
7:   end for
8: end for
9: RemoveLoops(DocumentRoot) {Identify alternate paths and loops in the call trace}
Algorithm 2 RemoveLoops(treeNode)
1: Initialize: previousSiblings ← φ
2: for each child node C of treeNode do
3:   if C.PathID = S.PathID where S ∈ previousSiblings then
4:     if nodes between S and C are repeated from C then
5:       for all matching node pairs s and c do
6:         repeat
7:           Compare subtrees of s and c
8:         until mismatch found or both subtrees empty
9:       end for
10:      if mismatch found then
11:        Create a new AlternatePath node with the two mismatching nodes as the alternate paths
12:      end if
13:      Delete the repeated nodes after C
14:      Create a new LoopPath with the repeated nodes from S to C as its child nodes
15:      Adjust the PathID of matched nodes and their subtrees to indicate LoopPath or AlternatePath
16:    end if
17:  else
18:    previousSiblings ← previousSiblings ∪ {C}
19:  end if
20: end for
Algorithm 3 GenerateTopology(callTree, usageProbability)
1: for each child C of callTree do
2:   if C is an AlternatePath node then
3:     usageProbability = usageProbability / numberOfAlternatePaths
4:     for all alternate paths P of C do
5:       altUsageEdges[P] = GenerateTopology(P, usageProbability)
6:       childUsageEdges[C] = AggregateAcrossAlternatePaths(altUsageEdges) {ADD}
7:     end for
8:   else
9:     childUsageEdges[C] = GenerateTopology(C, usageProbability)
10:  end if
11:  usageEdges = AggregateAcrossChildren(childUsageEdges) {UNION}
12:  return usageEdges
13: end for
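The two aggregation operations in Algorithm 3 can be made concrete with a small Java sketch; the map-based edge representation is an assumption of ours. Following case 2b above, ADD applies to edges on different paths of the same alternate split (disjoint events), while UNION applies across independent sequential children.

import java.util.HashMap;
import java.util.Map;

public class UsageEdgeAggregator {
    // Disjoint alternatives of one split: P = p1 + p2.
    static Map<String, Double> add(Map<String, Double> a, Map<String, Double> b) {
        Map<String, Double> out = new HashMap<>(a);
        b.forEach((edge, p) -> out.merge(edge, p, Double::sum));
        return out;
    }

    // Independent occurrences on sequential children: P = p1 + p2 - p1*p2.
    static Map<String, Double> union(Map<String, Double> a, Map<String, Double> b) {
        Map<String, Double> out = new HashMap<>(a);
        b.forEach((edge, p) -> out.merge(edge, p, (p1, p2) -> p1 + p2 - p1 * p2));
        return out;
    }
}

For example, a call-edge seen with probability 0.5 on each of two independent children gets 0.5 + 0.5 - 0.25 = 0.75 under UNION, but 1.0 under ADD when the two occurrences lie on complementary paths of the same alternate split.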
The topology of the application can be generated by computing a weighted addition across all the use cases, using the probability of executing each use case as the weight. The use case probabilities can be obtained from historical data or, in the simplest case, assumed to be uniform across all use cases.
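Written out: if $\pi_u$ denotes the probability of executing use case $u$ and $p_u(A,B)$ the usage probability of edge $(A,B)$ in the topology graph of use case $u$, the application-level edge weight is

$$w(A,B) = \sum_{u} \pi_u \, p_u(A,B), \qquad \sum_{u} \pi_u = 1,$$

with the uniform assumption corresponding to $\pi_u = 1/N$ for $N$ use cases.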
In the following section we present the output generated by our tool for a sample J2EE application.
IV. CASE STUDY - DUKE'S BANK J2EE APPLICATION
Duke's Bank demonstrates a selection of J2EE technologies working together to implement a simple online banking application. It uses EJBs and web components (JSPs and servlets) and uses a database to store the information. We have deployed the application on JBoss 4.0.2, an open-source
J2EE application server.
A Components.xml file is generated that gives information regarding all the components deployed on the various nodes in the system. The following information is listed for each component:
• Component type: Entity Bean, Session Bean or Message-Driven Bean
• Clustered or non-clustered
• Location of the component
• List of interfaces
• Resource utilization: CPU, memory and disk usage (not generated currently)
• Contentionable resources, e.g. thread pool, component object pool, etc.
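To make this output concrete, a Components.xml entry might look like the following; the element and attribute names are illustrative assumptions, as the schema is not reproduced here.

<!-- Hypothetical Components.xml entry; element names are assumptions. -->
<component name="AccountControllerBean" type="SessionBean"
           clustered="false" location="appserver-node-2">
  <interfaces>
    <interface>ejb.account.AccountController</interface>
  </interfaces>
  <contentionableResources>
    <resource kind="componentObjectPool" size="100"/>
  </contentionableResources>
</component>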
The runtime dependencies in the application are captured as call traces, and the different types of execution paths are appropriately represented in Trace.xml. A sample call trace for the 'List accounts of customer' use case is shown in Figure 6.

Fig. 6. Call trace generated by AutoTopology for the 'List Accounts' use case.

Once call traces for all the use cases are generated and we have the associated probabilities of executing these use cases, we extract the topology graph from those traces (Figure 7)³.

Fig. 7. Topology graph generated from the 'List Accounts' use case.

Note that all information shown pictorially in the diagrams is also available in XML to be consumed by other tools that form part of the LAMDA suite. The topology tool also generates a ServerInfo.xml file listing the contentionable resources in the application server, such as thread pools, invoker pools, etc. Information about these resources is needed so that they can also be modeled as queues in the performance model.

³ Only a small snapshot of the topology graph is shown here due to space constraints.
V. PERFORMANCE/CORRECTNESS OF THE EXTRACTION TOOL
We have made the following measurements with respect to the topology extraction tool:
• Profiling overhead: the increase in response time for those transactions that were profiled to understand the dependencies. The overhead is caused mainly by the interceptors, which log information before handing control back to the data path.
• Scalability of analysis: the time taken to complete analysis of the log files and build the dependency graph.
• Correctness of the tool: correctness can be measured as a combination of completeness, defined as the percentage of dependencies captured, and accuracy, defined as the percentage of reported dependencies that are not false positives. Since our tool is deterministic, it does not generate false positives and our accuracy is 100 percent, so we focus only on completeness.
A. Test setup
Our testing and measurement was conducted on a 3-node distributed system which hosted the database on one node, the application server on the second node and the web server on the third node. Each node was configured with an Intel P3 1.5 GHz processor and 256 MB of memory. The applications we used to perform the overhead measurements and scalability analysis were a set of benchmark J2EE applications that included PetStore from Sun Microsystems, Duke's Bank, ECperf, and the SPECjAppServer benchmark from SPEC.
B. Results
The results of the profiling overhead study and the analysis scalability study are shown in Figure 8. The overhead appears to be linear in the number of sequential executions of the interceptors, which is intuitive. Note that even parallel log entries cause contention for the log file only if both components that are called in parallel are on the same node. As for the scalability of the analysis, it appears to be quite sensitive to the number of actual dependencies discovered beyond a point. The graph is flat for the first 7-10 dependencies, with a linear increase beyond that. However, even with a use case that shows about 30 dependencies, the total time taken to perform the analysis and output the topology graph is less than a second. Given that the inter-discovery interval is expected to be fairly large (of the order of several minutes to hours), the time taken for analysis is more than acceptable.
Completeness of the discovery is driven largely by the transactions that take place during profiling. Dependencies that are sensitive to the input data of the transaction (i.e. are revealed only under specific inputs) may take longer to discover. For the use cases we had and the data ranges of the input parameters, we were able to generate all the dependencies for the benchmark applications. This was verified by a code review in which we manually created a topology graph for each application.
Fig. 8. Overhead and Scalability Results.
VI. USING TOPOLOGY INFORMATION
A. Topology to Predictive Performance Models
Once we have the topology, the next step would be to obtain
a performance model for performance prediction purposes.
This will help us understand the QoS of the existing application and identify the bottlenecks to be tuned. For our purposes
we have turned to layered queueing networks to help create
performance models.
Fig. 9. Translating the Topology to an Analytical Performance Model.

The critical components of a performance model are the various queues, whose parameters have to be appropriately identified, and the service time distributions at the related queueing stations. In our case these queues represent points of contention such as thread pools and pieces of code protected by synchronization constructs such as monitors. Standard component models allow us to glean this information in a relatively painless manner. Figure 9 shows a sample analytical model derived from the topology.

We have built tools to automatically convert a topology into an LQN performance model, which can then be solved using various existing techniques. The results of our translations are fairly accurate, as evidenced by the performance graph shown in Figure 10.

Fig. 10. Graph of Response Time Versus Load.

B. From Prediction to Self Configuration

Once we are able to predict the QoS that can be obtained from an application, we can apply control-theoretic feedback techniques to configure the underlying infrastructure automatically, as shown in Figure 11.

Fig. 11. Strategy for Self Configuration.
Our strategy is of course predicated on the assumption that the infrastructure whose configuration we are controlling affects the performance significantly, and can therefore help us bring it under control. We are currently investigating various techniques for the configurator and hope to report results soon.
C. Topology Based Root Cause Isolation
In dealing with self-healing, it is our belief that the larger problem is one of root cause isolation, which will help us diagnose the set of observed symptoms down to a small set of highly probable root causes. We therefore take the following approach toward self-diagnosis:
• Model dependencies in the system.
• Use the dependency model to account for fault propagation.
  – Assumption: failure dependencies follow usage dependencies.
• Build a model based on a causality graph of components.
• Assume no failure data is available at the start.
• Use Bayesian Belief Networks (BBNs) for diagnosis. This helps us handle uncertainty and simultaneous failures.

We have developed models and tools to auto-generate BBNs from a topology graph, and the results have been submitted for review.
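As an illustration of how a BBN built over usage dependencies can handle simultaneous failures, the sketch below uses a noisy-OR parameterization, a common way to derive conditional probability tables from per-edge strengths; this is one plausible parameterization rather than necessarily the exact form our generator uses.

public class NoisyOr {
    // leak: probability the component fails on its own;
    // strengths: propagation strength of each FAILED dependency.
    static double failureProbability(double leak, double... strengths) {
        double pNoFail = 1.0 - leak;
        for (double s : strengths) {
            pNoFail *= (1.0 - s); // each failed dependency may independently propagate
        }
        return 1.0 - pNoFail;
    }

    public static void main(String[] args) {
        // Two failed dependencies with strengths 0.6 and 0.3, leak 0.01.
        System.out.println(failureProbability(0.01, 0.6, 0.3)); // prints 0.7228
    }
}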
VII. CURRENT STATUS AND FUTURE WORK
The AutoTopology project [16] can dynamically extract the topology of any J2EE application without instrumenting the application or the server used for deployment. The output of the tool is a topology graph with components as nodes and usage dependencies modelled by the edges, each with an associated usage probability.
Other projects in LAMDA for self-healing and self-configuring systems work off this topology information. The system dependencies extracted in this way are being used to model the application as a queuing network for performance prediction and capacity planning. The root-cause analysis project models the failure dependencies of the application using a Bayesian Belief Network (BBN) constructed from the topology graph. This Bayesian model is used for predicting the most probable set of faulty components given the failures observed in the application.
REFERENCES
[1] Umesh Bellur. Topology based automation of distributed applications
management. In Proceedings of the fourth international workshop on
Software and performance, pages 171–173. ACM Press, 2004.
[2] Susan L. Graham, Peter B. Kessler, and Marshall K. McKusick. gprof:
a call graph execution profiler. In SIGPLAN Symposium on Compiler
Construction, pages 120–126, 1982.
[3] Hesham El-Sayed, Don Cameron, and Murray Woodside. Automation
support for software performance engineering. In Proceedings of the
2001 ACM SIGMETRICS international conference on Measurement and
modeling of computer systems, pages 301–311. ACM Press, 2001.
[4] Gordon P. Gu and Dorina C. Petriu. XSLT transformation from UML
models to LQN performance models. In Proceedings of the third
international workshop on Software and performance, pages 227–234.
ACM Press, 2002.
[5] G. Kaiser, P. Gross, G. Kc, J. Parekh, and G. Valetto. An approach to
autonomizing legacy systems. In Workshop on Self-Healing, Adaptive
and SelfMANaged Systems (SHAMAN 2002), 2002.
[6] Systems Management: Application Response Measurement. Open Group technical standard c807. http://www.opengroup.org/tech/management/arm/.
[7] Brian Tierney, William Johnston, Brian Crowley, Gary Hoo, Chris Brooks, and Dan Gunter. The NetLogger methodology for high performance distributed systems performance analysis. In HPDC '98: Proceedings of the Seventh IEEE International Symposium on High Performance Distributed Computing, page 260, Washington, DC, USA, 1998. IEEE Computer Society.
[8] Mike Y. Chen, Emre Kiciman, Eugene Fratkin, Armando Fox, and Eric
Brewer. Pinpoint: Problem determination in large, dynamic internet
services. In DSN ’02: Proceedings of the 2002 International Conference
on Dependable Systems and Networks, pages 595–604, Washington, DC,
USA, 2002. IEEE Computer Society.
[9] R. Isaacs and P. Barham. Performance analysis in loosely-coupled distributed systems. In 7th CaberNet Radicals Workshop, Oct. 2002.
[10] S. Bagchi, G. Kar, and J. L. Hellerstein. Dependency analysis in distributed systems using fault injection: Application to problem determination in an e-commerce environment. In DSOM '01: Proc. of the twelfth international workshop on Distributed Systems: Operations & Management, France, Oct. 2001. INRIA.
[11] A. Brown, G. Kar, and A. Keller. An active approach to characterizing dynamic dependencies for problem determination in a distributed environment. In Proc. of the IFIP/IEEE International Symposium on Integrated Network Management, pages 377-390, France, May 2001. INRIA.
[12] Marcos K. Aguilera, Jeffrey C. Mogul, Janet L. Wiener, Patrick Reynolds, and Athicha Muthitacharoen. Performance debugging for distributed systems of black boxes. In SOSP '03: Proceedings of the nineteenth ACM symposium on Operating systems principles, pages 74-89, New York, NY, USA, 2003. ACM Press.
[13] Brad Hards. A guided tour of Ethereal. Linux Journal, 2004(118):7, 2004.
[14] Sun Microsystems. Java Virtual Machine Profiler Interface (JVMPI). http://java.sun.com/j2se/1.5.0/docs/guide/jvmpi/jvmpi.html.
[15] JBoss. Application Server. http://www.jboss.org/products/jbossas.
[16] Sudhir Reddy. Dynamic topology extraction for component based enterprise application environments. Master's thesis, Kanwal Rekhi School of Information Technology, IIT Bombay, 2005.