Integration of the OCM-G Monitoring System
into the MonALISA Infrastructure
Wlodzimierz Funika, Bartosz Jakubowski, and Jakub Jaroszewski
Institute of Computer Science, AGH, al. Mickiewicza 30, 30-059, Kraków, Poland
email: [email protected], [bjakubow,jjarosz]@student.agh.edu.pl
Abstract
This paper presents a way to integrate the grid monitoring system OCM-G into
the monitoring agent infrastructure of MonALISA. OCM-G is designed to monitor
distributed and parallel applications and to provide on-line information about
their execution status as well as selected parameters of the nodes they are
running on. To provide the user with a wide set of visualization capabilities,
we supply this information to the services of a scalable performance
visualization infrastructure, MonALISA, through common protocols. We feed the
services with data that is then accessible to a MonALISA client or to a
repository that publishes it on a server. A sample configuration and output are
provided to illustrate the operation of the integrated tools.
1 Introduction
Now that the Grid has become an increasingly active player in HPC, the
importance of monitoring cannot be overestimated. It is crucial to provide
information about computation nodes and the jobs running on them. A product of
the CrossGrid project, the Grid-enabled OMIS-Compliant Monitoring system
(OCM-G) [1], allows monitoring of a number of parameters of both the execution
infrastructure and the interactive jobs running on it. Its compliance with the
On-line Monitoring Interface Specification (OMIS) [2] makes it possible to
exploit a well-defined interface for interactions with data suppliers and
monitoring clients, such as performance analysis tools. The OCM-G features a
wide palette of monitoring capabilities, including the capture of user-defined
measurement values using special instrumentation techniques.
Our research was motivated by the question of how to transform the monitoring
data coming from OCM-G into an easily interpretable form and make it available
to a large number of clients. We concentrated our attention on MonALISA [4].
Its services are capable of providing on-line chart plotting from simple
datagrams containing only variable values. The services are registered in the
MonALISA LookUp Service so that clients can discover them. There can also be
MonALISA Repositories, which are able to perform more sophisticated operations
on data and publish the information in several ways, including through a Tomcat
server.
We developed an application that passes specified information from OCM-G to
MonALISA. Both systems provide well-specified APIs, which makes the task
easier. The application is fully configurable through an XML file.
After a short overview of related work, the paper presents the tools OCM-G and
MonALISA in the context of their roles in the research. The next section covers
our vision of the interconnection between them and the features we use for the
integration. We then describe the details of our application, followed by a
usage example.
2 Related Work
The domain of Grid computing is not short of monitoring tools exploiting
various visualisation approaches. The widely recognized R-GMA architecture [3]
underlies many performance analysis tools. Since R-GMA is built around the idea
of a relational monitoring database, the information supplied to the user can
also be used for longer-term, time-related analysis. GRM [5] is a semi-on-line
monitor of applications running in a distributed heterogeneous system. It
consists of local monitors, which are responsible for writing a complete trace
of the application as well as statistical data into a shared buffer. The buffer
is then turned into a file by the main monitor. The file is sent to the PROVE
visualization tool, which is capable of displaying this detailed information
both off-line and semi-on-line.
Autopilot [6] is another distributed monitor that uses local daemons to collect
data about both system and network performance. It uses the concept of sensors
and actuators that are placed in the source code of the application. For
visualisation it uses the Virtue [7] toolkit, which defines a visualisation
hierarchy for managing performance details. It shows data in the form of graphs
with data from different abstraction levels.
SCMSWeb [8], unlike the two tools mentioned above, has its own visual
interface. It generates PHP files that can be deployed directly on a server. It
is designed to monitor performance and system usage metrics. Monitors can be
organized in a hierarchical way to improve scalability. Information is passed
in XML files. It is possible to retrieve statistical performance data at
different abstraction levels, presented as graphs.
Ganglia [9] uses the same concept as SCMSWeb. It uses a multicast-based
listen/announce protocol and a tree of point-to-point connections amongst
representative cluster nodes to federate clusters and aggregate their state. It
has a PHP front-end and uses RRDtool to visualise data. Ganglia represents data
in XML and uses XDR for data transport.
3 Integration Strategy
The integration is realized by an application that works as a kind of
transformer and transmitter. It obtains data from OCM-G through the OMIS
interface, extracts the relevant information from the messages, and passes it
to the MonALISA service. Below we briefly describe the systems used.
3.1 OCM-G as a Monitoring Data Supplier
OCM-G is a system providing on-line information about cluster status. It is
composed of Service Managers (SMs), which are distributed across the Grid
Computation Elements. They gather information from Local Monitor processes,
which are started on the Worker Nodes. The data is passed using Globus IO
communication. The most interesting part of OCM-G from our point of view is the
MainSM, the principal Service Manager that sits at the top of the SM tree and
collects all the information. It is started first and is identified by a
connection string, which is then passed to the monitored applications and to
every tool that wants to communicate with it through the OMIS interface. This
is what our application does. As soon as the MainSM has started, we can connect
to it and query for information about the nodes.
Furthermore, we can monitor applications started on the Grid. The only
requirement is that they must be compiled using the cg-ocmg-cc tool and run
with parameters specifying the MainSM and the application name. To obtain an
arbitrary value from the application, we need to define so-called probe
functions. When they are called in the application, they return a value of type
int or double to the SM. We can think of them as a kind of event that triggers
the sending of data to the registered monitoring tools.
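The event-like behaviour of probes can be illustrated with a minimal Python sketch. It does not use the OCM-G API: the class ProbeRegistry and its methods register_listener and fire are hypothetical names introduced here only to show how one probe call can fan out to every registered monitoring tool.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Union

Value = Union[int, float]
Listener = Callable[[str, Value], None]


class ProbeRegistry:
    """Hypothetical stand-in for a Service Manager that forwards probe values."""

    def __init__(self) -> None:
        self._listeners: Dict[str, List[Listener]] = defaultdict(list)

    def register_listener(self, probe_name: str, listener: Listener) -> None:
        # A monitoring tool subscribes to a named probe.
        self._listeners[probe_name].append(listener)

    def fire(self, probe_name: str, value: Value) -> None:
        # Called when the application reaches the probe; notifies every tool.
        for listener in self._listeners[probe_name]:
            listener(probe_name, value)


registry = ProbeRegistry()
received = []
registry.register_listener(
    "probe_example", lambda name, value: received.append((name, value))
)

# The application reaches the instrumented point twice.
registry.fire("probe_example", 3)
registry.fire("probe_example", 2.5)
```

In this sketch the listener plays the role of a monitoring tool, and each call to fire corresponds to one probe event carrying an int or double value.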
3.2 MonALISA as a Publishing and Visualization System
MonALISA (which stands for Monitoring Agents using a Large Integrated Services
Architecture) is designed as a group of autonomous, agent-oriented services.
The services can be deployed anywhere on the Internet. They only need a UDP
connection to the LookUp Discovery Service at CERN, Switzerland. The services
automatically gather information about the host they are running on and can be
configured to receive UDP datagrams carrying data. MonALISA also defines
Repositories, a kind of client designed to gather specified data from
particular services and store it in a relational database. A subgroup called
Web Repositories uses a servlet engine to present historical and real-time data
in the form of statistics and graphical charts.
Sending data to the repository is made easy by ApMon, a library available for
the most popular languages. It defines operations to form UDP datagrams with an
arbitrary number of parameters of any type. We also define a list of recipients
for the datagram, which can contain any number of services. There is also a
security mechanism: a service can be configured to accept only datagrams from
given hosts or datagrams containing a defined password. UDP gives no
reliability assurance: lost datagrams are not retransmitted. This nevertheless
works well, because the speed of this solution is necessary for the amount of
data transferred.
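The fire-and-forget nature of this scheme can be sketched with plain UDP sockets in Python. This is not ApMon and does not reproduce its wire format: the JSON payload, the function names send_parameters and accept, and the sample values are assumptions made for illustration. A local loopback socket stands in for a MonALISA service.

```python
import json
import socket


def send_parameters(sock, recipients, cluster, node, params, password=""):
    """Send one datagram with name/value pairs to every recipient.

    The JSON layout is an illustrative assumption, not the ApMon format.
    """
    payload = json.dumps(
        {"password": password, "cluster": cluster, "node": node, "params": params}
    ).encode("utf-8")
    for address in recipients:
        # No acknowledgement is awaited: a lost datagram is simply lost.
        sock.sendto(payload, address)


def accept(datagram, sender_host, allowed_hosts, expected_password):
    """Receiver-side filter: accept by sender host or by password."""
    message = json.loads(datagram.decode("utf-8"))
    if sender_host in allowed_hosts or message["password"] == expected_password:
        return message
    return None


# Loopback demonstration with a local receiver standing in for a service.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
receiver.settimeout(2.0)

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send_parameters(sender, [receiver.getsockname()], "OurCluster", "node01",
                {"load": 0.42}, password="secret")

data, (host, _port) = receiver.recvfrom(65535)
message = accept(data, host, allowed_hosts=set(), expected_password="secret")
sender.close()
receiver.close()
```

The sender never waits for a reply, which keeps the per-sample cost low at the price of occasional loss. That trade-off is the one described above.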
3.3 Making an Interconnector between Two Systems
The idea of interconnecting OCM-G and MonALISA is shown in fig. 1.
Fig. 1: Architecture diagram.
To integrate these two systems we developed an application that acts as an
interconnector. It runs on the same machine as the MainSM and is bound to one
application only. At startup we pass it the connection string used to
communicate with the MainSM and the name of the application that we want to
monitor. The configuration file lists the parameters we are interested in, both
standard and probe-type ones. The application then registers itself as a
listener for probe events and requests standard parameters from the MainSM at a
defined rate. The communication is done with the OCM-G Java API, which provides
some helper classes for OMIS messages. Nevertheless, every parameter needs a
dedicated parser to extract a single value for a given variable.
At each time step, standard parameters are requested from OCM-G and the values
received are sent to MonALISA services through the ApMon Java library. The
addresses of the services are defined in configuration files, together with
optional passwords to include in datagrams. Every time a probe event is
triggered, the value of the returned variable is sent to the services.
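The two data paths described above, periodic polling of standard parameters and immediate forwarding of probe events, can be sketched as follows. This is a simplified Python illustration, not the authors' Java tool: the Interconnector class is hypothetical, and the query and send callables stand in for an OMIS request to the MainSM and an ApMon send call respectively.

```python
import time
from typing import Callable, Dict, List, Tuple

Sample = Tuple[str, float]


class Interconnector:
    """Hypothetical bridge: polls a source and forwards values to a sink."""

    def __init__(self, query: Callable[[str], float],
                 send: Callable[[str, float], None],
                 periods: Dict[str, float]) -> None:
        self._query = query      # stands in for an OMIS request to the MainSM
        self._send = send        # stands in for an ApMon send call
        self._periods = periods  # parameter name -> polling period in seconds

    def on_probe(self, name: str, value: float) -> None:
        # Probe events are forwarded immediately, outside the polling cycle.
        self._send(name, value)

    def run(self, steps: int, sleep: Callable[[float], None] = time.sleep) -> None:
        # For simplicity all parameters share the shortest configured period.
        period = min(self._periods.values())
        for _ in range(steps):
            for name in self._periods:
                self._send(name, self._query(name))
            sleep(period)


sent: List[Sample] = []
bridge = Interconnector(query=lambda name: 0.5,
                        send=lambda name, value: sent.append((name, value)),
                        periods={"load": 2.0})
bridge.run(steps=2, sleep=lambda seconds: None)  # no real waiting in the demo
bridge.on_probe("probe_example", 7.0)
```

Injecting the query and send callables keeps the sketch independent of both real APIs. A real bridge would run the polling loop and the probe listener concurrently.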
4 Sample Usage
First of all, the MainSM is needed to obtain the monitored parameters. After
being started, cg-ocmg-monitor prints a “Connection String”. Tools have to be
provided with this string to be able to connect to the MainSM.
$ cg-ocmg-monitor
MainSM Connection String: 0a000001:a655
Now we are ready to start an application providing its name and MainSM’s
“Connection String”.
$ app --ocmg-mainsm 0a000001:a655 --ocmg-appname monitored_application
The application name may be any string, but must match the application
name set in our tool’s configuration.
To pass the parameters to be monitored to MonALISA we have to provide a
configuration for our tool. Let us assume our cluster is named “OurCluster” and
we use a MonALISA service located at “example.com”. We are going to monitor the
load of nodes by sending queries once every two seconds. We also assume that
the application uses a probe function called “probe_example” and that we are
interested in its second argument¹. Thus we get the following configuration:
<?xml version="1.0"?>
<configuration>
  <cluster name="OurCluster" />
  <monalisa>
    <service address="example.com" />
  </monalisa>
  <parameters>
    <monitored name="load" frequency="2" />
    <probe app="monitored_application" name="probe_example" arg="1" />
  </parameters>
</configuration>
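How a tool might read such a file can be sketched with Python's standard xml.etree.ElementTree module. This is an illustration only, not the authors' Java implementation: the function load_config and the shape of the returned dictionary are assumptions. The frequency attribute is read as an interval in seconds, since the text describes a value of 2 as one query every two seconds.

```python
import xml.etree.ElementTree as ET

CONFIG = """<?xml version="1.0"?>
<configuration>
  <cluster name="OurCluster" />
  <monalisa>
    <service address="example.com" />
  </monalisa>
  <parameters>
    <monitored name="load" frequency="2" />
    <probe app="monitored_application" name="probe_example" arg="1" />
  </parameters>
</configuration>"""


def load_config(text):
    """Parse the configuration into plain Python structures (hypothetical helper)."""
    root = ET.fromstring(text)
    return {
        "cluster": root.find("cluster").get("name"),
        "services": [s.get("address") for s in root.iter("service")],
        # Interval in seconds between queries, keyed by parameter name.
        "monitored": {m.get("name"): float(m.get("frequency"))
                      for m in root.iter("monitored")},
        # (application name, probe name, zero-based argument index)
        "probes": [(p.get("app"), p.get("name"), int(p.get("arg")))
                   for p in root.iter("probe")],
    }


config = load_config(CONFIG)
```

Here arg="1" selects the second probe argument, because arguments are numbered from 0.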
Once the configuration is saved in a file, we can run our tool. We pass the
“Connection String” as the first argument and the configuration file name as
the second².
$ ./run.sh 0a000001:a655 ourcluster_conf.xml
The tool resumes the stopped application and passes the monitored parameters to MonALISA.
We are now ready to view the results in MonALISA, using the MonALISA
Interactive Client³. It allows clusters to be viewed in many ways, including on
a world map or as a list of groups. Each parameter can be plotted (fig. 2).
Plots are updated in real time as new data arrives at the MonALISA Service.
5 Summary
The integration of OCM-G and MonALISA is beneficial for both projects. It takes
advantage of the fine-grained parameter monitoring provided by OCM-G and a set
of very convenient publication and visualisation services coming from
MonALISA. On the one hand, OCM-G was designed on the assumption that
interactive applications have much more rigorous monitoring requirements than
conventional batch-type ones, especially with respect to the speed at which
monitoring data is delivered to a client; as a result, we obtain a truly
on-line supply of monitoring data. On the other hand, MonALISA, which features
very scalable publishing and visualisation capabilities, complements OCM-G well
and yields a good deal of synergy.
¹ Arguments are numbered starting from 0.
² Configuration filename defaults to conf.xml, if not passed as an argument.
³ Available at http://monalisa.caltech.edu
Fig. 2: MonALISA Interactive Client.
Acknowledgements. This research is partially supported by the POIG PL-Grid
project and the AGH grant 11.11.120.865.
References
1. OCM-G: http://grid.cyfronet.pl/ocmg/
2. OMIS: http://www.lrr.in.tum.de/~omis/
3. R-GMA: http://www.r-gma.org/
4. MonALISA: http://monalisa.caltech.edu
5. Balaton, Z., Kacsuk, P., Podhorszki, N.; Application Monitoring in the Grid with
GRM and PROVE. In: Proc. ICCS 2001, Part I, pp. 253-262, LNCS 2073, Springer,
2001.
6. Ribler, R.L., Vetter, J.S., Simitci, H., Reed, D.A.; Autopilot: Adaptive Control
of Distributed Applications. In: Proc. 7th IEEE HPDC, 1998.
7. Shaffer, E., Reed, D.A., Whitmore, S., Schaeffer, B.; Virtue: Performance
Visualization of Parallel and Distributed Applications. In: Computer, vol. 32,
no. 12, pp. 44-51, 1999.
8. Chalakornkosol, N., Uthayopas, P.; Monitoring the Dynamics of Grid Environment
using Grid Observer. Poster presentation at IEEE CCGRID2003, Toshi Center,
Tokyo, May 12-15, 2003.
9. Massie, M.L., Chun, B.N., Culler, D.E.; The ganglia distributed monitoring
system: design, implementation, and experience. In: Parallel Computing, vol. 30,
no. 7, pp. 817-840, 2004.