Integration of the OCM-G Monitoring System into the MonALISA Infrastructure

Wlodzimierz Funika, Bartosz Jakubowski, and Jakub Jaroszewski
Institute of Computer Science, AGH, al. Mickiewicza 30, 30-059, Kraków, Poland
email: [email protected], [bjakubow,jjarosz]@student.agh.edu.pl

Abstract

The main intention of this paper is to present a way to integrate the grid monitoring system OCM-G into the monitoring agent infrastructure of MonALISA. OCM-G is designed to monitor distributed and parallel applications and to provide on-line information about their execution status as well as some parameters of the nodes they are running on. To provide the user with a wide set of visualization capabilities, we supply this information to the services of a scalable performance visualization infrastructure, MonALISA, through common protocols. We feed the services with data which can then be accessed by a MonALISA client or by a repository that publishes them on a server. A sample configuration and output are provided to illustrate the operation of the integrated tools.

1 Introduction

Now that the Grid has become an increasingly active player in HPC, the importance of monitoring can hardly be overstated. It is crucial to provide information about computation nodes and the jobs running on them. A product of the CrossGrid project, the Grid-enabled OMIS-Compliant Monitoring system (OCM-G) [1], allows monitoring a number of desired parameters of both the execution infrastructure and the interactive jobs running on it. Its compliance with the On-line Monitoring Interface Specification (OMIS) [2] provides a well-defined interface for interactions with data suppliers and monitoring clients, such as performance analysis tools. OCM-G features a wide palette of monitoring capabilities, including the capture of user-defined measurement values using special instrumentation techniques.
The idea that motivated our research was to transform the monitoring data coming from OCM-G into an easily interpretable form and to make them available to a large number of clients. We concentrated our attention on MonALISA [4]. Its services are capable of providing on-line chart plotting from simple datagrams containing only variable values. The services are registered in the MonALISA LookUp Service so that clients can discover them. There can also be MonALISA Repositories, which are able to perform more sophisticated actions on the data and publish the information in several ways, including through a Tomcat server.

What we worked on is an application intended to pass specified information from OCM-G to MonALISA. Both of these systems provide well-specified APIs, which makes the task easier. The application is fully configurable through an XML file.

After a short overview of related work, the paper presents the tools OCM-G and MonALISA in the context of their roles in the research. The next section covers our vision of the interconnection between them and the features we use for integration. Then we describe the details of our application, followed by a usage example.

2 Related Work

The domain of Grid computing is not short of monitoring tools exploiting various visualisation approaches. There is the widely recognized R-GMA architecture [3], which underlies plenty of performance analysis tools. However, since R-GMA exploits the idea of a relational monitoring database, the information it supplies to the user lends itself to longer time-related analysis.

GRM [5] is a semi-online monitor of applications running in a distributed heterogeneous system. It consists of local monitors, which are responsible for writing a complete trace of the application as well as statistical data into a shared buffer. The buffer is then turned into a file by the main monitor.
The file is then sent to the PROVE visualization tool, which is capable of displaying this detailed information both off-line and semi-on-line.

Autopilot [6] is another distributed monitor that uses local daemons to collect data about both system and network performance. It uses the concept of sensors and actuators that are placed in the source code of the application. For visualisation it uses the Virtue [7] toolkit. It defines a visualisation hierarchy for managing performance details and shows data as graphs drawn from different abstraction levels.

SCMSWeb [8], unlike the two tools mentioned above, has its own visual interface. It generates PHP files which can be deployed directly on a server. It is designed to monitor performance and system usage metrics. Monitors can be organized hierarchically to improve scalability. Information is passed in XML files. Statistical performance data at different abstraction levels can be retrieved and presented as graphs.

Ganglia [9] uses the same concept as SCMSWeb. It uses a multicast-based listen/announce protocol and a tree of point-to-point connections amongst representative cluster nodes to federate clusters and aggregate their state. It has a PHP front-end and uses RRDtool to visualise data. Ganglia uses XML for data representation and XDR for compact data transport.

3 Integration Strategy

Integration is realized by an application that works as a kind of transformer and transmitter. It obtains data from OCM-G through the OMIS interface, extracts the relevant information from the messages, and passes it to the MonALISA service. Below we briefly describe the systems used.

3.1 OCM-G as a Monitoring Data Supplier

OCM-G is a system providing on-line information about cluster status. It is composed of Service Managers (SMs), which are distributed across the Grid Computation Elements. They gather information from Local Monitor processes, which are started on the Worker Nodes. The data is passed using Globus IO communication.
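The layered collection described above, with Service Managers gathering data from Local Monitors, can be pictured with a toy model. The Python sketch below is only an illustration under stated assumptions: the class names LocalMonitor and ServiceManager and the method collect are hypothetical, the "load" value is a plain number, and nothing here uses the OCM-G API, OMIS messages, or Globus IO.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class LocalMonitor:
    """Hypothetical stand-in for a Local Monitor on one Worker Node."""
    node: str
    load: float

    def collect(self) -> Dict[str, float]:
        # Report the single value observed on this node.
        return {self.node: self.load}


@dataclass
class ServiceManager:
    """Hypothetical stand-in for a Service Manager.

    Its children may be Local Monitors or other Service Managers,
    so managers can be stacked into a tree.
    """
    name: str
    children: List[Union["ServiceManager", LocalMonitor]] = field(default_factory=list)

    def collect(self) -> Dict[str, float]:
        # Merge whatever every child reports into one mapping.
        merged: Dict[str, float] = {}
        for child in self.children:
            merged.update(child.collect())
        return merged


# Two site-level managers below one top-level manager.
site_a = ServiceManager("sm-a", [LocalMonitor("wn1", 0.5), LocalMonitor("wn2", 1.25)])
site_b = ServiceManager("sm-b", [LocalMonitor("wn3", 0.1)])
main_sm = ServiceManager("main", [site_a, site_b])

print(main_sm.collect())
```

In this toy model, a tool that queries only the top-level manager still sees the values of all worker nodes.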
The most interesting part of OCM-G from our point of view is the MainSM, the principal Service Manager, which sits on top of the SM tree and collects all the information. It is started first and is identified by a connection string, which is then passed to the monitored applications and to every tool that wants to communicate with it through the OMIS interface. This is what our application does.

As soon as the MainSM has started, we can connect to it and query for information about the nodes. Furthermore, we can monitor applications started on the Grid. The only requirement is that they must be compiled using the cg-ocmg-cc tool and run with parameters specifying the MainSM and the application name. To obtain an arbitrary value from the application we need to define so-called probe functions. When they are called in the application, they return a value of type int or double to the SM. We can think of them as events that trigger the sending of data to the registered monitoring tools.

3.2 MonALISA as a Publishing and Visualization System

MonALISA (which stands for Monitoring Agents using a Large Integrated Services Architecture) is designed as a group of autonomous, agent-oriented services. The services can be deployed anywhere on the Internet. They only need a UDP connection to the LookUp Discovery Service at CERN, Switzerland. The services automatically gather information about the host they are running on and can be configured to receive UDP datagrams carrying data.

MonALISA also defines Repositories, a kind of client designed to gather specified data from particular services and store them in a relational database. A subgroup called Web Repositories uses a Servlet Engine to present historical and real-time data in the form of statistics and graphical charts. Sending data to the repository is made easy by ApMon, a library available for the most popular languages.
It defines operations to form UDP datagrams with an arbitrary number of parameters of any type. We also define a list of recipients for the datagram, which can contain any number of services. There is also a means of security: a service can be configured to accept only datagrams from given hosts or datagrams containing a defined password. UDP gives no reliability assurance: datagrams that are lost will not be retransmitted. This nevertheless works well, because of the speed of this solution, which is necessary for the amount of data transferred.

3.3 Making an Interconnector between Two Systems

The idea of interconnecting OCM-G and MonALISA is shown in Fig. 1.

Fig. 1: Architecture diagram.

To integrate these two systems we developed an application which acts as an interconnector. It runs on the same machine as the MainSM and is bound to one application only. At startup we pass it the connection string to communicate with the MainSM and the name of the application that we want to monitor. The configuration file contains the parameters we are interested in, both standard and probe-type ones. The application then registers itself as a listener for probe events and requests standard parameters from the MainSM at a defined rate.

The communication is done with the OCM-G Java API, which provides some helper classes for OMIS messages. Nevertheless, every parameter needs a special parser to extract a single value for a given variable. At each timestep, standard parameters are requested from OCM-G and the values received are sent to MonALISA services through the ApMon Java library. The addresses of the services are defined in the configuration file together with optional passwords to include in datagrams. Every time a probe event is triggered, the value of the variable returned is sent to the services.

4 Sample Usage

First of all, the MainSM is needed to obtain monitored parameters. After being started, cg-ocmg-monitor prints a “Connection String”.
Tools have to be provided with this string to be able to connect to the MainSM.

    $ cg-ocmg-monitor
    MainSM Connection String: 0a000001:a655

Now we are ready to start an application, providing its name and the MainSM’s “Connection String”.

    $ app --ocmg-mainsm 0a000001:a655 --ocmg-appname monitored_application

The application name may be any string, but it must match the application name set in our tool’s configuration.

To pass the parameters to be monitored to MonALISA, we have to provide a configuration for our tool. Let us assume our cluster is named “OurCluster” and we use a MonALISA service located at “example.com”. We are going to monitor the load of the nodes by sending queries once every two seconds. We also assume that the application uses a probe function called “probe_example” and that we are interested in its second argument¹. Thus we get the following configuration:

    <?xml version="1.0"?>
    <configuration>
      <cluster name="OurCluster" />
      <monalisa>
        <service address="example.com" />
      </monalisa>
      <parameters>
        <monitored name="load" frequency="2" />
        <probe app="monitored_application" name="probe_example" arg="1" />
      </parameters>
    </configuration>

Having saved the configuration in a file, we can run our tool. We pass the “Connection String” as the first argument and the configuration file name as the second².

    $ ./run.sh 0a000001:a655 ourcluster_conf.xml

The tool resumes the stopped application and passes the monitored parameters to MonALISA. We are now ready to view the results in MonALISA, which we can do using the MonALISA Interactive Client³. It allows viewing clusters in many ways, including their presentation on a world map or as a list of groups. Each parameter can be plotted (Fig. 2). Plots are updated in real time as new data arrives at the MonALISA Service.

5 Summary

The integration of OCM-G and MonALISA is beneficial for both projects.
It takes advantage of the fine-grained parameter monitoring provided by OCM-G and of a set of very convenient publication and visualisation services coming from MonALISA. On the one hand, owing to the design idea of OCM-G, which assumed that interactive applications have much more rigorous monitoring requirements than conventional batch-type ones, especially with respect to the speed at which monitoring data is delivered to a client, we obtain a truly on-line supply of monitoring data. On the other hand, MonALISA, which features very scalable publishing and visualisation capabilities, makes it possible to gain a good deal of synergy with OCM-G.

Fig. 2: MonALISA Interactive Client.

¹ Arguments are numbered starting from 0.
² Configuration filename defaults to conf.xml if not passed as an argument.
³ Available at http://monalisa.caltech.edu

Acknowledgements. This research is partially supported by the POIG PL-Grid project and the AGH grant 11.11.120.865.

References

1. OCM-G: http://grid.cyfronet.pl/ocmg/
2. OMIS: http://www.lrr.in.tum.de/~omis/
3. R-GMA: http://www.r-gma.org/
4. MonALISA: http://monalisa.caltech.edu
5. Balaton, Z., Kacsuk, P., Podhorszki, N.; Application Monitoring in the Grid with GRM and PROVE. In: Proc. ICCS 2001, Part I, pp. 253-262, LNCS 2073, Springer, 2001.
6. Ribler, R.L., Vetter, J.S., Simitci, H., and Reed, D.A.; Autopilot: Adaptive Control of Distributed Applications. In: Proc. 7th IEEE HPDC, 1998.
7. Shaffer, E., Reed, D.A., Whitmore, S., Schaeffer, B.; Virtue: Performance Visualization of Parallel and Distributed Applications. In: Computer, vol. 32, no. 12, pp. 44-51, 1999.
8. Chalakornkosol, N., Uthayopas, P.; Monitoring the Dynamics of Grid Environment using Grid Observer. Poster presentation at IEEE CCGRID2003, Toshi Center, Tokyo, May 12-15, 2003.
9. Massie, M.L., Chun, B.N., Culler, D.E.; The ganglia distributed monitoring system: design, implementation, and experience. In: Parallel Computing, Volume 30, Issue 7, pp. 817-840, 2004.
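As a supplementary illustration of the configuration format from Section 4, the following Python sketch reads the sample file and picks the probe argument it names. It is not the tool described in this paper, which is built on the OCM-G and ApMon Java APIs. The names Monitored, Probe, Config, parse_config and select_probe_value are hypothetical, and reading the "frequency" attribute as a period in seconds is an assumption based on the "once every two seconds" wording in Section 4.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Sequence

# The sample configuration from Section 4.
SAMPLE = """<?xml version="1.0"?>
<configuration>
  <cluster name="OurCluster" />
  <monalisa>
    <service address="example.com" />
  </monalisa>
  <parameters>
    <monitored name="load" frequency="2" />
    <probe app="monitored_application" name="probe_example" arg="1" />
  </parameters>
</configuration>"""


@dataclass
class Monitored:
    name: str
    period_s: float  # assumption: "frequency" is the query period in seconds


@dataclass
class Probe:
    app: str
    name: str
    arg: int  # zero-based index of the probe argument of interest


@dataclass
class Config:
    cluster: str
    services: List[str]
    monitored: List[Monitored]
    probes: List[Probe]


def parse_config(text: str) -> Config:
    """Turn the XML configuration into plain Python objects."""
    root = ET.fromstring(text)
    return Config(
        cluster=root.find("cluster").get("name"),
        services=[s.get("address") for s in root.findall("monalisa/service")],
        monitored=[
            Monitored(m.get("name"), float(m.get("frequency")))
            for m in root.findall("parameters/monitored")
        ],
        probes=[
            Probe(p.get("app"), p.get("name"), int(p.get("arg")))
            for p in root.findall("parameters/probe")
        ],
    )


def select_probe_value(probe: Probe, args: Sequence[float]) -> float:
    """Pick the configured argument from the values a probe call carried."""
    return args[probe.arg]


cfg = parse_config(SAMPLE)
print(cfg.cluster, cfg.services, cfg.monitored[0].period_s)
# With arg="1", the second of the probe's arguments is selected.
print(select_probe_value(cfg.probes[0], [10, 20, 30]))
```

Keeping the argument index in the configuration rather than in code means the same probe function can feed different values to MonALISA without recompiling the monitored application.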