General Grid Monitoring Infrastructure (GGMI)
Peter Kacsuk and Norbert Podhorszki
MTA SZTAKI

General Grid Monitoring Infrastructure (GGMI)
[Diagram: GGMI consists of the Grid Status Monitoring Infrastructure, GSMI (R-GMA), with the tools PULSE and R-GMA Browser, and the Grid Application Monitoring Infrastructure, GAMI, with the tools PROVE and GRM.]

Performance comparison of GSMI and GAMI
● Performance measurement
  – Test application: a loop instrumented with GRM
  – A loop of N iterations generates 2N+2 events (a loop-begin and a loop-end event per iteration, plus start and exit)
  – 1 machine
  – 3 machines
    ● M1: Producer and ProducerServlet
    ● M2: All the other servlets including ConsumerServlet
    ● M3: Consumer

Performance of GSMI
[Figures: "R-GMA, 1 machine" and "R-GMA 3 machines". Application: Loop N times. x-axis: N (0 to 1000); y-axis: time [s]; series: AppTotal, Total.]
● AppTotal = time between inserting first and last event
● Total = time between inserting first event and receiving the last event in Consumer
● On 3 machines, at N=1000, R-GMA starts losing events: only 1848, 1780 of 2002 events were received
● For N=10K, the test never finishes (it ran for at least one night)

Performance of GAMI
[Figures: "GAMI, 1 machine" and "GAMI 3 machines". x-axis: N (0 to 100000); y-axis: time [s]; series: GAMI AppTotal, GAMI Total.]
● AppTotal = time between inserting first and last event
● Total = time between inserting first event and receiving the last event in Consumer
● No lost events
● Linear scaling

R-GMA vs GAMI on 3 machines
[Figure: x-axis: N (0 to 10000, highlighted on the slide as "10.000 !!!"); y-axis: time [s]; series: AppTotal and Total for R-GMA and for GAMI.]

GAMI: 1 machine vs 3 machines
[Figure: x-axis: N (0 to 100000, highlighted on the slide as "100.000 !!!"); y-axis: time [s]; series: AppTotal and Total for 1 machine and for 3 machines.]
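The event counts quoted in the performance slides can be checked with a short calculation. The sketch below is illustrative only: the function names are hypothetical and not part of GRM or R-GMA, and it assumes exactly two events per loop iteration plus one start and one exit event, as stated in the measurement setup.

```python
def expected_events(n_iterations: int) -> int:
    """Events produced by the instrumented loop: loop begin and loop end
    for every iteration, plus one start and one exit event (2N + 2)."""
    return 2 * n_iterations + 2


def loss_fraction(expected: int, received: int) -> float:
    """Fraction of the expected events that never reached the Consumer."""
    return (expected - received) / expected


if __name__ == "__main__":
    n = 1000
    expected = expected_events(n)  # 2 * 1000 + 2
    # Received counts reported for R-GMA on 3 machines at N = 1000.
    for received in (1848, 1780):
        lost = expected - received
        print(f"N={n}: expected {expected}, received {received}, "
              f"lost {lost} ({loss_fraction(expected, received):.1%})")
```

For N=1000 this gives 2002 expected events, so the reported counts of 1848 and 1780 correspond to 154 and 222 lost events, roughly 7.7% and 11.1% of the trace.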
GAMI structure
[Diagram: PROVE runs on the local host. Site 1 has a Main Monitor (MM) and Hosts 1 and 2, each running a Local Monitor (LM) and application processes. Site 2 has a Main Monitor (MM) and Host 1 running a Local Monitor (LM) and an application process.]

GAMI
● Goal: deliver trace data from the application to the user efficiently
  – Uses TCP socket communication
  – Data is in XDR format and could be optimised for TCP transmission
  – Two software hops between application and GRM: local and main monitors
  – One hardware hop: host of main monitor

Steps of application monitoring
● Step 1: user submits a job (gets GID from the broker)
● Step 2: user starts PROVE with parameter GID
● Step 3: PROVE looks for the execution site (search in R-GMA)
● Step 4: PROVE looks for the address of the GAMI Main Monitor of the execution site (search in R-GMA)
● Step 5: PROVE subscribes for the application trace at the GAMI Main Monitor
● Step 6: GAMI Main Monitor associates the application job id (GID) with the Unix process ids

Problems of Application Monitoring
● Problem 1: To find the execution site of the application by PROVE
  – Where is it running? -> machineX.siteY
● Problem 2: To find the monitor to be connected
  – What is the address of GAMI Main Monitor running at siteY?
● Problem 3: To find the application by the GAMI Main Monitor
  – What processes (PIDs) belong to application GID?
● Solution: The info needed for solving these problems should be published in R-GMA => integration of R-GMA and GAMI needed

Problems
● Problem 1: To find the execution site of the application by PROVE
  – Where is it running? -> machineX.siteY
  – Broker -> R-GMA (discussion with WP1)
● Problem 2: To find the monitor to be connected
  – What is the address of GAMI Main Monitor running at siteY?
  – GAMI Main Monitor -> R-GMA

Problems
● Problem 3: To find the application by the GAMI Main Monitor
  – What processes (PIDs) belong to application GID?
● Problem to be solved: 5 levels of job/process ids
  – GID (generated by the resource broker)
  – Condor-G ID
  – GRAM ID
  – Local job manager ID
  – Process ID
● Discussion with WP1

Temporary solution for the 3rd problem
● User defines a unique id for the application
● Application process publishes this id to the GAMI Local Monitor
● PROVE will use this id for collecting trace data

User support tools
[Diagram: General Grid Monitoring Infrastructure (GGMI) with its tools. PULSE and R-GMA Browser belong to the Grid Status Monitoring Infrastructure, GSMI (R-GMA); PROVE and GRM belong to the Grid Application Monitoring Infrastructure, GAMI.]

Tools
● Pulse:
  – Analysis and presentation of Grid performance data
● R-GMA browser:
  – Web-based browser for available schemas and producers within the R-GMA
● GRM:
  – Instrumentation library for trace collection
  – On-line and off-line monitoring of sequential and MPI applications
● PROVE:
  – On-line and off-line visualization of traces for sequential and MPI applications

Documents and reports
● User's Manual for the stand-alone GRM/PROVE
● GRM/PROVE User's Guide
● Versions of GRM
● Performance Monitoring, Analysis and Presentation for Grid Applications
  – Technical report about GRM within the EU-DataGrid project
● http://www.lpds.sztaki.hu/~pnorbert/grm/

Publications
● From Cluster Monitoring to Grid Monitoring Based on GRM
  – EuroPar'2001, Manchester
● Application Monitoring in the Grid with GRM and PROVE
  – Proc. of the International Conference on Computational Science - ICCS 2001, San Francisco
● Pulse: A Tool for Presentation and Analysis of Grid Performance Data
  – MIPRO'2003, Opatija
● Presentation and Analysis of Grid Performance Data
  – EuroPar'2003, Klagenfurt
● http://www.lpds.sztaki.hu/~pnorbert/edg/publications/

Summary
● Advantages of the concept:
  – Gives a full Grid monitoring infrastructure including both
    ● Status monitoring
    ● Application monitoring
  – Supports on-line and off-line MPI application monitoring and visualization
  – Increases the chance that it will be used by LCG-2
  – No special or extra requirement for R-GMA
  – Integration will be done by SZTAKI
  – Gives the potential of competing with the US solutions
    ● Already two prestigious papers at EuroPar'01 and EuroPar'03
    ● Further potential publication in JOGC