GRM+R-GMA+GridLab monitor

General Grid Monitoring Infrastructure (GGMI)
Peter Kacsuk and Norbert Podhorszki
MTA SZTAKI
General Grid Monitoring Infrastructure (GGMI)
[Diagram: GGMI comprises the Grid Status Monitoring Infrastructure (GSMI, i.e. R-GMA) and the Grid Application Monitoring Infrastructure (GAMI). Tools shown: PULSE, R-GMA Browser, PROVE and GRM.]
Performance comparison of GSMI and GAMI
● Performance measurement
  – Loop, instrumented with GRM
  – A loop of N iterations generates 2N+2 events (N loop begin + N loop end + start + exit)
  – 1 machine
  – 3 machines
    ● M1: Producer and ProducerServlet
    ● M2: All the other servlets including ConsumerServlet
    ● M3: Consumer
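As a quick check of the event count above, the following sketch simulates an instrumented loop and counts the events it would emit. The event names and the `run_instrumented_loop` helper are illustrative assumptions, not GRM's actual instrumentation API.

```python
def run_instrumented_loop(n):
    """Return the trace events a loop of n iterations would emit (hypothetical event names)."""
    events = ["start"]                      # one event at program start
    for i in range(n):
        events.append(("loop_begin", i))    # one event when iteration i begins
        events.append(("loop_end", i))      # one event when iteration i ends
    events.append("exit")                   # one event at program exit
    return events


def expected_event_count(n):
    """Closed form from the slide: 2N + 2 events for a loop of N iterations."""
    return 2 * n + 2


for n in (0, 10, 1000):
    assert len(run_instrumented_loop(n)) == expected_event_count(n)

print(expected_event_count(1000))  # 2002
```

For N=1000 this gives 2002 events, the total quoted in the R-GMA measurements.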
Performance of GSMI
[Figure: two charts, "R-GMA, 1 machine" and "R-GMA 3 machines". x-axis: N from 0 to 1000 (application: loop N times); y-axis: time [s]; series: AppTotal and Total.]
● AppTotal = time between inserting the first and the last event
● Total = time between inserting the first event and receiving the last event in the Consumer
● On 3 machines, at N=1000 R-GMA starts losing events: only 1848 and 1780 of the 2002 events were received.
● For N=10K, the test never finishes (it ran for at least one night)
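The loss figures above can be turned into loss rates with a few lines of arithmetic. This is only a worked calculation on the numbers quoted on the slide; the function names are illustrative.

```python
def expected_events(n):
    """A loop of n iterations generates 2n + 2 events."""
    return 2 * n + 2


def loss_fraction(received, n):
    """Fraction of the expected events that never reached the Consumer."""
    expected = expected_events(n)
    return (expected - received) / expected


# Received-event counts reported for N=1000 on 3 machines.
for received in (1848, 1780):
    lost = expected_events(1000) - received
    print(f"received {received}: lost {lost} events ({loss_fraction(received, 1000):.1%})")
```

With these inputs, 154 and 222 events are missing, roughly 7.7% and 11.1% of the 2002 expected.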
Performance of GAMI
[Figure: two charts, "GAMI, 1 machine" and "GAMI 3 machines". x-axis: N from 0 to 100000; y-axis: time [s]; series: GAMI AppTotal and GAMI Total.]
● AppTotal = time between inserting the first and the last event
● Total = time between inserting the first event and receiving the last event in the Consumer
● No events lost
● Linear scaling
R-GMA vs GAMI on 3 machines
[Figure: time [s] against N for R-GMA and GAMI on 3 machines; series: AppTotal and Total for each; x-axis: N from 0 to 10000.]
GAMI: 1 machine vs 3 machines
[Figure: time [s] against N for GAMI; series: AppTotal-1, Total-1, AppTotal-3 and Total-3; x-axis: N from 0 to 100000.]
Annotations on the charts: "10.000 !!!" and "100.000 !!!"
GAMI structure
[Diagram: PROVE runs on the Local Host. Site 1 contains a Main Monitor (MM) and two hosts (Host 1, Host 2), each with a Local Monitor (LM) and application processes. Site 2 contains a Main Monitor (MM) and one host (Host 1) with a Local Monitor (LM) and an application process.]
GAMI
● Goal: deliver trace data from the application to the user efficiently.
  – Uses TCP socket communication
  – Data is in XDR format and could be optimised for TCP transmission
  – Two software hops between application and GRM: local and main monitors
  – One hardware hop: host of main monitor
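To make the XDR point concrete, here is a minimal sketch of how one trace event could be serialised following XDR rules (big-endian fields, strings length-prefixed and padded to a 4-byte boundary). The event layout (type, process id, timestamp, label) is an assumption made for illustration; the slides do not describe the real GRM record format.

```python
import struct


def xdr_pack_string(text):
    """XDR string: 4-byte big-endian length, the bytes, zero padding to a multiple of 4."""
    data = text.encode("utf-8")
    padding = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding


def xdr_unpack_string(buf, offset):
    """Decode an XDR string at offset; return (text, offset after the padded string)."""
    (length,) = struct.unpack_from(">I", buf, offset)
    start = offset + 4
    text = buf[start:start + length].decode("utf-8")
    padded = length + (4 - length % 4) % 4
    return text, start + padded


def encode_event(event_type, process_id, timestamp_us, label):
    """Hypothetical layout: unsigned int, unsigned int, unsigned hyper, string."""
    header = struct.pack(">IIQ", event_type, process_id, timestamp_us)
    return header + xdr_pack_string(label)


def decode_event(buf):
    """Inverse of encode_event for the same hypothetical layout."""
    event_type, process_id, timestamp_us = struct.unpack_from(">IIQ", buf, 0)
    label, _ = xdr_unpack_string(buf, 16)  # fixed-size header is 4 + 4 + 8 bytes
    return event_type, process_id, timestamp_us, label


message = encode_event(1, 4242, 1_000_000, "loop_begin")
assert len(message) % 4 == 0               # every XDR item is 4-byte aligned
assert decode_event(message) == (1, 4242, 1_000_000, "loop_begin")
```

A byte string like `message` could then be written to a connected TCP socket with `sock.sendall(message)`. Batching several events per write is one way the encoding could be tuned for TCP transmission.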
Steps of application monitoring
● Step 1: user submits a job (gets GID from the broker)
● Step 2: user starts PROVE with parameter GID
● Step 3: PROVE looks for the execution site (search in R-GMA)
● Step 4: PROVE looks for the address of the GAMI Main Monitor of the execution site (search in R-GMA)
● Step 5: PROVE subscribes for the application trace at the GAMI Main Monitor
● Step 6: GAMI Main Monitor associates the application job id (GID) with the Unix process ids.
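Steps 3 to 6 amount to a chain of lookups. The sketch below uses plain dictionaries as stand-ins for the information that would be published in R-GMA and kept by the Main Monitor. Every name, address and id in it is hypothetical, and no real R-GMA API is used.

```python
# Hypothetical stand-ins for records published in R-GMA.
EXECUTION_SITE = {"gid-001": "machineX.siteY"}   # GID -> host where the job runs
MAIN_MONITOR = {"siteY": ("mm.siteY", 9000)}     # site -> Main Monitor address (made up)

# Hypothetical association kept by the GAMI Main Monitor.
JOB_PROCESSES = {"gid-001": [4242, 4243]}        # GID -> Unix process ids


def locate_trace_source(gid):
    """Follow steps 3-6: GID -> execution site -> Main Monitor -> process ids."""
    host = EXECUTION_SITE[gid]                   # step 3: find the execution site
    site = host.split(".", 1)[1]                 # "machineX.siteY" -> "siteY"
    monitor_address = MAIN_MONITOR[site]         # step 4: find the Main Monitor
    pids = JOB_PROCESSES[gid]                    # step 6: GID -> PIDs (done by the monitor)
    return monitor_address, pids                 # step 5 would subscribe at monitor_address


print(locate_trace_source("gid-001"))
```

Each dictionary corresponds to one piece of information that has to be published somewhere before PROVE can reach the trace.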
Problems of Application Monitoring
● Problem 1: To find the execution site of the application by PROVE
  – Where is it running? -> machineX.siteY
● Problem 2: To find the monitor to be connected
  – What is the address of the GAMI Main Monitor running at siteY?
● Problem 3: To find the application by the GAMI Main Monitor
  – What processes (PIDs) belong to application GID?
● Solution: The info needed for solving these problems should be published in R-GMA => integration of R-GMA and GAMI needed
Problems
● Problem 1: To find the execution site of the application by PROVE
  – Where is it running? -> machineX.siteY
  – Broker -> R-GMA (discussion with WP1)
● Problem 2: To find the monitor to be connected
  – What is the address of the GAMI Main Monitor running at siteY?
  – GAMI Main Monitor -> R-GMA
Problems
● Problem 3: To find the application by the GAMI Main Monitor
  – What processes (PIDs) belong to application GID?
● Problem to be solved: 5 levels of job/process ids
  – GID (generated by the resource broker)
  – Condor-G ID
  – GRAM ID
  – Local job manager ID
  – Process ID
● Discussion with WP1
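The difficulty with five id levels is that the Main Monitor needs every link of the chain before it can reach the process ids. A minimal sketch of that resolution follows, assuming each level publishes a mapping to the next one. The mappings and identifier strings are invented for illustration.

```python
ID_LEVELS = ["GID", "Condor-G ID", "GRAM ID", "Local job manager ID", "Process ID"]


def resolve_pids(gid, mappings):
    """Walk the chain GID -> ... -> Process ID.

    mappings is a list of four dicts, one per transition between adjacent
    levels; each maps an id to the list of ids it corresponds to one level down.
    """
    current = [gid]
    for level_map in mappings:
        next_ids = []
        for identifier in current:
            next_ids.extend(level_map.get(identifier, []))  # missing link -> nothing found
        current = next_ids
    return current


# Hypothetical example data, one dict per transition.
example = [
    {"gid-001": ["condor-17"]},
    {"condor-17": ["gram-5"]},
    {"gram-5": ["pbs-900"]},
    {"pbs-900": [4242, 4243]},
]
print(resolve_pids("gid-001", example))  # [4242, 4243]
```

If any one of the four mappings is unavailable, the result is empty, which is why the slide flags this as a problem to be discussed.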
Temporary solution for the 3rd problem
● The user defines a unique id for the application
● The application process publishes this id to the GAMI Local Monitor
● PROVE will use this id for collecting trace data
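The temporary solution can be pictured as a small registry inside the Local Monitor, keyed by the user-defined id. The class below is a hypothetical sketch of that idea, not the actual GAMI Local Monitor interface.

```python
from collections import defaultdict


class LocalMonitor:
    """Hypothetical registry: processes announce a user-defined application id."""

    def __init__(self):
        self.pid_to_app = {}                 # process id -> user-defined application id
        self.traces = defaultdict(list)      # application id -> collected (pid, event) pairs

    def register(self, app_id, pid):
        """Called by the application process to publish its user-defined id."""
        self.pid_to_app[pid] = app_id

    def record(self, pid, event):
        """Store an event; events from unregistered processes are rejected."""
        app_id = self.pid_to_app.get(pid)
        if app_id is None:
            return False
        self.traces[app_id].append((pid, event))
        return True

    def trace_for(self, app_id):
        """What a tool such as PROVE would request, using the same user-defined id."""
        return list(self.traces.get(app_id, []))


monitor = LocalMonitor()
monitor.register("my-run-42", 4242)
monitor.record(4242, "loop_begin")
monitor.record(9999, "loop_begin")           # unknown process: ignored
print(monitor.trace_for("my-run-42"))        # [(4242, 'loop_begin')]
```

Because the user supplies the id, no mapping through the broker, Condor-G, GRAM or local job manager ids is needed.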
User support tools
[Diagram: General Grid Monitoring Infrastructure (GGMI). PULSE and R-GMA Browser are shown with the Grid Status Monitoring Infrastructure (GSMI, R-GMA); PROVE and GRM are shown with the Grid Application Monitoring Infrastructure (GAMI).]
Tools
● Pulse:
  – Analysis and presentation of Grid performance data
● R-GMA browser:
  – Web-based browser for available schemas and producers within R-GMA
● GRM:
  – Instrumentation library for trace collection
  – On-line and off-line monitoring of sequential and MPI applications
● PROVE:
  – On-line and off-line visualization of traces for sequential and MPI applications
Documents and reports
● User's Manual for the stand-alone GRM/PROVE
● GRM/PROVE User's Guide
● Versions of GRM
● Performance Monitoring, Analysis and Presentation for Grid Applications
  – Technical report about GRM within the EU-DataGrid project
● http://www.lpds.sztaki.hu/~pnorbert/grm/
Publications
● From Cluster Monitoring to Grid Monitoring Based on GRM
  – EuroPar'2001, Manchester
● Application Monitoring in the Grid with GRM and PROVE
  – Proc. of the International Conference on Computational Science - ICCS 2001, San Francisco
● Pulse: A Tool for Presentation and Analysis of Grid Performance Data
  – MIPRO'2003, Opatija
● Presentation and Analysis of Grid Performance Data
  – EuroPar'2003, Klagenfurt
● http://www.lpds.sztaki.hu/~pnorbert/edg/publications/
Summary
● Advantages of the concept:
  – Gives a full Grid monitoring infrastructure including both
    ● Status monitoring
    ● Application monitoring
  – Supports on-line and off-line MPI application monitoring and visualization
  – Increases the chance that it will be used by LCG-2
  – No special or extra requirement for R-GMA
  – Integration will be done by SZTAKI
  – Gives the potential of competing with the US solutions
    ● Already two prestigious papers at EuroPar'01 and EuroPar'03
    ● Further potential publication in JOGC