Admin Matters
Enabling Grids for E-sciencE

• Vera Hanser – NDGF
• Jan Astalos – IISAS
• COD dinner downtown on Thursday night: fill in the attendance sheet if interested
• COD-15: Lyon, 06-08 Feb 2008

EGEE-II INFSO-RI-031688 ROC managers meeting at EGEE 2007 conference, Budapest, October 1, 2007

COD Working groups leaders

Phone conference of COD topic leaders: Jan 11th (TBC)
• Update wiki
• Find deputies

Straightforward mandate working groups:
• GSTAT – TW
• SAM – CERN
• SAMAP – CE
- Improvement of work tools – CE
- Improvement of work practices – DE-CH/FR
- Release of updated documentation – SE/
- Integration of the existing tools – FR
- Set-up of High Availability strategy of the operational tools for CODs – IT

NEW, NEW, NEW:
- Set-up of Failover Mechanisms for Grid Core Services Inter federations (e.g. VOMS) – SWE

Proposal for COD and CE ROC 1st line support cooperation
Jan Astalos, Marcin Radecky
CE ROC
www.eu-egee.org
EGEE and gLite are registered trademarks

Rationale
• The rationale on this topic can be found at the following URL: http://goc.grid.sinica.edu.tw/gocwiki/TIC_1st_line_support_integration

Background: In the CE region, a team of technical grid experts works on an 8/5 basis to help CE site admins solve any problem with their grid site. The experts assist the site admin from the origin of the problem to its solution by actively searching for the fix, i.e. doing detailed diagnosis at the remote site, writing the necessary scripts, etc. It has happened that 1st line support was notified of a problem by the site admin and had already found a solution, yet, because of latency in the monitoring system, the site was still assigned a ticket by COD. This was the starting point for thinking about how the COD team could benefit from the existence of a 1st line support team in the region.

Integration of regional support

In daily operations, this would materialize as a specific dashboard set up in the ROC section, where CE 1st line support would handle SAM alarms for CE sites during their 1st day of occurrence. Alarms still open after that would be handled as usual by the regular COD teams.
• The mechanism is meant to be transparent for the COD activity. The CE federation would still be part of the regular COD teams, so no specific adjustments would be needed.

Finally, discussions on the modification of the tool in the CIC operations portal would be set up for Jan 1st 2008. Conclusions and analysis could be drawn at the end of EGEE-II.

1st line support in CE ROC
• On-duty shifts covering working hours
  – IISAS (4 days) and PSNC (1 day)
• Problem detection
  – SAM, Gstat, Nagios for Central Europe
• Analysis of problems
  – Diagnostic jobs/tests, remote analysis of log files, interactive Grid login tool
• Sending notifications to sites
  – Direct e-mail to site contact address, IM, Skype chat
• Assistance to site admins in problem solving
  – Interactive support, usually via IM or e-mail exchange
• SAMAP jobs for checking if the problem is solved
• Daily reports with problem summaries
  – To ROC representative + other 1st line supporters
  – Issues to be raised at the weekly operations meeting
• Sending GGUS tickets to developers, etc.
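
The first-day hand-off described under "Integration of regional support" could be sketched roughly as below. This is only an illustration, not the CIC operations portal implementation: the `Alarm` record, the `split_by_handler` function and the reading of "1st day of occurrence" as the first 24 hours are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumption: "1st day of occurrence" is read as the first 24 hours.
FIRST_DAY = timedelta(hours=24)


@dataclass
class Alarm:
    """Hypothetical minimal view of a SAM alarm for a CE site."""
    site: str
    test: str
    raised_at: datetime


def split_by_handler(alarms, now, first_day=FIRST_DAY):
    """Split open alarms into (first_line, cod) views by alarm age.

    Alarms younger than `first_day` stay with CE 1st line support;
    alarms still open after that go to the regular COD teams.
    """
    first_line = [a for a in alarms if now - a.raised_at < first_day]
    cod = [a for a in alarms if now - a.raised_at >= first_day]
    return first_line, cod


if __name__ == "__main__":
    now = datetime(2007, 10, 2, 12, 0)
    open_alarms = [
        Alarm("SITE-A", "job-submission", datetime(2007, 10, 2, 9, 0)),
        Alarm("SITE-B", "storage-access", datetime(2007, 10, 1, 8, 0)),
    ]
    first_line, cod = split_by_handler(open_alarms, now)
    print("1st line:", [a.site for a in first_line])
    print("COD:", [a.site for a in cod])
```

Because the split depends only on alarm age, the regular COD view is unchanged for alarms older than one day, which is what keeps the mechanism transparent for COD.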

Proposed cooperation with COD
• Main goals
  – To avoid tickets on problems that are already solved
  – To decrease the effort needed for alarm/ticket processing at COD level and also at site level
• Proposal
  – To inform COD about the status of problem analysis using alarm annotation
  – To pass results of detailed problem analysis to COD
  – To give sites and 1st line support one day of grace time to fix noncritical problems; if site admins do not respond to the notification from 1st line support, they will receive a ticket from COD
• Issues
  – If alarm annotation is not implemented, we can use site annotation
  – Urgent problems at sites: COD can decide to send a ticket immediately, and 1st line can use other communication channels to reach site admins
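
The proposal and its exceptions can be read as a small decision rule on the COD side. The sketch below is one possible reading, not an agreed procedure: the `AlarmState` fields, the `cod_action` function, the action names and the 24-hour grace period are hypothetical and only serve to make the rule concrete.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Tuple

# Assumption: "one day of grace time" is modelled as 24 hours.
GRACE = timedelta(days=1)


@dataclass
class AlarmState:
    """Hypothetical state of an alarm as COD would see it."""
    raised_at: datetime
    urgent: bool = False              # urgent problems skip the grace time
    solved: bool = False              # e.g. confirmed by a SAMAP re-test
    annotation: Optional[str] = None  # 1st line note on analysis status


def cod_action(alarm: AlarmState, now: datetime,
               grace: timedelta = GRACE) -> Tuple[str, str]:
    """Return (action, note), with action in {"close", "wait", "ticket"}.

    The note carries the 1st line annotation so that COD receives the
    results of the problem analysis together with the decision.
    """
    note = alarm.annotation or ""
    if alarm.solved:
        return "close", note    # avoid tickets on already solved problems
    if alarm.urgent:
        return "ticket", note   # COD can decide to send a ticket immediately
    if now - alarm.raised_at < grace:
        return "wait", note     # site and 1st line still within grace time
    return "ticket", note       # still open after grace time: COD ticket
```

If alarm annotation is not available, the same `annotation` field could be filled from a site-level annotation instead; the decision rule itself would not change.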