CIC Portal/COD Activities
Hélène Cordier
IN2P3/CNRS Computing Centre, Lyon, France
The 8th IEEE/ACM International Conference on Grid Computing (Grid 2007)
Thursday 13 July 2017

Contents
- CIC Portal Usage: who/how
- Latest Release
- Portal Characteristics
- On-going developments
- CIC portal overview for COD
- Statistics and results
- Working groups
- Zoom on Failover

Use tools
Each actor can use a set of operational tools (provided, integrated or interfaced) through the CIC Portal:
- User: communicate
- VO manager: manage static information about my VO
- Site: report on site activity, submit tests, configure
- Operator: track, report, diagnose and follow up problems
- Regional center

What do people connect to the CIC portal for?
Distribution of connections by portal area:
Area     2005   2007
home     17%    28%
COD      39%    37%
ROC      14%    5%
RC       11%    23%
VO       11%    6%
users    4%     1%
OAG      4%     0%

[Figure: average number of connections per month, December 2004 to December 2007]
[Figure: number of sent broadcasts per month, August 2006 to January 2008]
[Figure: connections and process, monthly, 2005 to January 2008]
[Figure: total number of registered VOs (133); new registrations per month]

Tasks handled by the CIC portal development team

Between October 2006 and February 2007
Task repartition per type:
- Internal tools & synchronization: 18%
- High level or political action: 9%
- Incidents and bug fixing: 25%
- Technical investigation: 5%
- Tests and verifications: 7%
- Development of new features: 6%
- Improvement of existing features: 30%
Task repartition per origin of the request:
- Failover: 7%
- OCC: 8%
- Others: 2%
- internal: 28%
- OAG + VOs: 13%
- ROCs: 17%
- COD: 25%

Between February 2007 and January 2008
Task repartition per type:
- Internal tools & synchronization: 18%
- Technical investigation: 6%
- High level or political action: 5%
- Incidents and bug fixing: 20%
- Tests and verifications: 12%
- Improvement of existing features: 20%
- Development of new features: 30%
Task repartition per origin of the request:
- Others: 15%
- internal: 17%
- Failover: 4%
- OCC: 12%
- OAG + VOs: 17%
- COD: 25%
- ROCs: 10%

Latest Release
Latest changes in 6 months
- Last technical change: authentication is now based on the full certificate DN instead of the CN.
- Work on VO ID cards:
  - changes in the database schema for VO/VOMS information
  - improved VO ID card interface
  - integration of the YAIM VO Configurator into the CIC portal
  - downloadable XML dump of VO ID card information
- Scheduled downtimes procedure
- Integration of the regional first-line support dashboard: prototype with CE

On-going developments
What is left for the next release in March:
- 2159: adapt to new components released into production, cf. the YAIM tool.
- 1559: development of a new version of the report, taking into account the feedback received.
- 1920: follow the SAM migration to GridView on the CIC portal side (IDLE).
Internal tasks include quick fixes/bug fixes, documentation, background clean-up work, and code optimization/prospective work for EGEE-III.

COD activity: CIC portal overview for COD
ARM Meeting, EGEE’07, Budapest

A tool for Grid Operators: COD dashboard
- Start of EGEE, many entry points: the operator consulted site information and monitoring tools #1 to #n separately, and used a mail client and the ticketing system directly.
- Now, a single entry point: the dashboard gathers site information and monitoring tools #1 to #n, and drives the mail sender and the ticketing system.

Interaction with EGEE services
- Operations portal (IN2P3-CC, Lyon, France): view tickets; status and ticket per site (e.g. Site1: ticket #28, Site2: ticket #32, Site3: no ticket, Site4: ticket #14)
- GGUS (FZK, Karlsruhe, Germany), through SOAP: create ticket, update ticket
- GOC-DB: site information, scheduled downtimes
- SAM (CERN, Geneva, Switzerland): test results on nodes
- Gstat (ASGC, Taipei, Taiwan): GIIS status per site

Statistics and results
[Figure: proportion of COD tickets against GGUS tickets for all ROCs, 31 July to 31 December; series: tickets opened by COD teams, tickets opened through GGUS, all GGUS tickets]

Percentage of opened tickets per service:
Month      CE   SE   SRM   RGMA   sBDII
October    39   15   14    11     6
November   34   14   18    6      10
December   29   18   21    9      8

Solution time [hours]:
                                Oct   Nov   Dec
COD tickets                     269   268   228
GGUS tickets assigned to ROCs   277   281   307
ALL SU                          364   427   709

Duties and working groups

COD Duties
- Rotation of 10 federations/teams, 1/5 weeks.
- Quarterly face-to-face meetings to update tools and procedures and to harmonize working habits.
- 10 federations over 18 months in EGEE-I.
- Working groups running for over 18 months now. There is more to it...

Working groups with a straightforward mandate:
- GSTAT: TW
- SAM: CERN
- SAMAP: CE
- topped by Tools for Improvement for COD (TIC): CE (EGEE’07)

Working groups with a broader mandate:
- Integration of the existing tools (CIC): FR. Integration platform of all COD tools to ease the daily operational job.
- Improvement of best practices: DE-CH. Identify, raise and analyse with COD how to achieve homogeneous operations.
- Release of updated documentation (OPM): SE. Documentation under constant evolution.
- Set-up of failover mechanisms for grid core services: SWE. What is done at the federation level, what is done at the project level (help needed from J. Shiers' group), what could be done (operational point of view) and what is needed at the ROC/site level (middleware point of view).
- Set-up of a high-availability strategy for the COD operational tools (FAILOVER): IT.

Zoom on Failover for Operational Tools

EGEE Failover: purpose
Propose, implement and document failover procedures for the collaboration, management and monitoring tools used in the EGEE/WLCG Grid.
- The solution is based on DNS and consists in:
  - mapping the service name to one or more destinations
  - updating this mapping whenever a failure is detected
Reference: Geographical failover for the EGEE-WLCG Grid collaboration tools, CHEP 2007, Victoria BC, Canada (September 2007)

COD work aspects to keep in EGEE-III
- Dedication: working groups are recognized within federations to provide expertise, and by federations to bring their needs to central operations.
- Collaboration: so far, each federation has found a way to contribute actively to improving its COD work environment, when not proactively leading a working group. Also, each person, tool developer or expert recognized as being of « global interest », even though outside the COD scope, has been happily integrated into this « closed community », e.g. SAMAP, and the TIC scope to monitor this aspect with a Nagios prototype.
- Flexibility: the groups are meant to evolve with their mandate over time, as new needs arise, e.g. core grid services HA, EGI.
- Anticipation: e.g. the strategy of the Operational Failover Working Group.
- Experiment: e.g. regionalisation of tools and the future modular « NGI dashboards », to widen the CE first-line support experience.

COD work aspects to evolve in EGEE-III
Mandate and assessment of the COD activity
- Integration of NDGF/NE as a COD team; other teams?
- Catch-all and global operations center: which core services are to be monitored centrally, how to monitor them and how to properly switch to backup; how to aggregate local data, and which local data would be concerned.
- Assess metrics in order to identify the most problematic middleware components and recurrently unreliable sites.
- Operational tools reliability assessment: ENOC test as a starting base?
- Strengthen the need for HA/failover of operational tools and grid core services.
Vision of the COD tools' long-term evolution: one set of tools per federation + aggregation?
- Which set of tools is to be regionalized? SAM, GOC DB, COD? What else?
- How are they going to interact? => need for a global schema, NOW.

COD work aspects to evolve in EGEE-III (continued)
Leverage « project labeled » tools in order to ensure that:
- development strategy/priorities are coherent:
  - data workflow: synchronization of GOCDB/BDII/SAM/COD
  - development strategy: depends on the strategy for the COD tools' long-term evolution
  - priority decision workflow: who drives the requests to « project labeled » tools, and how their priority is set
- operational use-cases do not remain « pending »:
  - critical tests monitoring/accounting or ARC CE
  - CA update procedure
  - need for SAM failover...
- staffing is adequate for proper reactivity, not only for bug fixes.
Interoperability/interoperations (item to be followed up)
- OSG: rather informal for the moment, BUT NOW users do have problems, and sites are the relay of their users, cf. GGUS ticket 31037.
- NDGF: existing critical test monitoring? What are the consequences on operational procedures?

Conclusions and References
Where, how and when do we address these topics? Some can be addressed here or thought about at COD meetings; some are relevant to OCC/ROC first, and COD working groups can then make suggestions/recommendations.
References:
- CIC portal: a Collaborative and Scalable Integration Platform for High Availability Grid Operations. Grid 2007 (IEEE), Austin TX, United States (September 2007).
- Geographical failover for the EGEE-WLCG Grid collaboration tools. CHEP 2007, Victoria BC, Canada (September 2007).
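The DNS-based failover approach described under "EGEE Failover: purpose" can be illustrated with a minimal Python sketch. It is a toy in-memory model under stated assumptions, not the working group's implementation: the class name `FailoverDNS`, the health-check callback and all hostnames are hypothetical, and no real DNS server is queried or updated.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

# Toy, in-memory model of DNS-based failover. The class, the health-check
# callback and every hostname below are hypothetical; no real DNS is touched.


@dataclass
class FailoverDNS:
    # Service alias -> candidate destinations, in order of preference.
    candidates: Dict[str, List[str]]
    # Alias -> destinations currently published to clients.
    published: Dict[str, List[str]] = field(default_factory=dict)

    def refresh(self, is_healthy: Callable[[str], bool]) -> None:
        """Update the published mapping after health-checking each destination."""
        for alias, hosts in self.candidates.items():
            healthy = [h for h in hosts if is_healthy(h)]
            # If every destination fails, keep the last published answer
            # instead of publishing an empty record.
            if healthy:
                self.published[alias] = healthy

    def resolve(self, alias: str) -> str:
        """Return the preferred published destination for an alias.

        Raises KeyError if no refresh has yet found a healthy destination.
        """
        return self.published[alias][0]


if __name__ == "__main__":
    dns = FailoverDNS({"portal.example.org": ["primary.example.org", "backup.example.org"]})
    down: Set[str] = set()  # hosts the simulated monitoring reports as failed

    dns.refresh(lambda host: host not in down)
    print(dns.resolve("portal.example.org"))  # primary.example.org

    down.add("primary.example.org")  # a failure is detected
    dns.refresh(lambda host: host not in down)
    print(dns.resolve("portal.example.org"))  # backup.example.org
```

In a real deployment, resolvers cache an answer for the record's TTL, so clients only follow the updated mapping once their cached entry expires; a short TTL on the service alias keeps that switch-over delay small.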