EGI-InSPIRE
EGI Network Support task force
Mario Reale, IGI / GARR
[email protected]
January 24, 2011, EGI OMB f2f meeting, Amsterdam
EGI.eu EGI-InSPIRE RI-261323 www.egi.eu

Overview
• Introduction to the Task Force
• Definition of the identified use cases
• Answers from the NGIs

Goals and duration
• Mandate: assess the current state of Network Support for EGI and formulate a proposal for it
  – Gather user requirements from NGIs
  – Assess the status of the available tools
  – Further develop and consolidate newly proposed tools
  – Identify missing bits / tools
  – Propose tools and workflows to the EGI Network Support community
  – Define a draft workplan for the next months
• Started on October 20, 2010; ended on January 21, 2011
  – around 8 working weeks in duration
  – coordinated remotely
    • met 5 times by videoconference: 20/10, 10/11, 22/11, 10/12, 14/1

Membership
• Etienne Duble, France-Grille (UREC CNRS)
• Xavier Jeannin, France-Grille (UREC CNRS)
• Esther Robles (RedIRIS)
• Alberto Escolano (RedIRIS)
• Bruno Hoeft (D-GRID KIT)
• Mario Reale (IGI GARR)
• Fulvio Galeazzi (IGI GARR)
• Alfredo Pagano (IGI GARR)
• Wenshui Chen (ASGC)
• Domenico Vicinanza (DANTE Int.Rel.Team)
• Szymon Trocha (PSNC/GN3 SA2 T3 PerfSONAR)

What has been done
• Identified 7 network-related Use Cases
• Organized a questionnaire about them for the NGIs; gathered and published the results
• Identified a strategy for each of them
  – although the strategies are specified at different levels of accuracy and technical insight
• Some of us worked on further development of tools
  – PerfSONAR live-CD, HINTS, NetJobs
• Designed the GGUS network support workflow to be implemented for EGI
• Liaised with GN3 about the current PerfSONAR status/tools

What has NOT been done
• Brought all proposed new tools to a final, frozen production status after an extensive validation phase
  – But all proposed tools can usefully be used by early adopters
• Made a world-wide, general assessment of all available tools for network monitoring and network support in general
• Developed new tools in all cases where we felt that either a brand-new tool or a major improvement of the existing ones would be required
  – Example: Network-related Scheduled Maintenances

• Identified Use Cases (7)
• Answers from the NGI Questionnaire

GGUS
• Grid Users and Site Administrators open a ticket in the GGUS support system when they think a network issue is behind the problems they are experiencing. Tickets are assigned to the GGUS Network Support Unit and processed until solved.
• We need to give a home to all network-related issues in EGI
  – they are currently unattended
• To whom should network-related issues be assigned?
  – A support team made up of network experts from volunteering NGIs or NRENs?
  – Skip the Grid community and assign tickets directly to the NRENs and/or GEANT/DANTE?
• Many parties are involved in ticket processing: Site Admins, NREN NOCs and APMs, GEANT NOC and APMs

Answers on GGUS
[Charts: "GGUS: provided answer type (n.)" and "GGUS: answer from each NGI"]
Answer n.3: Having a GGUS support unit for Network Support is useful, but tickets should be handled automatically according to a given workflow and routed to NREN/NGI contacts; there is no need for a permanent team behind this unit

EGI PERT
• Grid Users experiencing poor performance in data transfers can refer to a global EGI PERT Contact Team (with both Grid Middleware/Applications and Network know-how) to get support
• The idea would be to have a single EGI-wide team of experts with both Grid Middleware/Applications and Network know-how (merging the 2 communities)
• Expensive idea, but useful:
  – bottleneck identification involves digging into both domains and their interface/interaction
  – Middleware and Application experts (VOs, VRCs) could start by excluding issues at the higher levels of the ISO/OSI stack before NRENs and Federated EduPERT networking experts come in
• It turned out to be too expensive for the NGIs' manpower/budget
  – at least at this stage

Provided Answers on EGI PERT
[Charts: "PERT: provided answer type (n.)" and "PERT: answer from each NGI"]
Answer n.4: Having a Global EGI PERT access point for users experiencing poor performance (forming a PERT Team with added Grid know-how) is useful, but we cannot commit any resource/manpower to it

Scheduled Maintenances
• When an identified incident or a scheduled maintenance of network devices/PoPs impacts a Grid resource center/site, users, site admins and Operations teams are warned in advance (scheduled maintenance) or informed as soon as possible (incident)
• The idea would be to inform users/site admins about why things are not working when there are obvious reasons for the problems they experience
  – Currently GOCDB is used for Grid-related scheduled maintenances
• Requires NREN-NGI communication/coordination:
  – a mapping between network devices/PoPs and Grid resource centers/sites
  – a mapping between Grid resource centers/sites and users
• Can be managed using a push or a pull logic
  – Push: users subscribe to a given site and get notified
  – Pull: impacted sites publish information on a web site and users fetch it from there

Provided Answers on Scheduled Maintenances
[Charts: "Sched Maintenance: provided answer type (n.)" and "Scheduled Maintenances: answer from each NGI"]
• Answer n.3: Having a global EGI tool/service to warn users and site administrators about scheduled maintenances is useful; storing the information in one place is the solution to go for, but we cannot commit any manpower/resource to develop or maintain such a tool

Network Troubleshooting on Demand
• Grid site administrators, Operation Centers or authorized users experiencing problems in reaching a given site/resource perform troubleshooting on demand to exclude basic network issues behind the problems they are experiencing
• Requires local deployment, at the sites, of probes controlled by a central system
• Results in the introduction of different roles
• Basic checks would involve ping, traceroute, reverse DNS checks, port scan, and available bandwidth measurements

Provided answers on Network Troubleshooting on Demand
[Charts: "Troubleshooting on Demand: provided answer type (n.)" and "Troubleshooting on Demand: answer from each NGI"]
• Answer n.3: Having a network tool for troubleshooting on demand is useful, but we cannot commit any resource/manpower to help develop or test it

e2e MultiDomain monitoring
• Users and Site Administrators get network performance measurements for a subset of e2e paths within the EGI Infrastructure, based on monitoring information gathered by scheduled, periodic measurements
• Multidomain: NRENs, GEANT
• Monitoring data may include
  – Link availability (interface utilization, input errors, output drops)
  – One-way delay
  – RTT, number of hops
  – IPDV (jitter)
  – Available TCP bandwidth

Provided answers on e2e multidomain monitoring
[Charts: "e2e MD Sched Mon: provided answer type (n.)" and "e2e MultiD sched mon: answer from each NGI"]
Answer n.3: Having an e2e MultiDomain monitoring tool for a specific subset of the whole set of e2e paths within EGI is useful, but we cannot commit resources or manpower and cannot afford to deploy anything locally at the sites
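The e2e monitoring slide lists one-way delay and IPDV (jitter) among the candidate metrics. As a rough illustration of how the two relate, here is a minimal Python sketch that derives IPDV samples from a series of one-way delay measurements, taking IPDV as the difference between the one-way delays of consecutive packets (the definition used in RFC 3393). The function names and the sample delays are hypothetical and do not come from any of the tools discussed by the task force.

```python
from statistics import mean


def ipdv(one_way_delays_ms):
    """Return IPDV samples: delay differences between consecutive packets."""
    return [
        later - earlier
        for earlier, later in zip(one_way_delays_ms, one_way_delays_ms[1:])
    ]


def mean_abs_ipdv(one_way_delays_ms):
    """Summarise jitter as the mean absolute IPDV (0.0 if fewer than 2 samples)."""
    samples = ipdv(one_way_delays_ms)
    if not samples:
        return 0.0
    return mean(abs(sample) for sample in samples)


if __name__ == "__main__":
    # Hypothetical one-way delays (milliseconds) for four consecutive packets.
    delays = [20.0, 22.0, 21.0, 25.0]
    print("IPDV samples (ms):", ipdv(delays))
    print("Mean absolute IPDV (ms):", mean_abs_ipdv(delays))
```

A single summary value such as the mean absolute IPDV is only one possible choice; a monitoring tool might equally report percentiles or the maximum.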

DownCollector
• Users, Site Admins and Operation Centers need to check whether services available at the various grid sites are reachable and responsive
• DownCollector was developed during EGEE for monitoring Grid services at the sites
• Migrated from EGEE ENOC to EGI
• Checks that services are reachable on specific ports from a central location (star-based architecture)
• A possible evolution would be to have additional, geographically distributed instances gathering results

Provided answers on DownCollector
[Charts: "DownCollector: provided answer type (n.)" and "DownCollector: answer from each NGI"]
Answer n.3: Having a DownCollector tool is useful, but we cannot commit any manpower or resources to contribute to its deployment

Policy & Collaboration
• Establish an EGI group of people, a body permanently in charge of interfacing with the NRENs, EGI.eu, EMI, DANTE, GEANT and TERENA, to discuss issues related to
  – the provisioning of network connectivity or the upgrade of existing links,
  – new services and new standards,
  – new tools for monitoring,
  – new joint initiatives on tutorials and dissemination on tools,
  – testing and prototyping of middleware with respect to the network layer,
  so that the requirements coming from the EGI user community and the VRCs can be passed on to the network community and relevant information is exchanged

Provided Answers on Policy & Cooperation
[Charts: "Policy and Coop: provided answer type (n.)" and "Policy and Cooperation: answer from each NGI"]
Answer n.2: Having a Policy and Cooperation Group is useless.

How we structured today's meeting
• 1. Introduction to the TF and its objectives
• 2. Report on what we propose for each use case
• 3. Presentation of tools
• 4. General discussion/feedback from NGIs
  – We should decide on
    • Approving a GGUS workflow
      – so that it can be implemented within the GGUS system
    • Adopting or dropping the proposed tools
    • Identifying volunteering NGIs for early adoption and initial extended deployment of tools
    • Identifying possible missing bits or uncovered use cases/unsatisfied requirements to work on
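
Backup: sketch of a DownCollector-style check
The DownCollector slide describes checking, from a central location, that services are reachable on specific ports. The kind of check involved can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not DownCollector's actual implementation: the function names, the timeout and the endpoint list are hypothetical.

```python
import socket


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_services(services, timeout=3.0):
    """Map each (host, port) pair to True/False reachability."""
    return {
        (host, port): is_reachable(host, port, timeout)
        for host, port in services
    }


if __name__ == "__main__":
    # Hypothetical endpoints; a real deployment would read them from a site list.
    endpoints = [("localhost", 22), ("localhost", 443)]
    for target, ok in check_services(endpoints, timeout=1.0).items():
        print(target, "reachable" if ok else "unreachable")
```

A successful TCP connection only shows that the port accepts connections; checking that the service is responsive at the application level would need a protocol-specific probe. Running the same check from several geographically distributed instances, as the slide suggests, would help distinguish a site outage from a problem on one network path.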