Event Correlation & Life Cycle Management
How will they coexist in the NFV world?
May 10, 2017
Dale Sorsby, Bill Coward, Michael Evenchick

Introduction & Monitoring

Introduction to NFV
• Network Function Virtualization as defined by Wikipedia is:
  – "a network architecture concept that uses the technologies of IT virtualization to virtualize entire classes of network node functions into building blocks that may connect, or chain together, to create communication services."
  – It is not a Virtual Network Function (VNF) but the environment in which VNFs exist.

Network Service Architecture

NFV Functional Swim Lanes
Column headings: Resource Mgmt, VNF Mgmt, Service Mgmt
VIM: ETSI X X
SDNC: ETSI X X
VNFM: ETSI X
NFVO: X ETSI ETSI
CDO: X X X
NO: X
Onboard VNF/Srvs

Monitoring – What?
• Cloud Infrastructure (Compute, Network, Storage)
• Management Infrastructure (VIM, VNFM, NFVO)
• Virtual resources (CPU, memory, disk)
• Virtual Instances (VNFs)
• Virtual Services (Full Service chain through cloud)
• Overlay networks (Tunnel to Cloud, Cloud Networks)
• Underlay networks (Interfaces and connectivity of devices)
• Physical network devices (Routers, Switches, etc.)

Network Monitoring
[Diagram. Labels: Visualization, Correlation, Collection; Cross Domain Orchestrator, SDWAN Controller, Network Orchestrator, NFV Orchestrator, VNF Manager, VIM, OpenStack - VIM, SDNC, SDN Controller; Servers, VNF; Managed Access, Provider Core, Branch/Campus, Regional Data Center, SDWAN, Distributed Cloud, Internet, Central Data Center, Centralized Cloud]

Infrastructure Monitoring
[Diagram: same labels as the Network Monitoring diagram, without Servers and VNF]

Application Monitoring
[Diagram: same labels as the Network Monitoring diagram, without Servers and VNF]

Service Monitoring – Coordinated Effort
[Diagram: same labels as the Network Monitoring diagram, plus Internet (IPsec)]

Operationalizing NFV
• Don't reinvent the wheel
  – Use built-in intelligence
  – Utilize the strengths of existing systems
• Brownfield
  – Integrate with existing assurance systems
  – Don't introduce new applications/views to operators

Service Assurance - Collection and Challenges
• Integration
• How do you know everything has been gathered?
• Incomplete alerts/notifications
  – Reliable
  – Comprehensive
• New approach for monitoring NFV environments – things change

Event Correlation

Event Correlation - Definition
• Event Correlation: an automated process of understanding and revealing relationships between complex system events.
• Requires Holistic Awareness Telemetry/Intelligence
• Integrate Multiple Data Sources & Types, Protocols
• Scale: 1000s of Physical & Logical Resource Elements
• Event Relationships can be Topology, Temporal or Service …
• Data, Information, Knowledge, Wisdom

Event Correlation - Challenges
• Challenge: make sense of events, or the lack of them, so they are actionable.
• Internal OpenStack telemetry may have scalability & sizing challenges: Ceilometer, RabbitMQ, Nagios, Heka…
• Integration/Access to multiple systems & Data Types/Formats
• Healing Collisions: NFVO, VNFM, Heat … must yield to CDO
• Open Source Event Correlation Tools: Simple Event Correlator (SEC), Drools, RiverMuse… SP/CG ?
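
The definition slide notes that event relationships can be topological or temporal. A minimal Python sketch of that idea follows, assuming a hypothetical dependency map (`DEPENDS_ON`), a hypothetical `Event` record, and an arbitrary 30-second window; none of these names or values come from the presentation or from any of the tools listed. An event is treated as a symptom when a resource it depends on raised its own event within the window.

```python
from dataclasses import dataclass

# Hypothetical topology (an assumption for illustration):
# each resource maps to the resource it depends on.
DEPENDS_ON = {
    "vnf-fw-1": "compute-7",
    "vnf-lb-1": "compute-7",
    "compute-7": "leaf-switch-2",
}


@dataclass(frozen=True)
class Event:
    timestamp: float  # seconds
    resource: str
    message: str


def ancestors(resource, depends_on):
    """Return every resource that `resource` depends on, nearest first."""
    chain = []
    seen = {resource}
    current = depends_on.get(resource)
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = depends_on.get(current)
    return chain


def correlate(events, depends_on, window=30.0):
    """Group events under a probable root-cause event.

    Returns a dict mapping each root event to the list of symptom events
    suppressed under it. The furthest-upstream resource with an event
    inside the time window is chosen as the root. Chains that only link
    up through intermediate windows are not merged in this sketch.
    """
    by_resource = {}
    for event in events:
        by_resource.setdefault(event.resource, []).append(event)

    groups = {}
    for event in sorted(events, key=lambda e: e.timestamp):
        root = event
        for upstream in ancestors(event.resource, depends_on):
            for candidate in by_resource.get(upstream, []):
                if abs(candidate.timestamp - event.timestamp) <= window:
                    root = candidate  # keep walking: furthest upstream wins
                    break
        groups.setdefault(root, [])
        if root is not event:
            groups[root].append(event)
    return groups


if __name__ == "__main__":
    sample = [
        Event(100.0, "leaf-switch-2", "link down"),
        Event(102.0, "compute-7", "host unreachable"),
        Event(105.0, "vnf-fw-1", "heartbeat lost"),
    ]
    for root, symptoms in correlate(sample, DEPENDS_ON).items():
        print(root.resource, "->", [s.resource for s in symptoms])
```

Grouping symptoms under a single root is one way to give operators an actionable event instead of a storm of related alarms; a production correlator would also need service-level relationships and handling for missing events.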
Event Correlation - Opportunity
• Closed Loop Monitoring, Alerting, and Healing
• Event Aggregation, Suppression, Prioritization, Routing, Enrichment
• Historical Knowledge Support & Identify Service Impact / Ripple
• Effective event correlation supports decision-making knowledge and Automation/LCM
• Reduced Operating Expense, Improved Inter-Department Incident Coordination
• Improve Event-to-Incident Resolution Process/Time
• Smarter not Harder

Event Correlation – Future
• ONAP: Data Collection, Analytics, and Events (DCAE)
• OPNFV: VNF Event Streaming (VES) Project, common data model
• Open Source: Vitrage, Monasca, Zabbix
• Assisted/Machine Learning & A.I. Capabilities
• Need for a comprehensive framework for life cycle management

Life Cycle Management

Life Cycle Management, Where? And When?
• It is great to automate! So...
• Everything wants to perform Life Cycle Management
  – VIM: "OpenStack Heat" will perform life cycle management when it detects issues with the items that it orchestrated.
  – SDN Controller (e.g. Contrail) can perform life cycle management under some conditions, similar to Heat, when it owns the Service Instances.
  – VNFM will perform life cycle management of the VNF
  – NFVO will perform life cycle management of the NFV service through the cloud
  – Cross Domain Orchestrator (CDO) can also perform life cycle management of NFV services and possibly the underlay network
• Result is confusion and overlap
  – So which system is really responsible for what?
  – And where is the correlation?

Trouble Scenario – VNF Fails
• Heat determines the failure and attempts to restore the VNF
• SDN Controller determines the failure and attempts to restore the VNF
• VNFM determines the failure and attempts to restore the VNF
• NFVO determines the service outage and attempts to restore the service
• The Cross Domain Orchestrator determines the service outage and attempts to restore the service.
• Will all 5 try to heal?
  Probably not, but experience has shown that multiple elements heal, and the result is failed healing and systems being out of sync.
  – Heat determines the failure and instantiates a new VNF
  – The new VNF claims ports on the network
  – The NFVO determines the failure and attempts to destroy the Heat stack
  – This fails due to not being able to free all the ports on the network
• Unwanted Result: Systems are now out of sync and the status of the service is in question. If brought up by Heat, any customer-specific configuration placed on the VNF by the VNFM will be missing.

Trouble Scenario – VNF Management Network Failure
• VNFM determines VNF failure and attempts to heal the VNF
• NFVO determines service issues and attempts to heal the service.
• Will these occur? It will depend on what is being monitored on the VNF and how, but experience shows this happens. If either one of these happens, the service, which could actually have been fine, will be impacted.
• Unwanted Result: a 5-minute outage caused entirely by the systems put in place to minimize the customer impact.

Trouble Scenario – Network Outage
• Detection of traffic flow issue
  – VNFM determines VNF failure and attempts to heal the VNF
  – NFVO determines service issues and attempts to heal the service.
  – Cross Domain Orchestrator determines there is an issue with the service and attempts to heal the service?
• Will these occur? It will depend on what is being monitored on the VNF and how, but experience shows this happens. If any one of these happens, the service, which could actually have been fine, will be impacted.
• Unwanted Result: a 5-minute outage caused entirely by the systems put in place to minimize the customer impact, when in reality the customer may have seen only a limited outage of 10-30 seconds.

Cross Domain Orchestration
• It is immature and in its infancy but could be part of the answer.
• It will need to ensure all of the necessary events are collected
• It will need to tightly integrate with correlation
• It will need to tightly integrate with all Domain Orchestrators to have an end-to-end view of the service
• It will need to permit each Domain Orchestrator to control its own domain

Life Cycle Management & Existing Correlation
• Lessons Learned
  – Work with Operations and the systems they use. The best tools are only the best because they get used!
  – Systems working with other systems is the only way to achieve success with end-to-end life cycle management. So plan to integrate with other systems from the beginning, not as an afterthought!
  – Existing systems will need to learn and handle the new world or they will eventually have to be replaced!
  – "Stay in your swim lane but understand the world is bigger than you!"
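
The healing collisions in the trouble scenarios, and the "stay in your swim lane" lesson, can be illustrated with a small Python sketch of an arbiter that a Cross Domain Orchestrator might run. This is only an assumption about how such coordination could look: the `HealingArbiter` class, the failure-type names, and the mapping of failure types to owning systems are all hypothetical and are not taken from the presentation or from any real orchestrator API. Each system asks for permission before healing, and at most one system heals a given service at a time.

```python
import threading


class HealingArbiter:
    """Hypothetical CDO-side arbiter: one active healer per service."""

    def __init__(self, owners):
        # `owners` maps a failure type to the system whose swim lane
        # covers it. The mapping itself is an assumption chosen by the
        # operator, not something defined by the presentation.
        self._owners = dict(owners)
        self._active = {}  # service id -> system currently healing it
        self._lock = threading.Lock()

    def request(self, system, service, failure_type):
        """Return True if `system` may heal `service`; False means yield."""
        with self._lock:
            if service in self._active:
                return False  # another system is already healing
            if self._owners.get(failure_type) != system:
                return False  # outside this system's swim lane
            self._active[service] = system
            return True

    def release(self, system, service):
        """Mark healing as finished so other requests can be granted."""
        with self._lock:
            if self._active.get(service) == system:
                del self._active[service]


if __name__ == "__main__":
    arbiter = HealingArbiter({"vnf_failure": "VNFM", "service_outage": "NFVO"})
    print(arbiter.request("Heat", "svc-1", "vnf_failure"))     # Heat yields
    print(arbiter.request("VNFM", "svc-1", "vnf_failure"))     # VNFM heals
    print(arbiter.request("NFVO", "svc-1", "service_outage"))  # NFVO waits
```

In the VNF-failure scenario, a gate like this would keep Heat from instantiating a replacement VNF while the VNFM is already restoring it, which is the situation that left ports unreleased and systems out of sync.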