- OpenStack

Event Correlation & Life Cycle Management
How will they coexist in the NFV world?
May 10, 2017
Dale Sorsby
Bill Coward
Michael Evenchick
Introduction & Monitoring
Introduction to NFV
• Network Function Virtualization as defined by Wikipedia is:
– "a network architecture concept that uses the technologies of
IT virtualization to virtualize entire classes
of network node functions into building blocks that may connect, or
chain together, to create communication services."
– It is not a Virtual Network Function (VNF) but the environment in which
VNFs exist.
Network Service Architecture
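NFV Functional Swim Lanes
(marks listed per row in source order; column alignment not recoverable)
Lanes: Resource Mgmt, VNF Mgmt, Service Mgmt, Onboard VNF/Srvs
• VIM: ETSI, X, X
• SDNC: ETSI, X, X
• VNFM: ETSI, X
• NFVO: X, ETSI, ETSI
• CDO: X, X, X
• NO: X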
Monitoring – What?
• Cloud Infrastructure (Compute, Network, Storage)
• Management Infrastructure (VIM, VNFM, NFVO)
• Virtual resources (cpu, mem, disk)
• Virtual Instances (VNFs)
• Virtual Services (Full Service chain through cloud)
• Overlay networks (Tunnel to Cloud, Cloud Networks)
• Underlay networks (Interfaces and connectivity of devices)
• Physical network devices (Routers, Switches, etc.)
Network Monitoring
[Diagram labels: Visualization, Correlation, Collection; Cross Domain Orchestrator, SDWAN Controller, Network Orchestrator, NFV Orchestrator, VNF Manager, VIM (OpenStack), SDN Controller (SDNC), Servers, VNF; Branch/Campus, Managed Access, SDWAN, Provider Core, Internet, Regional Data Center, Distributed Cloud, Central Data Center, Centralized Cloud]
Infrastructure Monitoring
[Diagram labels: Visualization, Correlation, Collection; Cross Domain Orchestrator, SDWAN Controller, Network Orchestrator, NFV Orchestrator, VNF Manager, VIM (OpenStack), SDN Controller (SDNC); Branch/Campus, Managed Access, SDWAN, Provider Core, Internet, Regional Data Center, Distributed Cloud, Central Data Center, Centralized Cloud]
Application Monitoring
[Diagram labels: Visualization, Correlation, Collection; Cross Domain Orchestrator, SDWAN Controller, Network Orchestrator, NFV Orchestrator, VNF Manager, VIM (OpenStack), SDN Controller (SDNC); Branch/Campus, Managed Access, SDWAN, Provider Core, Internet, Regional Data Center, Distributed Cloud, Central Data Center, Centralized Cloud]
Service Monitoring – Coordinated Effort
[Diagram labels: Visualization, Correlation, Collection; Cross Domain Orchestrator, SDWAN Controller, Network Orchestrator, NFV Orchestrator, VNF Manager, VIM (OpenStack), SDN Controller (SDNC), Servers, VNF; Branch/Campus, Managed Access, SDWAN, Internet (IPsec), Provider Core, Internet, Regional Data Center, Distributed Cloud, Central Data Center, Centralized Cloud]
Operationalizing NFV
• Don't reinvent the wheel
– Use built-in intelligence
– Use the strengths of existing systems
• Brownfield
– Integrate with existing assurance systems
– Don't introduce new applications/views to operators
Service Assurance - Collection and Challenges
• Integration
• How do you know everything has been gathered?
• Incomplete alerts/notifications
– Reliable
– Comprehensive
• New approach for monitoring NFV environments – things change
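One way to approach the "has everything been gathered?" question is to compare the elements that should be reporting against the sources actually heard from recently. The following is a minimal sketch under stated assumptions: the inventory names, the last-seen timestamps, and the 300-second window are all hypothetical and do not come from any specific collector.

```python
from typing import Dict, Iterable, List


def find_silent_sources(
    expected: Iterable[str],
    last_seen: Dict[str, float],
    now: float,
    max_age_s: float = 300.0,  # assumed freshness window, tune per environment
) -> List[str]:
    """Return expected sources with no telemetry in the last max_age_s seconds."""
    silent = []
    for source in expected:
        seen_at = last_seen.get(source)
        # A source is silent if it never reported or its last report is stale.
        if seen_at is None or now - seen_at > max_age_s:
            silent.append(source)
    return sorted(silent)


# Hypothetical inventory and last-report times (seconds on a shared clock).
inventory = ["vim-1", "vnfm-1", "nfvo-1", "vnf-fw-01"]
last_seen = {"vim-1": 1000.0, "vnfm-1": 400.0, "nfvo-1": 990.0}

silent = find_silent_sources(inventory, last_seen, now=1010.0)
print(silent)  # ['vnf-fw-01', 'vnfm-1']
```

A missing event is itself a signal here: the VNF that never reported and the VNFM that went quiet both surface, even though neither raised an alert.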
Event Correlation
Event Correlation - Definition
• Event Correlation: an automated process of understanding and
revealing relationships between complex system events.
• Requires holistic awareness: telemetry and intelligence
• Integrates multiple data sources, data types, and protocols
• Scale: thousands of physical and logical resource elements
• Event relationships can be based on topology, time, or service …
• Data, Information, Knowledge, Wisdom
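To make the topology and temporal relationships concrete, the sketch below groups events that trace back to the same underlying resource and arrive close together in time. It is an illustration only: the `hosted_on` map, the resource names, and the 60-second window are assumptions, not part of any tool named in these slides.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Event:
    time: float      # seconds on a shared clock
    resource: str    # element that raised the event
    message: str


def root_of(resource: str, hosted_on: Dict[str, str]) -> str:
    """Follow the hypothetical hosted_on map down to the lowest-level resource."""
    seen = set()
    while resource in hosted_on and resource not in seen:
        seen.add(resource)  # guard against accidental cycles in the map
        resource = hosted_on[resource]
    return resource


def correlate(
    events: Iterable[Event],
    hosted_on: Dict[str, str],
    window_s: float = 60.0,  # assumed temporal window
) -> List[List[Event]]:
    """Group events sharing a root resource (topology) within window_s (temporal)."""
    groups: List[List[Event]] = []
    open_groups: Dict[str, List[Event]] = {}
    for ev in sorted(events, key=lambda e: e.time):
        root = root_of(ev.resource, hosted_on)
        group = open_groups.get(root)
        if group is not None and ev.time - group[-1].time <= window_s:
            group.append(ev)
        else:
            group = [ev]
            open_groups[root] = group
            groups.append(group)
    return groups


# Hypothetical topology: a service runs on a VNF, which runs on a compute host.
hosted_on = {
    "service-a": "vnf-fw-01",
    "vnf-fw-01": "compute-7",
    "vnf-lb-02": "compute-7",
}
events = [
    Event(100.0, "compute-7", "NIC down"),
    Event(105.0, "vnf-fw-01", "unreachable"),
    Event(110.0, "service-a", "degraded"),
    Event(500.0, "vnf-lb-02", "high latency"),
]
groups = correlate(events, hosted_on)
print([len(g) for g in groups])  # [3, 1]
```

The first three events collapse into one incident rooted at the compute host, while the later latency event on the same host starts a new group because it falls outside the window.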
Event Correlation - Challenges
• The challenge: make sense of events, or the lack of them, so they are actionable.
• Internal OpenStack telemetry may have scalability & sizing
challenges (Ceilometer, RabbitMQ, Nagios, Heka…)
• Integration with and access to multiple systems & data types/formats
• Healing collisions: NFVO, VNFM, Heat … must yield to the CDO
• Open source event correlation tools: Simple Event Correlator (SEC),
Drools, RiverMuse… SP/CG ?
Event Correlation - Opportunity
• Closed Loop Monitoring, Alerting, and Healing
• Event Aggregation, Suppression, Prioritization, Routing, Enrichment
• Historical Knowledge Support & Identify Service Impact / Ripple
• Effective event correlation supports decision making knowledge and
Automation/LCM
• Reduced Operation Expense, Improved Incident Inter-Department
Coordination
• Improve Event-to-Incident Resolution Process/Time
• Smarter not Harder
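Aggregation, suppression, and prioritization can be pictured with a small sketch: repeated events for the same resource and type are folded into one record with a count, and the result is ordered by severity. The field names, severity ranking, and 30-second window are assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

# Assumed severity ordering: lower rank is handled first.
SEVERITY_RANK = {"critical": 0, "major": 1, "minor": 2}


def aggregate(events: List[dict], window_s: float = 30.0) -> List[dict]:
    """Fold repeats of (resource, type) within window_s, then sort by severity."""
    latest: Dict[Tuple[str, str], dict] = {}
    out: List[dict] = []
    for ev in sorted(events, key=lambda e: e["time"]):
        key = (ev["resource"], ev["type"])
        prev = latest.get(key)
        if prev is not None and ev["time"] - prev["last_time"] <= window_s:
            # Suppress the duplicate: keep one record and count the repeat.
            prev["count"] += 1
            prev["last_time"] = ev["time"]
        else:
            agg = {**ev, "count": 1, "last_time": ev["time"]}
            latest[key] = agg
            out.append(agg)
    # Prioritize: most severe first, earliest first within a severity.
    out.sort(key=lambda e: (SEVERITY_RANK.get(e["severity"], 99), e["time"]))
    return out


# Hypothetical raw event stream.
raw = [
    {"time": 0.0, "resource": "vnf-fw-01", "type": "link_down", "severity": "major"},
    {"time": 5.0, "resource": "vnf-fw-01", "type": "link_down", "severity": "major"},
    {"time": 10.0, "resource": "vnf-fw-01", "type": "link_down", "severity": "major"},
    {"time": 12.0, "resource": "compute-7", "type": "host_down", "severity": "critical"},
]
result = aggregate(raw)
print([(e["resource"], e["count"]) for e in result])
# [('compute-7', 1), ('vnf-fw-01', 3)]
```

Four raw events become two actionable records, with the host failure listed ahead of the repeated link alarm.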
Event Correlation – Future
• ONAP, Data Collection, Analytics, and Events (DCAE)
• OPNFV, VNF Event Streaming (VES) Project, common data model
• Open Source, Vitrage, Monasca, Zabbix
• Assisted/Machine Learning & A.I. Capabilities
• Need for comprehensive Framework for life cycle management
Life Cycle Management
Life Cycle Management, Where? And When?
• It is great to automate! So...
• Everything wants to perform Life Cycle Management
– VIM: "OpenStack Heat" will perform life cycle management when it detects issues with the
items that it orchestrated.
– SDN Controller (e.g. Contrail) can perform life cycle management under some conditions,
similar to Heat, when it owns the Service Instances.
– VNFM will perform life cycle management of the VNF
– NFVO will perform life cycle management of the NFV service through the cloud
– Cross Domain Orchestrator (CDO) can also perform life cycle management of NFV services
and possibly the underlay network
• Result is confusion and overlap
– So which system is really responsible for what?
– And where is the correlation?
Trouble Scenario – VNF Fails
• Heat determines the failure and attempts to restore the VNF
• SDN Controller determines the failure and attempts to restore the VNF
• VNFM determines the failure and attempts to restore the VNF
• NFVO determines the service outage and attempts to restore the service
• The Cross Domain Orchestrator determines the service outage and attempts to restore the service
• Will all 5 try to heal? Probably not, but experience has shown that multiple elements do heal, and the result
is failed healing and systems that are out of sync. For example:
– Heat determines the failure and instantiates a new VNF
– The new VNF claims ports on the network
– The NFVO determines the failure and attempts to destroy the Heat stack
– This fails because not all of the ports on the network can be freed
• Unwanted Result: Systems are now out of sync and the status of the service is in question. If brought
up by Heat, any customer-specific configuration placed on the VNF by the VNFM will be missing.
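The collision in this scenario comes from several systems healing the same VNF at once. One possible guard, sketched below, is a shared arbiter that lets only the designated owner of a resource heal it, and only one healing action at a time. The `HealingArbiter` class and the choice of the VNFM as owner are hypothetical; the slides do not prescribe a mechanism.

```python
from typing import Dict


class HealingArbiter:
    """Hypothetical arbiter: one owner per resource, one healing action at a time."""

    def __init__(self, owners: Dict[str, str]) -> None:
        self._owners = owners             # resource -> system allowed to heal it
        self._active: Dict[str, str] = {}  # resource -> system currently healing

    def request_heal(self, system: str, resource: str) -> bool:
        """Return True if the system may heal now; otherwise it must yield."""
        if self._owners.get(resource) != system:
            return False  # not the owner: stay in your swim lane
        if resource in self._active:
            return False  # a healing action is already in progress
        self._active[resource] = system
        return True

    def complete(self, system: str, resource: str) -> None:
        """Release the resource once the owner finishes healing."""
        if self._active.get(resource) == system:
            del self._active[resource]


# Assumption for illustration: the VNFM owns healing of this VNF.
arbiter = HealingArbiter({"vnf-fw-01": "VNFM"})
print(arbiter.request_heal("Heat", "vnf-fw-01"))   # False
print(arbiter.request_heal("VNFM", "vnf-fw-01"))   # True
print(arbiter.request_heal("NFVO", "vnf-fw-01"))   # False
arbiter.complete("VNFM", "vnf-fw-01")
```

With a guard like this, Heat would not bring up a replacement VNF that lacks the configuration applied by the VNFM, and the NFVO would not tear down a stack mid-heal.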
Trouble Scenario – VNF Management Network Failure
• VNFM determines VNF failure and attempts to heal the VNF
• NFVO determines service issues and attempts to heal the service.
• Will these occur? It depends on what is monitored on the VNF and how, but
experience shows this happens. If either one happens, the service, which
could actually have been fine, will be impacted.
• Unwanted Result: a 5-minute outage caused entirely by the systems
put in place to minimize the customer impact.
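A management network failure makes the VNF look dead to its manager even while customer traffic still flows. A simple way to express the distinction is to check the data plane separately before healing. The function below is a sketch; the two boolean inputs stand in for whatever management-plane and traffic probes an environment actually has, and the action names are made up for illustration.

```python
def decide_action(mgmt_reachable: bool, data_plane_ok: bool) -> str:
    """Pick an action from separate management-plane and data-plane health checks."""
    if not data_plane_ok:
        # Customer traffic is affected, so healing is a candidate action.
        return "heal"
    if not mgmt_reachable:
        # Traffic is fine; only the management path is down. Alert, do not heal.
        return "alert-only"
    return "none"


print(decide_action(mgmt_reachable=False, data_plane_ok=True))   # alert-only
print(decide_action(mgmt_reachable=False, data_plane_ok=False))  # heal
print(decide_action(mgmt_reachable=True, data_plane_ok=True))    # none
```

In the scenario above, the first case applies: the service keeps running and operators are notified, instead of the service being torn down and rebuilt.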
Trouble Scenario – Network Outage
• Detection of traffic flow issue
– VNFM determines VNF failure and attempts to heal the VNF
– NFVO determines service issues and attempts to heal the service.
– Cross Domain Orchestrator determines there is an issue with the service and
attempts to heal the service
• Will these occur? It depends on what is monitored on the VNF and how, but
experience shows this happens. If any one of these happens, the service, which could
actually have been fine, will be impacted.
• Unwanted Result: a 5-minute outage caused entirely by the systems put in place to
minimize the customer impact, when in reality the customer might have seen only a limited
outage of 10-30 seconds.
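Since a transient network outage may clear in 10-30 seconds while a heal costs about 5 minutes, one option is a hold-down period: heal only if the fault outlasts the expected transient and no underlay alarm explains it. The sketch below assumes a 45-second hold-down, chosen only because it exceeds the 30-second upper bound mentioned above; the inputs are hypothetical.

```python
def should_heal(
    fault_age_s: float,
    underlay_alarm: bool,
    hold_down_s: float = 45.0,  # assumed value, longer than a 10-30 s transient
) -> bool:
    """Heal only when the fault persists past the hold-down and is not an underlay issue."""
    if underlay_alarm:
        # The network domain already explains the symptom; healing the VNF will not help.
        return False
    return fault_age_s >= hold_down_s


print(should_heal(fault_age_s=20.0, underlay_alarm=False))  # False
print(should_heal(fault_age_s=60.0, underlay_alarm=False))  # True
print(should_heal(fault_age_s=60.0, underlay_alarm=True))   # False
```

The trade-off is that a genuine VNF failure waits out the hold-down before healing starts, which is small next to a 300-second self-inflicted outage.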
Cross Domain Orchestration
• It is immature and in its infancy but could be part of the answer.
• It will need to ensure all of the necessary events are collected
• It will need to tightly integrate with correlation
• It will need to tightly integrate with all Domain Orchestrators to have
an end-to-end view of the service
• It will need to permit each Domain Orchestrator to control its own
domain
Life Cycle Management & Existing Correlation
• Lessons Learned
– Work with Operations and the systems they use. The best tools are
only the best because they get used!
– Systems working with other systems is the only way to achieve success
with end-to-end life cycle management. So plan to integrate with other
systems from the beginning, not as an afterthought!
– Existing systems will need to learn and handle the new world or they
will eventually have to be replaced!
– "Stay in your swim lane but understand the world is bigger than you!"