Hybrid OpenFlow - OpenDayLight Wiki

Time Series Data Repository (TSDR)
Project Proposal
www.opendaylight.org
TSDR Functional Objectives

- To capture ODL data into a persistent time series data repository. This includes:
  - Statistics counters
  - Performance data
  - Health status information
  - Operational configuration data

- To facilitate various applications built on top of TSDR. Applications include:
  - Operational configuration optimization
  - Traffic engineering
  - Network analytics with automated intelligence
  - Security risk detection
  - Performance analysis

- Major functions
  - Data Collection
  - Data Storage
  - Data Queries
  - Data Aggregation
  - Data Purge

- Lithium Focus
  - TSDR functionalities on OpenFlow Statistics data
TSDR Design Objectives

- Generic and extensible architectural framework
  - Generic and extensible TSDR Data Model
  - Abstract and generic TSDR Persistence Layer
    - Allows implementation of various data store plugins under the TSDR Persistence Layer with TSDR Persistence APIs
    - HBase Plugin serves as an example TSDR Data Store implementation

- Scalable with high performance
  - Provides both integrated and distributed architectures to handle different scales of time series data
  - Fully utilizes MD-SAL's clustering capability to handle performance and scalability in large-scale deployment scenarios
TSDR Integrated Architecture





- TSDR Data Services, including Data Collection, Data Storage, Data Query, Data Purging, and Data Aggregation, are MD-SAL services.
- The Data Collection service receives time series data published on the MD-SAL messaging bus by MD-SAL southbound plugins.
- The Data Collection service communicates with the Data Storage service to store the data into TSDR.
- TSDR data services access a TSDR Data Store such as HBase through the generic TSDR Data Persistence Layer.
- Needs MD-SAL notification subsystem support.
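
The service chain above can be sketched roughly as follows. This is a minimal Python illustration, not ODL code: `NotificationBus` is a toy stand-in for the MD-SAL notification subsystem, `InMemoryStore` stands in for a data store plugin such as HBase, and every class, method, and field name here is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Protocol


@dataclass(frozen=True)
class Sample:
    """One time series data point (hypothetical shape)."""
    metric: str
    node_id: str
    timestamp: int
    value: float


class PersistenceLayer(Protocol):
    """What the storage service expects from any data store plugin."""
    def save(self, samples: List[Sample]) -> None: ...


class InMemoryStore:
    """Stand-in for a data store plugin such as HBase."""
    def __init__(self) -> None:
        self.rows: List[Sample] = []

    def save(self, samples: List[Sample]) -> None:
        self.rows.extend(samples)


class NotificationBus:
    """Toy publish/subscribe bus standing in for MD-SAL notifications."""
    def __init__(self) -> None:
        self._subscribers: List[Callable[[Sample], None]] = []

    def subscribe(self, callback: Callable[[Sample], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, sample: Sample) -> None:
        for callback in self._subscribers:
            callback(sample)


class DataStorageService:
    """Writes samples through the persistence layer, never to a store directly."""
    def __init__(self, persistence: PersistenceLayer) -> None:
        self._persistence = persistence

    def store(self, samples: List[Sample]) -> None:
        self._persistence.save(samples)


class DataCollectionService:
    """Subscribes to the bus and hands each sample to the storage service."""
    def __init__(self, bus: NotificationBus, storage: DataStorageService) -> None:
        self._storage = storage
        bus.subscribe(self._on_sample)

    def _on_sample(self, sample: Sample) -> None:
        self._storage.store([sample])
```

Wiring a `DataCollectionService` to a bus and then calling `bus.publish(...)` from a southbound plugin is all it takes for a sample to reach the store; swapping `InMemoryStore` for another plugin does not touch the two services.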
TSDR Distributed Architecture




- In large data center deployment scenarios, the TSDR Distributed Architecture is needed to meet performance and scalability requirements.
- In the distributed architecture, TSDR data services are deployed in a separate MD-SAL instance.
- The data pushed onto the MD-SAL messaging bus by an ODL southbound plugin is propagated to the other MD-SAL instance, where TSDR data services process it into the TSDR data repository.
- Needs ODL clustering support.
TSDR Data Flow with multiple data models




- The TSDR Data Flow involves multiple data models: the source data model (OpenFlow statistics), the TSDR data model, and the TSDR plugin (HBase) data model.
- The Data Collection Service subscribes to receive OpenFlow Statistics data from the MD-SAL Notification Subsystem and passes the data to the Data Storage Service.
- The Data Storage Service converts the OpenFlow Statistics data model to the TSDR data model.
- The HBase TSDR Plugin converts the TSDR data model to an HBase-specific data model based on the HBase TSDR schema design.
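
As a rough illustration of the first conversion step (source model to TSDR model), the sketch below turns one flow-statistics sample into one TSDR record per counter. The input dictionary layout, the `TsdrRecord` fields, and the function name are all assumptions made for this example; they are not the OpenFlow plugin's notification types or the final TSDR model.

```python
from dataclasses import dataclass
from typing import List, Mapping


@dataclass(frozen=True)
class TsdrRecord:
    """Hypothetical generic TSDR record: one metric value plus its keys."""
    category: str
    metric_name: str
    metric_value: float
    record_keys: Mapping[str, str]
    timestamp: int


def flow_stats_to_tsdr(stats: Mapping, timestamp: int) -> List[TsdrRecord]:
    """Convert one flow-statistics sample into one TSDR record per counter."""
    keys = {
        "NodeID": stats["node_id"],
        "TableID": str(stats["table_id"]),
        "FlowID": stats["flow_id"],
    }
    return [
        TsdrRecord("FlowStats", name, float(value), keys, timestamp)
        for name, value in sorted(stats["counters"].items())
    ]
```

The second step, from TSDR record to store-specific rows, stays inside the plugin, so the Data Storage Service never needs to know which data store is in use.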
Unstructured or Semi-Structured data consideration (for future release)





- For unstructured or semi-structured data such as syslog data, MD-SAL receives the data in a syslog-specific data model.
- Data Filtering and Preprocessing can be added to filter out noise and optionally extract structured information from the semi-structured data.
- A third-party-specific TSDR plugin, such as a Splunk Plugin, could be added under the TSDR Data Persistence Layer to work with proprietary data stores.
- The Data Aggregation Service is not needed when handling unstructured data.
- Third-party tools such as Splunk could leverage the Data Query Service to obtain the unstructured data from TSDR and add application-specific processing on top of it.
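
One possible shape for the filtering and preprocessing step is sketched below, assuming syslog lines that start with the standard `<PRI>` prefix, where the priority value encodes facility * 8 + severity. The function name, the default severity threshold, and the output dictionary are hypothetical choices for illustration only.

```python
import re
from typing import Dict, Optional

# Leading "<PRI>" field of a syslog message; PRI = facility * 8 + severity.
_PRI = re.compile(r"^<(\d{1,3})>(.*)$", re.DOTALL)


def preprocess(line: str, max_severity: int = 4) -> Optional[Dict]:
    """Drop noisy lines and extract structured fields from the rest.

    Returns None for lines without a PRI prefix and for lines whose
    severity is numerically above max_severity (less urgent).
    """
    match = _PRI.match(line)
    if match is None:
        return None
    facility, severity = divmod(int(match.group(1)), 8)
    if severity > max_severity:
        return None  # treated as noise in this sketch
    return {
        "facility": facility,
        "severity": severity,
        "message": match.group(2).strip(),
    }
```

Lines that survive the filter carry both the extracted fields and the original message text, so a plugin can store either form.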
TSDR Data Model

The goals of the TSDR data model design:
- Generic
- Extensible
- Scalable
- Performance optimized

The data model captures:
- Statistics data
- Log type of data
Note: To add a new group, extend TSDRBaseRecord.

DataCategory contains:
- Flow Group Stats
- Flow Stats
- Flow Meter Stats
- Interface Stats
- Log Records
- Queue Stats
Note: More categories can be added to the above list.

RecordKeys contains:
- A list of composite keys
- Different categories contain different sets of keys
- Key set validation is needed based on the data category
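
The relationship between the base record, the data categories, and per-category key validation could look roughly like this. Only the names TSDRBaseRecord, DataCategory, and RecordKeys come from the model above; the field names, the `TSDRMetricRecord` subclass, and the required key sets in `REQUIRED_KEYS` are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, FrozenSet


class DataCategory(Enum):
    FLOW_GROUP_STATS = "FlowGroupStats"
    FLOW_STATS = "FlowStats"
    FLOW_METER_STATS = "FlowMeterStats"
    INTERFACE_STATS = "InterfaceStats"
    LOG_RECORDS = "LogRecords"
    QUEUE_STATS = "QueueStats"


# Assumed key sets for two categories; categories not listed are unchecked.
REQUIRED_KEYS: Dict[DataCategory, FrozenSet[str]] = {
    DataCategory.FLOW_STATS: frozenset({"NodeID", "TableID"}),
    DataCategory.QUEUE_STATS: frozenset({"NodeID", "TableID", "PortID", "QueueID"}),
}


@dataclass(frozen=True)
class TSDRBaseRecord:
    category: DataCategory
    record_keys: Dict[str, str]
    timestamp: int

    def __post_init__(self) -> None:
        # Key set validation depends on the data category.
        required = REQUIRED_KEYS.get(self.category, frozenset())
        missing = set(required) - set(self.record_keys)
        if missing:
            raise ValueError(
                f"{self.category.value} is missing keys: {sorted(missing)}"
            )


@dataclass(frozen=True)
class TSDRMetricRecord(TSDRBaseRecord):
    """A new record group is added by extending the base record."""
    metric_name: str
    metric_value: float
```

Because validation lives in the base record, every new record group inherits it without extra code.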


TSDR Persistence APIs
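Interface Name      | Description/comments                                                                       | Extends from ODL Common APIs? | Specific to TSDR Persistence API? | Will be implemented in HBase plugin in Lithium?
--------------------|--------------------------------------------------------------------------------------------|-------------------------------|-----------------------------------|------------------------------------------------
save()              | Including saving one or a list of objects                                                  | Yes                           | No                                | Yes
find()              | Including query based on a list of IDs, with specified criteria, and with paging support   | Yes                           | No                                | No
count()             |                                                                                            | Yes                           | No                                | Yes
delete()            | Including delete with one or a list of IDs, and delete the entire table                    | Yes                           | No                                | No
exists()            | Including query based on one or a list of IDs                                              | Yes                           | No                                | Yes
min(), max(), avg() | For Data Aggregation purpose                                                               | No                            | Yes                               | No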
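
To make the plugin contract concrete, here is a minimal sketch of the common operations as an abstract interface with an in-memory plugin behind it. The operation names follow the list above; the parameter lists, return types, and the `InMemoryPlugin` class are assumptions for illustration and are not the ODL Common APIs.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Row = Tuple[str, float]  # (id, value); a deliberately simplified row shape


class TsdrPersistence(ABC):
    """Hypothetical store-agnostic contract that each plugin implements."""

    @abstractmethod
    def save(self, records: Dict[str, float]) -> None: ...

    @abstractmethod
    def find(self, ids: Optional[Iterable[str]] = None,
             criteria: Optional[Callable[[str, float], bool]] = None,
             offset: int = 0, limit: Optional[int] = None) -> List[Row]: ...

    @abstractmethod
    def count(self) -> int: ...

    @abstractmethod
    def delete(self, ids: Optional[Iterable[str]] = None) -> None: ...

    @abstractmethod
    def exists(self, ids: Iterable[str]) -> bool: ...


class InMemoryPlugin(TsdrPersistence):
    """Toy plugin; a real one would translate these calls to its data store."""

    def __init__(self) -> None:
        self._table: Dict[str, float] = {}

    def save(self, records):
        self._table.update(records)

    def find(self, ids=None, criteria=None, offset=0, limit=None):
        keys = sorted(self._table) if ids is None else [i for i in ids if i in self._table]
        hits = [(k, self._table[k]) for k in keys
                if criteria is None or criteria(k, self._table[k])]
        end = None if limit is None else offset + limit
        return hits[offset:end]  # simple paging over the matching rows

    def count(self):
        return len(self._table)

    def delete(self, ids=None):
        if ids is None:
            self._table.clear()  # no IDs given: delete the entire table
        else:
            for i in ids:
                self._table.pop(i, None)

    def exists(self, ids):
        return all(i in self._table for i in ids)
```

Data services written against `TsdrPersistence` would work unchanged with any plugin that implements it.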
HBase TSDR Schema – Raw Data
TableName        | RowKey                                              | Column Family: Column Qualifier = Cell Value
-----------------|-----------------------------------------------------|---------------------------------------------
FlowMetrics      | MetricID_NodeID_TableID(_FlowID)_timestamp          | ‘raw’ = metric_value
InterfaceMetrics | MetricID_NodeID_TableID(_PortID)_timestamp          | ‘raw’ = metric_value
QueueMetrics     | MetricID_NodeID_TableID_PortID_QueueID_timestamp    | ‘raw’ = metric_value
GroupMetrics     | MetricID_NodeID_GroupID(_GroupBucketID)_timestamp   | ‘raw’ = metric_value
MeterMetrics     | MetricID_NodeID_GroupID(_MeterID)_timestamp         | ‘raw’ = metric_value

Schema Design considerations:

General HBase Schema Design Rules applied:
- Keep the RowKey, Column Family Key, and Column Qualifier as short as possible.
- Design the RowKey properly so as to keep rows evenly distributed across multiple data nodes.
- Keep the number of column families low.

Other performance considerations:
- Multiple tables are created based on the data categories in the TSDR data model.
- Data storage and query operations run much faster on smaller data sets stored in HBase tables with structured keys.
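
A row key in the FlowMetrics layout can be assembled as in the sketch below. The function name and the zero-padded 13-digit timestamp are assumptions: HBase orders row keys lexicographically by bytes, so fixed-width timestamps keep rows for one metric in time order, but the proposal does not specify a timestamp encoding.

```python
from typing import Optional


def flow_row_key(metric_id: str, node_id: str, table_id: int,
                 timestamp_ms: int, flow_id: Optional[str] = None) -> str:
    """Build MetricID_NodeID_TableID(_FlowID)_timestamp for FlowMetrics.

    Component values must not contain '_' themselves, otherwise the key
    cannot be split back into its parts unambiguously.
    """
    parts = [metric_id, node_id, str(table_id)]
    if flow_id is not None:  # the FlowID segment is optional in the layout
        parts.append(flow_id)
    parts.append(f"{timestamp_ms:013d}")  # fixed width keeps keys time-sorted
    return "_".join(parts)
```

With the metric and node leading the key, all samples for one metric on one node sit next to each other, so a time-range query becomes a contiguous scan.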
HBase TSDR Schema – Aggregated Data
TableName              | RowKey                                              | Column Family: Column Qualifier = Cell Value
-----------------------|-----------------------------------------------------|-----------------------------------------------------------------
HourlyFlowMetrics      | MetricID_NodeID_TableID(_FlowID)_timestamp          | ‘min’ = metric_value, ‘max’ = metric_value, ‘avg’ = metric_value
HourlyInterfaceMetrics | MetricID_NodeID_TableID(_InterfaceID)_timestamp     | ‘min’ = metric_value, ‘max’ = metric_value, ‘avg’ = metric_value
HourlyQueueMetrics     | MetricID_NodeID_TableID_PortID_QueueID_timestamp    | ‘min’ = metric_value, ‘max’ = metric_value, ‘avg’ = metric_value
HourlyGroupMetrics     | MetricID_NodeID_GroupID(_GroupBucketID)_timestamp   | ‘min’ = metric_value, ‘max’ = metric_value, ‘avg’ = metric_value
HourlyMeterMetrics     | MetricID_NodeID_GroupID(_MeterID)_timestamp         | ‘min’ = metric_value, ‘max’ = metric_value, ‘avg’ = metric_value

- For performance reasons, we design multiple aggregation tables with different granularities.
- Aggregation tables with other granularities will have a schema similar to the one displayed above.
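
The hourly rollup that would feed such a table can be sketched as follows, assuming raw samples arrive as (timestamp in seconds, value) pairs for a single metric. The function name, the input shape, and the use of the hour's start time as the bucket timestamp are assumptions for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

SECONDS_PER_HOUR = 3600


def hourly_rollup(samples: Iterable[Tuple[int, float]]) -> Dict[int, Dict[str, float]]:
    """Group samples by hour and compute the min, max, and avg of each hour."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for timestamp, value in samples:
        hour_start = timestamp - timestamp % SECONDS_PER_HOUR
        buckets[hour_start].append(value)
    return {
        hour: {
            "min": min(values),
            "max": max(values),
            "avg": sum(values) / len(values),
        }
        for hour, values in sorted(buckets.items())
    }
```

Changing `SECONDS_PER_HOUR` to another bucket width gives the coarser or finer granularities that the other aggregation tables would hold.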
HBase TSDR Data Model

- The TSDR HBase Plugin converts the generic TSDR data model into an HBase-specific data model based on the HBase schema design.
- The TSDR HBase Plugin leverages this HBase-specific data model to implement the generic TSDR Persistence APIs, including storage, query, purging, and aggregation, to complete the TSDR data services in HBase.
TSDR Scope in Lithium
In the Lithium release, we will focus on the following deliverables:

- Architectural framework
  - As specified in the architectural design
- Functionality implementation
  - Data Collection
  - Data Storage
- Deployment scenarios support
  - TSDR Integrated Architecture
  - HBase on Hadoop single node deployment scenario
- Data Model implementation
  - TSDR Data Model to support OpenFlow Statistics
  - HBase Data Model for HBase Plugin implementation
- Data Type Support
  - OpenFlow Statistics
- Data Collection mechanisms
  - Implement Pub/Sub collection mechanism
- Data Persistence Layer
  - Complete TSDR Persistence APIs with interface definition
- TSDR Plugin
  - HBase plugin as an example implementation
  - Focus on the storage API implementation in HBase plugin to support Data Storage Service in Lithium