Real-Time Data Triage: A SME Approach to General
Purpose Analytics
September 2015
All questions and enquiries regarding this white paper should be directed to:
Adam Gerhart
Assistant Director
[email protected]
Dan Cybulski
Chief Technologist
[email protected]
Cognitio Corp
Table of Contents
Background
Market Gaps
    Transformation
    Analytics
    Visualization
    Store and Analyze
    Specialized Personnel
    Licensing
Immediate Insight
    Rich Analytics
    Flexible Ingest
    User Interaction
    One-Click Analytics
    Customized Tagging
    Data Router
    Action Scripts
    Workflow
Next Steps
    Astroturfing / Bot Detection
    Community Clustering
    Temporal Message Resonation
Summary
© 2015 Cognitio Corp and/or its affiliates. All rights reserved.
Background
Commercial and government organizations are wrestling with the challenges of ad-hoc and
real-time data analytics. Deriving timely and actionable insight from your data is key to
maintaining a competitive advantage and achieving mission goals. Despite this importance and
focus, acquiring insight is becoming increasingly difficult as the volume, variety, and
velocity of available data sources continue to explode. The prevailing opinion is to ship all
data to prized Data Scientists and let them sort it out. While these specialists are able to
manage and derive value from mountains of data, they are only a piece of the puzzle, and they
often lack the domain-specific knowledge that accelerates exploration efforts. An ideal
solution would give Subject Matter Experts (SMEs) access to the data as quickly as possible
while deftly managing the ‘Big Data’ problem.
To address the SME disconnect, organizations most often look to general-purpose analytic tools
to rapidly triage and correlate incoming data. By making the data quickly accessible to
everyone, they are able to derive value from perishable data sources while also driving more
specialized downstream analytic capabilities. In today’s market, the number of general-purpose
analytic tools vying for market share is staggering. Whether an organization is looking for a
commercial or open source solution, there are simply too many to evaluate.
This paper will illustrate how the Immediate Insight platform from FireMon can help
organizations overcome the limitations and gaps inherent to the current analytic market. In
doing so, it will highlight how Immediate Insight can provide real-time analysis and data triage,
empowering SMEs to apply their expertise to deriving value from perishable data while also
freeing up data science resources to focus on broader downstream analytics. Lastly, this paper
will describe several novel applications for the analytic and real-time intelligence
capabilities of Immediate Insight.
Market Gaps
Analytical tools are an invaluable part of any organization’s data analysis workflow. As such,
in addition to any custom analytic tools, most organizations employ a combination of
general-purpose commercial or open source tools to perform preliminary data triage and basic
analysis of incoming data for real-time alerting and filtering purposes. Out of the box, these
general-purpose tools are expected to provide basic transformation, analytics, and
visualization capabilities. While there are countless tools in the market claiming to
seamlessly provide these most basic of features, nearly all of them fall short in one or more
key areas, leading to functional gaps in the analytic market.
Transformation
One of the most significant issues faced when deploying commercial and open source analytic
tools is the walled garden approach to data transformation. In order to analyze data, many
market solutions in this space require that the data meet strict, sometimes proprietary,
formatting requirements before ingestion occurs. Achieving this requires that the data be
extracted from its original format, transformed into a format required by the tool, and loaded
into the tool. This means that organizations must pre-process data before it can be submitted
to their analytic tools for triage and value assessment. This transformation process, referred
to as
ETL (Extract, Transform, Load), is a key component of the analytic workflow, and it must be
reliable and repeatable in order to preserve the fidelity of the source data while also ensuring
the accuracy of derivative analyses. Performing this pre-processing, or transformation, requires
that organizations invest in additional tools and dedicated resources to develop and maintain
custom ETL routines. Moreover, each time a new data source is introduced, these ETL tools and
routines must be modified, or new ones created, to accommodate it. This need for
tool-specific data transformation causes significant inefficiency that increases the overall
timeline, infrastructure, and personnel costs of an analysis project.
Analytics
All analytic tools include a base set of analytic capabilities focused on helping users make sense
of their data. These analytics enable exploration and provide less technical users a means to
interact with valuable data sets. Unfortunately, the analytics included in most tools are focused
on a pre-defined set of reporting capabilities based on simple term frequency analysis and
Boolean search strings. These basic analytic capabilities rarely provide unique insights and,
more often than not, leave users wanting more. This can lead to significant frustration, as
broader and deeper interaction necessitates custom development or additional external tools
to implement mission-specific data knowledge.
Visualization
Visualization tends to be an area of great importance to analytic users. The ability to distill
complex information into its simplest form in order to communicate analytic results is key in
any organization. However, visualization capabilities are continually underdelivered by
toolsets as vendors focus on the technical backbone and proprietary feature sets. This lack of
visualization capabilities limits the development of interactive analytic capabilities and
forces organizations to implement complex data flows or connectors to external visualization
tools.
Store and Analyze
Most analytic tools in this market take a store-and-analyze approach, in which data is
ingested into indexes that the tool maintains and then read back in to apply analytics. This
inefficient approach can lead to significant data churn and place increased demands on the
underlying infrastructure, resulting in performance bottlenecks. Moreover, by not applying any
analytics “on the wire,” these tools delay the point at which data becomes usable and insight
can be discovered.
Specialized Personnel
A key barrier to entry for current-generation analytical tools, both commercial and open
source, is that they often require specialized technical skill sets held by Data Scientists and
Information Technology personnel in order to derive value. Many solutions have complex
hardware/software deployment requirements and implement proprietary query languages. This
learning curve hinders less technical analysts from exploring the data to its full potential
and greatly increases the time and costs associated with deriving insight.
Licensing
Complicated and often exorbitant licensing models plague traditional commercial software in
the analytic space. More often than not, these tools are licensed based on the number of users
and/or the volume of incoming data, which drives organizations to limit tool access or perform
pre-ingest filtering to avoid volume-based license violations. While these commercial tools
can provide excellent features and functionality, the pricing and licensing restrictions often
limit the value that an organization can realize from its investment.
Immediate Insight
Immediate Insight brings a unique platform to the analytical market, providing SMEs and data
scientists alike with real-time access to incoming data, coupled with a broad set of native
analytic capabilities that empower rapid exploration and triage of new data. Immediate Insight
is designed to automate as much of the data aggregation and analytic process as possible,
resulting in a system that is simple to use and does not require extensive analytic or
technical skills to derive meaning and value from your data.
Rich Analytics
One of the biggest differentiators that Immediate Insight brings to the table is its built-in
Entity Extraction for network-related data (e.g., IP addresses, locations, and hostnames).
While the use case for those entities is limited, Immediate Insight’s Action Script
capabilities allow other entity types to be easily extracted and applied for later analysis.
These entities can then be coupled with Immediate Insight’s Reputation system to match event
components, such as user name or screen name, across datasets.
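To make the idea of entity extraction concrete, the following is a minimal sketch of pulling IPv4 addresses and hostnames out of a raw record. The regular expressions and the `extract_entities` function are illustrative assumptions only; they do not represent Immediate Insight's extractor or its API.

```python
import re

# Hypothetical patterns for illustration; not Immediate Insight's extractor.
IPV4 = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
)
HOSTNAME = re.compile(
    r"\b(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}\b", re.IGNORECASE
)


def extract_entities(record: str) -> dict:
    """Return the IPv4 addresses and hostnames found in one raw record."""
    return {
        "ip": IPV4.findall(record),
        "hostname": HOSTNAME.findall(record),
    }


sample = "Denied 10.1.2.3 -> 192.168.0.7 while resolving mail.example.com"
entities = extract_entities(sample)
```

Once entities are extracted into named lists like these, matching the same value across datasets becomes a simple lookup rather than a free-text search.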
Flexible Ingest
Immediate Insight implements a broad suite of data ingestion capabilities that simplify the
process of integrating new data into the analytic framework. This means that not only can
Immediate Insight manage different formats of data, but the actual transfer can also be
handled through the file system, across the network, or by streaming directly to a specified
port. This flexibility of data import fits well into existing infrastructures without the
need to modify existing systems.
User Interaction
Immediate Insight’s ease of use is a result of keeping human/machine interaction in mind. This
is achieved by designing searches around natural language. The implementation of relative
date ranges is one example of the user-focused design. No longer must a user input a known
date range or, worse, convert to a predefined timestamp format. Immediate Insight resolves
this issue by accepting relative dates and human-language durations.
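As a rough illustration of what resolving a human-language duration involves, the sketch below turns a phrase such as "last 2 hours" into a concrete start and end time. The phrase grammar and the `resolve_range` function are assumptions made for this example; the search syntax that Immediate Insight actually accepts is not specified in this paper.

```python
import re
from datetime import datetime, timedelta
from typing import Tuple

# Hypothetical phrase grammar: "last <N> <unit>[s]".
PATTERN = re.compile(r"^last\s+(\d+)\s+(minute|hour|day|week)s?$", re.IGNORECASE)
UNITS = {"minute": "minutes", "hour": "hours", "day": "days", "week": "weeks"}


def resolve_range(phrase: str, now: datetime) -> Tuple[datetime, datetime]:
    """Convert a relative phrase into an absolute (start, end) pair."""
    match = PATTERN.match(phrase.strip())
    if match is None:
        raise ValueError(f"unsupported duration phrase: {phrase!r}")
    amount = int(match.group(1))
    unit = match.group(2).lower()
    start = now - timedelta(**{UNITS[unit]: amount})
    return start, now


now = datetime(2015, 9, 1, 12, 0, 0)
start, end = resolve_range("last 2 hours", now)
```

The user never sees the absolute timestamps; the conversion happens at query time, so the same saved phrase stays meaningful from one day to the next.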
One-Click Analytics
Extending this usability is the concept of predefined or ‘One-Click’ analytics. Immediate
Insight recognizes the limitations of user-defined dashboards, which are set up statically
under established criteria. By exploring data and trends while they are happening, users avoid
missing events of interest that do not match their prediction models, all without having to
wait for after-the-fact analytics. Immediate Insight includes generalized implementations of
the relationships that users most commonly want to see when exploring their data.
Examples include ‘More/Fewer like this’, which matches and compares other records based on
similar data signatures; ‘Trending Up/Down’, which displays the temporal trend of similar
records; and ‘Most Common/Unusual’, which highlights common and uncommon records for
the currently selected dataset.
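The analytics named above can be approximated with very little code. The sketch below is one simple way to express 'More like this' (using Jaccard overlap of word sets as the similarity measure) and 'Most Common/Unusual' (using record counts). The similarity measure, the threshold, and the function names are assumptions for illustration; the paper does not describe the algorithms Immediate Insight uses.

```python
from collections import Counter


def tokens(record: str) -> set:
    """Lower-cased word set used as a crude data signature."""
    return set(record.lower().split())


def more_like_this(selected: str, records: list, threshold: float = 0.5) -> list:
    """Records whose Jaccard overlap with `selected` meets the threshold."""
    base = tokens(selected)
    similar = []
    for record in records:
        other = tokens(record)
        union = base | other
        score = len(base & other) / len(union) if union else 0.0
        if record != selected and score >= threshold:
            similar.append(record)
    return similar


def most_common_and_unusual(records: list) -> tuple:
    """Most and least frequent record in a non-empty list."""
    counts = Counter(records).most_common()
    return counts[0][0], counts[-1][0]


records = [
    "login failed for admin",
    "login failed for root",
    "login failed for admin",
    "disk quota exceeded on volume7",
]
similar = more_like_this("login failed for admin", records)
common, unusual = most_common_and_unusual(records)
```

Lowering the threshold widens the match ('More like this'); inverting the comparison would give the 'Fewer like this' view.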
Customized Tagging
Collaboration is a key part of the analytic process. Once datasets are culled and analytical value
is derived, it is common for users to share their insights. Immediate Insight natively enables this
collaboration by providing a customizable tagging system. Through tagging, experts can attach
comments to events or groups of events to inform other users of the system, or even configure
Immediate Insight to automatically tag future records that match specific criteria for easy
identification.
Data Router
Immediate Insight’s analytic capabilities are powerful, but are of little use without your data
in the system. Immediate Insight provides an extensive and flexible Extract, Transform, and
Load (ETL) system that easily ingests known data types while providing data owners with
the ability to add customized User Defined Functions (UDFs). All of the data management is
handled by the Data Router, which allows the system to automatically perform a desired action
based on user-defined criteria. These actions include: tag, delete, alert, learn, execute
custom scripts, develop data feeds, and trigger defined workflow stages.
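The criteria-to-action pattern described for the Data Router can be sketched as a list of rules evaluated against each incoming record. The example below covers only three of the listed actions (tag, delete, alert). The `Rule` and `RouteResult` types, the record fields, and the `route` function are hypothetical and are not the product's configuration model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Rule:
    criteria: Callable[[dict], bool]  # user-defined match condition
    action: str                       # "tag", "delete", or "alert" in this sketch
    tag: str = ""


@dataclass
class RouteResult:
    kept: List[dict] = field(default_factory=list)
    alerts: List[str] = field(default_factory=list)


def route(records: List[dict], rules: List[Rule]) -> RouteResult:
    """Apply every matching rule to each record, in rule order."""
    result = RouteResult()
    for record in records:
        deleted = False
        for rule in rules:
            if not rule.criteria(record):
                continue
            if rule.action == "tag":
                record.setdefault("tags", []).append(rule.tag)
            elif rule.action == "alert":
                result.alerts.append(record["message"])
            elif rule.action == "delete":
                deleted = True
                break  # a deleted record needs no further processing
        if not deleted:
            result.kept.append(record)
    return result


rules = [
    Rule(lambda r: "heartbeat" in r["message"], "delete"),
    Rule(lambda r: r["severity"] >= 7, "alert"),
    Rule(lambda r: "login" in r["message"], "tag", tag="auth"),
]
records = [
    {"message": "heartbeat ok", "severity": 1},
    {"message": "login failed for root", "severity": 8},
]
result = route(records, rules)
```

Because the rules run at ingest time, noisy records are dropped and interesting ones are tagged or alerted on before any analyst opens the data.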
Action Scripts
The ability to incorporate domain-specific knowledge into the analytic workflow is key for any
organization. As such, Immediate Insight provides a standard mechanism for allowing SMEs
and data owners to introduce customized ingest-time transforms. When the default list of
actions available in Immediate Insight needs to be extended with custom UDFs, data managers
can create Action Scripts. Action Scripts are simple JavaScript routines that are executed
against each record that matches the Data Router criteria. Among other things, these scripts
are useful for parsing additional record fields, defining entities for automated analytics, or
conditionally applying tags.
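The three uses just listed (parsing extra fields, defining entities, and conditional tagging) can be illustrated with a single per-record function. Real Action Scripts are written in JavaScript against an interface this paper does not document, so the Python sketch below only mirrors the logic; the record layout, the field names, and the `action_script` function are all assumptions.

```python
import re

# Matches key=value pairs such as "user=jdoe".
KEY_VALUE = re.compile(r"(\w+)=(\S+)")


def action_script(record: dict) -> dict:
    """Parse key=value fields, declare a user entity, and tag failures."""
    fields = dict(KEY_VALUE.findall(record["raw"]))
    record.update(fields)  # expose parsed fields on the record
    if "user" in fields:
        # Hypothetical entity declaration for later entity-based analytics.
        record.setdefault("entities", {})["user"] = fields["user"]
    if fields.get("status") == "failed":
        record.setdefault("tags", []).append("auth-failure")
    return record


record = action_script({"raw": "event=login user=jdoe status=failed"})
```

The point of the pattern is that the SME's knowledge ("a failed status on a login event matters") is applied once, at ingest, rather than re-entered in every later search.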
Workflow
Immediate Insight also incorporates a workflow for data management, routing, and alerts.
Users are presented with a flow chart diagram in which they may define States (actions) and
Transitions (criteria). By associating customized States and Transitions, users are able to
define a workflow that specifies the criteria to evaluate and the actions to take as data
proceeds through it. An example of this utility is creating an email alert for specific
personnel when events matching given criteria (‘terrorist AND bomb’) occur.
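A workflow of States and Transitions can be modeled as criteria that select which action runs for an event. The sketch below implements the ‘terrorist AND bomb’ example with a placeholder alert that records the message instead of sending email. The state names, the recipient address, and the simple substring matching are assumptions for illustration only.

```python
def matches_all(*terms):
    """Build a criterion that is true when every term appears in the text."""
    def criterion(event: dict) -> bool:
        text = event["text"].lower()
        return all(term in text for term in terms)
    return criterion


sent = []  # stands in for an outbound mail queue


def alert_by_email(event: dict) -> None:
    # Placeholder: a real State would send mail; the address is hypothetical.
    sent.append(("analyst@example.org", event["text"]))


def archive(event: dict) -> None:
    event["archived"] = True


# Transitions are (criterion, target state) pairs; the first match wins.
TRANSITIONS = [(matches_all("terrorist", "bomb"), "alert")]
STATES = {"alert": alert_by_email, "archive": archive}


def run_workflow(event: dict) -> str:
    """Run the first matching State's action, or archive by default."""
    for criterion, state in TRANSITIONS:
        if criterion(event):
            STATES[state](event)
            return state
    STATES["archive"](event)
    return "archive"


first = run_workflow({"text": "Terrorist cell discussed a bomb"})
second = run_workflow({"text": "Weather report for Tuesday"})
```

Note that substring matching also fires on words such as "bombing"; a production criterion would need proper tokenization.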
Next Steps
The rich set of out-of-the-box analytic capabilities included in the Immediate Insight platform
enables organizations to derive immediate value with little to no customization, while its
extreme flexibility allows organizations to implement the customization necessary to realize
additional mission-specific value. With its human-focused, data-driven approach, Immediate
Insight can easily be applied to disparate data sets. Nearly any entity-based dataset can
be parsed and analyzed. This makes Immediate Insight a candidate for a variety of
specialized analytic fields. Key areas where its capabilities could be readily applied
include Astroturfing Detection, Community Clustering, and Temporal Message Resonation.
Astroturfing / Bot Detection
Astroturfing is the attempt to manufacture support that appears to be a grassroots effort.
These attempts are typically carried out by programmable bots that try to emulate human
behavior while spreading a consistent message. By applying the ‘More Like This’ analytic to a
specific record, it seems quite possible to create an entity list of unique users that are
proliferating identical information. Since the ‘More Like This’ algorithm matches records with
the same event/profile information, it can be hypothesized that bots under similar control
would resonate with the same messages. This is especially true on a re-messaging focused
platform such as Twitter.
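The hypothesis above can be tested in a simplified form by grouping posts on normalized message text and listing the distinct accounts behind each message. This sketch uses exact matching after normalization rather than the ‘More Like This’ analytic, and the account names, sample posts, and `min_accounts` cutoff are invented for illustration.

```python
from collections import defaultdict


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def accounts_sharing_message(posts, min_accounts: int = 3) -> dict:
    """Map each message to its accounts when enough distinct accounts post it."""
    by_message = defaultdict(set)
    for user, text in posts:
        by_message[normalize(text)].add(user)
    return {
        message: sorted(users)
        for message, users in by_message.items()
        if len(users) >= min_accounts
    }


posts = [
    ("acct_a", "Vote YES on Measure 5!"),
    ("acct_b", "vote yes on  measure 5!"),
    ("acct_c", "Vote yes on measure 5!"),
    ("acct_d", "Lovely weather today"),
]
suspects = accounts_sharing_message(posts)
```

A shared message alone does not prove coordination, since genuine users also repost; the resulting entity list is a starting point for an analyst, not a verdict.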
Community Clustering
Programmatic clustering is the automated attempt to correctly group individuals or
organizations by their connections. An example of this would be identifying who belongs to an
individual’s social, family, and professional groups solely by analyzing frequent
communications. ‘Frequent Entities’, coupled with Entity search, may do this by providing a
list of co-occurring properties across entities. If these user-defined properties are set up
to be group memberships, then selecting one of Immediate Insight’s Frequent Entities will
yield a list of all events with the selected membership.
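Grouping people by frequent communication can be sketched generically as finding connected components in a graph whose edges are pairs that exchanged at least a minimum number of messages. This is a standard graph technique offered for illustration, not the ‘Frequent Entities’ feature itself; the names, the sample messages, and the `min_messages` cutoff are assumptions.

```python
from collections import Counter, defaultdict


def cluster_by_contact(messages, min_messages: int = 2) -> list:
    """Group people connected by pairs that exchanged enough messages."""
    # Count messages per unordered pair, ignoring self-addressed messages.
    pair_counts = Counter(
        frozenset(pair) for pair in messages if pair[0] != pair[1]
    )
    graph = defaultdict(set)
    for pair, count in pair_counts.items():
        if count >= min_messages:
            a, b = tuple(pair)
            graph[a].add(b)
            graph[b].add(a)

    clusters, seen = [], set()
    for start in sorted(graph):
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:  # depth-first walk of one connected component
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(graph[node] - members)
        seen |= members
        clusters.append(sorted(members))
    return clusters


messages = [
    ("ann", "bob"), ("bob", "ann"), ("bob", "cat"), ("cat", "bob"),
    ("dan", "eve"), ("eve", "dan"), ("ann", "eve"),
]
clusters = cluster_by_contact(messages)
```

The single message between ann and eve falls below the cutoff, so the two groups stay separate; raising or lowering `min_messages` controls how readily groups merge.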
Temporal Message Resonation
By viewing the volume of an identical message over time, one could assess how the prevalence
or popularity of a single idea changes in response to external circumstances. Immediate
Insight’s Trending analytic shows the change in common events between two time frames.
Analysts can use this built-in functionality to monitor the rise and fall of a specific
message’s frequency over time.
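Comparing a message's volume between two time frames reduces to counting occurrences on either side of a boundary. The sketch below shows that comparison in its simplest form; the `trend` function, the sample events, and the choice of a single split point are assumptions and do not describe how the Trending analytic is implemented.

```python
from collections import Counter
from datetime import datetime


def trend(events, message: str, split: datetime) -> tuple:
    """Count `message` before and from `split`, and return the change."""
    counts = Counter(
        "after" if timestamp >= split else "before"
        for timestamp, text in events
        if text == message
    )
    before, after = counts["before"], counts["after"]
    return before, after, after - before


events = [
    (datetime(2015, 9, 1, 9), "measure 5 passes"),
    (datetime(2015, 9, 1, 10), "measure 5 passes"),
    (datetime(2015, 9, 2, 9), "measure 5 passes"),
    (datetime(2015, 9, 2, 10), "measure 5 passes"),
    (datetime(2015, 9, 2, 11), "measure 5 passes"),
    (datetime(2015, 9, 2, 12), "unrelated"),
]
before, after, change = trend(events, "measure 5 passes", datetime(2015, 9, 2))
```

A positive change indicates the message is trending up in the later time frame; repeating the comparison across successive windows traces its rise and fall.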
Summary
Every organization today is seeking to use data and analytics to derive insight and gain a
competitive edge. This has driven organizations to demand ad-hoc and real-time analysis tools
to meet these needs, leading to a crowded market of undifferentiated commercial and open
source general-purpose analysis solutions. Within this market, most of the tools suffer from
one or more key functional gaps that can introduce significant inefficiency into an
organization’s analytic process. However, with the Immediate Insight platform it is possible
for organizations to gain real-time insight across their heterogeneous data sources without
the need for complicated pre-analysis data transformation or time-consuming pre-analysis
indexing. With Immediate Insight, a powerful yet easy-to-use set of analytic capabilities can
be applied to data in real time, allowing subject matter experts to apply their expertise and
interrogate data as it arrives. The simplicity and real-time capabilities of the Immediate
Insight platform enable organizations to focus their assets on deriving intelligence from
data, rather than on the management of a cumbersome, inefficient analytic pipeline.
For more information on Immediate Insight, or to start a free trial and test these capabilities
for yourself, visit: https://www.firemon.com/free-evaluation/