Real-Time Data Triage: A SME Approach to General Purpose Analytics

September 2015

All questions and enquiries regarding this white paper should be directed to:

Adam Gerhart, Assistant Director, [email protected]
Dan Cybulski, Chief Technologist, [email protected]

Cognitio Corp, September 2015

Table of Contents

Background
Market Gaps
    Transformation
    Analytics
    Visualization
    Store and Analyze
    Specialized Personnel
    Licensing
Immediate Insight
    Rich Analytics
    Flexible Ingest
    User Interaction
    One-Click Analytics
    Customized Tagging
    Data Router
    Action Scripts
    Workflow
Next Steps
    Astroturfing / Bot Detection
    Community Clustering
    Temporal Message Resonation
Summary

© 2015 Cognitio Corp and/or its affiliates. All rights reserved.

Background

Commercial and government organizations are wrestling with the challenges of ad-hoc and real-time data analytics. Deriving timely and actionable insight from data is key to maintaining a competitive advantage and achieving mission goals. Despite this importance and focus, acquiring insight is becoming increasingly difficult as the volume, variety, and velocity of available data sources continue to grow. The prevailing opinion is to ship all data to prized Data Scientists and let them sort it out.
While these specialists are able to manage and derive value from mountains of data, they are only one piece of the puzzle, and they often lack the domain-specific knowledge that accelerates exploration efforts. An ideal solution would give Subject Matter Experts (SMEs) access to the data as quickly as possible while deftly managing the ‘Big Data’ problem.

To address the SME disconnect, organizations most often look to general-purpose analytic tools to rapidly triage and correlate incoming data. By making the data quickly accessible to everyone, they are able to derive value from perishable data sources while also driving more specialized downstream analytic capabilities. In today’s market, the number of general-purpose analytic tools vying for market share is staggering. Whether an organization is looking for a commercial or open source solution, there are simply too many to evaluate.

This paper illustrates how the Immediate Insight platform from FireMon can help organizations overcome the limitations and gaps inherent in the current analytic market. In doing so, it highlights how Immediate Insight can provide real-time analysis and data triage, empowering SMEs to apply their expertise to deriving value from perishable data while freeing data science resources to focus on broader downstream analytics. Lastly, it outlines several novel applications for the analytic and real-time intelligence capabilities of Immediate Insight.

Market Gaps

Analytical tools are an invaluable part of any organization’s data analysis workflow. In addition to any custom analytic tools, most organizations employ a combination of general-purpose commercial or open source tools to perform preliminary data triage and basic analysis of incoming data for real-time alerting and filtering. Out of the box, these general-purpose tools are expected to provide basic transformation, analytics, and visualization capabilities.
While countless tools on the market claim to provide these basic features seamlessly, nearly all of them fall short in one or more key areas, leaving functional gaps in the analytic market.

Transformation

One of the most significant issues faced when deploying commercial and open source analytic tools is the walled garden approach to data transformation. In order to analyze data, many solutions in this space require that the data meet strict, sometimes proprietary, formatting requirements before ingestion occurs. Achieving this requires that the data be extracted from its original format, transformed into the format required by the tool, and loaded into the tool. This means that organizations must pre-process data before it can be supplied to their analytic tools for triage and value assessment. This transformation process, referred to as ETL (Extract, Transform, Load), is a key component of the analytic workflow, and it must be reliable and repeatable in order to preserve the fidelity of the source data while also ensuring the accuracy of derivative analyses.

Performing this pre-processing requires that organizations invest in additional tools and dedicated resources to develop and maintain custom ETL routines. Moreover, each time a new data source is introduced, these ETL tools and routines must be modified, or new ones created, to accommodate it. This need for tool-specific data transformation causes significant inefficiency that increases the timeline, infrastructure, and personnel costs of an analysis project.

Analytics

All analytic tools include a base set of analytic capabilities focused on helping users make sense of their data. These analytics enable exploration and provide less technical users a means to interact with valuable data sets.
Unfortunately, the analytics included in most tools are limited to a pre-defined set of reporting capabilities based on simple term frequency analysis and Boolean search strings. These basic analytic capabilities rarely provide unique insights and more often than not leave users wanting more. This can lead to significant frustration, as broader and deeper interaction necessitates custom development or additional external tools to apply mission-specific data knowledge.

Visualization

Visualizations tend to be an area of great importance to analytic users. The ability to distill complex information into its simplest form in order to communicate analytic results is key in any organization. However, toolsets continually under-deliver on visualization as vendors focus on the technical backbone and proprietary feature sets. This lack of visualization capabilities limits the development of interactive analytic capabilities and forces organizations to implement complex data flows or connectors to external visualization tools.

Store and Analyze

Most analytic tools in this market take a store and analyze approach, in which data is ingested into indexes that the tool maintains and then read back in to apply analytics. This inefficient approach can lead to significant data churn and place increased demands on the underlying infrastructure, creating performance bottlenecks. Moreover, by not applying any analytics “on the wire,” these tools delay the point at which data becomes usable and insight can be discovered.

Specialized Personnel

A key barrier to entry for current generation analytical tools, both commercial and open source, is that they often require specialized technical skill sets held by Data Scientists and Information Technology personnel in order to derive value.
Many solutions have complex hardware and software deployment requirements and implement proprietary query languages. This learning curve hinders less technical analysts from exploring the data to its full potential and greatly increases the time and costs associated with deriving insight.

Licensing

Complicated and often exorbitant licensing models plague traditional commercial software in the analytic space. More often than not these tools are licensed based on the number of users and/or the volume of incoming data, which drives organizations to limit tool access or perform pre-ingest filtering to avoid volume-based license violations. While these commercial tools can provide excellent features and functionality, the pricing and licensing restrictions often limit the value that an organization can realize from its investment.

Immediate Insight

Immediate Insight brings a unique platform to the analytical market, providing SMEs and data scientists alike with real-time access to incoming data, coupled with a broad set of native analytic capabilities that empower rapid exploration and triage of new data. Immediate Insight is designed to automate as much of the data aggregation and analytic process as possible, resulting in a system that is simple to use and does not require extensive analytic or technical skills to derive meaning and value from data.

Rich Analytics

One of the biggest differentiators that Immediate Insight brings to the table is its built-in Entity Extraction for network-related data (e.g., IP addresses, locations, and hostnames). While the use cases for those entities alone are limited, Immediate Insight’s Action Script capabilities allow other entity types to be easily extracted and applied for later analysis. These entities can then be coupled with Immediate Insight’s Reputation system to match event components, such as user name or screen name, across datasets.
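As a rough illustration of what entity extraction over network data involves, the following Python sketch pulls IPv4 addresses and hostnames out of a raw log line. It is not Immediate Insight's implementation: the regular expressions, the `extract_entities` function, and the sample record are all assumptions made for this example.

```python
import ipaddress
import re

# Hypothetical patterns for this sketch; a production extractor would be more thorough.
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
HOSTNAME = re.compile(
    r"\b(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}\b", re.IGNORECASE
)


def extract_entities(record: str) -> dict:
    """Return the IPv4 addresses and hostnames found in one raw record."""
    ips = []
    for candidate in IPV4_CANDIDATE.findall(record):
        try:
            ipaddress.IPv4Address(candidate)  # rejects values such as 999.1.1.1
        except ValueError:
            continue
        ips.append(candidate)
    hostnames = HOSTNAME.findall(record)
    return {"ip": ips, "hostname": hostnames}


# Made-up sample record, not taken from any real data source.
sample = "Sep 14 10:02:11 fw01 DROP src=10.1.2.3 dst=192.0.2.44 host=mail.example.com"
entities = extract_entities(sample)
```

Once extracted, entities of this kind could serve as the keys used to match related events across datasets, which is the role the paper attributes to the Reputation system.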

Flexible Ingest

Immediate Insight implements a broad suite of data ingestion capabilities that simplify the process of integrating new data into the analytic framework. Not only can Immediate Insight handle different data formats, but the transfer itself can occur through the file system, across the network, or as a stream sent directly to a specified port. This flexibility allows data import to fit into existing infrastructures without the need to modify existing systems.

User Interaction

Immediate Insight’s ease of use is a result of keeping human/machine interaction in mind. This is achieved by designing searches around natural language. The implementation of relative date ranges is one example of the user-focused design. A user no longer needs to input a known date range or, worse, convert it to a predefined timestamp format; Immediate Insight instead accepts relative dates and human language durations.

One-Click Analytics

Extending usability further is the concept of predefined or ‘One-Click’ analytics. Immediate Insight recognizes the limitations of user-defined dashboards, which are set up statically under established criteria. By exploring data and trends quickly while they are happening, users avoid missing events of interest that do not match their prediction models, all without having to wait for after-the-fact analytics. Immediate Insight includes generalized implementations of the relationships that users are most commonly interested in when exploring their data. Examples include ‘More/Fewer like this’, which matches and compares other records based on similar data signatures; ‘Trending Up/Down’, which displays the temporal trend of similar records; and ‘Most Common/Unusual’, which highlights common and uncommon records for the currently selected dataset.
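To make the ‘Most Common/Unusual’ and ‘Trending Up/Down’ ideas concrete, the minimal Python sketch below ranks the values of one field by frequency and compares counts between two time windows. The record layout, field names, and both helper functions are hypothetical; the paper does not describe how the product computes these analytics, so this is only one simple way to approximate them.

```python
from collections import Counter


def most_common_and_unusual(records, field, n=2):
    """Return the n most frequent and n least frequent values of a field."""
    counts = Counter(r[field] for r in records if field in r)
    ranked = counts.most_common()  # sorted from most to least frequent
    common = ranked[:n]
    unusual = ranked[-n:][::-1] if ranked else []  # rarest value first
    return common, unusual


def trend(earlier, later, field):
    """Return the change in count per field value between two time windows."""
    before = Counter(r[field] for r in earlier if field in r)
    after = Counter(r[field] for r in later if field in r)
    return {value: after[value] - before[value] for value in set(before) | set(after)}


# Made-up records standing in for the currently selected dataset.
selected = [
    {"user": "alice", "action": "login"},
    {"user": "bob", "action": "login"},
    {"user": "alice", "action": "export"},
    {"user": "carol", "action": "login"},
    {"user": "alice", "action": "export"},
    {"user": "bob", "action": "delete"},
]
common, unusual = most_common_and_unusual(selected, "action", n=1)
changes = trend(selected[:3], selected[3:], "action")
```

A positive value in `changes` marks a value that is trending up in the later window, and a negative value marks one that is trending down.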

Customized Tagging

Collaboration is a key part of the analytic process. Once datasets are culled and analytical value is derived, it is common for users to share their insights. Immediate Insight natively enables this collaboration by providing a customizable tagging system. Through tagging, experts can attach comments to events or groups of events to inform other users of the system, or configure Immediate Insight to automatically tag future records that match specific criteria for easy identification.

Data Router

Immediate Insight’s analytic capabilities are powerful, but they are of little use without data in the system. Immediate Insight provides a highly flexible Extract, Transform, and Load (ETL) system that easily ingests known data types while giving data owners the ability to add customized User Defined Functions (UDFs). All data management is handled by the Data Router, which allows the system to automatically perform a desired action based on user-defined criteria. These actions include: tag, delete, alert, learn, execute custom scripts, develop data feeds, and trigger defined workflow stages.

Action Scripts

The ability to incorporate domain-specific knowledge into the analytic workflow is key for any organization. As such, Immediate Insight provides a standard mechanism that allows SMEs and data owners to introduce customized ingest-time transforms. When the default list of actions available in Immediate Insight needs to be extended with custom UDFs, data managers can create Action Scripts. Action Scripts are simple JavaScript routines that are executed over each record that matches the associated Data Router criteria. Among other things, these scripts are useful for parsing additional record fields, defining entities for automated analytics, or conditionally applying tags.
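The criteria-then-action pattern described for the Data Router and Action Scripts can be sketched as follows. The product's Action Scripts are JavaScript; the Python below is only a language-neutral illustration of the idea, and every name in it (`Rule`, `route`, `tag`, `extract_screen_name`, and the record keys) is an assumption rather than part of Immediate Insight's API.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List

Record = dict  # a parsed event; the keys used here are hypothetical


@dataclass
class Rule:
    """Pairs a match criterion with the actions to run on matching records."""
    criteria: Callable[[Record], bool]
    actions: List[Callable[[Record], None]] = field(default_factory=list)


def route(record: Record, rules: List[Rule]) -> Record:
    """Apply every action of every rule whose criteria match the record."""
    for rule in rules:
        if rule.criteria(record):
            for action in rule.actions:
                action(record)
    return record


def tag(name: str) -> Callable[[Record], None]:
    """Build an action that adds a tag to the record."""
    def _apply(record: Record) -> None:
        record.setdefault("tags", []).append(name)
    return _apply


def extract_screen_name(record: Record) -> None:
    """Custom per-record transform: parse an @handle out of the message text."""
    match = re.search(r"@(\w+)", record.get("message", ""))
    if match:
        record["screen_name"] = match.group(1)


rules = [
    Rule(criteria=lambda r: "message" in r, actions=[extract_screen_name]),
    Rule(criteria=lambda r: r.get("source") == "firewall", actions=[tag("network")]),
]

event = route({"source": "social", "message": "RT @example_user: big news"}, rules)
```

Keeping criteria and actions as separate callables mirrors the separation the paper describes: the router decides which records qualify, and the script decides what to do with each one.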

Workflow

Immediate Insight also incorporates a workflow for data management, routing, and alerts. Users are presented with a flow chart diagram in which they may define States (actions) and Transitions (criteria). By associating customized States and Transitions, users are able to define a workflow and specify the criteria and actions that apply as data proceeds through it. One example of this utility is alerting specific personnel via email when events matching given criteria (‘terrorist AND bomb’) occur.

Next Steps

The rich set of out-of-the-box analytic capabilities included in the Immediate Insight platform enables organizations to derive immediate value with little to no customization, while its flexibility allows organizations to implement the customization needed to realize additional mission-specific value.

With its human-focused, data-driven approach, Immediate Insight can easily be applied to disparate data sets. Nearly any entity-based dataset can be parsed and analyzed. This makes Immediate Insight applicable to a variety of specialized analytic fields. Key areas where its capabilities could be readily applied include Astroturfing Detection, Community Clustering, and Temporal Message Resonation.

Astroturfing / Bot Detection

Astroturfing is the attempt to manufacture support that appears to be a grassroots effort. These attempts are typically carried out by programmable bots that try to emulate human behavior while spreading a consistent message. By applying the ‘More Like This’ analytic to a specific record, it seems quite possible to create an entity list of unique users that are proliferating identical information. Since the ‘More Like This’ algorithm matches records with the same event/profile information, it can be hypothesized that bots under common control would resonate with the same messages. This is especially true on a re-messaging focused platform such as Twitter.
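The hypothesis above can be approximated outside the product with a very simple grouping step. The Python sketch below collects the distinct users posting the same normalized message text; it stands in for ‘More Like This’, whose actual algorithm is not specified in this paper, and the `min_users` threshold, field names, and sample posts are assumptions chosen for illustration.

```python
from collections import defaultdict


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivially varied copies compare equal."""
    return " ".join(text.lower().split())


def users_sharing_messages(posts, min_users=3):
    """Map each message posted by at least min_users distinct users to those users."""
    users_by_message = defaultdict(set)
    for post in posts:
        users_by_message[normalize(post["text"])].add(post["user"])
    return {
        message: sorted(users)
        for message, users in users_by_message.items()
        if len(users) >= min_users
    }


# Made-up posts; account names and text are illustrative only.
posts = [
    {"user": "acct_01", "text": "Everyone loves Product X!"},
    {"user": "acct_02", "text": "everyone loves  product x!"},
    {"user": "acct_03", "text": "Everyone loves Product X!"},
    {"user": "jane", "text": "Has anyone actually tried Product X?"},
]
suspects = users_sharing_messages(posts)
```

Identical messaging is a signal worth reviewing rather than proof of coordination, since legitimate re-messaging produces the same pattern; an analyst would still need to examine the resulting entity list.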

Community Clustering

Programmatic clustering is the automated attempt to correctly group individuals or organizations by their connections. An example would be identifying who belongs to an individual’s social, family, and professional groups solely by analyzing frequent communications. ‘Frequent Entities’, coupled with Entity search, may do this by providing a list of co-occurring properties across entities. If these user-defined properties are set up to be group memberships, then selecting one of Immediate Insight’s Frequent Entities will yield a list of all events with the selected membership.

Temporal Message Resonation

By viewing the volume of an identical message over time, one could assess how the prevalence or popularity of a single idea changes in response to external circumstances. Immediate Insight’s Trending analytic shows the change in common events between two time frames. Analysts can use this built-in functionality to monitor the rise and fall of a specific message’s frequency over time.

Summary

Every organization today is seeking to use data and analytics to derive insight and gain a competitive edge. This has driven organizations to demand ad-hoc and real-time analysis tools, leading to a crowded market of undifferentiated commercial and open source general-purpose analysis solutions. Within this market, most of the tools suffer from one or more key functional gaps that can introduce significant inefficiency into an organization’s analytic process. However, with the Immediate Insight platform it is possible for organizations to gain real-time insight across their heterogeneous data sources without the need for complicated pre-analysis data transformation or time-consuming pre-analysis indexing.
With Immediate Insight, a powerful yet easy-to-use set of analytic capabilities can be applied to data in real time, allowing subject matter experts to apply their expertise and interrogate data as it arrives. The simplicity and real-time capabilities of the Immediate Insight platform enable organizations to focus their assets on deriving intelligence from data rather than on managing a cumbersome, inefficient analytic pipeline.

For more information on Immediate Insight, or to start a free trial and test these capabilities for yourself, visit: https://www.firemon.com/free-evaluation/