Analytical Sandbox

WHITE PAPER
Analytics Best Practices:
The Analytical Sandbox
Sponsored by:
Composite Software
www.compositesw.com
Rick Sherman
Athena IT Solutions
CONTENTS
INTRODUCTION
SECTION 1: BUSINESS NEED
SECTION 2: DEFINITION
SECTION 3: ARCHITECTURE DESIGN PRINCIPLES
SECTION 4: ARCHITECTURE OPTIONS
Business Analytics
Sandbox Platform
Data Access and Integration
SECTION 5: ADVICE
INTRODUCTION
The whitepaper “A Better Way to Fuel Analytical Needs” discussed the key inhibitors to implementing
analytics and enabling self-service business intelligence (BI). It made four key recommendations for
overcoming the barriers to pervasive and self-service BI:
1. Establish an overall data-integration portfolio
2. Add data virtualization to the data-integration portfolio
3. Differentiate analytical discovery from recurring business analysis
4. Create self-service data environments for self-service BI
The fourth recommendation identified two architectural frameworks, analytical sandboxes and analytical hubs, as the foundation for creating self-service data environments for self-service BI. This paper focuses on the specific business needs and technology solutions for implementing analytical sandboxes.
SECTION 1: BUSINESS NEED
Enterprises are flooded with a deluge of data about their customers, prospects, business processes,
suppliers, partners and competitors. It comes from traditional internal systems, cloud applications, social
networking and mobile communications. With the flood of new data comes the opportunity for business
people to perform new types of analysis to gain greater insight into their business and customers.
The opportunity, however, comes with new challenges. Performing business analytics used to mean using pre-defined reports. But now, with the flood of data and a constantly changing business environment, people don’t know what they need ahead of time, so pre-defined reports aren’t relevant. Instead, people need to run new queries based on what is happening right here, right now. As a result, business analytics has to be “situational”; that is, it needs to respond to rapid changes in the business, economic and competitive environment.
The change in analytics means changes for IT. Traditionally, IT received detailed BI requirements and then created reports. Because this approach does not meet the needs of discovery and situational analytics, a new approach is needed. The answer is analytical sandboxes, a new paradigm that addresses the multiple-query challenges of situational business analytics and avoids the pitfalls of makeshift data shadow systems.
Analytics Best Practices: The Analytical Sandbox
©2013 Athena IT Solutions
Page 2
SECTION 2: DEFINITION
The goal of an analytical sandbox is to enable business people to conduct discovery and situational
analytics. This platform is targeted for business analysts and “power users” who are the go-to people that
the entire business group uses when they need reporting help and answers. This target group is the
analytical elite of the enterprise.
The analytical elite have been building their own makeshift sandboxes, referred to as data shadow
systems or spreadmarts. The intent of the analytical sandbox is to provide the dedicated storage, tools and
processing resources to eliminate the need for the data shadow systems.
The key components of an analytical sandbox (Figure 1: Analytical Sandbox - Functional Layers) are:
• Business analytics - contains the self-service BI tools used for discovery and situational analysis
• Analytical sandbox platform - provides the processing, storage and networking capabilities
• Data access and delivery - enables the gathering and integration of data from a variety of data sources and data types
• Data sources - drawn from within and outside the enterprise, these can be big data (unstructured) and transactional data (structured), e.g., extracts, feeds, messages, spreadsheets and documents
Figure 1: Analytical Sandbox - Functional Layers
Data commonly comes from the enterprise data warehouse or a specific business application, but it can also come from a spreadsheet used in another analysis, or from outside the enterprise.
Today, those data sources can be physically local, virtual or in the cloud. Earlier attempts to source data from these types of environments required exploratory data marts or OLAP cubes, but were thwarted by big-data integration and BI backlogs, so people were forced to create data shadow systems.
SECTION 3: ARCHITECTURE DESIGN PRINCIPLES
When creating analytical sandboxes for business users, follow these design principles to provide the right
environment for an enterprise:
• Data across the enterprise needs to be accessible and timely
Business analytics is inhibited by the difficulty of accessing data across an enterprise and by the length of time it takes to get that data integrated. Businesses need to operate and react to constantly changing conditions, so timely access to data scattered across the enterprise is necessary to make more informed decisions, even if the data is not “perfect.” The analytical sandbox needs to enable timely access across data silos and provide business people with an integrated view of the best data available at the time of analysis. This view will be a mix of physically and virtually integrated data, to expedite time-to-analysis and avoid the unproductive, error-prone trap of data shadow systems.
• Time-to-solution must be fast and disposable
Today’s competitive business environment and fluctuating economy are putting pressure on businesses to make fast, smart decisions. Analysts using the sandbox need to be able to gather the data, combine it, analyze it, and then act upon the resulting insights quickly. The analytical elites can no longer accept analysis that is delayed by days, weeks or even months while their requests make their way through BI and data-integration backlogs. Tools, data and infrastructure need to be architected so that ad-hoc analysis can take place when the business needs it.
• The business analyst needs to be “in control”
IT has traditionally managed the data and application environments. In this custodial role, IT has controlled access and has gone through a rigorous process to ensure that data is managed and integrated as an enterprise asset. The time has arrived for the business’s analytical elite to assume data ownership, get access to data from across the enterprise, and augment that data with other data they feel is appropriate, all of their own volition. This does not mean they should abandon data governance and data quality efforts; rather, they should leverage them in the proper business context, i.e., when recurring, production-quality information is necessary.
• Sufficient infrastructure must be available for conducting business analytics
The infrastructure for an analytical sandbox includes:
o Processing, such as PCs and servers (physical, virtual and cloud)
o Storage (physical and cloud)
o Integration capabilities (physical and virtual)
o Self-service BI tool(s)
This infrastructure must be scalable and expandable as data volumes, integration needs and analytical complexity naturally increase. Insufficient infrastructure has historically limited the depth, breadth and timeliness of analytics, as business people used their PCs and spreadsheets to fill the shortfalls.
• Solutions must be cost- and resource-effective
All enterprises need to operate within budgetary and resource constraints, whatever those are for their size and industry. The solution should be right-sized to meet the enterprise’s data and analytical needs, and to match the resources and skills that will sustain it.
SECTION 4: ARCHITECTURE OPTIONS
The overall analytical sandbox is depicted in “Figure 2: Analytical Sandbox - Architecture,” with its three layers (business analytics, sandbox platform, and data access and integration) connecting to a variety of data sources. Architectural options for each layer are outlined below, using the design principles above.
Figure 2: Analytical Sandbox - Architecture
Business Analytics
The goal of the business analytics layer is to provide the analytical tools to support self-service BI. The technology selected for this layer needs to support business people who are in charge of their own analytics rather than relying on IT to design reports or dashboards. Their analytical styles and BI platforms are important considerations:
• Multiple BI analytical styles
Business people use different analytical styles depending on the type of analysis they are performing, the data volume and variety, and their skills. Analytical styles include data visualization, data discovery, On-Line Analytical Processing (OLAP), ad-hoc analysis, dashboards, scorecards and reporting. It is important to accommodate business people’s various analytical styles and not force them to use a style that limits their effectiveness.
One widely discussed question about providing multiple styles is whether it is best to get all the tools from a single vendor or in a single BI suite. With the vendor and product landscape constantly changing, along with enterprise preferences regarding vendor selection, an enterprise should base this choice on which tool best delivers that functionality to its users.
• Multiple BI delivery and access platforms
The analytical sandbox needs to provide access from, and delivery to, business analytics performed on the desktop, in the cloud, on mobile devices (tablets and smartphones), and in Microsoft Office applications. This enables business people to perform their analysis on the most appropriate platform for their needs.
Sandbox Platform
There are many architectural choices for hosting processing and storage capabilities. These include
analytical processing, in-memory business analytics and database options:
• Analytical processing
o BI appliances vs. traditional distributed servers
Analytical sandboxes typically start on traditional distributed servers that IT manages and supports. Enterprises often deploy in this type of environment because it meets initial data and processing needs, and because of their experience with these platforms. Depending on analytical sophistication and data volumes, however, a BI platform dedicated to deploying analytical sandboxes may be the only platform capable of meeting these needs. Many of the advances in hardware, database, BI and data integration processing have been incorporated into the design of BI appliances. There is wide variation in their underlying architectures, and an enterprise needs to evaluate which best fits its needs and budget.
o On-premise vs. cloud infrastructure
Another architectural consideration is whether all the components of an analytical sandbox should be on a traditional on-premise platform, or whether some or all can be moved to the cloud. Historically, cloud options have been limited, but that has changed dramatically. Cloud components are often seen as a cost- and resource-effective solution that speeds up time-to-solution.
• In-memory business analytics
A significant development that has enabled more in-depth and faster analytics is leveraging advances in memory on the devices on which BI is performed, and on the BI server if it is part of the architecture. In-memory analytics options include in-memory processing in the BI tools, in the database or on the BI appliance platform.
• Database options
The traditional database deployment option has been relational databases, but more options are available based on advances in technology and increased data variety. Options include:
o Relational vs. columnar vs. others
o Structured vs. unstructured (particularly Big Data)
o Hybrid mix of the above
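To make the “relational vs. columnar” distinction concrete, here is a minimal Python sketch of the two storage layouts, assuming a hypothetical three-row sales table (the `Sale` class, field names and values are invented for illustration). It shows only the core idea: an aggregate over a column-oriented layout reads just the columns it needs, while a row-oriented layout visits every full record. Real columnar databases add much more, such as compression, that this toy does not attempt.

```python
from dataclasses import dataclass

# Hypothetical sales records, for illustration only.
@dataclass
class Sale:
    region: str
    product: str
    amount: float

# Row-oriented layout: all fields of a record are stored together.
rows = [
    Sale("East", "widget", 120.0),
    Sale("West", "gadget", 75.5),
    Sale("East", "gadget", 40.0),
]

def to_columns(records):
    """Pivot row-oriented records into a column-oriented dict of lists."""
    return {
        "region": [r.region for r in records],
        "product": [r.product for r in records],
        "amount": [r.amount for r in records],
    }

def total_by_region_rows(records, region):
    # Row layout: every full record is visited to read two of its fields.
    return sum(r.amount for r in records if r.region == region)

def total_by_region_columns(columns, region):
    # Column layout: only the "region" and "amount" columns are read;
    # the "product" column is never touched.
    return sum(
        amount
        for reg, amount in zip(columns["region"], columns["amount"])
        if reg == region
    )

columns = to_columns(rows)
print(total_by_region_rows(rows, "East"))        # 160.0
print(total_by_region_columns(columns, "East"))  # 160.0
```

Both functions return the same answer; what differs at scale is how much data each layout must read, which is why columnar stores are often favored for analytical queries that scan a few columns of wide tables.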
Data Access and Integration
Business people typically perform data access and integration by accessing applications (silos) directly, by using a data warehouse, or by combining the two, in which case they will likely use spreadsheets as the superglue, creating a data shadow system. Analytical sandboxes need to provide business people with the ability to access, filter, augment and combine data from many sources and in many varieties from within and outside their enterprise.
With self-service BI, the goal was truly to shift the analytical workload to the business. With data access and integration, however, the goal is not self-service data integration, but rather empowerment. Typically, data integration has emphasized physically integrating the data into a data warehouse (DW) or another application. This has proven to be very time-consuming, resulting in significant backlogs and limiting business analytics. In addition, business people have often been granted only limited access to non-integrated data, to protect them from potential inconsistencies.
The data access and integration layer needs to empower the business people to get the data they need as
quickly as possible, recognizing that getting the best available data, even if not perfect, is better than
making a decision with incomplete data or by using a data shadow system.
There are several considerations for the architectural options of this layer:
• Data access
Provided that security and privacy requirements are met, the access options include querying sources directly, using data services, using local files and using data virtualization. The first three alternatives are all point-to-point access, where the business person must know about the source, secure access to it and then navigate it. Data virtualization (below) is an architectural option that creates a data source catalog that can be saved, shared and documented for business analysts, and augmented by the IT staff.
• Data filtering, aggregating, joining and metrics calculations
Today, business people rely on IT-built reporting fed by data-integration tools, and then use spreadsheets to fill the gaps. Gathering requirements, then designing and building the IT-built reports or dashboards, severely slows time-to-solution. The analytical sandbox leverages tools such as data discovery and data virtualization to enable the business analyst to perform these functions directly.
• Augmenting enterprise data sources
Often, data that is critical for classifying, filtering and analyzing is not available from enterprise sources and may require an external data feed or an import from another business group. The sandbox needs to provide the storage and the ability to extract that data and import it into the environment.
• Data virtualization versus ETL (Extract, Transform & Load) data integration
Data integration, data management and building a consistent, clean and conformed data warehouse will continue to be the responsibility of the IT group. The data-integration capability will expand beyond traditional ETL to include data virtualization.
Data virtualization empowers business people in a couple of ways. First, it enables them to expand the
data used in their analysis without requiring that it be physically integrated. Second, they do not have
to get IT involved (via business requirements, data modeling, ETL and BI design) every time data needs
to be added. This iterative and agile approach supports data discovery more productively for both
business and IT.
Data virtualization eliminates the undocumented, overlapping and time-consuming point-to-point direct access connections that business people were stuck building in the past with their data shadow systems. With data virtualization, IT and business people can add data sources into a repository that documents them, identifies relationships between sources and uses, and encourages reuse. For the business analyst, the virtualization repository provides an information catalog of the relevant data needed for their analysis.
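The idea behind data virtualization, combining sources at query time instead of first copying them into a warehouse with ETL, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular data virtualization product works: an in-memory SQLite table stands in for the enterprise data warehouse, a CSV string stands in for an external feed from another business group, and the `CATALOG` dictionary, source names and functions are hypothetical. The “virtual view” filters, joins and aggregates on demand, and no new physical table is created.

```python
import csv
import io
import sqlite3

# Hypothetical "data warehouse": an in-memory SQLite table of orders.
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
dw.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 100.0), (1, 50.0), (2, 80.0), (3, 20.0)],
)

# Hypothetical external feed (e.g. a partner file) with customer segments.
EXTERNAL_FEED = "customer_id,segment\n1,enterprise\n2,smb\n3,smb\n"

def fetch_orders():
    """Query the warehouse at request time."""
    return dw.execute(
        "SELECT customer_id, amount FROM orders ORDER BY customer_id"
    ).fetchall()

def fetch_segments():
    """Read the external feed at request time; nothing is loaded into the DW."""
    reader = csv.DictReader(io.StringIO(EXTERNAL_FEED))
    return {int(row["customer_id"]): row["segment"] for row in reader}

# Hypothetical catalog: named sources that analysts can discover and reuse.
CATALOG = {
    "dw.orders": fetch_orders,
    "partner.segments": fetch_segments,
}

def revenue_by_segment():
    """Virtual view: join both sources and aggregate a metric on demand."""
    segments = CATALOG["partner.segments"]()
    totals = {}
    for customer_id, amount in CATALOG["dw.orders"]():
        segment = segments.get(customer_id, "unknown")
        totals[segment] = totals.get(segment, 0.0) + amount
    return totals

print(revenue_by_segment())  # {'enterprise': 150.0, 'smb': 100.0}
```

In this sketch, adding a source means registering another fetch function in the catalog rather than designing a new ETL job and target table; the trade-off is that every query pays the cost of reaching the live sources.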
SECTION 5: ADVICE
To conclude, we offer some key advice for designing and operating analytical sandboxes that enables
the analytical elite to conduct their situational analysis quickly and then act upon their insights:
• Build for the analytical elites, not the masses
The analytical elite, i.e., business analysts and “power” users, are the people who build data shadow systems and spreadmarts. They are the go-to people when management needs answers, and they are the people IT goes to in order to understand what the business masses, i.e., casual users, need. Trust them. Give them the BI tools they want (not just what meets IT standards or controls), the data they request (even if it is not perfect) and the platform to do their analysis. And then get out of their way!
• Create an enterprise data view
Business needs access to an enterprise view of its data. Realistically, an enterprise will not be able to physically integrate everything, nor should it. Leverage and expand an enterprise DW if you have one, but the business will need to get data from many other sources, i.e., data silos. It is easy to give the business direct access to these data silos, but working through data shadow systems is likely to result in inconsistent data and wasted time.
Embrace data virtualization and a hybrid data view mixing physically- and virtually-integrated data.
Virtualization enables business relationships and metrics to be built into the data view without having
to go through the lengthy ETL integration process. In addition, it enables you to include various data
types and data sources that should not be physically integrated.
• Establish separate but complementary business and IT roles
Historically, IT has built the entire analytical solution. When that solution did not have the data the business needed, or could not deliver it quickly enough, the analytical elites were forced to build their own data shadow systems that included BI and data integration.
It is time to turn BI and analytics over to the analytical elites and let IT concentrate on data integration and delivery. The first ingredient for successful self-service BI is an analyst with business knowledge and analytical expertise. The second ingredient is an IT group that can enable self-service data to feed the analyst.
• Do not be afraid to try something new
The technologies and design approaches for business analytics and data integration are continually evolving in terms of capabilities, scale and total cost of ownership. Also, the vendor landscape has been vibrant, with startups bringing new technologies to market while mergers and acquisitions consolidate and expand existing product capabilities. To meet the demands of the analytical elite, analytical sandboxes need to be designed differently from the standard production BI solution. Do not be afraid to try new database, in-memory, virtualization and integration technologies from new vendors. Meeting the needs of situational analytics is going to mean thinking “out of the box.”
About the Author:
Rick Sherman is the founder of Athena IT Solutions, a firm that provides business intelligence, data
integration and data warehouse consulting, training and vendor services. In addition to having more than
25 years of experience in BI solutions, Rick writes on IT topics and is a frequent speaker at industry events.
He blogs at The Data Doghouse and can be reached at [email protected].
For More Information:
To learn more about how Composite Software can simplify information access at your enterprise, please contact us.
[email protected]
Phone (650) 227-8200
Fax (650) 227-8199
www.compositesw.com
Composite Software
2655 Campus Drive, Suite 200
San Mateo, CA 94403

To learn more about how Athena IT Solutions can increase the success of your BI, data integration or data warehouse project, please contact us.
[email protected]
Phone (978) 897-3322
Fax (978) 461-0809
www.athena-solutions.com
Athena IT Solutions
Two Clock Tower Place, Suite 540
Maynard, MA 01754