WHITE PAPER

Analytics Best Practices: The Analytical Sandbox

Sponsored by: Composite Software
www.compositesw.com

Rick Sherman
Athena IT Solutions

CONTENTS

INTRODUCTION
SECTION 1: BUSINESS NEED
SECTION 2: DEFINITION
SECTION 3: ARCHITECTURE DESIGN PRINCIPLES
SECTION 4: ARCHITECTURE OPTIONS
    Business Analytics
    Sandbox Platform
    Data Access and Integration
SECTION 5: ADVICE

INTRODUCTION

The whitepaper “A Better Way to Fuel Analytical Needs” discussed the key inhibitors to implementing analytics and enabling self-service business intelligence (BI). It made four key recommendations for overcoming the barriers to pervasive and self-service BI:

1. Establish an overall data-integration portfolio
2. Add data virtualization to the data-integration portfolio
3. Differentiate analytical discovery from recurring business analysis
4. Create self-service data environments for self-service BI

The fourth recommendation mentioned two architectural frameworks, analytical sandboxes and analytical hubs, as the foundation for creating self-service data environments for self-service BI. The purpose of this paper is to focus on the specific business needs and technology solutions for implementing analytical sandboxes.

SECTION 1: BUSINESS NEED

Enterprises are flooded with a deluge of data about their customers, prospects, business processes, suppliers, partners and competitors. It comes from traditional internal systems, cloud applications, social networking and mobile communications. With the flood of new data comes the opportunity for business people to perform new types of analysis to gain greater insight into their business and customers.

The opportunity, however, comes with new challenges. Performing business analytics used to mean using pre-defined reports. But now, with the flood of data and a constantly changing business environment, people don’t know what they need ahead of time, so pre-defined reports aren’t relevant. Instead, people need to make new queries based on what is happening right here, right now. As a result, business analytics has to be “situational”; that is, it needs to respond to rapid changes in the business, economic and competitive environment.

The change in analytics means changes for IT. Traditionally, IT received detailed BI requirements and then created reports. Because this approach is not meeting the needs of discovery and situational analytics, we need a new approach. The answer is analytical sandboxes, a new paradigm that addresses the multiple-query challenges of situational business analytics and avoids the pitfalls of the makeshift data shadow systems.
Analytics Best Practices: The Analytical Sandbox ©2013 Athena IT Solutions Page 2

SECTION 2: DEFINITION

The goal of an analytical sandbox is to enable business people to conduct discovery and situational analytics. This platform is targeted at business analysts and “power users,” the go-to people that the entire business group relies on when it needs reporting help and answers. This target group is the analytical elite of the enterprise.

The analytical elite have been building their own makeshift sandboxes, referred to as data shadow systems or spreadmarts. The intent of the analytical sandbox is to provide the dedicated storage, tools and processing resources that eliminate the need for the data shadow systems.

The key components of an analytical sandbox (Figure 1: Analytical Sandbox - Functional Layers) are:

• Business analytics - contains the self-service BI tools used for discovery and situational analysis
• Analytical sandbox platform - provides the processing, storage and networking capabilities
• Data access and delivery - enables the gathering and integration of data from a variety of data sources and data types
• Data sources - sourced from within and outside the enterprise; these can be big data (unstructured) and transactional data (structured), e.g., extracts, feeds, messages, spreadsheets and documents

Figure 1: Analytical Sandbox - Functional Layers

Data commonly comes from the enterprise data warehouse or a specific business application, but it can even come from a spreadsheet used in another analysis, or from outside the enterprise. Today, those data sources can be physically local, virtual or in the cloud. Earlier attempts to source data from these types of environments required exploratory data marts or OLAP cubes, but were thwarted by big-data integration and BI backlogs, so people were forced to create data shadow systems.
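As a rough illustration of how the data access and delivery layer might bring a warehouse extract and an analyst's spreadsheet together, the sketch below uses Python's standard sqlite3 and csv modules. The table names (dw_sales, analyst_segments), the sample rows and the in-memory database are all assumptions made for illustration; this paper does not prescribe any particular tool or schema.

```python
import csv
import io
import sqlite3

# Hypothetical stand-in for rows extracted from an enterprise data warehouse.
WAREHOUSE_ROWS = [
    ("C001", "Acme", 1200.0),
    ("C002", "Globex", 800.0),
    ("C003", "Initech", 450.0),
]

# Hypothetical spreadsheet extract supplied by the analyst (CSV text).
SPREADSHEET_CSV = """customer_id,segment
C001,Enterprise
C002,Enterprise
C003,SMB
"""


def build_sandbox():
    """Load both sources into one private, disposable in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dw_sales (customer_id TEXT, name TEXT, revenue REAL)"
    )
    conn.executemany("INSERT INTO dw_sales VALUES (?, ?, ?)", WAREHOUSE_ROWS)
    conn.execute(
        "CREATE TABLE analyst_segments (customer_id TEXT, segment TEXT)"
    )
    reader = csv.DictReader(io.StringIO(SPREADSHEET_CSV))
    conn.executemany(
        "INSERT INTO analyst_segments VALUES (?, ?)",
        [(row["customer_id"], row["segment"]) for row in reader],
    )
    return conn


def revenue_by_segment(conn):
    """Situational query: join warehouse revenue to analyst-supplied segments."""
    cursor = conn.execute(
        """
        SELECT s.segment, SUM(d.revenue)
        FROM dw_sales AS d
        JOIN analyst_segments AS s ON s.customer_id = d.customer_id
        GROUP BY s.segment
        ORDER BY s.segment
        """
    )
    return dict(cursor.fetchall())


if __name__ == "__main__":
    sandbox = build_sandbox()
    print(revenue_by_segment(sandbox))
```

The in-memory database is a deliberate choice in this sketch: it keeps the analysis isolated from production systems and is discarded when the session ends, which mirrors the disposable nature of situational analysis.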
SECTION 3: ARCHITECTURE DESIGN PRINCIPLES

When creating analytical sandboxes for business users, follow these design principles to provide the right environment for an enterprise:

• Data across the enterprise needs to be accessible and timely
Business analytics is inhibited by the difficulty in accessing data across an enterprise and by the length of time it takes to get that data integrated. Business needs to operate and react to constantly changing conditions, so timely access to data scattered across an enterprise is necessary to make more informed decisions, even if the data is not “perfect.” The analytical sandbox needs to enable timely access across data silos and provide business people with an integrated view of the best data that is available at the time of analysis. This view will be a mix of physically and virtually integrated data, which expedites time-to-analysis and avoids the unproductive, error-prone trap of data shadow systems.

• Time-to-solution must be fast and disposable
Today’s competitive business environment and fluctuating economy are putting pressure on businesses to make fast, smart decisions. Analysts using the sandbox need to be able to gather the data, combine it, analyze it, and then act upon the resulting insights -- fast. The analytical elite can no longer accept analysis that is delayed by days, weeks or even months as they wait for their requests to make it through BI and data-integration backlogs. Tools, data and infrastructure need to be architected to ensure that ad-hoc analysis can take place when it is needed by the business.

• The business analyst needs to be “in control”
IT has traditionally managed the data and application environments. In this custodial role, IT has controlled access and has gone through a rigorous process to ensure that data is managed and integrated as an enterprise asset.
The time has arrived when the business analytical elite need to assume data ownership, get access to data from across the enterprise, and augment that data with other data that they feel is appropriate -- all of their own volition. This does not mean they should abandon data governance and data quality efforts; rather, they should leverage them in the proper business context, i.e., when recurring, production-quality information is necessary.

• Sufficient infrastructure must be available for conducting business analytics
The infrastructure for an analytical sandbox includes:
o Processing, such as PCs and servers (physical, virtual and cloud)
o Storage (physical and cloud)
o Integration capabilities (physical and virtual)
o Self-service BI tool(s)
This infrastructure must be scalable and expandable as the data volumes, integration needs and analytical complexities naturally increase. Insufficient infrastructure has historically limited the depth, breadth and timeliness of analytics as business people used their PCs and spreadsheets to fill shortfalls.

• Solutions must be cost- and resource-effective
All enterprises need to operate within budgetary and resource constraints, whatever that means for their size and industry. The solution should be right-sized to meet the enterprise’s data and analytical needs, and to match the resources and skills that will sustain the solution.

SECTION 4: ARCHITECTURE OPTIONS

The overall analytical sandbox is depicted in “Figure 2: Analytical Sandbox - Architecture” with its three layers (business analytics, sandbox platform, and data access and integration) connecting to a variety of data sources. Architectural options are outlined for each layer using the design principles above.

Figure 2: Analytical Sandbox - Architecture

Business Analytics

The goal of the business analytics layer is to provide the analytical tools to support self-service BI.
The technology selected in this layer needs to support business people who are in charge of their own analytics and who do not rely on IT to design reports or dashboards. Their analytical styles and the BI platforms are important considerations:

• Multiple BI analytical styles
Business people use different analytical styles depending on the type of analysis they are performing, the data volume and variety, and their skills. Analytical styles include data visualization, data discovery, On-Line Analytical Processing (OLAP), ad-hoc, dashboards, scorecards and reporting. It is important to accommodate the business people’s various analytical styles and not force them to use a style that limits their effectiveness.
One widely discussed question about providing multiple styles is whether it is best to get all the tools from a single vendor or in a single BI suite. With the vendor and product landscape constantly changing, along with enterprise preferences regarding vendor selection, an enterprise should make this choice based on which tool is the best fit to deliver that functionality to its users.

• Multiple BI delivery and access platforms
The analytical sandbox needs to provide access from and delivery to business analytics performed on the desktop, in the cloud, on mobile devices (tablets and smartphones), and in Microsoft Office applications. This enables business people to perform their analysis on the most appropriate platform for their needs.

Sandbox Platform

There are many architectural choices for hosting processing and storage capabilities. These include analytical processing, in-memory business analytics and database options:

• Analytical processing
o BI appliances vs. traditional distributed servers
Analytical sandboxes typically start on traditional distributed servers that IT manages and supports.
Enterprises often deploy in this type of environment because it meets initial data and processing needs, and because of their experience with these platforms. Depending on the analytical sophistication and data volumes, a BI platform dedicated to deploying analytical sandboxes may be the only platform capable of meeting these needs. Many of the advances in hardware, database, BI and data integration processing have been used in the design of BI appliances. There is wide variation in the underlying architectures, and an enterprise needs to evaluate what best fits its needs and budget.
o On-premises vs. cloud infrastructure
Another architectural consideration is whether all the components of an analytical sandbox should be on the traditional on-premises platform, or whether some or all can be moved onto the cloud. Historically, the cloud options have been limited, but that has changed dramatically. Often, cloud components are seen as a cost- and resource-effective solution that speeds up time-to-solution.

• In-memory business analytics
A significant advancement that has enabled more in-depth and speedier analytics is the use of larger memory on the devices on which BI is performed, and on the BI server if it is part of the architecture. Architectural options include in-memory analytics in the BI tools, as part of the database, or on the BI appliance platform.

• Database options
The traditional database deployment option has been relational databases, but there are more options available based on advances in technology and increased data variety. Options include:
o Relational vs. columnar vs. others
o Structured vs. unstructured (particularly Big Data)
o Hybrid mix of the above

Data Access and Integration

Business people typically perform data access and integration by accessing an application (a silo) directly, by using a data warehouse, or by combining the two, in which case they will likely use spreadsheets as the superglue, creating a data shadow system. Analytical sandboxes need to provide business people with the ability to access, filter, augment and combine data from many sources and in many varieties from within and outside their enterprise.

With self-service BI, the goal was truly shifting the analytical workload to the business. With data access and integration, however, the goal is not self-service data integration, but rather empowerment. Typically, data integration has emphasized physically integrating the data into a data warehouse (DW) or another application. This has proven to be very time-consuming, resulting in significant backlogs and limiting business analytics. In addition, business people have often been granted only limited access to non-integrated data to protect them from potential inconsistencies.

The data access and integration layer needs to empower business people to get the data they need as quickly as possible, recognizing that getting the best available data, even if not perfect, is better than making a decision with incomplete data or by using a data shadow system.

There are several considerations for the architectural options of this layer:

• Data access
The access options, provided that security and privacy requirements are met, include querying sources directly, using data services, using local files and using data virtualization. The first three alternatives are all point-to-point access where the business person must know about the source, secure access and then navigate the source.
Data virtualization (below) is an architectural option that creates a data source catalog that can be saved, shared and documented for business analysts and augmented by the IT staff.

• Data filtering, aggregating, joining and metrics calculations
Today, business people rely on IT-built reporting fed by data-integration tools, and then use spreadsheets to fill the gaps. Gathering requirements for, designing and building the IT-built reports or dashboards severely slows down the time-to-solution. The analytical sandbox leverages business analytics tools, such as data discovery or data virtualization, to enable the business analyst to perform this functionality.

• Augmenting enterprise data sources
Often, data that is critical for classifying, filtering and analyzing is not available from enterprise sources and requires an external data feed or an import from another business group. The sandbox needs to provide the storage and the ability to extract that data, and then import it into the environment.

• Data virtualization versus ETL (Extract, Transform & Load) data integration
Data integration, data management and building a consistent, clean and conformed data warehouse will continue to be the responsibility of the IT group. The data-integration capability will expand beyond traditional ETL to include data virtualization.
Data virtualization empowers business people in a couple of ways. First, it enables them to expand the data used in their analysis without requiring that it be physically integrated. Second, they do not have to get IT involved (via business requirements, data modeling, ETL and BI design) every time data needs to be added. This iterative and agile approach supports data discovery more productively for both business and IT.
Data virtualization eliminates the undocumented, overlapping and time-consuming point-to-point direct access connections that business people got stuck building in the past with their data shadow systems.
With data virtualization, IT and business people can add data sources into a repository that will document them, identify relationships between sources and uses, and encourage reuse. For the business analyst, the virtualization repository provides an information catalog of the relevant data needed for their analysis.

SECTION 5: ADVICE

To conclude, we offer some key advice for designing and operating analytical sandboxes that enable the analytical elite to conduct their situational analysis quickly and then act upon their insights:

• Build for the analytical elite, not the masses
The analytical elite, i.e., business analysts and “power” users, are the people who build data shadow systems and spreadmarts. They are the go-to people when management needs answers, and they are the people that IT goes to in order to understand what the business masses, i.e., casual users, need. Trust them. Give them the BI tools they want (not just what meets IT standards or controls), the data they request (even if it is not perfect) and the platform to do their analysis. And then get out of their way!

• Create an enterprise data view
Business needs access to an enterprise view of its data. Realistically, an enterprise will not be able to physically integrate everything, nor should it. Leverage and expand an enterprise DW if you have one, but the business will need to get data from many other sources, i.e., data silos. It is easy to give business direct access to these data silos, but working with data shadow systems is likely to result in inconsistent data and wasted time. Embrace data virtualization and a hybrid data view mixing physically and virtually integrated data. Virtualization enables business relationships and metrics to be built into the data view without having to go through the lengthy ETL integration process.
In addition, it enables you to include various data types and data sources that should not be physically integrated.

• Establish separate but complementary business and IT roles
Historically, IT has built the entire analytical solution. When that solution did not have the data that the business needed or could not deliver it quickly enough, the analytical elite were forced to build their own data shadow systems that included BI and data integration. It is time to turn BI and analytics over to the analytical elite and let IT concentrate on data integration and delivery. The first ingredient for successful self-service BI is an analyst with business knowledge and analytical expertise. The second ingredient is an IT group that can enable self-service data to feed the analyst.

• Do not be afraid to try something new
The technologies and design approaches for business analytics and data integration are continually evolving in terms of capabilities, scale and total cost of ownership. Also, the vendor landscape has been vibrant, with startups bringing new technologies to the market while mergers and acquisitions consolidate and expand existing product capabilities. To meet the demands of the analytical elite, analytical sandboxes need to be designed differently than the standard production BI solution. Do not be afraid to try new database, in-memory, virtualization and integration technologies from new vendors. Meeting the needs of situational analytics is going to mean thinking “out of the box.”

About the Author:

Rick Sherman is the founder of Athena IT Solutions, a firm that provides business intelligence, data integration and data warehouse consulting, training and vendor services. In addition to having more than 25 years of experience in BI solutions, Rick writes on IT topics and is a frequent speaker at industry events. He blogs at The Data Doghouse and can be reached at [email protected].
For More Information:

To learn more about how Composite Software can simplify information access at your enterprise, please contact us.

Composite Software
2655 Campus Drive, Suite 200
San Mateo, CA 94403
[email protected]
Phone (650) 227-8200
Fax (650) 227-8199
www.compositesw.com

To learn more about how Athena IT Solutions can increase the success of your BI, data integration or data warehouse project, please contact us.

Athena IT Solutions
Two Clock Tower Place, Suite 540
Maynard, MA 01754
[email protected]
Phone (978) 897-3322
Fax (978) 461-0809
www.athena-solutions.com