
Big Data Analytics: Profiling the Use of Analytic Platforms in User Organizations
Sponsored by BeyeNetwork
Speaker: Wayne Eckerson, Director of Research,
Business Applications & Architecture Media Group, TechTarget
Wayne Eckerson: Welcome to this webcast on Big Data Analytics. My name is Wayne Eckerson, a longtime industry analyst and thought leader in the business intelligence market, and I'll be your speaker
today. If you do have questions for me, please don’t hesitate to send me an email at
[email protected]. I’d be very happy to dialogue with you about this important topic. The research
and findings that I’ll present in this webcast are based on a report that you can download for free from the
BeyeNETWORK website or from Bitpipe. In fact, the URL is in the handout section on your screen. The
report is 40 pages in length, so I hope you take the time to download and cruise through its details. This
60 minute webcast will present highlights from that report.
First, I’ll talk about the big data analytics movement, what’s behind it, what it is and best practices for
doing it. Second, I’ll talk about big data analytics engines. I’ll explain the technology most of these
engines use to turbocharge analytical queries. Then I’ll catalog vendors in this space. Third, I’ll lump
analytic engines into four major categories and present results from our survey that show what causes
customers to buy each category of product and make some recommendations for you. Finally, and
perhaps most importantly, I’ll describe a framework for implementing big data analytics and show how you
can extend your existing business intelligence and data warehousing architectures to handle new
business requirements for big data.
Before I begin, let me thank our sponsors who made this research and webcast possible: 1010data, ParAccel, SAND Technology, Infobright, Informatica, MarkLogic, SAP and SAS. Thank you to all! Thank you very much!
So, why big data? There has been a lot of talk about big data in the past year, which I personally find a bit
puzzling. I’ve been in the data warehousing field for more than 15 years and data warehousing has
always been about big data. So, what’s new in 2011? Why are we talking about big data today? I think
there are several reasons. One is changing data types. Organizations are capturing different types of data
today, as you probably know. Until about five years ago, most data was transactional in nature, consisting
of numeric data that fits easily into rows and columns in relational databases. Today, the growth in data is
fueled largely by unstructured data from web servers as well as machine-generated data from an exploding number of sensors.
Two, technology advances: hardware has finally caught up with software. The exponential gains in price performance exhibited by computer processors, memory and disk storage have finally made it possible to store and analyze large volumes of data at an affordable price. Simply put, organizations are storing and analyzing more data because they can. Third, insourcing. Because of the complexity and cost of storing and analyzing web traffic data, most organizations have outsourced this function to third-party service bureaus. But, as the size and importance of corporate e-commerce channels have increased, many companies are now eager to insource this data to gain greater insights about customers.
At the same time, virtualization technology is making it attractive for organizations to begin considering
moving large-scale data processing to private hosting networks or even public clouds.
Fourth, developers discovered data. Now, the biggest reason for the popularity of the term big data is that
web and application developers have discovered the value of building data-intensive applications. To
application developers, big data is new and exciting. Of course, for those of us who have made our careers in the data world, the new era of big data is simply another step in the evolution of our data warehousing and data management systems that support reporting and analysis applications.
Now, big data by itself, regardless of the type, is worthless unless business users do something with it
that delivers value to the organization. That’s where analytics comes in. Although organizations have
always run reports against data warehouses, most haven’t opened these repositories to ad hoc
exploration. This is partly because analysis tools have been too complex for the average user, but it’s
also because the repositories, these data warehouses, often don’t contain all of the data needed by
power users to complete their analysis. But this is changing. A few things to know. First, patterns: a valuable characteristic of big data is that it contains more patterns and interesting anomalies than small data. Thus, organizations can gain greater value by mining large data volumes than smaller ones. Fortunately, techniques already exist to mine big data, thanks to companies such as SAS and SPSS, which is now part of IBM, that ship analytical workbenches, something we call data mining workbenches.
Real time is also a contributor here. Organizations that accumulate big data recognize very quickly that they need to change the way they capture, transform and move data from a nightly batch process, which has been typical of data warehousing environments, to a continuous loading process using micro-batch loads or even event-driven updates. This technical shift pays big business dividends because it makes it possible to deliver critical information to users in near real time. So, it's safe to say that the movement to big data is actually forcing us to move to real-time data at the same time.
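To make the shift from nightly batch to micro-batch loading concrete, here is a minimal Python sketch (not part of the original webcast; the table names, columns and polling interval are hypothetical) of a loader that polls a staging table every minute and appends only the new rows:

```python
# Illustrative sketch only: a micro-batch loader that polls a staging source on a
# short interval instead of running one nightly bulk load. Table and column names
# are hypothetical; a real implementation would use your ETL tool or CDC pipeline.
import time

def micro_batch_load(source_conn, warehouse_conn, last_loaded_id, batch_interval_secs=60):
    """Poll the staging table every batch_interval_secs and append new rows."""
    while True:
        rows = source_conn.execute(
            "SELECT id, customer_id, amount, event_time FROM staging_events WHERE id > ?",
            (last_loaded_id,),
        ).fetchall()
        if rows:
            warehouse_conn.executemany(
                "INSERT INTO fact_events (id, customer_id, amount, event_time) VALUES (?, ?, ?, ?)",
                rows,
            )
            warehouse_conn.commit()
            last_loaded_id = rows[-1][0]   # high-water mark for the next cycle
        time.sleep(batch_interval_secs)    # minutes between loads, not a nightly window
```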
Complex calculations, or analytics…I apologize for the text running off the edge of the slide here. In addition, during the past 15 years the analytical IQ of many organizations has evolved from reporting and dashboarding to lightweight analysis. Many are now on the verge of upping their analytical IQ by applying predictive analytics against both structured and unstructured data. This type of analytics can be used for everything from delivering highly tailored cross-sell recommendations to predicting failure rates of aircraft engines.
Finally, sustainable advantage. At the same time, executives have recognized the power of analytics to deliver a competitive advantage, thanks to the pioneering work of thought leaders such as Tom Davenport, who co-wrote the book Competing on Analytics. In fact, forward-thinking executives recognize
that analytics may be the only true source of sustainable advantage since it empowers employees at all
levels of an organization with information to help them make smarter decisions, and that’s invaluable.
Now, the road to big data analytics is not easy and success is not guaranteed. Analytical champions are
still rare today and that’s because succeeding with big data analytics requires the right culture, the right
people, the right organization, the right architecture and the right technology. So, let me go through each
of these. In terms of the right culture, the outer ring on this diagram here, analytical organizations are championed by executives who believe in making fact-based decisions and validating their intuition with data. These executives create a culture of performance measurement in which individuals and groups are held accountable for the outcomes of predefined metrics underlying the strategic objectives. The right
people, you can’t do big data analytics without power users, and more specifically business analysts,
analytical modelers and now data scientists. These folks possess a rare combination of skills and
knowledge. They have deep understanding of business processes and the data that sits behind those
processes, and they’re skillful in the use of various analytical tools, including Excel, SQL, analytical
workbenches and various coding languages.
Number three, you need the right organization. Historically, business analysts with the aforementioned
skills were pooled in pockets of an organization and hired by departmental heads. But, analytical
champions create a shared service organization that is an analytical center of excellence and makes
analytics a pervasive competence. Analysts are still assigned to specific departments and processes, but
they are also part of a central organization that provides collaboration, camaraderie and a career path for
these analysts.
And, number four, analytical platform. At the heart of an analytical infrastructure is an analytic platform,
the underlying data management system that consumes, integrates and provides user access to
information for reporting and analysis activities. Today, many vendors, including most sponsors of this webinar, provide specialized analytical platforms that deliver dramatically better query performance than existing systems. There are many different types of analytical platforms sold by dozens of vendors. And, fifth, which I skipped over, you need the right architecture, and that is the subject of the last part of this webinar, which we'll get to at the end.
I want to drill now on analytical platforms, as I just mentioned. So, what is an analytical platform? As you
can see here, it’s a data management system optimized for the query processing and analytics that
provides superior price performance and availability, compared with general purpose database
management systems. So, given this definition, some of our survey respondents said they already have an analytic platform. This is actually surprising, since these platforms, except for Teradata and Sybase IQ, have really only been generally available for the past five years or so. Looking at the survey responses, I did see a lot of Microsoft customers who think SQL Server fits this definition, which it actually doesn't, but nonetheless I think the results speak volumes for the power of these analytical platforms to optimize the performance of analytical applications. Customers have recognized their power and have adopted them by the boatload.
Now, analytic platforms offer superior price performance for many reasons, and while product
architectures vary considerably from vendor to vendor most support the following characteristics.
Massively parallel processing is the first. Most analytical platforms spread data across multiple nodes, each with its own CPU, memory and storage, connected by a high-speed backplane. When a user submits a query or runs an application, the shared-nothing system divides the work across the nodes, each of which processes the query on its piece of the data and then ships the results to a master node. The master node assembles the final result and sends it to the user. MPP systems are highly scalable since you simply add nodes to increase processing power.
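To illustrate the scatter-gather flow just described, here is a small Python sketch (illustrative only; real MPP databases implement this in native code across physical servers, and the data and region names here are invented) in which each "node" aggregates its own partition and a master step merges the partial results:

```python
# Minimal sketch of the shared-nothing, scatter-gather pattern: each "node"
# aggregates its own partition in parallel, then a master step merges the partials.
from concurrent.futures import ProcessPoolExecutor
from collections import Counter

def node_aggregate(partition):
    """Each node computes a partial aggregate (sales per region) on its own slice."""
    partial = Counter()
    for region, amount in partition:
        partial[region] += amount
    return partial

def mpp_query(partitions):
    """Master: scatter the work across nodes, then gather and merge partial results."""
    with ProcessPoolExecutor() as pool:
        partials = list(pool.map(node_aggregate, partitions))
    final = Counter()
    for p in partials:
        final.update(p)          # merge step performed by the master node
    return dict(final)

if __name__ == "__main__":
    # Data is pre-distributed across three "nodes" (shared-nothing partitions).
    partitions = [
        [("east", 100), ("west", 50)],
        [("east", 25), ("south", 75)],
        [("west", 10), ("south", 5)],
    ]
    print(mpp_query(partitions))   # {'east': 125, 'west': 60, 'south': 80}
```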
Number two is balanced configurations. Analytic platforms optimize the configuration of CPU, memory and disk for query processing rather than transaction processing. Analytic appliances essentially hardwire the configuration into the system and don't let customers change it, whereas analytical bundles, analytical databases and other software-only solutions let customers configure the underlying hardware to match their unique application requirements. Instead of forcing you to do all the tuning of these elements, the memory, the CPU and the disk, yourself, you can now buy systems that are preconfigured, which saves you a lot of time and money.
Another advancement is storage-level processing. In fact, Netezza's big innovation was to move some database functions, specifically data filtering functions, into the storage system using field-programmable gate arrays. This storage-level filtering reduces the amount of data that the DBMS has to process, which significantly increases performance. Many vendors have followed suit in moving various database functions into hardware. Another characteristic is columnar storage and compression. Many vendors have followed the lead of Sybase, SAND Technology, ParAccel and other columnar pioneers by storing data in columns rather than rows. Since most queries ask for a subset of columns rather than all columns, storing data in columns minimizes the amount of data that needs to be retrieved from disk and processed by the database, which accelerates query performance. In addition, since data elements in many columns are repeated, such as male and female in the gender field, column-store systems can eliminate duplicates and compress data volumes significantly, sometimes as much as 10:1. This enables more data to fit into memory, also speeding processing.
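As a toy illustration of why column stores compress so well, here is a short Python sketch (purely illustrative; the rows and values are invented) that dictionary-encodes and run-length-encodes a low-cardinality gender column:

```python
# Toy sketch of columnar compression: a low-cardinality column such as gender can be
# dictionary-encoded and run-length encoded, so far less data has to be read from disk
# for a query that touches only that column.
from itertools import groupby

rows = [("Alice", "F"), ("Bob", "M"), ("Bill", "M"), ("Carol", "F"), ("Dan", "M")]

# A row store keeps whole rows together; a column store keeps each column contiguously.
gender_column = [gender for _, gender in rows]

# Dictionary encoding: replace repeated strings with small integer codes.
dictionary = {value: code for code, value in enumerate(sorted(set(gender_column)))}
encoded = [dictionary[v] for v in gender_column]          # [0, 1, 1, 0, 1]

# Run-length encoding: collapse runs of identical codes into (code, run_length) pairs.
run_length = [(code, len(list(run))) for code, run in groupby(encoded)]
print(dictionary)    # {'F': 0, 'M': 1}
print(run_length)    # [(0, 1), (1, 2), (0, 1), (1, 1)]
```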
The use of memory in sophisticated ways is another characteristic of these systems. Many make liberal use of memory caches to speed query processing, and some products, such as SAP HANA, store all data in memory, while others store recently queried results in a smart cache so that others who need to retrieve the same data can pull it from memory rather than disk, which is obviously much faster. Given the growing affordability of memory and the widespread deployment of 64-bit operating systems, many analytic platforms are expanding their memory footprints to speed processing.
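Here is a rough Python sketch of the smart-cache idea (an illustration only; analytic platforms manage this inside the engine, and the table and query here are hypothetical): the first execution of a query hits the database, and repeat executions are served from memory:

```python
# Rough sketch of a "smart cache" for query results: the first user to run a query
# pays the disk cost; later users with the same query are served from memory.
from functools import lru_cache
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("east", 100.0), ("west", 50.0)])

@lru_cache(maxsize=128)
def cached_query(sql):
    """Cache result sets keyed by the SQL text; repeated queries skip the database."""
    return tuple(conn.execute(sql).fetchall())

print(cached_query("SELECT region, SUM(amount) FROM sales GROUP BY region"))  # hits the DB
print(cached_query("SELECT region, SUM(amount) FROM sales GROUP BY region"))  # served from cache
```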
Query optimizers are always very important, and analytic platform vendors invest a lot of time and money researching ways to enhance their query optimizers to handle various types of analytical workloads. In
fact, a good query optimizer is the biggest contributor to query performance of all of these characteristics
I’ve been talking about here. In this respect, the older vendors with established products actually have an
edge on their newer competitors.
Finally, plug-in analytics, as the name suggests. Many analytic platforms offer built-in support for complex analytics. This includes complex SQL, such as correlated sub-queries, as well as procedural code implemented as plug-ins to the database. Some vendors offer a library of analytical routines, from fuzzy matching algorithms to market basket calculations, and others provide native support for MapReduce programs and can call those using SQL. So, that's a little bit about the technology driving these analytical
platforms. Let’s take a look at these platforms and the vendors that supply them.
Here, you can see a table listing the types of analytic platforms along with the vendors that provide each type. Let me go through each of these types in a bit more detail. MPP analytic databases, at the top: as I said earlier, these are row-based databases designed to scale out on a cluster of commodity servers and run complex queries in parallel against large volumes of data. Columnar databases are database management systems that store data in columns, not rows, and support high data compression ratios. Analytic appliances are preconfigured hardware and software systems designed for query processing and analytics that require little setup and tuning. Analytical bundles are predefined hardware and software configurations that are certified to meet specific performance criteria, but the customer must purchase the hardware and configure the software on it. In-memory databases are systems that load data into memory to execute complex queries. Distributed file-based systems are designed for storing, indexing, manipulating and querying large volumes of unstructured and semi-structured data for the most part, although there is no reason they can't work on structured data as well. This is the realm of Hadoop and the NoSQL databases.
Analytic services are analytic platforms delivered as a hosted or public cloud-based service; these should increase in popularity over time, especially with small and medium-sized businesses that don't have much of an IT department. Non-relational databases are optimized for querying unstructured data as well as structured data and are closely related to Hadoop. A lot of them come out of the open source world, but not all. Typically, they are databases rather than distributed file-based systems. Last but not least, CEP and streaming engines. These systems ingest, filter, calculate and correlate large volumes of discrete events and apply rules to trigger alerts when certain conditions are met. Now, that's a broad overview of the subtypes of analytic platforms and the vendors that supply them, as you can see from this table.
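As a simplified illustration of what a CEP or streaming engine does, here is a short Python sketch (not from the webcast; the rule, threshold and window size are invented) that ingests discrete events, maintains a sliding window per account, and fires an alert when a condition is met:

```python
# Simplified sketch of a CEP-style rule: ingest events, correlate them per key over a
# sliding time window, and trigger an alert when a threshold is exceeded. Real engines
# do this continuously, in parallel, over very high event rates.
from collections import defaultdict, deque

WINDOW_SECONDS = 60
THRESHOLD = 3
recent = defaultdict(deque)   # per-account sliding window of event timestamps

def ingest(event):
    """Rule: alert if one account produces more than THRESHOLD events per window."""
    key, ts = event["account"], event["timestamp"]
    window = recent[key]
    window.append(ts)
    while window and ts - window[0] > WINDOW_SECONDS:
        window.popleft()                       # expire events outside the window
    if len(window) > THRESHOLD:
        print(f"ALERT: {key} triggered {len(window)} events in {WINDOW_SECONDS}s")

for ts in range(0, 50, 10):                    # five events in under a minute
    ingest({"account": "acct-42", "timestamp": ts})
```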
Now, our survey grouped all of those platforms that we just talked about into four major categories to make it easier to compare and contrast product offerings, and here are the four categories. Analytic databases are really the software-only analytic platforms that run on hardware that customers have to purchase. As a rule of thumb, analytic databases are good for organizations that want to tune database performance for specific workloads or run the database software on a virtualized private cloud. As you can see here, 46% of our survey respondents have implemented an analytic database of some sort. Analytic appliances, implemented by 49% of our survey respondents, are, as I said earlier, hardware-software combinations designed to support ad hoc queries. As a rule, these analytic appliances are fast to deploy, easy to maintain and make good replacements for Microsoft SQL Server or Oracle data warehouses that have run out of gas. They also make great standalone data marts to offload complex queries from large, maxed-out data warehousing hubs.
Analytic services, as I mentioned earlier: only 5% of respondents report using this form of analytic platform. As a rule of thumb, these services are great for development, test and prototyping applications, as well as for organizations that don't have an IT department, want to outsource data center operations, or want to get up and running very quickly with a minimum of capital expenditure. Finally, file-based analytic systems, into which I've also lumped the NoSQL products: these are ideal for storing and analyzing large volumes of unstructured data and don't require an upfront schema design. We'll talk a little bit more about this type of system when we get to the architecture section of the report; it's very hot right now, with a lot of interest in Hadoop and NoSQL.
When examining the business requirements driving purchases of analytic platforms overall, three
percolate to the top: faster queries, storing more data and reduced costs. These are followed by three
more, more complex queries, higher availability and quicker to deploy. And, this ranking that I just went
through, and which is depicted in this chart, is based on summing the percentages of all four deployment
options for each requirement. But, more importantly, this chart shows that customers purchase each
deployment option for slightly different reasons. Analytic database customers, for instance in the blue,
they value quick to deploy, built-in analytics and easier maintenance, more than other requirements.
Analytic service customers favor storing more data, high availability and reduced costs. Not surprisingly, customers with file-based systems were most interested in the ability to support more diverse data and more flexible schemas, two hallmarks of Hadoop and NoSQL offerings. The most distinctive of the four
categories, in terms of motivations for purchasing, were analytic appliances. Here, customers, in fact
almost two-thirds of customers, valued faster queries, more complex queries and faster load times. This
suggests that analytic appliance customers seek to offload complex ad hoc queries from data
warehouses.
We also asked respondents if they were looking for a specific deployment option when evaluating
products. Except for customers with file-based systems, most customers investigated products across all
four categories. For example, Blue Cross Blue Shield of Kansas City looked at columnar databases and an appliance before making its decision. Interestingly, no analytic service customers intended to subscribe to a service prior to evaluating products. That's because many analytic service customers subscribe to such services on a temporary basis, either to test or prototype a system, or to wait until the IT department readies the hardware to host the system. Some of these customers decide to continue with the services,
recognizing that they provide a more cost-effective environment than an in-house system.
[Slide: BI Delivery Framework 2020. The diagram depicts four intelligences around a data warehousing architecture: Business Intelligence (end-user tools, reports and dashboards, MAD dashboards); Continuous Intelligence (CEP and streaming engines, event detection and correlation, event-driven alerts, dashboards, reporting and analysis); Content Intelligence (Hadoop, MapReduce, search, NoSQL, Java, key-value pair indexes, universal information access); and Analytics Intelligence (analytic sandboxes, ad hoc query and exploration, spreadsheets, Excel, Access, OLAP, visual analysis, SAS, analytic workbenches, Hadoop).]
Okay, so that’s just a highlight of some of the survey results showing motivations and drivers for the
purchase of different types of analytical systems. Now, I want to switch gears and get into how you
architect your data warehousing environment, or rather how you extend it to support analytics and big
data. So, this is the overall framework that I introduced in my first report of the year; I called it the BI Delivery Framework 2020. It's basically my vision for what BI environments will look like in about 10 years. The thing you should notice is that instead of one intelligence, that is business intelligence, and one architecture to support it and all reporting and analysis applications, there are actually four intelligences, so let me briefly describe each. At the top, in green, is the business intelligence sector, and this represents the classic data warehousing environment that delivers reports and dashboards, primarily to casual users, via a MAD framework. MAD is my framework; it stands for monitor, analyze and drill to detail. It's a framework I use to help organizations understand how best to design performance dashboards.
Moving to the right, the second intelligence is continuous intelligence, and this delivers near real-time information and alerts, primarily but not exclusively to operational workers, using an event-driven architecture to handle both simple and complex events with a variety of technologies, including streaming and complex event processing. At the bottom is analytics intelligence, and this enables power users to submit ad hoc queries against any data source using a variety of analytical tools, ideally supported by analytical sandboxes that are built into the top-down data warehousing environment. To the left is content intelligence, and this makes unstructured data an equal target for reporting and analysis applications, equal to structured data, which has always been the key component of data warehousing environments. These content intelligence systems use a variety of indexing technologies to store both structured and unstructured data, and allow users to submit queries against them. Many people call this a unified information architecture because you can query all types of data with one query. It's also a fast-growing area that encompasses Hadoop, NoSQL and many search-based technologies that are moving into the business intelligence space. Now, if you want more information about this framework, please download my first report, titled Analytic Architectures: BI Delivery Framework 2020, from the BeyeNETWORK website, basically the same place you can download the report that this webcast is based on. But, before leaving this framework, I want to drill down on two of the intelligences depicted here. One is the top-down business intelligence; the other is the bottom-up analytics intelligence. These are two of the most interrelated intelligences in the framework and the two most problematic for organizations to manage and balance. So, this is another depiction of those two intelligences: top-down business intelligence and bottom-up analytics intelligence.
So, I already mentioned business intelligence is a top-down environment that delivers reports and
dashboards to casual users. The output is based on predefined metrics that align with strategic goals and objectives. In other words, in a top-down environment, you know in advance what questions users want to
ask and you model the environment accordingly. The benefit of this environment, as you see on the left, is that it delivers information consistency and alignment, the proverbial single version of the truth, to all users and all departments of the organization. The downside, as many who have implemented data warehouses have come to know, is that they're hard to build, hard to change, costly and politically charged, and by that I mean that once you try to cross departmental boundaries and come up with a single definition of commonly used terms, you run into political issues. Now, in contrast, analytics intelligence is
the opposite. It's a bottom-up environment geared to power users, not casual users, who submit ad hoc queries against a variety of sources, usually to optimize processes and projects rather than to implement standard metrics that represent top-down goals and objectives. So, this is a totally ad hoc environment. Typically, you don't know what questions you're going to ask until the day of. As a result, surprisingly and maybe ironically, it's quick to build, easy to change, low cost and politically uncharged. That's because it is usually one person with a spreadsheet or an Access database who decides how to define all of the key terms and metrics that they're going to use. The problem with this environment is that it creates a myriad of analytical silos, data shadow systems, spreadmarts, whatever you want to call them, and totally forfeits information alignment and consistency, which eventually will catch up with
an organization. So, the problem here with top-down and bottom-up is that most companies try to do all
BI in one or the other. They start with a top-down environment, get discouraged because it's expensive, doesn't support ad hoc requests and is really hard to change to meet new needs. So, they
abandon it in favor of analytics intelligence in a bottom-up environment, which works really well for a while
until they realize they’re overwhelmed with these analytical silos and spreadmarts, and have forfeited any
kind of common understanding of business performance.
So, the first key here to recognize is that you need both top-down and bottom-up environments. They are
synergistic. As you can see here, analysis begets reports and reports beget analysis. For instance, you do some analysis, you find something interesting and you turn it into a regularly scheduled report for
everyone else to see. But, that report in turn triggers additional questions which call for additional analysis
done by power users with spreadsheets, or whatever tools they're using, and so on and so forth. So, you need both of these. The problem is that most companies try to shove all of their business intelligence
requirements and activities into one architecture or the other. So, a second key is to apply the right
architecture to the right task. Typically, top-down environments address 80% of your users’ information
requirements, and bottom-up 20%. Yet, at the same time, the bottom-up environment may uncover 80%
of your most valuable insights. So, both are equally important and must be treated equivalently when
building your corporate information architecture.
Alright, so here is the architecture behind the BI Delivery Framework 2020. That’s the architecture that I
think a lot of you will end up having in the next five to ten years, so let me step you through this. So,
what’s pictured in blue below is the classic top-down business intelligence and data warehousing
environment that most organizations have already built. As you can see, the warehouse pulls data from
operational systems, which is mostly structured, transactional data. The data warehouse stages that data, lightly integrates it, and then pushes it out to data marts, which could be logical or physical databases, usually in a star schema or cube format designed for a specific group of users. Those users access the data mart through BI tools, which have a BI server that actually issues the queries, and from that server they get reports and dashboards. So, that's the environment we've all come to know and love, or hate I suppose, depending on your perspective.
So, what's pictured in pink are new components that address the other three intelligences in the framework. To the left on the screen are the new sources of data that typically have not been loaded into data warehouses historically. This includes machine-generated data, such as from sensors, web data, and audio and video data, which is truly unstructured data, as well as external data. Now, in front of these sources, in fact in front of all of the sources, is a Hadoop cluster. As I said earlier, Hadoop is ideal for processing, in batch, large volumes of unstructured and semi-structured data, although it can also manage structured data. Many companies today are using Hadoop to pre-process their web data before submitting it to a data warehouse for reporting and analysis purposes, although some are actually analyzing the data right in Hadoop as well.
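To illustrate that pre-processing pattern, here is a hedged sketch in the Hadoop Streaming style (the log format and URLs are hypothetical, and a real job would be submitted to the cluster rather than run locally): a mapper and reducer that collapse raw web log lines into page-view counts before the summarized rows are loaded into the warehouse:

```python
# Hedged sketch of web-log pre-processing in the Hadoop Streaming style: a mapper and
# reducer that summarize raw log lines into page-view counts. The shuffle/sort phase is
# simulated locally with sorted(); on a cluster, Hadoop performs it between the stages.
from itertools import groupby

def mapper(lines):
    """Emit (url, 1) for each raw log line: '2011-06-01T10:00:01 10.0.0.1 /products/42'."""
    for line in lines:
        parts = line.split()
        if len(parts) >= 3:
            yield parts[2], 1

def reducer(pairs):
    """Sum the counts per url (pairs arrive grouped by key, as Hadoop guarantees)."""
    for url, group in groupby(pairs, key=lambda kv: kv[0]):
        yield url, sum(count for _, count in group)

if __name__ == "__main__":
    sample = ["2011-06-01T10:00:01 10.0.0.1 /products/42",
              "2011-06-01T10:00:05 10.0.0.2 /products/42",
              "2011-06-01T10:00:09 10.0.0.1 /home"]
    mapped = sorted(mapper(sample))            # shuffle/sort, simulated locally
    for url, views in reducer(mapped):
        print(url, views)                      # summarized rows, ready to load into the DW
```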
Now, atop the data warehouse is a streaming/complex event processing engine for handling continuous intelligence and alerting applications. This is not to say that you can't do near real-time data delivery through a data warehouse. Many companies already do this using micro-batch loads, but at some point, for specific types of applications, you need real-time data capture and delivery and real-time alerting. Fraud detection, for instance, is a good use case for CEP engines. Below the data warehouse is
a freestanding database or sandbox that offloads bottom-up analytic processing from the data
warehouse, if desired. Many organizations are recognizing that the data warehouse really is best
designed for reporting and dashboarding, the top-down types of activities, where you know the questions
in advance. And, any kind of ad hoc workload that runs against the warehouse only serves to slow down the performance of that warehouse and frustrate the analytical users who are submitting those ad hoc queries. So, one tactic is to offload either a replica of the data, or a whole new set of data that can't be loaded in the warehouse, into a freestanding sandbox or database, typically running on one of the analytic platforms we talked about earlier, like an appliance, to turbocharge those analytical queries and provide a safe place for analytical users to play and do their analysis.
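Here is a minimal sketch of that offloading tactic (illustrative only; the table names and date filter are hypothetical, and SQLite stands in for both the warehouse and the analytic platform): a filtered replica is copied from the warehouse into a freestanding sandbox database where power users can run heavy ad hoc queries:

```python
# Minimal sketch of offloading a replica of warehouse data into a freestanding sandbox
# so ad hoc analytical queries don't compete with the production reporting workload.
import sqlite3

warehouse = sqlite3.connect("warehouse.db")   # assumed to contain a fact_sales table
sandbox = sqlite3.connect("sandbox.db")

# Pull only the slice the analysts need (e.g., the last two years of sales).
rows = warehouse.execute(
    "SELECT order_id, customer_id, amount, order_date FROM fact_sales "
    "WHERE order_date >= '2009-01-01'"
).fetchall()

sandbox.execute("CREATE TABLE IF NOT EXISTS fact_sales "
                "(order_id INTEGER, customer_id INTEGER, amount REAL, order_date TEXT)")
sandbox.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", rows)
sandbox.commit()

# Power users now run their heavy ad hoc queries against sandbox.db, not the warehouse.
```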
Now, to the right and at the bottom is the power user, who traditionally has been left out of this architecture, the blue architecture anyway. All of the IT folks, the data warehousing folks who manage the blue components in this architecture, have been wary about letting power users query any of those data elements for fear of bogging down those systems. But, as you can see here in the new architecture, the
BI Delivery Framework 2020, we have lots of different options, and we’ll go to the next slide to go through
those in more detail.
Alright, so this is the same BI architecture as the previous slide, but with the five sandboxes available for power users noted in green. So, let me go through those. First is a virtual sandbox, which resides inside the warehouse; this is a set of dedicated partitions into which analysts can upload their own data and mix it with corporate data. This requires the data warehouse database to have very good workload management utilities, which not all do, to make sure that the analytical queries don't interfere with the activity and workloads in the rest of the warehouse. The second sandbox is an alternative to the virtual sandbox, and this is the freestanding sandbox. As I said earlier, many companies implement a freestanding sandbox to avoid the contention with data warehouse workloads that a virtual sandbox might pose. This sandbox can also accelerate processing through a separate high-powered machine that is highly tailored to the needs of the analysis. A third type of sandbox, if you look closer to the business user, is a local in-memory BI tool, where users can pull data from many different sources, including the data warehouse, into a local memory buffer and run their analysis there. Tools like (Inaudible) and Tableau support this type of architecture. The one thing that prevents them, or at least keeps them in some cases, from turning into spreadmarts is that when the power users want to publish what they have found to the rest of the world, they need to publish onto an IT-controlled server. Now, a fourth type of sandbox is Hadoop itself, because it allows power users who know the data well and can write code to submit queries directly against the data in Hadoop. And, fifth but less common, certain data warehousing administrators allow very privileged and skilled analysts to run queries right against the warehouse, but they need to write well-designed SQL and prove that their queries won't bog down other users. So, you can see here many different ways for a power user to get involved or be a part of the new BI architecture, the next-generation BI architecture. The most predominant ones will be the virtual sandbox, the freestanding sandbox and the in-memory BI sandbox, as well as the Hadoop cluster, which will be the primary domain of the so-called data scientists or data-savvy analysts who know how to write Java and other types of code.
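As a rough illustration of the local in-memory sandbox idea, here is a Python sketch (assumptions: a warehouse.db file with a fact_sales table and a personal CSV of target accounts, both hypothetical) in which an analyst pulls a result set into memory with pandas and blends it with their own data:

```python
# Hedged sketch of the local in-memory sandbox: pull a result set from the warehouse into
# local memory, blend it with the analyst's own data, and analyze without touching the
# production systems again. Connection details, file names and columns are hypothetical.
import sqlite3
import pandas as pd

warehouse = sqlite3.connect("warehouse.db")

# Corporate data pulled once from the warehouse into a local in-memory frame.
sales = pd.read_sql("SELECT customer_id, SUM(amount) AS revenue "
                    "FROM fact_sales GROUP BY customer_id", warehouse)

# Personal data the analyst maintains outside the warehouse (e.g., a target list).
targets = pd.read_csv("my_target_accounts.csv")       # columns: customer_id, segment

# Mix the two and analyze locally in memory.
blended = sales.merge(targets, on="customer_id", how="inner")
print(blended.groupby("segment")["revenue"].sum().sort_values(ascending=False))
```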
So, that pretty much concludes my presentation. I've got four recommendations to leave you with for supporting big data analytics. The first is really about this top-down and bottom-up BI. For too long, organizations have tried to shoehorn all types of users into a single information architecture, and that's never worked. Organizations need to recognize that casual users need top-down analytic reports and dashboards, but power users need ad hoc exploratory tools and environments. The second recommendation I'd like to leave you with is to implement a BI architecture that
supports multiple intelligences, not just one. The BI architecture of the future, as we stepped through, supports both traditional data warehousing to handle detailed transactional data, and file-based and non-
relational systems to handle unstructured and semi-structured data. It also supports continuous
intelligence through CEP and streaming engines, and analytical sandboxes for ad hoc exploration.
Three, create multiple types of analytic sandboxes. Analytic sandboxes, as we just discussed, bring
power users more fully into the corporate data environment by enabling them to mix personal and
corporate data and run complex ad hoc queries with minimal restrictions. And, fourth and finally,
implement analytic platforms that best meet business and technical requirements. As I discussed earlier,
there are four broad types of analytical platforms. Pick the one that is right for you. Appliances are quick
to deploy and easy to maintain. Analytic databases provide flexibility to run your software on any
hardware you choose, including virtualized hardware. Analytic services forego the time and cost of
provisioning software in your own data center, even if you have one. File-based systems are ideal for
processing unstructured and semi-structured data.
So, that concludes our webcast today. Thank you for tuning in! Again, if you have questions, please feel
free to email me at [email protected]. I’d love to hear your questions, your observations and
insights, as I’m always learning from all of you out there. Thank you again and have a great day!