Business Analytics Paper
WELCOME TO THE WORLD OF BIG DATA. NEW WORLD PROBLEMS, NEW WORLD SOLUTIONS
TECHNOLOGY by Zachary Zeus
Data in our world has been exploding. According to IBM research, 90% of today’s data was created in the last two years alone, and every day adds another 2.5 quintillion bytes. This is the world of Big Data: a mirror reflection of human life on the planet today and an unprecedented insight into people’s behaviours, decisions and daily transactions.
According to recent research by McKinsey, Big Data is the next frontier in global industry’s quest for innovation, competition and productivity, and it is already driving sweeping change in a diverse range of sectors, from medical research to crime prevention.
WHAT IS THE DEFINITION OF BIG DATA?
Big data refers to datasets that grow so large they become difficult to work with using on-hand database management tools. The difficulties include capturing, storing, searching, sharing, analyzing and visualizing the data.
Generally there are three types of big data:
• Transactional (reserved mainly for credit card companies and financial services)
• Sub-transactional (typically the events leading to transactions)
• Non-transactional (websites, blogs etc.)
Whatever the category, big data is typically characterized as:
• High Volume – too big to be analyzed using traditional methods
• High Velocity – much of it is real-time data that needs to be analyzed quickly to hold its value
• High Variety – typically unwieldy data that comes in many types and formats
BizCubed | Sydney
+61 2 9007 9887 | [email protected]
UNDERSTANDING THE POTENTIAL FOR YOUR BUSINESS
It would be tempting to view the big data phenomenon as a bubble generated by social media. Yet a growing number of industries worldwide, including online businesses, retail organisations and many media and marketing companies, are uncovering compelling insights using big data analytics. Equally, a recent McKinsey Global Institute (MGI) study concluded that in the developed economies of Europe, government administrators could save more than €100 billion ($149 billion) in operational efficiency improvements alone by using big data.[1]
However, the key word here is ‘could’, because for most industries today the ability to analyse these data sets is out of reach. Big data cannot be analysed using conventional database management tools and is beyond the capacity of traditional BI databases. To make sense of big data you need access to tools optimized for massive data crunching and, more importantly, access to analytics that can interpret this data for your business.
[1] James Manyika, Michael Chui, Brad Brown, Jacques Bughin, Richard Dobbs, Charles Roxburgh and Angela Hung Byers, “Big data: The next frontier for innovation, competition, and productivity”, McKinsey Global Institute (MGI), May 2011.
BIG DATA SOLUTIONS
Unsurprisingly, a number of Big Data solutions have launched in the marketplace, and they vary depending on your big data requirements. The system for processing weblog data, for example, is very different from applications that enable corporate treasury departments to report on intraday cash balances. Two solutions gaining global attention are Hadoop and NoSQL databases. Hadoop is an Apache-developed framework, available under a free licence, that allows large data sets to be processed across clusters of computers using a simple programming model. NoSQL databases (such as MongoDB, CouchDB and Cassandra) are non-relational databases that use clusters of commodity servers to manage huge data and transaction volumes. They are highly optimized for retrieval and record storage, scale elastically, and allow you to store data of virtually any structure.
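Hadoop’s “simple programming model” is MapReduce. A job is expressed as a map function that emits key/value pairs and a reduce function that combines all values sharing a key. The framework handles grouping (“shuffling”) those pairs and spreading the work across the cluster.

The sketch below is a hypothetical, single-process Python illustration of that model, counting words in a few weblog-style lines. It does not use Hadoop’s own (Java) API. The function names `map_phase`, `shuffle`, `reduce_phase` and `run_job` are invented for illustration. Treat it as a teaching aid rather than a deployable job.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step (hypothetical name): emit (word, 1) for every word in one record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group emitted values by key, as the framework would
    between the map and reduce stages."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: combine all values for one key (here, sum the counts)."""
    return key, sum(values)


def run_job(records: Iterable[str]) -> Dict[str, int]:
    """Run map -> shuffle -> reduce sequentially in one process.
    On a real cluster each phase would run in parallel across many machines."""
    mapped = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical weblog-style records used purely for illustration.
    logs = ["GET /home", "GET /products", "POST /checkout", "GET /home"]
    print(run_job(logs))
```

Even this trivial count has to be written, tested and deployed as code. That is the programming burden BI users face with these platforms.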
CHALLENGES AND CONSTRAINTS
When explaining what big data solutions are, it’s important to clarify what they are not. Firstly, big data solutions are not databases in the traditional sense. They don’t provide the capabilities that BI toolsets expect of a database, and they’re comparatively slow. Even the smallest possible query on Hadoop, for example, takes far longer to execute than it would on a database. This is because Hadoop is optimized for very intensive processing tasks over very large amounts of data, not for quick queries.
Equally, from a BI perspective, these solutions offer few facilities for ad hoc query and analysis. Even a simple query requires significant programming expertise, and most BI tools don’t provide connectivity to big data sources. This is proving hugely problematic and frustrating for organizations wanting to use their data for intelligence gathering, for two reasons. Firstly, it’s not always possible to resource the highly technical users needed to program the queries. Secondly, even these skilled users are limited in the access and visibility these tools allow without huge commitments of time and project resources.
THE OPEN SOURCE OPPORTUNITY
Interestingly, the historical move towards open source has created an opening for exactly this kind of intelligence gathering. Rather than committing to long-term, multimillion-dollar data projects, some organisations have been able to use open tools to set up short, experimental projects. These have enabled them to explore the true value of their data and to build tools and methods that make the big data mining process progressively easier.
Equally, Pentaho, a leader in Big Data analytics, now has the capability to significantly lower the technical barriers of Hadoop and NoSQL tools. It does this by providing an environment that users find logical to understand. The system is designed to integrate data, leverage the full capabilities of each big data platform and enable users to access the information they need in a highly visual format.
In this sense, Pentaho is being deployed to make it easier for groups of users (not just the technically specialised) to conduct useful analytics. It sits on top of unstructured data sources and provides an end-to-end BI solution that includes reporting tools, ad hoc query options and genuinely interactive analysis.
A BUSINESS INTELLIGENCE APPROACH
This BI focus is important because without it your Big Data analysis is
virtually worthless. As MIT senior lecturer Jonathan Byrnes warned recently in
an article for Leading Company:
“initiatives have to be co-ordinated and focused on the right long-term
strategic goals to be effective. If the availability of big data encourages a
massive flock of independent tactical initiatives, it will do more harm than
good.”[2]
It makes good business sense to secure a big data vendor that has business intelligence capabilities and can work closely with you to determine which data sets will have high-value analytical uses for your company. Equally, though, the industry needs to accept that this is new-world terrain: you don’t know what you need until you can explore the true value of your data. For that, you need cost-effective tools that allow for short, experimental projects.
[2] Leading Company, www.leadingcompany.com.au/big-data/big-data-big-opportunity-or-big-headache, 11 March 2012.
THE WAY FORWARD
The implications of big data, and the increasing volume and detail of the information, will continue to multiply for the foreseeable future. Enterprise data is predicted to grow by 650% over the next few years, and 80% of that will be unstructured.[3] Your customers will continue to generate this information, but how you access it (and how you make sense of it) could be the key differentiator between you and your competitors. Finally, when you’re choosing a vendor for your big data analytics, make data security a high priority and make sure there’s synergy between the big data solution and your existing infrastructure. Big data analysis is not a one-size-fits-all solution. Make sure your vendor understands your business goals and the nuances and implications of your data. Without that insight, your big data analysis will fall far short of the business intelligence it should provide.
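Percentage growth forecasts are easy to misread. The arithmetic below rests on two assumptions. First, “grow by 650%” is read as an increase of 650% on today’s volume, not growth to 650% of it. Second, the 80% unstructured share is applied to the total future volume. Writing $V_0$ for today’s enterprise data volume, the forecast works out as:

```latex
\begin{aligned}
V_{\text{future}} &= V_0\,(1 + 6.5) = 7.5\,V_0 \\
V_{\text{unstructured}} &= 0.8 \times 7.5\,V_0 = 6\,V_0
\end{aligned}
```

Under that reading, the unstructured portion alone would be six times the size of all enterprise data today.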
To talk to BIZCUBED about our Pentaho Big Data solutions,
email [email protected] or call 02 9007 9887.
[3] Gartner webinar, “Technical Trends you can’t afford to ignore”, January 2010.