The Four Things Data Scientists Wish You Knew

FOR ENTERPRISE ARCHITECTURE PROFESSIONALS
Get The Most Out Of Your Data Science Investments
by Brian Hopkins
October 26, 2015
Why Read This Report
In a business environment increasingly characterized by the imperative to use deeper insights to engage customers and outpace competitors, data science is a hot topic. But data science is very different from what most enterprise architecture (EA) professionals are used to dealing with. Read this report to find out how it’s different and what data scientists hope you understand. Learn how to optimize your technology strategy and improve data science outcomes.
Key Takeaways
Data Science Is Very Different
Enterprise architects are used to engineered data management processes that create clean, curated data. But data science is not engineering; it takes exploration and experimentation.
Four Principles Must Guide Your Investments
Data scientists want you to understand that this is a team sport. They often need raw data, and the faster they get it, the better they work. They also want you to recognize that the discovery process can be more valuable than algorithms.
Add Data Science Accelerators To Your Road Map
Data scientists can benefit from technologies that accelerate their work. A data science workbench, fast data access services, metadata management, and an insights fabric will help speed your business forward in its plans to get the most from data science.
by Brian Hopkins
with Gene Leganza, Elizabeth Cullen, and Diane Lynch
Table Of Contents
Firms Hope Data Science Will Give Them An Edge
Data Science Is A Totally Different Animal
Data Scientists Want You To Understand Four Basic Principles
No. 1: Data Science Is A Team Sport
No. 2: The Sushi Principle: Raw Data Is Often Way More Interesting
No. 3: The Process Can Yield As Much Value As The Model
No. 4: Speed Is Crucial
Recommendations
Add Data Science Accelerator Projects To Your Road Map
What It Means
The Rise Of The Algorithm Will Cull Out Lagging Businesses
Supplemental Material
Notes & Resources
Forrester spoke with 10 data experts from consultancy, vendor, and end user companies and interviewed 10 firms with advanced analytics capabilities. Further inputs came from vendor briefings and client inquiry calls.
Related Research Documents
Brief: Why Data-Driven Aspirations Fail
The Forrester Wave™: Big Data Predictive Analytics Solutions, Q2 2015
Q&A: Forrester’s Top Five Questions About Big Data
Forrester Research, Inc., 60 Acorn Park Drive, Cambridge, MA 02140 USA
+1 617-613-6000 | Fax: +1 617-613-5000 | forrester.com
© 2015 Forrester Research, Inc. Opinions reflect judgment at the time and are subject to change. Forrester®,
Technographics®, Forrester Wave, RoleView, TechRadar, and Total Economic Impact are trademarks of Forrester
Research, Inc. All other trademarks are the property of their respective companies. Unauthorized copying or
distributing is a violation of copyright law. [email protected] or +1 866-367-7378
Firms Hope Data Science Will Give Them An Edge
Big data and advanced analytics initiatives are in full swing at most firms. Responding to Forrester’s
Global Business Technographics® Data And Analytics Survey, 2015, 66% of global data and analytics
decision-makers reported that their firms have expanded, have implemented, or are planning to
implement big data technologies within the next 12 months.1 More than a third of technology decision-makers reported that large-scale predictive modeling, data mining, or advanced analytics are part of
their firms’ big data plans.2 As EA leaders and their organizations become serious about big data and
analytics programs, recruiting skilled employees, including data scientists, is a top priority (see Figure 1).
Data Science Is A Totally Different Animal
Forrester spoke with 10 working data science leaders and interviewed 10 firms with advanced
analytics capabilities.3 We wanted to discover whether business and technology management leaders
understand what data scientists do and whether they deliver what data scientists need to succeed. We
found out that data science is a different beast indeed:
“The world of data science is alien to most IT and businesspeople.” (Andrew Jennings, chief
analytics officer, FICO)
Firms often have naive notions and objectives for their data science programs. For example:
› Firms want certainty, and they engineer analytics to deliver it. Businesses like certainty, and
technology managers have spent a lot to ensure it. They cleanse data, enrich it, and make it “high
quality.” They sample, normalize, conform, and aggregate it. One data professional with deep
financial services experience typified common thinking when he told us, “If data isn’t clean and
normalized, it is not useful.” That’s not always true in data science.
› Data scientists work with uncertainty. Dr. Tye Rattenbury, director of data science at Trifacta,
told us that, relative to an initial line of inquiry, most data is “analytically useless.” Assessing
the validity of insights that scientists might extract is a big part of data science work. In other
words, data scientists do a lot of forensics to understand underlying processes and spot insight
opportunities. They do things like hypothesis development, feature engineering, pipeline building,
and model training and tuning. Furthermore, their output is rarely certainty — it’s usually a summary
of probable results.
FIGURE 1 Recruiting For Advanced Skills Is A Top Priority For Firms Seeking To Be Data-Driven
“What are your firm’s plans for the following data-driven initiatives?”
Initiative | Expanding/implemented | Planning to implement within the next 12 months
Recruiting more people with advanced data skills | 51% | 23%
Combining content management and data management programs into a unified information management program | 47% | 24%
Investing more in business-friendly, self-service visualization and analytics | 46% | 24%
Implementing a single view of the customer | 49% | 22%
Changing our management culture to rely more on quantitative decisions | 48% | 22%
Expanding our ability to source external data | 47% | 23%
Changing our processes to promote data stewardship and sharing | 46% | 23%
Creating a data innovation capability | 45% | 23%
Building predictive systems | 45% | 23%
Creating an organizational CoE for BI and/or advanced analytics | 45% | 23%
Investing in distributed real-time insight-delivery technology | 45% | 22%
Creating a business-led data stewardship or governance program | 45% | 22%
Investing in a data-hub-based big data technology platform | 44% | 21%
Changing management incentives to promote data sharing | 44% | 21%
Appointing a chief data officer | 45% | 16%
Base: 3,005 global data and analytics decision-makers
Source: Forrester’s Global Business Technographics® Data And Analytics Survey, 2015
Data Scientists Want You To Understand Four Basic Principles
We found that the mismatch between data science and traditional analytics was indeed causing pain.
For example, well-meaning technology professionals often respond to data science requests either too
slowly or with data that doesn’t suit the data scientists’ needs. Enterprise architects can help their firms get the
most from their data science investments by understanding four principles (see Figure 2).
FIGURE 2 Understanding These Four Things Will Help You Get More From Your Firm’s Data Science Investment
Keys to understanding data science | The old way | The data science way
It’s a team sport. | Analysts and data engineers work in separate organizations. | Data scientists need business and data engineers to help them find and test insight.
The sushi principle: Raw data is often better. | Data is delivered processed, enriched, and sanitized. | Data scientists need fresh raw data to feed experiments.
The process can be more valuable than a model. | Analytics are designed to answer specific questions. | Data scientists are learning new things as they navigate the data.
There’s a need for speed. | Requests for data take hours or days to fill. | Data scientists think differently when they can work quickly.
No. 1: Data Science Is A Team Sport
Data scientists universally told us one thing: “It’s not magic.” They complained that many businesspeople
have the notion that they could apply data science to everything — just send the data scientists away to
find interesting things and tell them to come back with insights. The truth turns out to be quite different.
For example, predictive models are only accurate when adequate data exists. Finding out which customers might be predictable is itself a data mining exercise, and the answer might be “none.”
Furthermore, the experts explained that data science isn’t like other business analytics and reporting. Engineers and data specialists don’t work up the reports and data and wait for analysts or managers to consume them. Instead, data science is an exercise of exploration and discovery that requires a diverse team:
› Experts with business acumen and data orientation translate data into insights. According to
Saum Mathur, senior vice president of big data analytics at CA Technologies, data science, as part
of business analytics, is a “high touch” activity. Business executives must constantly guide data
scientists as they work. They keep the data scientists focused on delivering the most important
insights, then approve the analytics results and communicate them to business stakeholders in
business language.
› Data scientists formulate testable hypotheses and assumptions. In data science, data scientists
must tease out meaningful business questions and translate them into a hypothesis that they can
answer with data. There are often hidden assumptions that they must monitor and evaluate as well.
For example, in 2009, when H1N1 influenza, also known as swine flu, became a pandemic, millions
of people with no symptoms started searching the Internet to get more information about it. This
threw off the Google Flu Trends model, which implicitly assumed that the correlation between the
number of Internet searches and the actual occurrences of the flu would remain stable.4
› Developers help data scientists deploy insight. The output of data science — a prediction,
a model, or a score — is useless until the organization deploys it in a way that prompts action.
Increasingly, this means that data science intersects with software development. For example,
Art.com coded up a complex clustering algorithm that generated “similar to” search results as
searchers used new terms.5 While some firms still prefer their data scientists to also play the role of
code slingers, most need developers on the team.
› Data engineers and technology pros support the process. The most mature data science
organizations we find — Google, LinkedIn, Netflix, Tesla Motors, Uber, and Yahoo — all talk about
data pipelines. These are not yesterday’s extract, transform, and load (ETL) processes, where all
data ends up in the warehouse. Rather, data engineers and technology pros work together, DevOps
style. They care for services that provision data from everywhere — from raw streams at the source
to carefully curated customer master data.
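The hidden-assumption point above can be made concrete. The following is a minimal sketch, not a description of how Google Flu Trends was built: it assumes a team simply re-checks whether the correlation a model relies on still holds in the most recent data. The function names, the four-week window, the 0.8 threshold, and all the numbers are hypothetical and chosen only for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def assumption_holds(searches, cases, window=4, threshold=0.8):
    """True if the most recent `window` points still show the expected correlation.

    `window` and `threshold` are illustrative defaults, not recommended values.
    """
    recent = pearson(searches[-window:], cases[-window:])
    return recent >= threshold

# Hypothetical weekly counts (made-up numbers, not real surveillance data).
cases = [10, 12, 15, 13, 16, 18, 21, 19]
searches_stable = [100, 120, 150, 130, 160, 180, 210, 190]

# Same case counts, but news coverage inflates searches in the last four weeks.
searches_spiked = [100, 120, 150, 130, 900, 400, 1500, 2000]

stable_ok = assumption_holds(searches_stable, cases)
spiked_ok = assumption_holds(searches_spiked, cases)
print(stable_ok, spiked_ok)
```

A check like this does not repair a model; it only tells the team that an assumption they depend on deserves another look.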
No. 2: The Sushi Principle: Raw Data Is Often Way More Interesting
Enterprise architects must understand that raw data is superfood for data scientists. But in Forrester’s Q3 2015 Global State Of Strategic Planning, Enterprise Architecture, And PMO Online Survey, 61% of enterprise architects responded that reports and dashboards are the primary output of analytics at their firm.6 Organizations often assemble these reports and dashboards from shiny, clean (and nutrient-poor) data. While well intentioned, they often miss the mark.
“I’ve been in roles where IT would often give
us the cleaned, sanitized data from the
warehouse. All the interesting things we were looking for were gone. I can fish for myself — in fact,
I want to.” (Abe Gong, data science consultant)
Business and tech management leaders must recognize that data is the food on which data scientists
thrive and that raw data, like raw food, is often much better:
› Data scientists love to tap raw data at its source. When well-meaning ETL processes clean,
conform, and aggregate data, items of interest to data scientists often disappear. For example, one
interviewee said that technology management furnished his team with aggregated claims records.
He wanted to investigate why claims from one system differed from those from other systems and needed to
examine the raw data, but it had been purged.
› Data scientists need help with archive retrieval and production database queries. You can’t
always let data scientists run queries on production databases — that could have a negative
impact on online transaction processing (OLTP) workloads. Also, data scientists often need deep
historical data that has been archived. Enterprise architects can help by planning data access
services that include mirroring production data and self-service archive retrieval.
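To illustrate why aggregation can erase exactly what a data scientist is looking for, here is a small sketch built on made-up claim records. The system names, amounts, and function names are all hypothetical; the point is only that a single summary row cannot answer a per-source question that the raw records answer easily.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw claim records: (source_system, claim_amount).
raw_claims = [
    ("system_a", 100.0), ("system_a", 110.0), ("system_a", 90.0),
    ("system_b", 300.0), ("system_b", 310.0), ("system_b", 290.0),
]

def aggregate_total(claims):
    """What a summarized feed might keep: one count and one overall average."""
    amounts = [amount for _, amount in claims]
    return {"count": len(amounts), "avg_amount": mean(amounts)}

def average_by_system(claims):
    """What the raw records still allow: an average per source system."""
    grouped = defaultdict(list)
    for system, amount in claims:
        grouped[system].append(amount)
    return {system: mean(amounts) for system, amounts in grouped.items()}

summary = aggregate_total(raw_claims)
by_system = average_by_system(raw_claims)
print(summary)    # the overall average gives no hint that the systems differ
print(by_system)  # the per-system view shows the gap
```

Once only the summary survives, the per-system difference cannot be reconstructed, which is the situation the interviewee described.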
No. 3: The Process Can Yield As Much Value As The Model
The nature of data science, like any science, is that experiments fail and hypotheses turn out to be
wrong. However, the process can lead to new, adjacent insight opportunities. This can be trouble for
business execs who don’t consider failure an option or can’t stomach the many twists, turns, and
dead ends:
“A predictive model might never deliver huge benefits, but you learn a whole lot about your business
through the rigor it takes to build one.” (Dr. Tye Rattenbury, director of data science, Trifacta)
“If you do any kind of project properly, you learn about your organization, your processes, the makeup of customers, your portfolio, who buys what, etc. A good data project finds answers to these
questions.” (Andrew Jennings, chief analytics officer, FICO)
In many cases, the journey yields unanticipated insights that spur further innovation and revenues:
› A financial services firm discovered the need to build a risk-tolerance score. Samuel Croker,
senior solution architect at SAS, told us about a financial services client that wanted to model
customers’ readiness for retirement. In the process, it discovered the need to model customers’
financial risk tolerance, something the client had not previously thought necessary. This spun off a
completely new data science project.
› Alerts often lead to more-complex optimization projects. Joel Cawley, a general manager
at IBM, is building a business focused on developing insights and integrating them into clients’
processes. For example, his team combines weather data with customers’ first-party data to
generate alerts. He told us that often, a simple alert leads customers into a predictive analytics
project for optimizing an entire process. For example, an alert of an impending storm’s impact on
equipment leads customers to optimize equipment procurement and maintenance.
No. 4: Speed Is Crucial
Data science is an iterative process, and each iteration brings requests for more data. At many firms,
fetching new data takes time, especially if it’s deeply archived or massive in size. Abe Gong told us
that a data science team that can get an answer in less than a second is going to think and behave
differently than it would in a situation where getting an answer takes a minute or even 10 seconds.
Enterprise architects who understand this can help their data science teams by:
› Building high-performance data services. Go beyond building a Hadoop data lake with an SQL
engine, Apache Spark, and a predictive analytics tool on top. Consider adding a high-performance,
cache-optimized data virtualization technology from Cisco Systems, Denodo Technologies,
or Informatica.7 Or think about enabling federated SQL capabilities from Microsoft, Oracle,
or Teradata. Also, provision access to streaming data via a publish/subscribe data sink from
Confluent’s Kafka or Google’s Cloud Dataflow.
› Provisioning data-preparation tools. Not only do data scientists need to access data quickly
but they also need to work with it easily. Self-service data preparation tools are emerging to fit
this need. IBM, Informatica, and Teradata have all beefed up their offerings here, while smaller
firms like Paxata and Trifacta feature innovations like advanced machine learning or distributed
Hadoop processing.
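The publish/subscribe idea mentioned above can be sketched in a few lines. This is a toy, in-process stand-in and not the API of Kafka, Cloud Dataflow, or any other product; the class, topic name, and event fields are hypothetical. It shows why the pattern suits data science: a new subscriber can receive every raw event without changing the producer or the existing consumers.

```python
from collections import defaultdict

class InMemoryBroker:
    """Toy publish/subscribe broker; a stand-in for a real streaming platform."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register a callback that receives every event published to `topic`."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, event):
        """Deliver `event` to each subscriber of `topic`, in subscription order."""
        for callback in self._subscribers[topic]:
            callback(event)

broker = InMemoryBroker()
raw_events = []    # data science sandbox: keeps every raw event
alert_events = []  # operational consumer: keeps only readings above a limit

def keep_hot_readings(event):
    """Hypothetical operational rule: record readings above 80 degrees C."""
    if event["temp_c"] > 80:
        alert_events.append(event)

broker.subscribe("device-readings", raw_events.append)
broker.subscribe("device-readings", keep_hot_readings)

for reading in [{"device": "d1", "temp_c": 71}, {"device": "d2", "temp_c": 93}]:
    broker.publish("device-readings", reading)

print(len(raw_events), len(alert_events))
```

A production platform adds durability, partitioning, and replay, none of which this sketch attempts.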
Recommendations
Add Data Science Accelerator Projects To Your Road Map
The data scientists we spoke with articulated various characteristics of an ideal architecture that can
support many forms of data science (see Figure 3). Enterprise architecture leaders should accelerate
their firms’ investments by building these components into their road maps:
› A data science workbench. Data scientists need a wide variety of tools to do their work. Work
with your data scientists on a toolbox strategy that enables easy data extraction, preparation, and
mining. Data scientists also need model development and script-library management tools from
vendors like BlueData Software, Dato, Domino Data Lab, IBM, Revolution Analytics, SAS, Skytree,
Trifacta, and Waterline Data.
› A flexible framework for standing up various analytics components on demand. There is no
one-size-fits-all solution for data science’s data technology needs. Data scientists might need
Spark ML or Spark Streaming (microbatch). Some might do complex relationship analytics using a
graph database. Use virtualization and containers so you can rapidly stand up and manage analytics
and application workloads easily in a shared farm. eBay, LinkedIn, and MapR Technologies started
project Apache Myriad to do just this.
› A managed data catalog and set of data APIs. Metadata management will help your data
science teams tremendously. But be sure you understand what metadata is most important to
them — like data locality, API interfacing rules, and most importantly, lineage. A big part of data
science involves explaining analytics models to executives who have to buy into the predictions.
Data lineage will help them prove models out.
› An insights fabric that can address streaming, batch, and federated data needs. Data
scientists need data from everywhere and at various stages of processing. Processed transaction
history from the warehouse may be fine, but they may need raw customer device data as well and
may need it very quickly. An insights fabric extends Forrester’s information fabric architecture with
on-demand analytics workload infrastructure, like Apache Spark or Kafka clusters. It must be able
to tap data along a continuum, from raw and messy to clean and normalized.
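As a rough illustration of the lineage metadata discussed in the data catalog recommendation, the sketch below assumes a catalog that records, for each data set, where it lives and which data sets it was derived from. The catalog layout, data set names, and function names are hypothetical and do not reflect any vendor’s product.

```python
# Hypothetical catalog: data set name -> metadata, including direct upstream sources.
catalog = {
    "device_stream_raw": {"location": "stream", "derived_from": []},
    "crm_export_raw": {"location": "data lake", "derived_from": []},
    "customer_master": {"location": "warehouse", "derived_from": ["crm_export_raw"]},
    "churn_features": {
        "location": "data lake",
        "derived_from": ["customer_master", "device_stream_raw"],
    },
}

def trace_lineage(dataset, catalog):
    """Return every upstream data set `dataset` depends on, directly or indirectly."""
    seen = []
    stack = list(catalog[dataset]["derived_from"])
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.append(current)
        stack.extend(catalog[current]["derived_from"])
    return sorted(seen)

def raw_sources(dataset, catalog):
    """Return only the upstream data sets that have no upstream of their own."""
    return [name for name in trace_lineage(dataset, catalog)
            if not catalog[name]["derived_from"]]

print(trace_lineage("churn_features", catalog))
print(raw_sources("churn_features", catalog))
```

With metadata like this, a data scientist explaining a model to an executive can point to the exact raw sources behind each input.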
FIGURE 3 An Ideal Architecture For Supporting Data Science
[Diagram; labeled components: data science workbench; microbatch; SQL; NoSQL; algorithm libraries; data catalog/API management; streaming capture; pub/sub; data pipeline; data lake; federation services; existing analytics and operational databases]
What It Means
The Rise Of The Algorithm Will Cull Out Lagging Businesses
The main product of data science is an algorithm. The more data science you do, the more algorithms
you will deploy to help make or automate business decisions. For example, Signet Bank was the first
bank to employ an algorithm to set credit card rates based on consumer risk. Its credit card division was
so successful, it spun off and became Capital One. Now banks use hundreds or thousands of algorithms
and are building data science teams to create even more — and this is happening in every industry.
Enterprise architects must step back and recognize what is occurring: Today, we have billions of
transistors in our phones and computers, but in the 1950s, a radio had just one or two. Similarly, there is
no upper limit on how many algorithms firms will ultimately have and how they might use these for profit.
Michael Lewis’ Flash Boys: A Wall Street Revolt illustrates how this has already played out in high-frequency trading.8 These firms had the financial motivation to spend enormous sums of money on
technology and algorithms to shave microseconds off transactions 20 years ago, using complex
algorithms and super-fast networks. Today, computing power and connectivity have become much
cheaper. As we move into the age of connected things, the quality, quantity, and effective deployment
of algorithms will distinguish the winners and losers in many industries. This puts understanding and
supporting your data scientists in a whole new perspective — and the job you save may be your own!
Supplemental Material
Survey Methodology
Forrester’s Global Business Technographics Data And Analytics Survey, 2015 was fielded from January
through March 2015 to 3,005 business and technology decision-makers located in Australia, Brazil,
Canada, China, France, Germany, India, New Zealand, the UK, and the US from companies with 100 or
more employees.
Forrester’s Business Technographics provides demand-side insight into the priorities, investments, and
customer journeys of business and technology decision-makers and the workforce across the globe.
Forrester collects data insights from qualified respondents in 10 countries spanning the Americas,
Europe, and Asia. Business Technographics uses only superior data sources and advanced data-cleaning techniques to ensure the highest data quality.
Forrester’s Q3 2015 Global State Of Strategic Planning, Enterprise Architecture, And PMO Online
Survey was fielded to 170 technology management professionals involved in or familiar with EA from
our ongoing technology management research panel and readers who have demonstrated an interest
in EA research. The panel consists of volunteers who join on the basis of interest and familiarity with
specific technology management topics. For quality assurance, panelists are required to provide
contact information and answer basic questions about their firms’ revenue and budgets. Forrester
fielded the survey from June to August 2015. Respondent incentives included a complimentary
webinar that discusses the survey results. Exact sample sizes are provided in this report on a question-by-question basis. Panels are not guaranteed to be representative of the population. Unless otherwise
noted, statistical data is intended to be used for descriptive and not inferential purposes.
Endnotes
1 Source: Forrester’s Global Business Technographics Data And Analytics Survey, 2015.
2 We asked technology decision-makers to identify all technologies that are included in their plans for big data. “Large scale predictive modeling, data mining, or other advanced analytics” was the most common component, though only by 1 percentage point. The second-most-popular answer was “public cloud big data services.” Source: Forrester’s Global Business Technographics Data And Analytics Survey, 2015.
3 Data science is the extraction of knowledge from large volumes of structured or unstructured data; it is a continuation of the fields of data mining and predictive analytics. The people who perform these analytics are data scientists. Forrester also includes financial service actuaries; statistical quantitative analysts, known as quants; and marketing scientists under the data scientist label.
4 Google created a model to predict the movement of the flu at a very high level of detail based on Internet search patterns. The model became inaccurate when the 2009 H1N1 (swine flu) pandemic caused millions of people with no flu symptoms to start searching the Internet to get more information about the flu. See the “Google Flu Trends — A Big Data Fail? Not Exactly” Forrester report.
5 Most websites run offline batch jobs to generate clusters of items similar to what web users are searching for. Art.com generated clusters on the fly so that users entering new search terms would see relevant “similar to” results in minutes. Source: May 2015 briefing by Art.com executives with Forrester.
6 Using a 5-point scale, we asked respondents to indicate their level of agreement or disagreement with the following statement: “The output of our analytics is mostly reports and dashboards meant for broad consumption.” Source: Forrester’s Q3 2015 Global State Of Strategic Planning, Enterprise Architecture, And PMO Online Survey.
7 As enterprise architects look at how to deliver a trusted, real-time, integrated, and secure data platform to support applications, they look at data virtualization. To review how the nine leading vendors in the marketplace fared against Forrester’s 60-criteria evaluation, see “The Forrester Wave™: Enterprise Data Virtualization, Q1 2015” Forrester report.
8 Source: Michael Lewis, Flash Boys: A Wall Street Revolt, W.W. Norton & Company, 2015.
Forrester Research (Nasdaq: FORR) is one of the most influential research and advisory firms in the world. We work with
business and technology leaders to develop customer-obsessed strategies that drive growth. Through proprietary
research, data, custom consulting, exclusive executive peer groups, and events, the Forrester experience is about a
singular and powerful purpose: to challenge the thinking of our clients to help them lead change in their organizations.
127141
For more information, visit forrester.com.