FOR ENTERPRISE ARCHITECTURE PROFESSIONALS

The Four Things Data Scientists Wish You Knew
Get The Most Out Of Your Data Science Investments

by Brian Hopkins
with Gene Leganza, Elizabeth Cullen, and Diane Lynch
October 26, 2015

FORRESTER.COM

Why Read This Report

In a business environment increasingly characterized by the imperative to use deeper insights to engage customers and outpace competitors, data science is a hot topic. But data science is very different from what most enterprise architecture (EA) professionals are used to dealing with. Read this report to find out how it’s different and what data scientists hope you understand. Learn how to optimize your technology strategy and improve data science outcomes.

Key Takeaways

Data Science Is Very Different
Enterprise architects are used to engineered data management processes that create clean, curated data. But data science is not engineering; it takes exploration and experimentation.

Four Principles Must Guide Your Investments
Data scientists want you to understand that this is a team sport. They often need raw data, and the faster they get it, the better they work. They also want you to recognize that the discovery process can be more valuable than algorithms.

Add Data Science Accelerators To Your Road Map
Data scientists can benefit from technologies that accelerate their work. A data science workbench, fast data access services, metadata management, and an insights fabric will help speed your business forward in its plans to get the most from data science.

Table Of Contents

Firms Hope Data Science Will Give Them An Edge
  Data Science Is A Totally Different Animal
Data Scientists Want You To Understand Four Basic Principles
  No. 1: Data Science Is A Team Sport
  No. 2: The Sushi Principle: Raw Data Is Often Way More Interesting
  No. 3: The Process Can Yield As Much Value As The Model
  No. 4: Speed Is Crucial
Recommendations
  Add Data Science Accelerator Projects To Your Road Map
What It Means
  The Rise Of The Algorithm Will Cull Out Lagging Businesses
Supplemental Material

Notes & Resources

Forrester spoke with 10 data experts from consultancy, vendor, and end user companies and interviewed 10 firms with advanced analytics capabilities. Further inputs came from vendor briefings and client inquiry calls.

Related Research Documents

Brief: Why Data-Driven Aspirations Fail
The Forrester Wave™: Big Data Predictive Analytics Solutions, Q2 2015
Q&A: Forrester’s Top Five Questions About Big Data

Forrester Research, Inc., 60 Acorn Park Drive, Cambridge, MA 02140 USA
+1 617-613-6000 | Fax: +1 617-613-5000 | forrester.com
© 2015 Forrester Research, Inc. Opinions reflect judgment at the time and are subject to change. Forrester®, Technographics®, Forrester Wave, RoleView, TechRadar, and Total Economic Impact are trademarks of Forrester Research, Inc. All other trademarks are the property of their respective companies. Unauthorized copying or distributing is a violation of copyright law. [email protected] or +1 866-367-7378

Firms Hope Data Science Will Give Them An Edge

Big data and advanced analytics initiatives are in full swing at most firms.
Responding to Forrester’s Global Business Technographics® Data And Analytics Survey, 2015, 66% of global data and analytics decision-makers reported that their firms have expanded, have implemented, or are planning to implement big data technologies within the next 12 months.[1] More than a third of technology decision-makers reported that large-scale predictive modeling, data mining, or advanced analytics are part of their firms’ big data plans.[2] As EA leaders and their organizations become serious about big data and analytics programs, recruiting skilled employees, including data scientists, is a top priority (see Figure 1).

Data Science Is A Totally Different Animal

Forrester spoke with 10 working data science leaders and interviewed 10 firms with advanced analytics capabilities.[3] We wanted to discover whether business and technology management leaders understand what data scientists do and whether they are giving data scientists what they need to succeed. We found out that data science is a different beast indeed:

“The world of data science is alien to most IT and businesspeople.” (Andrew Jennings, chief analytics officer, FICO)

Firms often have naive notions and objectives for their data science programs. For example:

› Firms want certainty, and they engineer analytics to deliver it. Businesses like certainty, and technology managers have spent a lot to ensure it. They cleanse data, enrich it, and make it “high quality.” They sample, normalize, conform, and aggregate it. One data professional with deep financial services experience typified common thinking when he told us, “If data isn’t clean and normalized, it is not useful.” That’s not always true in data science.

› Data scientists work with uncertainty. Relative to an initial line of inquiry, Dr. Tye Rattenbury, director of data science at Trifacta, told us that most data is “analytically useless.” Assessing the validity of insights that scientists might extract is a big part of data science work.
In other words, data scientists do a lot of forensics to understand underlying processes and spot insight opportunities. They do things like hypothesis development, feature engineering, pipeline building, and model training and tuning. Furthermore, their output is rarely certainty; it’s usually a summary of probable results.

FIGURE 1 Recruiting For Advanced Skills Is A Top Priority For Firms Seeking To Be Data-Driven

“What are your firm’s plans for the following data-driven initiatives?”

Initiative | Expanding/implemented | Planning to implement within the next 12 months
Recruiting more people with advanced data skills | 51% | 23%
Combining content management and data management programs into a unified information management program | 47% | 24%
Investing more in business-friendly, self-service visualization and analytics | 46% | 24%
Implementing a single view of the customer | 49% | 22%
Changing our management culture to rely more on quantitative decisions | 48% | 22%
Expanding our ability to source external data | 47% | 23%
Changing our processes to promote data stewardship and sharing | 46% | 23%
Creating a data innovation capability | 45% | 23%
Building predictive systems | 45% | 23%
Creating an organizational CoE for BI and/or advanced analytics | 45% | 23%
Investing in distributed real-time insight-delivery technology | 45% | 22%
Creating a business-led data stewardship or governance program | 45% | 22%
Investing in a data-hub-based big data technology platform | 44% | 21%
Changing management incentives to promote data sharing | 44% | 21%
Appointing a chief data officer | 45% | 16%

Base: 3,005 global data and analytics decision-makers
Source: Forrester’s Global Business Technographics® Data And Analytics Survey, 2015

Data Scientists Want You To Understand Four Basic Principles

We found that the mismatch between data science and traditional analytics was indeed causing pain. For example, well-meaning technology professionals often respond to data science requests either too slowly or with data that doesn’t suit the data scientists’ needs. Enterprise architects can help their firms get the most from their data science investments by understanding four principles (see Figure 2).

FIGURE 2 Understanding These Four Things Will Help You Get More From Your Firm’s Data Science Investment

Keys to understanding data science | The old way | The data science way
It’s a team sport. | Analysts and data engineers work in separate organizations. | Data scientists need business and data engineers to help them find and test insight.
The sushi principle: Raw data is often better. | Data is delivered processed, enriched, and sanitized. | Data scientists need fresh raw data to feed experiments.
The process can be more valuable than a model. | Analytics are designed to answer specific questions. | Data scientists are learning new things as they navigate the data.
There’s a need for speed. | Requests for data take hours or days to fill. | Data scientists think differently when they can work quickly.

No. 1: Data Science Is A Team Sport

Data scientists universally told us one thing: “It’s not magic.” They complained that many businesspeople have the notion that data science can be applied to everything: just send the data scientists away to find interesting things and tell them to come back with insights. The truth turns out to be quite different. For example, predictive models are only accurate when adequate data exists.
Finding out which customers might be predictable is itself a data mining exercise, and the answer might be “none.”

Furthermore, the experts explained that data science isn’t like other business analytics and reporting. Engineers and data specialists don’t work up the reports and data and wait for analysts or managers to consume them. Instead, data science is an exercise of exploration and discovery that requires a diverse team:

› Experts with business acumen and data orientation translate data into insights. According to Saum Mathur, senior vice president of big data analytics at CA Technologies, data science, as part of business analytics, is a “high touch” activity. Business executives must constantly guide data scientists as they work. They keep the data scientists focused on delivering the most important insights, then approve the analytics results and communicate them to business stakeholders in business language.

› Data scientists formulate testable hypotheses and assumptions. Data scientists must tease out meaningful business questions and translate them into hypotheses that they can test with data. There are often hidden assumptions that they must monitor and evaluate as well. For example, in 2009, when H1N1 influenza, also known as swine flu, became a pandemic, millions of people with no symptoms started searching the Internet to get more information about it.
This threw off the Google Flu Trends model, which implicitly assumed that the correlation between the number of Internet searches and the actual occurrences of the flu would remain stable.[4]

› Developers help data scientists deploy insight. The output of data science (a prediction, a model, or a score) is useless until the organization deploys it in a way that prompts action. Increasingly, this means that data science intersects with software development. For example, Art.com coded up a complex clustering algorithm that generated “similar to” search results as searchers used new terms.[5] While some firms still prefer their data scientists to also play the role of code slingers, most need developers on the team.

› Data engineers and technology pros support the process. The most mature data science organizations we find (Google, LinkedIn, Netflix, Tesla Motors, Uber, and Yahoo) all talk about data pipelines. These are not yesterday’s extract, transform, and load (ETL) processes, where all data ends up in the warehouse. Rather, data engineers and technology pros work together, DevOps style. They care for services that provision data from everywhere, from raw streams at the source to carefully curated customer master data.

No. 2: The Sushi Principle: Raw Data Is Often Way More Interesting

Enterprise architects must understand that raw data is superfood for data scientists. But in Forrester’s Q3 2015 Global State Of Strategic Planning, Enterprise Architecture, And PMO Online Survey, 61% of enterprise architects responded that reports and dashboards are the primary output of analytics at their firm.[6] Organizations often assemble these reports and dashboards from shiny, clean (and nutrient-poor) data. While well intentioned, they often miss the mark.

“I’ve been in roles where IT would often give us the cleaned, sanitized data from the warehouse. All the interesting things we were looking for were gone. I can fish for myself; in fact, I want to.” (Abe Gong, data science consultant)

Business and tech management leaders must recognize that data is the food on which data scientists thrive and that raw data, like raw food, is often much better:

› Data scientists love to tap raw data at its source. When well-meaning ETL processes clean, conform, and aggregate data, items of interest to data scientists often disappear. For example, one interviewee said that technology management furnished his team with aggregated claims records. He wanted to investigate why claims from one system differed from the others and needed to examine the raw data, but it had been purged.

› Data scientists need help with archive retrieval and production database queries. You can’t always let data scientists run queries on production databases; that could have a negative impact on online transaction processing (OLTP) workloads. Also, data scientists often need deep historical data that has been archived. Enterprise architects can help by planning data access services that include mirroring production data and self-service archive retrieval.

No. 3: The Process Can Yield As Much Value As The Model

The nature of data science, like any science, is that experiments fail and hypotheses turn out to be wrong. However, the process can lead to new, adjacent insight opportunities. This can be trouble for business execs who don’t consider failure an option or can’t stomach the many twists, turns, and dead ends:

“A predictive model might never deliver huge benefits, but you learn a whole lot about your business through the rigor it takes to build one.” (Dr. Tye Rattenbury, director of data science, Trifacta)

“If you do any kind of project properly, you learn about your organization, your processes, the makeup of customers, your portfolio, who buys what, etc. A good data project finds answers to these questions.” (Andrew Jennings, chief analytics officer, FICO)

In many cases, the journey yields unanticipated insights that spur further innovation and revenues:

› A financial services firm discovered the need to build a risk-tolerance score. Samuel Croker, senior solution architect at SAS, told us about a financial services client that wanted to model customers’ readiness for retirement. In the process, it discovered the need to model customers’ financial risk tolerance, something the client had not previously thought necessary. This spun off a completely new data science project.

› Alerts often lead to more-complex optimization projects. Joel Cawley, a general manager at IBM, is building a business focused on developing insights and integrating them into clients’ processes. For example, his team combines weather data with customers’ first-party data to generate alerts. He told us that a simple alert often leads customers into a predictive analytics project for optimizing an entire process. For example, an alert of an impending storm’s impact on equipment leads customers to optimize equipment procurement and maintenance.

No. 4: Speed Is Crucial

Data science is an iterative process, and each iteration brings requests for more data. At many firms, fetching new data takes time, especially if it’s deeply archived or massive in size.
Abe Gong told us that a data science team that can get an answer in less than a second is going to think and behave differently than it would in a situation where getting an answer takes a minute or even 10 seconds. Enterprise architects who understand this can help their data science teams by:

› Building high-performance data services. Go beyond building a Hadoop data lake with an SQL engine, Apache Spark, and a predictive analytics tool on top. Consider adding a high-performance, cache-optimized data virtualization technology from Cisco Systems, Denodo Technologies, or Informatica.[7] Or think about enabling federated SQL capabilities from Microsoft, Oracle, or Teradata. Also, provision access to streaming data via a publish/subscribe data sink from Confluent’s Kafka or Google’s Cloud Dataflow.

› Provisioning data-preparation tools. Not only do data scientists need to access data quickly, but they also need to work with it easily. Self-service data preparation tools are emerging to fit this need. IBM, Informatica, and Teradata have all beefed up their offerings here, while smaller firms like Paxata and Trifacta feature innovations like advanced machine learning or distributed Hadoop processing.

Recommendations

Add Data Science Accelerator Projects To Your Road Map

The data scientists we spoke with articulated various characteristics of an ideal architecture that can support many forms of data science (see Figure 3). Enterprise architecture leaders should accelerate their firms’ investments by building these components into their road maps:

› A data science workbench. Data scientists need a wide variety of tools to do their work. Work with your data scientists on a toolbox strategy that enables easy data extraction, preparation, and mining. Data scientists also need model development and script-library management tools from vendors like BlueData Software, Dato, Domino Data Lab, IBM, Revolution Analytics, SAS, Skytree, Trifacta, and Waterline Data.
› A flexible framework for standing up various analytics components on demand. There is no one-size-fits-all solution for data science’s data technology needs. Data scientists might need Spark ML or Spark Streaming (microbatch). Some might do complex relationship analytics using a graph database. Use virtualization and containers so you can rapidly stand up and easily manage analytics and application workloads in a shared farm. eBay, LinkedIn, and MapR Technologies started project Apache Myriad to do just this.

› A managed data catalog and set of data APIs. Metadata management will help your data science teams tremendously. But be sure you understand what metadata is most important to them, such as data locality, API interfacing rules, and, most importantly, lineage. A big part of data science involves explaining analytics models to executives who have to buy into the predictions. Data lineage will help data scientists prove their models out.

› An insights fabric that can address streaming, batch, and federated data needs. Data scientists need data from everywhere and at various stages of processing. Processed transaction history from the warehouse may be fine, but they may need raw customer device data as well and may need it very quickly. An insights fabric extends Forrester’s information fabric architecture with on-demand analytics workload infrastructure, like Apache Spark or Kafka clusters. It must be able to tap data along a continuum, from raw and messy to clean and normalized.
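To make the lineage recommendation concrete, the following is a minimal sketch, not taken from the interviews and not any vendor's API. It assumes a hypothetical in-memory catalog in which each entry records its location, its stage on the raw-to-curated continuum, and its direct upstream data sets; the class name, function names, and data set names are all illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One data set in a hypothetical catalog (illustrative fields only)."""
    name: str
    location: str   # data locality, e.g. which store or cluster holds the data
    stage: str      # position on the raw-to-curated continuum
    upstream: list[str] = field(default_factory=list)  # lineage: direct parents


def trace_lineage(catalog: dict[str, CatalogEntry], name: str) -> list[str]:
    """Return every ancestor of `name`, nearest first, each listed once."""
    seen: list[str] = []
    queue = list(catalog[name].upstream)
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue
        seen.append(current)
        queue.extend(catalog[current].upstream)
    return seen


def raw_sources(catalog: dict[str, CatalogEntry], name: str) -> list[str]:
    """Ancestors with no parents of their own, i.e. the raw origins of `name`."""
    return [n for n in trace_lineage(catalog, name) if not catalog[n].upstream]


# Hypothetical entries: two raw feeds, one curated table, one feature set.
entries = [
    CatalogEntry("device_events_raw", "stream", "raw"),
    CatalogEntry("transactions_raw", "oltp_mirror", "raw"),
    CatalogEntry("transactions_clean", "warehouse", "curated", ["transactions_raw"]),
    CatalogEntry("churn_features", "data_lake", "derived",
                 ["device_events_raw", "transactions_clean"]),
]
catalog = {entry.name: entry for entry in entries}

print(trace_lineage(catalog, "churn_features"))
print(raw_sources(catalog, "churn_features"))
```

With metadata of this kind, a data scientist asked to justify a model built on `churn_features` could show which curated tables and raw feeds it depends on. A production catalog would also need to handle versioning, access rules, and far larger graphs, none of which this sketch attempts.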
FIGURE 3 An Ideal Architecture For Supporting Data Science
[Diagram labels: data science workbench; microbatch; SQL; NoSQL; algorithm libraries; data catalog/API management; streaming capture pub/sub; data pipeline; data lake; federation services; existing analytics and operational databases]

What It Means

The Rise Of The Algorithm Will Cull Out Lagging Businesses

The main product of data science is an algorithm. The more data science you do, the more algorithms you will deploy to help make or automate business decisions. For example, Signet Bank was the first bank to employ an algorithm to set credit card rates based on consumer risk. Its credit card division was so successful, it spun off and became Capital One. Now banks use hundreds or thousands of algorithms and are building data science teams to create even more, and this is happening in every industry.

Enterprise architects must step back and recognize what is occurring: Today, we have billions of transistors in our phones and computers, but in the 1950s, a radio had just one or two. Similarly, there is no upper limit on how many algorithms firms will ultimately have and how they might use these for profit.

Michael Lewis’ Flash Boys: A Wall Street Revolt illustrates how this has already played out in high-frequency trading.[8] Twenty years ago, these firms had the financial motivation to spend enormous sums of money on complex algorithms and super-fast networks to shave microseconds off transactions. Today, computing power and connectivity have become much cheaper. As we move into the age of connected things, the quality, quantity, and effective deployment of algorithms will distinguish the winners and losers in many industries. This puts understanding and supporting your data scientists in a whole new perspective, and the job you save may be your own!

Supplemental Material

Survey Methodology

Forrester’s Global Business Technographics Data And Analytics Survey, 2015 was fielded from January through March 2015 to 3,005 business and technology decision-makers located in Australia, Brazil, Canada, China, France, Germany, India, New Zealand, the UK, and the US from companies with 100 or more employees.

Forrester’s Business Technographics provides demand-side insight into the priorities, investments, and customer journeys of business and technology decision-makers and the workforce across the globe.
Forrester collects data insights from qualified respondents in 10 countries spanning the Americas, Europe, and Asia. Business Technographics uses only superior data sources and advanced data-cleaning techniques to ensure the highest data quality.

Forrester’s Q3 2015 Global State Of Strategic Planning, Enterprise Architecture, And PMO Online Survey was fielded to 170 technology management professionals involved in or familiar with EA from our ongoing technology management research panel and readers who have demonstrated an interest in EA research. The panel consists of volunteers who join on the basis of interest and familiarity with specific technology management topics. For quality assurance, panelists are required to provide contact information and answer basic questions about their firms’ revenue and budgets.

Forrester fielded the survey from June to August 2015. Respondent incentives included a complimentary webinar that discusses the survey results. Exact sample sizes are provided in this report on a question-by-question basis. Panels are not guaranteed to be representative of the population. Unless otherwise noted, statistical data is intended to be used for descriptive and not inferential purposes.

Endnotes

[1] Source: Forrester’s Global Business Technographics Data And Analytics Survey, 2015.

[2] We asked technology decision-makers to identify all technologies that are included in their plans for big data. Although it led by only 1 percentage point, “large scale predictive modeling, data mining, or other advanced analytics” was the most common component. The second-most-popular answer was “public cloud big data services.” Source: Forrester’s Global Business Technographics Data And Analytics Survey, 2015.

[3] Data science is the extraction of knowledge from large volumes of structured or unstructured data; it is a continuation of the fields of data mining and predictive analytics. The people who perform these analytics are data scientists. Forrester also includes financial service actuaries; statistical quantitative analysts, known as quants; and marketing scientists under the data scientist label.

[4] Google created a model to predict the movement of the flu at a very high level of detail based on Internet search patterns. The model became inaccurate when the 2009 H1N1 (swine flu) pandemic caused millions of people with no flu symptoms to start searching the Internet to get more information about the flu. See the “Google Flu Trends — A Big Data Fail? Not Exactly” Forrester report.

[5] Most websites run offline batch jobs to generate clusters of items similar to what web users are searching for. Art.com generated clusters on the fly so users using new search terms would see relevant “similar to” results in minutes. Source: May 2015 briefing by Art.com executives with Forrester.

[6] Using a 5-point scale, we asked respondents to indicate their level of agreement or disagreement with the following statement: “The output of our analytics is mostly reports and dashboards meant for broad consumption.” Source: Forrester’s Q3 2015 Global State Of Strategic Planning, Enterprise Architecture, And PMO Online Survey.

[7] As enterprise architects look at how to deliver a trusted, real-time, integrated, and secure data platform to support applications, they look at data virtualization. To review how the nine leading vendors in the marketplace fared against Forrester’s 60-criteria evaluation, see “The Forrester Wave™: Enterprise Data Virtualization, Q1 2015” Forrester report.

[8] Source: Michael Lewis, Flash Boys: A Wall Street Revolt, W.W. Norton & Company, 2015.