The Hybrid Data Integration Platform
Integrating Structured and Unstructured Data

Cliff Candiotti – [email protected]
May 2017
© 2017 IBM Corporation

Disclaimer

IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.

Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.

Legal Disclaimer

• © IBM Corporation 2017. All Rights Reserved.
• The information contained in this publication is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained in this publication, it is provided AS IS without warranty of any kind, express or implied. In addition, this information is based on IBM’s current product plans and strategy, which are subject to change by IBM without notice. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this publication or any other materials.
Nothing contained in this publication is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software.
• References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. Product release dates and/or capabilities referenced in this presentation may change at any time at IBM’s sole discretion based on market opportunities or other factors, and are not intended to be a commitment to future product or feature availability in any way. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results.
• All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer.
• Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates.
• Other company, product, or service names may be trademarks or service marks of others.

Users Need Data that is Accurate, Reliable and Trustworthy

Data sources: Public, Enterprise Data, Web/Mobile Data, Social, IoT
Data users: Data Engineer, Data Scientist, Developer, Business Analyst, Data Steward

Businesses are challenged to answer key questions about the integrity of their data:
• Where is my data?
• Who has access to this data?
• Am I protecting sensitive data?
• Do I have the right data and context?
• How do I move and transform complex datasets?
• Do I understand the risk of using this data “as is”?
• How do I make data more readily available to my consumers?
Data Integration and Governance Are Key

• Integrate: Discover, integrate and transform data from all types of sources.
• Trust: Establish an accurate single view of data from various systems.
• Govern: Put data in context and mitigate risk with unified governance capabilities.
Roles: Data Steward, Business Analyst, Data Scientist, Data Engineer

Information Empowerment for Your Data Ecosystem
Powered by Information Server

InfoSphere Information Server: Integrating and transforming data and content to deliver accurate, consistent, timely and complete information on a single platform unified by a common metadata layer.

Data Governance: Understand & Collaborate
• Catalog technical metadata and align it with business language
• Manage (big) data lineage
• Compliance reporting

Data Quality: Cleanse & Monitor
• Analyze and validate with enhanced classification
• Cleanse and standardize
• Define, manage and monitor data rules and exceptions

Data Integration: Transform & Deliver
• Massive scalability
• Power for any complexity
• Deliver in batch and/or real time with change capture

Common platform services:
• Common connectivity
• Shared metadata
• Security (data privacy functions included)
• Common execution engine with flexible deployments (native on YARN)

IBM’s Hybrid Data Integration Platform – Scalability

Performance: Runtime engine providing unlimited scalability across all objects and tasks in batch/real-time, ETL/ELT/DV/SOA.

Maximize resource utilization with “Anywhere” execution:
• Optimize your integration/transformation and data quality workload based on data locality and resource availability.
• Design your integration, data preparation or cleansing once and run it on your Hadoop cluster, on your traditional engine, or optimized to run on your database.

IBM’s Hybrid Data Integration Platform – Scalability

Massive scalability needs an MPP shared-nothing architecture:
• Dynamic: Instantly get better performance as hardware resources are added.
• Extendable: Add new compute nodes to dynamically scale out.
• Partitioned: No contention or upper limitation on throughput.

IBM’s Hybrid Data Integration Platform – Scalability

Proven to scale to large volumes:
• Global Bank: Processes 500,000 transactions per second with complex transformation and guaranteed delivery.
• Data Services Co: Information Server powered grid processing 80+ trillion records each month.
• Global Retailer: Daily processing of one trillion rows of data.
• Health Care: 200,000 programs built in Information Server on a grid/cluster of low-cost commodity hardware.

IBM’s Hybrid Data Integration Platform – Capabilities

Transformation: Extensive set of pre-built objects that act on data to satisfy both simple and complex data integration tasks.

Transformation features for big data:
• Integration of quality and transformation components
• Leverages the power of Hadoop or a parallel DBMS engine for any transformation execution
• Extensive, easily extensible library of pre-built logic constructs for:
  • Simple and complex integration tasks
  • Hierarchical and relational transformations
  • Warehouse-specific features

IBM’s Hybrid Data Integration Platform – Extensibility

Connectivity: Native access to common industry databases and applications for both structured and unstructured sources and targets.

Connectivity features for big data:
• Scalable Hadoop file system connectivity to read and write to Hadoop in parallel
• Direct feed to InfoSphere Streams for real-time analytical processing
• Open source accelerators for other big data/NoSQL stores: Hive, MongoDB, Cassandra, Cloudant, CouchDB

Sources and targets for the entire enterprise:
• High-performance native support for DBMSs (DB2, NZ, Oracle, Teradata, SQL Server, etc.)
• Access to z/OS sources (DB2 z/OS, VSAM, ISAM, etc.)
• Specific ERP connectors
• Flexible connectivity via ODBC and JDBC
• Flat and hierarchical formats (flat files, COBOL, XML, native Excel)
• Real time: SOA, streams, CDC
• Out-of-the-box connectivity to ERP systems
• Extensible: cloud, Java, external sources/targets, etc.

IBM’s Hybrid Data Integration Platform – Accountability

Built-in Governance: Maximizes business and IT collaboration, providing business terms, policies, end-to-end data lineage, advanced impact analysis, etc.

Information Server is the only platform with true built-in enterprise-level governance:
• Common metadata paradigm for all enterprise data
• Comprehensive and extensible
• Data lineage and impact analysis for any activity (including Hadoop workloads)
• Metadata support for HDFS and NoSQL data stores
• Connected and linked semantics
• Controlled and auditable from any source
• Rich representation (business, design, technical, operational metadata)

IBM’s Hybrid Data Integration Platform – Trusted Data

Integrated Data Quality: A single user experience for data integration as well as for designing and running data validation, standardization and matching rules.
• Discover: Discovery of business entities across heterogeneous sources
• Assess: Data classification and validation rules linked to business rules for impact analysis
• Cleanse: Business-driven data standardization and matching
• Validate: Rule-based data validation to ensure complete and consistent data
• Monitor & Remediate: Enterprise-wide DQ exception monitoring and collaborative remediation
• Life Cycle Governance: Ownership and management of policies and rules

What about tomorrow, next month, next year? Is more technology needed to get value from data?
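To make the “Validate” and “Monitor & Remediate” steps on the Trusted Data slide more concrete, here is a minimal Python sketch of rule-based validation with exception counting. It is not the Information Server rule language or API: the `DataRule` type, the `standardize`, `validate` and `summarize` helpers, and the customer field names are all hypothetical, chosen only to show the idea of cleansing records, checking them against named rules, and reporting exceptions per rule.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]


@dataclass(frozen=True)
class DataRule:
    """Hypothetical named validation rule: a predicate each record must satisfy."""
    name: str
    check: Callable[[Record], bool]


@dataclass(frozen=True)
class RuleViolation:
    """One exception: which rule failed, on which input row."""
    rule: str
    row: int


def standardize(rec: Record) -> Record:
    """Cleanse step: trim and upper-case the country code (returns a new dict)."""
    return {**rec, "country": str(rec.get("country", "")).strip().upper()}


# Illustrative rules only; the field names are assumptions.
RULES: List[DataRule] = [
    DataRule("customer_id_present", lambda r: bool(r.get("customer_id"))),
    DataRule("email_has_at", lambda r: "@" in str(r.get("email", ""))),
    DataRule("country_is_iso2",
             lambda r: len(str(r.get("country", ""))) == 2
             and str(r.get("country", "")).isalpha()),
]


def validate(records: List[Record],
             rules: List[DataRule]) -> Tuple[List[Record], List[RuleViolation]]:
    """Split records into those passing every rule and a list of rule violations."""
    passed: List[Record] = []
    violations: List[RuleViolation] = []
    for row, rec in enumerate(records):
        failed = [rule.name for rule in rules if not rule.check(rec)]
        if failed:
            violations.extend(RuleViolation(name, row) for name in failed)
        else:
            passed.append(rec)
    return passed, violations


def summarize(violations: List[RuleViolation]) -> Dict[str, int]:
    """Exception counts per rule, the kind of figure a monitoring view would track."""
    return dict(Counter(v.rule for v in violations))


customers = [
    {"customer_id": "C1", "email": "a@example.com", "country": " us "},
    {"customer_id": "",   "email": "b@example.com", "country": "DE"},
    {"customer_id": "C3", "email": "not-an-email",  "country": "Germany"},
]

if __name__ == "__main__":
    clean = [standardize(r) for r in customers]
    passed, violations = validate(clean, RULES)
    print(len(passed), summarize(violations))
```

Keeping failing rows as separate violation records, rather than silently dropping them, is what makes collaborative remediation possible: each exception names the rule and the row a data steward needs to look at.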
Have the Right Data at the Right Time

Users don't have access to the right data at the right time:
• Creates a strain on IT to meet LOB needs
• More time integrating and cleansing, less time interpreting and analyzing
• Prevents timely and accurate decisions

Self-service access and capabilities:
• Provide data users with the necessary access to manage data on their terms
• Empower both IT and LOB to focus on high-priority projects

Anyone Can Do It

Data-driven/fully guided self-service:
• Zero code
• Zero configuration
• Data-driven or fully guided, flow-driven
• For a semi-technical person in a business environment
• Friendly, modern graphical design
• Instant impact view
• Closely guided design
Roles: Automator, Business Analyst, Integrator, Data Analyst, Data Scientist, Shadow Integrator

Graphical flow design with full control; service and API orchestration:
• Skilled integration practitioners
• Graphical assist, but full code environment
• Hybrid deployment
• Connect to anything
Roles: Data Engineer, Developer, API Specialist, Integration Developer, Full Stack Developer, Front End Developer

Business Agility Demands Expansion of the Integration User Community

Data Preparation & Curation
WHO: Business users/data owners; non-technical users
WHAT:
• Visual data shaping/curation
• WYSIWYG
• Closely guided and controlled (“shop for data” paradigm)
• Manipulation of 1-2 data sets at a time

Self-Service Integration
WHO: Shadow IT, LOB users, data scientists; semi-technical users
WHAT:
• Combined visual and flow-based design
• Template/pattern approach
• Zero configuration
• Implicit validation
• Collaboration

Enterprise-class Integration
WHO: Integration specialists, integration developers; highly technical users
WHAT:
• Comprehensive library of integration, transformation and quality operations
• Support for comprehensive integration flows and projects
• Expandability for custom operations
• Full control for configuration and parameterization
• Top-down or bottom-up design approach
• Support for team development process

Bluemix Data Connect: Data Integration through Cloud Services

Data Connect will provide self-service data preparation, integration and governance for the Watson Data Platform.

IBM’s Hosted Data Integration Solutions

Now available as hosted services, covering data integration, data quality and data governance:
• Information Server on Cloud Enterprise Edition
• DataStage on Cloud
• DataStage on Cloud Designer Client
• Information Server on Cloud Data Quality
• Information Governance Catalog on Cloud

IBM’s Cloud-First Strategy Supports Hybrid Environments and Helps Customers Migrate to the Cloud

Cloud-First Statement of Direction and Design Principles

Cloud-native: Competitive cloud-native, fully managed services
• Bluemix Data Connect
• Bluemix Lift
• Bluemix Data Connect (Canvas)
• Butterfly (Beyond MDM)
• ILG.Next (Cosmos)

Hosted Cloud: Convenience without compromising power and control
• DataStage on Cloud
• DataStage Designer Client on Cloud
• Information Server on Cloud Data Quality
• Information Server on Cloud Enterprise Edition
• Information Governance Catalog on Cloud

Core: Retain our market leadership and support our customers
• Information Server (DataStage, QualityStage, Information Analyzer, Info Governance Catalog)
• Data Replication
• Master Data Management
• StoredIQ
• StoredIQ for Legal

Information Server / Data Connect Hybrid Journey: Utilizing the Best from Both Sides!
Journey stages: Bridge & Combine, Data Connect, Converge

Cognitive Integration Design: Next Gen DataStage Designer

What:
• ZERO install: browser-based design
• ZERO migration: view existing jobs/projects in the new designer
• Ability to use the new and old Designer side by side
• New simplified design experience without compromising capabilities

Who:
• Phase 1: New integration experience for integration specialists/integration developers
• Phase 2/3: Self-service integration and preparation for business and LOB users

Maximize Your IT Resource Utilization through Hybrid Execution

• Optimize your integration workload based on data locality and resource availability.
• DataStage already enables you to design your transformation once and run it on the PX Engine, a Hadoop cluster, or a database.
• Bluemix Data Connect provides a new web-based self-service designer with a code-generation framework to support similar runtime targeting.
Designers: DataStage/QualityStage Designer, Data Connect UI
Execute “Anywhere”: Databases, PX ETL Engine, Spark (as a service or local)

New, Expanded or Enhanced Connectivity for Both Structured and Unstructured Data

Unified Governance – A New Era of Governance

Governance 1.0
• Data within the firewall
• Distinct capabilities for structured and unstructured data
• Compliance use cases: e-Discovery, records, archiving, GDPR, BCBS 239, Basel II, etc.
• IT led

Governance 2.0
• Data, APIs and analytics in or outside the firewall (hybrid platform)
• Common capabilities: policy administration, metadata, consent management and stewardship
• Compliance and analytics use cases: information repositories (e.g., data lakes), self-service analytics, regulations and data science, GDPR, BCBS 239, Basel II, etc.
• IT and business led

Who is involved: IT (Governance 1.0); IT, analysts, data scientists and developers (Governance 2.0)

Use Cases Driving a Unified Governance Strategy

Governance for Compliance: Discover, classify and manage information in ways that meet the obligations enforced by both regulatory and corporate mandates.
• Regulations (e.g., GDPR)
• Privacy & Protection
• eDiscovery
• Records & Retention
• Archiving
• Audit Readiness

Governance for Insights: Provide safe access to trusted, high-quality, fit-for-purpose data while facilitating effective collaboration among team members.
• Self-Service Access to Data & Analytics
• Governed Enterprise Information Repositories (such as Data Lakes)

Unified Catalog – The Core of Unified Governance

At the center: a Mastered, Open, Enterprise Information Catalog, drawing on structured data, unstructured information and other sources.
Governance services and information security around the catalog: Metadata; Test Data; Auto Info Classification; Privacy/Protect; Shop 4 Info; Archiving; Retention/Disposal; Collaborate; Transformation & Delivery Fabric; Quality Mgmt; Records; eDiscovery; Workflows; Consent; Policy Mgmt; Lineage

What Do We Mean by Hybrid Integration?
• Optimizing workloads based on data locality
• Distributing workloads across loosely coupled runtimes
• Choice of runtimes based on your data delivery requirements
• On-demand/elastic expansion
• Combined SaaS and on-premises self-service prep/integration

Thank You
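The first point on the Hybrid Integration slide, optimizing workloads based on data locality, can be illustrated with a toy cost model: send the job to whichever runtime would have to move the least data. This is a minimal sketch under stated assumptions, not how Information Server or Data Connect actually choose a runtime. The `Source` type, the `RUNTIMES` mapping, the location labels and the `choose_runtime` helper are hypothetical, loosely named after the PX engine, Hadoop cluster and database targets mentioned in the deck.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Source:
    """A data set the job reads, and where it physically lives (hypothetical labels)."""
    name: str
    location: str
    size_gb: float


# Hypothetical runtimes and the location each one executes in.
RUNTIMES: Dict[str, str] = {
    "px_engine": "etl_server",
    "hadoop_cluster": "hadoop",
    "database_pushdown": "warehouse",
}


def data_moved_gb(runtime_location: str, sources: List[Source]) -> float:
    """Volume that must be shipped to the runtime; co-located data costs nothing."""
    return sum(s.size_gb for s in sources if s.location != runtime_location)


def choose_runtime(sources: List[Source], runtimes: Dict[str, str]) -> str:
    """Pick the runtime that moves the least data; ties go to the alphabetically first name."""
    return min(sorted(runtimes), key=lambda name: data_moved_gb(runtimes[name], sources))


if __name__ == "__main__":
    job = [Source("clickstream", "hadoop", 800.0), Source("customers", "warehouse", 5.0)]
    print(choose_runtime(job, RUNTIMES))
```

With 800 GB of clickstream data already in Hadoop and only 5 GB in the warehouse, the sketch sends the job to the Hadoop cluster. A realistic placement decision would also weigh resource availability, which the deck mentions, and what each engine can actually execute; this model deliberately counts only data moved.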