From Moore to Metcalf: The Network as the Next Database Platform
Michael Franklin, UC Berkeley & Truviso (formerly Amalgamated Insight)
HPDC, June 2007

Outline
• Motivation
• Stream Processing Overview
• Micro-Architecture Issues
• Macro-Architecture Issues
• Conclusions

Moore's Law vs. Shugart's: The Battle of the Bottlenecks
• Moore: exponential processor and memory improvement.
• Shugart: a similar law for disk capacity.
• The yin and yang of DBMS architecture: "disk-bound" or "memory-bound"?
• OR: are DBMS platforms getting faster or slower relative to the data they need to process?
• Traditionally, the answer dictates where you innovate.

Metcalf's Law Will Drive More Profound Changes
• Metcalf: "The value of a network grows with the square of the number of participants."
• Practical implication: all interesting data-centric applications become distributed.
• Already happening:
  • Service-based architectures (and Grid!)
  • Web 2.0
  • Mobile computing

Bell's Law Will Amplify Metcalf's
• Bell: "Every decade, a new, lower-cost class of computers emerges, defined by platform, interface, and interconnect."
  • Mainframes - 1960s
  • Minicomputers - 1970s
  • Microcomputers/PCs - 1980s
  • Web-based computing - 1990s
  • Devices (cell phones, PDAs, wireless sensors, RFID) - 2000s
• Enabling a new generation of applications for operational visibility, monitoring, and alerting.

The Network as Platform: Challenges
[Figure: data sources feeding the network - barcodes, PoS systems, information feeds ("XYZ 23.2; AAA 19; …"), RFID, mobile devices, transactional systems, clickstreams, telematics, blogs/Web 2.0, sensors]
• Data constantly "on the move"
• Increased data volume
• Increased heterogeneity and sharing
• Shrinking decision cycles
• Increased data and decision complexity

The Network as Platform: Implications
Lots of challenges:
• Integration (or "Dataspaces")
• Optimization/planning/adaptivity
• Consistency/master data management
• Continuity/disaster management
• Stream processing (or data-on-the-move)
My current focus (and thus, the focus of this talk) is the latter.

Stream Processing
My view: stream processing will become the 3rd leg of standard IT data management:
• OLAP split off from OLTP for historical reporting.
• OLSA (On-line Stream Analytics) will handle:
  • Monitoring
  • Alerting
  • Transformation
  • Real-time visibility and reporting
Note: CEP (Complex Event Processing) is a related, emerging technology.

Stream Processing + Grid?
• On-the-fly stream processing required for high-volume data/event generators.
• Real-time event detection for coordination of distributed observations.
• Wide-area sensing in environmental macroscopes.

Stream Processing - Overview: Turning Query Processing Upside Down
• Traditional database approach: bulk-load data into a data warehouse, then run queries that produce static batch reports.
  • Batch ETL and load, query later
  • Poor real-time monitoring, no replay
  • DB size affects query response
• Data stream processing approach: live data streams flow through a stream processor, producing continuous visibility and alerts.
  • Always-on data analysis and alerts
  • Real-time monitoring and replay to optimize
  • Consistent sub-second response
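To make the contrast concrete, here is a minimal sketch (in Python rather than the deck's SQL, and not tied to any particular engine) of a standing query that maintains a per-tag sliding window and emits a result as each record arrives, instead of bulk-loading data and querying it later; the class name and record layout are illustrative assumptions.

from collections import defaultdict, deque

class SlidingWindowCount:
    """Continuously counts readings per tag over the last WINDOW_SEC seconds."""
    WINDOW_SEC = 5.0

    def __init__(self):
        self.readings = defaultdict(deque)   # tag_id -> timestamps in window

    def on_record(self, tag_id, ts):
        window = self.readings[tag_id]
        window.append(ts)
        # Expire readings that have slid out of the window.
        while window and ts - window[0] > self.WINDOW_SEC:
            window.popleft()
        # A result is produced immediately, per record: no batch load, no warehouse scan.
        return tag_id, len(window)

q = SlidingWindowCount()
for tag, ts in [("EPC-1", 0.0), ("EPC-1", 2.0), ("EPC-2", 3.0), ("EPC-1", 7.0)]:
    print(q.on_record(tag, ts))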
Example 1: Simple Stream Query
A SQL smoothing filter to interpolate dropped RFID readings:

SELECT DISTINCT tag_id
FROM RFID_stream [RANGE '5 sec']
GROUP BY tag_id

[Figure: raw readings over time vs. the smoothed output of the smoothing filter]

Example 2 - Stream/Table Join
Every 3 seconds, compute the average transaction value of high-volume trades on S&P 500 stocks, over a 5-second sliding window:

SELECT   T.symbol, AVG(T.price*T.volume)
FROM     Trades T [RANGE '5 sec' SLIDE '3 sec'],   -- stream with window clause
         SANDP500 S                                -- table
WHERE    T.symbol = S.symbol AND T.volume > 5000
GROUP BY T.symbol

Note: the output is also a stream.

Example 3 - Streaming View
Positive suspense: find the top 100 store-SKUs ordered by decreasing positive suspense (inventory - sales).

CREATE VIEW StoreSKU (store, sku, sales) AS
 (SELECT   P.store, P.sku, SUM(P.qty) AS sales
  FROM     POSLog P [RANGE '1 day' SLIDE '10 min'], Inventory I
  WHERE    P.sku = I.sku AND P.store = I.store AND P.time > I.time
  GROUP BY P.store, P.sku)

SELECT   (I.quantity - S.sales) AS positive_suspense
FROM     StoreSKU S, Inventory I
WHERE    S.store = I.store AND S.sku = I.sku
ORDER BY positive_suspense DESC
LIMIT    100

Application Areas
• Financial services: trading/capital markets
• SOA/infrastructure monitoring; security
• Physical (sensor) monitoring
• Fraud detection/prevention
• Risk analytics and compliance
• Location-based services
• Customer relationship management/retail
• Supply chain/logistics
• …

Real-Time Monitoring
[Figure: a Flex-based dashboard driven by multiple SQL queries]

The "Jellybean" Argument
• Conventional wisdom: "Can I afford real-time?" Do the benefits justify the cost?
• Reality: with stream query processing, real-time is cheaper than batch.
  • Minimizes copies and query start-up overhead
  • Takes load off expensive back-end systems
  • Rapid application development and maintenance

Historical Context and Status
• Early work:
  • Data "push", pub/sub, adaptive query processing
  • Lots of non-SQL approaches: rules systems (e.g., for fraud detection), Complex Event Processing (CEP)
• Research projects led to companies:
  • TelegraphCQ -> Truviso (Amalgamated)
  • Aurora -> StreamBase
  • Streams -> Coral8
• Big guys ready to jump in: BEA, IBM, Oracle, …

Requirements
• High data rates: 1K rec/sec (SOA monitoring) up to 700K rec/sec (option trading)
• Number of queries: single digits to 10,000s
• Query complexity:
  • Full SQL + windows + events + analytics
  • Persistence, replay, historical comparison
• Huge range of sources and sinks

Stream QP: Micro-Architecture - Single-Node Architecture
[Figure (© 2007, Amalgamated Insight, Inc.): a continuous query engine - concurrent query planner, triggers/rules, streaming adaptive SQL query processor, replay database, active data - fed by ingress connectors and transformations (XML, CSV, MQ, MSMQ, JDBC, .NET, …, other CQE instances) and driving egress connectors and transformations toward an XML message bus, proprietary APIs, pub/sub, alerts, events, other CQE instances, and an external archive]
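The single-node dataflow can be pictured as a small push-based pipeline. The sketch below is illustrative only: the class names (CsvIngress, FilterOp, PrintEgress) are hypothetical stand-ins for the diagram's ingress connectors, query operators, and egress connectors, not the Truviso or TelegraphCQ API.

import csv, io

class CsvIngress:
    """Parses CSV lines from a source and pushes records downstream."""
    def __init__(self, downstream):
        self.downstream = downstream
    def push(self, line):
        symbol, price, volume = next(csv.reader(io.StringIO(line)))
        self.downstream.push({"symbol": symbol,
                              "price": float(price),
                              "volume": int(volume)})

class FilterOp:
    """Applies a predicate (e.g., volume > 5000) before further processing."""
    def __init__(self, predicate, downstream):
        self.predicate, self.downstream = predicate, downstream
    def push(self, rec):
        if self.predicate(rec):
            self.downstream.push(rec)

class PrintEgress:
    """Stand-in for an egress connector (message bus, alerts, JDBC-style client, ...)."""
    def push(self, rec):
        print("ALERT:", rec)

pipeline = CsvIngress(FilterOp(lambda r: r["volume"] > 5000, PrintEgress()))
for line in ["IBM,105.2,9000", "ORCL,19.4,1200"]:
    pipeline.push(line)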
Ingress Issues (performance)
• Must support high data rates
  • 700K ticks/second for financial services
  • Wire speed for networking/security
• Minimal latency
  • FS trading is particularly sensitive to this
• Fault tolerance
  • Especially given remote sources
• Efficient (bulk) data transformation
  • XML, text, binary, …
• Work well for both push and pull sources

Egress Issues (performance)
• Must support high data rates
• Minimal latency
• Fault tolerance
• Efficient (bulk) data transformation
• Buffering/support for JDBC-style clients
• Interaction with bulk warehouse loaders
• Large-scale dissemination (pub/sub)

Query Processing (Single)
• Simple approach:
  • Stream inputs are "scan" operators
  • Adapt operator plumbing to push/pull: "Exchange" operators / Fjords
• Need to run lots of these concurrently
  • Index the queries?
  • Scheduling, memory management
• Must avoid I/O and cache misses to run at speed
• Predicate push-down, à la Gigascope

QP (continued)
• Transactional/correctness issues:
  • Never-ending queries hold locks forever!
  • Need an efficient heartbeat mechanism to keep things moving forward.
  • Dealing with corrections (e.g., in financial feeds).
• Out-of-order/missing data
  • "Ripples in the stream" can hurt clever scheduling mechanisms.
• Integration with external code:
  • Matlab, R, …, UDFs and UDAs

Query Processing (Shared)
• The previous approach misses a huge opportunity.
• Individual execution leads to linear slowdown
  • Until you fall off the memory cliff!
• Recall that we know all the queries:
  • We know when they will need data
  • We know what data they will need
  • We know what they will compute
• Why run them individually (as if we didn't know any of this)?

Shared Processing - The Überquery
[Figure: query text (e.g., the stream/table join of Example 2) is compiled into a query plan as it enters the system; as more queries arrive, each plan is folded into the global plan of the shared query engine. No redundant modules = super-linear query scalability.]

Shared QP Raises Lots of New Issues
• Scheduling based on data availability/location and work affinity.
• Lots of bit-twiddling: need efficient bitmaps.
• Query "folding" - how to combine plans (multi-query optimization)?
• On-the-fly query changes.
• How does shared processing change the traditional architectural tradeoffs?
• How to process across multiple cores, dies, boxes, racks, rooms?
Refs: NiagaraCQ, CACQ, TelegraphCQ, Sailesh Krishnamurthy's thesis
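A toy illustration of the sharing idea: rather than evaluating each registered query separately, equality predicates are indexed so that one pass over each record serves every matching query. This is only a sketch of the "index the queries" intuition under assumed names, not the actual Überquery folding machinery.

from collections import defaultdict

class SharedFilterEngine:
    def __init__(self):
        # Group queries by the attribute/value they filter on, so one lookup
        # per record attribute serves every query with that equality predicate.
        self.eq_queries = defaultdict(list)   # (attr, value) -> [query_ids]

    def register(self, query_id, attr, value):
        self.eq_queries[(attr, value)].append(query_id)

    def on_record(self, rec):
        matched = []
        for attr, value in rec.items():
            matched.extend(self.eq_queries.get((attr, value), []))
        return matched   # queries whose predicate this record satisfies

engine = SharedFilterEngine()
engine.register("q1", "symbol", "IBM")
engine.register("q2", "symbol", "ORCL")
engine.register("q3", "symbol", "IBM")
print(engine.on_record({"symbol": "IBM", "volume": 9000}))   # ['q1', 'q3']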
Archiving - A Huge Area
• Most streaming use cases want access to historical information.
• Compliance/risk: also need to keep the data.
• Science apps need to keep the raw data around too.
• In a high-volume streaming environment, going to disk is an absolute killer.
• Obviously need clever techniques:
  • Sampling, index update deferral, load shedding
  • Scheduling based on time-oriented queries
  • Good old buffering/prefetching

Stream QP: Macro-Architecture - HiFi: Taming the Data Flood
[Figure: in-network stream query processing and storage with hierarchical aggregation (spatial and temporal) - receptors; dock doors and shelves; warehouses and stores; regional centers; headquarters - contrasting a fast data path with a slow data path]

Problem: Sensors are Noisy
A simple RFID experiment:
• 2 adjacent shelves, 6 ft. wide
• 10 EPC-tagged items each, plus 5 moved between them
• RFID antenna on each shelf

Shelf RFID - Ground Truth
[Figure: ground-truth item counts per shelf over time]

Actual RFID Readings
[Figure: raw readings against the rule "restock every time inventory goes below 5"]

VICE: Virtual Device Interface [Jeffery et al., Pervasive 2006; VLDB Journal 2007]
The Virtual Device (VICE) API is a natural place to hide much of the complexity arising from physical devices.

Query-based Data Cleaning: Smooth
(the Smooth stage, applied on top of the Point stage)

CREATE VIEW smoothed_rfid_stream AS
 (SELECT   receptor_id, tag_id
  FROM     cleaned_rfid_stream [range by '5 sec', slide by '5 sec']
  GROUP BY receptor_id, tag_id
  HAVING   count(*) >= count_T)

Query-based Data Cleaning: Arbitrate
(the Arbitrate stage, applied on top of Point and Smooth)

CREATE VIEW arbitrated_rfid_stream AS
 (SELECT   receptor_id, tag_id
  FROM     smoothed_rfid_stream rs [range by '5 sec', slide by '5 sec']
  GROUP BY receptor_id, tag_id
  HAVING   count(*) >= ALL (SELECT   count(*)
                            FROM     smoothed_rfid_stream
                                     [range by '5 sec', slide by '5 sec']
                            WHERE    tag_id = rs.tag_id
                            GROUP BY receptor_id))

After Query-based Cleaning
[Figure: cleaned readings against the rule "restock every time inventory goes below 5"]

Adaptive Smoothing [Jeffery et al., VLDB 2006]
[Figure]

SQL Abstraction Makes it Easy?
• Soft sensors - e.g., the "LOUDMOUTH" sensor (VLDB 04)
• Quality and lineage
• Optimization (power, etc.)
• Pushdown of external validation information
• Automatic/adaptive query placement
• Data archiving
• Imperative processing

Some Challenges
• How to run across the full gamut of devices, from motes to mainframes?
• What about running *really* in-the-network?
• Data/query placement and movement
  • Adaptivity is key
  • "Push down" is a small subset of this problem.
  • Sharing is also crucial here.
• Security, encryption, compression, etc.
• Lots of issues due to devices and "physical world" problems.

It's Not Just a Sensor-Net Problem
[Figure: the enterprise data path - transactional edge devices (PCs, handhelds, PoS readers), enterprise apps (ERP, e-commerce, CRM, SCM) over an integration bus, analytical front ends (dashboards, reports, portals, alerts, operational BI, analytics, data mining), business intelligence over specialized OLAP data marts, and an enterprise data warehouse batch-loaded from OLTP data stores - annotated with decision latency, batch latency, query latency, and exploding data volumes]

Data Dissemination (Fan-Out)
• Many applications have large numbers of consumers.
• Lots of interesting questions on large-scale pub/sub technology.
• Micro-scale: locality, scheduling, and sharing for huge numbers of subscriptions.
• Macro-scale: dissemination trees, placement, sharing, …
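A minimal sketch of topic-based fan-out, assuming a toy in-process broker with hypothetical names; real large-scale dissemination adds the tree construction, placement, and sharing questions listed above.

from collections import defaultdict

class Broker:
    def __init__(self):
        self.subs = defaultdict(list)        # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subs[topic].append(callback)

    def publish(self, topic, record):
        # One published record is shared across all subscribers on the topic;
        # a real system also needs dissemination trees, placement, and batching.
        for cb in self.subs[topic]:
            cb(record)

broker = Broker()
broker.subscribe("trades.IBM", lambda r: print("dashboard:", r))
broker.subscribe("trades.IBM", lambda r: print("risk engine:", r))
broker.publish("trades.IBM", {"price": 105.2, "volume": 9000})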
What to Measure? (a research opportunity)
• High data rates/throughput
  • rec/sec; record size
• Number of concurrent queries
• Query complexity
• Huge range of sources and sinks
  • Transformation and connector performance
• Minimal benchmarking work so far:
  • "Linear Road" from the Aurora group
  • CEP benchmark work by Pedro Bizarro

Conclusions
• Two relevant trends:
  • Metcalf's Law: DB systems need to become more network-savvy.
  • Jim Gray and others have helped demonstrate the value of SQL to science.
• Stream query processing is where these two trends meet in the Grid world.
• A new (3rd) component of data management infrastructure.
• Lots of open research problems for the HPDC (and DB) community.

Resources
• Research projects @ Berkeley:
  • TelegraphCQ - single-site stream processor
  • HiFi - distributed/hierarchical
  • See www.cs.berkeley.edu/~franklin for links/papers
• Good jumping-off point for CEP and related info: www.complexevents.com
• The company: www.truviso.com
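As a coda to the "What to Measure?" slide, a minimal throughput-measurement harness; the operator and record shape are placeholders, and a real benchmark would exercise the full query pipeline, realistic record sizes, and many concurrent queries.

import time

def benchmark(operator, records):
    """Feed records through a push-based operator and report rec/sec."""
    start = time.perf_counter()
    for rec in records:
        operator(rec)
    elapsed = time.perf_counter() - start
    return len(records) / elapsed if elapsed > 0 else float("inf")

# A trivial stand-in operator; substitute the actual standing query to measure.
count = 0
def counting_operator(rec):
    global count
    count += 1

rate = benchmark(counting_operator, [{"symbol": "IBM", "price": 105.2}] * 100_000)
print(f"{rate:,.0f} rec/sec")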