
From Moore to Metcalfe:
The Network as the Next Database Platform
Michael Franklin
UC Berkeley & Truviso (formerly, Amalgamated Insight)
HPDC, June 2007
Outline
• Motivation
• Stream Processing Overview
• Micro-Architecture Issues
• Macro-Architecture Issues
• Conclusions
Moore's Law vs. Shugart's: The battle of the bottlenecks
• Moore: Exponential processor and memory improvement.
• Shugart: Similar law for disk capacity.
• The yin and yang of DBMS architecture: "disk-bound" or "memory-bound"?
• Or: are DBMS platforms getting faster or slower relative to the data they need to process?
• Traditionally, the answer dictates where you innovate.
Metcalfe's Law will drive more profound changes
• Metcalfe: "The value of a network grows with the square of the # of participants."
• Practical implication: all interesting data-centric applications become distributed.
• Already happening:
  • Service-based architectures (and Grid!)
  • Web 2.0
  • Mobile Computing
Bell's Law will amplify Metcalfe's
Bell: "Every decade, a new, lower-cost class of computers emerges, defined by platform, interface, and interconnect."
• Mainframes: 1960s
• Minicomputers: 1970s
• Microcomputers/PCs: 1980s
• Web-based computing: 1990s
• Devices (cell phones, PDAs, wireless sensors, RFID): 2000s
Enabling a new generation of applications for operational visibility, monitoring, and alerting.
The Network as platform: Challenges
(Diagram: data sources feeding the network, including barcodes, PoS systems, information feeds, RFID, mobile devices, transactional systems, clickstreams, telematics, blogs/Web 2.0, and sensors.)
• Data constantly "on-the-move"
• Increased data volume
• Increased heterogeneity & sharing
• Shrinking decision cycles
• Increased data and decision complexity
The Network as platform: Implications
Lots of challenges:
• Integration (or "Dataspaces")
• Optimization/Planning/Adaptivity
• Consistency/Master Data Mgmt
• Continuity/Disaster Mgmt
• Stream Processing (or data-on-the-move)
My current focus (and thus, the focus of this talk) is the latter.
Stream Processing
My view: Stream Processing will become the 3rd leg of standard IT data management:
• OLAP split off from OLTP for historical reporting.
• OLSA (On-line Stream Analytics) will handle:
  • Monitoring
  • Alerting
  • Transformation
  • Real-time Visibility and Reporting
Note: CEP (Complex Event Processing) is a related, emerging technology.
Stream Processing + Grid?
• On-the-fly stream processing required for high-volume data/event generators.
• Real-time event detection for coordination of distributed observations.
• Wide-area sensing in environmental macroscopes.
Stream Processing - Overview
Turning Query Processing Upside Down

Traditional Database Approach: bulk-load data into a data warehouse, then run queries against it to produce static batch reports.
• Batch ETL & load, query later
• Poor RT monitoring, no replay
• DB size affects query response

Data Stream Processing Approach: live data streams flow through a data stream processor, producing continuous results, visibility, and alerts.
• Always-on data analysis & alerts
• RT monitor & replay to optimize
• Consistent sub-second response
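To make the contrast concrete, here is a minimal sketch (not from the slides) of the same aggregation phrased both ways, using the window syntax that appears in the examples below; the table and stream names are hypothetical.

  -- Store-then-query (warehouse): run once over data already loaded
  SELECT   symbol, AVG(price * volume)
  FROM     trades_warehouse
  GROUP BY symbol

  -- Query-then-store (stream): the same aggregation, re-evaluated continuously
  SELECT   symbol, AVG(price * volume)
  FROM     Trades [RANGE '5 sec' SLIDE '5 sec']
  GROUP BY symbol

The query text barely changes; what flips is that the query is standing and the data moves past it.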
Example 1: Simple Stream Query
A SQL smoothing filter to interpolate dropped RFID readings:

  SELECT distinct tag_id
  FROM RFID_stream [RANGE '5 sec']
  GROUP BY tag_id

(Diagram: raw readings with gaps pass through the smoothing filter, producing smoothed output over time.)
Example 2 - Stream/Table Join
Every 3 seconds, compute avg transaction value of high-volume trades on S&P 500 stocks, over a 5 second "sliding window":

  SELECT   T.symbol, AVG(T.price*T.volume)
  FROM     Trades T [RANGE '5 sec' SLIDE '3 sec'], SANDP500 S
  WHERE    T.symbol = S.symbol AND T.volume > 5000
  GROUP BY T.symbol

Trades is a stream (note the window clause); SANDP500 is a table. Note: the output is also a stream.
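Because the output is itself a stream, it can feed further continuous queries. A minimal sketch (not from the slides) wraps the query above in a view, in the style of Example 3 below; the view name and alert threshold are hypothetical.

  CREATE VIEW hot_sp500 (symbol, avg_value) AS
   (SELECT   T.symbol, AVG(T.price*T.volume)
    FROM     Trades T [RANGE '5 sec' SLIDE '3 sec'], SANDP500 S
    WHERE    T.symbol = S.symbol AND T.volume > 5000
    GROUP BY T.symbol)

  -- Downstream query over the derived stream: flag unusually large averages
  SELECT symbol, avg_value
  FROM   hot_sp500
  WHERE  avg_value > 1000000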
Example 3 - Streaming View
Positive Suspense: Find the top 100 store-SKUs, ordered by decreasing positive suspense (inventory - sales).

  CREATE VIEW StoreSKU (store, sku, sales) as
   (SELECT   P.store, P.sku, SUM(P.qty) as sales
    FROM     POSLog P [RANGE '1 day' SLIDE '10 min'], Inventory I
    WHERE    P.sku = I.sku and P.store = I.store and P.time > I.time
    GROUP BY P.store, P.sku)

  SELECT   (I.quantity - S.sales) as positive_suspense
  FROM     StoreSKU S, Inventory I
  WHERE    S.store = I.store and S.sku = I.sku
  ORDER BY positive_suspense DESC
  LIMIT    100
Application Areas
• Financial Services: Trading/Capital Mkts
• SOA/Infrastructure Monitoring; Security
• Physical (sensor) Monitoring
• Fraud Detection/Prevention
• Risk Analytics and Compliance
• Location-based Services
• Customer Relationship Management/Retail
• Supply chain/Logistics
• …
Real-Time Monitoring
A Flex-based dashboard driven by multiple SQL queries.
The "Jellybean" Argument
Conventional wisdom: "Can I afford real-time?" Do the benefits justify the cost?
Reality: With stream query processing, real-time is cheaper than batch:
• minimize copies & query start-up overhead
• takes load off expensive back-end systems
• rapid application dev & maintenance
Historical Context and Status
• Early stuff:
  • Data "Push", Pub/Sub, Adaptive Query Proc.
• Lots of non-SQL approaches:
  • Rules systems (e.g., for Fraud Detection)
  • Complex Event Processing (CEP)
• Research projects led to companies:
  • TelegraphCQ -> Truviso (Amalgamated)
  • Aurora -> StreamBase
  • Streams -> Coral8
• Big guys ready to jump in: BEA, IBM, Oracle, …
Requirements
• High data rates: 1K rec/sec (SOA monitoring) up to 700K rec/sec (option trading)
• # of queries: single digits to 10,000s
• Query complexity:
  • Full SQL + windows + events + analytics
  • Persistence, replay, historical comparison
• Huge range of sources and sinks
Stream QP: Micro-Architecture
Single Node Architecture
(Diagram, © 2007 Amalgamated Insight, Inc.: a Continuous Query Engine containing a concurrent query planner, triggers/rules, a streaming adaptive SQL query processor, a replay database, and active data. Ingress and egress paths run through connectors and transformations (XML, CSV, MQ, MSMQ, JDBC, .NET, …). The engine also connects to an external archive, to other CQE instances, and to sources/sinks such as XML, message buses, proprietary APIs, pub/sub, alerts, and events.)
Ingress Issues (performance)
• Must support high data rates
  • 700K ticks/second for FS
  • Wirespeed for networking/security
• Minimal latency
  • FS trading particularly sensitive to this
• Fault tolerance
  • Especially given remote sources
• Efficient (bulk) data transformation
  • XML, text, binary, …
• Work well for both push and pull sources
Egress Issues (performance)
• Must support high data rates
• Minimal latency
• Fault tolerance
• Efficient (bulk) data transformation
• Buffering/Support for JDBC-style clients
• Interaction with bulk warehouse loaders
• Large-scale dissemination (Pub/Sub)
Query Processing (Single)
• Simple approach:
  • Stream inputs are "scan" operators
  • Adapt operator plumbing to push/pull: "Exchange" operators / Fjords
• Need to run lots of these concurrently
  • Index the queries?
  • Scheduling, memory mgmt.
  • Must avoid I/O and cache misses to run at speed
• Predicate push-down, a la Gigascope
QP (continued)
• Transactional/Correctness issues:
• Never-ending queries hold locks forever!
• Need efficient heartbeat mechanism to keep
things moving forward.
• Dealing with corrections (e.g., in financial feeds).
• Out-of-order/missing data
• “ripples in the stream” can hurt clever scheduling
mechanisms.
• Integration with external code:
• Matlab, R, …, UDFs and UDAs
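For instance (a sketch, not from the slides), an external analytic might be exposed to the engine as a user-defined aggregate; ewma() below is a hypothetical UDA, computed per symbol over the sliding window.

  -- ewma() is a hypothetical user-defined aggregate registered with the engine
  SELECT   T.symbol, ewma(T.price) AS smoothed_price
  FROM     Trades T [RANGE '60 sec' SLIDE '5 sec']
  GROUP BY T.symbol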
Query Processing (Shared)
• Previous approach misses a huge opportunity.
• Individual execution leads to linear slowdown
  • Until you fall off the memory cliff!
• Recall that we know all the queries:
  • we know when they will need data
  • we know what data they will need
  • we know what things they will compute
• Why run them individually (as if we didn't know any of this)?
Shared Processing - The Überquery
• A new query arrives as text, e.g.:

  SELECT   T.symbol, AVG(T.price*T.volume)
  FROM     Trades T [RANGE '5 sec' SLIDE '3 sec'], SANDP500 S
  WHERE    T.symbol = S.symbol AND T.volume > 5000
  GROUP BY T.symbol

• A "query plan" is formed from the query text and enters the system.
• As more queries arrive, each is compiled into a plan.
• Each plan is folded into the global plan of the shared query engine.
• No redundant modules = super-linear query scalability.
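To illustrate the kind of folding involved, consider a sketch (not from the slides) of two arriving queries that differ only in a predicate. A shared engine can scan the Trades window and maintain the per-symbol grouping machinery once, applying both predicates in a single pass (the aggregates themselves stay separate), instead of running two fully independent plans. The volume thresholds are hypothetical.

  -- Query A: moderately high-volume trades
  SELECT   T.symbol, AVG(T.price*T.volume)
  FROM     Trades T [RANGE '5 sec' SLIDE '3 sec']
  WHERE    T.volume > 5000
  GROUP BY T.symbol

  -- Query B: same window, same aggregate, different threshold
  SELECT   T.symbol, AVG(T.price*T.volume)
  FROM     Trades T [RANGE '5 sec' SLIDE '3 sec']
  WHERE    T.volume > 100000
  GROUP BY T.symbol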
Shared QP raises lots of new issues
• Scheduling based on data availability/location and work affinity.
• Lots of bit-twiddling: need efficient bitmaps.
• Query "folding": how to combine (MQO).
• On-the-fly query changes.
• How does shared processing change the traditional architectural tradeoffs?
• How to process across multiple: cores, dies, boxes, racks, rooms?
Refs: NiagaraCQ, CACQ, TelegraphCQ, Sailesh Krishnamurthy's thesis
Archiving - Huge area
• Most streaming use-cases want access to historical information (sketch below).
  • Compliance/Risk: also need to keep the data.
  • Science apps need to keep raw data around too.
• In a high-volume streaming environment, going to disk is an absolute killer.
• Obviously need clever techniques:
  • Sampling, index update deferral, load shedding
  • Scheduling based on time-oriented queries
  • Good old buffering/prefetching
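As one illustration (a sketch, not from the slides) of why the archive sits on the live path: compare each symbol's current average price against a pre-aggregated historical figure. The trades_archive table and its columns are hypothetical.

  -- trades_archive is a hypothetical summary table maintained by the archiver
  SELECT   T.symbol,
           AVG(T.price)          AS avg_price_now,
           H.avg_price_last_week AS avg_price_then
  FROM     Trades T [RANGE '5 min' SLIDE '1 min'], trades_archive H
  WHERE    T.symbol = H.symbol
  GROUP BY T.symbol, H.avg_price_last_week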
Stream QP: Macro-Architecture
HiFi - Taming the Data Flood
(Diagram: in-network stream query processing and storage across a hierarchy, from receptors, to dock doors and shelves, to warehouses and stores, to regional centers, to headquarters. Aggregation is hierarchical, both spatial and temporal, with a fast data path vs. a slow data path.)
Problem: Sensors are Noisy
A simple RFID experiment:
• 2 adjacent shelves, 6 ft. wide
• 10 EPC-tagged items each, plus 5 moved between them
• RFID antenna on each shelf
Shelf RFID - Ground Truth
Actual RFID Readings
“Restock every time inventory goes below 5”
VICE: Virtual Device Interface
[Jeffery et al., Pervasive 2006, VLDBJ 07]
The "Virtual Device" (VICE) API is a natural place to hide much of the complexity arising from physical devices.
Query-based Data Cleaning: Smooth

  CREATE VIEW smoothed_rfid_stream AS
   (SELECT receptor_id, tag_id
    FROM cleaned_rfid_stream
         [range by '5 sec', slide by '5 sec']
    GROUP BY receptor_id, tag_id
    HAVING count(*) >= count_T)
Query-based Data Cleaning: Arbitrate

  CREATE VIEW arbitrated_rfid_stream AS
   (SELECT receptor_id, tag_id
    FROM smoothed_rfid_stream rs
         [range by '5 sec', slide by '5 sec']
    GROUP BY receptor_id, tag_id
    HAVING count(*) >= ALL
         (SELECT count(*)
          FROM smoothed_rfid_stream
               [range by '5 sec', slide by '5 sec']
          WHERE tag_id = rs.tag_id
          GROUP BY receptor_id))
After Query-based Cleaning
“Restock every time inventory
goes below 5”
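The restock rule itself can be expressed as one more continuous query over the cleaned data. A sketch (not from the slides), assuming the arbitrated_rfid_stream view above, one receptor per shelf as in the experiment, and the threshold of 5 taken from the slide:

  -- Alert whenever a shelf's visible item count drops below 5
  SELECT   receptor_id, count(DISTINCT tag_id) AS items_on_shelf
  FROM     arbitrated_rfid_stream
           [range by '5 sec', slide by '5 sec']
  GROUP BY receptor_id
  HAVING   count(DISTINCT tag_id) < 5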
Adaptive Smoothing
[Jeffery et al. VLDB 2006]
SQL Abstraction Makes it Easy?
• Soft sensors - e.g., "LOUDMOUTH" sensor (VLDB 04)
• Quality and lineage
• Optimization (power, etc.)
• Pushdown of external validation information
• Automatic/Adaptive query placement
• Data archiving
• Imperative processing
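For illustration only (not the VLDB 04 LOUDMOUTH definition), a soft sensor can be sketched as just another derived stream: a view that turns low-level readings into a higher-level signal. The view name and window below are hypothetical.

  -- Hypothetical soft sensor: per-shelf "activity level" over the cleaned stream
  CREATE VIEW shelf_activity (receptor_id, reads_per_window) AS
   (SELECT  receptor_id, count(*)
    FROM    arbitrated_rfid_stream
            [range by '1 min', slide by '10 sec']
    GROUP BY receptor_id)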
Some Challenges
• How to run across the full gamut of devices, from motes to mainframes?
• What about running *really* in-the-network?
• Data/query placement and movement
• Adaptivity is key
  • "Push down" is a small subset of this problem.
  • Sharing is also crucial here.
• Security, encryption, compression, etc.
• Lots of issues due to devices and "physical world" problems.
It's not just a sensor-net problem
(Diagram: the enterprise data landscape. Transactional edge devices (PCs, handhelds, PoS, readers) and enterprise apps (ERP, e-commerce, CRM, SCM) feed distributed data over an integration bus into transactional OLTP data stores; batch loads move data into an enterprise data warehouse and specialized OLAP data marts; analytical dashboards, reports, portals, alerts, and operational BI, analytics, and data mining sit on top. Exploding data volumes drive up decision latency, batch latency, and query latency.)
Data Dissemination (Fan-Out)
• Many applications have large numbers of consumers.
• Lots of interesting questions on large-scale pub/sub technology (sketch below):
  • Micro-scale: locality, scheduling, sharing, for huge numbers of subscriptions.
  • Macro-scale: dissemination trees, placement, sharing, …
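One way to look at the micro-scale problem (a sketch, not from the slides): each subscription is essentially a tiny standing query over a shared stream, which is what makes sharing across huge numbers of them both possible and necessary. The stream and column names below are hypothetical.

  -- A single subscriber's standing request over a shared alert stream
  SELECT alert_id, symbol, message
  FROM   alerts_stream
  WHERE  symbol = 'XYZ' AND severity >= 3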
What to measure? (a research opportunity)
• High data rates/throughput
  • rec/sec; record size
• Number of concurrent queries
• Query complexity
• Huge range of sources and sinks
  • transformation and connector performance
• Minimal benchmarking work so far:
  • "Linear Road" from the Aurora group
  • CEP benchmark work by Pedro Bizarro
Conclusions
• Two relevant trends:
  • Metcalfe's Law -> DB systems need to become more network-savvy.
  • Jim Gray and others have helped demonstrate the value of SQL to science.
• Stream query processing is where these two trends meet in the Grid world.
  • A new (3rd) component of data management infrastructure.
• Lots of open research problems for the HPDC (and DB) community.
Resources
• Research projects @ Berkeley:
  • TelegraphCQ - single-site stream processor
  • HiFi - distributed/hierarchical
  • See www.cs.berkeley.edu/~franklin for links/papers.
• Good jumping-off point for CEP and related info: www.complexevents.com
• The company: www.truviso.com