Big Data Systems: Achieving Performance and Usability

Big Data Systems:
Achieving Performance and Usability
Peter Pietzuch
[email protected]
Large-Scale Distributed Systems Group
Peter R. Pietzuch
Department of Computing, Imperial College London
[email protected]
http://lsds.doc.ic.ac.uk
Big Data Excellence, Berlin, Germany – March 2017
Machine Learning For Web Data
Uniquely Identified Visitors
– How to maximise user experience with
relevant content?
– How to analyse “click paths” to trace most
common user routes?
Unique Visitors
Visits
Page Views
Hits
Volume of Available Data
• Example: Online predictions for
• Solution: AdPredictor
adverts to serve on search engines
– Bayesian learning algorithm
ranks ads according to click
probabilities
update
…
fn
f1
Peter Pietzuch - Imperial College London
y E {−1,1}
predict
2
Data Mining of Social Networks
Twitter Cascade
Detection
Peter Pietzuch - Imperial College London
3
Intelligent Urban Transport
• Instrumentation of urban
transport
– Induction loops to measure
traffic flow
– Video surveillance of hot spots
– Sensors in public transport
• Potential queries
– How to detect traffic
congestion and road
closures?
– How to explain the cause of
congestion (public event,
emergency)?
– How to react accordingly (eg
by adapting traffic light
schedules)?
Peter Pietzuch - Imperial College London
4
Big Data Applications Enabled by Systems
• Data centres allow processing
at scale
– Commodity servers
– Fast network fabrics
Peter Pietzuch - Imperial College London
5
Task vs Data Parallelism
...
Big Data
n machines in
data centre
select distinct W.cid
select distinct
W.cid [range 300 seconds] as W,
From
Payments
select
distinct
W.cid [range 300 seconds] as W,
From
Payments
Payments
[partition-by
1 row] AVG(speed)
as L
select
highway,
segment,
From
Payments
[rangedirection,
300 1seconds]
Payments
[partition-by
row]
Las W,
where from Vehicles[range
W.cid = L.cid and
W.regionslide
!= L.region
5 seconds
1as
second]
Payments
1!=row]
as L
where
W.cid
= L.cid[partition-by
and
W.region
L.region
group by
highway,
segment,
direction
where
having W.cid
avg=<L.cid
40 and W.region != L.region
Task parallelism:
Multiple queries
Peter Pietzuch - Imperial College London
Results
select highway, segment, direction, AVG(speed)
from Vehicles[range 5 seconds slide 1 second]
group by highway, segment, direction
having
avg < 40
Data parallelism:
Single query
6
Distributed Dataflow Systems
• Idea:
Execute multiple data-parallel tasks
on cluster nodes
parallelism
degree 2
parallelism
degree 3
• Tasks organised as dataflow graph
• Almost all big data systems do this:
Apache Hadoop, Apache Spark,
Apache Storm, Apache Flink,
Google Dataflow, Google
TensorFlow, ...
•
Peter Pietzuch - Imperial College London
7
Design Space of Big Data Systems
• Tension between performance and algorithmic expressiveness
Data amount
TBs
Hard for
all algorithms
GBs
Hard for
machine
learning
algorithms
MBs
Easy
10s
Peter Pietzuch - Imperial College London
1s
100ms
10ms
Result latency
1ms
8
1. High Performance with New Hardware
…
Big Data System
High-throughput processing
Facebook Insights:
Feedzai:
Google Zeitgeist:
NovaSparks:
Low-latency results
Aggregates 9 GB/s
40K credit card transactions/s
40K user queries/s (1 sec windows)
150M trade options/s
Peter Pietzuch - Imperial College London
< 10 sec latency
< 25 ms latency
< 1 ms latency
< 1 ms latency
9
2. Expressing Machine Learning Algorithms
T1(a, b, c)
T2(c, d, e)
highway
segment
direction
speed
highway
segment
direction
speed
highway
segment
direction
speed
highway
segment
direction
speed
Pre-process
…
Share state
…
Parallelize
…
Aggregate
Iterate
highway
segment
direction
speed
T3(g, i, h)
Data
filtering
Complex
pattern
matching
Data
queries
Online machine
learning, data
mining
Increasing complexity
Peter Pietzuch - Imperial College London
10
Must Exploit New Parallel Hardware
Servers have many parallel CPU cores
Servers with GPUs common
– GPU have even more specialised cores
New types of compute accelerators
– Xeon Phi, Google's TPUs, FPGAs, ...
PCIe Bus
Command Queue
C1
C5
C2
C6
C3
C4
Socket 2
10s of
CPU cores
Socket 1
SMX1 ... SMXN
C1
C5
C2
C6
C7
C3
C7
C8
C4
C8
L3
1000s of
GPU cores
L3
L2 Cache
DRAM
Peter Pietzuch - Imperial College London
DMA
DRAM
11
SQL-Based Big Data Queries
SQL provides well-defined declarative semantics for queries
– Based on relational algebra (select, project, join, …)
• Example: Identify slow moving traffic on highway
– Find highway segments with average speed below 40 km/h
select
Input data
from
group by
having
highway, segment,
direction, AVG(speed) as avg
Vehicles[range 5 sec slide 1 sec]
highway, segment, direction
avg < 40
Output
Operators
Peter Pietzuch - Imperial College London
12
Processing Big Data with Windows
Queries must operate over finite amounts of data at a time
6
5
4
3
2
size:
slide:
1
4 sec
1 sec
w1
w2
w3
w4
windows
Peter Pietzuch - Imperial College London
13
How to use GPUs with Big Data Queries?
Naive strategy parallelises computation along window boundaries
6
5
4
3
2
size:
slide:
1
4 sec
1 sec
Task
T1
w
1
w2
Combine partial results
Task
w
3 T2
w4
E Window-based parallelism results in redundant computation
Peter Pietzuch - Imperial College London
14
How to use GPUs with Big Data Queries?
Parallel processing of non-overlapping window data?
6
5
4
3
2
size:
slide:
1
4 sec
1 sec
T1
w
T22
w
Combine partial results
T3
w
3
T
w44
T5
E Slide-based parallelism limits degree of parallelism
Peter Pietzuch - Imperial College London
15
SABER’s Big Data Processing Model
Idea: Parallelise over data chunks that are best for hardware
– Should be independent of query!
T3
T1
T2
15 14 15
13 14
12 13
11 12 11
10 10
9
98
87
w5
76
6
w4
5
5 tuples/task
4
3
w3
2
1
w2
w1
size: 7 tuples
slide: 2 tuples
• Task contains one or more window fragments
Peter Pietzuch - Imperial College London
16
SABER's Big Data Processing Model
Idea: Reassemble partial results to obtain overall result
Worker A: T1
w1
w2
w2
result
w3
Empty
w1
w2
w3
w4
w5
Slot 2
Empty
Slot 1
Result Stage
w1
result
Output result
circular buffer
Worker B: T2
E Partial result assembly must also be done in parallel
Peter Pietzuch - Imperial College London
17
SABER's Hybrid Execution Model
Idea: Permits task execution on all heterogeneous processors
(eg CPUs, GPUs, ...)
– Scheduler assigns tasks to idle processors on-the-fly
Task queue
T8
T7
T6
T5
T4
T3
T2
T1
QA QA QB
QA
QB QB
QB
QB QA
QB
T10
T9
0
GPU
9
T7
T3
T1
T2
Peter Pietzuch - Imperial College London
GPU worker
6
3
CPU
CPU worker
T4
T5
12
T10
T66
T8
T9
18
SABER Big Data Engine
Java
C & OpenCL
15K LOC 4K LOC
T1
op
T2
CPU
T2
T1
T2
T1
GPU
op
α
α
Dispatching stage
Scheduling & execution stage
Result stage
Dispatch
fixed-size tasks
Dequeue tasks
based on HLS
Merge & forward partial
window results
Peter Pietzuch - Imperial College London
19
What is SABER's Performance?
select
Throughput (106 tuples/s)
group-byavg
group-byavg group-bycnt
aggravg group-byavg select
group-bycnt
50
SABER (CPU contrib.)
40
Intel Xeon 2.6 GHz
16 cores
30
20
SABER (GPU contrib.)
10
2,304 cores
NVIDIA Quadro K5200
0
CM2
SG1
Cluster Mgmt.
SG2
Smart Grid
LRB3
LRB4
LRB
E SABER achieves aggregate performance of CPUs and GPUs
Peter Pietzuch - Imperial College London
20
2. Expressing Machine Learning Algorithms
T1(a, b, c)
T2(c, d, e)
highway
segment
direction
speed
highway
segment
direction
speed
highway
segment
direction
speed
highway
segment
direction
speed
Pre-process
…
Share state
…
Parallelize
…
Aggregate
Iterate
highway
segment
direction
speed
T3(g, i, h)
Data
filtering
Complex
pattern
matching
Data
queries
Online machine
learning, data
mining
Increasing complexity
Peter Pietzuch - Imperial College London
21
Democratise Big Data Analytics
E Need to enable more users to do big data analytics…
Peter Pietzuch - Imperial College London
22
Programming Language Popularity
Peter Pietzuch - Imperial College London
23
Programming Models For Big Data Apps?
• Declarative query languages (eg SQL) challenging for complex
algorithms (eg iterative machine learning algorithms)
– Typically need to rely on user-defined functions (UDFs)
• Distributed dataflow frameworks favour functional programming models
– MapReduce, SQL, PIG, DryadLINQ, Spark, …
– Simplifies consistency and fault tolerance
• Domain experts tend to write imperative programs
– Java, Matlab, C++, R, Python, Fortran, …
Peter Pietzuch - Imperial College London
24
Abstractions for Machine Learning
• Online recommender system
– Recommendations based on past user ratings
– Eg based on collaborative filtering (cf Netflix, Amazon, …)
User A
Recommend:
“Apple
Watch”
User A
Rating:
Item:3
“iPhone”
Rating: 5
Up-to-date
recommendations
Customer events
on website
Executed as dataflow graph
(eg Hadoop, Spark, Flink, …)
Peter Pietzuch - Imperial College London
25
Online Collaborative Filtering in Java
Update with
new ratings
User-A
User-B
Item-A
4
0
Matrix userItem = new Matrix();
Matrix coOcc = new Matrix();
Item-B
5
5
void addRating(int user, int item, int rating) {
userItem.setElement(user, item, rating);
updateCoOccurrence(coOcc, userItem);
}
User-Item matrix (UI)
Multiply for
recommendation
User-B
1 2
x
Item-A
Item-B
Item-A
Item-B
1
1
1
2
Vector getRec(int user) {
Vector userRow = userItem.getRow(user);
Vector userRec = coOcc.multiply(userRow);
return userRec;
}
Co-occurrence matrix (CO)
Peter Pietzuch - Imperial College London
26
Collaborative Filtering in Spark (Java)
// Build the recommendation model using ALS
int rank = 10;
int numIterations = 20;
MatrixFactorizationModel model = ALS.train(JavaRDD.toRDD(ratings), rank, numIterations, 0.01);
// Evaluate the model on rating data
JavaRDD<Tuple2<Object, Object>> userProducts = ratings.map(
new Function<Rating, Tuple2<Object, Object>>() {
public Tuple2<Object, Object> call(Rating r) {
return new Tuple2<Object, Object>(r.user(), r.product());
}
}
);
JavaPairRDD<Tuple2<Integer, Integer>, Double> predictions = JavaPairRDD.fromJavaRDD(
model.predict(JavaRDD.toRDD(userProducts)).toJavaRDD().map(
new Function<Rating, Tuple2<Tuple2<Integer, Integer>, Double>>() {
public Tuple2<Tuple2<Integer, Integer>, Double> call(Rating r){
return new Tuple2<Tuple2<Integer, Integer>, Double>(
new Tuple2<Integer, Integer>(r.user(), r.product()), r.rating());
}
}
));
JavaRDD<Tuple2<Double, Double>> ratesAndPreds = JavaPairRDD.fromJavaRDD(ratings.map(
new Function<Rating, Tuple2<Tuple2<Integer, Integer>, Double>>() {
public Tuple2<Tuple2<Integer, Integer>, Double> call(Rating r){
return new Tuple2<Tuple2<Integer, Integer>, Double>(
new Tuple2<Integer, Integer>(r.user(), r.product()), r.rating());
}
}
)).join(predictions).values();
Peter Pietzuch - Imperial College London
27
Collaborative Filtering in Spark (Scala)
// Build the recommendation model using ALS
val rank = 10
val numIterations = 20
val model = ALS.train(ratings, rank, numIterations, 0.01)
// Evaluate the model on rating data
val usersProducts = ratings.map {
case Rating(user, product, rate) => (user, product)
}
val predictions =
model.predict(usersProducts).map {
case Rating(user, product, rate) => ((user, product), rate)
}
val ratesAndPreds = ratings.map {
case Rating(user, product, rate) => ((user, product), rate)
}.join(predictions)
E All data is immutable, no fine-grained model updates
Peter Pietzuch - Imperial College London
28
State as First Class Citizen
Tasks process data
Item 1 Item 2
5
2
User A
User B
4
1
Dataflows
represent
data
State Elements
(SEs) represent
transient state
• State elements (SEs) are mutable in-memory data structures
– Tasks have local access to SEs
– SEs can be shared between tasks
Peter Pietzuch - Imperial College London
29
Imperial,
USENIX
ATC’14
Stateful Dataflow Graphs (SDG)
SEEP distributed dataflow
framework
Annotated Java program
(@Partitioned, @Partial, @Global, …)
Program.java
Static
program
analysis
Peter Pietzuch - Imperial College London
Data-parallel
Stateful Dataflow
Graph (SDG)
Translation &
checkpointbased fault
tolerance
Cluster
30
Conclusions
• Big Data Systems have enabled the Big Data revolution
– Driven by advances in data centre technology and parallel hardware
• 1. Performance means exploiting new parallel hardware (GPUs)
– Must decouple query semantics from hardware parameters
• E SABER engine: Hybrid CPU/GPU execution model
• 2. Usability means providing right programing abstractions
– Imperative programming models still going strong
• E SDG: Stateful Big Data processing for machine learning
Peter Pietzuch - Imperial College London
31
Acknowledgements: LSDS Group
We're recruiting: PhDs, Post-Docs
Thank you! Any Questions?
Peter Pietzuch - Imperial College London
Peter Pietzuch
<[email protected]>
http://lsds.doc.ic.ac.uk
32