Big Data Systems: Achieving Performance and Usability Peter Pietzuch [email protected] Large-Scale Distributed Systems Group Peter R. Pietzuch Department of Computing, Imperial College London [email protected] http://lsds.doc.ic.ac.uk Big Data Excellence, Berlin, Germany – March 2017 Machine Learning For Web Data Uniquely Identified Visitors – How to maximise user experience with relevant content? – How to analyse “click paths” to trace most common user routes? Unique Visitors Visits Page Views Hits Volume of Available Data • Example: Online predictions for • Solution: AdPredictor adverts to serve on search engines – Bayesian learning algorithm ranks ads according to click probabilities update … fn f1 Peter Pietzuch - Imperial College London y E {−1,1} predict 2 Data Mining of Social Networks Twitter Cascade Detection Peter Pietzuch - Imperial College London 3 Intelligent Urban Transport • Instrumentation of urban transport – Induction loops to measure traffic flow – Video surveillance of hot spots – Sensors in public transport • Potential queries – How to detect traffic congestion and road closures? – How to explain the cause of congestion (public event, emergency)? – How to react accordingly (eg by adapting traffic light schedules)? Peter Pietzuch - Imperial College London 4 Big Data Applications Enabled by Systems • Data centres allow processing at scale – Commodity servers – Fast network fabrics Peter Pietzuch - Imperial College London 5 Task vs Data Parallelism ... Big Data n machines in data centre select distinct W.cid select distinct W.cid [range 300 seconds] as W, From Payments select distinct W.cid [range 300 seconds] as W, From Payments Payments [partition-by 1 row] AVG(speed) as L select highway, segment, From Payments [rangedirection, 300 1seconds] Payments [partition-by row] Las W, where from Vehicles[range W.cid = L.cid and W.regionslide != L.region 5 seconds 1as second] Payments 1!=row] as L where W.cid = L.cid[partition-by and W.region L.region group by highway, segment, direction where having W.cid avg=<L.cid 40 and W.region != L.region Task parallelism: Multiple queries Peter Pietzuch - Imperial College London Results select highway, segment, direction, AVG(speed) from Vehicles[range 5 seconds slide 1 second] group by highway, segment, direction having avg < 40 Data parallelism: Single query 6 Distributed Dataflow Systems • Idea: Execute multiple data-parallel tasks on cluster nodes parallelism degree 2 parallelism degree 3 • Tasks organised as dataflow graph • Almost all big data systems do this: Apache Hadoop, Apache Spark, Apache Storm, Apache Flink, Google Dataflow, Google TensorFlow, ... • Peter Pietzuch - Imperial College London 7 Design Space of Big Data Systems • Tension between performance and algorithmic expressiveness Data amount TBs Hard for all algorithms GBs Hard for machine learning algorithms MBs Easy 10s Peter Pietzuch - Imperial College London 1s 100ms 10ms Result latency 1ms 8 1. High Performance with New Hardware … Big Data System High-throughput processing Facebook Insights: Feedzai: Google Zeitgeist: NovaSparks: Low-latency results Aggregates 9 GB/s 40K credit card transactions/s 40K user queries/s (1 sec windows) 150M trade options/s Peter Pietzuch - Imperial College London < 10 sec latency < 25 ms latency < 1 ms latency < 1 ms latency 9 2. Expressing Machine Learning Algorithms T1(a, b, c) T2(c, d, e) highway segment direction speed highway segment direction speed highway segment direction speed highway segment direction speed Pre-process … Share state … Parallelize … Aggregate Iterate highway segment direction speed T3(g, i, h) Data filtering Complex pattern matching Data queries Online machine learning, data mining Increasing complexity Peter Pietzuch - Imperial College London 10 Must Exploit New Parallel Hardware Servers have many parallel CPU cores Servers with GPUs common – GPU have even more specialised cores New types of compute accelerators – Xeon Phi, Google's TPUs, FPGAs, ... PCIe Bus Command Queue C1 C5 C2 C6 C3 C4 Socket 2 10s of CPU cores Socket 1 SMX1 ... SMXN C1 C5 C2 C6 C7 C3 C7 C8 C4 C8 L3 1000s of GPU cores L3 L2 Cache DRAM Peter Pietzuch - Imperial College London DMA DRAM 11 SQL-Based Big Data Queries SQL provides well-defined declarative semantics for queries – Based on relational algebra (select, project, join, …) • Example: Identify slow moving traffic on highway – Find highway segments with average speed below 40 km/h select Input data from group by having highway, segment, direction, AVG(speed) as avg Vehicles[range 5 sec slide 1 sec] highway, segment, direction avg < 40 Output Operators Peter Pietzuch - Imperial College London 12 Processing Big Data with Windows Queries must operate over finite amounts of data at a time 6 5 4 3 2 size: slide: 1 4 sec 1 sec w1 w2 w3 w4 windows Peter Pietzuch - Imperial College London 13 How to use GPUs with Big Data Queries? Naive strategy parallelises computation along window boundaries 6 5 4 3 2 size: slide: 1 4 sec 1 sec Task T1 w 1 w2 Combine partial results Task w 3 T2 w4 E Window-based parallelism results in redundant computation Peter Pietzuch - Imperial College London 14 How to use GPUs with Big Data Queries? Parallel processing of non-overlapping window data? 6 5 4 3 2 size: slide: 1 4 sec 1 sec T1 w T22 w Combine partial results T3 w 3 T w44 T5 E Slide-based parallelism limits degree of parallelism Peter Pietzuch - Imperial College London 15 SABER’s Big Data Processing Model Idea: Parallelise over data chunks that are best for hardware – Should be independent of query! T3 T1 T2 15 14 15 13 14 12 13 11 12 11 10 10 9 98 87 w5 76 6 w4 5 5 tuples/task 4 3 w3 2 1 w2 w1 size: 7 tuples slide: 2 tuples • Task contains one or more window fragments Peter Pietzuch - Imperial College London 16 SABER's Big Data Processing Model Idea: Reassemble partial results to obtain overall result Worker A: T1 w1 w2 w2 result w3 Empty w1 w2 w3 w4 w5 Slot 2 Empty Slot 1 Result Stage w1 result Output result circular buffer Worker B: T2 E Partial result assembly must also be done in parallel Peter Pietzuch - Imperial College London 17 SABER's Hybrid Execution Model Idea: Permits task execution on all heterogeneous processors (eg CPUs, GPUs, ...) – Scheduler assigns tasks to idle processors on-the-fly Task queue T8 T7 T6 T5 T4 T3 T2 T1 QA QA QB QA QB QB QB QB QA QB T10 T9 0 GPU 9 T7 T3 T1 T2 Peter Pietzuch - Imperial College London GPU worker 6 3 CPU CPU worker T4 T5 12 T10 T66 T8 T9 18 SABER Big Data Engine Java C & OpenCL 15K LOC 4K LOC T1 op T2 CPU T2 T1 T2 T1 GPU op α α Dispatching stage Scheduling & execution stage Result stage Dispatch fixed-size tasks Dequeue tasks based on HLS Merge & forward partial window results Peter Pietzuch - Imperial College London 19 What is SABER's Performance? select Throughput (106 tuples/s) group-byavg group-byavg group-bycnt aggravg group-byavg select group-bycnt 50 SABER (CPU contrib.) 40 Intel Xeon 2.6 GHz 16 cores 30 20 SABER (GPU contrib.) 10 2,304 cores NVIDIA Quadro K5200 0 CM2 SG1 Cluster Mgmt. SG2 Smart Grid LRB3 LRB4 LRB E SABER achieves aggregate performance of CPUs and GPUs Peter Pietzuch - Imperial College London 20 2. Expressing Machine Learning Algorithms T1(a, b, c) T2(c, d, e) highway segment direction speed highway segment direction speed highway segment direction speed highway segment direction speed Pre-process … Share state … Parallelize … Aggregate Iterate highway segment direction speed T3(g, i, h) Data filtering Complex pattern matching Data queries Online machine learning, data mining Increasing complexity Peter Pietzuch - Imperial College London 21 Democratise Big Data Analytics E Need to enable more users to do big data analytics… Peter Pietzuch - Imperial College London 22 Programming Language Popularity Peter Pietzuch - Imperial College London 23 Programming Models For Big Data Apps? • Declarative query languages (eg SQL) challenging for complex algorithms (eg iterative machine learning algorithms) – Typically need to rely on user-defined functions (UDFs) • Distributed dataflow frameworks favour functional programming models – MapReduce, SQL, PIG, DryadLINQ, Spark, … – Simplifies consistency and fault tolerance • Domain experts tend to write imperative programs – Java, Matlab, C++, R, Python, Fortran, … Peter Pietzuch - Imperial College London 24 Abstractions for Machine Learning • Online recommender system – Recommendations based on past user ratings – Eg based on collaborative filtering (cf Netflix, Amazon, …) User A Recommend: “Apple Watch” User A Rating: Item:3 “iPhone” Rating: 5 Up-to-date recommendations Customer events on website Executed as dataflow graph (eg Hadoop, Spark, Flink, …) Peter Pietzuch - Imperial College London 25 Online Collaborative Filtering in Java Update with new ratings User-A User-B Item-A 4 0 Matrix userItem = new Matrix(); Matrix coOcc = new Matrix(); Item-B 5 5 void addRating(int user, int item, int rating) { userItem.setElement(user, item, rating); updateCoOccurrence(coOcc, userItem); } User-Item matrix (UI) Multiply for recommendation User-B 1 2 x Item-A Item-B Item-A Item-B 1 1 1 2 Vector getRec(int user) { Vector userRow = userItem.getRow(user); Vector userRec = coOcc.multiply(userRow); return userRec; } Co-occurrence matrix (CO) Peter Pietzuch - Imperial College London 26 Collaborative Filtering in Spark (Java) // Build the recommendation model using ALS int rank = 10; int numIterations = 20; MatrixFactorizationModel model = ALS.train(JavaRDD.toRDD(ratings), rank, numIterations, 0.01); // Evaluate the model on rating data JavaRDD<Tuple2<Object, Object>> userProducts = ratings.map( new Function<Rating, Tuple2<Object, Object>>() { public Tuple2<Object, Object> call(Rating r) { return new Tuple2<Object, Object>(r.user(), r.product()); } } ); JavaPairRDD<Tuple2<Integer, Integer>, Double> predictions = JavaPairRDD.fromJavaRDD( model.predict(JavaRDD.toRDD(userProducts)).toJavaRDD().map( new Function<Rating, Tuple2<Tuple2<Integer, Integer>, Double>>() { public Tuple2<Tuple2<Integer, Integer>, Double> call(Rating r){ return new Tuple2<Tuple2<Integer, Integer>, Double>( new Tuple2<Integer, Integer>(r.user(), r.product()), r.rating()); } } )); JavaRDD<Tuple2<Double, Double>> ratesAndPreds = JavaPairRDD.fromJavaRDD(ratings.map( new Function<Rating, Tuple2<Tuple2<Integer, Integer>, Double>>() { public Tuple2<Tuple2<Integer, Integer>, Double> call(Rating r){ return new Tuple2<Tuple2<Integer, Integer>, Double>( new Tuple2<Integer, Integer>(r.user(), r.product()), r.rating()); } } )).join(predictions).values(); Peter Pietzuch - Imperial College London 27 Collaborative Filtering in Spark (Scala) // Build the recommendation model using ALS val rank = 10 val numIterations = 20 val model = ALS.train(ratings, rank, numIterations, 0.01) // Evaluate the model on rating data val usersProducts = ratings.map { case Rating(user, product, rate) => (user, product) } val predictions = model.predict(usersProducts).map { case Rating(user, product, rate) => ((user, product), rate) } val ratesAndPreds = ratings.map { case Rating(user, product, rate) => ((user, product), rate) }.join(predictions) E All data is immutable, no fine-grained model updates Peter Pietzuch - Imperial College London 28 State as First Class Citizen Tasks process data Item 1 Item 2 5 2 User A User B 4 1 Dataflows represent data State Elements (SEs) represent transient state • State elements (SEs) are mutable in-memory data structures – Tasks have local access to SEs – SEs can be shared between tasks Peter Pietzuch - Imperial College London 29 Imperial, USENIX ATC’14 Stateful Dataflow Graphs (SDG) SEEP distributed dataflow framework Annotated Java program (@Partitioned, @Partial, @Global, …) Program.java Static program analysis Peter Pietzuch - Imperial College London Data-parallel Stateful Dataflow Graph (SDG) Translation & checkpointbased fault tolerance Cluster 30 Conclusions • Big Data Systems have enabled the Big Data revolution – Driven by advances in data centre technology and parallel hardware • 1. Performance means exploiting new parallel hardware (GPUs) – Must decouple query semantics from hardware parameters • E SABER engine: Hybrid CPU/GPU execution model • 2. Usability means providing right programing abstractions – Imperative programming models still going strong • E SDG: Stateful Big Data processing for machine learning Peter Pietzuch - Imperial College London 31 Acknowledgements: LSDS Group We're recruiting: PhDs, Post-Docs Thank you! Any Questions? Peter Pietzuch - Imperial College London Peter Pietzuch <[email protected]> http://lsds.doc.ic.ac.uk 32
© Copyright 2026 Paperzz