One Trillion Edges: Graph Processing at Facebook-Scale
Eirik Folkestad & Christoffer Nysæther

Introduction
- Graphs can model almost anything, and Facebook manages a very large one.
- Facebook had 1.3 billion users and more than 400 billion edges in its network in 2014. It now has 1.71 billion users and many more edges.
- Twitter had 288 million monthly active users in 2015, with an average of roughly 208 followers each and an estimated total of 60 billion edges.
- Open Graph helps developers connect applications to real-world actions, creating real-world graphs.
- Analyzing these real-world graphs was very difficult, however, because of their enormous size: hundreds of billions and even trillions of edges(!)

Specialized graph frameworks
- Many graph frameworks fail at a much smaller scale.
- Asynchronous graph processing engines tend to have additional challenges:
  - Unbounded message queues causing memory overload
  - Vertex-centric locking complexity and overhead
  - Difficulty in leveraging high network bandwidth due to fine-grained computation
- Correspondingly, there is a lack of information on how applications perform and scale to practical problems on trillion-edge graphs.

Related systems
- MapReduce: a programming model for processing and generating large data sets with a parallel, distributed algorithm.
- Apache Hadoop: an open-source implementation of MapReduce written in Java.
- Apache Hive: a data warehouse infrastructure built on top of Hadoop, providing data summarization, query, and analysis.
- Pregel: Google's implementation of the Bulk Synchronous Parallel (BSP) model for graph processing. Not open source.
- Apache Giraph: an open-source iterative graph processing system designed to scale to hundreds or thousands of machines and process trillions of edges.
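Giraph, like Pregel, uses a vertex-centric bulk synchronous model: in each superstep, every active vertex consumes the messages sent to it in the previous superstep, updates its own value, sends messages along its edges, and may vote to halt; a global barrier separates supersteps. As a rough, self-contained illustration of that model only (plain Python, not Giraph's Java API; the function name and graph representation are made up for this sketch), here is connected components done Pregel-style:

```python
# Sketch of the Pregel/BSP vertex-centric model (NOT the Giraph API): every
# superstep, each vertex with pending messages consumes them, updates its
# value, sends messages to its neighbors, and otherwise stays halted.

def pregel_min_label(edges):
    """Connected components: each vertex converges to the smallest id in its component."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    value = {v: v for v in neighbors}        # vertex value = best-known label so far
    inbox = {v: [] for v in neighbors}
    for v in neighbors:                      # superstep 0: every vertex sends its own id
        for n in neighbors[v]:
            inbox[n].append(v)

    while any(inbox.values()):               # run until no messages are in flight
        outbox = {v: [] for v in neighbors}
        for v, msgs in inbox.items():
            if not msgs:
                continue                     # vertex halted; only a message wakes it
            best = min(msgs)
            if best < value[v]:              # improved label: update and notify neighbors
                value[v] = best
                for n in neighbors[v]:
                    outbox[n].append(best)
        inbox = outbox                       # barrier: messages are delivered next superstep
    return value
```

In real Giraph the superstep loop, barrier, and message delivery are handled by the framework across many machines; the application only supplies the per-vertex compute logic.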
- Based loosely on Pregel, but written in open-source Java.

Why Facebook chose Giraph
- Performed better than the other tested graph frameworks, Hive and GraphLab
- Open source
- Has vertex and edge input formats that can access MapReduce formats as well as Hive tables
- Giraph applications can be inserted into existing Hadoop pipelines
- Easy debugging, since it is straightforward and not an asynchronous graph processing framework

Improvements to Giraph
- Flexible data loading from different sources, rather than being vertex-centric only
  - Reduces preprocessing; a worker can load an arbitrary subset of the edges
- More worker threads per machine and worker-local multithreading for better parallelization
  - Increases resource utilization and mitigates the "slowest worker" problem
- Edges of a vertex serialized as byte arrays, with a faster OutEdges interface, for better memory usage
  - Per-edge Java objects made the JVM work too hard
- Support for larger aggregators via sharded aggregators, which move the responsibility for aggregation from the master to the workers
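The sharded-aggregator idea can be sketched with a toy simulation: rather than the master receiving and combining every worker's partial value for every aggregator, each aggregator is assigned to an owning worker that does the combining, and only final values are distributed. This is an illustrative Python sketch under assumed names (`owner_of`, `sharded_aggregate`), not Giraph's actual implementation:

```python
# Illustrative sketch of sharded aggregation (not Giraph's real code): each
# aggregator is owned by one worker, so the master no longer has to receive
# and combine every partial value itself.

def owner_of(aggregator_name, num_workers):
    """Hypothetical placement rule: any deterministic mapping works."""
    return hash(aggregator_name) % num_workers

def sharded_aggregate(partials_per_worker, num_workers, combine=lambda a, b: a + b):
    """partials_per_worker: one dict per worker, {aggregator_name: partial_value}."""
    # Phase 1: each worker sends each partial value to that aggregator's owner.
    inbox = [dict() for _ in range(num_workers)]
    for worker_partials in partials_per_worker:
        for name, value in worker_partials.items():
            shard = inbox[owner_of(name, num_workers)]
            shard.setdefault(name, []).append(value)
    # Phase 2: each owner combines the values for the aggregators it owns
    # (in the real system the owners do this in parallel).
    final = {}
    for shard in inbox:
        for name, values in shard.items():
            result = values[0]
            for v in values[1:]:
                result = combine(result, v)
            final[name] = result
    # Phase 3: owners distribute final values to the master and all workers
    # for the next superstep.
    return final
```

The point of the placement step is load balancing: with many or large aggregators, the combining work and the broadcast traffic are spread across all workers instead of concentrating on the master.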
  - Aggregators were previously implemented via ZooKeeper, which was inefficient due to the small maximum size of znodes

Compute Model Extensions
- The Pregel model needed to be generalized in order to:
  - Support more complex applications, such as k-means clustering
  - Make the framework more reusable

Worker phases
- Methods added, executed on every worker:
  - preApplication(): runs prior to any computation, e.g. to determine the initial positions of the centroids
  - preSuperstep(): runs before each superstep, e.g. to calculate the new position of each centroid
  - postSuperstep()
  - postApplication()
- These add a lot of functionality, but bypass the Pregel model.

Master computation
- Executing the same code on every worker is:
  - Not well understood
  - Error prone
- Master computation was added to do centralized computation prior to every superstep.
- Errors can be aggregated to see whether the application is converging.
- Example: Facebook's graph has users, friendships, subscriptions, and other social connections
  - Edge cut → master computation: an aggregator is used to decide whether to continue computing or to execute the edge cut
  - haltComputation() in the master is checked prior to starting a superstep

Composable computation
- A cleaner and more reusable option:
  - Decouples the vertex from the computation
  - Abstracts the computation from the vertex
- Uses two message types:
  - M1: incoming message type
  - M2: outgoing message type
- The master computation can choose the computation class to execute for the current superstep.

Superstep splitting
- Messaging patterns can exceed the available memory on the destination vertex's owner, and many messages cannot be aggregated:
  - Mutual friends: each vertex has to send all its neighbors the vertex ids of its neighborhood
  - Calculating the strength of a relationship required 850 GB when using 200 workers
- Superstep splitting sends a fragment of the messages in each split and performs a partial computation.
- Limitations:
  - The message update must be commutative and associative
  - No single message can overflow the memory buffer of a single vertex

Experimental results
- Applications: label propagation, PageRank, friends-of-friends score
- Speedups over the equivalent Hive queries, in both CPU time and elapsed time:
  - 5x-26x in CPU time
  - 8x-120x in elapsed time
- Unweighted PageRank on the 1.39-billion-user dataset with 1 trillion social connections:
  - Less than 3 minutes per iteration with only 200 machines

Operational experience
- Giraph has been running in production at Facebook for over two years.
- Scheduling:
  - Checkpointing is disabled; errors are handled by restarting the job
- Graph preparation:
  - Input comes from Hive tables, read through HiveIO
- Production application workflow:
  - Write the application and unit test it
  - Run the application on a test dataset
  - Run the application at scale
  - Deploy to production

Thank You!