One Trillion Edges: Graph Processing at Facebook-Scale
Eirik Folkestad & Christoffer Nysæther
Introduction
- Graphs can model everything, and Facebook manages a very large one.
- Facebook had 1.3B users in 2014 and more than 400 billion edges in its network. It now has 1.71B users and many more edges.
- Twitter had 288M monthly active users in 2015, with an average of roughly 208 followers per user, for an estimated total of 60 billion edges.
- Open Graph helps developers connect applications to real-world actions and thereby create real-world graphs.
- However, analyzing these real-world graphs is very difficult because of their enormous size: hundreds of billions and even trillions of edges(!).
Specialized graph frameworks
- Many graph frameworks fail at a much smaller scale.
- Asynchronous graph processing engines tend to have additional challenges:
  - Unbounded message queues causing memory overload
  - Vertex-centric locking complexity and overhead
  - Difficulty in leveraging high network bandwidth due to fine-grained computation
- Correspondingly, there is a lack of information on how applications perform and scale to practical problems on trillion-edge graphs.
Related systems
- MapReduce
  - Programming model for processing and generating large data sets with a parallel, distributed algorithm
- Apache Hadoop
  - Open-source implementation of MapReduce written in Java
- Apache Hive
  - Data warehouse infrastructure built on top of Hadoop that provides data summarization, query, and analysis
- Pregel
  - The first graph processing implementation of the Bulk Synchronous Parallel (BSP) model of component cooperation; not open source
Apache Giraph
- Apache Giraph is an open source iterative graph processing system designed to scale to hundreds or thousands of machines and process trillions of edges.
- Based loosely on Pregel, but open source and written in Java (a minimal example is sketched after this list).
Why Facebook chose Giraph:
- Performed better than the other graph frameworks tested: Hive and GraphLab
- Open source
- Has vertex and edge input formats that can access MapReduce formats as well as Hive tables
- Giraph applications can be inserted into existing Hadoop pipelines
- Easy debugging, since it is straightforward and not an asynchronous graph processing framework
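To make the Pregel-style "think like a vertex" model concrete, here is a minimal unweighted PageRank computation sketched against Giraph's BasicComputation API; the 30-superstep cutoff and 0.85 damping factor are illustrative choices, not Facebook's production settings.

import org.apache.giraph.graph.BasicComputation;
import org.apache.giraph.graph.Vertex;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;

/** Minimal unweighted PageRank sketch in the Giraph/Pregel vertex-centric BSP style. */
public class SimplePageRank extends BasicComputation<
    LongWritable, DoubleWritable, NullWritable, DoubleWritable> {

  private static final int MAX_SUPERSTEPS = 30;  // illustrative cutoff
  private static final double DAMPING = 0.85;    // illustrative damping factor

  @Override
  public void compute(Vertex<LongWritable, DoubleWritable, NullWritable> vertex,
                      Iterable<DoubleWritable> messages) {
    if (getSuperstep() == 0) {
      // Start every vertex with a uniform rank.
      vertex.setValue(new DoubleWritable(1.0 / getTotalNumVertices()));
    } else {
      // Sum the rank contributions received from in-neighbors.
      double sum = 0;
      for (DoubleWritable msg : messages) {
        sum += msg.get();
      }
      vertex.setValue(new DoubleWritable(
          (1 - DAMPING) / getTotalNumVertices() + DAMPING * sum));
    }

    if (getSuperstep() < MAX_SUPERSTEPS) {
      // Spread this vertex's rank evenly over its out-edges for the next superstep.
      sendMessageToAllEdges(vertex,
          new DoubleWritable(vertex.getValue().get() / vertex.getNumEdges()));
    } else {
      vertex.voteToHalt();
    }
  }
}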
Improvements to Giraph
- Flexible input: can load data from different sources rather than being vertex-centric only
  - Reduces preprocessing, more flexible; a worker can load an arbitrary subset of the edges
- Can add more worker threads per machine and use worker-local multithreading for better parallelization
  - Increases resource availability (mitigates the "slowest worker problem")
- Serialize the edges of each vertex as byte arrays and use a faster OutEdges interface for better memory usage (see the sketch after this list)
  - With one Java object per edge, the JVM worked too hard
- Support for larger aggregators by using sharded aggregators, which move the responsibility of aggregation to a worker rather than the master
  - Aggregators were previously implemented inefficiently in ZooKeeper due to the small maximum size of znodes
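As a rough illustration of the byte-array edge representation (a simplified stand-in, not Giraph's actual OutEdges implementation): instead of one Java object per edge, a vertex's neighbor ids are serialized into a single byte array, which keeps memory close to the raw data size and reduces garbage-collection pressure.

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/**
 * Simplified stand-in for the idea behind byte-array out-edges:
 * keep a vertex's neighbor ids in one flat byte[] instead of one
 * Java object per edge, and deserialize on demand when iterating.
 */
public class ByteArrayOutEdges {
  private byte[] serializedEdges = new byte[0];
  private int edgeCount = 0;

  public void setEdges(long[] targetIds) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    for (long id : targetIds) {
      out.writeLong(id);  // one 8-byte record per edge, no per-edge object
    }
    serializedEdges = bytes.toByteArray();
    edgeCount = targetIds.length;
  }

  /** Deserialize on demand instead of materializing an object per edge. */
  public long[] targetIds() throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(serializedEdges));
    long[] ids = new long[edgeCount];
    for (int i = 0; i < edgeCount; i++) {
      ids[i] = in.readLong();
    }
    return ids;
  }
}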
Compute Model Extensions
- The Pregel model needed to be generalized to:
  - Support more complex applications
  - Make the framework more reusable
- Running example: k-means clustering
Worker phases
- Methods added (sketched below):
  - preSuperstep()
    - k-means: calculate the new position of each centroid
  - postSuperstep()
  - preApplication()
    - Executed on every worker prior to any computation
    - k-means: determine the initial positions of the centroids
  - postApplication()
- Adds a lot of functionality, but bypasses the Pregel model
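A minimal sketch of these worker-phase hooks, assuming Giraph's WorkerContext API; the k-means centroid bookkeeping is only indicated in comments.

import org.apache.giraph.worker.WorkerContext;

/**
 * Sketch of the worker-phase hooks, roughly following Giraph's WorkerContext.
 * The k-means details are placeholders; only the hook structure is the point.
 */
public class KMeansWorkerContext extends WorkerContext {

  @Override
  public void preApplication() {
    // Runs once on every worker before any superstep:
    // e.g. determine/read the initial centroid positions.
  }

  @Override
  public void preSuperstep() {
    // Runs on every worker before each superstep:
    // e.g. recompute centroid positions from the sums aggregated
    // during the previous superstep.
  }

  @Override
  public void postSuperstep() {
    // Runs on every worker after each superstep (cleanup, local stats).
  }

  @Override
  public void postApplication() {
    // Runs once on every worker after the last superstep.
  }
}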
Master Computation
- Executing the same "global" code on every worker is:
  - Not well understood
  - Error prone
- Added master computation to do centralized computation prior to every superstep
  - e.g. aggregate errors to see if the application is converging
- Facebook has:
  - Users
  - Friendships
  - Subscriptions
  - Other social connections
Master computation
- Edge cut → master computation
- An aggregator is used to decide whether to continue computing or to execute the edge cut
- haltComputation() in the master is checked prior to starting a superstep (see the sketch below)
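A minimal sketch of this master-side pattern, assuming Giraph's DefaultMasterCompute and aggregator API; the aggregator name and the zero-change stopping rule are illustrative assumptions.

import org.apache.giraph.aggregators.LongSumAggregator;
import org.apache.giraph.master.DefaultMasterCompute;
import org.apache.hadoop.io.LongWritable;

/**
 * Sketch of centralized logic that runs on the master before every superstep:
 * read an aggregator filled by the workers and decide whether to stop.
 */
public class ConvergenceMasterCompute extends DefaultMasterCompute {
  /** Illustrative aggregator name; workers call aggregate() with this name. */
  public static final String VERTICES_MOVED = "vertices.moved";

  @Override
  public void initialize() throws InstantiationException, IllegalAccessException {
    registerAggregator(VERTICES_MOVED, LongSumAggregator.class);
  }

  @Override
  public void compute() {
    if (getSuperstep() == 0) {
      return;  // nothing has been aggregated yet
    }
    long moved = this.<LongWritable>getAggregatedValue(VERTICES_MOVED).get();
    // If nothing changed in the previous superstep, stop the job;
    // haltComputation() is checked before the next superstep starts.
    if (moved == 0) {
      haltComputation();
    }
  }
}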
Composable computation
- A cleaner and more reusable option
- Decouples the vertex from the computation: the computation is abstracted away from the vertex
- Uses two message types:
  - M1: incoming message type
  - M2: outgoing message type
- The master computation can choose the computation class to execute for the current superstep (see the sketch below)
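A sketch of how this can look, assuming Giraph's AbstractComputation<I, V, E, M1, M2> and MasterCompute.setComputation(); the two phase classes and their message types are hypothetical and only show the incoming and outgoing types lining up across supersteps.

import org.apache.giraph.graph.AbstractComputation;
import org.apache.giraph.graph.Vertex;
import org.apache.giraph.master.DefaultMasterCompute;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;

/** Phase 1: receives LongWritable messages (M1) and sends DoubleWritable messages (M2). */
class PhaseOne extends AbstractComputation<
    LongWritable, DoubleWritable, NullWritable, LongWritable, DoubleWritable> {
  @Override
  public void compute(Vertex<LongWritable, DoubleWritable, NullWritable> vertex,
                      Iterable<LongWritable> messages) {
    // ... consume LongWritable messages, emit DoubleWritable messages ...
  }
}

/** Phase 2: receives the DoubleWritable messages from phase 1 and sends LongWritable. */
class PhaseTwo extends AbstractComputation<
    LongWritable, DoubleWritable, NullWritable, DoubleWritable, LongWritable> {
  @Override
  public void compute(Vertex<LongWritable, DoubleWritable, NullWritable> vertex,
                      Iterable<DoubleWritable> messages) {
    // ... consume DoubleWritable messages, emit LongWritable messages ...
  }
}

/** The master picks which computation class runs in the upcoming superstep. */
class AlternatingMasterCompute extends DefaultMasterCompute {
  @Override
  public void compute() {
    if (getSuperstep() % 2 == 0) {
      setComputation(PhaseOne.class);
    } else {
      setComputation(PhaseTwo.class);
    }
  }
}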
Superstep splitting
- Messaging patterns can exceed the available memory on the machine that owns the destination vertex
- Many messages cannot be aggregated (combined), for example:
  - Mutual friends: each vertex has to send all of its neighbors the vertex ids of its neighborhood
  - Calculating the strength of a relationship: 850 GB of messages when using 200 workers
- Superstep splitting sends a fragment of the messages and performs a partial computation in each split (illustrated below)
- Limitations:
  - Messages must be commutative and associative
  - No single message can overflow the memory buffer of a single vertex
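A rough, hand-rolled illustration of the idea (not Giraph's built-in superstep-splitting machinery): run the same logical superstep several times and, in each split, only message the neighbors whose ids hash into the current split, folding the partial sums into the vertex value; this is only valid because the update is a commutative, associative sum.

import org.apache.giraph.edge.Edge;
import org.apache.giraph.graph.BasicComputation;
import org.apache.giraph.graph.Vertex;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;

/**
 * Illustration of superstep splitting: one logical superstep is spread over
 * NUM_SPLITS real supersteps so only ~1/NUM_SPLITS of the messages are in
 * flight at a time. (Halting logic is omitted to keep the sketch short.)
 */
public class SplitSuperstepSketch extends BasicComputation<
    LongWritable, DoubleWritable, NullWritable, DoubleWritable> {

  private static final int NUM_SPLITS = 4;  // illustrative

  @Override
  public void compute(Vertex<LongWritable, DoubleWritable, NullWritable> vertex,
                      Iterable<DoubleWritable> messages) {
    // Partial aggregation: fold this split's messages into the vertex value.
    // Valid only because the update is a commutative, associative sum.
    double sum = vertex.getValue().get();
    for (DoubleWritable msg : messages) {
      sum += msg.get();
    }
    vertex.setValue(new DoubleWritable(sum));

    int split = (int) (getSuperstep() % NUM_SPLITS);
    for (Edge<LongWritable, NullWritable> edge : vertex.getEdges()) {
      // Only message the destinations that fall into the current split.
      if (Math.floorMod(edge.getTargetVertexId().hashCode(), NUM_SPLITS) == split) {
        sendMessage(edge.getTargetVertexId(), new DoubleWritable(1.0));
      }
    }
  }
}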
Experimental results
- Applications:
  - Label propagation
  - PageRank
  - Friends-of-friends score
- Speedups over the equivalent Hive queries, in both CPU time and elapsed time:
  - 5x - 26x CPU time
  - 8x - 120x elapsed time
- Unweighted PageRank on a 1.39B user dataset with 1 trillion social connections
  - Less than 3 minutes per iteration with only 200 machines
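As a rough sanity check (assuming the work is spread evenly), 1 trillion edges over 200 machines is about 5 billion edges per machine per iteration; at under 3 minutes per iteration, that is on the order of 30 million edge traversals per second per machine.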
Operational experience
- Running Giraph in production at Facebook for over two years
- Scheduling
  - Checkpointing disabled
  - Errors handled by restarting
- Graph preparation
  - Hive tables
  - HiveIO
Production application workflow
- Write the application and unit test it
- Run the application on a test dataset
- Run the application at scale
- Deploy to production
Thank You!