Graph Algorithms Using Map-Reduce
By Team-6
Introduction: Breadth First Search
• Breadth-first search (BFS) is a general technique for traversing a graph.
• BFS on a graph with n vertices and m edges takes O(n + m) time.
• Algorithm (a single-machine sketch is given below):
  – Input: a simple connected directed graph with n vertices and the node to be searched for.
  – Output: if the node is found, "Yes" is printed and the corresponding path is
    displayed; otherwise "No" is printed.
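For reference, a minimal single-machine BFS sketch in plain Java; the node-id type and the adjacency-list representation are illustrative, not part of the team's Hadoop code:

import java.util.*;

public class SimpleBfs {

    // Returns the path from source to target as a list of node ids,
    // or null if the target is not reachable.
    static List<Integer> bfs(Map<Integer, List<Integer>> adj, int source, int target) {
        Map<Integer, Integer> parent = new HashMap<>(); // discovered nodes and their predecessors
        Deque<Integer> queue = new ArrayDeque<>();
        parent.put(source, source);
        queue.add(source);

        while (!queue.isEmpty()) {
            int u = queue.poll();
            if (u == target) {
                // Walk parents back to the source to recover the path.
                LinkedList<Integer> path = new LinkedList<>();
                for (int v = target; v != source; v = parent.get(v)) path.addFirst(v);
                path.addFirst(source);
                return path;                       // print "Yes" and this path
            }
            for (int v : adj.getOrDefault(u, Collections.emptyList())) {
                if (!parent.containsKey(v)) {      // each node is enqueued once: O(n + m) overall
                    parent.put(v, u);
                    queue.add(v);
                }
            }
        }
        return null;                               // target not found: print "No"
    }
}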
BFS Using Map-Reduce Framework (Hadoop Implementation):
• The graph is represented as an adjacency list.
• Key: Node ID
• Value: EDGES|DISTANCE_FROM_SOURCE|COLOR|
• where EDGES is a comma-delimited list of the ids of the nodes connected to this
  node. In the beginning we do not know the distance and use Integer.MAX_VALUE to
  mark it "unknown". COLOR tells us whether or not we have seen the node before,
  so it starts off as WHITE. (A parsing helper is sketched after the example below.)
  – E.g.:
      Key   Value
      1     2,5|0|GRAY|
      2     1,3,4,5|Integer.MAX_VALUE|WHITE|
      3     2,4|Integer.MAX_VALUE|WHITE|
      4     2,3,5|Integer.MAX_VALUE|WHITE|
      5     1,2,4|Integer.MAX_VALUE|WHITE|
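For illustration, a small helper that parses and re-serializes this EDGES|DISTANCE_FROM_SOURCE|COLOR| value format could look like the sketch below; the class and field names are our own, not the team's actual code:

import java.util.*;

// Illustrative holder for one adjacency-list record: EDGES|DISTANCE_FROM_SOURCE|COLOR|
class BfsNode {
    List<String> edges = new ArrayList<>();
    int distance = Integer.MAX_VALUE;   // "unknown" until the node is reached
    String color = "WHITE";             // WHITE = unseen, GRAY = frontier, BLACK = done

    static BfsNode parse(String value) {
        BfsNode node = new BfsNode();
        String[] parts = value.split("\\|");
        if (!parts[0].equals("NULL") && !parts[0].isEmpty()) {
            node.edges = new ArrayList<>(Arrays.asList(parts[0].split(",")));
        }
        node.distance = parts[1].equals("Integer.MAX_VALUE")
                ? Integer.MAX_VALUE : Integer.parseInt(parts[1]);
        node.color = parts[2];
        return node;
    }

    @Override
    public String toString() {
        String e = edges.isEmpty() ? "NULL" : String.join(",", edges);
        String d = distance == Integer.MAX_VALUE ? "Integer.MAX_VALUE" : String.valueOf(distance);
        return e + "|" + d + "|" + color + "|";
    }
}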
BFS Continued… (2)
• Map Function (sketched in code after the note below):
  – For each GRAY node, the mapper emits a new GRAY node for every edge, with
    distance = distance + 1. It then also emits the input GRAY node, but colored
    BLACK (once a node has been exploded, we are done with it). Mappers also emit
    all non-GRAY nodes unchanged. So the output of the first map iteration would be:
      1   2,5|0|BLACK|
      2   NULL|1|GRAY|
      5   NULL|1|GRAY|
      2   1,3,4,5|Integer.MAX_VALUE|WHITE|
      3   2,4|Integer.MAX_VALUE|WHITE|
      4   2,3,5|Integer.MAX_VALUE|WHITE|
      5   1,2,4|Integer.MAX_VALUE|WHITE|
  Note: When the mappers "explode" the GRAY nodes and create a new node for each
  edge, they do not know what to write for the edges of this new node, so they
  leave the edge list blank (NULL).
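A minimal Hadoop mapper sketch following this logic, assuming the BfsNode helper shown earlier and a tab-separated <nodeId, value> input line; the class name and exact line format are assumptions:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Sketch of the map step: explode GRAY nodes, pass everything else through.
public class BfsMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // Input line: "<nodeId>\t<EDGES|DISTANCE|COLOR|>"
        String[] kv = line.toString().split("\t");
        String nodeId = kv[0];
        BfsNode node = BfsNode.parse(kv[1]);

        if (node.color.equals("GRAY")) {
            // Emit a GRAY placeholder for every neighbour, one hop further away.
            for (String neighbour : node.edges) {
                context.write(new Text(neighbour),
                              new Text("NULL|" + (node.distance + 1) + "|GRAY|"));
            }
            node.color = "BLACK";   // this node is now fully explored
        }
        // Emit the node itself (unchanged, or re-colored BLACK if it was GRAY).
        context.write(new Text(nodeId), new Text(node.toString()));
    }
}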
BFS Continued… (3)
• Reduce Function (sketched in code below):
  – Each reducer receives all the data for a given key; here that means it
    receives the data for all "copies" of that node.
  – For example, the reducer that receives the data for key = 2 gets the
    following list of values:
      2   NULL|1|GRAY|
      2   1,3,4,5|Integer.MAX_VALUE|WHITE|
  – The reducer's job is to take all this data and construct a new node using:
    • the non-null list of edges
    • the minimum distance
    • the darkest color
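A matching reducer sketch, again assuming the BfsNode helper and the color ordering WHITE < GRAY < BLACK:

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sketch of the reduce step: merge all copies of a node into one record.
public class BfsReducer extends Reducer<Text, Text, Text, Text> {

    private static int rank(String color) {            // WHITE < GRAY < BLACK
        return color.equals("BLACK") ? 2 : color.equals("GRAY") ? 1 : 0;
    }

    @Override
    protected void reduce(Text nodeId, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        BfsNode merged = new BfsNode();
        for (Text value : values) {
            BfsNode copy = BfsNode.parse(value.toString());
            if (!copy.edges.isEmpty()) merged.edges = copy.edges;               // non-null edge list
            merged.distance = Math.min(merged.distance, copy.distance);         // minimum distance
            if (rank(copy.color) > rank(merged.color)) merged.color = copy.color; // darkest color
        }
        context.write(nodeId, new Text(merged.toString()));
    }
}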
BFS Continued… (4)
• Iterations (a driver sketch that chains them follows below)
  – Using this logic, the output from our first iteration will be:
      1   2,5,|0|BLACK
      2   1,3,4,5,|1|GRAY
      3   2,4,|Integer.MAX_VALUE|WHITE
      4   2,3,5,|Integer.MAX_VALUE|WHITE
      5   1,2,4,|1|GRAY
  – The second iteration uses this as its input and outputs:
      1   2,5,|0|BLACK
      2   1,3,4,5,|1|BLACK
      3   2,4,|2|GRAY
      4   2,3,5,|2|GRAY
      5   1,2,4,|1|BLACK
  – And the third iteration outputs:
      1   2,5,|0|BLACK
      2   1,3,4,5,|1|BLACK
      3   2,4,|2|BLACK
      4   2,3,5,|2|BLACK
      5   1,2,4,|1|BLACK
  – Subsequent iterations will continue to produce the same output.
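A driver that chains these iterations as successive Map-Reduce jobs could look roughly like the sketch below; the HDFS paths, iteration cap, and job wiring are illustrative, not the team's actual configuration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BfsDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String input = "bfs/iter0";        // illustrative HDFS path holding the initial graph
        int maxIterations = 10;            // upper bound; see the termination discussion below

        for (int i = 1; i <= maxIterations; i++) {
            String output = "bfs/iter" + i;
            Job job = Job.getInstance(conf, "bfs-iteration-" + i);
            job.setJarByClass(BfsDriver.class);
            job.setMapperClass(BfsMapper.class);
            job.setReducerClass(BfsReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path(input));
            FileOutputFormat.setOutputPath(job, new Path(output));
            if (!job.waitForCompletion(true)) System.exit(1);
            input = output;                // this iteration's output feeds the next one
        }
    }
}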
Issues Addressed:
• When should we terminate the search?
  – Case 1: when all the vertices have been visited, i.e. colored BLACK, without
    finding the target (here the output is "No").
  – Case 2: when a mapper finds the destined node colored GRAY, i.e. when a mapper
    visits the destined node for the first time (here the output is "Yes").
• In both cases the crux is the use of shared variables between the mapper/reducer
  tasks and the main (driver) program.
Issues Addressed (contd.):
• Shared variables are not supported in Hadoop. We addressed this issue by
  serializing the mapper's objects to HDFS and deserializing those objects in the
  main program (one possible realization is sketched below).
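As a rough illustration of this idea (a simplified flag-file variant, not the team's exact serialization code), the mapper could write a small marker to HDFS when the target is found, and the driver could check for it between iterations:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative termination check: the mapper writes a flag file to HDFS when it
// first colors the target node GRAY; the driver polls for it between iterations.
public class BfsTermination {
    static final Path FOUND_FLAG = new Path("bfs/target-found");   // hypothetical path

    // Called from the mapper (using context.getConfiguration()) when the
    // destined node turns GRAY.
    static void markFound(Configuration conf) throws Exception {
        FileSystem fs = FileSystem.get(conf);
        if (!fs.exists(FOUND_FLAG)) fs.create(FOUND_FLAG).close();
    }

    // Called from the driver after each iteration: stop when found ("Yes"), or
    // when no GRAY nodes remain and the flag was never written ("No").
    static boolean targetFound(Configuration conf) throws Exception {
        return FileSystem.get(conf).exists(FOUND_FLAG);
    }
}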
Further Enhancement:
– Implementing BFS for disconnected graphs.
Depth First Search
• The DFS algorithm traverses the graph by starting at a root (some node selected
  as the root) and exploring as far as possible along each branch before
  backtracking.
Shortest Path
• BFS is guaranteed to find the shortest path to the destined node if it exists in
  the graph.
• Idea (a path-tracking sketch follows below):
  – Check whether the destined node exists in the graph by doing a search (either
    BFS or DFS).
  – If it exists, the path is printed. (The path is saved while doing the search;
    it is printed if the target node is found, otherwise the saved path is
    discarded.)
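One possible way to save the path during the search is to carry it inside the value itself; the extra PATH field and the helper below are assumptions for illustration, not part of the format shown earlier:

// Illustrative extension of the value format with an extra PATH field:
//   EDGES|DISTANCE|COLOR|PATH   where PATH is a "->"-separated list of node ids.
public class PathTracking {
    // When a GRAY node is exploded, each emitted neighbour inherits the
    // parent's path plus the parent's own id.
    static String childValue(String parentId, String parentPath, int parentDistance) {
        String path = parentPath.isEmpty() ? parentId : parentPath + "->" + parentId;
        return "NULL|" + (parentDistance + 1) + "|GRAY|" + path;
    }
    // When the target node first turns GRAY, its PATH field (plus the target's
    // own id) is the shortest path, since BFS reaches nodes in distance order.
}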
Any Queries??
Thank You