91.404 : Depth First Search

GRAPH TRAVERSALS
• Depth-First Search
• Breadth-First Search
Exploring a Labyrinth Without Getting Lost
• A depth-first search (DFS) in an undirected graph G is like
wandering in a labyrinth with a string and a can of red paint
without getting lost.
• We start at vertex s, tying the end of our string to s and
painting s “visited”. Next we label s as our current vertex,
called u.
• Now we travel along an arbitrary edge (u,v).
• If edge (u,v) leads us to an already visited vertex v,
we return to u.
• If vertex v is unvisited, we unroll our string and move
to v, paint v “visited”, set v as our current
vertex, and repeat the previous steps.
• Eventually, we will get to a point where all incident
edges on u lead to visited vertices. We then
backtrack by rolling up our string to a previously
visited vertex v. Then v becomes our current
vertex and we repeat the previous steps.
• Then, if all incident edges on v lead to visited
vertices, we backtrack as before. We continue to
backtrack along the path we have traveled,
finding and exploring unexplored edges, and
repeating the procedure.
• When we backtrack to vertex s and there are no more
unexplored edges incident on s, we have finished our
DFS search.
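The labyrinth walk above can be sketched in a few lines. This is a minimal illustration, not code from the course: the names `labyrinth_dfs`, `painted`, `string` and `next_edge` are invented here, and the graph is assumed to be a dict mapping each vertex to a list of its neighbors.

```python
def labyrinth_dfs(adj, s):
    """Walk the labyrinth from s; return vertices in the order they are painted."""
    painted = {s}                      # vertices painted "visited"
    string = [s]                       # the unrolled string: path from s to the current vertex
    order = [s]
    next_edge = {v: 0 for v in adj}    # index of the next untried edge at each vertex
    while string:
        u = string[-1]                 # current vertex
        if next_edge[u] < len(adj[u]):
            v = adj[u][next_edge[u]]   # travel along an untried edge (u, v)
            next_edge[u] += 1
            if v not in painted:       # unvisited: unroll the string and move to v
                painted.add(v)
                string.append(v)
                order.append(v)
            # otherwise v is already painted and we simply stay at u
        else:
            string.pop()               # all edges at u tried: roll the string back up
    return order


# Hypothetical four-vertex labyrinth.
maze = {"s": ["a", "b"], "a": ["s", "b", "c"], "b": ["s", "a"], "c": ["a"]}
print(labyrinth_dfs(maze, "s"))
```

The search ends exactly when the string is rolled all the way back past s, which matches the stopping condition in the last bullet.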
Graph Representations
Adjacency matrix : each edge uses 2 bits in the representation
( 1 for the ij entry , 1 for the ji entry ).
Adjacency list : linked lists; the order of the vertices within each
list affects the order of searches.
If G is dense ( i.e. |E| ~ |V|^2 ), the matrix is acceptable, since about
|V|^2 operations are required just to read in the edges.
If G is sparse, initializing the matrix is expensive, so use the list model.
[Figure: adjacency-list representation of the example graph on vertices A–M]
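As a small illustration of the two representations, the sketch below builds both from the same edge list. The function name `build_representations` and the numbering of vertices as 0..n-1 are assumptions made for this example.

```python
def build_representations(n, edges):
    """Return (matrix, lists) for an undirected graph on vertices 0..n-1."""
    matrix = [[0] * n for _ in range(n)]   # costs |V|^2 just to initialize
    lists = [[] for _ in range(n)]         # one list per vertex
    for i, j in edges:
        matrix[i][j] = 1                   # the ij entry
        matrix[j][i] = 1                   # the ji entry
        lists[i].append(j)                 # list order follows input order
        lists[j].append(i)
    return matrix, lists


edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
matrix, lists = build_representations(4, edges)
print(lists[2])                            # neighbors of vertex 2, in input order
```

Feeding the same edges in a different order leaves the matrix unchanged but reorders the lists, which is why the list representation influences the order in which a search visits vertices.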
DFS :
• visit every node and check every edge in the graph
• systematically move as far from the root node as quickly
as possible
• choose a "nearby" node only when a dead end is reached
Textbook Approach :
Divide nodes into 3 colors :
• White - nodes not yet seen,
• Grey - nodes partially visited,
• Black - nodes with visit completed
[Figure: adjacency matrix and DFS search forest for an example graph on vertices 1–7]
Theorem : DFS of a graph represented with adjacency lists requires
time proportional to V + E, i.e. O ( V + E ).
Pf. Each vertex is visited once, and each edge is examined
twice ( once from each endpoint ).
Theorem : DFS for a graph with an adjacency matrix
representation requires time proportional to V^2.
Pf. DFS corresponds to scanning rows of the adjacency matrix to
locate a 1 and then jumping to the appropriate row to continue
the scan. For example, jump from F in row 1 to row F and scan
to A, then D.
[Figure: 13 × 13 adjacency matrix of the example graph on vertices A–M]
If we think of DFS as using a stack ( almost OK ), then for this graph
we get the following sequence of stack contents:
[Figure: stack contents at each step ( START, 1, 2, 3, … ) of DFS on the example graph]
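To make the two running-time theorems concrete, the sketch below counts how many adjacency entries each representation forces DFS to look at. It is an illustration under assumptions: the function names are invented here, vertices are numbered 0..n-1, and the test graph is a simple path rather than the example graph from the slides.

```python
def dfs_list_cost(lists, s):
    """DFS over adjacency lists; return the number of list entries examined."""
    visited = set()
    examined = 0

    def visit(u):
        nonlocal examined
        visited.add(u)
        for v in lists[u]:
            examined += 1              # each edge shows up once in each endpoint's list
            if v not in visited:
                visit(v)

    visit(s)
    return examined


def dfs_matrix_cost(matrix, s):
    """DFS over an adjacency matrix; return the number of entries scanned."""
    n = len(matrix)
    visited = [False] * n
    scanned = 0

    def visit(i):
        nonlocal scanned
        visited[i] = True
        for j in range(n):             # scan all of row i looking for a 1
            scanned += 1
            if matrix[i][j] and not visited[j]:
                visit(j)               # jump to row j and continue the scan

    visit(s)
    return scanned


# Path 0-1-2-3-4: V = 5 vertices, E = 4 edges.
n, edges = 5, [(0, 1), (1, 2), (2, 3), (3, 4)]
lists = [[] for _ in range(n)]
matrix = [[0] * n for _ in range(n)]
for i, j in edges:
    lists[i].append(j)
    lists[j].append(i)
    matrix[i][j] = matrix[j][i] = 1

print(dfs_list_cost(lists, 0), dfs_matrix_cost(matrix, 0))
```

On a connected graph the list version examines 2E entries while the matrix version scans all V^2 entries, regardless of how few edges there are.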
DFS on a graph is a generalization of tree traversal;
when applied to a tree, it generates a tree traversal.
i.) edges AF, AC, AB, AG discovered, then F, C, B, G pushed onto
the stack
ii.) G is popped ( it is last on the adjacency list for A ); GA, GE
traversed
iii.) E pushed onto the stack, A not pushed again
iv.) ED, EG, EF traversed, D pushed onto the stack, …
( DFS is done by moving rediscovered nodes to the top of the stack,
which requires a more powerful data structure than a stack )
If we use a queue rather than a stack, we implement a Breadth-
First Search.
Both DFS and BFS divide vertices into 3 groups :
– tree vertices - ones visited and removed from the data structure
– fringe vertices - vertices adjacent to tree vertices but not yet
visited
– unseen vertices - those not yet encountered
If each tree vertex is taken together with the edge that caused it
to be discovered ( added to the data structure ), then
these edges form a tree.
Graph Terms & Definitions
• A graph is connected if every pair of vertices is
connected by a path ( in the graph ).
• The connected components of a graph are the
equivalence classes of vertices defined by the “is
reachable from” relation.
[Figure: example graph on vertices A–G]
To search a connected component of a graph :
• begin with a vertex on the fringe, all others unseen, and
perform the following until all vertices have been visited:
– move 1 vertex ( x ) from the fringe to the tree and put any yet
unseen vertices adjacent to x on the fringe
– choose the vertex from the fringe that was most recently encountered
( stack )
DFS
– explores the graph looking for vertices furthest from the
root
– chooses a closer vertex only when it reaches a dead end
– stores vertices where paths branch in the stack
– stack ⇔ recursive implementation
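The tree / fringe / unseen scheme can be sketched with a single routine that behaves as DFS or BFS depending on which end of the fringe is served. This is a minimal illustration: the name `fringe_search`, the `depth_first` flag, and the adjacency order of the seven-vertex graph are assumptions for the example. For DFS, a fringe vertex that is rediscovered is moved to the top, which is the step a plain stack cannot do.

```python
from collections import deque


def fringe_search(adj, start, depth_first=True):
    """Return the order in which vertices move from the fringe to the tree."""
    state = {v: "unseen" for v in adj}
    fringe = deque([start])
    state[start] = "fringe"
    order = []
    while fringe:
        # Stack discipline for DFS, queue discipline for BFS.
        x = fringe.pop() if depth_first else fringe.popleft()
        state[x] = "tree"
        order.append(x)
        for y in adj[x]:
            if state[y] == "unseen":
                state[y] = "fringe"
                fringe.append(y)
            elif depth_first and state[y] == "fringe":
                fringe.remove(y)       # rediscovered: move it to the top
                fringe.append(y)
    return order


# One connected component on A..G; the order within each list is assumed.
adj = {
    "A": ["F", "C", "B", "G"],
    "B": ["A"],
    "C": ["A"],
    "D": ["F", "E"],
    "E": ["D", "G", "F"],
    "F": ["A", "E", "D"],
    "G": ["A", "E"],
}
print(fringe_search(adj, "A", depth_first=True))
print(fringe_search(adj, "A", depth_first=False))
```

`deque.remove` takes linear time, so this sketch favors clarity over the O(V + E) bound; a recursive implementation avoids the issue entirely.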
Depth-First Search
Algorithm DFS(v);
  Input: A vertex v in a graph
  Output: A labeling of the edges as “discovery” edges
          and “back edges”
  for each edge e incident on v do
    if edge e is unexplored then
      let w be the other endpoint of e
      if vertex w is unexplored then
        label e as a discovery edge
        recursively call DFS(w)
      else
        label e as a back edge
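A runnable counterpart of the algorithm above might look as follows. It is a sketch under assumptions: the graph is undirected and simple, it is stored as a dict of neighbor lists, and an edge is identified by the `frozenset` of its two endpoints; the name `label_edges` is invented for this example.

```python
def label_edges(adj, s):
    """Label every edge reachable from s as 'discovery' or 'back'."""
    explored_vertices = set()
    label = {}                          # frozenset({v, w}) -> label

    def dfs(v):
        explored_vertices.add(v)
        for w in adj[v]:
            e = frozenset((v, w))
            if e in label:              # edge already explored from the other side
                continue
            if w not in explored_vertices:
                label[e] = "discovery"
                dfs(w)
            else:
                label[e] = "back"

    dfs(s)
    return label


# Triangle 1-2-3 with a pendant vertex 4.
adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
for edge, kind in label_edges(adj, 1).items():
    print(sorted(edge), kind)
```

In this example three of the four edges are discovery edges, one fewer than the number of vertices, as expected of a spanning tree.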
Depth-First Search(cont.)
Proposition : Let G be an undirected graph on which a DFS
traversal starting at a vertex s has been performed. Then:
1) The traversal visits all vertices in the connected component of s
2) The discovery edges form a spanning tree of the connected
component of s
Justification of 1):
- Use a contradiction argument: suppose there is at least one vertex
v not visited and let w be the first unvisited vertex on some path
from s to v.
- Because w was the first unvisited vertex on the path, there is a
neighbor u that has been visited.
- But when we visited u we must have looked at edge (u, w).
Therefore w must have been visited.
Justification of 2):
- We only mark an edge as a discovery edge when it leads to an
unvisited vertex. So we never form a cycle of discovery edges,
i.e. discovery edges form a tree.
- This is a spanning tree because DFS visits each vertex in the
connected component of s
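Part 1) of the proposition is what makes DFS usable for finding connected components: start a new DFS from every vertex that no earlier search reached. The sketch below assumes an undirected graph stored as a dict of neighbor lists; `connected_components` is a name chosen for this example.

```python
def connected_components(adj):
    """Return (component, count): a component index per vertex and the total."""
    component = {}

    def dfs(v, index):
        component[v] = index
        for w in adj[v]:
            if w not in component:
                dfs(w, index)

    count = 0
    for s in adj:
        if s not in component:          # s was not reached by any earlier DFS
            dfs(s, count)
            count += 1
    return component, count


# Hypothetical graph with three components.
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B"], "H": ["I"], "I": ["H"], "J": []}
component, count = connected_components(adj)
print(count, component)
```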
Running Time Analysis
Remember:
- DFS is called on each vertex exactly once.
- Every edge is examined exactly twice, once from each of its
endpoints
For n_s vertices and m_s edges in the connected
component of the vertex s, a DFS starting at s runs
in O(n_s + m_s) time if:
- The graph is represented in a data structure, like the
adjacency list, where vertex and edge methods take
constant time
- Marking a vertex as explored and testing whether a vertex
has been explored take O(1) time
- We have a way of systematically considering the edges
incident on the current vertex so that we do not examine the
same edge more than once from the same vertex.
Marking Vertices
• Look at ways to mark vertices in a way that
satisfies the above conditions.
• Extend vertex positions to store a variable for
marking
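One way to get O(1) marking and testing is to store the mark in the vertex object itself. The sketch below is only one possible reading of "extend vertex positions": the `Vertex` class, its `explored` field and the helper `connect` are hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)                    # identity comparison; vertices refer to each other
class Vertex:
    name: str
    neighbors: list = field(default_factory=list, repr=False)
    explored: bool = False              # the extra variable used for marking


def connect(a, b):
    """Add an undirected edge between vertices a and b."""
    a.neighbors.append(b)
    b.neighbors.append(a)


def dfs(v, order):
    v.explored = True                   # O(1) mark
    order.append(v.name)
    for w in v.neighbors:
        if not w.explored:              # O(1) test
            dfs(w, order)


a, b, c = Vertex("a"), Vertex("b"), Vertex("c")
connect(a, b)
connect(b, c)
connect(a, c)
order = []
dfs(a, order)
print(order)
```

An alternative with the same cost is an external set or boolean array indexed by vertex, which leaves the vertex objects untouched and makes it easy to reset the marks between searches.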
Depth-First Search
• The strategy followed by depth-first search is, as its name
implies, to search "deeper" in the graph whenever possible:
– edges are explored out of the most recently discovered vertex v that
still has unexplored edges leaving it.
– When all of v's edges have been explored, the search "backtracks"
to explore edges leaving the vertex from which v was discovered.
– This process continues until we have discovered all the vertices that
are reachable from the original source vertex.
– If any undiscovered vertices remain, then one of them is selected as
a new source and the search is repeated from that source.
– This entire process is repeated until all vertices are discovered.
• Whenever a vertex v is discovered during a scan of the adjacency
list of an already discovered vertex u, depth-first search records this
event by setting v's predecessor field p[v] to u.
• The predecessor subgraph produced by a depth-first search
may be composed of several trees, because the search may be
repeated from multiple sources.
• The predecessor subgraph of a depth-first search is defined as
follows: we let Gp = (V, Ep), where
Ep = {(p[v], v) : v ∈ V and p[v] ≠ NIL} .
• The edges in Ep are called tree edges.
• Vertices are colored during the search to indicate their state. Each vertex is
initially white, is grayed when it is discovered in the search, and is
blackened when it is finished, that is, when its adjacency list has been
examined completely.
• This technique guarantees that each vertex ends up in exactly one
depth-first tree, so that these trees are disjoint.
• Besides creating a depth-first forest, depth-first search
also timestamps each vertex.
• Each vertex v has two timestamps:
– the first timestamp d[v] records when v is first discovered (and
grayed), and
– the second timestamp f[v] records when the search finishes
examining v's adjacency list (and blackens v).
• These timestamps are used in many graph algorithms
and are generally helpful in reasoning about the behavior
of depth-first search.
• These timestamps are integers between 1 and 2 |V|,
since there is one discovery event and one finishing
event for each of the |V| vertices.
• For every vertex u, d[u] < f[u] .
– Vertex u is WHITE before time d[u],
– GRAY between time d[u] and time f[u], and
– BLACK thereafter.
• The basic depth-first-search algorithm consists of the procedures
DFS and DFS-VISIT. The input graph G may be undirected or
directed.
Depth First Search Algorithm
Example of Depth First Search
• Recursive marking algorithm
• Initially every vertex is unmarked
DFS(i: vertex)
  mark i;
  for each j adjacent to i do
    if j is unmarked then DFS(j)
end{DFS}
[Figure: example graph on vertices 1–7, redrawn at each step next to the calls made so far]
Example Step 1: DFS(1)
Example Step 2: DFS(1), DFS(2)
Example Step 3: DFS(1), DFS(2), DFS(7)
Example Step 4: DFS(1), DFS(2), DFS(7), DFS(5)
Example Step 5: DFS(1), DFS(2), DFS(7), DFS(5), DFS(4)
Example Step 6: DFS(1), DFS(2), DFS(7), DFS(5), DFS(4), DFS(3)
Example Step 7: DFS(1), DFS(2), DFS(7), DFS(5), DFS(4), DFS(3), DFS(6)
Note that the edges traversed in the depth first
search form a spanning tree.
Procedure DFS works as follows:
• Lines 1–3 paint all vertices white and initialize their p fields to NIL. Line 4
resets the global time counter.
• Lines 5–7 check each vertex in V in turn and, when a white vertex is
found, visit it using DFS-VISIT. Every time DFS-VISIT(u) is called in line
7, vertex u becomes the root of a new tree in the depth-first forest. When
DFS returns, every vertex u has been assigned a discovery time d[u] and
a finishing time f[u].
• In each call DFS-VISIT(u), vertex u is initially white. Line 1 paints u gray,
and line 2 records the discovery time d[u] by incrementing and saving the
global variable time.
• Lines 3–6 examine each vertex v adjacent to u and recursively visit v if it is
white. As each vertex v ∈ Adj[u] is considered in line 3, we say that edge
(u, v) is explored by the depth-first search.
• Finally, after every edge leaving u has been explored, lines 7–8 paint u
black and record the finishing time in f[u].
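The description above translates fairly directly into Python. The following is a sketch reconstructed from that description, not the original pseudocode: the comments indicate which of the described line ranges each statement is meant to correspond to, and the six-vertex directed graph is a small example chosen for illustration.

```python
def dfs(adj):
    """Return (d, f, p): discovery times, finishing times, predecessors."""
    color = {u: "WHITE" for u in adj}   # DFS lines 1-3: paint white,
    p = {u: None for u in adj}          #   predecessors set to NIL (None)
    d, f = {}, {}
    time = 0                            # DFS line 4: reset the time counter

    def dfs_visit(u):
        nonlocal time
        color[u] = "GRAY"               # DFS-VISIT line 1
        time += 1                       # DFS-VISIT line 2: increment and save
        d[u] = time
        for v in adj[u]:                # DFS-VISIT lines 3-6: explore (u, v)
            if color[v] == "WHITE":
                p[v] = u
                dfs_visit(v)
        color[u] = "BLACK"              # DFS-VISIT lines 7-8
        time += 1
        f[u] = time

    for u in adj:                       # DFS lines 5-7: each white vertex
        if color[u] == "WHITE":         #   becomes the root of a new tree
            dfs_visit(u)
    return d, f, p


graph = {
    "u": ["v", "x"],
    "v": ["y"],
    "w": ["y", "z"],
    "x": ["v"],
    "y": ["x"],
    "z": ["z"],
}
d, f, p = dfs(graph)
for name in graph:
    print(name, d[name], f[name], p[name])
```

With this vertex order the search produces two trees, rooted at u and at w, and the timestamps run from 1 to 2|V| = 12.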
Properties of depth-first search
• Depth-first search yields much information about the structure of a
graph. Perhaps the most basic property of depth-first search is that the
predecessor subgraph Gp does indeed form a forest of trees, since the
structure of the depth-first trees exactly mirrors the structure of recursive
calls of DFS-VISIT.
– u = p[v] if and only if DFS-VISIT(v) was called during a search
of u's adjacency list.
What is the running time of DFS?
• The loops on lines 1-3 and lines 5-7 of DFS take time Θ(V),
exclusive of the time to execute the calls to DFS-VISIT.
• The procedure DFS-VISIT is called exactly once for each vertex
v ∈ V, since DFS-VISIT is invoked only on white vertices and the first
thing it does is paint the vertex gray.
• During an execution of DFS-VISIT(v), the loop on lines 3-6 is
executed |Adj[v]| times.
• Since the sum of |Adj[v]| over all v ∈ V is Θ(E), the total cost of
executing lines 3-6 of DFS-VISIT is Θ(E). The
running time of DFS is therefore Θ(V + E).
• Another important property of depth-first search is that discovery and
finishing times have parenthesis structure.
– If we represent the discovery of vertex u with a left parenthesis "(u" and
represent its finishing by a right parenthesis "u)",
– then the history of discoveries and finishings makes a well-formed
expression in the sense that the parentheses are properly nested.
Theorem 22.7
In any depth-first search of a (directed or undirected) graph G = (V, E), for
any two vertices u and v, exactly one of the following three conditions
holds:
– the intervals [d[u], f[u]] and [d[v], f[v]] are entirely disjoint, and
neither u nor v is a descendant of the other in the depth-first forest,
– the interval [d[u], f[u]] is contained entirely within the interval [d[v], f[v]], and
u is a descendant of v in the depth-first tree, or
– the interval [d[v], f[v]] is contained entirely within the interval [d[u], f[u]], and
v is a descendant of u in the depth-first tree.
Corollary 22.8
Vertex v is a proper descendant of vertex u in the depth-first forest for a
(directed or undirected) graph G if and only if d[u] < d[v] < f[v] < f[u].
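The parenthesis structure is easy to observe directly. The sketch below writes "(u" at each discovery and "u)" at each finish and then checks that the result is properly nested. The function names and the small directed graph are assumptions made for this illustration.

```python
def parenthesize(adj):
    """Return the history of discoveries and finishings as one string."""
    seen = set()
    out = []

    def visit(u):
        seen.add(u)
        out.append("(" + u)             # discovery of u
        for v in adj[u]:
            if v not in seen:
                visit(v)
        out.append(u + ")")             # finishing of u

    for u in adj:
        if u not in seen:
            visit(u)
    return " ".join(out)


def well_nested(expr):
    """Check that every 'x)' closes the most recently opened '(x'."""
    stack = []
    for token in expr.split():
        if token.startswith("("):
            stack.append(token[1:])
        elif not stack or stack.pop() != token[:-1]:
            return False
    return not stack


graph = {
    "u": ["v", "x"],
    "v": ["y"],
    "w": ["y", "z"],
    "x": ["v"],
    "y": ["x"],
    "z": ["z"],
}
expr = parenthesize(graph)
print(expr)
print(well_nested(expr))
```

A vertex v sits inside the parentheses of u exactly when v is a descendant of u, which is the content of the corollary.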
Theorem 22.9
In a depth-first forest of a (directed or undirected) graph G = (V, E),
vertex v is a descendant of vertex u if and only if at the time d[u] that the
search discovers u, vertex v can be reached from u along a path
consisting entirely of white vertices.
Classification of Edges
Another interesting property of depth-first search is that the search can
be used to classify the edges of the input graph G = (V, E). This edge
classification can be used to glean important information about a
graph. For example, a directed graph is acyclic if and only if a
depth-first search yields no "back" edges (Lemma 22.11).
We can define four edge types in terms of the depth-first forest Gp produced by a
depth-first search on G.
1. Tree edges are edges in the depth-first forest Gp. Edge (u, v) is a tree edge
if v was first discovered by exploring edge (u, v).
2. Back edges are those edges (u, v) connecting a vertex u to an ancestor v in
a depth-first tree. Self-loops are considered to be back edges.
3. Forward edges are those non-tree edges (u, v) connecting a vertex u to a
descendant v in a depth-first tree.
4. Cross edges are all other edges. They can go between vertices in the same
depth-first tree, as long as one vertex is not an ancestor of the other, or they
can go between vertices in different depth-first trees.
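The four edge types can be assigned during the search itself by looking at the color of v when edge (u, v) is first explored: white gives a tree edge, gray a back edge, and black either a forward or a cross edge depending on which endpoint was discovered first. The sketch below assumes a directed graph as a dict of successor lists; `classify_edges` and the example graph are illustrative choices.

```python
def classify_edges(adj):
    """Return a dict mapping each directed edge (u, v) to its type."""
    color = {u: "WHITE" for u in adj}
    d = {}                              # discovery times
    kind = {}
    time = 0

    def visit(u):
        nonlocal time
        color[u] = "GRAY"
        time += 1
        d[u] = time
        for v in adj[u]:
            if color[v] == "WHITE":
                kind[(u, v)] = "tree"
                visit(v)
            elif color[v] == "GRAY":    # v is an ancestor of u (or u itself)
                kind[(u, v)] = "back"
            elif d[u] < d[v]:           # v is black and was discovered after u
                kind[(u, v)] = "forward"
            else:                       # v is black and was discovered before u
                kind[(u, v)] = "cross"
        color[u] = "BLACK"

    for u in adj:
        if color[u] == "WHITE":
            visit(u)
    return kind


graph = {
    "u": ["v", "x"],
    "v": ["y"],
    "w": ["y", "z"],
    "x": ["v"],
    "y": ["x"],
    "z": ["z"],
}
kinds = classify_edges(graph)
for edge, k in kinds.items():
    print(edge, k)
print("acyclic:", "back" not in kinds.values())
```

The final line applies the acyclicity test: this graph has back edges, so it is not a dag.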
Theorem 22.10
In a depth-first search of an undirected graph G, every edge of G is
either a tree edge or a back edge.
Proof : Let (u, v) be an arbitrary edge of G, and suppose without loss
of generality that d[u] < d[v]. Then, v must be discovered and
finished before we finish u, since v is on u's adjacency list. If the
edge (u, v) is explored first in the direction from u to v, then (u, v)
becomes a tree edge. If (u, v) is explored first in the direction from v
to u, then (u, v) is a back edge, since u is still gray at the time the
edge is first explored.
Topological sort
A topological sort of a dag G = (V, E) is a linear ordering of all its
vertices such that if G contains an edge (u, v), then u appears
before v in the ordering. (If the graph is not acyclic, then no linear
ordering is possible.)
• A topological sort of a graph can be viewed as an ordering of its
vertices along a horizontal line so that all directed edges go from left
to right.
• Topological sorting is thus different from the usual kind of "sorting".
• Directed acyclic graphs are used in many applications to indicate
precedence among events.
• Figure 22.7 gives an example that arises when Professor Bumstead
gets dressed in the morning. The professor must don certain
garments before others (e.g., socks before shoes). Other items may
be put on in any order (e.g., socks and pants).
• A directed edge (u,v) in the dag of Figure 22.7(a) indicates that
garment u must be donned before garment v.
• A topological sort of this dag therefore gives an order for getting
dressed.
– Figure 22.7(b) shows the topologically sorted dag as an ordering of
vertices along a horizontal line such that all directed edges go from left
to right.
The following simple algorithm topologically sorts a dag.
TOPOLOGICAL-SORT(G)
1 call DFS(G) to compute finishing times f[v] for each vertex v
2 as each vertex is finished, insert it onto the front of a linked
list
3 return the linked list of vertices
We can perform a topological sort in time Θ(V + E), since depth-first search
takes Θ(V + E) time and it takes O(1) time to insert each of the |V| vertices
onto the front of the linked list.
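A short Python rendering of TOPOLOGICAL-SORT is sketched below. It assumes the input really is a dag (no cycle check is performed), uses `collections.deque` in place of the linked list, and runs on a small made-up set of garments that is not a copy of Figure 22.7.

```python
from collections import deque


def topological_sort(adj):
    """Return the vertices of a dag in topologically sorted order."""
    visited = set()
    order = deque()                     # plays the role of the linked list

    def visit(u):
        visited.add(u)
        for v in adj[u]:
            if v not in visited:
                visit(v)
        order.appendleft(u)             # u is finished: insert at the front, O(1)

    for u in adj:
        if u not in visited:
            visit(u)
    return list(order)


# Hypothetical precedence constraints: an edge (u, v) means u goes on before v.
garments = {
    "socks": ["shoes"],
    "pants": ["shoes", "belt"],
    "shirt": ["belt"],
    "belt": [],
    "shoes": [],
}
print(topological_sort(garments))
```

Because vertices are inserted at the front as they finish, the list ends up in order of decreasing finishing time, which is exactly the property the correctness proof relies on.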
Lemma 22.11
A directed graph G is acyclic if and only if a depth-first search of G yields
no back edges.
Theorem 22.12
TOPOLOGICAL-SORT(G) produces a topological sort of a
directed acyclic graph G.
Proof
Suppose that DFS is run on a given dag G = (V, E) to
determine finishing times for its vertices. It suffices to show that for any
pair of distinct vertices u, v ∈ V, if there is an edge in G from u to v, then
f[v] < f[u]. Consider any edge (u,v) explored by DFS(G). When this edge
is explored, v cannot be gray, since then v would be an ancestor of u and
(u,v) would be a back edge, contradicting Lemma 22.11. Therefore, v
must be either white or black. If v is white, it becomes a descendant of u,
and so f[v] < f[u]. If v is black, it has already been finished while u has
not, so f[v] < f[u] as well. Thus, for any edge
(u,v) in the dag, we have f[v] < f[u], proving the theorem.