Building Graphs

CS 201
Computer Systems Programming
Chapter 14
“Heap Space and Graphs”
Herbert G. Mayer, PSU CS
Status 5/21/2013
1
Syllabus
 Goal
 Heap Space
 Definition of Graph
 Building a Graph
 Graph Data Structure
 Graph Traversal
 References
2
Goal
Improve students’ understanding and use of dynamic
memory management: using the system heap to
allocate and free data space
Sometimes heap use is explained, implementing linked
lists that can grow and shrink dynamically
For a system programmer that is way too simple; so we
shall exercise heap management by building
directed, cyclic graphs
Our graphs may be unconnected and each vertex may
have any number of successor nodes
Undirected edges are interpreted to be bi-directional
3
Heap Space
Static data space of a program exists during whole execution;
created at program load time, returned at program exit() time. In
Unix the .bss section, initialized to 0 by default; can overwrite
Automatic space is data space created by the run-time system.
Each time a new function scope is entered, automatic space is
allocated off the stack (could also be from the heap). Part of the
return operation is to free such previously granted automatic
space. AKA dynamic data space
Controlled space is data space allocated by a specific request to
allocate; e.g. the C function malloc() command. Some
languages (Ada, Java) use a built-in new operator to acquire
heap space. Space can be de-allocated (freed) automatically
(Java) or by the explicit command free(). On a run-time
system requiring a programmer to code the free() action, it is
possible to “forget” to call free, and consequently such memory
space will never be returned until main() exit; creates holes
We shall build graphs using heap space
4
Heap Space
The lifetime of an object in the heap (controlled space) is the time
span from explicit birth (malloc or new) to the moment of
explicit death (free) operation; this is program controlled
Each such heap object is identified by a pointer of suitable type
The lifetime of the pointer is defined by its own storage attribute,
static, automatic, or controlled
Thus it can happen that an object continues to live on the heap –
i.e. consumes heap space– while its identifying pointer has
long ceased to exist
Thus holes in memory are created
If this happens with system SW that runs forever, memory
gradually vanishes and your computer crashes
5
Hole in Heap Space
#include <iostream.h>
typedef int * pi_type;
void hole( void )
{ // hole
pi_type pi = new int;
*pi = 109;
// just to show space exists!
// forgot: free( pi );
return;
// hole is left
} //end hole
int main()
{ // main
hole();
return 0;
} //end main
6
Formal Definition of Graph
Graph: A graph G is a data structure G = { V, E } consisting of a set E of edges
and a set of V vertices, AKA nodes. Any node vi ϵ V may be connected to any
other node vj. Such a connection is called an edge. Edges may be directed, even
bi-directed. Different from a tree, a node in G may have any number of
predecessors –or incident edges; that is THE main difference of graph vs. tree
Empty Graph: For expediency we ignore the possibility of a graph G being empty; in
an empty graph the data structure that points the graph is simply NIL
Connected Graph: If all n > 0 nodes vn in G are connected somehow, the graph G is
called connected, regardless of edge directions
Strongly Connected Component: A subset SG ⊆ G is strongly connected, if for every
node vi in SG, i > 0, any node vi can reach all vi nodes in SG somehow
Directed Acyclic Graph (DAG): A DAG is a graph with directed edges but no cycles. A
node may still have multiple predecessors
When programming graphs, it is convenient to add fields to the node type for auxiliary
functions; e.g. it is possible to process all nodes in a linear fashion by adding a
link field, often called a “finger” or “link”

Possible use: traversing all nodes in G, though G may be unconnected!
7
Building a Graph
8
Graph Data Structure
A graph G( v, e ) consists of nodes v and edges e

Implemented via some node_type data structure
G is identified and thus accessible via one select node,
called entry node, or simply entry, AKA head
Head is of type pointer to node_type
G is not necessarily connected

If parts of G are unconnected, how can they be retrieved in
case of a necessary, complete graph traversal?
Several methods of forcing complete traversal:


Either create a super-node, not part of G proper, in a way that
each unconnected region is pointed at, thus reachable
Or have a linked-list (LL) meandering through each node of
G, without this link field being part of G proper; e.g. “finger”
9
Sample Graph G0
G0
1
R
2
3
Y
G
B
O
5
V
6
10
4
Graph Data Structure
Sample Graph G0 above has 6 nodes
The ID of a node, AKA name of each node is shown next
to the nodes, e.g. 1 2 3 4 5 …
The graph’s node type data structure includes such
name information as part of node_type
In addition, each node in G0 has attributes, such as R, G,
Y etc. in the sample above; such attributes can be
arbitrarily complex consisting of many fields
There may be more attributes belonging to each node,
depending on what the graph is used for
Any of these attributes must also be declared in the
node_type data structure
Successors, if any, of each node must be encoded in the
node somehow; there is no upper limit on the number!
G0 has 3 SCCs; 2 of those are trivial, i.e. not interesting!
11
Graph Data Structure
Since in there is no inherent upper bound on the
number of successor nodes, a suitable way to define
successors is via a linked list of tuples, AKA link
nodes
Thus a possible data type for successor nodes is a
pointer to a link node
Link nodes can be allocated off the heap, as needed;
they are not of type node_type, but of link_type
And each link is a tuple, i.e. consists of just 2 fields:


One field pointing to the next link; the type is pointer to
link_type, in some languages expresses as *link_type
The other field pointing to the successor node; the type is
pointer to node_type
For convenience, the last link inserted is added at the
head of the list, simplifying searches for list end
12
Graph Traversal
Graph G with i nodes may be unconnected, and yet
each unconnected region is part of G
Algorithm may require visiting each node ni ϵ G
Requires additional data structure to visit node ni
Moreover, any full traversal of G must start with some
node of G; which one?
13
Graph Traversal: Fixed Array
Specify fixed-sized array a[] of k>=i node pointers, all
initially null, not necessarily al used, in which each
used element a[i], i < k points to node ni
Say, in increasing order of indices, starting at index 0
As soon as a[i] yields a null pointer, no more nodes
are identified in a[]
Thus G is known to have i nodes
Advantage:
•
Simple to implement
Disadvantage:
•
•
Wasted space almost always, since a[] needs to be large
Too limited, the method fails if i >= k
14
Graph Traversal Fixed Array
m
a
z
b
k
15
Graph Traversal Linked List
Instead of fixed array, use a linked list of link-elements
Each element has 2 fields
Field next points to next link element, if any
Field finger points to its node ni ϵ G
Link element is allocated dynamically off the heap only
when another node ni in G is added
Advantage:
•
•
•
Also simple to implement
Space consumption is directly proportional to the number of
nodes in G; thus no wasted space
Runs out of space only when all memory space is exhausted
Disadvantage:
•
•
Is a separate data structure from graph G
Per link element, i.e. per node, 2 scalar fields are consumed
16
Graph Traversal Added Field
Extend the data structure of the graph G
Keep all fields needed for the graph proper, but add a
finger field, of type pointer to node
Like the other 2 methods, this will require an outside
data structure, of type pointer to node , initially nil, to
point to any of the nodes ni ϵ G
Thus all finger fields form a linked list of graph nodes
Advantage:
•
•
•
•
Also simple to implement
Space consumption is directly proportional to the number of
nodes in G
Runs out of space only when all memory space is exhausted
Requires no additional data structure, just 1 more field
Disadvantage:
•
none
17
Graph Data Structure, Link
//
//
//
//
//
node may have any number of successors
all need to be retrievable.
so each node in G has a link pointer,
pointing to LL of all successor nodes.
Last node connected will be inserted at head
typedef struct link_tp * link_ptr_tp;
typedef struct link_tp
{
link_ptr_tp next_link;
node_ptr_tp next_node;
} str_link_tp;
// forward ref
// point to next link
// point to successor node
#define LINK_SIZE sizeof( str_link_tp )
18
Graph Data Structure, Node
//
//
//
//
"name" is arbitrary number given during creation
"link" is head of LList of successor nodes, while
"finger" is linear link through all nodes
"visited” only true after visit; initially FALSE
typedef struct
. . .
typedef struct
{
link_ptr_tp
node_ptr_tp
int
bool
others ...
} str_node_tp;
node_tp * node_ptr_tp;
node_tp
link;
finger;
name;
visited;
//
//
//
//
//
points to LL of successors
finger through all nodes
name given at creation
to check connectivity
other fields: attributes
#define NODE_SIZE sizeof( str_node_tp )
19
Building a Graph, one Node
//
//
//
//
create a node in graph G, identified by “name”
connect to the global “finger” at head of LList
assumption: no such node “name” exists in graph
assume: global “finger”, initially NULL
node_ptr_tp make_node( int name )
{ // make_node
node_ptr_tp node = (node_ptr_tp) malloc( NODE_SIZE );
// check once non-Null, not repeatedly on user side!
ASSERT( node, "space for node missing" );
node->finger
= finger;
// re-assign glob finger!!
node->link
= NULL;
// pointer type
node->name
= name;
// IDs this node
node->visited = FALSE;
// initially
finger
= node;
// now link to “this”
return node;
} //end make_node
20
Building a Graph from Pairs
// input is list of pairs, each element being a node name
// craft edge from first to second; simpler than HW5!!
// If a node is new: create it; else use ptr = exists()
while( scanf( "%d%d", &a, &b ) ) {
// a, b are ints
if ( ! ( first = exists( a ) ) ) {
// ‘a’ new node?
first = make_node( a );
// allocate ‘a’
} //end if
if ( ! ( second = exists( b ) ) ) { // ‘b’ new node?
second = make_node( b );
// allocate ‘b’
} //end if
// both exist. Either created, or pre-existed: Connect!
if ( new_link( first, second ) ) {
link = make_link( first->link, second );
ASSERT( link, "no space for link node" );
first->link = link;
}else{
// link was there already, no need to add again!
printf( "<><> skip duplicate link %d->%d\n", a, b );
} //end if
} //end while
21
Building a Graph from Pairs
// check, whether link between 2 nodes already exists
// if not, return true: New! Else return false, NOT new!
bool new_link( node_ptr_tp first, node_ptr_tp second )
{ // new_link
int target
= second->name;
link_ptr_tp link = first->link;
while ( link ) {
if ( target == link->next_node->name ) {
return FALSE; // it is an existing link, NOT new
} //end if
// check next node; if any
link = link->next_link;
} //end while
// none of successors equal the second node's name
return TRUE;
// is a new link
} //end new_link
22
References
1. Control Flow Graph, in: Mayer, H. “Parallel
Execution Enabled by Refined Source Analysis:
Cost and Benefits in a Supercompiler”, R.
Oldenbourg Verlag München/Wien, March 1997
2. Graphs in: C. Berge, “Graphs and Hypergraphs”,
North-Holland, Amsterdam 1973
3. SCCs: Robert Tarjan, "Depth-First Search and Linear
Graph Algorithms". SIAM J. Computing, Vol. 1, No. 2,
June 1972
23