ITCS 2214 Exam 3 Study Guide
1. Describe the characteristics of a 2-node in a 2-3 tree. Describe a 3-node.
• A 2-node in a 2-3 tree contains one element and has either no children or two children. If a 2-node has children, then
– Elements of the left sub-tree are less than the element
– Elements of the right sub-tree are greater than or equal to the element
• A 3-node contains two elements, one designated as the smaller and one as the larger. A 3-node has either no children or three children. If a 3-node has children, then
– Elements of the left sub-tree are less than the smaller element
– The smaller element is less than or equal to the elements of the middle sub-tree
– Elements of the middle sub-tree are less than the larger element
– The larger element is less than or equal to the elements of the right sub-tree
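The ordering rules above can be checked mechanically. The following is only a sketch: the `Node` class and the `is_valid` function are hypothetical names, not something from the course materials, and the check covers only the ordering and child-count rules (it does not verify that all leaves are on the same level).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list                                    # one key (2-node) or two keys (3-node)
    children: list = field(default_factory=list)  # empty for a leaf


def is_valid(node, lo=float("-inf"), hi=float("inf")):
    """Check the 2-node / 3-node rules for `node` and everything below it.

    Every key in this subtree must satisfy lo <= key < hi.
    """
    keys = node.keys
    if len(keys) not in (1, 2) or keys != sorted(keys):
        return False
    if not all(lo <= k < hi for k in keys):
        return False
    if not node.children:                        # a leaf: no children at all
        return True
    if len(node.children) != len(keys) + 1:      # 2-node: 2 children, 3-node: 3 children
        return False
    # Child i holds values between the keys on either side of it.
    bounds = [lo] + keys + [hi]
    return all(is_valid(child, bounds[i], bounds[i + 1])
               for i, child in enumerate(node.children))
```

For example, a 3-node holding 10 and 20 with leaf children [5], [10, 15] and [20, 30] passes the check, while a 2-node holding 10 whose left child holds 12 does not.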
2. Describe the four possible cases when inserting a new element into a 2-3 tree. Be able to diagram
the steps when inserting a new element.
Insertion has three cases
a. Tree is empty (in which case the new element becomes the root of the tree)
b. Insertion point is a 2-node
c. Insertion point is a 3-node
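• The first of these cases is trivial: the element is inserted into a new 2-node that becomes the root of the tree
• The second case occurs when the new element is to be inserted into a 2-node
– In this case, we simply add the element to the leaf and make it a 3-node
• The third case occurs when the new element is to be inserted into a 3-node
– A 3-node cannot hold a third element, so it is split into two 2-nodes and the middle element moves up to the parent, which may have to split in turn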
3. Describe the three possible cases when removing an element from a 2-3 tree. Be able to diagram the
steps when removing an element.
Document1
4/23/2012
Removal of elements is also made up of three cases:
a. The element to be removed is in a leaf that is a 3-node
b. The element to be removed is in a leaf that is a 2-node
c. The element to be removed is in an internal node
4. Contrast a 2-3 tree and a 2-4 tree.
2-4 trees are very similar to 2-3 trees, adding the characteristic that a node can contain three
elements. A 4-node contains three elements and has either no children or four children. We refer to the
maximum number of children of each node as the order of a B-tree. Thus 2-3 trees are B-trees of order 3
and 2-4 trees are B-trees of order 4. A 2-4 tree has an advantage over a 2-3 tree in that insertion and
deletion can be performed by a single root-to-leaf pass rather than by a root-to-leaf pass followed by a
leaf-to-root pass, so the corresponding algorithms are simpler than those of 2-3 trees. 2-4 trees can
also be represented efficiently as binary trees, which results in a more efficient utilization of
space.
5. What is the rationale for B-trees in regard to computer implementation, hard disk storage and
memory?
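Let $m$ be the maximum number of children a node can have. A B-tree of height $h$ with all its nodes
completely filled has $n = m^h - 1$ entries. Hence, the best case height of a B-tree is:
$h_{\min} = \lceil \log_m (n + 1) \rceil$
The worst case height of a B-tree (where the root node is considered to have height 0) is:
$h_{\max} = \lfloor \log_{\lceil m/2 \rceil} \frac{n + 1}{2} \rfloor$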
For a B-Tree stored on disk, the size of a node is generally taken to be one page of disk memory.
The degree, d, of the tree is chosen so that 2d items and their associated pointers can just fit in
one page. This means that d will be rather large and therefore the height of the tree will be very
small. If d is 1000, for example, a B-Tree of height three will hold more than a billion items. The
number of disk accesses needed to perform a typical search, insert, or delete operation is equal to the
height of the tree (assuming that the root node is always kept in memory), so the number of disk
accesses needed to perform an operation even on a very large tree is quite small.
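The "more than a billion items" figure can be checked with a little arithmetic. This sketch assumes that "height three" means three levels of nodes and that every node is full (2d items and 2d + 1 children); `max_items` is a hypothetical helper name.

```python
def max_items(d, levels):
    """Upper bound on the items in a B-tree with `levels` levels of full nodes."""
    fanout = 2 * d + 1                               # children of a full node
    nodes = sum(fanout ** i for i in range(levels))  # 1 + fanout + fanout^2 + ...
    return 2 * d * nodes                             # each full node holds 2d items


print(max_items(1000, 3))
```

With d = 1000 the three levels contain 1 + 2001 + 2001² nodes, which gives roughly 8 billion items while a search touches only three nodes.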
6. What is the order of a B+ tree?
The order, or branching factor, b of a B+ tree measures the capacity of nodes (i.e., the number of
children nodes) for internal nodes in the tree. The actual number of children for a node, referred to
here as m, is constrained for internal nodes so that $\lceil b/2 \rceil \le m \le b$. The root is an
exception: it is allowed to have as few as two children. For example, if the order of a B+ tree is 7,
each internal node (except for the root) may have between 4 and 7 children; the root may have between
2 and 7.
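The constraint on the number of children can be written as a small helper. This is a sketch; `child_bounds` is a hypothetical name.

```python
import math


def child_bounds(b, is_root=False):
    """Return (minimum, maximum) number of children for a B+ tree of order b."""
    lower = 2 if is_root else math.ceil(b / 2)
    return lower, b


print(child_bounds(7))                # internal node
print(child_bounds(7, is_root=True))  # root
```

For order 7 this reproduces the ranges given above: 4 to 7 children for an internal node and 2 to 7 for the root.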
7. Describe the implementation characteristics of a B+ tree.
All data is stored at the leaf nodes (leaf pages); all other nodes (index pages) only store keys. Leaf
pages are linked to each other. Keys may be duplicated; every key to the right of a particular key is
greater than or equal to that key.
8. Given a set of data objects, describe how the objects would be inserted into a B+ tree and the
changes that would occur within the tree. Describe how the tree would change when objects are
deleted.
Insertion:
• Perform a search to determine what bucket the new record should go into.
• If the bucket is not full (at most b - 1 entries after the insertion), add the record.
• Otherwise, split the bucket.
– Allocate a new leaf and move half the bucket's elements to the new bucket.
– Insert the new leaf's smallest key and address into the parent.
– If the parent is full, split it too.
– Add the middle key to the parent node.
– Repeat until a parent is found that need not split.
• If the root splits, create a new root which has one key and two pointers. (That is, the value that gets
pushed to the new root gets removed from the original node.)
• B-trees grow at the root and not at the leaves.
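The leaf-level part of insertion can be sketched as follows. This is a simplified illustration, not a full B+ tree: a leaf is modelled as a sorted Python list of keys, `insert_into_leaf` is a hypothetical name, and updating the parent (and any further splits) is left to the caller.

```python
import bisect


def insert_into_leaf(leaf, key, b):
    """Insert `key` into a sorted leaf of a B+ tree of order b.

    Returns (leaf, None, None) if the leaf had room, otherwise
    (left_leaf, right_leaf, separator), where `separator` is the smallest
    key of the new right leaf and must be inserted into the parent.
    """
    keys = list(leaf)
    bisect.insort(keys, key)              # keep the keys sorted
    if len(keys) <= b - 1:                # bucket is not full: done
        return keys, None, None
    mid = len(keys) // 2                  # split: move half to a new leaf
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]


print(insert_into_leaf([1, 3], 2, 4))     # fits
print(insert_into_leaf([1, 3, 5], 4, 4))  # overflows and splits
```

Note that the separator is copied, not moved: it stays in the right leaf because all data lives in the leaves.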
Deletion:
• Start at the root, find leaf L where the entry belongs.
• Remove the entry.
• If L is at least half-full, done!
• If L has fewer entries than it should,
– Try to re-distribute, borrowing from a sibling (adjacent node with same parent as L).
– If re-distribution fails, merge L and the sibling.
• If a merge occurred, the entry (pointing to L or sibling) must be deleted from the parent of L.
• Merge could propagate to the root, decreasing height.
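The choice between "done", re-distribution and merge can be sketched for a single leaf. The assumptions here are significant: leaves are sorted Python lists, the sibling is the right-hand neighbour, the minimum fill `min_entries` is passed in because its exact value depends on the convention used, and `fix_underflow` is a hypothetical name. Updating the parent's keys is not shown.

```python
def fix_underflow(leaf, sibling, min_entries):
    """Decide what happens to `leaf` after an entry has been removed from it.

    Returns (action, new_leaf, new_sibling).
    """
    if len(leaf) >= min_entries:          # still at least half-full
        return "done", leaf, sibling
    if len(sibling) > min_entries:        # sibling can spare an entry: borrow it
        return "redistributed", leaf + [sibling[0]], sibling[1:]
    # Sibling is at the minimum too: merge both into one leaf.
    return "merged", leaf + sibling, []


print(fix_underflow([5], [8, 9, 10], 2))
print(fix_underflow([5], [8, 9], 2))
```

After a merge the caller removes the parent entry that pointed to the emptied sibling, which is how an underflow can propagate upward.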
9. Contrast a non-indexed file to a file with a linear index. What are the advantages of an ISAM
structure over a linear index? What are the advantages of a B/B+ index over ISAM?
A linear index is an index file organized as a sequence of key/pointer pairs where the keys are in
sorted order and the pointers either (1) point to the position of the complete record on disk, (2) point
to the position of the primary key in the primary index, or (3) are actually the value of the primary
key. On the other hand, a non-indexed file is a computer file without an index and hence does not
allow for easy random access to a record given a file key.
The main problem with the linear index is that it is a single, large array that does not adjust to
updates because a single update can require changing the position of every key in the index.
The B-tree storage structure has the following advantages over ISAM:
• B-tree is essential in tables that are growing at a rate that quickly causes overflow in an ISAM
structure (for example, situations where there are ever-increasing keys).
• B-tree is better when sorting on the key is required, because sequential access (for example, select
* from emp) to data in B-tree is automatic; there is no need to add a sort clause to queries, if you are
sorting on the primary key. B-tree also eliminates sorting of the joining column when joining on key
columns; sort-merge queries are more efficient if the tables joined are B-tree.
10. What is a directed graph? An undirected graph?
11. Regarding graphs:
adjacent
edge
complete
path
connected
cyclic/acyclic
vertex
directed
undirected

• Adjacent: two adjacent edges are edges with a common vertex.
• Complete: in a complete graph, each pair of vertices is joined by an edge; that is, the graph contains
all possible edges.
• Path: a path in a graph is a finite or infinite sequence of edges which connect a sequence of vertices
which, by most definitions, are all distinct from one another.
• Connected: a graph is connected if there is a path from any vertex to any other vertex in the graph.
• Cyclic/acyclic: a cycle is some number of vertices connected in a closed chain; a cycle graph or
circular graph is a graph that consists of a single cycle. An acyclic graph is a graph with no cycles.
• Vertex: a vertex is the fundamental unit of which graphs are formed: an undirected graph consists of a
set of vertices and a set of edges, while a directed graph consists of a set of vertices and a set of arcs.
• Directed: a directed graph is a graph, or set of nodes connected by edges, where the edges have a
direction associated with them.
• Undirected: an undirected graph is one in which edges have no orientation.
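A few of these terms can be made concrete with a small sketch. It assumes a graph is stored as a dictionary that maps each vertex to the set of vertices adjacent to it; `build_adjacency` and `is_complete` are hypothetical names.

```python
def build_adjacency(vertices, edges, directed=False):
    """Map each vertex to the set of vertices it has an edge to."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        if not directed:          # an undirected edge can be followed both ways
            adj[v].add(u)
    return adj


def is_complete(adj):
    """True if every vertex is adjacent to every other vertex (undirected graph)."""
    return all(adj[v] == set(adj) - {v} for v in adj)


vertices = ["A", "B", "C"]
edges = [("A", "B"), ("B", "C")]
print(build_adjacency(vertices, edges))                 # undirected
print(build_adjacency(vertices, edges, directed=True))  # directed
```

In the undirected version B is adjacent to both A and C; in the directed version the edge A→B does not make A reachable from B. Adding the edge (A, C) would make the undirected graph complete.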
12. What is a network? Contrast a network and a graph.
The term "network" is also used in several ways, including:
• an interconnected system of things (inanimate objects or people)
• a specialised type of graph (the mathematical concept)
The things in a network are people/personas and relationships between them. A graph is a model (or
representation, or description, etc) of that network; a graph contains only things which "refer to"
those things in the network. Fundamentally, the network is the sum of our living, changing
relationships. The graph is a representation of the relationships. Network is to person as graph is to
snapshot.
13. Contrast a graph and a tree.
Path
– Trees: A tree is a special form of graph, i.e. a minimally connected graph having only one path between any two vertices.
– Graphs: In a graph there can be more than one path, i.e. a graph can have uni-directional or bi-directional paths (edges) between nodes.
Loops
– Trees: A tree is a special case of graph having no loops, no circuits and no self-loops.
– Graphs: A graph can have loops and circuits, and can also have self-loops.
Root Node
– Trees: In a tree there is exactly one root node and every child has only one parent.
– Graphs: In a graph there is no such concept of a root node.
Parent Child relationship
– Trees: In trees, there is a parent-child relationship, so flow can be there with direction top to bottom or vice versa.
– Graphs: In a graph there is no such parent-child relationship.
Complexity
– Trees: Trees are less complex than graphs, having no cycles and no self-loops while still being connected.
– Graphs: Graphs are more complex in comparison to trees, as they can have cycles, loops etc.
Types of Traversal
– Trees: Tree traversal is a kind of special case of traversal of a graph. A tree is traversed in Pre-Order, In-Order and Post-Order (all three are depth-first traversals).
– Graphs: A graph is traversed by the DFS (Depth First Search) and BFS (Breadth First Search) algorithms.
Connection Rules
– Trees: In trees, there are many rules/restrictions for making connections between nodes through edges.
– Graphs: In graphs there are no such rules/restrictions for connecting the nodes through edges.
DAG
– Trees: Trees come in the category of DAG: Directed Acyclic Graphs, a kind of directed graph that has no cycles.
– Graphs: A graph can be cyclic or acyclic.
Different Types
– Trees: Different types of trees are: Binary Tree, Binary Search Tree, AVL tree, Heaps.
– Graphs: There are mainly two types of graphs: directed and undirected graphs.
Applications
– Trees: Tree applications: sorting and searching, like Tree Traversal & Binary Search.
– Graphs: Graph applications: coloring of maps, in OR (PERT & CPM), algorithms, graph coloring, job scheduling, etc.
No. of edges
– Trees: A tree always has n-1 edges.
– Graphs: In a graph, the number of edges depends on the graph.
Model
– Trees: A tree is a hierarchical model.
– Graphs: A graph is a network model.
14. Be able to describe the steps in traversing a graph breadth-first. Depth-first.
breadth first search:
    Put the root node on a queue;
    while (queue is not empty) {
        remove a node from the queue;
        if (node is a goal node) return success;
        put all children of node onto the queue;
    }
    return failure;
• Just before starting to explore level n, the queue holds all the nodes at level n-1
• In a typical tree, the number of nodes at each level increases exponentially with the depth
• Memory requirements may be infeasible
• When this method succeeds, it doesn’t give the path
• There is no “recursive” breadth-first search equivalent to recursive depth-first search
• In a general graph (rather than a tree), mark each node as visited when it is put on the queue, so that a cycle cannot cause the same node to be processed again
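The breadth-first steps can be sketched in Python for a general graph. This is an illustration under assumptions: the graph is a dictionary mapping each vertex to a list of neighbours, `bfs_order` is a hypothetical name, and instead of testing for a goal node it returns the order in which vertices are visited.

```python
from collections import deque


def bfs_order(graph, start):
    """Return the vertices reachable from `start` in breadth-first order."""
    visited = {start}             # marked when queued, so nothing is queued twice
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()    # remove from the front: first in, first out
        order.append(node)
        for neighbour in graph[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return order


graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(bfs_order(graph, "A"))  # ['A', 'B', 'C', 'D']
```

D is reachable through both B and C, but the visited set ensures it is queued only once.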
depth first search:
    Put the root node on a stack;
    while (stack is not empty) {
        remove a node from the stack;
        if (node is a goal node) return success;
        put all children of node onto the stack;
    }
    return failure;
• At each step, the stack contains some nodes from each of a number of levels
– The size of stack that is required depends on the branching factor b
– While searching level n, the stack contains approximately b*n nodes
• When this method succeeds, it doesn’t give the path
• In a general graph (rather than a tree), skip nodes that have already been visited, so that a cycle cannot cause an endless loop
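The depth-first steps can be sketched the same way, with a stack in place of the queue. As before this is an illustration: the graph is a dictionary of neighbour lists, `dfs_order` is a hypothetical name, and it returns the visit order rather than testing for a goal node.

```python
def dfs_order(graph, start):
    """Return the vertices reachable from `start` in depth-first order."""
    visited = set()
    stack = [start]
    order = []
    while stack:
        node = stack.pop()        # remove from the top: last in, first out
        if node in visited:
            continue              # may have been pushed more than once
        visited.add(node)
        order.append(node)
        # Push in reverse so the first-listed neighbour is explored first.
        for neighbour in reversed(graph[node]):
            if neighbour not in visited:
                stack.append(neighbour)
    return order


graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(dfs_order(graph, "A"))  # ['A', 'B', 'D', 'C']
```

Compared with the breadth-first order on the same graph, D is reached before C because the search follows the A, B, D branch to its end before backing up.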
15. What is a minimum spanning tree?
Given a connected, undirected graph, a spanning tree of that graph is a subgraph that is a tree and
connects all the vertices together. A minimum spanning tree (MST) or minimum weight
spanning tree is a spanning tree with weight less than or equal to the weight of every other spanning
tree.
16. Describe an adjacency matrix. What is its use?
An adjacency matrix is a means of representing which vertices (or nodes) of a graph are adjacent to
which other vertices. Specifically, the adjacency matrix of a finite graph G on n vertices is the n × n
matrix where the non-diagonal entry a_ij is the number of edges from vertex i to vertex j, and the
diagonal entry a_ii, depending on the convention, is either once or twice the number of edges (loops)
from vertex i to itself.
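As an illustration, an adjacency matrix can be built from an edge list. This sketch assumes vertices are numbered 0 to n-1 and uses the convention that a loop is counted once on the diagonal; `adjacency_matrix` is a hypothetical name.

```python
def adjacency_matrix(n, edges, directed=False):
    """Build the n x n adjacency matrix for vertices numbered 0..n-1."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] += 1
        if not directed and i != j:   # undirected: mirror the entry (loops only once)
            a[j][i] += 1
    return a


for row in adjacency_matrix(3, [(0, 1), (1, 2), (2, 2)]):
    print(row)
# [0, 1, 0]
# [1, 0, 1]
# [0, 1, 1]
```

Testing whether two vertices are adjacent is then a single lookup, `a[i][j] != 0`, and the matrix of an undirected graph is symmetric.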
17. Describe how to traverse a graph breadth first. Depth first. Be able to list the steps of both types of
traversal.
Same as 14
18. What is a minimum spanning tree (MST)?
Same as 15
19. Contrast the Prim and Kruskal algorithms for determining an MST. Be able to list the steps for
applying both Prim and Kruskal to a specific network.
Kruskal’s:
1) Arrange all edges in a list (L) in non-decreasing order of weight.
2) Select the next edge from L and include it in set T, unless it would create a cycle.
3) Repeat step 2 until T becomes a tree that covers all vertices.
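Kruskal's steps can be sketched in Python. The assumptions: vertices are numbered 0 to n-1, edges are given as (weight, u, v) tuples, and cycles are detected with a union-find structure, which is a common choice but not something the steps above prescribe. `kruskal` is a hypothetical name.

```python
def kruskal(num_vertices, edges):
    """Return the MST edges of a connected graph as (weight, u, v) tuples."""
    parent = list(range(num_vertices))    # union-find: each vertex is its own set

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    tree = []
    for weight, u, v in sorted(edges):    # non-decreasing order of weight
        root_u, root_v = find(u), find(v)
        if root_u != root_v:              # different components: no cycle is created
            parent[root_u] = root_v
            tree.append((weight, u, v))
            if len(tree) == num_vertices - 1:
                break                     # the tree now covers all vertices
    return tree


edges = [(1, 0, 1), (4, 0, 2), (2, 1, 2), (5, 2, 3), (3, 1, 3)]
print(kruskal(4, edges))  # [(1, 0, 1), (2, 1, 2), (3, 1, 3)]
```

The edge of weight 4 between vertices 0 and 2 is never needed here because the loop stops once three edges have been chosen; it would have been rejected anyway since 0 and 2 are already connected.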
Prim’s:
1) Initialize a tree with a single vertex, chosen arbitrarily from the graph.
2) Grow the tree by one edge: of the edges that connect the tree to vertices not yet in the tree, find the
minimum-weight edge, and transfer it to the tree.
3) Repeat step 2 (until all vertices are in the tree).
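Prim's steps can be sketched in Python. The assumptions: the graph is connected and undirected, stored as a dictionary mapping each vertex to a list of (weight, neighbour) pairs, and the minimum-weight edge is found with a heap, which is one common choice. `prim` is a hypothetical name.

```python
import heapq


def prim(graph, start):
    """Return the MST edges as (weight, u, v) tuples, growing from `start`."""
    in_tree = {start}
    # Candidate edges leading out of the tree, smallest weight first.
    frontier = [(w, start, v) for w, v in graph[start]]
    heapq.heapify(frontier)
    tree = []
    while frontier and len(in_tree) < len(graph):
        weight, u, v = heapq.heappop(frontier)
        if v in in_tree:
            continue                      # both ends already in the tree: skip
        in_tree.add(v)
        tree.append((weight, u, v))
        for w, x in graph[v]:
            if x not in in_tree:
                heapq.heappush(frontier, (w, v, x))
    return tree


graph = {
    0: [(1, 1), (4, 2)],
    1: [(1, 0), (2, 2), (3, 3)],
    2: [(4, 0), (2, 1), (5, 3)],
    3: [(5, 2), (3, 1)],
}
print(prim(graph, 0))  # [(1, 0, 1), (2, 1, 2), (3, 1, 3)]
```

Unlike Kruskal's algorithm, the chosen edges form a single connected tree at every step.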
Compare Prim and Kruskal
• Both have the same output : MST
• Kruskal’s begins with a forest and merges it into a tree
• Prim’s always stays as a tree
• If you don’t know all the weights on the edges, use Prim’s algorithm
• If you only need a partial solution on the graph, use Prim’s algorithm
Complexity
Kruskal: O(N log N), dominated by the comparison sort of the N edges
Prim: O(N log N), from searching for the least-weight edge for every vertex (using a priority queue)