ID1020 vt14-p1
Lab 2
Due: 23.59 CET, Friday 19th September 2014
Examination: This lab will be graded based on the document you
submit to Bilda.
Submission: Submit a PDF file containing your answers using Bilda.
Q.1
Examine the following code snippet. What is the worst-case time complexity (order of growth of the
worst-case running time) as a function of the input size, N?
int total = 0;
for (int i = 1; i < N; i++)
    for (int j = i; j < N; j++)
        total++;
(20 pts)
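One way to sanity-check an order-of-growth answer is to count the loop's iterations empirically and watch how the count changes when N doubles. The sketch below (class and method names are made up, not part of the lab) does this for the snippet above:

```java
// Empirically count the iterations of the nested loop from the snippet
// above, then print counts for doubling N so the growth ratio is visible.
public class LoopCounter {

    // Returns the number of times the innermost statement executes.
    static long count(int n) {
        long total = 0;
        for (int i = 1; i < n; i++)
            for (int j = i; j < n; j++)
                total++;
        return total;
    }

    public static void main(String[] args) {
        // For a quadratic loop, count(2N)/count(N) approaches 4.
        for (int n = 64; n <= 1024; n *= 2) {
            System.out.println(n + " -> " + count(n));
        }
    }
}
```

Comparing consecutive counts as N doubles gives the same kind of evidence as the timing-table question later in this lab, without any clock noise.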
Q.2
Examine the following code snippet. What is the worst-case time complexity (order of growth of the
worst-case running time) as a function of the input size, N?
int total = 0;
for (int i = N*N; i > 1; i = i/2)
    for (int j = 0; j < i; j++)
        total++;
(20 pts)
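The same empirical counting trick applies to the halving loop above. This sketch (the class name is hypothetical) counts the inner-statement executions for a few values of N; note that the iteration counts form a geometric series starting at N*N:

```java
// Count iterations of the halving outer loop from the snippet above.
public class HalvingCounter {

    // i starts at n*n and halves each pass; the inner loop runs i times,
    // so the total is the geometric series n*n + n*n/2 + n*n/4 + ...
    static long count(int n) {
        long total = 0;
        for (long i = (long) n * n; i > 1; i /= 2)
            for (long j = 0; j < i; j++)
                total++;
        return total;
    }

    public static void main(String[] args) {
        for (int n = 4; n <= 64; n *= 2) {
            System.out.println(n + " -> " + count(n));
        }
    }
}
```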
Q.3
Imagine that you wrote a program and measured its execution time as a function of N. The results of
your execution are shown in the table below.
N        time (seconds)
=============================
64       0.000
128      0.000
256      0.001
512      0.009
1024     0.060
2048     0.415
4096     2.601
8192     17.101
16384    110.921
32768    712.105
65536    4630.283
You can assume that the program's running time follows a power law T(N) ~ a N^b. Estimate the
order of growth of the running time as a function of N. In your solution, you will need to produce the
constant b. You will be given a correct grade if your solution is within 1% of the target answer. As
such, it is advisable to give your solution with at least two digits after the decimal separator, e.g.,
2.81.
(20 pts)
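The standard way to extract b from doubled inputs is the doubling hypothesis: under T(N) ~ a N^b, T(2N)/T(N) = 2^b, so b = log2(T(2N)/T(N)). The sketch below demonstrates the method on synthetic timings generated from an assumed T(N) = a N^3; the timings and the printed exponent are illustrative only, not the measurements from the table or the answer to this question:

```java
public class PowerLawSketch {

    // Under T(N) ~ a * N^b, the ratio of timings for doubled input
    // sizes satisfies T(2N)/T(N) = 2^b, hence b = log2(T(2N)/T(N)).
    static double estimateB(double tN, double t2N) {
        return Math.log(t2N / tN) / Math.log(2);
    }

    public static void main(String[] args) {
        // Synthetic timings from an assumed T(N) = 1e-9 * N^3,
        // used here only to illustrate the method.
        double a = 1e-9, b = 3.0;
        for (int n = 1024; n <= 8192; n *= 2) {
            double tN  = a * Math.pow(n, b);
            double t2N = a * Math.pow(2.0 * n, b);
            System.out.printf("N=%d  b estimate = %.2f%n", n, estimateB(tN, t2N));
        }
    }
}
```

In practice the estimates from the smallest inputs are noisy (timings of 0.000 s carry no information), so the ratios of the largest consecutive measurements are the ones worth trusting.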
Q.4 Memory Complexity Analysis
The Hadoop Distributed File System (HDFS) is used at Yahoo, Facebook, Twitter, and many other companies to
store Big Data, that is, huge volumes of data. Facebook has an HDFS cluster that stores up to 100 PB
of data. In HDFS, there is a single machine, called the NameNode, that stores meta information about
files and all of this meta information must fit on the heap of a single JVM. With current JVM
technology, there is a practical limit of 100 GB on the size of a JVM heap, before stop-the-world
garbage collection events make it unusable.
There are two data structures used in the NameNode that account for the vast majority of the memory it uses.
These are
INode
BlockInfo
The relationship between these data structures is as follows. Firstly, an inode represents a directory
or a file. If the inode is a file, it may contain a variable number of blocks (from 1..N). A block contains
the actual file's data. Blocks, in their turn, may be replicated. We can assume, for simplicity, that
each block has 3 replicas. BlockInfo holds references to the datanodes containing replicas of its block.
To get the big picture, there are some other relevant classes in the NameNode: BlocksMap, which
holds a reference to every block in the cluster; and, for each datanode in the cluster, the NameNode has a
DatanodeDescriptor object, which contains information about that datanode and also holds a reference
to the blocks belonging to it.
For simplicity, only the non-static members of the classes are shown below:
abstract class INode implements Comparable<byte[]> {
    protected byte[] name;
    protected INodeDirectory parent;
    protected long modificationTime;
    protected long accessTime;
}

class INodeFile extends INode implements BlockCollection {
    private long header;
    private BlockInfo[] blocks;
}

class INodeDirectory extends INode {
    private List<INode> children;
}

public class Block implements Writable, Comparable<Block> {
    private long blockId;
    private long numBytes;
    private long generationStamp;
}

public class BlockInfo extends Block implements LightWeightGSet.LinkedElement {
    private BlockCollection bc;

    /** For implementing {@link LightWeightGSet.LinkedElement} interface */
    private LightWeightGSet.LinkedElement nextLinkedElement;

    /**
     * This array contains triplets of references.
     * For each i-th datanode the block belongs to,
     * triplets[3*i] is the reference to the DatanodeDescriptor,
     * and triplets[3*i+1] and triplets[3*i+2] are references
     * to the previous and the next blocks, respectively, in the
     * list of blocks belonging to this datanode.
     */
    private Object[] triplets;
}
Note: Classes are taken from https://github.com/apache/hadoop-common/tree/branch-2.0.4-alpha
(a) Given the class definitions, calculate the memory requirements for each of the classes INode and
BlockInfo.
(b) Assume a NameNode JVM heap capacity of 100 GB. Given a ratio of Blocks to INodes of 1.5 (that is,
1.5 blocks per INode), and assuming 3 replicas of each block, how many INodes can fit on the
heap of the NameNode's JVM? (This is the same as asking for the maximum number of
files that an HDFS filesystem can support.)
(40 points)
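One way to structure the tally in part (a) is to sum field sizes under an explicit JVM memory model. The sketch below uses one common set of assumptions for a 64-bit JVM without compressed oops (16-byte object headers, 8-byte references and longs, 24-byte array headers, 8-byte alignment); these constants are assumptions for illustration, not values given in the lab, and your answer will shift if you assume a different layout (e.g., compressed oops). The class name is made up.

```java
public class HeapEstimator {

    // Assumed 64-bit JVM layout, no compressed oops (an assumption,
    // not part of the lab statement):
    static final int OBJ_HEADER   = 16; // per-object header
    static final int REF          = 8;  // reference field
    static final int LONG         = 8;  // long field
    static final int ARRAY_HEADER = 24; // array object header

    // Round up to the 8-byte object alignment boundary.
    static long align8(long n) { return (n + 7) & ~7L; }

    // BlockInfo object: 3 longs inherited from Block, plus the bc,
    // nextLinkedElement and triplets reference fields, plus the
    // triplets array itself (3 references per replica).
    static long blockInfoBytes(int replicas) {
        long obj      = align8(OBJ_HEADER + 3 * LONG + 3 * REF);
        long triplets = align8(ARRAY_HEADER + 3L * replicas * REF);
        return obj + triplets;
    }

    public static void main(String[] args) {
        System.out.println("BlockInfo with 3 replicas: "
                + blockInfoBytes(3) + " bytes");
    }
}
```

The same style of tally applies to INodeFile (its inherited INode fields, its own fields, and the variable-length name and blocks arrays); from the per-object sizes and the given blocks-per-inode ratio, part (b) reduces to dividing the 100 GB heap by the bytes consumed per inode.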