Storage - CIS @ UPenn

Storage
31-Jul-17
Stacks






Stacks obey a simple regimen—last in, first out (LIFO)
When you enter a function or procedure or method, storage is
allocated for you on the stack
When you leave, the storage is released
In Java, this is even more fine-grained: storage is allocated and
deallocated for individual blocks, and even for individual for statements
Since this is so well-defined, your compiler writes the code to do
it for you
Since virtually every language supports recursion these days
(and all the popular languages do), computers typically provide
machine-language instructions to simplify stack operations
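
The last-in, first-out behavior can be observed with a small recursive method. This is only a sketch: the class FrameTrace and its log are hypothetical names, and the log merely stands in for the storage that compiler-generated code allocates on entry and releases on exit.

```java
class FrameTrace {
    // Records the order in which calls are entered and left.
    static final StringBuilder log = new StringBuilder();

    static int factorial(int n) {
        log.append("enter ").append(n).append('\n'); // storage for this call is allocated
        int result;
        if (n <= 1) {
            result = 1;
        } else {
            result = n * factorial(n - 1);
        }
        log.append("leave ").append(n).append('\n'); // storage for this call is released
        return result;
    }

    public static void main(String[] args) {
        System.out.println(factorial(3));
        System.out.print(log);
    }
}
```

The call entered last (n = 1) is the first one to leave, which is exactly the LIFO regimen.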
Heaps


Stacks are great, but they have their limitations
Suppose you want to write a method to read in an array





You enter the method, and declare the array, thus dynamically
allocating space for it
You read values into the array
You return from the method and POOF! your array is gone
You need something more flexible—something where
you have control over allocation and deallocation
The invention that allows this (which came somewhat
later than the stack) is the heap


You explicitly get storage via malloc (C) or new (Java)
The storage remains until you are done with it
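
As a rough illustration in Java, the array below is created with new, so it survives the return from the method that filled it. The names ReadArrayDemo and readValues are made up for this sketch, and a string stands in for real input.

```java
class ReadArrayDemo {
    // The array is created with new, so it lives on the heap;
    // only the reference to it is kept in this method's stack storage.
    static int[] readValues(String text) {
        String[] parts = text.trim().split("\\s+");
        int[] values = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            values[i] = Integer.parseInt(parts[i]);
        }
        return values; // the method's stack storage is released, the array is not
    }

    public static void main(String[] args) {
        int[] data = readValues("3 1 4 1 5");
        System.out.println(data.length + " values, first = " + data[0]);
    }
}
```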
Stacks vs. heaps






Stack allocation and deallocation is very regular
Heap allocation and deallocation is unpredictable
Stack allocation and deallocation is handled by the compiler
Heap allocation is at the whim of the programmer
Heap deallocation may also be up to the programmer (C, C++)
or handled by the programming language system (Java)
Values on stacks are typically small and uniform in size




In Java, arrays and objects don’t go in the stack—references to them do
Values on the heap can be any size
Stacks are tightly packed, with no wasted space
Deallocation can leave gaps in the heap
Implementing a heap






A heap is a single large area of storage
When the program requests a block of storage, it is given a pointer
(reference) to some part of this storage that is not already in use
The task of the heap routines is to keep track of which parts of the heap
are available and which are in use
To do this, the heap routines create a linked list of blocks of varying sizes
Every block, whether available or in use, contains header information
about the block
We will describe a simple implementation in which each block header contains two items of information:
  A pointer to the next block, and
  The size of this block
[Figure: a block made up of a "pointer to next" word and a "size of block" word, followed by the user data (an Object); the user gets the storage from the first data word on down]
Anatomy of a block

Here is our simple block:
  ptr-2: pointer to next
  ptr-1: size of block
  ptr to ptr+N-1: user data (an Object)
The user gets N words, from here (ptr) to the end of the block
Java Objects hold more information than this (for example, the
class of the object)
Notice that our implementation will return a pointer to the first
word available to the user
Data with negative offsets are header data
  ptr-1 contains the size of this block, including header information
  ptr-2 will be used to construct a free space list of available blocks
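
One way to picture this layout is to treat the heap as an int array and a pointer as an index into it. The sketch below assumes exactly that; BlockLayout and its method names are hypothetical, not part of any real allocator.

```java
class BlockLayout {
    static final int HEADER_WORDS = 2;
    final int[] memory; // the whole heap, one int per word

    BlockLayout(int words) {
        memory = new int[words];
    }

    // Header fields sit at negative offsets from the user's pointer.
    int next(int ptr) { return memory[ptr - 2]; }
    int size(int ptr) { return memory[ptr - 1]; }

    void setNext(int ptr, int value) { memory[ptr - 2] = value; }
    void setSize(int ptr, int value) { memory[ptr - 1] = value; }

    // The size includes the header, so the user gets size - 2 words.
    int userWords(int ptr) { return size(ptr) - HEADER_WORDS; }

    public static void main(String[] args) {
        BlockLayout heap = new BlockLayout(20);
        int ptr = 2;            // first word after a header stored in words 0 and 1
        heap.setNext(ptr, 0);
        heap.setSize(ptr, 20);
        System.out.println(heap.userWords(ptr)); // prints 18
    }
}
```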
The heap, I
free points to word 2
  word 0: next = 0
  word 1: size = 20
  words 2-19: free

Initially, the user has no blocks, and the free space list consists of a single block
In our implementation, we will allocate space from the end of the block
To begin, let’s assume that the user asks for a block of two words
The heap, II
free points to word 2
  word 0: next = 0
  word 1: size = 16
  words 2-15: free
  word 16: next = 0
  word 17: size = 4
  words 18-19: given to user (pointer is 18)

The user has asked for a block of size 2
The “free” block is reduced in size from 20 to 16 (two words asked for by the user, plus two for a new header)
The new block has size 4 and the next field is not used
Next, assume the user asks for a block of three words
The heap, III
free points to word 2
  word 0: next = 0
  word 1: size = 11
  words 2-10: free
  word 11: next = 0
  word 12: size = 5
  words 13-15: given to user (pointer is 13)
  word 16: next = 0
  word 17: size = 4
  words 18-19: in use

The user has asked for a block of size 3
The “free” block is reduced in size from 16 to 11 (three words asked for by the user, plus two for a new header)
The new block has size 5 and the next field is not used
Next, assume the user asks for a block of just one word
The heap, IV
free points to word 2
  word 0: next = 0
  word 1: size = 8
  words 2-7: free
  word 8: next = 0
  word 9: size = 3
  word 10: given to user (pointer is 10)
  word 11: next = 0
  word 12: size = 5
  words 13-15: in use
  word 16: next = 0
  word 17: size = 4
  words 18-19: in use

The user has asked for a block of size 1
The “free” block is reduced in size from 11 to 8 (one word for the user, plus two for a new header)
The new block has size 3 and the next field is not used
Next, the user releases the second block (at 13)
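
The three requests so far can be replayed with a short sketch that carves each new block from the end of a free block. It assumes the heap is an int array indexed by word, that 0 in a next field means "end of list", and that a free block always keeps its own two-word header; AllocateSketch and allocate are hypothetical names.

```java
class AllocateSketch {
    static final int HEADER = 2;
    final int[] memory;
    int free; // pointer to the first block on the free space list

    AllocateSketch(int words) {
        memory = new int[words];
        free = HEADER;     // first word after the header in words 0 and 1
        memory[0] = 0;     // next = 0: no other free block
        memory[1] = words; // size includes the header
    }

    int next(int ptr) { return memory[ptr - 2]; }
    int size(int ptr) { return memory[ptr - 1]; }

    // Returns a pointer to userWords words, or 0 if no free block is large enough.
    int allocate(int userWords) {
        int needed = userWords + HEADER;
        for (int p = free; p != 0; p = next(p)) {
            if (size(p) >= needed + HEADER) {  // the free block keeps its header
                int remaining = size(p) - needed;
                memory[p - 1] = remaining;     // shrink the free block
                int ptr = p + remaining;       // first user word of the new block
                memory[ptr - 2] = 0;           // next field is not used
                memory[ptr - 1] = needed;      // size includes the new header
                return ptr;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        AllocateSketch heap = new AllocateSketch(20);
        System.out.println(heap.allocate(2)); // prints 18
        System.out.println(heap.allocate(3)); // prints 13
        System.out.println(heap.allocate(1)); // prints 10
        System.out.println(heap.size(2));     // prints 8
    }
}
```

The returned pointers (18, 13, 10) and the final free size (8) match the states shown for heaps II through IV.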
The heap, V
free points to word 13
  word 0: next = 0
  word 1: size = 8
  words 2-7: free
  word 8: next = 0
  word 9: size = 3
  word 10: in use
  word 11: next = 2
  word 12: size = 5
  words 13-15: free
  word 16: next = 0
  word 17: size = 4
  words 18-19: in use

The user has released the block of size 5
The freed block is added to the front of the free space list:
  Its next field is set to the old value of free
  free is set to point to this block
Next, the user requests a block of size 4
The first block on the free list isn’t large enough, so we have to go to the next free block
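
Releasing a block and then searching the free space list might look like the sketch below. It starts from the layout of "The heap, IV" by writing the headers directly, and it assumes a free block must keep its own two-word header after a split; FreeListSketch, release, and firstFit are hypothetical names.

```java
class FreeListSketch {
    final int[] memory = new int[20];
    int free; // pointer to the first free block, 0 if there is none

    int next(int ptr) { return memory[ptr - 2]; }
    int size(int ptr) { return memory[ptr - 1]; }

    void writeHeader(int ptr, int next, int size) {
        memory[ptr - 2] = next;
        memory[ptr - 1] = size;
    }

    // Adds the block at ptr to the front of the free space list.
    void release(int ptr) {
        memory[ptr - 2] = free; // next = old value of free
        free = ptr;             // free now points to this block
    }

    // Returns the first free block able to supply userWords words
    // plus a new header while keeping its own header, or 0 if none can.
    int firstFit(int userWords) {
        int needed = userWords + 2;
        for (int p = free; p != 0; p = next(p)) {
            if (size(p) >= needed + 2) {
                return p;
            }
        }
        return 0;
    }

    // Builds the layout of "The heap, IV": one free block and three blocks in use.
    static FreeListSketch heapFour() {
        FreeListSketch h = new FreeListSketch();
        h.writeHeader(2, 0, 8);
        h.writeHeader(10, 0, 3);
        h.writeHeader(13, 0, 5);
        h.writeHeader(18, 0, 4);
        h.free = 2;
        return h;
    }

    public static void main(String[] args) {
        FreeListSketch h = heapFour();
        h.release(13);
        System.out.println(h.free);        // prints 13
        System.out.println(h.next(13));    // prints 2
        System.out.println(h.firstFit(4)); // prints 2: the block at 13 is too small
    }
}
```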
The heap, VI
free points to word 13
  word 0: next = 0
  word 1: size = 2
  word 2: next = 0
  word 3: size = 6
  words 4-7: given to user (pointer is 4)
  word 8: next = 0
  word 9: size = 3
  word 10: in use
  word 11: next = 2
  word 12: size = 5
  words 13-15: free
  word 16: next = 0
  word 17: size = 4
  words 18-19: in use

The user requested a block of size 4
The new block (four words for the user, plus two for a new header) is taken from the second free block, the one at 2
The size of that free block is now 2, just its header, and its next field does not change
The user gets a pointer to the new block
Now the user releases the smallest block (at 10)
Again, this will be added to the beginning of the free space list
The heap, VII
free points to word 10
  word 0: next = 0
  word 1: size = 2
  word 2: next = 0
  word 3: size = 6
  words 4-7: in use
  word 8: next = 13
  word 9: size = 3
  word 10: free
  word 11: next = 2
  word 12: size = 5
  words 13-15: free
  word 16: next = 0
  word 17: size = 4
  words 18-19: in use

The user releases the smallest block (at 10)
The freed block is added to the front of the free space list:
  Its next field is set to the old value of free
  free is set to point to this block
Now the user requests a block of size 4
Currently, we cannot satisfy this request
  We have enough space, but no single block is large enough (free space is fragmented)
  However, free blocks 10 and 13 are adjacent to each other
  We can coalesce blocks 10 and 13
Coalescing blocks is somewhat expensive, because adjacent blocks are not necessarily adjacent nodes in the free space list
The heap, VIII
free points to word 10
  word 0: next = 0
  word 1: size = 2
  word 2: next = 0
  word 3: size = 6
  words 4-7: in use
  word 8: next = 2
  word 9: size = 8
  words 10-15: free
  word 16: next = 0
  word 17: size = 4
  words 18-19: in use

Blocks at 10 and 13 have now been coalesced
The size of the new block is the sum of the sizes of the old blocks
We had to adjust the links
Now we can give the user a block of size 4
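
A simple, deliberately unoptimized way to coalesce is to look, for each free block, for another free block that starts exactly where it ends. The sketch below assumes the int-array heap with 0 as the "end of list" value and starts from the layout of "The heap, VII"; CoalesceSketch and its method names are hypothetical.

```java
class CoalesceSketch {
    final int[] memory = new int[20];
    int free;

    int next(int ptr) { return memory[ptr - 2]; }
    int size(int ptr) { return memory[ptr - 1]; }

    void writeHeader(int ptr, int next, int size) {
        memory[ptr - 2] = next;
        memory[ptr - 1] = size;
    }

    // Removes the block at ptr from the free space list.
    void unlink(int ptr) {
        if (free == ptr) {
            free = next(ptr);
            return;
        }
        for (int p = free; p != 0; p = next(p)) {
            if (next(p) == ptr) {
                memory[p - 2] = next(ptr);
                return;
            }
        }
    }

    // Merges each free block with a free block that physically follows it.
    void coalesce() {
        boolean merged = true;
        while (merged) {
            merged = false;
            for (int a = free; a != 0 && !merged; a = next(a)) {
                int follower = a + size(a); // pointer of the block right after a
                for (int b = free; b != 0; b = next(b)) {
                    if (b == follower) {
                        unlink(b);                         // adjust the links
                        memory[a - 1] = size(a) + size(b); // sizes add up
                        merged = true;
                        break;
                    }
                }
            }
        }
    }

    // Builds the layout of "The heap, VII".
    static CoalesceSketch heapSeven() {
        CoalesceSketch h = new CoalesceSketch();
        h.writeHeader(2, 0, 2);
        h.writeHeader(4, 0, 6);
        h.writeHeader(10, 13, 3);
        h.writeHeader(13, 2, 5);
        h.writeHeader(18, 0, 4);
        h.free = 10;
        return h;
    }

    public static void main(String[] args) {
        CoalesceSketch h = heapSeven();
        h.coalesce();
        System.out.println(h.size(10)); // prints 8
        System.out.println(h.next(10)); // prints 2
    }
}
```

The nested walk over the free list is what makes coalescing expensive here: neighbors in memory have to be searched for, because the list order says nothing about physical position.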
Declaring variables in Java


In Java, all local variables (including method parameters) occupy space on the stack
All Objects occupy space on the heap


In Java, you create an object (on the heap) with new
Example of defining a variable whose value is a primitive:


int count = 0;
count is the name of a location on the stack



The name is used by the compiler; it doesn't "really" exist (occupy storage) at run time
The named location occupies memory on the stack; it contains a zero
Example of defining a variable whose value is an object:


Person p = new Person();
p is a variable; it is the name of a location on the stack



That location occupies memory on the stack; it contains a reference to the object
The Person object is on the heap
Thus, Person p = new Person(); allocates space on both the stack and the
heap
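
A small sketch of the difference: copying a primitive copies the value, while copying a reference variable copies only the reference, so both variables name the same heap object. The Person class here is a stand-in with a single made-up field, and VariablesDemo is a hypothetical name.

```java
class VariablesDemo {
    static class Person {
        String name = "unknown";
    }

    static String aliasDemo() {
        int count = 0;           // the stack location itself holds the value 0
        int copy = count;        // copies the value
        copy++;                  // count is still 0

        Person p = new Person(); // p holds a reference; the Person object is on the heap
        Person q = p;            // copies the reference, not the object
        q.name = "Ada";          // changes the one shared heap object

        return p.name + " " + count;
    }

    public static void main(String[] args) {
        System.out.println(aliasDemo()); // prints "Ada 0"
    }
}
```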
Pointers

In C and C++ you get a pointer to the new storage; in Java you
get a reference
  The implementation is identical; the difference is that there are more
  operations on pointers than on references
C and C++ provide operations on pointers
  C and C++ let you do arithmetic on pointers, for example, p++;
  Pointers are pervasive in C and C++; you can't avoid them
Advantages/disadvantages

Pointers give you:
  Greater flexibility and (maybe) convenience
  A much more complicated syntax
  More ways to create hard-to-find errors
  Serious security holes
References give you:
  Less flexibility (no pointer arithmetic)
  Simpler syntax, more like that of other variables
  Much safer programs with fewer mysterious bugs
  More opportunities for the compiler to optimize the compiled code
Pointer arithmetic is inherently unsafe
  You can accidentally point to the wrong thing
  You cannot be sure of the type of the thing you are pointing to
Deallocation

There are two potential errors when de-allocating (freeing)
storage yourself:

De-allocating too soon, so that you have dangling references (pointers
to storage that has been freed and possibly reused)



A dangling reference is not a null link—it points to something (you just
don’t know what)
Forgetting to de-allocate, so that unused storage accumulates and you
have a memory leak
If you have to de-allocate storage yourself, a good strategy is
to keep track of which function or method “owns” the storage


The function that owns the storage is responsible for de-allocating it
Ownership can be transferred to another function or method


You just need a clearly defined policy for determining ownership
In practice, this is easier said than done
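
Java itself does not let you free storage by hand, so the sketch below imitates manual deallocation with a toy pool of four one-word cells. Everything in it (DanglingDemo, allocate, release, the integer "handles") is hypothetical; it only shows why a dangling reference points to something rather than to nothing.

```java
class DanglingDemo {
    // A toy pool: cells hold the data, inUse records which cells are allocated.
    static final int[] cells = new int[4];
    static final boolean[] inUse = new boolean[4];

    // Hands out the lowest-numbered cell that is not in use.
    static int allocate() {
        for (int i = 0; i < cells.length; i++) {
            if (!inUse[i]) {
                inUse[i] = true;
                return i;
            }
        }
        throw new IllegalStateException("pool exhausted");
    }

    static void release(int handle) {
        inUse[handle] = false;
    }

    public static void main(String[] args) {
        int a = allocate();
        cells[a] = 42;
        release(a);         // freed too soon: a is now a dangling handle
        int b = allocate(); // the pool reuses the same cell
        cells[b] = 7;
        System.out.println(cells[a]); // prints 7, not 42
    }
}
```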
Discipline

Most C/C++ advocates say:




It's just a matter of being disciplined
I'm disciplined, even if other people aren't
Besides, there are good tools for finding memory problems
However:

Virtually all large C/C++ programs have memory problems
Garbage collection


Garbage is storage that has been allocated but is no longer
available to the program
It's easy to create garbage:
  Allocate some storage and save the pointer to it in a variable
  Assign a different value to that variable
A garbage collector automatically finds and de-allocates garbage
  This is far safer (and more convenient) than having the programmer do it
  Dangling references cannot happen
  Memory leaks, while not impossible, are pretty unlikely
Most modern languages use a garbage collector; C and C++ are
notable exceptions
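
The two-step recipe for creating garbage takes only two lines of Java. GarbageDemo and makeGarbage are hypothetical names, and the sketch does not try to show when the collector actually reclaims the storage.

```java
class GarbageDemo {
    static int[] makeGarbage() {
        int[] data = new int[1000]; // allocate storage, save the reference in a variable
        data = new int[10];         // assign a different value: the first array is now garbage
        return data;
    }

    public static void main(String[] args) {
        System.out.println(makeGarbage().length); // prints 10
    }
}
```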
Garbage collection algorithms

There are two well-known algorithms (and several not
so well known ones) for doing garbage collection:


Reference counting
Mark and sweep
Reference counting

When a block of storage is allocated, it includes header data
that contains an integer reference count


The reference count keeps track of how many references the program
has to that block
Any assignment to a reference variable modifies reference counts




If the variable previously referenced an object (was not null), the
reference count of that object is decremented
If the new value is an object (not null), the reference count for the new
object is incremented
When a reference count reaches zero, the storage can immediately be
garbage collected
For this to work, the reference count has to be at a known
displacement from the reference (pointer)

If arbitrary pointer arithmetic is allowed, this condition cannot be
guaranteed
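
The bookkeeping on assignment can be sketched as follows. This is a simulation, not how any particular runtime does it: Block, refCount, freed, and assign are hypothetical names, and assign stands in for the code a language system would run on every assignment to a reference variable.

```java
class RefCountSketch {
    static class Block {
        int refCount = 0;      // would live in the block header
        boolean freed = false; // set when the storage is reclaimed
    }

    // Models "variable = newValue", where the variable currently holds oldValue.
    // Returns the value the variable holds afterwards.
    static Block assign(Block oldValue, Block newValue) {
        if (newValue != null) {
            newValue.refCount++; // incremented first, so x = x cannot free x
        }
        if (oldValue != null) {
            oldValue.refCount--;
            if (oldValue.refCount == 0) {
                oldValue.freed = true; // can be collected immediately
            }
        }
        return newValue;
    }

    public static void main(String[] args) {
        Block a = new Block();
        Block b = new Block();
        Block x = assign(null, a); // a has 1 reference
        Block y = assign(null, a); // a has 2 references
        x = assign(x, b);          // a has 1, b has 1
        y = assign(y, null);       // a has 0 and is freed
        System.out.println(a.freed + " " + b.refCount); // prints "true 1"
    }
}
```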
Problems with reference counting

If object A points to object B, and object B points to
object A, then each is referenced, even if nothing else
in the program references either one



This fools the garbage collector, which doesn't collect either
object A or object B
Thus, reference counting is imperfect and unreliable;
memory leaks still happen
However, reference counting is a simple technique and
is occasionally used
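
The cycle problem can be seen by doing the counting by hand. In this sketch the counts are updated explicitly where a reference-counting system would do so automatically; CycleSketch, Node, and buildAndDropCycle are hypothetical names.

```java
class CycleSketch {
    static class Node {
        int refCount = 0;
        Node other; // a reference stored inside the object
    }

    // Builds A <-> B, then drops the program's own references to both.
    static Node[] buildAndDropCycle() {
        Node a = new Node();
        a.refCount++;            // the program's variable refers to A
        Node b = new Node();
        b.refCount++;            // the program's variable refers to B

        a.other = b;
        b.refCount++;            // A points to B
        b.other = a;
        a.refCount++;            // B points to A

        a.refCount--;            // the program gives up its reference to A
        b.refCount--;            // the program gives up its reference to B

        return new Node[] { a, b }; // returned only so the counts can be inspected
    }

    public static void main(String[] args) {
        Node[] nodes = buildAndDropCycle();
        // Neither count reached zero, so neither object would be collected.
        System.out.println(nodes[0].refCount + " " + nodes[1].refCount); // prints "1 1"
    }
}
```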
Mark and sweep

When memory runs low, languages that use mark-and-sweep temporarily pause the program and run the garbage
collector




The collector marks every block
It then does an exhaustive search, starting from every reference
variable in the program, and unmarks all the storage it can reach
When done, every block that is still marked must not be
accessible from the program; it is garbage that can be freed
In order for this technique to work,



It must be possible to find every block (so they are in a linked list)
It must be possible to find and follow every reference
The mark has to be at a known displacement from the reference

Again, this is not compatible with arbitrary pointer arithmetic
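
The steps above can be simulated on a small object graph. The sketch follows the slide's convention (mark everything, unmark what is reachable, free what is still marked); Block, refs, collect, and the explicit list of roots are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

class MarkSweepSketch {
    static class Block {
        final String name;
        boolean marked;
        final List<Block> refs = new ArrayList<>(); // references held by this block
        Block(String name) { this.name = name; }
    }

    // Unmarks start and everything reachable from it.
    static void unmarkReachable(Block start) {
        if (start == null || !start.marked) {
            return; // nothing there, or already visited
        }
        start.marked = false;
        for (Block r : start.refs) {
            unmarkReachable(r);
        }
    }

    // Returns the blocks that survive; blocks still marked are garbage.
    static List<Block> collect(List<Block> allBlocks, List<Block> roots) {
        for (Block b : allBlocks) {
            b.marked = true;           // mark every block
        }
        for (Block root : roots) {
            unmarkReachable(root);     // unmark all storage the program can reach
        }
        List<Block> live = new ArrayList<>();
        for (Block b : allBlocks) {
            if (!b.marked) {
                live.add(b);           // still-marked blocks would be freed here
            }
        }
        return live;
    }

    public static void main(String[] args) {
        Block a = new Block("a"), b = new Block("b"), c = new Block("c"), d = new Block("d");
        a.refs.add(b);                 // a -> b, reachable from the root
        c.refs.add(d);
        d.refs.add(c);                 // c <-> d, a cycle nothing else refers to
        List<Block> live = collect(List.of(a, b, c, d), List.of(a));
        for (Block block : live) {
            System.out.println(block.name); // prints a, then b
        }
    }
}
```

Unlike a reference count, the search from the roots never reaches the c/d cycle, so both of those blocks stay marked and are treated as garbage.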
Problems with mark and sweep




Mark-and-sweep is a complex algorithm that takes
substantial time
Unlike reference counting, it must be done all at once—
nothing else can be going on
The program stops responding during garbage
collection
This is unsuitable for many real-time applications
Garbage collection in Java



Java uses mark-and-sweep
Mark-and-sweep is highly reliable, but may cause
unexpected slowdowns
You can ask Java to do garbage collection at a time
you feel is more appropriate



The call is System.gc();
But not all implementations respect your request
This problem is known and is being worked on

There is also a “Real-time Specification for Java”
No garbage collection in C or C++


C and C++ do not have garbage collection—it is up to
the programmer to explicitly free storage when it is no
longer needed by the program
C and C++ have pointer arithmetic, which means that
pointers might point anywhere



There is no way to do reference counting if the programming
language does not have strict control over pointers
There is no way to do mark-and-sweep if the programming
language does not have strict control over pointers
Pointer arithmetic and garbage collection are
incompatible; it is essentially impossible to have both
The End