GC - Piazza

Dynamic Compilation
Vijay Janapa Reddi
The University of Texas at Austin
Garbage Collection 2
Today

Garbage Collection
» Why use garbage collection?
» What is garbage?
 Reachable vs live, stack maps, etc.
» Allocators and their collection mechanisms
 Semispace
 Marksweep
 Performance comparisons
» Incremental age-based collection
 Enabling mechanisms
• write barrier & remembered sets
 Heap organizations
• Generational
• Beltway
» Performance comparisons
-2-
One Big Heap?
Pause times
» it takes too long to trace the whole heap at once
Throughput
» the heap contains lots of long-lived objects; why collect them over and over again?
Incremental collection
» divide up the heap into increments and collect one at a time.
[Figure: heap split into Increment 1 and Increment 2, each with its own to space and from space]
-3-
Incremental Collection
Ideally
 perfect knowledge of live pointers between increments
 but obtaining it requires scanning the whole heap, which defeats the purpose
Mechanism: Write barrier
 records pointers between increments when the mutator installs them; the recorded set is a conservative approximation of reachability
-7-
Write barrier
compiler inserts code that records pointers between increments
when the mutator installs them
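// original program
p.f = o;
// with the compiler-inserted barrier for incremental collection
if (incr(p) != incr(o)) {
    // the pointer crosses increments: remember the slot p.f
    // in the remembered set of the increment that holds o
    remset(incr(o)) = remset(incr(o)) U { p.f };
}
p.f = o;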
[Figure: objects a-g and t-z spread over Increment 1 and Increment 2 (each with to space and from space); remset1 = {w}, remset2 = {f, g}]
-8-
Write barrier
Install new pointer d -> v, then update d -> y
 after installing d -> v: remset2 = {f, g, d}
 after updating d -> y: remset2 = {f, g, d, d} (the barrier records d a second time)
 remset1 = {w} is unchanged
- 11 -
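A minimal sketch of the barrier above, replaying the d -> v and d -> y stores. The `Obj` class, the field name `"x"`, and recording the source object (rather than the slot address) are assumptions made for illustration, not part of the slides.

```python
from collections import defaultdict

class Obj:
    """A heap object with named pointer fields, living in one increment."""
    def __init__(self, name, increment):
        self.name = name
        self.increment = increment
        self.fields = {}

# remsets[i] lists the sources of pointers INTO increment i.
remsets = defaultdict(list)

def write_barrier_store(p, field, o):
    """Barriered version of `p.field = o`."""
    if o is not None and p.increment != o.increment:
        # Cross-increment pointer: remember the source. No duplicate
        # filtering, so repeated stores add repeated entries.
        remsets[o.increment].append(p)
    p.fields[field] = o

d, f, g = Obj("d", 1), Obj("f", 1), Obj("g", 1)
v, y = Obj("v", 2), Obj("y", 2)
remsets[2] = [f, g]                 # starting state from the figure

write_barrier_store(d, "x", v)      # install d -> v
write_barrier_store(d, "x", y)      # update  d -> y
remset2_names = [s.name for s in remsets[2]]
```

After both stores `remset2_names` is `['f', 'g', 'd', 'd']`: the barrier is cheap because it does not check for duplicates, and the collector pays for that later.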
Write barrier
At collection time
 collector examines all entries in the remset for the increment being collected, treating them like roots
 Collect Increment 2
[Figure: collecting Increment 2, using the entries of remset2 = {f, g, d, d} as additional roots]
- 13 -
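A rough sketch of collecting a single increment with its remembered set treated as extra roots. The object graph below is made up for illustration, and the `Obj` class and `collect_increment` function are hypothetical names.

```python
class Obj:
    def __init__(self, name, increment):
        self.name, self.increment, self.refs = name, increment, []

def collect_increment(target, roots, remset):
    """Return the objects of increment `target` that survive the collection."""
    worklist = [o for o in roots if o.increment == target]
    # Remset entries are sources in OTHER increments. Their referents in
    # `target` become roots without checking whether the source is itself
    # live, which is where excess retention comes from.
    for src in remset:
        worklist.extend(r for r in src.refs if r.increment == target)
    live = set()
    while worklist:
        o = worklist.pop()
        if o in live:
            continue
        live.add(o)
        # Only trace inside the increment being collected.
        worklist.extend(r for r in o.refs if r.increment == target)
    return live

d = Obj("d", 1)                      # unreachable, but still in the remset
t, u, y, z = (Obj(n, 2) for n in "tuyz")
t.refs.append(u)
d.refs.append(y)
live = collect_increment(2, roots=[t], remset=[d])
survivors = sorted(o.name for o in live)
```

Here `survivors` is `['t', 'u', 'y']`: `z` is reclaimed, while `y` is retained only because the dead object `d` is still recorded in the remset.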
Summary of the costs of incremental collection
write barrier to catch pointer stores crossing boundaries
remsets to store crossing pointers
processing remembered sets at collection time
excess retention (objects kept alive only by remset entries from dead objects)
- 14 -
Heap Organization
What objects should we put where?
 Generational hypothesis
» young objects die more quickly than older ones [Lieberman &
Hewitt’83, Ungar’84]
» most pointers are from younger to older objects [Appel’89,
Zorn’90]

Organize the heap into young and old, collect young objects preferentially
[Figure: Young space and Old space, each with to space and from space]
- 15 -
Generational Heap Organization
Divide the heap into two spaces: young and old
Allocate into the young space
When the young space fills up,
» collect it, copying into the old space
When the old space fills up
» collect both spaces - ignore remembered sets
» Generalizing to m generations
 if generation n (1 ≤ n ≤ m) fills up, collect generations 1 through n
[Figure: Young and Old spaces, each with to space and from space]
- 29 -
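The two-generation policy above can be sketched as a toy simulation. Capacities are counted in objects, "copying" is modeled by moving objects between Python sets, and all names (`GenHeap`, `trace`, `store`) are hypothetical.

```python
class Obj:
    def __init__(self, name):
        self.name, self.refs = name, []

def trace(starts, space):
    """Objects in `space` reachable from `starts` without leaving `space`."""
    live, work = set(), [o for o in starts if o in space]
    while work:
        o = work.pop()
        if o not in live:
            live.add(o)
            work.extend(r for r in o.refs if r in space)
    return live

class GenHeap:
    def __init__(self, young_cap, old_cap):
        self.young, self.old = set(), set()
        self.young_cap, self.old_cap = young_cap, old_cap
        self.roots, self.remset, self.log = [], [], []

    def alloc(self, name):
        if len(self.young) >= self.young_cap:   # young space is full
            self.collect_young()
        o = Obj(name)
        self.young.add(o)
        return o

    def store(self, p, o):
        if p in self.old and o in self.young:   # old -> young pointer
            self.remset.append(p)
        p.refs.append(o)

    def collect_young(self):
        starts = list(self.roots) + [r for s in self.remset for r in s.refs]
        self.old |= trace(starts, self.young)   # survivors move to old
        self.young.clear()
        self.remset.clear()                     # no young objects remain
        self.log.append("young")
        if len(self.old) > self.old_cap:
            self.collect_full()

    def collect_full(self):
        # Both spaces are traced from the roots alone; remsets are ignored.
        self.old = trace(self.roots, self.young | self.old)
        self.young.clear()
        self.log.append("full")

h = GenHeap(young_cap=2, old_cap=2)
a = h.alloc("a"); h.roots.append(a)
b = h.alloc("b")                  # garbage
c = h.alloc("c")                  # triggers a young collection; a is promoted
h.store(a, c)                     # old -> young, recorded in the remset
e = h.alloc("e")
f = h.alloc("f")                  # young collection; c survives via the remset
h.roots.remove(a); h.roots.append(f)
g = h.alloc("g")
i = h.alloc("i")                  # young collection overflows old -> full collection
```

In this run `a` and `c` stay in the old space after they become unreachable and are only reclaimed by the final full collection, which is the price of collecting the young space preferentially.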
Generational Write Barrier
Unidirectional barrier
 record only older to younger pointers
 no need to record younger to older pointers, since we
never collect the old space independently
• most pointers are from younger to older objects [Appel’89,
Zorn’90]
• track the barrier address between the young and old spaces
[Figure: Young and Old spaces separated by a barrier address]
- 30 -
Generational Write Barrier
unidirectional boundary barrier
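// original program
p.f = o;
// with the compiler-inserted generational barrier
// (old space lies above the barrier address, nursery below it)
if (p > barrier && o < barrier) {
    // old-to-young pointer: remember the slot p.f
    remset_nursery = remset_nursery U { p.f };
}
p.f = o;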

[Figure: Young and Old spaces, each with to space and from space]
- 31 -
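A small sketch of the boundary test above on integer addresses. The `BARRIER` value and the dictionary-based heap are assumptions; the comparison itself follows the slide (old objects above the barrier, young below).

```python
BARRIER = 0x8000          # hypothetical boundary: nursery below, old space above

heap = {}                 # address -> {field name: address}
nursery_remset = []       # remembered (object address, field) slots

def store(p, field, o):
    """Barriered `p.field = o`, where p and o are object addresses."""
    if p > BARRIER and o < BARRIER:        # old object now points to a young one
        nursery_remset.append((p, field))
    heap.setdefault(p, {})[field] = o

store(0x9000, "f", 0x1000)   # old   -> young: recorded
store(0x1000, "f", 0x9000)   # young -> old:   not recorded
store(0x1008, "f", 0x1000)   # young -> young: not recorded
store(0x9000, "g", 0x9008)   # old   -> old:   not recorded
```

Only the first store ends up in `nursery_remset`. The test is two compares against one constant, which is why the unidirectional boundary barrier is cheaper than comparing increment numbers.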
Generational Write Barrier
Unidirectional
 record only older to younger pointers
 no need to record younger to older pointers, since we never collect the old space independently
» most pointers are from younger to older objects [Appel’89, Zorn’90]
» most mutations are to young objects [Stefanovic et al.’99]
- 32 -
Results
- 33 -
Garbage Collection Time
[Figure: GC time (normalized to best, axis 1 to 181) vs. heap size (relative to min, 1 to 6) for MarkSweep, SemiSpace, GenMS, and GenCopy]
- 34 -
Mutator Time
[Figure: mutator time (normalized to best, axis 1 to 1.2) vs. heap size (relative to min, 1 to 6) for MarkSweep, SemiSpace, GenMS, and GenCopy]
- 35 -
Total Time
[Figure: total time (normalized to best, axis 1 to 3) vs. heap size (relative to min, 1 to 6) for MarkSweep, SemiSpace, GenMS, and GenCopy]
- 36 -
Recap



Copying improves locality
Incrementality improves responsiveness
Generational hypothesis
» Young objects: Most very short lived
 Infant mortality: ~90% die young (within 4MB of alloc)
» Old objects: most very long lived (bimodal)
 Mature mortality: ~5% die each 4MB of new allocation

Help from pointer mutations
» In Java, pointers go in both directions, but older to younger
pointers across many objects are rare
 less than 1%
» Most mutations among young objects
 92 to 98% of pointer mutations
- 37 -
Moving objects in copying collectors


Semispace/copying collectors are nice in that they
provide good locality benefits
Mark-sweep collectors often will also have a “compact”
phase, where all live objects can be squeezed together
if fragmentation gets out of control
» Or they are simply a “Mark-compact” collector that does the
compaction at every invocation of the garbage collector

Net result: A mechanism to relocate objects must
generally exist
- 38 -
Moving an Object

To move an object we must:
» Copy the object data
» Ensure that all pointers to this object are updated to point to the
new location

The semispace collector does this in one pass
» Cheney’s algorithm does a breadth-first traversal allocating the
to-space object whenever a pointer to it is first encountered

A Mark-compact collector will:
» Compute each object’s post-compaction address during the mark
phase and store the forwarding pointer as additional metadata
with each object
» Update pointers during the copying process in the compact phase
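The one-pass semispace copy (Cheney’s algorithm) can be sketched as follows. The heap model is an assumption chosen for brevity: from-space is a list of objects, each object is a list of references, and a reference is an index into the space (or None).

```python
def cheney_collect(from_space, roots):
    """Copy everything reachable from `roots` into a fresh to-space."""
    to_space = []      # objects are appended in allocation order
    forward = {}       # from-space index -> to-space index (forwarding pointers)

    def copy(ref):
        if ref is None:
            return None
        if ref not in forward:                       # first time we reach it
            forward[ref] = len(to_space)             # allocate in to-space
            to_space.append(list(from_space[ref]))   # fields still hold old refs
        return forward[ref]

    new_roots = [copy(r) for r in roots]   # the root set is updated too
    scan = 0
    while scan < len(to_space):            # to-space doubles as the BFS queue
        obj = to_space[scan]
        for i, ref in enumerate(obj):
            obj[i] = copy(ref)             # copy the referent, fix the pointer
        scan += 1
    return to_space, new_roots

# Object 1 is garbage; object 3 is the only root.
from_space = [[], [0], [0], [2, 4], [3]]
to_space, new_roots = cheney_collect(from_space, roots=[3])
```

For this input `to_space` is `[[1, 2], [3], [0], []]` and `new_roots` is `[0]`: the four live objects are laid out in breadth-first order and the garbage object is never touched.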
- 39 -
Moving an Object
Don’t forget to update the root set (registers, stack, globals, etc.)!!!
- 40 -
Moving an Object

What about generational schemes?
» The remembered set for a generation lists all the objects
that point to the objects that are potentially relocated, and
these must be updated as well

What about conservative garbage collectors?
» You can’t relocate an object because you can’t update all the
memory locations that might reference the object since they
could just be “unfortunate integers”, right?
- 41 -
Moving an Object
"All problems in computer science can be solved by another level of indirection"
David Wheeler (first ever PhD in CS)
- 42 -
Direct Object Reference

Typical Direct Object Reference (pointer)
[Figure: an Object Reference points directly at the object (Object Type, Instance Variable 1, Instance Variable 2, Instance Variable 3)]
- 43 -
Handles

Object Handle (pointer to a pointer)
» Another level of indirection to solve all problems
[Figure: an Object Handle points to an Object Reference, which points to the object (Object Type, Instance Variable 1, Instance Variable 2, Instance Variable 3)]
- 44 -
Handles

All object references in the heap are handles
» The Handle Table holds the indirection pointer
[Figure: Handle Table entries pointing at objects in the Heap]
- 45 -
Moving an Object

What about conservative garbage collectors?
» If all object references are handles, then the object can be
relocated in the heap by updating one pointer in the Handle
Table.
» Side benefit: The potential set of “unfortunate integers” that
can keep an object alive by appearing to reference it is
reduced to the set of addresses in the handle table (vs. the
set of addresses anywhere in the heap)
» Note: If the programmer cheats and does their own pointer
arithmetic or holds onto a copy of an indirection table pointer,
then the garbage collector won’t work properly.
» This also applies to copies of a direct pointer held in the register file
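A minimal sketch of relocation through a handle table. The `HandleHeap` class and its integer addresses are hypothetical; the point is only that moving an object touches one table entry, however many references to it exist.

```python
class HandleHeap:
    def __init__(self):
        self.memory = {}         # address -> object payload
        self.handle_table = []   # handle (index) -> current address
        self.next_addr = 0

    def alloc(self, payload):
        addr = self.next_addr
        self.next_addr += 1
        self.memory[addr] = payload
        self.handle_table.append(addr)
        return len(self.handle_table) - 1   # the mutator only sees the handle

    def deref(self, handle):
        return self.memory[self.handle_table[handle]]

    def relocate(self, handle, new_addr):
        old_addr = self.handle_table[handle]
        self.memory[new_addr] = self.memory.pop(old_addr)   # copy object data
        self.handle_table[handle] = new_addr                # the single update

heap = HandleHeap()
a = heap.alloc({"value": 42, "next": None})
b = heap.alloc({"value": 7, "next": a})     # b refers to a by handle
heap.relocate(a, 100)                       # move a; b is not touched
value_via_b = heap.deref(heap.deref(b)["next"])["value"]
```

`value_via_b` is still 42 after the move. The cost is the extra load on every dereference.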
- 46 -
More about Write Barriers

Compiler inserts code at every store of a pointer
» Detect relevant generation or region crossing
» If true, add to the remembered set of the target

Costs:
» Conditional branch on detecting barrier condition
» Adding to a remembered set:
 Calculation of address of given target region’s remembered set
 Load (of remembered set tail pointer)
 Store (of source object address)
 Check if remembered set is full (another conditional branch)
 If full, trigger collection of the region
 Otherwise, increment remembered set tail pointer
 And store it back
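The steps listed above map onto code roughly as follows. The fixed-size buffer, the `collect_region` callback, and the choice to reset the tail inside that callback are assumptions for illustration.

```python
class RememberedSet:
    def __init__(self, capacity):
        self.buffer = [None] * capacity
        self.tail = 0

def remember(remsets, target_region, source_addr, collect_region):
    rs = remsets[target_region]          # address of the region's remembered set
    tail = rs.tail                       # load the tail pointer
    rs.buffer[tail] = source_addr        # store the source object address
    if tail + 1 == len(rs.buffer):       # full? (second conditional branch)
        collect_region(target_region)    # trigger collection of the region
    else:
        rs.tail = tail + 1               # increment the tail and store it back

remsets = {2: RememberedSet(capacity=2)}
collected = []

def collect_region(region):
    collected.append(region)
    remsets[region].tail = 0             # the collection drains the set

remember(remsets, 2, 0x10, collect_region)
remember(remsets, 2, 0x20, collect_region)   # fills the set, triggers a collection
remember(remsets, 2, 0x30, collect_region)
```

Every recorded store pays for a load, a store, a compare, and a second store, and any pointer store may start a collection. Those are the costs that imprecise schemes try to avoid.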
- 47 -
Imprecise Remembered Sets

Adding to a precise remembered set is costly

Can trade off mutator time for collection time
» Keep an imprecise remembered set that tells the collector
where to look for barrier-crossing pointers
» Need to make sure collector will find all barrier-crossers, but
it might end up looking at some non-crossing pointers, also

Would also be nice to not worry about remembered set
capacity checks at each pointer store
- 48 -
Card Marking

“Cards” are just finer-granularity pages (e.g. 128B)
» E.g. each 4KB page has a 32-bit word associated with it (4096 / 128 = 32 cards, one bit per card)

On every barrier-crossing store, set the bit
corresponding to the card that contains the source of the pointer
- 49 -
Card Marking


At collection time, the marked cards are scanned, and
any pointers in those cards that refer to the region
being collected are added to the root set
Note that some pointers being scanned in a card are
not barrier crossers.
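A sketch of card marking with the sizes used above (128B cards, 4KB pages, one 32-bit mask per page). The slot dictionary and the helper names are hypothetical stand-ins for a real heap layout.

```python
CARD_SIZE = 128
PAGE_SIZE = 4096
CARDS_PER_PAGE = PAGE_SIZE // CARD_SIZE    # 32, so one 32-bit mask per page

card_masks = {}                            # page number -> 32-bit card mask

def mark_card(store_addr):
    """Barrier action: set the bit of the card containing the stored-to slot."""
    page, offset = divmod(store_addr, PAGE_SIZE)
    card_masks[page] = card_masks.get(page, 0) | (1 << (offset // CARD_SIZE))

def marked_cards():
    """Yield the (start, end) address range of every marked card."""
    for page, mask in sorted(card_masks.items()):
        for bit in range(CARDS_PER_PAGE):
            if mask & (1 << bit):
                start = page * PAGE_SIZE + bit * CARD_SIZE
                yield start, start + CARD_SIZE

def roots_from_cards(pointer_slots, in_collected_region):
    """Scan marked cards; keep slots that point into the collected region."""
    roots = []
    for start, end in marked_cards():
        for slot in sorted(a for a in pointer_slots if start <= a < end):
            if in_collected_region(pointer_slots[slot]):
                roots.append(slot)
    return roots

# slot address -> pointer value; the nursery is assumed to be below 0x1000
slots = {0x1008: 0x0050, 0x1010: 0x9000, 0x2000: 0x9000}
mark_card(0x1008)       # barrier-crossing store
mark_card(0x1008)       # a repeated store just sets the same bit again
roots = roots_from_cards(slots, lambda addr: addr < 0x1000)
```

Only slot 0x1008 becomes a root. Slot 0x1010 shares the marked card, so it is scanned and rejected, which is the imprecision traded for the cheap barrier.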
- 52 -
Card Marking vs Precise Remembered Sets

Precise write barrier check at each store
» Conditional branch on detecting barrier condition
» Adding to a remembered set:
 Calculation of address of given target region’s remembered set
 Load (of remembered set tail pointer)
 Store (of source object address)
 Check if remembered set is full (another conditional branch)
 If full, trigger collection of the region
 Otherwise, increment remembered set tail pointer
 And store tail pointer back
Card Marking
» Conditional branch on detecting barrier condition
» Marking a card:
 Calculation of address of given target region’s card mask
 Load (of card mask)
 OR bit corresponding to effective address of store
 Store (of card mask)
- 53 -
Card Marking vs Precise Remembered Sets

Card Marking
» Conditional branch on detecting barrier condition
» Marking a card:
 Calculation of address of given target region’s card mask
 Load (of card mask)
 OR bit corresponding to effective address of store
 Store (of card mask)
Marking a card is cheap, but there are
costs at collection time
» Cost proportional to number of cards in all *other* increments
than the one being collected
- 54 -
Card Marking

Need to know where the pointers in each card are
located
» Gets tricky with objects that span cards
» If you can guarantee object headers are start-of-card
aligned, can mark cards corresponding to the start of an
object and always scan the entire object

What about big objects (e.g. array of pointers)?
- 55 -
Card Marking
Or treat everything like a pointer, and do conservative collection
- 56 -
Card Marking
» Pros:
 No remembered set full condition means that you don’t have to expect that garbage collection might be triggered at every pointer store
 Means compiler/runtime doesn’t have to compute as many stack pointer maps
 Deals well with multiple stores to the same object
» Cons:
 More work at collection time to scan marked cards for pointers
 If used in a generational collector, the “old” generation is typically very large relative to the nursery, so scanning card masks for the entire old generation can be time consuming
- 57 -
Page protection tricks





Both write-barrier schemes discussed require a check
to determine if a pointer that crosses a generation
boundary is being stored
Some systems use page protection to write protect
the old generation
On a write-protection fault, the exception handler can
determine whether there is a barrier-crossing pointer
Eliminates the checking code at each pointer store
But exception is expensive and catches non-pointer
stores as well
» Typically only used in conservative systems where *all* stores
are potentially pointer stores
- 58 -
picoJava tricks

Processor designed to directly execute Java bytecode
» ISA treats pointers as a native data type
» We know based on the opcode if we are storing a pointer


Added hardware support for garbage collection
Wanted to enable variety of GC algorithms, not
specify a specific one that must be used
» As we’ve seen there are many tradeoffs to be made
- 59 -
picoJava tricks

Three main GC hardware features:
» Optional native support for handles
 A pointer with its least significant bit set was a handle
 All instructions could deal with direct and handled references
» Page-based Write Barrier
 A GC trap occurs if a pointer store occurs where the pointer value stored is in a different “GC page” than the target of the store
 GC page size was configurable from 8KB to 128KB
 This could be enabled independently of the tag-based barrier
• i.e. could use both, neither, or either
» Tag-based Write Barrier
 <next slide>
- 60 -
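The page-based trap condition can be modeled in software as a comparison of GC page numbers. This is only a sketch of the condition as stated on the slide, not a description of the actual hardware; the function names and the power-of-two page parameter are assumptions.

```python
def gc_page(addr, log2_page_size):
    """GC page number of an address, for a page of 2**log2_page_size bytes."""
    return addr >> log2_page_size

def page_barrier_traps(target_addr, stored_ptr, log2_page_size=13):
    """True if the stored pointer value lies in a different GC page than
    the target of the store. 13 -> 8KB pages, 17 -> 128KB pages."""
    if not 13 <= log2_page_size <= 17:
        raise ValueError("GC page size must be between 8KB and 128KB")
    return gc_page(target_addr, log2_page_size) != gc_page(stored_ptr, log2_page_size)

same_page = page_barrier_traps(0x0000, 0x1FFF)             # both in 8KB page 0
crossing = page_barrier_traps(0x0000, 0x2000)              # page 0 vs page 1
bigger_pages = page_barrier_traps(0x0000, 0x2000, 17)      # same 128KB page
```

With 8KB pages the second store traps; with 128KB pages the same store does not, which shows how the configurable page size tunes how often software is involved.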
picoJava tricks

Tag-based Write Barrier
» Each reference (pointer) had the most-significant two bits reserved as a tag
» Tags of store data and target object were concatenated, and used to index a table indicating whether to trap
» Typically, tags indicated generation; the table encoded trapping for stores to older objects pointing to younger objects
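A software model of the tag lookup described above. The 32-bit reference width, the order in which the two tags are concatenated, and the encoding of tags as generation numbers (0 youngest, 3 oldest) are assumptions made for this sketch.

```python
TAG_SHIFT = 30                     # assumes 32-bit references, tag in the top 2 bits

def tag(ref):
    return (ref >> TAG_SHIFT) & 0b11

def make_ref(generation, addr):
    return (generation << TAG_SHIFT) | addr

# 16-entry table indexed by (tag of stored data, tag of target object).
# Filled so that a store traps when the target object is older than the
# data being stored, i.e. an older-to-younger pointer is created.
TRAP_TABLE = [False] * 16
for data_gen in range(4):
    for target_gen in range(4):
        TRAP_TABLE[(data_gen << 2) | target_gen] = target_gen > data_gen

def store_traps(stored_ref, target_ref):
    return TRAP_TABLE[(tag(stored_ref) << 2) | tag(target_ref)]

young = make_ref(0, 0x100)
old = make_ref(2, 0x200)
old_to_young = store_traps(stored_ref=young, target_ref=old)    # traps
young_to_old = store_traps(stored_ref=old, target_ref=young)    # no trap
old_to_old = store_traps(stored_ref=old, target_ref=old)        # no trap
```

Because the policy lives in a table rather than in fixed logic, software can swap in a different table to get a different barrier without changing the compiled code.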
- 62 -
Bounded Pause Time/Concurrent Collection

So far we’ve discussed “stop the world” collectors
» The program execution stops while the collector operates
» Incremental / Generational approaches can make the periods
of collector operation shorter, on average

What if the garbage collector is interruptible by the
program/mutator?
» The GC only gets a fixed amount of time to make as much
progress as it can before program execution resumes
 Typically desirable for interactive systems

What if the garbage collector runs concurrently (on
another core)?
- 63 -
Bounded Pause Time/Concurrent Collection

Two main problems:
» Knowing when to revisit an object that is updated after you
have already visited it during collection
 Conceptually, the root-set & remembered set can be thought of as
objects as well in this manner
» When copying objects, how to manage operations potentially
happening on the object while being relocated?

These are hard problems to efficiently solve
» Esp. without hardware support
» One (easier) problem is parallel garbage collection which is
performing the GC on multiple cores in parallel to make the
pauses shorter…
- 64 -
If we made it to this slide…

I can talk for the rest of the time about picoJava HW
support for bounded pause time GC.
» During certain phases, flip the write barrier table to catch all
pointer stores
» Store objects above a certain size as handles
 PJ3 was going to support selective write mirroring for partially
relocated objects…
- 65 -