Multiprocessor Memory Allocation

Computer Systems Principles
Dynamic Memory Management
Emery Berger and Mark Corner
University of Massachusetts Amherst
UNIVERSITY OF MASSACHUSETTS AMHERST • Department of Computer Science
Dynamic Memory Management
 How the heap manager is implemented
– malloc, free
– new, delete
Memory Management
 Programs ask memory manager
– to allocate/free objects (or multiple pages)
 Memory manager asks OS
– to allocate/free pages (or multiple pages)
User Program
Objects (new, malloc)
Allocator (Java, libc)
Pages (mmap, brk)
Operating System
Memory Management
 Ideal memory manager:
– Fast
• Raw time, asymptotic runtime, locality
– Memory efficient
• Low fragmentation
 With multicore & multiprocessors:
– Scalable to multiple processors
 New issues:
– Secure from attack
– Reliable in face of errors
Memory Manager Functions
 Not just malloc/free
– realloc
• Change size of object, copying old contents
– ptr = realloc (ptr, 10);
• But: realloc(ptr, 0) = ?
• How about: realloc (NULL, 16) ?
 Other fun
– calloc
– memalign
 Needs ability to locate size & object start
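The two realloc questions above have standard answers. realloc(NULL, n) behaves like malloc(n). realloc(ptr, 0) is not portable: its result is implementation-defined in C11/C17, and C23 makes it undefined behavior. A minimal sketch of a wrapper that sidesteps the zero-size case and does not leak on failure; the name resize_buffer is hypothetical, not from the slides:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: resize *buf to new_size bytes.
 * Returns 0 on success, -1 on failure (*buf is left unchanged on failure). */
int resize_buffer(char **buf, size_t new_size)
{
    assert(buf != NULL);
    if (new_size == 0) {
        /* Avoid realloc(ptr, 0): free explicitly instead. */
        free(*buf);
        *buf = NULL;
        return 0;
    }
    /* realloc(NULL, n) is equivalent to malloc(n), so *buf may be NULL. */
    char *p = realloc(*buf, new_size);
    if (p == NULL)
        return -1;   /* the old block is still valid and still owned by *buf */
    *buf = p;        /* old contents were copied up to the smaller size */
    return 0;
}
```

Assigning the result to a temporary first matters: writing ptr = realloc(ptr, n) directly loses the only pointer to the old block if realloc fails.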
Fragmentation
 Intuitively, fragmentation stems from “breaking” up the heap into unusable spaces
– More fragmentation = worse utilization
 External fragmentation
– Wasted space outside allocated objects
 Internal fragmentation
– Wasted space inside an object
Classical Algorithms
 First-fit
– find first chunk large enough for the request
Classical Algorithms
 Best-fit
– find chunk that fits best
• Minimizes wasted space
Classical Algorithms
 Worst-fit
– find chunk that fits worst
– name is a misnomer!
– keeps large holes around
 Reclaim space: coalesce adjacent free objects into one big object
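The three classical policies differ only in which sufficiently large free chunk they pick. A minimal sketch, assuming the free chunks are given as a plain array of sizes (a real allocator would walk a free list and split the chosen chunk); the function names are illustrative:

```c
#include <stddef.h>

/* Each function scans an array of free-chunk sizes and returns the index of
 * the chosen chunk, or -1 if no chunk is large enough for the request. */

int first_fit(const size_t *chunks, int n, size_t request)
{
    for (int i = 0; i < n; i++)
        if (chunks[i] >= request)
            return i;               /* stop at the first chunk that fits */
    return -1;
}

int best_fit(const size_t *chunks, int n, size_t request)
{
    int best = -1;
    for (int i = 0; i < n; i++)     /* smallest chunk that still fits */
        if (chunks[i] >= request && (best < 0 || chunks[i] < chunks[best]))
            best = i;
    return best;
}

int worst_fit(const size_t *chunks, int n, size_t request)
{
    int worst = -1;
    for (int i = 0; i < n; i++)     /* largest chunk that fits */
        if (chunks[i] >= request && (worst < 0 || chunks[i] > chunks[worst]))
            worst = i;
    return worst;
}
```

First-fit can stop early, while best-fit and worst-fit must examine every chunk unless the chunks are kept sorted by size.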
Quick Activity
 Program asks for: 300,25,25,100
– First-fit and best-fit allocations go where?
– Which ones cannot be fulfilled?
 What about: 110,54,25,70,50?
Implementation Techniques
 Freelists
– Linked lists of objects in same size class
• Range of object sizes
 First-fit, best-fit in this context?
– Which is faster?
Implementation Techniques
 Segregated size classes
– Use free lists, but never coalesce or split
 Choice of size classes
– Exact
– Powers-of-two
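Power-of-two size classes make the class lookup cheap but round every request up, which is where internal fragmentation comes from. A small sketch, assuming the smallest class is 8 bytes (MIN_OBJECT_SIZE and the function names are illustrative):

```c
#include <stddef.h>

#define MIN_OBJECT_SIZE 8   /* assumed smallest size class */

/* Index of the power-of-two size class that holds `size` bytes:
 * class 0 = 8 bytes, class 1 = 16 bytes, class 2 = 32 bytes, ...
 * Assumes size is small enough that the doubling below cannot overflow. */
int size_class(size_t size)
{
    int cls = 0;
    size_t capacity = MIN_OBJECT_SIZE;
    while (capacity < size) {
        capacity <<= 1;
        cls++;
    }
    return cls;
}

/* Bytes actually reserved for an object in class `cls`. */
size_t class_size(int cls)
{
    return (size_t)MIN_OBJECT_SIZE << cls;
}

/* Internal fragmentation: bytes reserved but never requested. */
size_t internal_waste(size_t size)
{
    return class_size(size_class(size)) - size;
}
```

A 100-byte request lands in the 128-byte class and wastes 28 bytes; exact size classes would waste nothing but need many more free lists.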
Implementation Techniques
 Big Bag of Pages (BiBOP)
– Page or pages (multiples of 4K)
– Usually segregated size classes
 Header contains metadata
– Locate with bitmasking
 Limits external fragmentation
 Can be very fast
 Secret Sauce for project
– Use free objects to track free objects
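Both BiBOP tricks fit in a few lines: the page header is found by masking off the low bits of an object's address, and the free list is threaded through the free objects themselves, so it needs no extra memory. A minimal sketch for a single 4K page, assuming C11 aligned_alloc is available; PageHeader and the page_* functions are hypothetical names, not the project's API:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SIZE 4096u

/* Header at the start of every page; all objects in a page share one size. */
typedef struct PageHeader {
    size_t object_size;   /* size class served by this page */
    void  *free_list;     /* first free object, or NULL if the page is full */
} PageHeader;

/* Create one page of `object_size`-byte objects. The first word of each
 * free object stores a pointer to the next free object. */
PageHeader *page_create(size_t object_size)
{
    assert(object_size >= sizeof(void *) && object_size % sizeof(void *) == 0);
    PageHeader *page = aligned_alloc(PAGE_SIZE, PAGE_SIZE);
    if (page == NULL)
        return NULL;
    page->object_size = object_size;
    page->free_list = NULL;
    char *base = (char *)page;
    for (size_t off = sizeof(PageHeader); off + object_size <= PAGE_SIZE;
         off += object_size) {
        void **obj = (void **)(base + off);
        *obj = page->free_list;       /* push the object onto the free list */
        page->free_list = obj;
    }
    return page;
}

void *page_alloc(PageHeader *page)
{
    void **obj = page->free_list;
    if (obj != NULL)
        page->free_list = *obj;       /* pop */
    return obj;
}

/* Locate the header by clearing the low 12 bits of the object address. */
PageHeader *page_of(void *ptr)
{
    return (PageHeader *)((uintptr_t)ptr & ~(uintptr_t)(PAGE_SIZE - 1));
}

void page_free(void *ptr)
{
    PageHeader *page = page_of(ptr);
    *(void **)ptr = page->free_list;  /* push */
    page->free_list = ptr;
}
```

Because page_free recovers the header from the pointer alone, the caller never passes a size, which is also how the size lookup for realloc and free can work.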
Runtime Analysis
 Key components
– Cost of malloc (best, worst, average)
– Cost of free
– Cost of size lookup (for realloc & free)
Space Bounds
 Fragmentation worst-case for “optimal”: O(log(M/m))
– M = largest object size
– m = smallest object size
 Best-fit = O(M * m) !
Performance Issues
 Goal: perform well for typical programs
– Considerations:
• Internal fragmentation
• External fragmentation
• Headers (metadata)
• Scalability (later)
• Reliability, too
 “Canned” allocator often seen as slow
Custom Memory Allocation
 Programmers replace new/delete
 Reduce runtime
– Often
 Expand functionality
– Sometimes
 Reduce space
– Rarely
 Very common
– Apache, gcc, lcc, STL, database servers…
– Language-level support in C++
– Widely recommended: “Use custom allocators”
Drawbacks of Custom Allocators
 Avoiding system allocator:
– More code to maintain & debug
– Can’t use memory debuggers
– Not modular or robust:
• Mixing memory from custom and general-purpose allocators → crash!
 Increased burden on programmers
(I) Per-Class Allocators
 Recycle freed objects from a free list
a = new Class1;
b = new Class1;
c = new Class1;
delete a;
delete b;
delete c;
a = new Class1;
b = new Class1;
c = new Class1;
[Diagram: the Class1 free list holding the freed objects a, b, c]
+ Fast
  + Linked list operations
+ Simple
  + Identical semantics
  + C++ language support
- Possibly space-inefficient
(II) Custom Patterns
 Tailor-made to fit allocation patterns
– Example: 197.parser (natural language parser)
[Diagram: a char[MEMORY_LIMIT] array holding a, b, c, with d later placed where b was; end_of_array marks the top and moves as blocks are allocated and freed]
a = xalloc(8);
b = xalloc(16);
c = xalloc(8);
xfree(b);
xfree(c);
d = xalloc(8);
+ Fast
  + Pointer-bumping allocation
- Brittle
  - Fixed memory size
  - Requires stack-like lifetimes
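The xalloc/xfree sequence above can be reproduced with a pointer-bumping allocator that reclaims space only from the top of the array. This is a simplified sketch, not the actual 197.parser code: the Header layout, ALIGNMENT, and the MEMORY_LIMIT value are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MEMORY_LIMIT 4096            /* assumed size of the fixed array */
#define ALIGNMENT    16
#define NO_BLOCK     ((size_t)-1)

typedef struct {
    size_t prev;    /* offset of the previous block's header, or NO_BLOCK */
    int    freed;   /* set by xfree; space is reclaimed only from the top */
} Header;

/* Header size rounded up so payloads stay aligned. */
#define HEADER_SIZE ((sizeof(Header) + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT)

static _Alignas(ALIGNMENT) char memory[MEMORY_LIMIT];
static size_t end_of_array = 0;         /* offset of the first unused byte */
static size_t last_block   = NO_BLOCK;  /* offset of the topmost block */

void *xalloc(size_t size)
{
    if (size > MEMORY_LIMIT)
        return NULL;
    size = (size + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    if (end_of_array + HEADER_SIZE + size > MEMORY_LIMIT)
        return NULL;                    /* fixed memory size: no growth */
    Header *h = (Header *)(memory + end_of_array);
    h->prev = last_block;
    h->freed = 0;
    last_block = end_of_array;
    end_of_array += HEADER_SIZE + size; /* bump the pointer */
    return (char *)h + HEADER_SIZE;
}

void xfree(void *ptr)
{
    assert(ptr != NULL);
    Header *h = (Header *)((char *)ptr - HEADER_SIZE);
    h->freed = 1;
    /* Pop every freed block that now sits at the top of the array. */
    while (last_block != NO_BLOCK) {
        Header *top = (Header *)(memory + last_block);
        if (!top->freed)
            break;
        end_of_array = last_block;
        last_block = top->prev;
    }
}
```

With the slide's sequence, xfree(b) reclaims nothing because c is still above it; xfree(c) then pops both c and b, so d = xalloc(8) lands where b was. A block freed below a long-lived block stays unusable, which is why the scheme needs stack-like lifetimes.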
(III) Regions
 Separate areas, deletion only en masse
regioncreate(r)
regionmalloc(r, sz)
regiondelete(r)
+ Fast
  + Pointer-bumping allocation
  + Deletion of chunks
+ Convenient
  + One call frees all memory
- Risky
  - Dangling references
  - Too much space
 Increasingly popular custom allocator
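A region can be sketched as a linked list of large chunks with a bump pointer in the newest one. The three function names come from the slide; the Region and Chunk layouts, CHUNK_SIZE, and ALIGNMENT are assumptions for illustration, and the sketch assumes malloc returns 16-byte-aligned memory.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CHUNK_SIZE 4096   /* assumed default chunk payload size */
#define ALIGNMENT  16

typedef struct Chunk {
    struct Chunk *next;   /* previously filled chunk */
    size_t used;          /* payload bytes already handed out */
    size_t capacity;      /* payload bytes in this chunk */
} Chunk;

typedef struct Region {
    Chunk *chunks;        /* newest chunk first */
} Region;

/* Chunk header size rounded up so payloads stay aligned. */
#define CHUNK_HEADER ((sizeof(Chunk) + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT)

void regioncreate(Region *r)
{
    assert(r != NULL);
    r->chunks = NULL;
}

void *regionmalloc(Region *r, size_t sz)
{
    if (sz == 0)
        sz = 1;
    if (sz > SIZE_MAX - CHUNK_HEADER - ALIGNMENT)
        return NULL;
    sz = (sz + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    Chunk *c = r->chunks;
    if (c == NULL || c->capacity - c->used < sz) {
        /* Current chunk is full: start a new one, large enough for sz. */
        size_t cap = sz > CHUNK_SIZE ? sz : CHUNK_SIZE;
        c = malloc(CHUNK_HEADER + cap);
        if (c == NULL)
            return NULL;
        c->next = r->chunks;
        c->used = 0;
        c->capacity = cap;
        r->chunks = c;
    }
    void *p = (char *)c + CHUNK_HEADER + c->used;
    c->used += sz;        /* pointer-bumping allocation */
    return p;
}

/* One call frees every object in the region, chunk by chunk. */
void regiondelete(Region *r)
{
    Chunk *c = r->chunks;
    while (c != NULL) {
        Chunk *next = c->next;
        free(c);
        c = next;
    }
    r->chunks = NULL;
}
```

There is no per-object free, which is both the speed advantage and the source of the drawbacks listed above: memory is held until regiondelete, and any pointer into the region dangles afterwards.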
Custom Allocators Are Faster…
[Figure: Runtime - Custom Allocator Benchmarks. Normalized runtime (axis 0 to 1.75) of Custom vs. Win32 for 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, mudlle, grouped into non-regions and regions]
 As good as and sometimes much faster than Win32
Not So Fast…
[Figure: Runtime - Custom Allocator Benchmarks. Normalized runtime (axis 0 to 1.75) of Custom vs. Win32 vs. DLmalloc for 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, mudlle, grouped into non-regions and regions]
 DLmalloc (Linux): as fast or faster for most benchmarks
Are custom allocators a win?
 Generally not worth the trouble
– Just use good general-purpose allocator
• Alternative: reaps (hybrid of regions & heaps)
 However…
– Sometimes worth it for specialized apps
• Especially pool allocation, as in Apache
Problems w/Unsafe Languages
 C, C++: pervasive applications, but the languages are unsafe
 Numerous opportunities for security vulnerabilities, errors
– Double free
– Invalid free
– Uninitialized reads
– Dangling pointers
– Buffer overflows (stack & heap)
 Can memory allocator help?
Soundness for Erroneous Programs
 Normally: memory errors lead to crashes, but… consider an infinite-heap allocator:
– Every new returns fresh memory; delete is ignored
• No dangling pointers, invalid frees, double frees
– Every object infinitely large
• No buffer overflows, data overwrites
 Transparent to correct program
 “Erroneous” programs sound
Probabilistic Memory Safety
 Fully-randomized M-heap
– Approximates ∞ with M, e.g., M=2
– Increases odds of benign errors
– Probabilistic memory safety
• i.e., P(no error) ≥ n
– Errors independent across heaps
• E(users with no error) ≥ n * |users|
DieHard
 Key ideas:
– Isolate heap metadata
– Randomize allocation
– Trade space for robustness
– Replication (optional)
 Key influence in design of Windows 7’s Fault Tolerant Heap
Implementation Issues
 Conventional, freelist-based heaps
– Hard to randomize, protect from errors
• Double frees, heap corruption
 What about bitmaps? (one bit per word)
– Catastrophic fragmentation!
• Each small object likely to occupy one page
[Diagram: small objects scattered across pages, one object per page]
Randomized Heap Layout
[Diagram: per-size-class allocation bitmaps (00000001 and 1010) stored as metadata, separate from the heap]
 Bitmap-based, segregated size classes
– Bit represents one object of given size
• i.e., one bit = 2^(i+3) bytes, etc.
– Prevents fragmentation
Randomized Allocation
[Diagram: allocation bitmap 00000001 before the call, 00010001 after; the other size class's bitmap 1010 is unchanged]
 malloc(8):
– compute size class = ceil(log2(sz)) – 3
– randomly probe bitmap for zero-bit (free)
 Fast: runtime O(1)
– M=2 means E[# of probes] = 2
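The probe count follows from a short geometric-distribution argument. Assuming each probe picks a slot uniformly at random and the size class is never more than 1/M full, every probe finds a free slot with probability at least 1 - 1/M, so:

```latex
\[
  E[\#\text{probes}] \;\le\; \frac{1}{1 - \frac{1}{M}} \;=\; \frac{M}{M-1},
  \qquad\text{so for } M = 2:\quad E[\#\text{probes}] \le 2 .
\]
```

Equality holds when the class is at its maximum occupancy of 1/M; with fewer live objects the expected number of probes is smaller.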
Randomized Deallocation
[Diagram: allocation bitmap 00010001 before the call, 00000001 after; the other size class's bitmap 1010 is unchanged]
 free(ptr):
– Ensure object valid – aligned to right address
– Ensure allocated – bit set
– Resets bit
 Prevents invalid frees, double frees
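The randomized allocation and deallocation steps can be sketched together. This is an illustration of the technique, not DieHard's code: the dh_* names, NUM_CLASSES, SLOTS, and the use of rand() as the random source are all assumptions, and large objects are not handled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NUM_CLASSES 4    /* assumed: 8-, 16-, 32- and 64-byte objects */
#define SLOTS       64   /* assumed slots per size class (one bitmap word) */
#define M           2    /* at most 1/M of the slots may be in use */

typedef struct {
    uint64_t bitmap;     /* bit j set = slot j allocated; kept outside the heap */
    int      in_use;
    char    *heap;       /* SLOTS objects of 2^(i+3) bytes each */
} SizeClass;

static SizeClass classes[NUM_CLASSES];

/* ceil(log2(sz)) - 3, clamped at 0: class i holds objects of 2^(i+3) bytes. */
static int size_class(size_t sz)
{
    int i = 0;
    while (((size_t)8 << i) < sz)
        i++;
    return i;
}

int dh_init(void)
{
    for (int i = 0; i < NUM_CLASSES; i++) {
        classes[i].bitmap = 0;
        classes[i].in_use = 0;
        classes[i].heap = malloc((size_t)SLOTS << (i + 3));
        if (classes[i].heap == NULL)
            return -1;
    }
    return 0;
}

void *dh_malloc(size_t sz)
{
    if (sz > ((size_t)8 << (NUM_CLASSES - 1)))
        return NULL;                      /* large objects not handled here */
    int i = size_class(sz);
    SizeClass *c = &classes[i];
    if (c->in_use >= SLOTS / M)
        return NULL;                      /* keep the class at most 1/M full */
    for (;;) {
        int j = rand() % SLOTS;           /* stand-in for a seeded, better RNG */
        uint64_t bit = UINT64_C(1) << j;
        if (!(c->bitmap & bit)) {         /* zero bit = free slot */
            c->bitmap |= bit;
            c->in_use++;
            return c->heap + ((size_t)j << (i + 3));
        }
    }
}

/* Returns 0 if the object was freed, -1 if the pointer was rejected. */
int dh_free(void *ptr)
{
    uintptr_t addr = (uintptr_t)ptr;
    for (int i = 0; i < NUM_CLASSES; i++) {
        SizeClass *c = &classes[i];
        uintptr_t base = (uintptr_t)c->heap;
        size_t obj = (size_t)8 << i;
        if (addr < base || addr >= base + SLOTS * obj)
            continue;
        size_t off = addr - base;
        if (off % obj != 0)
            return -1;                    /* not an object start: invalid free */
        uint64_t bit = UINT64_C(1) << (off / obj);
        if (!(c->bitmap & bit))
            return -1;                    /* bit already clear: double free */
        c->bitmap &= ~bit;                /* reset bit */
        c->in_use--;
        return 0;
    }
    return -1;                            /* not a heap pointer: invalid free */
}
```

Since the bitmap lives outside the heap, an overflow inside an object cannot corrupt allocator metadata, and erroneous frees are rejected rather than acted on.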
Randomized Heaps & Reliability
 Objects randomly spread across heap
 Different run = different heap
– Errors across heaps independent
[Diagram: two randomized heaps, each with size classes of object size 2^(i+3) and 2^(i+4); objects 1 to 6 land in different slots in each heap. My Mozilla: “malignant” overflow. Your Mozilla: “benign” overflow.]
Increasing Reliability
 Space Shuttle
– 3 copies of everything (hw & sw)
– Votes on every action
 Failure: majority rules
DieHard - Replication
[Diagram: input is broadcast to replica1, replica2, and replica3 (seeded with seed1, seed2, seed3), which execute as separate processes; a vote over their results produces the output]
 Replication-based fault-tolerance
– Requires randomization! Makes errors independent
DieHard Results
 Empirical results
– Runtime overhead
– Error avoidance
• Injected faults & actual applications
 Analytical results (if time, pictures!)
– Buffer overflows
– Uninitialized reads
– Dangling pointer errors (the best)
Analytical Results: Buffer Overflows
 Model overflow as random write of live data
 Heap half full (max occupancy)
Analytical Results: Overflows
 Replicas: increase odds of avoiding overflow in at least one replica
 P(Overflow in all replicas) = (½)^3 = 1/8
 P(No overflow in ≥ 1 replica) = 1 - (½)^3 = 7/8
Empirical Results: Runtime
Analytical Results: Buffer Overflows
 F = free space
 H = heap size
 N = # objects worth of overflow
 k = replicas
 Overflow one object
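The formula that these variables belong to did not survive on the slide. A reconstruction, assuming the overflow lands on uniformly random heap locations and that the replicas are independent: an overflow of N objects' worth of data hits only free space in one replica with probability (F/H)^N, so the chance that at least one of k replicas is unaffected is

```latex
\[
  P(\text{overflow masked}) \;=\; 1 - \left[\,1 - \left(\frac{F}{H}\right)^{N}\right]^{k}
\]
```

As a check against the earlier replica slide: with the heap half full (F/H = 1/2), a one-object overflow (N = 1), and k = 3 replicas, this gives 1 - (1/2)^3 = 7/8.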
Error Avoidance
 Injected faults:
– Dangling pointers (@50%, 10 allocations)
• glibc: crashes; DieHard: 9/10 correct
– Overflows (@1%, 4 bytes over)
• glibc: crashes 9/10, infinite loop; DieHard: 10/10 correct
 Real faults:
– Avoids Squid web cache overflow
• Crashes Boehm-Demers-Weiser (BDW) collector & glibc
– Avoids dangling pointer error in Mozilla
• DoS in glibc & Windows
The End
Backup Slides
Lea Allocator (DLmalloc 2.7.0)
 Mature general-purpose allocator
 Optimized for common allocation patterns
– Per-size quicklists ≈ per-class allocation
 Deferred coalescing
– combining adjacent free objects
– Highly-optimized fastpath
 Space-efficient