Computer Systems Principles
Dynamic Memory Management
Emery Berger and Mark Corner
University of Massachusetts Amherst
UNIVERSITY OF MASSACHUSETTS AMHERST • Department of Computer Science

Dynamic Memory Management
  How the heap manager is implemented
    – malloc, free
    – new, delete

Memory Management
  Programs ask memory manager
    – to allocate/free objects (or multiple pages)
  Memory manager asks OS
    – to allocate/free pages (or multiple pages)
  [Diagram: User Program → objects (new, malloc) → Allocator (Java, libc) → pages (mmap, brk) → Operating System]

Memory Management
  Ideal memory manager:
    – Fast
      • Raw time, asymptotic runtime, locality
    – Memory efficient
      • Low fragmentation
  With multicore & multiprocessors:
    – Scalable to multiple processors
  New issues:
    – Secure from attack
    – Reliable in face of errors

Memory Manager Functions
  Not just malloc/free
    – realloc
      • Change size of object, copying old contents
    – ptr = realloc(ptr, 10);
      • But: realloc(ptr, 0) = ?
      • How about: realloc(NULL, 16)?
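The two realloc questions above have standard answers worth seeing in code. realloc(NULL, 16) behaves like malloc(16). For realloc(ptr, 0), the C standard has long left the result implementation-defined, and recent revisions go further and make it undefined, so portable code avoids that call. The sketch below is a minimal illustration under those facts; `resize_buffer` is a hypothetical helper name, not something from the slides.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not from the slides: resize *buf to new_size bytes.
 * Returns 0 on success and -1 on failure; on failure *buf is unchanged. */
int resize_buffer(char **buf, size_t new_size)
{
    assert(buf != NULL);

    if (new_size == 0) {
        /* Sidestep realloc(ptr, 0): release the block explicitly instead. */
        free(*buf);
        *buf = NULL;
        return 0;
    }

    /* If *buf is NULL, this call behaves like malloc(new_size). */
    char *grown = realloc(*buf, new_size);
    if (grown == NULL)
        return -1;      /* the old block is still valid and still ours */

    *buf = grown;       /* the old pointer must not be used after this */
    return 0;
}
```

Assigning the result to a temporary first matters: writing ptr = realloc(ptr, n) directly would leak the old block whenever realloc fails and returns NULL.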
Memory Manager Functions (continued)
  Other fun
    – calloc
    – memalign
  Needs ability to locate size & object start

Fragmentation
  Intuitively, fragmentation stems from “breaking” up heap into unusable spaces
    – More fragmentation = worse utilization
  External fragmentation
    – Wasted space outside allocated objects
  Internal fragmentation
    – Wasted space inside an object

Classical Algorithms
  First-fit
    – find first chunk large enough for the request
  Best-fit
    – find chunk that fits best
      • Minimizes wasted space
  Worst-fit
    – find chunk that fits worst
    – name is a misnomer!
    – keeps large holes around
  Reclaim space: coalesce free adjacent objects into one big object

Quick Activity
  Program asks for: 300, 25, 25, 100
    – First-fit and best-fit allocations go where?
    – Which ones cannot be fulfilled?
  What about: 110, 54, 25, 70, 50?

Implementation Techniques
  Freelists
    – Linked lists of objects in same size class
      • Range of object sizes
  First-fit, best-fit in this context?
    – Which is faster?
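The three classical policies differ only in which fitting chunk they pick, which a short sketch makes concrete. This is an illustrative version that scans an array of free-chunk sizes; the function names and the array representation are assumptions made for the example (a real allocator would walk a free list), and the chunk sizes in the note below are hypothetical, since the heap layout used in the Quick Activity is not preserved in this transcript.

```c
#include <assert.h>
#include <stddef.h>

/* Each function returns the index of the chosen free chunk, or -1 if
 * no chunk can hold `request` bytes. `chunks` lists free-chunk sizes
 * in address order. */

int first_fit(const size_t *chunks, int count, size_t request)
{
    assert(chunks != NULL || count == 0);
    for (int i = 0; i < count; i++)
        if (chunks[i] >= request)
            return i;               /* stop at the first chunk that fits */
    return -1;
}

int best_fit(const size_t *chunks, int count, size_t request)
{
    int best = -1;
    for (int i = 0; i < count; i++)
        if (chunks[i] >= request && (best < 0 || chunks[i] < chunks[best]))
            best = i;               /* smallest chunk that still fits */
    return best;
}

int worst_fit(const size_t *chunks, int count, size_t request)
{
    int worst = -1;
    for (int i = 0; i < count; i++)
        if (chunks[i] >= request && (worst < 0 || chunks[i] > chunks[worst]))
            worst = i;              /* largest chunk, leaving the biggest remainder */
    return worst;
}
```

With free chunks of 100, 500, 200, 300 and 600 bytes, a 212-byte request lands in the 500-byte chunk under first-fit, the 300-byte chunk under best-fit, and the 600-byte chunk under worst-fit. On the "which is faster?" question, first-fit can stop at the first match, while this best-fit always scans every chunk.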
Implementation Techniques
  Segregated size classes
    – Use free lists, but never coalesce or split
  Choice of size classes
    – Exact
    – Powers-of-two

Implementation Techniques
  Big Bag of Pages (BiBOP)
    – Page or pages (multiples of 4K)
    – Usually segregated size classes
  Header contains metadata
    – Locate with bitmasking
  Limits external fragmentation
  Can be very fast
  Secret Sauce for project
    – Use free objects to track free objects

Runtime Analysis
  Key components
    – Cost of malloc (best, worst, average)
    – Cost of free
    – Cost of size lookup (for realloc & free)

Space Bounds
  Fragmentation worst-case for “optimal”: O(log(M/m))
    – M = largest object size
    – m = smallest object size
  Best-fit = O(M * m)!

Performance Issues
  Goal: perform well for typical programs
    – Considerations:
      • Internal fragmentation
      • External fragmentation
      • Headers (metadata)
      • Scalability (later)
      • Reliability, too
  “Canned” allocator often seen as slow

Custom Memory Allocation
  Programmers replace new/delete
    – Reduce runtime: often
    – Expand functionality: sometimes
    – Reduce space: rarely
  Very common
    – Apache, gcc, lcc, STL, database servers…
    – Language-level support in C++
    – Widely recommended: “Use custom allocators”

Drawbacks of Custom Allocators
  Avoiding system allocator:
    – More code to maintain & debug
    – Can’t use memory debuggers
    – Not modular or robust:
      • Mix memory from custom and general-purpose allocators → crash!
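Two ideas from the Implementation Techniques slides, locating a BiBOP page header by bitmasking and using free objects to track free objects, fit in one small sketch. Everything named here is an assumption for illustration: `struct page`, `page_create`, `page_alloc` and `page_free` are hypothetical, a single 4K page holds one size class, and `aligned_alloc` stands in for pages that a real allocator would request from the OS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SIZE 4096u   /* the slides size pages in multiples of 4K */

/* Hypothetical page header: every object in the page has the same size. */
struct page {
    size_t obj_size;      /* size class of this page, in bytes */
    void  *free_list;     /* first free object, or NULL when the page is full */
};

/* Return an object to its page. The header is found by clearing the low
 * bits of the object's address; the object's first word then stores the
 * old list head, so free objects themselves hold the free list. */
void page_free(void *obj)
{
    struct page *pg =
        (struct page *)((uintptr_t)obj & ~(uintptr_t)(PAGE_SIZE - 1));
    *(void **)obj = pg->free_list;
    pg->free_list = obj;
}

/* Pop one object off the page's free list; NULL if none is left. */
void *page_alloc(struct page *pg)
{
    void *obj = pg->free_list;
    if (obj != NULL)
        pg->free_list = *(void **)obj;
    return obj;
}

/* Create a page-aligned page and thread all of its objects onto the list. */
struct page *page_create(size_t obj_size)
{
    assert(obj_size >= sizeof(void *) && obj_size % sizeof(void *) == 0);

    struct page *pg = aligned_alloc(PAGE_SIZE, PAGE_SIZE);
    if (pg == NULL)
        return NULL;
    pg->obj_size = obj_size;
    pg->free_list = NULL;

    /* Skip past the header, rounded up to a whole object slot. */
    size_t first = (sizeof *pg + obj_size - 1) / obj_size * obj_size;
    for (size_t off = first; off + obj_size <= PAGE_SIZE; off += obj_size)
        page_free((char *)pg + off);
    return pg;
}
```

Because the list lives inside the free objects, the only per-page metadata is the small header, and both allocation and deallocation take a constant number of steps.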
Drawbacks of Custom Allocators (continued)
  Increased burden on programmers

(I) Per-Class Allocators
  Recycle freed objects from a free list
    a = new Class1;
    b = new Class1;
    c = new Class1;
    delete a;
    delete b;
    delete c;
    a = new Class1;
    b = new Class1;
    c = new Class1;
  [Diagram: Class1 free list holding a, b, c]
  + Fast
  + Linked list operations
  + Simple
  + Identical semantics
  + C++ language support
  - Possibly space-inefficient

(II) Custom Patterns
  Tailor-made to fit allocation patterns
    – Example: 197.parser (natural language parser)
    a = xalloc(8);
    b = xalloc(16);
    c = xalloc(8);
    xfree(b);
    xfree(c);
    d = xalloc(8);
  [Diagram: char[MEMORY_LIMIT] array holding a, b (later d), c, with end_of_array marking the allocation frontier after each call]
  + Fast
  + Pointer-bumping allocation
  - Brittle
  - Fixed memory size
  - Requires stack-like lifetimes

(III) Regions
  Separate areas, deletion only en masse
    regioncreate(r)
    regionmalloc(r, sz)
    regiondelete(r)
  + Fast
  + Pointer-bumping allocation
  + Deletion of chunks
  + Convenient
  + One call frees all memory
  - Risky
  - Dangling references
  - Too much space
  Increasingly popular custom allocator

Custom Allocators Are Faster…
  [Figure: Runtime - Custom Allocator Benchmarks. Normalized runtime (axis 0 to 1.75) of Custom vs. Win32 on 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, mudlle, grouped into non-regions and regions]
  Custom allocators: as good as and sometimes much faster than Win32

Not So Fast…
  [Figure: Runtime - Custom Allocator Benchmarks. Same benchmarks and axis, comparing Custom, Win32, and DLmalloc]
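The region interface from the Regions slide, with its pointer-bumping allocation and single-call deletion, can be sketched briefly. The function names follow the slide, but the `struct region` layout, the extra capacity argument, and the fixed-size backing block are assumptions made to keep the example short; a fuller region would typically grow by chaining additional chunks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fixed-capacity region: one block, one bump offset. */
struct region {
    char  *base;      /* start of the backing block */
    size_t used;      /* bytes handed out so far */
    size_t capacity;  /* total bytes in the block */
};

/* Returns 0 on success, -1 if the backing block cannot be allocated. */
int regioncreate(struct region *r, size_t capacity)
{
    assert(r != NULL);
    r->base = malloc(capacity);
    r->used = 0;
    r->capacity = (r->base != NULL) ? capacity : 0;
    return (r->base != NULL) ? 0 : -1;
}

/* Pointer-bumping allocation: round the size up for alignment, then
 * advance the offset. There is no per-object header and no free(). */
void *regionmalloc(struct region *r, size_t sz)
{
    const size_t align = _Alignof(max_align_t);
    if (sz == 0 || sz > r->capacity)
        return NULL;
    size_t need = (sz + align - 1) / align * align;
    if (need > r->capacity - r->used)
        return NULL;                 /* region exhausted */
    void *p = r->base + r->used;
    r->used += need;
    return p;
}

/* One call releases every object in the region at once. */
void regiondelete(struct region *r)
{
    free(r->base);
    r->base = NULL;
    r->used = r->capacity = 0;
}
```

The sketch also shows the slide's risks: any pointer obtained from regionmalloc dangles after regiondelete, and memory for dead objects stays occupied until the whole region goes away.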
Not So Fast… (continued)
  DLmalloc (Linux): as fast or faster for most benchmarks

Are custom allocators a win?
  Generally not worth the trouble
    – Just use good general-purpose allocator
      • Alternative: reaps (hybrid of regions & heaps)
  However…
    – Sometimes worth it for specialized apps
      • Especially pool allocation, as in Apache

Problems w/Unsafe Languages
  C, C++: pervasive apps, but langs. unsafe
  Numerous opportunities for security vulnerabilities, errors
    – Double free
    – Invalid free
    – Uninitialized reads
    – Dangling pointers
    – Buffer overflows (stack & heap)
  Can memory allocator help?

Soundness for Erroneous Progs
  Normally: memory errors lead to crashes, but…consider infinite-heap allocator:
    – Every new is fresh; ignore delete
      • No dangling pointers, invalid frees, double frees
    – Every object infinitely large
      • No buffer overflows, data overwrites
  Transparent to correct program
  “Erroneous” programs sound

Probabilistic Memory Safety
  Fully-randomized M-heap
    – Approximates ∞ with M, e.g., M=2
    – Increases odds of benign errors
    – Probabilistic memory safety
      • i.e., P(no error) ≥ n
    – Errors independent across heaps
      • E(users with no error) ≥ n * |users|

DieHard
  Key ideas:
    – Isolate heap metadata
    – Randomize allocation
    – Trade space for robustness
    – Replication (optional)
  Key influence in design of Windows 7’s Fault-Tolerant Heap

Implementation Issues
  Conventional, freelist-based heaps
    – Hard to randomize, protect from errors
      • Double frees, heap corruption
  What about bitmaps? (one bit per word)
    – Catastrophic fragmentation!
      • Each small object likely to occupy one page
  [Diagram: objects (obj) scattered one per page]

Randomized Heap Layout
  [Diagram: metadata bitmap (e.g., 00000001, 1010) kept separate from the heap]
  Bitmap-based, segregated size classes
    – Bit represents one object of given size
      • i.e., one bit = 2^(i+3) bytes, etc.
    – Prevents fragmentation

Randomized Allocation
  [Diagram: metadata bitmap changes from 00000001 to 00010001 after the allocation]
  malloc(8):
    – compute size class = ceil(log2(sz)) - 3
    – randomly probe bitmap for zero-bit (free)
  Fast: runtime O(1)
    – M=2 means E[# of probes] = 2

Randomized Deallocation
  [Diagram: metadata bitmap changes from 00010001 back to 00000001 after the free]
  free(ptr):
    – Ensure object valid: aligned to right address
    – Ensure allocated: bit set
    – Resets bit
  Prevents invalid frees, double frees

Randomized Heaps & Reliability
  [Diagram: two heaps, each with size classes “object size = 2^(i+3)” and “object size = 2^(i+4)”; objects 1 to 6 land in different slots in each heap. My Mozilla: “malignant” overflow. Your Mozilla: “benign” overflow.]
  Objects randomly spread across heap
  Different run = different heap
    – Errors across heaps independent
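The randomized allocation and deallocation steps above can be sketched for a single size class. This is not DieHard's code: `struct mini_heap` and the `mh_` functions are hypothetical names, the heap has only 64 slots of 8 bytes, and the C library's `rand()` stands in for whatever random number generator a real implementation would use. The bitmap lives in a separate structure from the object storage, mirroring the "isolate heap metadata" idea.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define OBJ_SIZE  8u    /* one size class only: 8-byte objects */
#define NUM_SLOTS 64u   /* M = 2: at most NUM_SLOTS / 2 objects live at once */

struct mini_heap {
    unsigned char bitmap[NUM_SLOTS / 8];  /* one bit per object slot */
    size_t        live;                   /* number of allocated slots */
    char         *heap;                   /* object storage, kept apart */
};

int mh_init(struct mini_heap *h)
{
    assert(h != NULL);
    memset(h->bitmap, 0, sizeof h->bitmap);
    h->live = 0;
    h->heap = malloc(NUM_SLOTS * OBJ_SIZE);
    return (h->heap != NULL) ? 0 : -1;
}

void mh_destroy(struct mini_heap *h)
{
    free(h->heap);
    h->heap = NULL;
}

/* Randomly probe the bitmap until a zero bit (free slot) turns up. */
void *mh_malloc(struct mini_heap *h)
{
    if (h->live >= NUM_SLOTS / 2)
        return NULL;                      /* keep the heap at most half full */
    for (;;) {
        unsigned slot = (unsigned)rand() % NUM_SLOTS;
        unsigned char mask = (unsigned char)(1u << (slot % 8));
        if (!(h->bitmap[slot / 8] & mask)) {
            h->bitmap[slot / 8] |= mask;
            h->live++;
            return h->heap + (size_t)slot * OBJ_SIZE;
        }
    }
}

/* Validate the pointer before touching any state; returns -1 if ignored. */
int mh_free(struct mini_heap *h, void *ptr)
{
    uintptr_t addr = (uintptr_t)ptr, lo = (uintptr_t)h->heap;
    if (addr < lo || addr >= lo + NUM_SLOTS * OBJ_SIZE)
        return -1;                        /* not from this heap */
    size_t off = addr - lo;
    if (off % OBJ_SIZE != 0)
        return -1;                        /* not an object start: invalid free */
    size_t slot = off / OBJ_SIZE;
    unsigned char mask = (unsigned char)(1u << (slot % 8));
    if (!(h->bitmap[slot / 8] & mask))
        return -1;                        /* bit already clear: double free */
    h->bitmap[slot / 8] &= (unsigned char)~mask;
    h->live--;
    return 0;
}
```

Since at least half of the slots are always free, each probe succeeds with probability of at least 1/2, which is where the expected two probes for M=2 comes from. Invalid and double frees are simply ignored instead of corrupting a free list.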
Increasing Reliability
  Space Shuttle
    – 3 copies of everything (hw & sw)
    – Votes on every action
  Failure: majority rules

DieHard - Replication
  [Diagram: input is broadcast to replica1, replica2, replica3 (seeded with seed1, seed2, seed3); the replicas execute as separate processes, and a vote on their results produces the output]
  Replication-based fault-tolerance
    – Requires randomization! Makes errors independent

DieHard Results
  Empirical results
    – Runtime overhead
    – Error avoidance
      • Injected faults & actual applications
  Analytical results (if time, pictures!)
    – Buffer overflows
    – Uninitialized reads
    – Dangling pointer errors (the best)

Analytical Results: Buffer Overflows
  Model overflow as random write of live data
  Heap half full (max occupancy)

Analytical Results: Overflows
  Replicas: increase odds of avoiding overflow in at least one replica
  P(overflow in all replicas) = (½)^3 = 1/8
  P(no overflow in ≥ 1 replica) = 1 - (½)^3 = 7/8

Empirical Results: Runtime

Analytical Results: Buffer Overflows
  F = free space
  H = heap size
  N = # objects worth of overflow
  k = replicas
  Overflow one object

Error Avoidance
  Injected faults:
    – Dangling pointers (@50%, 10 allocations)
      • glibc: crashes; DieHard: 9/10 correct
    – Overflows (@1%, 4 bytes over)
      • glibc: crashes 9/10, inf loop; DieHard: 10/10 correct
  Real faults:
    – Avoids Squid web cache overflow
      • Crashes Boehm-Demers-Weiser (BDW) Collector & glibc
    – Avoids dangling pointer error in Mozilla
      • DoS in glibc & Windows

The End

Backup Slides

Lea Allocator (Dlmalloc 2.7.0)
  Mature general-purpose allocator
  Optimized for common allocation patterns
    – Per-size quicklists ≈ per-class allocation
  Deferred coalescing
    – combining adjacent free objects
    – Highly-optimized fastpath
  Space-efficient
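Coalescing, described above as combining adjacent free objects, can be illustrated with a small sketch. This is not dlmalloc's implementation: `struct chunk` and `coalesce` are hypothetical, and free chunks are modeled as an array sorted by start address, where a general-purpose allocator would more likely use boundary tags or linked lists.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for one free chunk. */
struct chunk {
    size_t start;   /* offset of the chunk within the heap */
    size_t size;    /* length of the chunk in bytes */
};

/* Merge free chunks that touch each other. `c` must be sorted by start
 * address and must not overlap. Merging happens in place; the return
 * value is the number of chunks that remain. */
size_t coalesce(struct chunk *c, size_t n)
{
    if (n == 0)
        return 0;

    size_t out = 0;                              /* last chunk kept so far */
    for (size_t i = 1; i < n; i++) {
        size_t end = c[out].start + c[out].size;
        assert(c[i].start >= end);               /* sorted, non-overlapping */
        if (c[i].start == end)
            c[out].size += c[i].size;            /* adjacent: absorb it */
        else
            c[++out] = c[i];                     /* gap: keep it separate */
    }
    return out + 1;
}
```

A deferred scheme would run a pass like this lazily, for instance only when a request cannot be satisfied, instead of merging on every free.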