Upper Bound for Defragmenting Buddy Heaps
Delvin C. Defoe
Sharath R. Cholleti, Ron K. Cytron
Distributed Object Computing Laboratory
Washington University
St. Louis, MO
Research funded by
DARPA under contract F33615-00-C-1697 and
The Chancellor's Graduate Fellowship Program at Washington University
LCTES Conference
Chicago, IL
June 15 – 17, 2005
Copyright © 2005 Delvin Defoe
Outline
• Motivation
• Buddy Allocator
• Storage requirements for a defragmentation-free buddy allocator
• Algorithm for on-demand defragmentation
• Experiments
• Conclusions and future work

Motivation
• Dynamic storage allocation occurs
• Constraints:
  - Embedded: has small memory footprint
  - Real-time: must respond to storage-management requests in reasonably bounded time
[Figure: Program -> (allocation and deallocation requests) -> Allocator -> (allocations and deallocations) -> Heap]

Storage Allocation - Free List
• Linked list of free blocks
• Search for desired fit
• Search time O(n) for n blocks in the list

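The O(n) search can be illustrated with a minimal first-fit sketch. The representation below (a Python list of `(address, size)` pairs) and the name `first_fit` are assumptions made for illustration; the slides do not specify a layout or a fit policy.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: each free block is (address, size),
# and the list order is the search order.
FreeList = List[Tuple[int, int]]

def first_fit(free_list: FreeList, request: int) -> Optional[int]:
    """Return the address of the first block that can hold `request` bytes.

    The loop walks the list front to back, so it may inspect all n blocks.
    """
    for i, (addr, size) in enumerate(free_list):
        if size >= request:
            if size == request:
                free_list.pop(i)  # exact fit: unlink the block
            else:
                # Carve the request off the front and keep the tail free.
                free_list[i] = (addr + request, size - request)
            return addr
    return None  # no single block is large enough

free_list = [(0, 4), (16, 2), (32, 8)]
addr = first_fit(free_list, 6)  # skips the 4- and 2-byte blocks
```

Here the request for 6 bytes is served only by the last block, so every entry is visited before a fit is found.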
Buddy Allocator
• Knuth's Binary Buddy Allocator
• Free lists segregated by size (2^k)
• Reasonably bounded execution time
  - For allocation
  - For deallocation
• Search time O(1) for n available free blocks
[Figure: segregated free lists for block sizes 128, 64, 32, 16, 8, 4, 2, 1]

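A minimal sketch of allocation from size-segregated free lists follows. The class name `BuddyAllocator`, the use of Python sets for the lists, and the choice of the smallest adequate size class are assumptions for illustration, not the implementation used in the talk.

```python
class BuddyAllocator:
    """Binary buddy allocator over a heap of 2**max_order bytes (illustrative)."""

    def __init__(self, max_order: int) -> None:
        self.max_order = max_order
        # free_lists[k] holds the addresses of free blocks of size 2**k.
        self.free_lists = [set() for _ in range(max_order + 1)]
        self.free_lists[max_order].add(0)  # initially the whole heap is one free block

    def allocate(self, size: int):
        """Return the address of a block of at least `size` bytes, or None."""
        order = max(size - 1, 0).bit_length()  # smallest k with 2**k >= size
        # Find the smallest non-empty size class that is large enough;
        # this inspects at most max_order + 1 lists, independent of block count.
        k = order
        while k <= self.max_order and not self.free_lists[k]:
            k += 1
        if k > self.max_order:
            return None
        addr = min(self.free_lists[k])  # lowest address in that class
        self.free_lists[k].remove(addr)
        # Split down to the requested order; each upper half goes back on its list.
        while k > order:
            k -= 1
            self.free_lists[k].add(addr + (1 << k))
        return addr

heap = BuddyAllocator(4)   # 16-byte heap
first = heap.allocate(1)   # splits 16 -> 8 -> 4 -> 2 -> 1
```

The number of lists examined depends only on the number of size classes, which is what keeps the execution time bounded regardless of how many free blocks exist.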
Motivation - Fragmentation
• The heap becomes fragmented
• There are enough free bytes to satisfy an allocation, but they are not contiguous
[Figure: Program -> (allocation and deallocation requests) -> Allocator -> (allocations and deallocations) -> Heap]

Fragmentation – Solutions?
• Use larger heap
  - How big must it be to avoid fragmentation?
  - We answer this question later
• Defragmentation
  - Coalescing: merging neighboring free blocks to form larger blocks - how effective?
  - This occurs naturally in the buddy system
  - Reduces the need for compaction
[Figure: Heap]

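Coalescing in a buddy system can be sketched as follows. The function name `free_and_coalesce` and the set of `(address, size)` pairs are hypothetical; the only property relied on is the standard one that a block's buddy is found by flipping the address bit equal to the block size.

```python
def free_and_coalesce(free_blocks: set, addr: int, size: int, heap_size: int) -> tuple:
    """Free the block (addr, size) and merge it with its buddy while possible.

    `free_blocks` is a set of (address, size) pairs with power-of-two sizes.
    Returns the (address, size) of the block that ends up in the free set.
    """
    while size < heap_size:
        buddy = addr ^ size  # the buddy differs from addr only in the `size` bit
        if (buddy, size) not in free_blocks:
            break  # buddy is allocated or split, so merging stops here
        free_blocks.remove((buddy, size))
        addr = min(addr, buddy)  # the merged block starts at the lower address
        size *= 2
    free_blocks.add((addr, size))
    return addr, size

# 8-byte heap in which only byte 0 is allocated.
blocks = {(1, 1), (2, 2), (4, 4)}
merged = free_and_coalesce(blocks, 0, 1, 8)  # cascades up to the whole heap
```

Freeing the last live byte merges three times in a row, which is why the buddy system rarely needs a separate coalescing pass.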
Fragmentation – Solutions?
Defragmentation
• Compaction: moving everything to one end of the heap - how costly?
  - Can be unbounded
[Figure: Heap]

Fragmentation – Solutions?
• Compaction can be unbounded
  - Terrible for Embedded Systems
    - Limited memory increases pressure on allocator, leads to more fragmentation
  - Terrible for Real-Time Systems
    - Time and duration of compaction are unpredictable, greatly complicating scheduling
• We explore these issues further in the context of the Buddy Allocator

Variants of Buddy Allocator
• We explore 2 variants of the Buddy Allocator:
  - Address Ordered Buddy Allocator
    - Expedient for our proofs
  - Address Ordered Best-Fit Buddy Allocator
    - More realistic: most buddy allocators operate this way
• Both prefer lower addresses of the heap

Tree Representation of Buddy Heap
[Figure: binary tree of a 16-byte buddy heap with nodes of size 16, 8, 4, 2, and 1 bytes; legend: free, allocated, not free]

Address Ordered Buddy Allocator
Suppose we want a block of size 1 byte
[Figure: buddy-heap tree of a 16-byte heap before the allocation; legend: free, allocated, not free]

Address Ordered Buddy Allocator
Split first available 4-byte block
[Figure: buddy-heap tree of a 16-byte heap after the split; labels: Split, Allocated; legend: free, allocated, not free]

Address Ordered Best-Fit Buddy Allocator
Suppose we want a block of size 1 byte
[Figure: buddy-heap tree of a 16-byte heap before the allocation; legend: free, allocated, not free]

Address Ordered Best-Fit Buddy Allocator
Use first available 1-byte block
[Figure: buddy-heap tree of a 16-byte heap after the allocation; label: Allocated; legend: free, allocated, not free]

Buddy Allocators
• Address Ordered Buddy Allocator
  - Uses first fit allocation
  - Used in our initial analysis
• Address Ordered Best-Fit Buddy Allocator
  - Uses best fit allocation
  - Used in our implementation
  - Proofs extend to this variant as well

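The difference between the two policies can be sketched on a small heap. The bitmap model, the helper names, and the 16-byte example layout below are assumptions chosen for illustration; they do not reproduce the exact trees drawn on the slides.

```python
def free_blocks(used, addr=0, size=None):
    """List the maximal free buddy blocks as (address, size), in address order.

    `used` is a per-byte occupancy list whose length is a power of two.
    """
    if size is None:
        size = len(used)
    if not any(used[addr:addr + size]):
        return [(addr, size)]  # entirely free: one coalesced block
    if size == 1:
        return []              # a single occupied byte
    half = size // 2
    return free_blocks(used, addr, half) + free_blocks(used, addr + half, half)

def address_ordered_first_fit(blocks, request):
    """Lowest-address free block that is large enough (it may need splitting)."""
    fits = [b for b in blocks if b[1] >= request]
    return min(fits, key=lambda b: b[0], default=None)

def address_ordered_best_fit(blocks, request):
    """Smallest adequate free block; ties are broken by lowest address."""
    fits = [b for b in blocks if b[1] >= request]
    return min(fits, key=lambda b: (b[1], b[0]), default=None)

# Hypothetical 16-byte heap: free blocks are 4 bytes at address 4 and 1 byte at address 9.
used = [False] * 16
for a in (0, 1, 2, 3, 8, 10, 11, 12, 13, 14, 15):
    used[a] = True

blocks = free_blocks(used)
first = address_ordered_first_fit(blocks, 1)  # the 4-byte block, which would be split
best = address_ordered_best_fit(blocks, 1)    # the existing 1-byte block
```

For a 1-byte request, first fit splits the 4-byte block at the lower address, while best fit reuses the 1-byte block and leaves the larger block intact.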
How much extra storage to avoid defragmentation?
• Background :: Maxlive
  - Maximum number of bytes the program uses at any given time during its lifetime
  - Denoted by M
  - Rounded to next power of 2
• Background :: Max-blocksize
  - Size of the largest block the program can allocate
  - Denoted by n
  - n < M

Tight Bound
• Consider an Address-Ordered Buddy Allocator
  - Maxlive: M
  - Max-blocksize: n
  - n < M
• S(n) = M(log n + 2) / 2 bytes of storage is necessary and sufficient to avoid defragmentation
  - log n denotes the base-2 logarithm of n

Tight Bound (2)
• Let n = 2^k
• Thus S(n) = M(log n + 2) / 2 = M(k + 2) / 2
• When n = 1: S(1) = M(0 + 2) / 2 = M
• When n = 2: S(2) = M(1 + 2) / 2 = 3M / 2

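The special cases can be checked numerically. The function name `storage_bound` is hypothetical, and the sketch assumes that both M and n are powers of two, as the slides do.

```python
def storage_bound(maxlive: int, max_blocksize: int) -> int:
    """S(n) = M (log2 n + 2) / 2, with M = maxlive and n = max-blocksize."""
    # Assumption: n is a power of two, so log2 n is the exponent k.
    assert max_blocksize > 0 and max_blocksize & (max_blocksize - 1) == 0
    k = max_blocksize.bit_length() - 1
    return maxlive * (k + 2) // 2

M = 8
bounds = {n: storage_bound(M, n) for n in (1, 2, 4)}
# bounds[1] is M itself and bounds[2] is 3M/2, matching the two cases on the slide.
```

Each doubling of the max-blocksize adds another M/2 bytes to the bound.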
Proof
• Necessary component
  - Show that there is a program that requires S(n) = M(log n + 2) / 2 bytes of storage
• Sufficiency component
  - Show that the allocator does not need to allocate an n-byte block beyond S(n) = M(log n + 2) / 2 bytes in the heap

Proof Idea: Necessary
• M = 8 bytes
• M bytes are allocated
• S(1) bytes used
[Figure: buddy-heap tree of a 16-byte heap; the first M bytes are marked]

Proof Idea: Necessary
• M = 8 bytes
• Deallocate every other block
• M/2 bytes allocated
• M/2 bytes available
[Figure: buddy-heap tree of a 16-byte heap; the first M bytes are marked]

Proof Idea: Necessary
• M = 8 bytes
• Allocate blocks of size 2 bytes (k = 1)
• M bytes allocated
• S(2) bytes used
[Figure: buddy-heap tree of a 16-byte heap; the first M bytes and the following M/2 bytes are marked]

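The three steps of this construction can be replayed in a small simulation. The per-byte bitmap and the helper names are simplifying assumptions: taking the lowest free, size-aligned slot is used here as a stand-in for an address-ordered buddy allocator with full coalescing.

```python
def allocate(used, size):
    """Address-ordered allocation: take the lowest free, size-aligned slot."""
    for addr in range(0, len(used), size):
        if not any(used[addr:addr + size]):
            used[addr:addr + size] = [True] * size
            return addr
    raise MemoryError("no free aligned slot")

def release(used, addr, size):
    used[addr:addr + size] = [False] * size

def high_water(used):
    """Number of heap bytes up to and including the highest live byte."""
    return max((i + 1 for i, u in enumerate(used) if u), default=0)

M = 8
used = [False] * (2 * M)  # a heap comfortably larger than S(2)

ones = [allocate(used, 1) for _ in range(M)]        # step 1: M one-byte blocks
after_step1 = high_water(used)                      # equals S(1) = M

for addr in ones[1::2]:                             # step 2: free every other block
    release(used, addr, 1)
live_after_step2 = sum(used)                        # M/2 bytes remain live

twos = [allocate(used, 2) for _ in range(M // 4)]   # step 3: M/2 bytes as 2-byte blocks
after_step3 = high_water(used)                      # equals S(2) = 3M/2
```

After step 2 every 2-byte slot in the first M bytes still holds one live byte, so the 2-byte requests are forced past M even though M/2 bytes are free.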
Proof Idea: Necessary
• There is a program that requires S(n) = M(log n + 2) / 2 bytes of storage
• S(n) = M(log n + 2) / 2 bytes of storage are necessary for an Address-Ordered Buddy Allocator to avoid defragmentation

Proof Idea: Sufficiency
Example 1
• Suppose a heap of S(1) bytes
• M = 8 bytes
• 6 bytes allocated
• 2 bytes available
• Need to allocate 1 byte
[Figure: buddy-heap tree of a 16-byte heap; the first M bytes are marked]

Proof Idea: Sufficiency
Example 1
• Suppose a heap of S(1) bytes
• M = 8 bytes
• 7 bytes allocated
• 1 byte available
• 1-byte block allocated within S(1) bytes
[Figure: buddy-heap tree of a 16-byte heap; the first M bytes are marked]

Proof Idea: Sufficiency
Example 2
• Suppose a heap of S(2) bytes
• M = 8 bytes
• 6 bytes allocated
• 2 bytes available
• Need to allocate a 2-byte block
[Figure: buddy-heap tree of a 16-byte heap; the first M bytes and the following M/2 bytes are marked]

Proof Idea: Sufficiency
Example 2
• Suppose a heap of S(2) bytes
• M = 8 bytes
• M bytes allocated
• 2-byte block allocated within first S(2) bytes
[Figure: buddy-heap tree of a 16-byte heap; the first M bytes and the following M/2 bytes are marked]

Proof Idea: Sufficiency
• The allocator does not need to allocate an n-byte block beyond S(n) = M(log n + 2) / 2 bytes in the heap
• S(n) = M(log n + 2) / 2 bytes of storage are sufficient for an Address-Ordered Buddy Allocator to avoid defragmentation

Summary
• For Address Ordered Buddy Allocator
  - S(n) = M(log n + 2) / 2 bytes of storage are necessary and sufficient for avoiding defragmentation
• Same is true for realistic (best-fit) buddy allocator
  - Show that any buddy allocator can be made to behave like an address ordered one

Proof Idea: All Buddy Allocators
• Show that any buddy allocator can be made to behave as an Address Ordered Buddy Allocator
• Need to allocate a 2-byte block
• Pick any buddy allocator A
[Figure: buddy-heap tree of a 16-byte heap; regions labeled M and M/2]

Proof Idea: All Buddy Allocators
• Show that any buddy allocator can be made to behave as an Address Ordered Buddy Allocator
• Need to allocate a 2-byte block
• Pick any buddy allocator A
[Figure: buddy-heap tree of a 16-byte heap; A allocates a 2-byte block; regions labeled M and M/2; annotation "M/2 ???"]

Proof Idea: All Buddy Allocators
• Show that any buddy allocator can be made to behave as an Address Ordered Buddy Allocator
• Need to allocate a 2-byte block
• Pick any buddy allocator A
[Figure: buddy-heap tree of a 16-byte heap; labels: allocate, X; regions labeled M and M/2; annotation "M/2 ???"]

Summary
• We can force any buddy allocator to behave like an Address Ordered Buddy Allocator
• For any Buddy Allocator
  - S(n) = M(log n + 2) / 2 bytes of storage are necessary and sufficient for avoiding defragmentation

Research Questions
• How big should the heap be to avoid defragmentation? Answer:
  - M(log n + 2) / 2
• How big does this get in practice?

  Max-blocksize (n)    Inflation of heap
  KB                   ≥ 6
  MB                   ≥ 11
  GB                   ≥ 16

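The inflation factors follow directly from the bound. The worked arithmetic below assumes that KB, MB, and GB stand for max-blocksizes of at least 2^10, 2^20, and 2^30 bytes; that reading is an assumption, although it reproduces the three table entries.

```latex
\frac{S(n)}{M} = \frac{\log_2 n + 2}{2}
\qquad\Longrightarrow\qquad
\begin{aligned}
n = 2^{10}:&\quad \frac{10 + 2}{2} = 6,\\
n = 2^{20}:&\quad \frac{20 + 2}{2} = 11,\\
n = 2^{30}:&\quad \frac{30 + 2}{2} = 16.
\end{aligned}
```

Under that reading, a program whose largest block is in the megabyte range would need a heap at least 11 times its maxlive to rule out defragmentation.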
Research Questions
• What if we have a more reasonably sized heap - 2M?
• How do we defragment on-demand to satisfy a single allocation request?
  - NP-hard
  - Use greedy heuristic

Greedy Heuristic for 2M Heap
• Select minimally occupied block
  - Naïve search time is O(M)
• Relocate data to a partially occupied block
• Reduce search time using heap manager algorithm
  - Tree data structure keeps track of live bytes in sub-tree rooted at each node
  - Search time becomes O(M/s) for an s-byte block
  - Nodes updated with allocation/deallocation

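One way to picture the bookkeeping is a complete binary tree stored in an array, with each node holding the live-byte count of the block it covers. The class `LiveByteTree`, its array layout, and the per-byte update loop are assumptions for illustration; the heap manager algorithm in the paper may organize its nodes differently.

```python
class LiveByteTree:
    """Complete binary tree over a heap whose size is a power of two.

    Node 1 is the root (the whole heap); node i has children 2i and 2i + 1.
    live[i] is the number of live bytes in the block that node i covers.
    """

    def __init__(self, heap_size: int) -> None:
        self.heap_size = heap_size
        self.live = [0] * (2 * heap_size)

    def update(self, addr: int, size: int, sign: int) -> None:
        """Record an allocation (sign=+1) or deallocation (sign=-1) of
        `size` bytes starting at `addr`.

        Simple rather than fast: every byte updates its leaf-to-root path.
        """
        for byte in range(addr, addr + size):
            node = self.heap_size + byte  # leaf covering this byte
            while node >= 1:
                self.live[node] += sign
                node //= 2

    def min_occupied(self, size: int):
        """Return (address, live_bytes) of the least-occupied `size`-byte block.

        Only the heap_size / size nodes on one level are scanned.
        """
        first = self.heap_size // size  # index of the leftmost node on that level
        best = min(range(first, 2 * first), key=lambda i: self.live[i])
        return (best - first) * size, self.live[best]

tree = LiveByteTree(32)
for a in (0, 1, 2, 8, 9, 16, 24, 25):  # live 1-byte objects
    tree.update(a, 1, +1)
candidate = tree.min_occupied(8)       # least-occupied 8-byte chunk
```

Because the counts are maintained incrementally, finding the candidate for an s-byte request only looks at one node per s-byte chunk instead of every byte in the heap.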
On-Demand Defragmentation
• M = 16 bytes
• Need to allocate an 8-byte block
• No free 8-byte block available
[Figure: 32-byte heap drawn as a tree with levels 32, 16, 8, 4, 2, 1; the four 8-byte chunks hold 3, 2, 1, and 2 live bytes]

On-Demand Defragmentation
• M = 16 bytes
• Need to allocate an 8-byte block
• Relocate from minimally occupied chunk
• 8-byte block now available
[Figure: 32-byte heap drawn as a tree with levels 32, 16, 8, 4, 2, 1; the four 8-byte chunks hold 4, 2, 0, and 2 live bytes]

On-Demand Defragmentation
• M = 16 bytes
• 8-byte block allocated
[Figure: 32-byte heap drawn as a tree with levels 32, 16, 8, 4, 2, 1; the four 8-byte chunks hold 4, 2, 8, and 2 live bytes]

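The relocation step in this example can be sketched at byte granularity. The function `defragment_for` is hypothetical, and it assumes every live byte is an independently movable 1-byte object and that some other partially occupied chunk has room; real objects have sizes and alignment that this sketch ignores.

```python
def defragment_for(used, size):
    """Free one aligned `size`-byte chunk by relocating its live bytes.

    `used` is a per-byte occupancy list. Returns (chunk_address, bytes_moved).
    """
    chunks = range(0, len(used), size)
    live = {c: sum(used[c:c + size]) for c in chunks}
    victim = min(chunks, key=lambda c: live[c])  # minimally occupied chunk
    moved = 0
    for src in range(victim, victim + size):
        if not used[src]:
            continue
        # Destination: lowest free byte in another, partially occupied chunk.
        # Assumption: such a byte exists; otherwise next() raises StopIteration.
        dst = next(b for c in chunks if c != victim and 0 < live[c] < size
                   for b in range(c, c + size) if not used[b])
        used[src], used[dst] = False, True
        live[(dst // size) * size] += 1
        moved += 1
    return victim, moved

# 32-byte heap whose 8-byte chunks hold 3, 2, 1, and 2 live bytes.
used = [False] * 32
for a in (0, 1, 2, 8, 9, 16, 24, 25):
    used[a] = True

chunk, moved = defragment_for(used, 8)
occupancy = [sum(used[c:c + 8]) for c in range(0, 32, 8)]  # per-chunk live bytes
```

Only the single byte in the third chunk moves, leaving that chunk empty and ready to hold the requested 8-byte block.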
Worst Case Relocation
• Amount of storage relocated in the worst case to get a free block of s bytes
  - < s bytes
• Total cost of using the Heap Manager Algorithm to make an s-byte block available
  - Storage Bound: O(M)
  - Time Bound: O(M s^0.695)

Relocation in Practice: 2M Heap
• Theorem:
  - Greedy algorithm of selecting a minimally occupied chunk for relocation moves less than twice the storage an optimal algorithm would move
• Proof in paper

Experiments
• SPEC JVM 98 Benchmarks
  - 2M heap: no relocation was needed
  - M heap: 40% of programs needed relocation
• Compared algorithm with
  - Left-first compaction
  - Right-first compaction

Experiments
[Figure: bar chart "Java Benchmarks"; y-axis: Kilo Bytes Relocated (0 to 100); x-axis: Programs(size): check1, db1, jess1, javac10, jess10, jess100; series: Relocation With Our Approach, Left First Compaction; bar labels range from 0.01 to 94.38]

Experiments
[Figure: bar chart "Java Benchmarks"; y-axis: Kilo Bytes Relocated (0 to 9); x-axis: Programs(size): check1, db1, jess1, javac10, jess10, jess100; series: Relocation With Our Approach, Right First Compaction; bar labels range from 0.01 to 7.97]

Conclusions and Future Work
• Tight upper bound for buddy allocator
• On-demand defragmentation
• Experimentation:
  - 2M heap is good enough in practice
  - Outperforms extant compaction methods
• Implement in commercial JVM
• Explore tradeoffs between bookkeeping and bytes moved for reasonably bounded relocation

Questions?