Practical, transparent operating system support for superpages

Juan Navarro, Sitaram Iyer, Peter Druschel, Alan Cox (Rice University)
Appears in: Fifth Symposium on Operating Systems Design and Implementation (OSDI 2002)
Presented by: David R. Choffnes
CS 443 Advanced OS, Spring 2005
Outline
The superpage problem
Related Approaches
Design
Implementation
Evaluation
Conclusion
Introduction
TLB coverage
– Definition
– Effect on performance
Superpages
– Wasted memory
– Fragmentation
Contribution
– General, transparent superpages
– Deals with fragmentation
– Contiguity-aware page replacement algorithm
– Demotion/eviction of dirty superpages
The Superpage Problem
TLB coverage trend
– TLB coverage as a percentage of main memory has dropped by a factor of roughly 1000 over the last 15 years
– Reported TLB miss overheads range from about 5% (commonly 5-10%) up to 30%
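For a concrete sense of scale, using the hardware described later in the evaluation: a 128-entry data TLB filled with 8 KB base pages covers 128 × 8 KB = 1 MB, roughly 0.2% of the machine's 512 MB of RAM, whereas the same 128 entries filled with 4 MB superpages could cover all 512 MB.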
The Superpage Problem
Increasing TLB coverage
– More TLB entries are expensive
– Larger page sizes lead to internal fragmentation and increased I/O
– Solution: use multiple page sizes
Superpage definition
Hardware-imposed constraints
– Finite set of page sizes (subset of powers of 2)
– Contiguity
– Alignment
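A minimal C sketch (not from the paper; names are illustrative and the sizes are the Alpha superpage sizes listed on the next slide) of what these constraints mean for a single mapping: the size must be one of the supported powers of two, and both the virtual and physical addresses must be aligned to that size, with the physical range contiguous.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Supported superpage sizes on Alpha: 8 KB, 64 KB, 512 KB, 4 MB. */
static const size_t sp_sizes[] = { 8192, 65536, 524288, 4194304 };

static bool is_supported_size(size_t sz)
{
    for (size_t i = 0; i < sizeof(sp_sizes) / sizeof(sp_sizes[0]); i++)
        if (sp_sizes[i] == sz)
            return true;
    return false;
}

/* A superpage of sz bytes can map (va, pa) only if sz is a supported
 * power of two, both addresses are aligned to sz, and the sz bytes
 * starting at pa form one contiguous physical extent. */
static bool can_map_superpage(uintptr_t va, uintptr_t pa, size_t sz)
{
    return is_supported_size(sz) && (va % sz) == 0 && (pa % sz) == 0;
}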
A superpage TLB
[Diagram: a superpage TLB maps a virtual address to a physical address; entries can cover different sizes, e.g. a base page entry (size = 1) alongside a superpage entry (size = 4) mapping four contiguous pages of physical memory.]
– Alpha superpage sizes: 8, 64, 512 KB; 4 MB
– Itanium superpage sizes: 4, 8, 16, 64, 256 KB; 1, 4, 16, 64, 256 MB
Superpage Issues and Tradeoffs
Allocation
– Relocation
– Reservation
Issue 1: superpage allocation
[Diagram: objects A, B, C, and D are contiguous in virtual memory and fall within superpage boundaries, but their frames are scattered and reordered in physical memory.]
How / when / what size to allocate?
Superpage Issues (Cont.)
Promotion
– Incremental
– Timing (not too soon, not too late)
Demotion and Eviction
– Hardware reference and dirty bit limitation
Issue 2: promotion
Promotion: create a superpage out of a set of smaller pages
– mark page table entry of each base page
When to promote?
– Wait for app to touch pages? May lose opportunity to increase TLB coverage.
– Create small superpage? May waste overhead.
– Forcibly populate pages? May cause internal fragmentation.
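A minimal sketch of the promotion mechanism above, assuming a hypothetical page-table-entry layout with a per-entry size field; real architectures encode this differently. Promotion records the superpage size in every constituent base-page entry so the whole region can be loaded as a single TLB entry.

#include <stddef.h>
#include <stdint.h>

struct pte {
    uint64_t pfn;        /* physical frame number */
    uint8_t  size_log2;  /* log2 of the mapping size, in base pages */
    uint8_t  valid;
};

/* Promote a run of 2^size_log2 base pages starting at index 'first'
 * (which must be aligned to that size, with all entries valid and the
 * frames physically contiguous) by recording the superpage size in
 * every constituent base-page entry. */
static void promote(struct pte *ptes, size_t first, unsigned size_log2)
{
    size_t npages = (size_t)1 << size_log2;
    for (size_t i = 0; i < npages; i++)
        ptes[first + i].size_log2 = (uint8_t)size_log2;
}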
Superpage Issues: Fragmentation
Fragmentation
– Memory becomes fragmented due to
• use of multiple page sizes
• persistence of file cache pages
• scattered wired (non-pageable) pages
– Contiguity as contended resource
Related Approaches
HP-UX and IRIX Reservations
– Not transparent
Page Relocation
– Used exclusively, leads to lower performance due to increased TLB misses
Hardware Support
– Talluri and Hill: Remove contiguity requirement
This approach: Hybrid reservation and relocation system with page replacement that biases toward pages that contribute to contiguity
Design
Reservation-based superpage management
Multiple superpage sizes
Demotion of sparsely referenced superpages
Preservation of contiguity w/o compaction
Efficient disk I/O for partially modified superpages
Uses a buddy allocator for contiguous regions (sketch below)
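A compact sketch of the buddy-allocator idea for contiguous physical memory (illustration only; structures and names are assumptions, and error handling is omitted): free runs are kept on one list per order, and an allocation of order k splits a larger run if needed, returning the upper "buddy" halves to the smaller free lists.

#include <stdint.h>
#include <stdlib.h>

#define BASE_PAGE 8192UL          /* 8 KB base pages */
#define MAX_ORDER 9               /* 2^9 base pages = 4 MB superpage */

struct run { struct run *next; uintptr_t addr; };
static struct run *free_list[MAX_ORDER + 1];   /* free runs, per order */

static struct run *pop(int order)
{
    struct run *r = free_list[order];
    if (r)
        free_list[order] = r->next;
    return r;
}

static void push(int order, struct run *r)
{
    r->next = free_list[order];
    free_list[order] = r;
}

/* Allocate 2^order contiguous base pages, naturally aligned. */
static struct run *buddy_alloc(int order)
{
    int k = order;
    while (k <= MAX_ORDER && free_list[k] == NULL)
        k++;                      /* smallest free run that is big enough */
    if (k > MAX_ORDER)
        return NULL;              /* no contiguity left at this size */
    struct run *r = pop(k);
    while (k > order) {           /* split, keeping the upper buddy free */
        k--;
        struct run *buddy = malloc(sizeof(*buddy));
        buddy->addr = r->addr + (BASE_PAGE << k);
        push(k, buddy);
    }
    return r;
}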
Key observation
Once an application touches the first page of a memory object, it is likely to quickly touch every page of that object
Example: array initialization
Opportunistic policies
– make superpages as large as possible, as soon as possible
– as long as there is no penalty for a wrong decision
Reservations
Set of frames initially reserved at page-fault time
– Fixed-size objects: largest aligned superpage that is not larger than the object
– Dynamically sized objects: same as fixed, but the reservation is allowed to extend beyond the current end of the object
Preemption
– If no memory is available for an allocation request, the system preempts the reservation whose most recent page allocation occurred least recently
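A sketch of the reservation-sizing policy above, with assumed types and a deliberately simplified treatment of dynamically sized objects: at fault time, pick the largest supported, aligned superpage that fits within the object around the faulting page.

#include <stdbool.h>
#include <stddef.h>

/* Supported superpage sizes, in base pages, largest first (Alpha). */
static const size_t sizes[] = { 512, 64, 8, 1 };

/* Reservation size chosen at fault time.
 * fault_off: offset of the faulting page within the object, in pages.
 * obj_pages: current object size in pages.
 * is_dynamic: dynamically sized objects may reserve past the object end. */
static size_t reservation_pages(size_t fault_off, size_t obj_pages,
                                bool is_dynamic)
{
    for (size_t i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
        size_t sz = sizes[i];
        size_t start = fault_off - (fault_off % sz);  /* aligned start */
        if (is_dynamic || start + sz <= obj_pages)
            return sz;      /* largest aligned superpage that fits */
    }
    return 1;
}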
Managing reservations
[Diagram: reservations are kept on lists indexed by the size of their largest unused (and aligned) chunk, e.g. 4, 2, and 1; the best candidate for preemption sits at the front of each list: the reservation whose most recently populated frame was populated least recently.]
Other Design Issues
Fragmentation control
– Coalescing
– Contiguity-aware page replacement
Incremental promotions
– Occurs as soon as a superpage region is fully populated
Speculative demotion
– Occurs on eviction (recursively)
– Occurs on first write to clean superpage
• Overhead too high for hash digests
– Daemon periodically demotes pages speculatively
• Necessary due to reference bit limitation
Incremental promotions
Promotion policy: opportunistic
[Figure: incremental promotion as a region fills: first a 2-page superpage, then 4, then 4+2, and finally a single 8-page superpage.]
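A sketch of the opportunistic, incremental policy illustrated above, using an assumed per-reservation population bitmap: after each fault, the smallest still-unpromoted region containing the page is promoted as soon as it is completely populated, then the next larger one, and so on.

#include <stdbool.h>
#include <stddef.h>

static const size_t sp_pages[] = { 8, 64, 512 };  /* superpage sizes, ascending */

static bool region_full(const bool *populated, size_t start, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (!populated[start + i])
            return false;
    return true;
}

/* Called after base page 'page' of a reservation has been populated;
 * populated[] is the reservation's per-page population map. Returns the
 * largest superpage size (in base pages) that is now fully populated
 * around 'page' and can therefore be promoted. */
static size_t try_promote(const bool *populated, size_t page)
{
    size_t best = 1;
    for (size_t i = 0; i < sizeof(sp_pages) / sizeof(sp_pages[0]); i++) {
        size_t sz = sp_pages[i];
        size_t start = page - (page % sz);
        if (region_full(populated, start, sz))
            best = sz;      /* promote to 8, then 64, then 512 pages */
        else
            break;          /* a larger enclosing region cannot be full */
    }
    return best;
}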
More Design Issues
Multi-list reservation scheme
– One list for each page size supported by the hardware (see sketch below)
– Reservations sorted by allocation recency
– Preemption removes from head of list
• Reservation recursively broken into extents
• Fully populated extents are not put in reservation lists
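A sketch of the multi-list bookkeeping, with assumed types rather than the kernel's: one list per supported page size, kept ordered by allocation recency so that the head is always the reservation whose most recent allocation is oldest, i.e. the preferred preemption victim.

#include <stddef.h>

#define NSIZES 4                       /* e.g. 8 KB, 64 KB, 512 KB, 4 MB */

struct reservation {
    struct reservation *prev, *next;
    unsigned long last_alloc;          /* time of most recent allocation */
};

static struct { struct reservation *head, *tail; } lists[NSIZES];

/* On every allocation into a reservation, move it to the tail of its
 * size's list, so each list stays ordered by allocation recency and the
 * head holds the reservation with the oldest most-recent allocation. */
static void resv_touch(int size_idx, struct reservation *r, unsigned long now)
{
    /* unlink (r must currently be on the list) */
    if (r->prev) r->prev->next = r->next; else lists[size_idx].head = r->next;
    if (r->next) r->next->prev = r->prev; else lists[size_idx].tail = r->prev;
    /* append at tail */
    r->prev = lists[size_idx].tail;
    r->next = NULL;
    if (lists[size_idx].tail) lists[size_idx].tail->next = r;
    else                      lists[size_idx].head = r;
    lists[size_idx].tail = r;
    r->last_alloc = now;
}

/* Preemption victim for a request of the given size: the head of that list. */
static struct reservation *preempt_candidate(int size_idx)
{
    return lists[size_idx].head;
}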
Population map
– Reserved frame lookup
– Overlap avoidance
– Promotion decisions
– Preemption assistance
Implementation Notes
FreeBSD uses three lists of pages in approximate-LRU (A-LRU) order: active, inactive, cache
Contiguity-aware page daemon
– Cache pages are considered available for allocation
– Daemon is activated when contiguity falls low
– Clean file-backed pages are moved to the inactive list as soon as the file is closed
Wired page clustering
Multiple mappings
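An illustrative sketch only; the helper names are stand-ins, not FreeBSD interfaces. It shows the extra trigger the contiguity-aware daemon gets: besides low free memory, it also runs when contiguous free memory falls below a watermark and biases reclamation toward pages whose release restores contiguity.

#define CONTIGUITY_LOW_WATERMARK 64   /* e.g. free superpage-sized regions */

/* Stubs standing in for real VM-system queries and actions. */
static int  free_contiguity(void)                          { return 0; }
static void move_clean_closed_file_pages_to_inactive(void) { }
static void reclaim_inactive_pages_contiguity_first(void)  { }

/* In addition to the usual low-free-memory trigger, the daemon runs when
 * contiguous free memory falls below a watermark, and it biases
 * reclamation toward pages whose release restores contiguity. */
static void page_daemon_tick(void)
{
    if (free_contiguity() < CONTIGUITY_LOW_WATERMARK) {
        move_clean_closed_file_pages_to_inactive();
        reclaim_inactive_pages_contiguity_first();
    }
}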
Evaluation
Setup
– FreeBSD 4.3
– Alpha 21264, 500 MHz, 512 MB RAM
– 8 KB, 64 KB, 512 KB, 4 MB pages
– 128-entry DTLB, 128-entry ITLB
– Unmodified applications
Best-Case Results
TLB miss reduction usually above 95%
SPEC CPU2000 integer
– 11.2% improvement (0 to 38%)
SPEC CPU2000 floating point
– 11.0% improvement (-1.5% to 83%)
Other benchmarks
– FFT (200×200×200 matrix): 55%
– 1000x1000 matrix transpose: 655%
30%+ in 8 out of 35 benchmarks
Benefits of multiple page sizes
[Charts: per-benchmark speedups and TLB miss reduction when multiple page sizes are available.]
Sustained benefits
Use a Web server to fragment memory, then use FFTW to see how quickly memory is reclaimed
FFTW reaches a speedup of almost 55%; Web server performance degrades only 1.6% on a successive run
Concurrent execution: only 3% degradation with the modified page daemon
Fragmentation control
[Plot: normalized contiguity of free memory over time as a web server runs and FFT jobs arrive roughly every 10 minutes; with fragmentation control, contiguity recovers and the FFT achieves its full speedup, while without it the FFT sees only partial or no speedup.]
Adversary applications
Incremental promotion
– Slowdown of 8.9%, of which 7.2% is hardware-specific
Sequential access
– 0.1% degradation
Preemption
– 1.1% degradation
General overhead
– Use superpage-supporting mechanisms but never promote: 1-2% performance degradation
Cetera
Dirty Superpages
– Performance penalty of not demoting is a factor of 20
Scalability
– Most operations are O(1), O(S), or O(S*R)
– The daemon, promotion, demotion, and dirty/reference-bit emulation are linear
• Promotion/demotion is amortized to O(S) for programs that need to change page size only early in life
• Dirty/reference bits: motivates the need for clustered page tables, either in the OS or in hardware
Conclusion
Effective, transparent, and efficient support for superpages
Demonstrates the effectiveness of multiple page sizes
Improved performance for nearly all applications
Minimal overhead
Scalable to large numbers of page sizes