Adaptive Software Lock Elision
Amitabha Roy
Systems Research Group, Computer Laboratory, University of Cambridge
{amitabha.roy}@cl.cam.ac.uk
1. Introduction
Problem: Issues with Atomic Blocks + Optimistic STM
• Inflexible concurrency control: usually only optimistic concurrency control is
available, which is unsuitable for critical sections with low contention or low
disjoint-access parallelism, e.g. in the Linux kernel [4]
• Not compatible with legacy software: atomic blocks must be specified
explicitly, and irrevocable actions such as call-outs to legacy code, system
calls or I/O are difficult to handle
Solution: Software Lock Elision
• Retain locks as the primary means of concurrency control
• Enhance the locking API to support lock elision: the critical section is
executed optimistically (speculatively) instead of acquiring the lock
• Coarse-grained locks can now scale, and can still be used when the critical
section does I/O
• Provide support for explicit lock composition
• Elide locks dynamically for adaptive concurrency control
Consequences:
• Easy to retrofit to legacy software, and elegant for new applications
• Minimal programmer effort
• Allows multi-granularity concurrency control on data structures
• Retains properties of locks such as fairness and priority inheritance
2. Mechanics
Add metadata to locks
struct sle_lock { base_lock lock; int version_number; int readers; };
Elide locks dynamically
• Threads that do not speculate past a lock co-exist seamlessly with those that do
• Basic idea: log the version number of each speculated (elided) lock
• Speculative threads ensure that the lock versions are unchanged at commit time
• Non-speculative threads check the version numbers of objects before using them,
to ensure there are no committed but unwritten changes
• A rudimentary version of this multi-granularity locking idea was published in [1]
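The basic idea above can be sketched in C as follows. This is a minimal, single-threaded illustration under stated assumptions, not the poster's implementation: `struct elision_log`, `elided_locks_unchanged` and `MAX_ELIDED` are hypothetical names, the body of `log_elided_lock` is assumed, `base_lock` is stubbed as an `int`, and a real runtime would need atomic accesses and would also have to account for `readers`.

```c
#include <stdbool.h>
#include <stddef.h>

typedef int base_lock;               /* stand-in for the real lock type (assumption) */

struct sle_lock { base_lock lock; int version_number; int readers; };

#define MAX_ELIDED 8                 /* hypothetical bound on nested elided locks */

/* One entry per elided lock: which lock, and the version seen when it was elided. */
struct elision_log {
    struct { struct sle_lock *lock; int seen_version; } entry[MAX_ELIDED];
    size_t count;
};

/* Record the current version of a lock that is being speculated past.
 * Returns false if the log is full, in which case the caller should
 * fall back to acquiring the lock. */
static bool log_elided_lock(struct elision_log *elog, struct sle_lock *l)
{
    if (elog->count == MAX_ELIDED)
        return false;
    elog->entry[elog->count].lock = l;
    elog->entry[elog->count].seen_version = l->version_number;
    elog->count++;
    return true;
}

/* Commit-time check: every elided lock must still carry the version that was
 * logged; a changed version means the lock was acquired in the meantime. */
static bool elided_locks_unchanged(const struct elision_log *elog)
{
    for (size_t i = 0; i < elog->count; i++)
        if (elog->entry[i].lock->version_number != elog->entry[i].seen_version)
            return false;
    return true;
}
```

A speculative thread would call `log_elided_lock` on every lock it elides and `elided_locks_unchanged` just before committing, aborting if the check fails.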
4. Design Challenges
• Memory management (no write-after-free by speculative threads)
  • Should support a variable number of threads in the system, so avoid
    epoch-based solutions
  • Use external metadata, as in TL2
  • For efficiency, readers should not need to indirect outside objects
  • Solution: version number in objects + external lock
• Lock properties (preserve the priority inheritance and fairness properties of locks)
  • Non-speculative threads should never be blocked by speculative threads
  • Should be able to copy out unwritten data from committed threads
  • Should be able to prevent failed threads from writing to the version numbers
    of freed objects
  • Can achieve this by using OS/scheduler support to revoke fine-grained locks
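• Lock composition (make lock-based programs easier to write)
  Problem: two threads nest the same locks in opposite orders
    Thread 1: lock(foo); lock(bar)
    Thread 2: lock(bar); lock(foo)
    → Deadlock!
  Solution: compose both locks under a third lock
    compose(foo, foobar);
    compose(bar, foobar);
    safe_lock(foo)/safe_lock(bar): acquire foobar in place of foo/bar, and
    ensure foo/bar is free before proceeding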
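One possible reading of lock composition is sketched below in C. The poster only names `compose()` and `safe_lock()`; everything else here is an assumption made for illustration: the `struct comp_lock` layout, the spin-based `plain_lock`/`plain_unlock`, the explicit thread id `self`, and `safe_unlock`. The sketch also assumes that threads using the plain lock never nest composed locks.

```c
#include <stdatomic.h>
#include <stddef.h>

/* A member lock (foo, bar) or a composite lock (foobar). Hypothetical layout. */
struct comp_lock {
    atomic_int held;              /* 1 while held as a plain lock */
    atomic_int owner;             /* composite only: id of owning thread, 0 if free */
    int depth;                    /* composite only: member locks held by the owner */
    struct comp_lock *composite;  /* member only: lock acquired in its place */
};

/* Declare that `member` is covered by `composite`. */
static void compose(struct comp_lock *member, struct comp_lock *composite)
{
    member->composite = composite;
}

/* Plain acquire of a single lock: spin until it is free. */
static void plain_lock(struct comp_lock *l)
{
    int expected = 0;
    while (!atomic_compare_exchange_weak(&l->held, &expected, 1))
        expected = 0;
}

static void plain_unlock(struct comp_lock *l)
{
    atomic_store(&l->held, 0);
}

/* Acquire the composite in place of the member, then make sure the member
 * itself is free of plain holders. `self` is a non-zero, caller-chosen thread id. */
static void safe_lock(int self, struct comp_lock *member)
{
    struct comp_lock *c = member->composite;
    if (atomic_load(&c->owner) != self) {
        int expected = 0;
        while (!atomic_compare_exchange_weak(&c->owner, &expected, self))
            expected = 0;
    }
    c->depth++;               /* only the owner touches depth */
    plain_lock(member);
}

static void safe_unlock(int self, struct comp_lock *member)
{
    struct comp_lock *c = member->composite;
    (void)self;
    plain_unlock(member);
    if (--c->depth == 0)
        atomic_store(&c->owner, 0);
}
```

Because only one thread at a time can own `foobar`, threads that call `safe_lock(foo)` and `safe_lock(bar)` in opposite orders are serialized at the composite instead of deadlocking on the members.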
5. Preliminary Results
• Independent of underlying lock implementation
• Handle non-2PL nesting of locks in the program
Scalable Locking [1]: allows locks to be acquired transactionally and
non-transactionally, and illustrates the key ideas of software lock elision
Test bed: Altix 4700, 38 NUMA nodes * 2 sockets * dual core = 152 Itanium 2 cores,
456 GB overall shared memory
Benchmark: skip lists and red-black trees, scalable locks vs. OSTM [2]
Pseudocode for eliding locks dynamically (2. Mechanics):
/* Counts the number of speculative locks held if positive,
 * and the number of non-speculative locks held if negative.
 */
speculation_level = 0

do_sle_lock(sle_lock)
  If ((dynamic_elide() or speculation_level > 0) and speculation_level >= 0):
    speculation_level++; log_elided_lock(sle_lock);
  Else:
    speculation_level--; do_base_lock(sle_lock.lock);
    If (exclusive_mode) sle_lock.version_number++; else atomic_inc(sle_lock.readers);

do_sle_unlock(sle_lock)
  If (speculation_level < 0):
    speculation_level++;
    If (exclusive_mode) sle_lock.version_number++; else atomic_dec(sle_lock.readers);
  Else:
    speculation_level--;
    If (speculation_level == 0) commit_speculative_changes();
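The lock and unlock pseudocode translates fairly directly to C. The sketch below is a single-threaded illustration under stated assumptions: `struct sle_thread`, the `elide_enabled` switch and the stubbed `dynamic_elide()` are hypothetical, the base lock is modelled as an `int`, counters stand in for the elision log and the commit routine, and releasing the base lock on unlock is assumed because the pseudocode does not show it.

```c
#include <stdbool.h>

typedef int base_lock;   /* stand-in: 1 while held (assumption) */
struct sle_lock { base_lock lock; int version_number; int readers; };

/* Per-thread state: > 0 counts speculative locks held, < 0 non-speculative ones. */
struct sle_thread {
    int speculation_level;
    int elided;          /* locks logged as elided (stands in for the real log) */
    int commits;         /* times speculative changes were committed */
};

/* Hypothetical policy hook; a real one would consult contention statistics. */
static bool elide_enabled = true;
static bool dynamic_elide(void) { return elide_enabled; }

static void do_sle_lock(struct sle_thread *t, struct sle_lock *l, bool exclusive_mode)
{
    if ((dynamic_elide() || t->speculation_level > 0) && t->speculation_level >= 0) {
        t->speculation_level++;
        t->elided++;                 /* log_elided_lock(l) in the pseudocode */
    } else {
        t->speculation_level--;
        l->lock = 1;                 /* do_base_lock(l->lock) */
        if (exclusive_mode) l->version_number++;
        else l->readers++;           /* atomic_inc in a real runtime */
    }
}

static void do_sle_unlock(struct sle_thread *t, struct sle_lock *l, bool exclusive_mode)
{
    if (t->speculation_level < 0) {
        t->speculation_level++;
        if (exclusive_mode) l->version_number++;
        else l->readers--;           /* atomic_dec in a real runtime */
        l->lock = 0;                 /* assumed release of the base lock */
    } else {
        t->speculation_level--;
        if (t->speculation_level == 0)
            t->commits++;            /* commit_speculative_changes() */
    }
}
```

Once a thread has elided its outermost lock, every nested lock is elided as well, and the commit happens only when the outermost speculative unlock brings `speculation_level` back to zero.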
[Figure: two panels, Skip Lists and Red Black Trees, each plotting time (microseconds) against number of threads (1 to 63) for OSTM and Scalable Locks.]
→ Scalable locks scale as well as OSTM and perform better by a roughly
constant factor (~2x)
Asymmetry: 2 threads, each on a different NUMA node, with all memory local to the first node
Benchmark: increment a counter; compare OSTM, RSTM [3] (all contention managers)
and scalable locks (with an MCS fair lock for conflict handling)
[Figure: Fairness in a Counter Benchmark. Fraction of total transactions per thread (Thread 1, Thread 2), for each contention manager.]
→ Scalable locks provide perfect thread fairness: 50% of accesses by each thread
3. Speculation
Executing speculatively whenever speculation_level > 0
Need to version reads and shadow changes to shared state
The programmer knows which lock protects which data, and must explicitly mark data
protected by elidable locks
struct red_black_tree_node *rbnode1, *rbnode2;
……
rbnode1->parent = rbnode2
• Marking data adds a version number to it
• Write: log the object as dirty in the write log and return a shadow copy
• Read: log the read in the read log
Option: object granularity using compiler extensions, e.g. with gcc-style attributes
struct red_black_tree_node { … } __attribute__((__speculative__))
Pointer dereferences call into the runtime
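The read and write barriers described in this section can be sketched in C as follows. This is a minimal single-threaded illustration, not the poster's runtime: `struct spec_txn`, `spec_read`, `spec_write`, `spec_commit`, `LOG_MAX` and the node fields `version` and `key` are hypothetical, and the fine-grained write locks that a real commit would take are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical object layout: the marked struct carries a version number. */
struct red_black_tree_node {
    int version;
    struct red_black_tree_node *parent;
    int key;
};

#define LOG_MAX 16                        /* hypothetical log capacity */

struct spec_txn {
    struct { struct red_black_tree_node *obj; int seen_version; } reads[LOG_MAX];
    struct { struct red_black_tree_node *obj; struct red_black_tree_node shadow; } writes[LOG_MAX];
    size_t nreads, nwrites;
};

/* Write barrier: log the object as dirty and return its shadow copy. */
static struct red_black_tree_node *spec_write(struct spec_txn *t, struct red_black_tree_node *obj)
{
    for (size_t i = 0; i < t->nwrites; i++)
        if (t->writes[i].obj == obj)
            return &t->writes[i].shadow;
    if (t->nwrites == LOG_MAX)
        return NULL;
    t->writes[t->nwrites].obj = obj;
    t->writes[t->nwrites].shadow = *obj;  /* shadow.version remembers the version seen */
    return &t->writes[t->nwrites++].shadow;
}

/* Read barrier: log the version seen; dirty objects are read from the shadow. */
static const struct red_black_tree_node *spec_read(struct spec_txn *t, struct red_black_tree_node *obj)
{
    for (size_t i = 0; i < t->nwrites; i++)
        if (t->writes[i].obj == obj)
            return &t->writes[i].shadow;
    if (t->nreads == LOG_MAX)
        return NULL;
    t->reads[t->nreads].obj = obj;
    t->reads[t->nreads].seen_version = obj->version;
    t->nreads++;
    return obj;
}

/* Commit: verify all logged versions, then write shadows back and bump versions. */
static bool spec_commit(struct spec_txn *t)
{
    for (size_t i = 0; i < t->nreads; i++)
        if (t->reads[i].obj->version != t->reads[i].seen_version)
            return false;
    for (size_t i = 0; i < t->nwrites; i++)
        if (t->writes[i].obj->version != t->writes[i].shadow.version)
            return false;
    for (size_t i = 0; i < t->nwrites; i++) {
        int v = t->writes[i].obj->version;
        *t->writes[i].obj = t->writes[i].shadow;
        t->writes[i].obj->version = v + 1;
    }
    return true;
}
```

With these helpers, the assignment `rbnode1->parent = rbnode2` would become `spec_write(&txn, rbnode1)->parent = rbnode2`, and the change stays invisible to other threads until `spec_commit` succeeds.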
6. Adaptive Concurrency Control
• Measure the contention on a lock (number of waiting threads)
• Measure the disjoint-access parallelism behind a lock (conflicts among
speculating threads)
• Elide the lock only if there is sufficient contention AND disjoint-access
parallelism [decided by a call to dynamic_elide()]
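A decision function along these lines might look like the C sketch below. The poster does not say how `dynamic_elide()` measures or weighs its inputs, so the `struct elide_stats` fields, the parameter passed to `dynamic_elide`, and both thresholds are assumptions chosen only to make the policy concrete.

```c
#include <stdbool.h>

/* Hypothetical per-lock statistics. */
struct elide_stats {
    unsigned waiters;        /* threads currently waiting for the lock (contention) */
    unsigned commits;        /* speculative sections that committed */
    unsigned conflicts;      /* speculative sections that aborted on a conflict */
};

/* Thresholds are illustrative assumptions, not measured values. */
#define MIN_WAITERS        2u
#define MAX_CONFLICT_PCT  25u

/* Elide only if the lock is contended AND speculation mostly succeeds. */
static bool dynamic_elide(const struct elide_stats *s)
{
    unsigned attempts = s->commits + s->conflicts;
    bool contended = s->waiters >= MIN_WAITERS;
    /* With no history yet, optimistically assume accesses are disjoint. */
    bool disjoint = attempts == 0 ||
                    s->conflicts * 100u <= attempts * MAX_CONFLICT_PCT;
    return contended && disjoint;
}
```

An uncontended lock is acquired normally, which avoids speculation overhead, and a lock whose speculative sections keep conflicting also falls back to normal acquisition.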
[Figure, 3. Speculation: an rbnode object with its version, its snapshot state and a dirty copy. Commit time: 2PL on fine-grained write locks + verify read versions.]
7. References
[1] Amitabha Roy, Keir Fraser and Steven Hand. A Transactional Approach to Lock Scalability. In Proceedings of
the 20th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '08), Munich, Germany, June
2008.
[2] Keir Fraser. Practical Lock Freedom. PhD thesis, Cambridge University Computer Laboratory, 2003. Also
available as Technical Report UCAM-CL-TR-579.
[3] Virendra J. Marathe et al. Lowering the Overhead of Software Transactional Memory. Technical Report.
Condensed version appeared in TRANSACT 2006.
[4] Christopher J. Rossbach et al. TxLinux: Using and Managing Hardware Transactional Memory in an Operating
System. In SOSP '07: Proceedings of the Twenty-First ACM SIGOPS Symposium on Operating Systems Principles,
pages 87–102. ACM, 2007.