Identifying Potential Program Transformations for Optimizing

Data Representation Synthesis
PLDI’2011, ESOP’12, PLDI’12
Peter Hawkins, Stanford University
Alex Aiken, Stanford University
Kathleen Fisher, Tufts
Martin Rinard, MIT
Mooly Sagiv, TAU
http://theory.stanford.edu/~hawkinsp/
Research Interests
• Verifying properties of low-level data structure manipulations
• Synthesizing low-level data structure manipulations
  – Composing ADT operations
• Concurrent programming via smart libraries
Motivation
• Sophisticated data structures exist
  – They raise the level of abstraction and simplify programming (including concurrency)
  – They are integrated into modern programming languages
• But they are hard to use
  – Especially when composing several operations
thttpd: Web Server
[Figure: a hash table table[0..5] of Map entries, each Map storing its slot in an index field; a list of Conn objects refers to the Map entries via file_data.]
Representation Invariants:
1. ∀n: Map. ∀v: ℤ. table[v] = n ⟺ n->index = v
2. ∀n: Map. n->rc = |{n' : Conn . n'->file_data = n}|
thttpd: mmc.c

static void add_map(Map *m)
{
    int i = hash(m);
    …
    table[i] = m;    /* invariant 1 broken */
    …
    m->index = i;    /* invariant 1 restored */
    …
    m->rc++;
}

Representation Invariants:
1. ∀n: Map. ∀v: ℤ. table[v] = n ⟺ index[n] = v
2. ∀n: Map. rc[n] = |{n' : Conn . file_data[n'] = n}|
TOMCAT Motivating Example
TOMCAT 5.*
attr = new HashMap();
…
Attribute removeAttribute(String name) {
    Attribute val = null;
    synchronized (attr) {
        found = attr.containsKey(name);
        if (found) {
            val = attr.get(name);
            attr.remove(name);
        }
    }
    return val;
}
Invariant: removeAttribute(name) returns
the removed value or null if the name does not exist
TOMCAT Motivating Example
TOMCAT 6.*
attr = new ConcurrentHashMap();
…
Attribute removeAttribute(String name) {
    Attribute val = null;
    /* synchronized(attr) { */
    found = attr.containsKey(name);
    /* another thread may interleave here:
       attr.put("A", o); attr.remove("A"); */
    if (found) {
        val = attr.get(name);
        attr.remove(name);
    }
    /* } */
    return val;
}

Invariant: removeAttribute(name) returns the removed value or null if the name does not exist — violated by the interleaving above once the lock is removed, even though each container operation is atomic.
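The race disappears if the composite is collapsed into a single atomic call: on ConcurrentHashMap, remove(key) already returns the removed value, or null if the key is absent. A minimal sketch (the class name and the simplification of Attribute to Object are ours):

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the race-free rewrite: one atomic remove replaces the
// containsKey/get/remove sequence from the TOMCAT example.
public class RemoveAttributeDemo {
    private final ConcurrentHashMap<String, Object> attr = new ConcurrentHashMap<>();

    public Object put(String name, Object value) {
        return attr.put(name, value);
    }

    // Equivalent to the composite above, but a single atomic step.
    public Object removeAttribute(String name) {
        return attr.remove(name); // removed value, or null if absent
    }
}
```

Because the lookup and the removal happen in one container operation, no interleaving can observe the key present and then find it missing.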
Multiple ADTs
public void put(K k, V v) {
    if (this.eden.size() >= size) {
        this.longterm.putAll(this.eden);
        this.eden.clear();
    }
    this.eden.put(k, v);
}
Invariant: every element added to eden is either in eden or in longterm
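The invariant dictates how lookups must work: since a spill moves entries from eden to longterm, a get has to consult both maps. A self-contained sketch (EdenCache is a hypothetical name; the slide shows only put):

```java
import java.util.HashMap;
import java.util.Map;

// Two-level cache: eden holds recent entries; when full, everything
// spills to longterm. Invariant: every key ever put is in one of the two.
public class EdenCache<K, V> {
    private final int size;
    private final Map<K, V> eden = new HashMap<>();
    private final Map<K, V> longterm = new HashMap<>();

    public EdenCache(int size) { this.size = size; }

    public void put(K k, V v) {
        if (eden.size() >= size) {      // eden is full:
            longterm.putAll(eden);      // spill everything to longterm
            eden.clear();
        }
        eden.put(k, v);
    }

    public V get(K k) {
        V v = eden.get(k);              // check eden first,
        return v != null ? v : longterm.get(k); // then longterm
    }
}
```

Checking only eden would silently lose every spilled entry, which is exactly the kind of cross-ADT invariant that is easy to break by hand.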
Filesystem Example

[Figure: two filesystem objects (filesystem=1, filesystem=2) linked by s_list on a filesystems list; each filesystem's s_files list links its cached files (file=14, 7, 6, 5, 2) via f_fs_list, while every file is also linked via f_list onto either the file_in_use or the file_unused list.]
Access Patterns
• Find all mounted filesystems
• Find cached files on each filesystem
• Iterate over all used or unused cached files in Least-Recently-Used order
Desired Properties of Data Representations
• Correctness
• Efficiency for access patterns
• Simple to reason about
• Easy to evolve
Specifying Combinations of Shared Data Structures
• Separation Logic
• First-order logic + reachability
• Monadic 2nd-order logic on trees
• Graph grammars
• …
Filesystem Example

[Figure: the heap structures above, viewed as a relation.]

fs   file   inuse
1    14     F
2    7      T
2    5      F
1    6      T
1    2      F

Columns: ⟨fs, file, inuse⟩    Functional dependency: {fs, file} → {inuse}
Filesystem Example
Containers: View 1

[Figure: the same data grouped by the left-hand view — a container of filesystems, each holding its files (fs:1 → {file:14, file:6, file:2}; fs:2 → {file:7, file:5}).]
Filesystem Example
Containers: View 2

[Figure: the same data grouped by the right-hand view — one container per inuse value (inuse:T → {⟨fs:2, file:7⟩, ⟨fs:1, file:6⟩}; inuse:F → {⟨fs:1, file:14⟩, ⟨fs:1, file:2⟩, ⟨fs:2, file:5⟩}).]
Filesystem Example
Containers: Both Views

Columns: ⟨fs, file, inuse⟩    Functional dependency: {fs, file} → {inuse}

[Figure: both views overlaid on one heap — the fs-indexed containers and the inuse-indexed containers share the same ⟨fs, file⟩ objects.]
Decomposing Relations

[Figure: the decomposition graph for ⟨fs, file, inuse⟩ with {fs, file} → {inuse} — edges labeled fs, file, inuse, and ⟨fs, file⟩; two paths from the root meet at a shared node.]

• High-level shape descriptor
• A rooted directed acyclic graph
• The root represents the whole relation
• The sub-graphs rooted at each node represent sub-relations
• Edges are labeled with columns
• Each edge maps a relation into a sub-relation
• Different types of edges for different primitive data structures
• Shared nodes correspond to shared sub-relations
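As a rough illustration of these points, the shared-node idea can be sketched with plain Java maps (all names here are ours, not RelC output): two index paths lead to the same leaf objects, so each leaf is a shared sub-relation reachable both by ⟨fs, file⟩ and by inuse.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hand-coded sketch of the filesystem decomposition: the left path
// indexes the relation by fs then file; the right path indexes it by
// inuse. Both paths point at the SAME FileInfo nodes (shared sub-relations).
public class FsDecomposition {
    static final class FileInfo {       // one tuple of the relation
        final int fs, file;
        final boolean inuse;
        FileInfo(int fs, int file, boolean inuse) {
            this.fs = fs; this.file = file; this.inuse = inuse;
        }
    }

    // left path: fs -> (file -> node)
    final Map<Integer, Map<Integer, FileInfo>> byFs = new HashMap<>();
    // right path: inuse -> list of nodes
    final Map<Boolean, List<FileInfo>> byInuse = new HashMap<>();

    void insert(int fs, int file, boolean inuse) {
        FileInfo n = new FileInfo(fs, file, inuse);
        byFs.computeIfAbsent(fs, k -> new HashMap<>()).put(file, n);
        byInuse.computeIfAbsent(inuse, k -> new ArrayList<>()).add(n); // same node, second path
    }

    Boolean inuse(int fs, int file) {
        Map<Integer, FileInfo> files = byFs.get(fs);
        FileInfo n = files == null ? null : files.get(file);
        return n == null ? null : n.inuse;
    }

    List<FileInfo> filesInUse() {
        return byInuse.getOrDefault(true, List.of());
    }
}
```

A real decomposition would also maintain the two paths consistently on removal; the point here is only that one tuple object can sit at the meeting point of several edges.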
Filesystem Example

[Figure: the left-hand path instantiated on the example relation — the root maps fs:1 and fs:2 to sub-relations over ⟨file, inuse⟩ (fs:1 → {⟨14,F⟩, ⟨6,T⟩, ⟨2,F⟩}; fs:2 → {⟨7,T⟩, ⟨5,F⟩}).]
Memory Decomposition (Left)

[Figure: the same instance drawn as memory — fs:1 and fs:2 nodes pointing to per-file nodes, each carrying its inuse value.]
Filesystem Example

[Figure: the right-hand path instantiated — the root maps inuse:T and inuse:F to sub-relations over ⟨fs, file⟩ (inuse:T → {⟨2,7⟩, ⟨1,6⟩}; inuse:F → {⟨1,14⟩, ⟨2,5⟩, ⟨1,2⟩}).]
Memory Decomposition (Right)

[Figure: the same instance drawn as memory — inuse:T and inuse:F nodes pointing to ⟨fs, file⟩ nodes.]
Decomposition Instance

Columns: ⟨fs, file, inuse⟩    Functional dependency: {fs, file} → {inuse}

[Figure: a full decomposition instance — both the fs-indexed and the inuse-indexed paths populated for the example relation, sharing the leaf ⟨fs, file⟩ nodes.]
Shape Descriptor

[Figure: the decomposition graph (shape descriptor) shown alongside the instance it describes, with the relation and its functional dependency.]
Memory State

[Figure: the decomposition instance realized with concrete data structures — the s_list, f_list, and f_fs_list links of the original heap implement the edges of the instance.]
Memory State

[Figure: the relation, its decomposition, and the original filesystems / file_in_use / file_unused heap shown side by side.]
Adequacy

Not every decomposition is a good representation of a relation.
A decomposition is adequate if it can represent every possible relation matching a relational specification.
RelC enforces sufficient conditions for adequacy.
Adequacy of Decompositions
• All columns are represented
• Nodes are consistent with the functional dependencies
  – Columns bound to paths leading to a common node must functionally determine each other
Respect Functional Dependencies

[Figure: an edge binding only file leading to inuse — inadequate (✗) unless {file} → {inuse} holds.]

Respect Functional Dependencies

[Figure: an edge binding ⟨file, fs⟩ leading to inuse — adequate (✓), since {file, fs} → {inuse} holds.]
Adequacy and Sharing

Columns bound on a path to an object x must functionally determine columns bound on any other path to x.

[Figure: the shared node is reached by ⟨fs, file⟩ on one path and by ⟨inuse, fs, file⟩ on the other — adequate (✓), since {fs, file} → {inuse, fs, file}.]

Adequacy and Sharing

[Figure: here the second path binds only ⟨inuse, fs⟩ — inadequate (✗): {fs, file} → {inuse, fs} holds, but {inuse, fs} does not determine {fs, file}.]
Decompositions and Aliasing
• A decomposition is an upper bound on the set of possible aliases
• Example: there are exactly two paths to any instance of node w
The RelC Compiler [PLDI’11]

[Diagram: the programmer supplies a Relational Specification, the data structure designer supplies a Decomposition; RelC compiles the two into C++.]
The RelC Compiler

Relational Specification:
    Columns ⟨fs, file, inuse⟩ with {fs, file} → {inuse}
    foreach ⟨fs, file, inuse⟩ ∈ filesystems s.t. fs = 5 do …

Graph decomposition: [the decomposition graph above]

RelC → C++
Query Plans

foreach ⟨fs, file, inuse⟩ ∈ filesystems if inuse = T do …

[Figure: a plan that scans the fs → file path and filters on inuse.]
Cost proportional to the total number of files.
Query Plans

foreach ⟨fs, file, inuse⟩ ∈ filesystems if inuse = T do …

[Figure: a plan that iterates the inuse edge directly.]
Cost proportional to the number of files in use.
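The two plans can be sketched over the nested-map decomposition from earlier (a hand-coded illustration, not RelC output): plan A walks the fs → file path and filters every file on inuse; plan B walks the inuse = T sub-relation directly. They compute the same answer, but plan A touches every file while plan B touches only the files in use.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Two query plans for: foreach <fs, file, inuse> if inuse = T do ...
public class QueryPlans {
    // Plan A over the left path fs -> (file -> inuse): scan everything, filter.
    static Set<String> planA(Map<Integer, Map<Integer, Boolean>> byFs) {
        Set<String> out = new TreeSet<>();
        for (var fs : byFs.entrySet())               // every filesystem
            for (var f : fs.getValue().entrySet())   // every file, used or not
                if (f.getValue())                    // filter inuse = T
                    out.add(fs.getKey() + "/" + f.getKey());
        return out;
    }

    // Plan B over the right path inuse -> list of <fs, file>: direct iteration.
    static Set<String> planB(Map<Boolean, List<int[]>> byInuse) {
        Set<String> out = new TreeSet<>();
        for (int[] p : byInuse.getOrDefault(true, List.of())) // only files in use
            out.add(p[0] + "/" + p[1]);
        return out;
    }
}
```

On the example relation both plans yield {1/6, 2/7}, but plan A's loop body runs five times and plan B's only twice.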
Removal and Graph Cuts

remove ⟨fs:1⟩ from filesystems

[Figure: removing filesystem 1 cuts the edges from the decomposition to fs:1's sub-relation and unlinks its files (file:14, file:6, file:2) from the s_files, f_list, and f_fs_list structures.]
Removal and Graph Cuts

[Figure: the resulting state — only fs:2 remains, with file:7 (in use) and file:5 (unused).]
Abstraction Theorem
• If the programmer obeys the relational specification, the decomposition is adequate, and the individual containers are correct,
• then the generated low-level code maintains the relational abstraction.

[Diagram: a commuting square — remove ⟨fs:1⟩ on the relation corresponds, via the abstraction function, to the generated low-level code acting on the memory state.]
Filesystem Example
• Lock-based concurrent code is difficult to write and difficult to reason about
• Hard to choose the correct granularity of locking
• Hard to use concurrent containers correctly
Granularity of Locking

Fine-grained locking:
• More scalable
• More complex

Coarse-grained locking:
• Lower overhead
• Less complex
• Easier to reason about
• Easier to change
Concurrent Containers
• Concurrent containers are hard to write
• Excellent concurrent containers exist (e.g., Java’s ConcurrentHashMap and ConcurrentSkipListMap)
• Composing concurrent containers is tricky
  – It is not usually safe to update containers in parallel
  – Lock association is tricky
• In practice programmers often* use concurrent containers incorrectly

* 44% of the time [Shacham et al., 2011]
Concurrent Data Representation Synthesis [PLDI’12]

[Diagram: a Relational Specification and a Concurrent Decomposition are compiled (RelScala) into Concurrent Data Structures with Atomic Transactions.]
Concurrent Decompositions
• Concurrent decompositions describe how to represent a relation using concurrent containers and locks
• Concurrent decompositions specify:
  – the choice of containers (e.g., an edge implemented by a ConcurrentHashMap)
  – the number and placement of locks
  – which locks protect which data
Concurrent Interface

Interface consisting of atomic operations: [figure omitted]
Goal: Serializability

Every schedule of operations is equivalent to some serial schedule.

[Figure: two threads’ interleaved operations over time, reordered into an equivalent serial schedule.]
Two-Phase Locking

Attach a lock to each piece of data.
Two-phase locking protocol:
• Well-locked: to perform a read or write, a thread must hold the corresponding lock
• Two-phase: all lock acquisitions must precede all lock releases

Theorem [Eswaran et al., 1976]: well-locked, two-phase transactions are serializable.
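The protocol can be sketched for a composite operation over two containers (names are ours; this is hand-written locking in the style the synthesizer automates): both locks are acquired before either is released, so the composite is well-locked and two-phase, hence serializable.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// A composite "move" over two maps, each guarded by its own lock.
public class TwoPhaseMove {
    final Map<String, Integer> a = new HashMap<>();
    final Map<String, Integer> b = new HashMap<>();
    final ReentrantLock lockA = new ReentrantLock();
    final ReentrantLock lockB = new ReentrantLock();

    void move(String key) {
        lockA.lock();           // growing phase: acquire all needed locks...
        lockB.lock();
        try {
            Integer v = a.remove(key);
            if (v != null) b.put(key, v);
        } finally {
            lockB.unlock();     // shrinking phase: ...then release them
            lockA.unlock();
        }
    }
}
```

Releasing lockA between the remove and the put would violate the two-phase condition and allow another thread to observe the key in neither map.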
Two-Phase Locking

[Figure: a decomposition and a decomposition instance.]

Attach a lock to every edge — and we’re done?
• Problem 1: locks can’t be attached to individual container entries
• Problem 2: too many locks
Lock Placements

1. Attach locks to nodes

[Figure: a decomposition and an instance with per-node locks.]
Coarse-Grained Locking
[Figure: one lock at the root protects the whole decomposition instance.]

Finer-Grained Locking
[Figure: separate locks protect different sub-relations.]

Lock Striping
[Figure: a fixed pool of locks, each protecting a stripe of the instance’s entries.]
Lock Placements: Domination

Locks must dominate the edges they protect.

[Figure: every path from the root to a protected edge passes through the edge’s lock.]
Lock Placements: Path-Closure

All edges on a path between an edge and its lock must share the same lock.
Decompositions and Aliasing
• A decomposition is an upper bound on the set of possible aliases
• Example: there are exactly two paths to any instance of node w
• We can therefore reason soundly about the relationship between locks and heap cells
Lock Ordering

Prevent deadlock via a topological order on locks.
Queries and Deadlock

Query plans must acquire the correct locks in the correct order.

Example: find files on a particular filesystem
1. acquire(t)
2. lookup(t → v)
3. acquire(v)
4. scan(v → w)
Autotuner
• Given a fixed set of primitive types
  – list, circular list, doubly-linked list, array, map, …
• and a workload,
• exhaustively enumerate all the adequate decompositions up to a certain size
• The compiler can then automatically pick the best representation for the workload
Directed Graph Example (DFS)
• Columns: ⟨src, dst, weight⟩
• Functional dependencies
  – {src, dst} → {weight}
• Primitive data types
  – map, list

[Figure: enumerated candidate decompositions — different combinations of map and list edges over src, dst, and weight.]
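One point in this search space can be sketched directly in Java (a hypothetical hand-coding, not autotuner output): a map-of-maps over src then dst for successor queries, paired with a map-of-lists over dst for predecessor queries, with {src, dst} → {weight} stored at the leaves.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// One candidate decomposition of the edge relation <src, dst, weight>.
public class GraphDecomposition {
    // src -> (dst -> weight): fast "find successors"
    final Map<Integer, Map<Integer, Integer>> succ = new HashMap<>();
    // dst -> list of src: fast "find predecessors"
    final Map<Integer, List<Integer>> pred = new HashMap<>();

    void insertEdge(int src, int dst, int weight) {
        succ.computeIfAbsent(src, k -> new HashMap<>()).put(dst, weight);
        pred.computeIfAbsent(dst, k -> new ArrayList<>()).add(src);
    }

    Set<Integer> successors(int src) {
        return succ.getOrDefault(src, Map.of()).keySet();
    }

    List<Integer> predecessors(int dst) {
        return pred.getOrDefault(dst, List.of());
    }
}
```

Swapping either map for a list (or dropping the pred index) gives other points the autotuner would weigh against the workload's mix of queries and updates.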
Concurrent Synthesis

Find the optimal combination of:
• Decomposition shape
• Container data structures: Array, TreeMap, HashMap, LinkedList, ConcurrentHashMap, ConcurrentSkipListMap, CopyOnWriteArrayList
• Lock placement and lock striping factors
• Lock implementations: ReentrantLock, ReentrantReadWriteLock
Concurrent Graph Benchmark
• Start with an empty graph
• Each thread performs 5 × 10⁵ random operations
• Distribution of operations a-b-c-d (a% find successors, b% find predecessors, c% insert edge, d% remove edge)
• Plot throughput with varying number of threads

Based on Herlihy’s benchmark of concurrent maps
Results: 35-35-20-10
35% find successor, 35% find predecessor, 20% insert edge, 10% remove edge

[Figure: throughput vs. thread count. Key: black = handwritten; other curves mark synthesized decompositions, including ones isomorphic to the ConcurrentHashMap-based and HashMap-based (blue) implementations.]
(Some) Related Projects
• Relational synthesis: [Cohen & Campbell 1993], [Batory & Thomas 1996], [Smaragdakis & Batory 1997], [Batory et al. 2000], [Manevich, 2012], …
• Two-phase locking and predicate locking [Eswaran et al., 1976]; tree and DAG locking protocols [Attiya et al., 2010]; domination locking [Golan-Gueta et al., 2011]
• Lock inference for atomic sections: [McCloskey et al., 2006], [Hicks, 2006], [Emmi, 2007]
Summary
• Decompositions describe how to represent a relation using concurrent containers
• Lock placements capture different locking strategies
• Synthesis explores the combined space of decompositions and lock placements to find the best possible concurrent data structure implementations
The Big Idea

Relational Specification → Low-Level Concurrent Code

Benefits
• Simpler code
• Correctness by construction
• Finds better combinations of data structures and locking
Transactional Objects
with Foresight
Guy Gueta
G. Ramalingam
Mooly Sagiv
Eran Yahav
Motivation
• Modern libraries provide atomic ADTs
• They hide synchronization complexity
• But they are not composable
  – Sequences of operations are not necessarily atomic
Transactional Objects with Foresight
• The client declares the intended use of an API via temporal specifications (foresight)
• The library utilizes the specification
  – It synchronizes between operations that would not otherwise serialize
• Foresight can be automatically inferred with a sequential static analysis
Example: Positive Counter

Operation A:          Operation B:
@mayUse({Inc});       @mayUse({Dec});
Inc()                 Dec()
Inc()                 Dec()
@mayUse({})           @mayUse({})

• Composite operations are not atomic, even when Inc/Dec are atomic
• Enforce atomicity using future information
• Future information can be utilized for effective synchronization

Possible Executions (initial value = 0)

[Figure: the interleavings of A and B and the resulting counter values.]
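The slide's point can be replayed deterministically (a sketch; class and method names are ours): Inc and Dec are individually atomic, yet the interleaving Dec; Inc; Dec; Inc produces a result matching neither serial order of A = Inc; Inc and B = Dec; Dec, so the composites are not serializable without extra synchronization.

```java
// Positive counter: Dec never drives the value below zero.
// replay() runs a chosen interleaving single-threaded, standing in for
// one possible concurrent schedule of the (individually atomic) steps.
public class PositiveCounter {
    private int c = 0;

    void inc() { c++; }                 // atomic in the real setting
    void dec() { if (c > 0) c--; }      // positive counter: floor at zero

    int replay(Runnable... steps) {
        c = 0;                          // initial value = 0
        for (Runnable s : steps) s.run();
        return c;
    }
}
```

Serial A;B ends at 0 and serial B;A at 2, but the interleaved schedule ends at 1; with foresight (A may only Inc, B may only Dec) the library can block exactly the schedules that break serializability.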
Main Results
• Define the notion of ADTs that utilize foresight information
  – Show that they are a strict generalization of existing locking mechanisms
• Create a Java Map ADT which utilizes foresight
• Static analysis for inferring conservative foresight information
• Performance is comparable to manual synchronization
• Can be used for real composite operations