Ditto: Speeding Up Runtime Data Structure Invariant Checks
AJ Shankar and Ras Bodik
UC Berkeley
Motivation: A Debugging Scenario





Buggy program: a large-scale web application in Java
Primary data structure: hashMap of shopping carts
Carts are modified throughout code
Bug: hashMap acting weird: carts disappearing, etc.
Hypothesis: cart modification violates hashCode() invariance
How to Check the Hypothesis?


Debugger facilities inadequate
Idea: write a runtime check
  - Iterates over buckets, checks hashCode() of each cart in bucket
  - Run check frequently to pinpoint error
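A check of this kind might look like the following minimal sketch. The `Cart` and `CartTable` classes are hypothetical stand-ins (the slides do not show the application's code), and the check is written recursively and without side effects because that is the form the talk says Ditto handles.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical cart whose hash code depends on a mutable field.
class Cart {
    int id;
    Cart(int id) { this.id = id; }
    @Override public int hashCode() { return id; }
    @Override public boolean equals(Object o) {
        return o instanceof Cart && ((Cart) o).id == id;
    }
}

// Hypothetical chained hash table that exposes its buckets to the check.
class CartTable {
    final List<List<Cart>> buckets = new ArrayList<>();

    CartTable(int nBuckets) {
        for (int i = 0; i < nBuckets; i++) buckets.add(new ArrayList<>());
    }

    int indexFor(Cart c) { return Math.floorMod(c.hashCode(), buckets.size()); }

    void add(Cart c) { buckets.get(indexFor(c)).add(c); }

    // Invariant: every cart sits in the bucket its current hashCode() maps to.
    // Checks buckets b, b+1, ... in turn.
    boolean checkBuckets(int b) {
        if (b == buckets.size()) return true;
        return checkBucket(b, 0) && checkBuckets(b + 1);
    }

    // Checks entries i, i+1, ... of bucket b.
    boolean checkBucket(int b, int i) {
        List<Cart> bucket = buckets.get(b);
        if (i == bucket.size()) return true;
        if (indexFor(bucket.get(i)) != b) return false;
        return checkBucket(b, i + 1);
    }
}
```

Calling `table.checkBuckets(0)` after every cart update would pinpoint the first modification that changes a stored cart's hash code, at the cost of a full scan each time.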
Problem

The check is slow! (100x slowdown)
  - Rerunning the program is now a problem
Furthermore, what if bug isn’t reproducible?
  - Run the program with the check on entire test suite?
  - Infeasible.
Our Tool: Ditto

Ditto speeds up data structure invariant checks
  - Usually asymptotically in size of data structure
  - Hash table: 10x speedup at 1600 elements

What invariant checks can Ditto handle?
  - Side-effect-free: cannot return fresh mutable objects
  - Recursive: checks must be written as recursive functions (not an inherent limitation of the algorithm)
Basic Observation: Incrementalize
Invariant: “Hash code of each cart in table corresponds to containing bucket.”
Invariant checks the entire data structure …
… but once checked, a local change can be (re)checked locally!
So, first establish invariant, then incrementally check changes
A New Domain

Existing incrementalizers: general purpose but not automatic [Acar PLDI 2006]
  - User must annotate the program
  - For functional programs
  - Other caveats (conversion to CPS, etc.)

Ditto is automatic in this domain
  - Functional invariant checks in an imperative Java setting
  - No user annotations
  - Allows arbitrary heap updates outside the invariant
  - A simple bytecode-to-bytecode implementation
Ditto Algorithm Overview
1. First run of check: construct graph of the computation
  - Stores function calls, concrete inputs
2. Track changes to computation inputs
3. Subsequent runs of check: rerun only subcomputations with changed inputs
  - Incrementally update computation graph = incrementally compute invariant check
Example Invariant Check

Ensures a tree is locally ordered
boolean isOrdered(Tree t) {
    if (t == null) return true;
    if (t.left != null && t.left.value >= t.value) return false;
    if (t.right != null && t.right.value <= t.value) return false;
    return isOrdered(t.left) && isOrdered(t.right);
}
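To run the check above, a `Tree` class is needed. The slides do not show one, so the class below is a hypothetical minimal version. The sketch also illustrates what "locally ordered" means: each node is compared only with its direct children, so a tree can pass this check without being a valid binary search tree.

```java
// Hypothetical minimal tree node; only the fields the check reads.
class Tree {
    int value;
    Tree left, right;
    Tree(int value, Tree left, Tree right) {
        this.value = value;
        this.left = left;
        this.right = right;
    }
}

class OrderCheck {
    // Same check as on the slide: each node is compared with its children only.
    static boolean isOrdered(Tree t) {
        if (t == null) return true;
        if (t.left != null && t.left.value >= t.value) return false;
        if (t.right != null && t.right.value <= t.value) return false;
        return isOrdered(t.left) && isOrdered(t.right);
    }

    public static void main(String[] args) {
        // 9 sits in the left subtree of 5, yet every parent/child pair is in order.
        Tree local = new Tree(5, new Tree(3, null, new Tree(9, null, null)), null);
        System.out.println(isOrdered(local));   // local ordering holds

        // A left child larger than its parent violates the check directly.
        Tree bad = new Tree(5, new Tree(7, null, null), null);
        System.out.println(isOrdered(bad));
    }
}
```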
1. Constructing a Computation Graph

Purpose of computation graph:
  1. For unchanged parts of data structure, reuse existing results
  2. For changed parts, identify parts of check that need to be rerun
Graph stores the initial check run:
  - Node = function invocation, along with its
      - Concrete formal arguments (inputs)
      - Concrete heap accesses (inputs)
      - Return value
  - Same inputs = can reuse return val
  - Changed inputs = must rerun
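As a rough illustration of what such a graph node could hold, here is a hand-instrumented version of isOrdered that records its argument, heap reads, callees, and return value. All class and method names (`GraphNode`, `HeapRead`, `Recorder`) are assumptions made for this sketch; according to the talk, Ditto adds the equivalent bookkeeping automatically by rewriting bytecode.

```java
import java.util.ArrayList;
import java.util.List;

class Tree {
    int value;
    Tree left, right;
    Tree(int value, Tree left, Tree right) {
        this.value = value; this.left = left; this.right = right;
    }
}

// One remembered heap access: which field of which object was read.
class HeapRead {
    final Tree owner;
    final String field;
    HeapRead(Tree owner, String field) { this.owner = owner; this.field = field; }
}

// One node of the computation graph: a single invocation of isOrdered.
class GraphNode {
    final Tree arg;                                      // concrete formal argument
    final List<HeapRead> reads = new ArrayList<>();      // concrete heap accesses
    final List<GraphNode> callees = new ArrayList<>();   // invocations made by this one
    boolean result;                                      // return value
    GraphNode(Tree arg) { this.arg = arg; }
}

class Recorder {
    // Runs isOrdered(t) and returns the graph node describing that invocation.
    static GraphNode run(Tree t) {
        GraphNode n = new GraphNode(t);
        n.result = body(t, n);
        return n;
    }

    // Body of isOrdered with every field read routed through a recording helper.
    private static boolean body(Tree t, GraphNode n) {
        if (t == null) return true;
        Tree l = readRef(n, t, "left", t.left);
        if (l != null && readInt(n, l, "value", l.value) >= readInt(n, t, "value", t.value)) return false;
        Tree r = readRef(n, t, "right", t.right);
        if (r != null && readInt(n, r, "value", r.value) <= readInt(n, t, "value", t.value)) return false;
        return call(l, n) && call(r, n);
    }

    private static boolean call(Tree t, GraphNode caller) {
        GraphNode callee = run(t);
        caller.callees.add(callee);
        return callee.result;
    }

    private static Tree readRef(GraphNode n, Tree owner, String field, Tree v) {
        n.reads.add(new HeapRead(owner, field));
        return v;
    }

    private static int readInt(GraphNode n, Tree owner, String field, int v) {
        n.reads.add(new HeapRead(owner, field));
        return v;
    }
}
```

`Recorder.run(root)` returns the root of the graph; each node then carries exactly the inputs whose later change would force a rerun.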
1. Constructing a Computation Graph

During first check run, by instrumentation
[Figure: the heap (nodes P, A, B, C) and the corresponding computation graph (isOrdered(P), isOrdered(A), isOrdered(B), isOrdered(C)). Annotations: node created with concrete formal arg A; heap reads from a.value, a.left, a.right, a.left.value, a.right.value are remembered; calls children; returns true.]
2. Detecting Changed Inputs

Inputs to check that could change between runs:
  - Arguments – easy to detect (passed to the check)
  - Heap values – harder (could be modified anywhere in code)

Selective write barriers
  - Statically determine which fields are read in the check
  - Barriers collect changed heap inputs used by check

In example: add write barriers for all writes into fields:
  - Tree.left
  - Tree.right

    if (t == null) return true;
    if (t.left != null && t.left.value >= t.value) return false;
    if (t.right != null && t.right.value <= t.value) return false;
    return isOrdered(t.left) && isOrdered(t.right);
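A hand-written approximation of such barriers is sketched below. The `ChangeLog` class and the setter methods are hypothetical; the talk describes Ditto inserting barriers by bytecode rewriting, not through setters. In this sketch `value` is declared final, so it can never be written and needs no barrier, and `label` is a made-up field that the check never reads, so writes to it are not tracked.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Collects the objects whose checked fields were written since the last check run.
class ChangeLog {
    static final Set<Object> written =
            Collections.newSetFromMap(new IdentityHashMap<>());
}

class Tree {
    final int value;      // read by the check, but never written after construction
    String label;         // hypothetical field the check never reads: no barrier
    private Tree left, right;

    Tree(int value) { this.value = value; }

    Tree left()  { return left; }
    Tree right() { return right; }

    // Write barriers: record the written object, then perform the write.
    void setLeft(Tree t)  { ChangeLog.written.add(this); left = t; }
    void setRight(Tree t) { ChangeLog.written.add(this); right = t; }
}
```

Because only fields read by the check are barriered, writes elsewhere in the program (such as to `label`) cost nothing extra and never trigger a rerun.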
3. Rerunning the Invariant isOrdered()

Data structure modification: Add node N, remove node F
[Figure: the tree (nodes A through G) with node N being added and node F removed.]
3. Rerunning the Invariant
[Figure: “Tree With New Modifications” beside “Computation Graph From Last Run” (root isOrdered(A), result true). Callout: “Write barriers say…”]
Goal: Incrementally update computation graph
  - Graph must look as if check was run afresh
3. Rerunning the Invariant
[Figure: the modified tree beside the computation graph being updated; result at root: true.]
isOrdered(A) is first node that needs to be rerun
  - Parent inputs haven’t changed (functions are side-effect-free)
Rerunning exposes new node N
What happens at isOrdered(B)?
3. Rerunning the Invariant
[Figure: the modified tree beside the computation graph being updated; result at root: true.]
isOrdered(B) has same formal args, heap inputs
We’d like to reuse its previous result
  - And end this subcomputation
Problem: isOrdered(B) also depends on return values of its callees
  - Which might change, since isOrdered(D) will be rerun
Optimistic Memoization


Don’t want to rerun all nodes between B and D
Solution: we optimistically assume that isOrdered(B) will return the same result
  - Invariant checks generally do! (e.g. “success”)
Check assumption when we rerun isOrdered(D)
For now, reuse previous result, finish up A
  - A returns previous result (true), so finished here
3. Rerunning the Invariant


Now we rerun isOrdered(D)
Reuse previous result of isOrdered(E), isOrdered(G)
  - No further changes so no need for optimism
isOrdered(F) pruned from graph
isOrdered(D) returns previous result (true)
  - So optimistic assumption was correct
What If isOrdered(D) Returned false?
Result propagated up graph
  - Continues as long as return val differs
In this case, root node of graph is reached
  - Result for entire computation is changed
Automatically corrects optimistic assumptions
[Figure: computation graph with the result false propagating from isOrdered(D) up to the root isOrdered(A).]
Result of Algorithm
We’ve incrementally updated computation graph to reflect updated data structure
Even with circular dependencies throughout graph, only reran 3 nodes
Result of computation is result of root node (true)
Graph is ready for next incremental update
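The rerun phase walked through above can be approximated by the following sketch. It is not Ditto's implementation: the class `IncrementalOrdered`, its setters (standing in for write barriers), and the `reruns` counter are all hypothetical. It also assumes that `value` is immutable and that the heap is a tree, so each graph node has at most one caller. Dirty nodes are rerun nearest the root first, cached callee results are reused optimistically, a changed return value is propagated to the caller, and nodes left without a caller are pruned.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class Tree {
    final int value;                 // assumed immutable, so only left/right need barriers
    Tree left, right;
    Tree(int value) { this.value = value; }
}

// One invocation isOrdered(arg) in the computation graph.
class Node {
    final Tree arg;
    Node parent;                                     // current caller, if any
    final List<Node> callees = new ArrayList<>();
    boolean result;
    Node(Tree arg) { this.arg = arg; }
}

class IncrementalOrdered {
    private final Map<Tree, Node> memo = new IdentityHashMap<>();
    private final Set<Tree> dirty = Collections.newSetFromMap(new IdentityHashMap<>());
    private final List<Node> orphans = new ArrayList<>();
    int reruns;                                      // bodies executed by the last check()

    // Hand-written stand-ins for the write barriers.
    void setLeft(Tree t, Tree l)  { t.left = l;  dirty.add(t); }
    void setRight(Tree t, Tree r) { t.right = r; dirty.add(t); }

    boolean check(Tree root) {
        reruns = 0;
        List<Node> work = new ArrayList<>();
        for (Tree t : dirty) {
            Node n = memo.get(t);
            if (n != null) work.add(n);
        }
        dirty.clear();
        work.sort(Comparator.comparingInt(IncrementalOrdered::depth)); // nearest the root first
        for (Node n : work) {
            // Rerun the node, then walk up only while the return value differs.
            for (Node cur = n; cur != null; cur = cur.parent) {
                boolean old = cur.result;
                rerun(cur);
                if (cur.result == old) break;
            }
        }
        // Drop invocations that no caller reaches any more.
        for (Node o : orphans) if (o.parent == null && o.arg != root) prune(o);
        orphans.clear();
        return call(root, null);                     // cached root, or a full first run
    }

    private static int depth(Node n) {
        int d = 0;
        for (Node p = n.parent; p != null; p = p.parent) d++;
        return d;
    }

    // A call site in the check: reuse a cached result optimistically, else run it.
    private boolean call(Tree t, Node caller) {
        if (t == null) return true;
        Node n = memo.get(t);
        if (n == null) {
            n = new Node(t);
            memo.put(t, n);
            rerun(n);
        }
        n.parent = caller;
        if (caller != null) caller.callees.add(n);
        return n.result;
    }

    private void rerun(Node n) {
        reruns++;
        for (Node c : n.callees) if (c.parent == n) { c.parent = null; orphans.add(c); }
        n.callees.clear();
        Tree t = n.arg;
        n.result = !(t.left != null && t.left.value >= t.value)
                && !(t.right != null && t.right.value <= t.value)
                && call(t.left, n) && call(t.right, n);
    }

    private void prune(Node n) {
        memo.remove(n.arg);
        for (Node c : n.callees) if (c.parent == n) prune(c);
    }
}
```

On a seven-node tree shaped loosely like the one in the slides, inserting a node N above B and unlinking F leaves `reruns` at 3 after the next `check`, mirroring the count on the slide. Breaking the order below D afterwards makes the false result travel up through every caller to the root.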
Evaluation


Ran on a number of common data structure invariants, two real-world examples
Most complex invariant: red-black trees
  - Tree is globally ordered
  - Same # of black nodes to leaf
  - Other RB properties (Black follows Red, etc.)
  - We were unable to incrementalize this check by hand!
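For reference, a non-incremental red-black check covering the listed properties could be written as recursive, side-effect-free functions along these lines. This is a textbook-style sketch, not the check used in the evaluation; `RBNode` and `RBCheck` are hypothetical names, and the black-root rule is the standard property, assumed here to be among the "other RB properties".

```java
// Hypothetical red-black tree node.
class RBNode {
    final int key;
    final boolean red;
    RBNode left, right;
    RBNode(int key, boolean red, RBNode left, RBNode right) {
        this.key = key; this.red = red; this.left = left; this.right = right;
    }
}

class RBCheck {
    // Globally ordered: every key in the subtree lies strictly between lo and hi.
    static boolean ordered(RBNode n, long lo, long hi) {
        if (n == null) return true;
        if (n.key <= lo || n.key >= hi) return false;
        return ordered(n.left, lo, n.key) && ordered(n.right, n.key, hi);
    }

    static boolean isRed(RBNode n) { return n != null && n.red; }

    // Returns the number of black nodes on every path to a leaf, or -1 if
    // paths disagree or a red node has a red child.
    static int blackHeight(RBNode n) {
        if (n == null) return 1;                      // null leaves count as black
        if (n.red && (isRed(n.left) || isRed(n.right))) return -1;
        int l = blackHeight(n.left);
        int r = blackHeight(n.right);
        if (l < 0 || l != r) return -1;
        return l + (n.red ? 0 : 1);
    }

    static boolean isRedBlack(RBNode root) {
        return !isRed(root)
            && ordered(root, Long.MIN_VALUE, Long.MAX_VALUE)
            && blackHeight(root) >= 0;
    }
}
```

Unlike isOrdered, the ordering part of this check passes bounds down the tree, so one local change can affect the inputs of many invocations.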
Kernel Results
[Figures: three plots, “Ordered list performance”, “Hash table performance”, and “Red-black tree performance”. Each plots time (ms) against data structure size (0 to 3000) for three series: No invariants, With Ditto, Invariants.]
Real-world Examples
Tetris-like game Netcols
  - Invariant: no “floating” jewels in grid
  - With check, main event loop ran at 80ms, noticeably laggy
  - Result: event loop to 15ms with Ditto
JavaScript obfuscator
  - Invariant: no excluded keywords (based on a set of criteria) in renaming map
[Figure: “JSO performance”, plotting time (ms) against lines of JavaScript (0 to 15000) for three series: No invariants, With Ditto, Invariants.]
Summary

Results:
  - Automatic incrementalization made practical
  - For checks in Java programs
  - Data structure checks viable for development environment

Made possible by
  - Selection of an interesting domain
  - Optimistic memoization

Web: http://www.cs.berkeley.edu/~aj/cs/ditto/