
Concurrency Control in Distributed MRA Index Structures
18th December 2008
Neha Singh, S. Sudarshan (IIT Bombay)
Introduction
Problem statement: Computing aggregate queries over a region of a multi-dimensional space containing mobile point data, when the data is stored in a distributed system.
Need for such aggregate queries:
• Networked virtual environments, widely used in online games and training simulators
  – E.g.: "Run in fear if the number of marching enemy troops exceeds the number of friends around you" (aggregate count query)
• Real-time traffic monitoring
Issues
• Large non-static data storage requires a distributed system and dynamic spatial partitioning
• Synchronized system clocks are not feasible for large-scale distributed systems
• Combining local state information from different peers can give inconsistent aggregates, due to varying communication delays between them
COMAD 2008
Key contributions

• A distributed multi-resolution aggregate index structure that supports a dynamic object set:
  – The multi-resolution aggregate tree stores precomputed aggregates at each tree node in a centralized system, to speed up aggregate queries
  – We extend it to support non-static data in a distributed system
• Atomic updates/reads: Our read protocol and aggregate tree update protocols ensure that updates are atomic with respect to reads (aggregate queries)
• Highly concurrent update protocol: We present a highly concurrent multi-phase update protocol that
  – avoids blocking of reads
  – minimizes contention with concurrent updates
Agenda
Introduction
• Problem statement & Motivation
• System Model
Readers protocol
Maintenance of the index structure
• Definitions
• Naive update protocol
• Multi-phase update protocol
Experimental Analysis
System Model – Partitioning the global space
For partitioning, we use a quadtree-based regular decomposition of space
• Benefits
  – Partitions are independent of the order of data insertion
  – Decomposition is implicitly known by all end systems
• Mapping the regions onto the P2P overlay
  – Each quadtree block has a unique centroid
  – Use this centroid as the key in the DHT to map the regions onto the peer set
[Figure: Quadtree regions mapped onto the DHT key space, from 0 to 2^128 - 1]
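A minimal sketch of the centroid-to-key mapping, for illustration only. The slides do not say how a centroid is turned into a DHT key, so hashing the centroid with SHA-1 and keeping 128 bits is an assumption here, as are the names `block_centroid` and `dht_key` and the quadrant numbering.

```python
import hashlib

KEY_BITS = 128  # assumed identifier width (key space 0 .. 2^128 - 1)

def block_centroid(path, world=(0.0, 0.0, 1.0, 1.0)):
    """Centroid of the quadtree block reached by following `path` from the root.

    `path` is a sequence of quadrant indices 0..3 (hypothetical numbering:
    bit 0 selects the east half, bit 1 selects the north half).
    """
    x0, y0, x1, y1 = world
    for quadrant in path:
        xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
        x0, x1 = (xm, x1) if quadrant & 1 else (x0, xm)
        y0, y1 = (ym, y1) if quadrant & 2 else (y0, ym)
    return ((x0 + x1) / 2, (y0 + y1) / 2)

def dht_key(centroid):
    """Map a centroid to an integer key (assumed scheme: truncated SHA-1)."""
    text = f"{centroid[0]!r},{centroid[1]!r}".encode("ascii")
    digest = hashlib.sha1(text).digest()                   # 160-bit digest
    return int.from_bytes(digest[:KEY_BITS // 8], "big")   # keep 128 bits
```

Because the decomposition is regular, every peer can compute the same centroid, and hence the same key, for a block without any coordination.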
System Model – MRA Tree
An MRA-Tree (Multi-Resolution Aggregate Tree) is a modified multi-dimensional index
structure that stores pre-computed aggregates at various resolutions in its intermediate
nodes


• A leaf node contains the actual data points: <loc, value>
• A non-leaf node stores aggregates for all data points indexed by it: {COUNT, SUM, MINARRAY, MAXARRAY}, where MINARRAY and MAXARRAY hold the min and max, respectively, of each child node
[Figure: example non-leaf node (count 10, sum 30, with its minarray and maxarray) above its leaf nodes]
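The node layout described above can be sketched as follows. This is an illustrative in-memory version, not the distributed structure from the talk; the class names `LeafNode` and `NonLeafNode` and the `summary` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

@dataclass
class LeafNode:
    points: List[Tuple[Tuple[float, float], float]]  # <loc, value> pairs

    def summary(self):
        """(count, sum, min, max) over the values stored in this leaf."""
        values = [v for _, v in self.points]
        return (len(values), sum(values),
                min(values, default=float("inf")),
                max(values, default=float("-inf")))

@dataclass
class NonLeafNode:
    children: List[Union["LeafNode", "NonLeafNode"]]

    def aggregates(self):
        """COUNT and SUM over all indexed points, plus per-child min/max arrays."""
        summaries = [child.summary() for child in self.children]
        return {
            "COUNT": sum(s[0] for s in summaries),
            "SUM": sum(s[1] for s in summaries),
            "MINARRAY": [s[2] for s in summaries],
            "MAXARRAY": [s[3] for s in summaries],
        }

    def summary(self):
        agg = self.aggregates()
        return (agg["COUNT"], agg["SUM"],
                min(agg["MINARRAY"]), max(agg["MAXARRAY"]))
```

In a real index the aggregates would be stored and maintained incrementally rather than recomputed on each call; recomputation keeps the sketch short.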
1 Readers Protocol
Read query over region Q
The query traverses the index structure top-down, starting from the root node and selectively exploring the nodes
Relation of the query Q to a node N:
• Q is contained in or partially overlaps N: the children of N are explored
• Q is disjoint from N: further traversal not necessary
• Q encloses N: further traversal not required
[Figure: example tree with nodes A to H and query region Q; legend: nodes read, intersecting with Q and read, intersecting with Q but not read]
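The traversal rules can be sketched for a COUNT query as below. This is a single-machine illustration under assumed conventions: regions are axis-aligned rectangles `(x0, y0, x1, y1)` treated as half-open, and the `Node` class and helper names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1), half-open

def disjoint(q: Rect, n: Rect) -> bool:
    return q[2] <= n[0] or n[2] <= q[0] or q[3] <= n[1] or n[3] <= q[1]

def encloses(q: Rect, n: Rect) -> bool:
    return q[0] <= n[0] and q[1] <= n[1] and q[2] >= n[2] and q[3] >= n[3]

@dataclass
class Node:
    region: Rect
    count: int = 0                                   # precomputed aggregate
    children: List["Node"] = field(default_factory=list)
    points: List[Tuple[float, float]] = field(default_factory=list)  # leaves only

def count_query(node: Node, q: Rect) -> int:
    if disjoint(q, node.region):
        return 0                      # nothing under this node can match
    if encloses(q, node.region):
        return node.count             # stored aggregate answers for the subtree
    if not node.children:             # partially overlapping leaf: check points
        return sum(1 for x, y in node.points
                   if q[0] <= x < q[2] and q[1] <= y < q[3])
    return sum(count_query(child, q) for child in node.children)
```

Only partially overlapping nodes are opened, which is why the number of nodes read, rather than the area of Q, drives the cost.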
1 Readers Protocol
Naïve read method:
• Acquire locks on all nodes while traversing down the tree
• Release the locks after the read is completed
However:
• This reduces the concurrency of the index structure for concurrent updates
Needed:
• Early release of locks
• At the same time, updates coming top-down must be prevented from overtaking the read
Solution: Use the crabbing protocol: acquire locks on all the child nodes before releasing the lock on the parent
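A minimal, single-process sketch of the crabbing idea. It uses `threading.Lock` as a stand-in for an S-lock (a simplification: real S-locks are shared among readers), and the `Node` class and `crabbing_read` function are hypothetical names.

```python
import threading

class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.lock = threading.Lock()   # stand-in for an S-lock (assumption)

def crabbing_read(root, visit):
    """Visit every node top-down; lock all children before unlocking the parent."""
    trace = []
    root.lock.acquire()
    trace.append(("acquire", root.name))
    frontier = [root]                  # nodes that are locked but not yet visited
    while frontier:
        node = frontier.pop()
        visit(node)
        for child in node.children:    # take the children's locks first ...
            child.lock.acquire()
            trace.append(("acquire", child.name))
        node.lock.release()            # ... and only then let go of the parent
        trace.append(("release", node.name))
        frontier.extend(node.children)
    return trace
```

The returned trace makes the ordering visible: every child "acquire" appears before the "release" of its parent, so nothing arriving top-down can slip between the reader and the subtree it is about to read.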
2 Maintenance of Index Structure
Aim: To update the distributed aggregate tree such that update operations are atomic while causing minimum blocking of reads
• We consider two types of updates
  – Move operation
    • Within the same node
    • Across different nodes
  – Insertion / deletion operation
• The index tree needs to be updated only for moves across different nodes and for insertion / deletion operations
• Since our application is data-driven, updates percolate from leaf nodes to the higher levels of the hierarchy
2 Maintenance of Index Structure
Definition: Update Tree
Update Tree: The set of all nodes (UT) of the distributed MRA tree whose stored aggregate values are affected by the transaction T
• Insertion / deletion operation: consists of all ancestor nodes of the leaf node
• Move operation: consists of ancestors up to the lowest common ancestor of the leaf nodes
[Figure: update tree rooted at N for an insertion / deletion at leaf A, and for a move between leaves A and B]
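The two cases can be sketched with a child-to-parent map. The map representation and the function names are assumptions for illustration; following the slide literally, the leaf itself is not included in the returned node sets.

```python
def ancestors(parent, node):
    """Ancestors of `node`, nearest first, following a child -> parent map."""
    chain = []
    node = parent.get(node)
    while node is not None:
        chain.append(node)
        node = parent.get(node)
    return chain

def update_tree_insert_delete(parent, leaf):
    """Insertion / deletion: all ancestors of the leaf, up to the tree root."""
    return ancestors(parent, leaf)

def update_tree_move(parent, src_leaf, dst_leaf):
    """Move: both legs up to the lowest common ancestor (LCA) of the leaves.

    Returns (lca, source_leg, destination_leg); the legs exclude the LCA.
    """
    src_chain = ancestors(parent, src_leaf)
    dst_chain = ancestors(parent, dst_leaf)
    lca = next(n for n in src_chain if n in dst_chain)
    return (lca,
            src_chain[:src_chain.index(lca)],
            dst_chain[:dst_chain.index(lca)])
```

For a tree R → N → {X, Y} with leaf A under X and leaf B under Y, a move from A to B touches only X, Y and N, so the tree root R is never contacted.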
2 Maintenance of Index Structure
Definition: Update Tree
Importance of Update Tree:
• Although updates propagate up from leaf nodes, only nodes in the update tree are affected
• Hence locks can be acquired top-down from the root of the update tree
  – Going to the tree root each time would overload the site containing the root
• We use the order of lock acquisition at the root node of the update tree to serialize concurrent intersecting read and update queries
[Figure: update tree rooted at N for a move between leaves A and B]
2 Maintenance of Index Structure
Definition: Conflicting Updates
Conflicting updates: Two updates U1 and U2 are said to be conflicting if U1T ∩ U2T ≠ ∅, where U1T and U2T are the corresponding update trees
[Figure: two examples of update trees U1 and U2, rooted at N1 and N2]
Importance:
• The common part of two update trees is connected and has a unique highest node
• Order of access to this node = serialization order of concurrent conflicting updates
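As a small illustration of the definition, update trees can be treated as sets of node names. The function names and the `depth` map (root at depth 0) are hypothetical helpers, not part of the talk.

```python
def conflicting(tree1, tree2):
    """Two updates conflict when their update trees share at least one node."""
    return bool(set(tree1) & set(tree2))

def serialization_node(tree1, tree2, depth):
    """Highest node of the common part (smallest depth), or None if disjoint.

    `depth` maps node name -> distance from the tree root.
    """
    common = set(tree1) & set(tree2)
    if not common:
        return None
    return min(common, key=depth.__getitem__)
```

With `depth = {"N2": 0, "N1": 1, "Z": 1, "X": 2, "Y": 2}`, the trees `{"N1", "X", "Y"}` and `{"N2", "N1", "X", "Z"}` share N1 and X, and N1 is the node whose access order serializes the two updates.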
2 Maintenance of Index Structure
Naïve Update Protocol


• X-lock all update tree nodes and then update them
• Order of acquiring locks: top-down, as bottom-up can lead to deadlock with read queries
Step I: Acquire Lock Phase
• X-locks are acquired on all update tree nodes top-down, starting from the root node
Step II: Update and Release Lock
• Updates propagate bottom-up
• Nodes release locks after updating their aggregates
• The root node releases its lock only after the update is over in both legs
[Figure: update tree rooted at N with leaves A and B, shown for each step]
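The order of events in the naïve protocol can be sketched as a trace. This is a sequential simulation with no real locks or network; `naive_update` and the leg representation (each leg listed top-down, excluding the root) are assumptions for illustration.

```python
def naive_update(root, legs):
    """Return the sequence of (action, node) events of the naive protocol.

    `root` is the root of the update tree; `legs` is a list of legs, each a
    list of node names ordered top-down (root excluded).
    """
    trace = [("x-lock", root)]
    for leg in legs:                       # Step I: lock everything top-down
        for node in leg:
            trace.append(("x-lock", node))
    for leg in legs:                       # Step II: update bottom-up
        for node in reversed(leg):
            trace.append(("update", node))
            trace.append(("release", node))
    trace.append(("update", root))         # root is updated last ...
    trace.append(("release", root))        # ... and unlocked last
    return trace
```

The trace shows why concurrency suffers: the root's X-lock is the first event and its release is the last, so reads entering through the root wait for the whole update.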
2 Maintenance of Index Structure
Naïve Update Protocol
Problem: Low concurrency
• The lock on the root node of the update tree is retained for the entire duration
• Because it is X-locked, this results in low concurrency and higher read time
Issues
• Read queries come top-down, and updates go bottom-up
• The root node is the last to be updated and the first to be read
• We still need to ensure that the update is atomic for reads
We propose a highly concurrent multi-phase update protocol
Key modifications
1. Allow concurrent reads while acquiring locks and updating other nodes
   • Introduce a new locking mode: U-lock, compatible with S-lock
2. Nodes updated top-down
   • Split the update process into 3 phases
3. Prevent a read from overtaking a top-down update and reading an inconsistent value
   • Use the crabbing protocol while acquiring locks to update nodes
2 Maintenance of Index Structure
Multi Phase Update Protocol
1. Update Lock Mode: a new locking mode compatible with read
• Locks nodes for possible future modification
• Can be upgraded to X-lock when needed
Compatibility Matrix
       S      U      X
S    True   True   False
U    True   False  False
X    False  False  False
• U-S: True => A read query can proceed while an update is modifying other nodes of the update tree
• U-U: False => Conflicting updates need to wait for each other
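The compatibility check can be written as a lookup table. The slide lists six values; the sketch assumes they are the upper triangle of a symmetric matrix (S-S, S-U, S-X, U-U, U-X, X-X), which agrees with the two entries the slide spells out (U-S true, U-U false). `can_grant` is a hypothetical helper name.

```python
# Lock compatibility, assuming the slide's six values form a symmetric matrix.
COMPATIBLE = {
    ("S", "S"): True,  ("S", "U"): True,  ("S", "X"): False,
    ("U", "S"): True,  ("U", "U"): False, ("U", "X"): False,
    ("X", "S"): False, ("X", "U"): False, ("X", "X"): False,
}

def can_grant(requested, held_modes):
    """A lock is granted only if it is compatible with every lock already held."""
    return all(COMPATIBLE[(requested, held)] for held in held_modes)
```

For example, `can_grant("S", ["U"])` is true (a read proceeds past a U-locked node), while `can_grant("U", ["U"])` is false (conflicting updates wait for each other).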
2 Maintenance of Index Structure
Multi-phase update Protocol
2. Update split into 3 phases
Acquire Lock Phase
• U-locks are acquired top-down, starting from the root node
Propagate Phase
• The update is propagated bottom-up from the leaf nodes and stored as pendingUpdates
Refresh Phase
• U-locks are upgraded to X-locks
• Stored pendingUpdates are executed top-down
3. Crabbing protocol: X-locks are acquired on the child nodes, then the lock on the parent is released
[Figure: update tree rooted at N with leaves A and B in each of the three phases; δU marks the pending update stored at a node]
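The three phases can be sketched as a sequential simulation along a single leg of the update tree, for a change in a leaf's max value. There are no real locks, threads or messages here; the `Node` class, its fields and the trace strings are hypothetical, chosen only to make the ordering of the phases visible.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class Node:
    name: str
    maxarray: Dict[str, int]                      # cached max of each child
    lock: Optional[str] = None                    # None, "U" or "X"
    pending: Optional[Tuple[str, int]] = None     # stored pendingUpdate

def multi_phase_update(path: List[Node], leaf_name: str, new_leaf_max: int) -> List[str]:
    """`path` lists the update-tree nodes of one leg top-down (root first)."""
    trace = []
    # Acquire Lock Phase: U-locks top-down; readers are not blocked.
    for node in path:
        node.lock = "U"
        trace.append(f"U-lock {node.name}")
    # Propagate Phase: bottom-up, only storing what will change.
    child, value = leaf_name, new_leaf_max
    for node in reversed(path):
        node.pending = (child, value)
        trace.append(f"pending {node.name}")
        merged = {**node.maxarray, child: value}
        child, value = node.name, max(merged.values())
    # Refresh Phase: upgrade to X and apply top-down, crabbing to the child.
    path[0].lock = "X"
    trace.append(f"X-lock {path[0].name}")
    for i, node in enumerate(path):
        changed_child, new_max = node.pending
        node.maxarray[changed_child] = new_max
        node.pending = None
        trace.append(f"apply {node.name}")
        if i + 1 < len(path):                     # lock the child before release
            path[i + 1].lock = "X"
            trace.append(f"X-lock {path[i + 1].name}")
        node.lock = None
        trace.append(f"release {node.name}")
    return trace
```

In this sketch the root carries an X-lock only from the start of the refresh phase until its child is X-locked, instead of for the whole update.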
2 Maintenance of Index Structure
Multi Phase Update Protocol: Correctness and efficiency
Serialization Order
• R - U order: Order of the read query's S-lock and the update's X-lock at the update tree root node
• U - U order: Order of the U-lock point at the highest node of the common twig pattern
Importance of separating the Acquire Lock and Propagate phases
• U-locks are acquired top-down
• Acquiring U-locks top-down cannot lead to deadlock with reads (as in the naïve case)
• But merging both phases can lead to deadlock between concurrent conflicting updates
Importance of the crabbing protocol
• Used for upgrading U-locks to X-locks top-down
• Thus a read query cannot overtake an update
• It sees the state either before or after the refresh on all intersecting nodes
=> Update is atomic for reads
2 Maintenance of Index Structure
Multi Phase Update Protocol
Scenario I: Can the propagated leaf node value change before the update is over?
• Consider a new max value (m) that was propagated up during the propagate phase
• What if this max changes between the time it is propagated up and the time it is executed at the nodes?
• Holding a U-lock for the entire duration between the propagate and refresh phases ensures that no other update can change the node's value being propagated up the tree
[Figure: max value m propagated up to node N]
2 Maintenance of Index Structure
Multi Phase Update Protocol
Scenario II: Can the stored pendingUpdate value get stale?
Assume U decreases the max aggregate value at node B to 3
What if the max value of node A changes meanwhile?
This cannot happen because:
• Any transaction attempting to modify the max value of A would intersect with U on at least node C
• Thus the two would be executed serially
Note: Caching min/max values on the parent node helps reduce update latency by greatly reducing the number of nodes that must be locked
[Figure: example tree with nodes A, B, C and D and their cached max values]
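A small sketch of why the cached per-child max values help. When a child's max decreases, the parent can recompute its own max from its cached array without visiting or locking the sibling subtrees. The function name is hypothetical and the numbers in the usage note are illustrative.

```python
def refresh_cached_max(maxarray, child, new_child_max):
    """Apply a child's new max to the parent's cached array.

    Returns (updated_array, new_parent_max). Only the parent's own cache is
    read; sibling nodes are not touched.
    """
    updated = {**maxarray, child: new_child_max}
    return updated, max(updated.values())
```

For a parent caching `{"A": 4, "B": 6}`, lowering B's max to 3 gives `({"A": 4, "B": 3}, 4)`: the parent's new max is 4, obtained without any access to A.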
2 Maintenance of Index Structure
Multi Phase Update Protocol
Scenario III: Multiple updates to an entity
• Consider an entity that is transferred by an update U1 from A to B and then by U2 from B to C
• Logically, U1 should be reflected on nodes B and D before U2
• Causal order of execution at node B makes sure that U1 completes before U2 begins
[Figure: nodes A, B, C and D with update U1 (A to B) and update U2 (B to C)]
2 Maintenance of Index Structure
Divisible Aggregates
Consider an update transaction that changes only the sum and count of the leaf nodes, with no change in min/max
Observation:
• The change in the aggregates for all nodes in the update tree is known
• No need to propagate these changes bottom-up
Overview of the update protocol:
• Such transactions can have only one update phase
• X-locks are acquired top-down using the crabbing protocol
• Updates are executed and locks released
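For count and sum, the change at every node on the path is the same known delta, so it can be applied in a single top-down pass. The sketch below shows this for an insertion; the dictionary layout and function name are assumptions, and locking is omitted.

```python
def apply_divisible_update(path, d_count, d_sum):
    """Apply a known (count, sum) delta to each node, root first.

    `path` lists the update-tree nodes top-down; each node is a dict with
    "count" and "sum" entries. No bottom-up propagation is needed because
    the delta is identical at every node.
    """
    for node in path:
        node["count"] += d_count
        node["sum"] += d_sum
    return path
```

Inserting one entity with value 5 under a path whose nodes hold (count 10, sum 30) and (count 4, sum 12) yields (11, 35) and (5, 17).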
2 Maintenance of Index Structure
Comparative Analysis of the Update Methods
Aim: Estimate the difference in concurrency provided by the update protocols
Notation:
• d: communication delay per link
• m: number of edges in the longer leg
• N: root node of the update tree
Phase durations, as labeled in the timeline figure:
• Naïve Update Protocol: Acquire Lock Phase dm, Update Phase dm
• Multi Phase Update Protocol: Acquire Lock Phase dm, Propagate Phase dm, Refresh Phase in segments of 2d and (m-2)d
• Updating only Divisible Aggregates: Update Phase in segments of 2d and (m-2)d
[Figure: timelines of the three protocols, marking the interval during which N is locked for read queries]
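The phase durations can be added up as a rough estimate. The totals follow directly from the labels on the slide; the read-blocking figures rest on an assumed reading of the timeline, namely that N blocks reads for the whole naïve update (its X-lock is held throughout) and only for the first 2d of the refresh phase in the multi-phase protocol. Function names are hypothetical.

```python
def update_durations(d, m):
    """Total update time per protocol, summing the slide's phase labels."""
    refresh = 2 * d + (m - 2) * d
    return {
        "naive": d * m + d * m,                 # acquire + update
        "multi_phase": d * m + d * m + refresh, # acquire + propagate + refresh
        "divisible_only": refresh,              # single update phase
    }

def root_read_blocking(d, m):
    """Time N blocks read queries (ASSUMED reading of the figure, see above)."""
    return {"naive": 2 * d * m, "multi_phase": 2 * d}
```

With d = 10 ms and m = 5, the totals are 100 ms, 150 ms and 50 ms, and under the stated assumption the root blocks reads for 100 ms versus 20 ms: a longer update in exchange for much less read blocking.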
3 Experimental Analysis
Experimental Setup



• Synthetic data set with non-uniform distribution of data points
• DHT used: FreePastry implementation of the Pastry DHT
• Parameters for the quadtree:
  – Every peer node specifies its threshold (t) as the count of entities it can support
  – fmin = 14
  – Threshold can be as low as 0
• We study the distributed MRA tree's update protocol
  – Running time of read queries with and without updates
  – Running time of updates
3 Experimental Analysis
To obtain MRA index structures of varying depths, we use the peer threshold as the lever
Depth of the quadtree increases as the threshold approaches µ (#entities / #peers)
[Figure: Variation of the min/max depth of the partitioning tree; x-axis: Threshold / µ (0 to 30), y-axis: Depth (0 to 12)]
3 Experimental Analysis
Read query time depends on the number of nodes read, and not on the query region size
[Figure: Variation of the read query duration as the number of nodes read increases, with no updates; x-axis: # Nodes covered, y-axis: Time (ms)]
[Figure: Variation of the read query duration as the query region increases, with no updates; x-axis: Query region / Total area (in %), y-axis: Time (ms)]
3 Experimental Analysis
Update time is directly proportional to the number of update tree nodes
[Figure: Variation of average update time with increase in the number of update tree nodes, for the naive protocol and the multi-phase locking protocol; x-axis: Number of nodes (2 to 10), y-axis: Time (ms)]
3 Experimental Analysis
Average read time increases much less under the multi-phase update protocol than under the naive protocol
[Figure: Read query duration for different naïve update workloads (No Updates, F1, F2; F2 > F1, frequency of updates); x-axis: Number of nodes, y-axis: Time (ms)]
[Figure: Read query duration for different MP update workloads (No Updates, F1, F2; F2 > F1, frequency of updates); x-axis: Number of nodes, y-axis: Time (ms)]
Conclusion

• We propose a Distributed Multi-Resolution Aggregate Tree index structure for answering aggregate queries over mobile entities
• We point out problems with concurrent updates and propose the multi-phase update protocol, which
  – ensures that updates and aggregate queries are atomic with respect to each other
  – minimizes contention and avoids deadlock
• Analysis and experimental results show that
  – the multi-phase update protocol requires a longer update time
  – but it offers higher concurrency for read queries than the naïve update protocol
Thank you!
Questions?