SD-Rtree: A Scalable Distributed Rtree
Witold Litwin, Cédric du Mouza & Philippe Rigaux
Plan

Introduction
  SDDS
  R-tree
SD-Rtree Evolution
  Balancing
    Spatial Rotations
  Overlapping
    Redundant Coverage
Queries
Performance
Conclusion
SDDS Principles (1993)

Data are at server nodes
Nodes communicate through point-to-point messaging
Overloaded servers split over new servers
Queries go through client nodes, which use local images of the SDDS
No central addressing component
A node can be both client and server (peer)
SDDS Principles (1993)



An outdated image may send a query to an incorrect server
Servers forward such a query to the correct server
Image gets adjusted
Image Adjustment Message (IAM) comes back
Client does not repeat the same error twice
Data are basically in the RAM of the servers
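
The forwarding and image adjustment principle can be sketched on a one-dimensional key space. This is a minimal illustration, not the protocol of any particular SDDS: the classes `Server`, `Network` and `Client`, the key ranges, and the `owner` lookup (which stands in for server-to-server forwarding) are all assumptions made for the example.

```python
# Hypothetical 1-D sketch of the SDDS principle: servers own key ranges,
# a client keeps a possibly outdated image, and an IAM repairs it.

class Server:
    def __init__(self, sid, low, high):
        self.sid, self.low, self.high = sid, low, high  # owns keys in [low, high)

class Network:
    def __init__(self, servers):
        self.servers = {s.sid: s for s in servers}

    def owner(self, key):
        # Stands in for server-to-server forwarding. A real SDDS has no
        # central lookup: each server forwards using its own knowledge.
        for s in self.servers.values():
            if s.low <= key < s.high:
                return s
        raise KeyError(key)

class Client:
    def __init__(self, network, image):
        self.network = network
        self.image = dict(image)  # sid -> (low, high), may be outdated

    def guess(self, key):
        for sid, (low, high) in self.image.items():
            if low <= key < high:
                return sid
        return next(iter(self.image))  # fall back to any known server

    def query(self, key):
        """Return (id of the serving server, number of forwarding hops)."""
        contacted = self.network.servers[self.guess(key)]
        hops = 0
        if not (contacted.low <= key < contacted.high):
            correct = self.network.owner(key)  # the query is forwarded
            hops = 1
            # IAM: the client learns the true ranges of both servers.
            self.image[contacted.sid] = (contacted.low, contacted.high)
            self.image[correct.sid] = (correct.low, correct.high)
            contacted = correct
        return contacted.sid, hops

# Server 0 used to own [0, 100) and has since split with server 1.
net = Network([Server(0, 0, 50), Server(1, 50, 100)])
client = Client(net, {0: (0, 100)})
first = client.query(70)   # outdated image: one forwarding hop
second = client.query(70)  # image adjusted: direct hit
```

The second call reaches the right server directly, which is the "does not repeat the same error twice" property.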
SD-Rtree : a Spatial SDDS
Distributed Spatial Data
SD-Rtree : a Spatial SDDS
• Distributed Index
• No central component
SD-Rtree : a Spatial SDDS


Point & Window Queries
kNN queries (future)
SD-Rtree : Generalizes R-tree

R-tree:
Nodes are minimal bounding boxes
Leaf nodes point to data
Internal nodes bound subtrees
Nodes may overlap
Nodes split on overflow
Generates a balanced m-ary tree
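
As a small illustration of minimal bounding boxes, the sketch below builds two leaf boxes and their parent. The `Rect` class, the `mbb` helper and the coordinates are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rect:
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def area(self):
        return (self.xmax - self.xmin) * (self.ymax - self.ymin)

    def intersects(self, other):
        return (self.xmin <= other.xmax and other.xmin <= self.xmax and
                self.ymin <= other.ymax and other.ymin <= self.ymax)

def mbb(rects):
    """Minimal bounding box of a non-empty collection of rectangles."""
    rects = list(rects)
    return Rect(min(r.xmin for r in rects), min(r.ymin for r in rects),
                max(r.xmax for r in rects), max(r.ymax for r in rects))

# Two leaves, each bounding its own objects, and the internal node above them.
leaf1 = mbb([Rect(0, 0, 2, 2), Rect(1, 1, 4, 3)])
leaf2 = mbb([Rect(3, 2, 6, 5), Rect(5, 0, 7, 1)])
root = mbb([leaf1, leaf2])
leaves_overlap = leaf1.intersects(leaf2)
```

Here the two leaf boxes overlap even though they index different objects, which is why searches and inserts may have to follow several paths.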
SD-Rtree : Generalizes R-tree

R-tree:
An insert may go through multiple paths
It ends up in the smallest bounding box that contains the object, if there is any
Otherwise one of the boxes gets enlarged
The box may split
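
The choice of a box at insert time can be sketched as follows. The slide does not say which box gets enlarged. The sketch assumes the classic R-tree heuristic of least area enlargement, and the function names and rectangles (tuples `(xmin, ymin, xmax, ymax)`) are hypothetical.

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def contains(box, r):
    return box[0] <= r[0] and box[1] <= r[1] and box[2] >= r[2] and box[3] >= r[3]

def enlarge(box, r):
    return (min(box[0], r[0]), min(box[1], r[1]),
            max(box[2], r[2]), max(box[3], r[3]))

def choose_box(boxes, obj):
    """Return (index of the chosen box, its possibly enlarged rectangle)."""
    covering = [i for i, b in enumerate(boxes) if contains(b, obj)]
    if covering:
        # Smallest box that already contains the object: no enlargement.
        i = min(covering, key=lambda i: area(boxes[i]))
        return i, boxes[i]
    # Assumption: otherwise enlarge the box whose area grows the least.
    i = min(range(len(boxes)),
            key=lambda i: area(enlarge(boxes[i], obj)) - area(boxes[i]))
    return i, enlarge(boxes[i], obj)

boxes = [(0, 0, 10, 10), (2, 2, 6, 6), (20, 20, 30, 30)]
inside = choose_box(boxes, (3, 3, 4, 4))     # contained in boxes 0 and 1
outside = choose_box(boxes, (11, 4, 12, 5))  # contained in no box
```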
SD-Rtree : Generalizes R-tree

R-tree:
A search may go through multiple paths
All paths may bring relevant objects
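
A minimal sketch of such a search follows. The dictionary-based node layout and the sample rectangles are assumptions made for the example. The point query is expressed as a degenerate window.

```python
def intersects(a, b):
    """True if rectangles a and b (xmin, ymin, xmax, ymax) share a point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def search(node, window, visited):
    """Return the objects meeting `window`; record visited leaves in `visited`."""
    if not intersects(node["box"], window):
        return []
    if "objects" in node:                   # leaf node
        visited.append(node["name"])
        return [o for o in node["objects"] if intersects(o, window)]
    found = []
    for child in node["children"]:          # each overlapping child is a path
        found.extend(search(child, window, visited))
    return found

leaf_a = {"name": "a", "box": (0, 0, 5, 5),
          "objects": [(0, 0, 2, 2), (4, 4, 5, 5)]}
leaf_b = {"name": "b", "box": (4, 3, 9, 8),
          "objects": [(4, 3, 6, 5), (8, 7, 9, 8)]}
root = {"name": "root", "box": (0, 0, 9, 8), "children": [leaf_a, leaf_b]}

visited = []
hits = search(root, (4.5, 4.5, 4.5, 4.5), visited)
```

Both leaves are visited and each contributes one object, because the query point lies in the region where the two boxes overlap.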
SD-Rtree: a Balanced Binary Tree

The SD-Rtree is a balanced binary tree, distributed on a set of servers, such that:
Each internal node (or routing node) has exactly two sons
Each leaf node stores a subset of the indexed dataset
At each node, the heights of the two subtrees differ by at most one
Each server stores one data node and one routing node
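
The structural properties above can be checked mechanically. The sketch below is only an illustration: the `Node` class and the names `check` and `count` are assumptions, and the distribution of nodes over servers is not modeled.

```python
class Node:
    def __init__(self, name, left=None, right=None):
        self.name, self.left, self.right = name, left, right

    def is_leaf(self):
        return self.left is None and self.right is None

def check(node):
    """Return the height of `node`, or raise ValueError if a property fails."""
    if node.is_leaf():
        return 0
    if node.left is None or node.right is None:
        raise ValueError(f"{node.name}: a routing node needs exactly two sons")
    hl, hr = check(node.left), check(node.right)
    if abs(hl - hr) > 1:
        raise ValueError(f"{node.name}: subtree heights differ by {abs(hl - hr)}")
    return 1 + max(hl, hr)

def count(node):
    """Return (number of data nodes, number of routing nodes)."""
    if node.is_leaf():
        return 1, 0
    dl, rl = count(node.left)
    dr, rr = count(node.right)
    return dl + dr, rl + rr + 1

balanced = Node("r2", Node("r1", Node("d0"), Node("d1")), Node("d2"))
skewed = Node("r3", Node("r2", Node("r1", Node("d0"), Node("d1")), Node("d2")),
              Node("d3"))

tree_height = check(balanced)
sizes = count(balanced)
try:
    check(skewed)
    skewed_ok = True
except ValueError:
    skewed_ok = False
```

A binary tree in which every routing node has two sons always has one routing node fewer than data nodes, so the two counts differ by one.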
SD-Rtree: Binary Tree Structure


di = data node (leaf)
ri = routing node (internal node)
SD-Rtree: Tree Distribution
SD-Rtree Balancing

The binary tree should be height-balanced
The heights of the two subtrees rooted at any node should not differ by more than 1 (cf. AVL trees)
The tree height is then logarithmic in the number of leaves
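
The logarithmic bound follows from the usual AVL argument, sketched here for a binary tree whose internal nodes all have two sons. The notation is introduced only for this sketch: $N(h)$ is the minimal number of leaves of a height-balanced tree of height $h$, $F_k$ is the $k$-th Fibonacci number with $F_1 = F_2 = 1$, and $N$ is the number of leaves of a tree of height $h$.

```latex
\begin{align*}
N(0) &= 1, \qquad N(1) = 2, \\
N(h) &= N(h-1) + N(h-2) \quad (h \ge 2), \\
N(h) &= F_{h+2} \;\ge\; \varphi^{h}, \qquad \varphi = \tfrac{1+\sqrt{5}}{2}, \\
h &\le \log_{\varphi} N \approx 1.44 \log_2 N .
\end{align*}
```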
SD-Rtree Balancing

SD-Rtree balancing occurs during splits
Messages are sent bottom-up to adjust the height of the ancestor nodes
A rotation occurs if an ancestor is imbalanced
SD-Rtree rotations are spatial
They change the rectangles of internal nodes
The best rotation minimizes rectangle overlapping
Tie-breaking minimizes the « dead space »
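
The selection criterion (least overlap, then least dead space) can be sketched as follows. The set of candidates is an assumption made for the example: the sketch simply enumerates the three ways of grouping four subtree rectangles into two pairs, which is not necessarily the candidate set used by the authors. All function names and coordinates are hypothetical.

```python
def area(r):
    return max(0, r[2] - r[0]) * max(0, r[3] - r[1])

def mbb(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def overlap(a, b):
    """Area of the intersection of a and b (0 if they do not meet)."""
    return area((max(a[0], b[0]), max(a[1], b[1]),
                 min(a[2], b[2]), min(a[3], b[3])))

def dead_space(a, b):
    """Part of the bounding box of a and b covered by neither rectangle."""
    return area(mbb(a, b)) - (area(a) + area(b) - overlap(a, b))

def best_grouping(rects):
    """rects: dict name -> rectangle, exactly four entries.
    Return the pairing with least overlap between the two pair boxes,
    ties broken by least total dead space."""
    names = sorted(rects)
    first = names[0]
    candidates = []
    for partner in names[1:]:
        rest = tuple(n for n in names[1:] if n != partner)
        candidates.append(((first, partner), rest))

    def cost(candidate):
        (p, q), (r, s) = candidate
        box1, box2 = mbb(rects[p], rects[q]), mbb(rects[r], rects[s])
        return (overlap(box1, box2),
                dead_space(rects[p], rects[q]) + dead_space(rects[r], rects[s]))

    return min(candidates, key=cost)

rects = {"c": (0, 0, 2, 2), "d": (3, 0, 5, 2),
         "f": (0, 6, 2, 8), "g": (3, 6, 5, 8)}
best = best_grouping(rects)
```

In this example two groupings produce no overlap at all, and the tie is broken in favor of the one whose bounding boxes waste less area.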
Rotation Pattern

Properties
The sons of a node are not ordered => more freedom for reorganizing the tree
Any imbalanced node matches a rotation pattern
A rotation pattern is a subtree a(b(e(f,g),d),c) such that:
h(c) = h(d) = h(f) = n − 1 (n > 0)
h(g) = max(0, n − 2)
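
The pattern definition can be turned into a small matcher. This is a sketch under assumptions: the `Node` class and function names are hypothetical, leaves have height 0, and because sons are unordered the taller son is taken as b (resp. e) at each level.

```python
class Node:
    def __init__(self, name, left=None, right=None):
        self.name, self.left, self.right = name, left, right

def height(node):
    """Height of a subtree; a data node (leaf) has height 0."""
    if node.left is None:
        return 0
    return 1 + max(height(node.left), height(node.right))

def taller_first(node):
    """Sons are unordered, so return them as (taller, shorter)."""
    l, r = node.left, node.right
    return (l, r) if height(l) >= height(r) else (r, l)

def match_pattern(a):
    """Return the roles of a(b(e(f,g),d),c) and n, or None if `a` does not match."""
    if a.left is None:
        return None
    b, c = taller_first(a)
    if b.left is None:
        return None
    e, d = taller_first(b)
    if e.left is None:
        return None
    f, g = taller_first(e)
    n = height(c) + 1
    if (height(d) == n - 1 and height(f) == n - 1
            and height(g) == max(0, n - 2)):
        return {"n": n, "b": b.name, "c": c.name, "d": d.name,
                "e": e.name, "f": f.name, "g": g.name}
    return None

# The sons are deliberately given in a different order than in the pattern.
imbalanced = Node("a", Node("c"),
                  Node("b", Node("d"), Node("e", Node("f"), Node("g"))))
balanced = Node("a", Node("b", Node("x"), Node("y")), Node("c"))

found = match_pattern(imbalanced)
not_found = match_pattern(balanced)
```

With these heights, h(e) = n and h(b) = n + 1, so the two sons of a differ in height by 2, which is exactly the imbalance a rotation has to repair.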
SD-Rtree : Spatial Rotation
Rotation Cost


Constant number of messages (3 or 6, depending on the choice)
Few rotations in practice
In particular when the dataset is uniformly distributed
See our experiments
SD-Rtree : Images

Each image defines the addressing structure
Resides as a cache on a client or on a peer
Starts with the address of the contact server
IAMs make it a subtree
Splits make images outdated
IAMs adjust it incrementally
Image Adjustment





Client contacts a server with a query
Each incorrect server initiates a traversal of the tree
During the traversal, the descriptions of the nodes are collected
The correct server sends the up-to-date tree structure
The client updates its image
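
The steps above can be sketched for a point query. This is a simplified illustration, not the authors' protocol: the `Node` class, the `route` function and the boxes are assumptions, the whole tree lives in one process instead of being spread over servers, and the descent follows a single path (overlapping is ignored here).

```python
def contains(box, p):
    return box[0] <= p[0] <= box[2] and box[1] <= p[1] <= box[3]

class Node:
    def __init__(self, nid, box, sons=()):
        self.nid, self.box, self.sons, self.parent = nid, box, list(sons), None
        for s in self.sons:
            s.parent = self

def route(start, p):
    """Traverse from data node `start` towards a data node covering `p`.
    Return (data node or None, collected descriptions {node id: box})."""
    seen = {start.nid: start.box}
    node = start
    while not contains(node.box, p) and node.parent is not None:  # bottom-up
        node = node.parent
        seen[node.nid] = node.box
    while node.sons:                                              # top-down
        nxt = next((s for s in node.sons if contains(s.box, p)), None)
        if nxt is None:
            return None, seen
        for s in node.sons:
            seen[s.nid] = s.box
        node = nxt
    return (node if contains(node.box, p) else None), seen

d0 = Node("d0", (0, 0, 4, 4))
d1 = Node("d1", (5, 0, 9, 4))
d2 = Node("d2", (0, 5, 9, 9))
r1 = Node("r1", (0, 0, 9, 4), [d0, d1])
r2 = Node("r2", (0, 0, 9, 9), [r1, d2])

image = {"d0": d0.box}             # the image starts with the contact server
target, collected = route(d0, (7, 7))
image.update(collected)            # effect of the IAM on the client image
```

After the update the image contains d2 and its box, so a later query in that region can be sent to the right server directly.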
Out-of-range situation
Insertion of objects
Overlapping management



The directory rectangles in an Rtree may overlap
A local subtree does not suffice for locating all the nodes that contain the point (point query) or intersect the window (window query) searched for
SD-Rtree servers maintain data on node overlapping
Redundant Coverage
It avoids systematically accessing the root node
Redundant Coverage

Example
The region common to A and B is stored on both nodes
If a point query sent to A falls in the region shared with B, A sends a point query message to B
For D: we must keep the intersection with C or B; here it is empty
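
The example can be reproduced with concrete rectangles. The node names follow the slide, but the coordinates are invented for the illustration, and the sketch simplifies by storing at each node its intersections with all other nodes rather than with a selected set of them.

```python
def intersection(a, b):
    """Intersection rectangle of a and b, or None if they do not meet."""
    r = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    return r if r[0] <= r[2] and r[1] <= r[3] else None

def coverage(boxes):
    """For each node, its non-empty intersections with the other nodes."""
    result = {}
    for n, b in boxes.items():
        result[n] = {}
        for m, other in boxes.items():
            common = intersection(b, other)
            if m != n and common is not None:
                result[n][m] = common
    return result

def targets(node, point, cov):
    """Nodes that must also receive a point query first sent to `node`."""
    return [m for m, r in cov[node].items()
            if r[0] <= point[0] <= r[2] and r[1] <= point[1] <= r[3]]

boxes = {"A": (0, 0, 5, 5), "B": (4, 0, 9, 5),
         "C": (0, 6, 4, 9), "D": (5, 6, 9, 9)}
cov = coverage(boxes)
shared = targets("A", (4.5, 2), cov)   # falls in the region common to A and B
alone = targets("A", (1, 1), cov)      # falls in A only
```

With this information A can forward the query to B by itself, without going through the root.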
Queries

Point queries and window queries. The technique is similar to the insertion algorithm:
Search the client image for a server whose mbb contains the point or intersects the window
Send the query to this server
If the server actually covers the point or the window, it answers the client; else it sends the query to its parent node
A server uses the overlapping information to transmit the query
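
The client-side choice and the first decision taken by the contacted server can be sketched as follows. The function names, the server identifiers and the fallback to the contact server when the image has no match are assumptions. For a window, "covers" is read here as "intersects".

```python
def contains(box, p):
    return box[0] <= p[0] <= box[2] and box[1] <= p[1] <= box[3]

def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def matches(box, query):
    """query is a point (x, y) or a window (xmin, ymin, xmax, ymax)."""
    return contains(box, query) if len(query) == 2 else intersects(box, query)

def pick_server(image, query, contact):
    """image: {server id: mbb}. Return the server the client should address."""
    for sid, box in image.items():
        if matches(box, query):
            return sid
    return contact  # assumption: no match in the image, use the contact server

def server_step(own_box, query):
    """First decision of the contacted server."""
    return "answer" if matches(own_box, query) else "forward to parent"

image = {"s1": (0, 0, 4, 4), "s2": (5, 0, 9, 4)}
point_choice = pick_server(image, (6, 1), "s0")
window_choice = pick_server(image, (3, 3, 6, 6), "s0")
unknown_choice = pick_server(image, (20, 20), "s0")
```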
Experiments

Synthetic data (points and rectangles) generated with GSTD
50,000 to 500,000 objects
0 to 3,000 queries
Server capacity: 3,000 objects
Comparison of three SD-Rtree variants:
BASIC: no image; every query is processed top-down from the root
IMSERVER: no IAMs among the servers
IMCLIENT: client images
Per Insert Cost
Cost of balancing
Image convergence
Distribution of messages
Cost per Query
Conclusion



SD-Rtree is an efficient scalable distributed Rtree
  For very large spatial data collections
  Can be processed in distributed RAM
    Access time much faster than for disk data
  Load balancing
    Spatial rotations
  Overlapping management
    Redundant coverage
  O(log n) worst insert cost
Future work
  kNN-queries
  Objects distribution balancing on servers
SD-Rtree
Thank You for Your Attention
Questions: [email protected]