SD-Rtree: A Scalable Distributed Rtree
Witold Litwin & Cédric du Mouza & Philippe Rigaux

Plan
• Introduction
• SDDS
• R-tree
• SD-Rtree
• Evolution
• Balancing
• Overlapping
• Spatial Rotations
• Redundant Coverage
• Queries
• Performance
• Conclusion

SDDS Principles (1993)
• Data are at server nodes
• Servers communicate through point-to-point messaging
• Overloaded servers split over new servers
• Queries go to client nodes
• Clients use local images of the SDDS
• No central addressing component
• A node can be both client and server (peer)

SDDS Principles (1993)
• An outdated image may send a query to an incorrect server
• Servers forward such a query to the correct server
• The image gets adjusted
• An Image Adjustment Message (IAM) comes back
• The client does not repeat the same error twice
• Data are basically in the RAM of the servers

SD-Rtree: a Spatial SDDS
• Distributed Spatial Data

SD-Rtree: a Spatial SDDS
• Distributed Index
• No central component

SD-Rtree: a Spatial SDDS
• Point & Window Queries
• kNN queries (future)

SD-Rtree: Generalizes R-tree
R-tree:
• Nodes are minimal bounding boxes
• Leaf nodes point to data
• Internal nodes bound subtrees
• Nodes may overlap
• Nodes split on overflow
• The result is a balanced m-ary tree

SD-Rtree: Generalizes R-tree
R-tree:
• An insert may go through multiple paths
• It ends up in the smallest bounding box, if there is any
• Otherwise, one of the boxes gets enlarged
• The box may split

SD-Rtree: Generalizes R-tree
R-tree:
• A search may go through multiple paths
• All paths may bring relevant objects

SD-Rtree: a Balanced Binary Tree
The SD-Rtree is a balanced binary tree, distributed on a set of servers, such that:
• Each internal node (or routing node) has exactly two sons
• Each leaf node stores a subset of the indexed dataset
• At each node, the heights of the two subtrees differ by at most one
• Each server stores one data node and one routing node

SD-Rtree: Binary Tree Structure
• di = data node (leaf)
• ri = routing node (internal node)

SD-Rtree: Tree Distribution

SD-Rtree Balancing
• The binary tree should be height-balanced
• The heights of the two subtrees rooted at any node should not differ by more than 1 (cf. AVL trees)
• The tree height is then logarithmic in the number of leaves

SD-Rtree Balancing
• SD-Rtree balancing occurs during splits
• Messages are sent bottom-up to adjust the heights of the ancestor nodes
• A rotation occurs if an ancestor is imbalanced
• SD-Rtree rotations are spatial: they change the rectangles of internal nodes
• The best rotation minimizes rectangle overlapping
• Tie breaking minimizes the « dead space »

Rotation Pattern Properties
• The sons of a node are not ordered => more freedom for reorganizing the tree
• Any imbalanced node matches a rotation pattern
• A rotation pattern is a subtree a(b(e(f,g),d),c) such that:
  h(c) = h(d) = h(f) = n − 1 (n > 0)
  h(g) = max(0, n − 2)

SD-Rtree: Spatial Rotation

Rotation Cost
• Constant number of messages (3 or 6, depending on the choice)
• Few rotations in practice
• In particular when the dataset is uniformly distributed
• See our experiments

SD-Rtree: Images
• Each image defines the addressing structure
• It resides as a cache on a client or on a peer
• It starts with the address of the contact server
• IAMs make it a subtree
• Splits make images outdated
• IAMs adjust the image incrementally

Image Adjustment
• The client contacts a server with a query
• Each incorrect server initiates a traversal of the tree
• During the traversal, the description of the nodes is collected
• The correct server sends the up-to-date tree structure
• The client updates its image

Out-of-range situation

Insertion of objects

Overlapping management
• The directory rectangles in an R-tree may overlap
• The local subtree does not suffice for locating all the nodes that contain the point (point query) or the window (window query) searched for
• SD-Rtree servers maintain data on node overlapping: Redundant Coverage
• This avoids systematically accessing the root node
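
The idea of keeping overlap information on each server can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the SD-Rtree data structure itself: the `Rect` class, the helper names `overlapping_coverage` and `forward_targets`, and the server names `S1`, `S2`, `S3` are all hypothetical, and rectangles are plain axis-aligned boxes. Each server precomputes the regions it shares with other nodes, so a point query landing in a shared region can be forwarded directly instead of going through the root.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Rect:
    """Axis-aligned bounding box (hypothetical helper, not from the slides)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def contains(self, p: Point) -> bool:
        x, y = p
        return self.xmin <= x <= self.xmax and self.ymin <= y <= self.ymax

    def intersection(self, other: "Rect") -> Optional["Rect"]:
        # The common region of two boxes, or None when they are disjoint.
        xmin, ymin = max(self.xmin, other.xmin), max(self.ymin, other.ymin)
        xmax, ymax = min(self.xmax, other.xmax), min(self.ymax, other.ymax)
        if xmin > xmax or ymin > ymax:
            return None
        return Rect(xmin, ymin, xmax, ymax)


def overlapping_coverage(own: Rect, others: Dict[str, Rect]) -> Dict[str, Rect]:
    """Regions that `own` shares with each other node (empty ones are dropped)."""
    coverage = {}
    for name, box in others.items():
        shared = own.intersection(box)
        if shared is not None:
            coverage[name] = shared
    return coverage


def forward_targets(p: Point, coverage: Dict[str, Rect]) -> List[str]:
    """Nodes that must also receive a point query falling in a shared region."""
    return sorted(name for name, shared in coverage.items() if shared.contains(p))


# Hypothetical layout: S1 overlaps S2, and is disjoint from S3.
S1 = Rect(0, 0, 6, 4)
S2 = Rect(4, 2, 10, 8)
S3 = Rect(12, 0, 14, 2)
coverage_s1 = overlapping_coverage(S1, {"S2": S2, "S3": S3})
```

With this layout, `forward_targets((5, 3), coverage_s1)` names `S2` because the point lies in the shared region, while a point such as `(1, 1)` needs no forwarding at all.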

Redundant Coverage Example
• The region common to A and B is stored on both nodes
• If a point query sent to A falls in the region shared with B, A sends a point query message to B
• For D, we must keep the intersection with C or B (here empty)

Queries
Point queries and window queries. The technique is similar to the insertion algorithm:
• Search the client image for a server whose minimal bounding box (mbb) contains the point or intersects the window
• Send the query to this server
• If the server actually covers the point or the window, it answers the client; otherwise it sends the query to its parent node
• A server uses the overlapping information to transmit the query

Experiments
• Synthetic data (points and rectangles) generated with GSTD
• 50,000 to 500,000 objects
• 0 to 3,000 queries
• Server capacity: 3,000 objects
• Comparison of three SD-Rtree variants:
  BASIC: no image; every query is processed top-down from the root
  IMSERVER: no IAMs among the servers
  IMCLIENT: client images

Per Insert Cost

Cost of balancing

Image convergence

Distribution of messages

Cost per Query

Conclusion
• SD-Rtree is an efficient scalable distributed R-tree
  • For very large spatial data collections
  • Can be processed in distributed RAM
  • Access time much faster than to disk data
  • O(log n) worst-case insert cost
• Load balancing: spatial rotations
• Overlapping management: redundant coverage
• Future work
  • kNN queries
  • Balancing the distribution of objects across servers

SD-Rtree
Thank You for Your Attention
Questions: [email protected]
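
As a supplement to the Queries slide, the client-side lookup and the upward forwarding step can be sketched in Python. This is only an illustration under stated assumptions: the function names (`pick_server`, `route_up`), the node names (`d0`, `d1`, `r1`) and the dictionary-based image are hypothetical, the final descent from the routing node to the correct data node is left out, and no IAM is modelled.

```python
from typing import Dict, List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def intersects(a: Rect, b: Rect) -> bool:
    """True when the two boxes share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def covers(mbb: Rect, window: Rect) -> bool:
    """True when the window lies entirely inside the box."""
    return (mbb[0] <= window[0] and mbb[1] <= window[1]
            and window[2] <= mbb[2] and window[3] <= mbb[3])


def point_as_window(x: float, y: float) -> Rect:
    """A point query is treated as a degenerate window."""
    return (x, y, x, y)


def pick_server(image: Dict[str, Rect], window: Rect, contact: str) -> str:
    """Client side: first node in the (possibly outdated) image whose box
    intersects the query; fall back to the contact server otherwise."""
    for node, mbb in image.items():
        if intersects(mbb, window):
            return node
    return contact


def route_up(start: str, window: Rect, actual: Dict[str, Rect],
             parent: Dict[str, Optional[str]]) -> List[str]:
    """Server side: climb to the parent until a node really covers the query.
    Returns the list of nodes visited."""
    path = [start]
    current = start
    while not covers(actual[current], window) and parent.get(current) is not None:
        current = parent[current]
        path.append(current)
    return path


# Hypothetical state: d0 has split, but the client image still holds its old box.
actual = {"d0": (0, 0, 4, 10), "d1": (6, 0, 10, 10), "r1": (0, 0, 10, 10)}
parent = {"d0": "r1", "d1": "r1", "r1": None}
stale_image = {"d0": (0, 0, 10, 10)}

query = point_as_window(8, 5)
first_hop = pick_server(stale_image, query, contact="d0")
visited = route_up(first_hop, query, actual, parent)
```

Here the stale image sends the query to `d0`, which no longer covers the point, so the query climbs to `r1`. In the scheme described on the slides, the correct server would then answer and the client image would be refreshed.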