Improving Min/Max Aggregation over Spatial Objects
Donghui Zhang, Vassilis J. Tsotras
University of California, Riverside
ACM GIS’01

Outline
• Problem Definition
• Straightforward Solutions
• Our Solution
• Performance Results
• By-Product: Optimized the MSB-tree
• Conclusions

Problem Definition
• Consider a collection of spatial objects.
• Each object: rectangle r, value v.
• Spatial aggregation: find the aggregate value over the objects intersecting a given query rectangle. We focus on MAX.
• E.g.: a database of rainfall over geographical areas. Find the max rainfall in the Los Angeles area.
[Figure: example objects with values 4, 5, 7, 2, 1]

Straightforward Solutions
• Use an R*-tree [BKS+90] to index the objects.
• Reduce the problem to a range search: retrieve the objects intersecting the query rectangle and take their MAX.
• Better approach: aR-tree [PKZ+01, LM01]. Store the MAX of the sub-tree in internal nodes.
• If the query rectangle contains a sub-tree's rectangle, there is no need to search the sub-tree; its stored MAX is used.

Our Solution -- overview
• The MR-tree: a specialized index for Min/Max aggregation. It uses the R*-tree and four optimization techniques:
  - k-max: increase the chance for the search algorithm to stop at higher tree levels;
  - box-elimination: erase information from the tree that will not contribute to any query;
  - union: do not insert an object which will not contribute to any query;
  - area-reduction: reduce the area of the object to be inserted.

The k-max Optimization
• Motivation: the aR-tree is not efficient if the query rectangle intersects but does not fully contain a sub-tree rectangle.
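
The aR-tree behaviour described above can be sketched in a few lines. This is only an illustrative sketch, not the authors' implementation: the `Node` class, the `max_query` function and the `visited` counter are hypothetical names, leaf objects are modelled as nodes without children, and rectangles are assumed to be `(xmin, ymin, xmax, ymax)` tuples.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)


def intersects(a: Rect, b: Rect) -> bool:
    """True if the two rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(outer: Rect, inner: Rect) -> bool:
    """True if `outer` fully covers `inner`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


@dataclass
class Node:
    """Hypothetical aR-tree node; a node without children stands for one object."""
    box: Rect
    max_value: float  # MAX over all objects in this sub-tree
    children: List["Node"] = field(default_factory=list)


def max_query(node: Node, query: Rect, stats: Dict[str, int]) -> Optional[float]:
    """MAX over objects intersecting `query`; None if no object intersects."""
    stats["visited"] = stats.get("visited", 0) + 1  # count nodes examined
    if not intersects(node.box, query):
        return None
    # A leaf object that intersects, or a sub-tree fully inside the query:
    # the stored MAX is the answer and nothing below needs to be searched.
    if not node.children or contains(query, node.box):
        return node.max_value
    # Partial overlap: the stored MAX cannot be trusted, so descend.
    hits = [r for r in (max_query(c, query, stats) for c in node.children)
            if r is not None]
    return max(hits) if hits else None
```

With two objects under one root, a query covering the whole root box is answered after examining a single node, while a query that only partially overlaps the root has to examine the children as well. That second case is the inefficiency the k-max optimization targets.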

The k-max Optimization
• Along with each index record r, store the k max-value objects in sub-tree(r).
• Upon query, if the query rectangle intersects any of the k objects at r, omit sub-tree(r): the largest-valued of the k objects that the query intersects gives the MAX for that sub-tree.
• Trade-off: larger k → more sub-trees omitted during query, but also more space and update cost.

The box-elimination Optimization
• Motivation: if for objects o1 and o2, o1.box contains o2.box and o1.value ≥ o2.value, then o2 is obsolete, i.e. it does not contribute to any query and thus can be deleted.
[Figure: o1 (value 7) and o2 (value 5)]

The box-elimination Optimization
• Similarly for an object o1 and an index record r2: if o1.box contains r2.box and o1.value ≥ the max value in sub-tree(r2), the whole sub-tree is obsolete.
• The optimization: at insertion, remove obsolete objects and sub-trees along the insertion path.
• Ideally, remove all obsolete objects/sub-trees, but this is too expensive. Instead, pick c paths (c: constant).
• Trade-off: larger c → smaller index size and faster query time, but also more update time.

The union Optimization
• Motivation 1: if a new object o1 is obsolete due to an existing object o2, o1 should not be inserted.
• Motivation 2: a new object o1 may be obsolete due to the union of several existing objects.
[Figure: new object o1 (value 2) and existing objects with values 8 and 7]
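
The obsolescence test behind box-elimination and Motivation 1 can be illustrated as follows. This is a simplified sketch under stated assumptions, not the MR-tree algorithm itself: it keeps the objects in a flat list instead of a tree, checks every object rather than only c paths, and does not cover Motivation 2 (obsolescence due to a union of several objects). The names `Obj`, `makes_obsolete` and `insert_with_elimination` are hypothetical.

```python
from typing import List, NamedTuple, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)


class Obj(NamedTuple):
    """Hypothetical spatial object: a box and the value attached to it."""
    box: Rect
    value: float


def contains(outer: Rect, inner: Rect) -> bool:
    """True if `outer` fully covers `inner`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def makes_obsolete(o1: Obj, o2: Obj) -> bool:
    """o2 cannot affect any MAX query if o1 covers it with a value >= o2's."""
    return contains(o1.box, o2.box) and o1.value >= o2.value


def insert_with_elimination(objects: List[Obj], new: Obj) -> List[Obj]:
    """Return the object list after trying to insert `new`."""
    # Motivation 1: a new object that is already obsolete is not inserted.
    if any(makes_obsolete(o, new) for o in objects):
        return list(objects)
    # Box-elimination: drop existing objects that `new` makes obsolete.
    kept = [o for o in objects if not makes_obsolete(new, o)]
    kept.append(new)
    return kept
```

For example, after inserting an object with box (0, 0, 10, 10) and value 7, an object with box (2, 2, 4, 4) and value 5 is rejected, whereas the same box with value 9 is kept because the larger object no longer dominates it.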

The union Optimization
• Along with each index record r, store the union of the boxes of all objects in sub-tree(r); also store the MIN value of all these objects.
• Do not perform the insertion of object o1 if o1.box is contained in r.union and o1.value ≤ r.min.
• Question: how is the union computed and stored?

The union Optimization
• Store an approximate union representation using t boxes (t: constant).
• The approximation should be fully contained in the actual union, and should cover as much space as possible.
• Def: given a set of n boxes S = {s1, …, sn}, the covered t-union of S is a set of t boxes A = {a1, …, at} such that ∪si covers ∪ai, and ∪ai covers the maximum area possible.

The union Optimization
• Computing the exact covered t-union takes O(n^(2t+4)).
• We propose a much faster approximate algorithm: O(n log n).
• Idea of our algorithm: pick the t largest boxes and expand them.

The area-reduction Optimization
• Motivation: the box of a new object o1 can be reduced if an existing object o2 intersects it with a larger or equal value.
[Figure: existing object o2 (value 8) and new object o1 (value 6)]

The area-reduction Optimization
• Reduce the area of a new object o1 when:
  - there is an index record r such that r.union intersects o1.box and r.min ≥ o1.value, or
  - one of the k max-value objects intersects o1 with a larger or equal value, or
  - there is a leaf object o2 such that o2.box intersects o1.box and o2.value ≥ o1.value.

The area-reduction Optimization
• Benefit 1: reduce overlap among sibling nodes.
[Figure: new object (value 8); index records r1 (min=9) and r2 (min=7)]
[Figure: the actual object inserted (value 8); index records r1 (min=9) and r2 (min=7)]
• Benefit 2: increase the chance of making new objects obsolete.

Performance Results
• Datasets: 5 million square objects, size randomly chosen from 10 to 10000 (the space in each dimension is 1 to one million).
• Implemented algorithms:
  - R*: the R*-tree [BKS+90];
  - aR: the aR-tree [PKZ+01, LM01];
  - kaR: the aR-tree with the k-max optimization;
  - MR: the MR-tree (with all the optimizations).

Index Sizes
[Figure: index sizes (#MB) for R*, aR, kaR, MR]

Query Performance (log scale)
[Figure: query time (#sec) vs. query rectangle area (%) for R*, aR, kaR, MR]
• Query time is the total of 100 random queries of the same query rectangle size.

Optimizing the MSB-tree
• The MSB-tree [YW00]: efficiently maintains and computes MIN/MAX aggregates over 1-dim interval data.
• Insertion/Query: O(log_B m), where B is the page capacity and m is the number of leaf records.
• [YW00]: periodically reconstruct the whole tree to maintain a small m. During reconstruction, the index is off-line.
• Reconstruction can be avoided by applying the box-elimination optimization. Idea: if a new interval contains all intervals in a sub-tree with a larger value, the sub-tree is obsolete.

Conclusions
• Addressed the MIN/MAX aggregation problem over spatial objects;
• Four optimization techniques;
• The MR-tree;
• Much smaller index size and query time;
• By-product: optimized the MSB-tree.