Efficient Computation of Temporal Aggregates with Range Predicates

Improving Min/Max Aggregation
over Spatial Objects
Donghui Zhang, Vassilis J. Tsotras
University of California, Riverside
ACM GIS’01
Outline
• Problem Definition
• Straightforward Solutions
• Our Solution
• Performance Results
• By-Product: Optimized the MSB-tree
• Conclusions
Problem Definition
• Consider a collection of spatial objects.
• Each object: rectangle r, value v.
[Figure: example rectangles with values 4, 5, 7, 2 and 1]
• Spatial aggregation: find an aggregate value over the
objects intersecting a given rectangle. We focus on MAX.
• E.g.: a database of rainfall over geographical areas. Find
the max rainfall in the Los Angeles area.
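To make the problem statement concrete, here is a minimal brute-force sketch of the MAX query. The names `Box`, `intersects` and `spatial_max` are illustrative assumptions, not taken from the talk; rectangles are assumed to be closed and axis-parallel.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Box(NamedTuple):
    """Axis-parallel closed rectangle (hypothetical helper type)."""
    x1: float
    y1: float
    x2: float
    y2: float


def intersects(a: Box, b: Box) -> bool:
    """True if the two closed rectangles share at least one point."""
    return a.x1 <= b.x2 and b.x1 <= a.x2 and a.y1 <= b.y2 and b.y1 <= a.y2


def spatial_max(objects: Sequence[Tuple[Box, float]], query: Box) -> Optional[float]:
    """Scan every object and return the MAX value among those intersecting query."""
    values = [value for box, value in objects if intersects(box, query)]
    return max(values) if values else None


rain = [(Box(0, 0, 2, 2), 4), (Box(5, 5, 6, 6), 7), (Box(1, 1, 3, 3), 5)]
print(spatial_max(rain, Box(2.5, 2.5, 4, 4)))
```

This scan touches every object on every query, which is the cost the index structures in the rest of the talk try to avoid.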
Straightforward Solutions
• Use an R*-tree [BKS+90] to index the objects.
• Reduce the problem to a range search.
• Better approach: the aR-tree [PKZ+01, LM01], which
stores the MAX of each sub-tree in its internal nodes.
• If the query rectangle contains a sub-tree, there is no
need to search it; the stored MAX is used instead.
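The aR-tree idea can be sketched as a recursive query over a tree whose nodes carry a bounding box and the MAX of their sub-tree. The `Node` layout and the function names below are assumptions made for illustration; they are not the data structures of [PKZ+01, LM01].

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), closed rectangle


def intersects(a: Box, b: Box) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(outer: Box, inner: Box) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


@dataclass
class Node:
    box: Box                     # bounding box of the sub-tree (or the object itself)
    max_value: float             # MAX over all objects below this node
    children: List["Node"] = field(default_factory=list)  # empty for an object


def ar_max(node: Node, query: Box) -> Optional[float]:
    """MAX over objects intersecting query, using the stored sub-tree MAX."""
    if not intersects(node.box, query):
        return None
    # An object, or a sub-tree fully inside the query: no need to descend.
    if not node.children or contains(query, node.box):
        return node.max_value
    best = None
    for child in node.children:
        value = ar_max(child, query)
        if value is not None and (best is None or value > best):
            best = value
    return best
```

The shortcut only fires when the query fully contains the sub-tree's box; a partial overlap still forces a descent.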
Our Solution -- overview
• The MR-tree: a specialized index for Min/Max
aggregation. It uses the R*-tree and four
optimization techniques:
  - k-max: increase the chance for the search
    algorithm to stop at higher tree levels;
  - box-elimination: erase information from the
    tree that will not contribute to any query;
  - union: do not insert an object which will not
    contribute to any query;
  - area-reduction: reduce the area of the object
    to be inserted.
The k-max Optimization
• Motivation: The aR-tree is not efficient if the query
rectangle intersects but does not fully contain a
sub-tree rectangle.
[Figure: an aR-tree example in which the query rectangle
intersects, but does not contain, a sub-tree rectangle]
The k-max Optimization
• Along with each index record r, store the k max-value
objects in sub-tree(r).
• Upon query, if the query rectangle intersects any of
the k objects at r, omit sub-tree(r): the largest
intersected value among them is the answer for r.
• Trade-off: larger k => more sub-trees omitted
during query, but also more space and update cost.
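The k-max test above can be sketched as follows. The `Node` fields (`objects`, `top_k`) and the function name `kmax_query` are hypothetical; the sketch assumes `top_k` holds the k highest-valued objects of the sub-tree, sorted by value in descending order.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), closed rectangle
Obj = Tuple[Box, float]                  # (rectangle, value)


def intersects(a: Box, b: Box) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Node:
    box: Box                                              # bounding box of the sub-tree
    top_k: List[Obj]                                      # k max-value objects, value descending
    children: List["Node"] = field(default_factory=list)  # internal nodes
    objects: List[Obj] = field(default_factory=list)      # leaf nodes


def kmax_query(node: Node, query: Box) -> Optional[float]:
    if not intersects(node.box, query):
        return None
    # Every object with a larger value sits earlier in top_k and was not hit,
    # so the first intersected entry is the MAX for this whole sub-tree.
    for box, value in node.top_k:
        if intersects(box, query):
            return value
    best = None
    for box, value in node.objects:
        if intersects(box, query) and (best is None or value > best):
            best = value
    for child in node.children:
        value = kmax_query(child, query)
        if value is not None and (best is None or value > best):
            best = value
    return best
```

Unlike the containment test of the aR-tree, this check can stop the search even when the query only partially overlaps the sub-tree.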
The box-elimination Optimization
• Motivation: if, for objects o1 and o2, o1.box contains
o2.box and o1.value ≥ o2.value, then o2 is obsolete, i.e.
it does not contribute to any query and thus can be
deleted.
[Figure: object o1 (value 7) containing object o2 (value 5)]
The box-elimination Optimization
• Similarly for an object o1 and an index record r2: if
o1.box contains r2.box and o1.value ≥ the max value in
sub-tree(r2), the whole sub-tree is obsolete.
• The optimization: at insertion, remove obsolete
objects and sub-trees along the insertion path.
• Ideally, remove all obsolete objects/sub-trees, but this
is too expensive. Instead, pick c paths (c: constant).
• Trade-off: larger c => smaller index size and faster
query time, but also more update time.
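The obsolescence rule can be illustrated on a flat list of objects. This is a deliberate simplification: it checks every stored object instead of walking c insertion paths in a tree, and the names `is_obsolete` and `insert_with_elimination` are assumptions made for this sketch.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), closed rectangle
Obj = Tuple[Box, float]                  # (rectangle, value)


def contains(outer: Box, inner: Box) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def is_obsolete(obj: Obj, by: Obj) -> bool:
    """obj never changes a MAX query if `by` contains it with a value >= its own."""
    (box, value), (by_box, by_value) = obj, by
    return contains(by_box, box) and by_value >= value


def insert_with_elimination(objects: List[Obj], new: Obj) -> List[Obj]:
    """Insert `new` and drop every stored object it makes obsolete.

    Flat-list stand-in for the tree: the real index would only examine
    objects and sub-trees along a bounded number of paths.
    """
    survivors = [obj for obj in objects if not is_obsolete(obj, new)]
    survivors.append(new)
    return survivors


stored = [((1, 1, 2, 2), 5), ((0, 0, 10, 10), 3), ((20, 20, 21, 21), 1)]
print(insert_with_elimination(stored, ((0, 0, 5, 5), 7)))
```

The same predicate applies to an index record by passing the record's box together with the MAX value of its sub-tree.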
The union Optimization
• Motivation 1: if a new object o1 is obsolete due to
an existing object o2 , o1 should not be inserted.
• Motivation 2: a new object o1 may be obsolete due
to the union of several existing objects.
[Figure: new object o1 (value 2) covered by the union of
two existing objects with values 8 and 7]
The union Optimization
• Along with each index record r, store the union of
the boxes of all objects in sub-tree(r); also store the
MIN value of all these objects.
• Do not perform the insertion of object o1 if:
  - o1.box is contained in r.union, and
  - o1.value ≤ r.min.
• Question: how is the union computed and stored?
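The insertion test can be sketched as below, assuming the union is available as a small list of boxes. The coverage check uses coordinate compression, which is one possible way to test containment in a union of rectangles and not necessarily the method used in the MR-tree; `covered_by_union` and `should_skip_insert` are hypothetical names.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), closed rectangle


def covered_by_union(target: Box, boxes: List[Box]) -> bool:
    """True if target (assumed to have positive area) lies inside the union of boxes."""
    x1, y1, x2, y2 = target
    # Clip every box to the target and keep the pieces with positive area.
    clipped = [(max(b[0], x1), max(b[1], y1), min(b[2], x2), min(b[3], y2)) for b in boxes]
    clipped = [c for c in clipped if c[0] < c[2] and c[1] < c[3]]
    xs = sorted({x1, x2, *(c[0] for c in clipped), *(c[2] for c in clipped)})
    ys = sorted({y1, y2, *(c[1] for c in clipped), *(c[3] for c in clipped)})
    # Each grid cell is entirely inside or outside every clipped box,
    # so testing the cell centre is enough.
    for xa, xb in zip(xs, xs[1:]):
        for ya, yb in zip(ys, ys[1:]):
            cx, cy = (xa + xb) / 2, (ya + yb) / 2
            if not any(c[0] <= cx <= c[2] and c[1] <= cy <= c[3] for c in clipped):
                return False
    return True


def should_skip_insert(new_box: Box, new_value: float,
                       union_boxes: List[Box], min_value: float) -> bool:
    """The new object is obsolete if covered by objects whose values are all >= its own."""
    return new_value <= min_value and covered_by_union(new_box, union_boxes)


union = [(0, 0, 4, 10), (4, 0, 10, 10)]  # e.g. boxes of objects with values 8 and 7
print(should_skip_insert((2, 2, 8, 8), 2, union, min_value=7))
```

The test stays sound if `union_boxes` is only an under-approximation of the true union: it may miss some obsolete objects but never rejects one that matters.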
The union Optimization
• Store an approximate union representation using t
(t : constant) boxes.
• The approximation should be fully contained in the
actual union, and should cover as much space as
possible.
• Def: given a set of n boxes S={s1,…, sn}, the
covered t-union of S is a set of t boxes
A={a1,…, at} s.t.
  - the union of the si covers the union of the ai, and
  - the union of the ai covers the max area possible.
The union Optimization
• To compute the exact covered t-union: O(n^(2t+4)).
• We propose a much faster approximate algorithm:
O(n log n).
• Idea of our algorithm: pick the t largest boxes and
expand them.
The area-reduction Optimization
• Motivation: the box of a new object o1 can be
reduced if an existing object o2 intersects it with a
larger or equal value.
[Figure: new object o1 (value 6) partially overlapped by
existing object o2 (value 8)]
The area-reduction Optimization
• Reduce the area of a new object o1 when:
  - ∃ index record r s.t. r.union intersects o1.box
    and r.min ≥ o1.value, or
  - one of the k max-value objects intersects o1
    with a larger or equal value, or
  - ∃ leaf object o2 s.t. o2.box intersects o1.box
    and o2.value ≥ o1.value.
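A minimal sketch of the third case, trimming a new box against individual existing objects. It only trims when the remainder is still a single rectangle, which is an assumption of this sketch; `reduce_box` and `area_reduce` are illustrative names, not from the talk.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), closed rectangle


def reduce_box(new: Box, cover: Box) -> Optional[Box]:
    """Cut away the part of `new` hidden under `cover`.

    Returns None if `new` is fully hidden, and `new` unchanged when the
    remainder would not be a single rectangle.
    """
    x1, y1, x2, y2 = new
    cx1, cy1, cx2, cy2 = cover
    if cx1 > x2 or cx2 < x1 or cy1 > y2 or cy2 < y1:
        return new                       # no overlap at all
    spans_y = cy1 <= y1 and y2 <= cy2    # cover spans the full height of new
    spans_x = cx1 <= x1 and x2 <= cx2    # cover spans the full width of new
    if spans_x and spans_y:
        return None                      # new is completely covered
    if spans_y:
        if cx1 <= x1:
            x1 = cx2                     # left strip is hidden
        elif cx2 >= x2:
            x2 = cx1                     # right strip is hidden
    elif spans_x:
        if cy1 <= y1:
            y1 = cy2                     # bottom strip is hidden
        elif cy2 >= y2:
            y2 = cy1                     # top strip is hidden
    return (x1, y1, x2, y2)


def area_reduce(new_box: Box, new_value: float,
                existing: List[Tuple[Box, float]]) -> Optional[Box]:
    """Trim new_box against every existing object with a larger or equal value."""
    box: Optional[Box] = new_box
    for other_box, other_value in existing:
        if other_value >= new_value:
            box = reduce_box(box, other_box)
            if box is None:
                return None              # nothing left to insert
    return box


print(area_reduce((0, 0, 10, 10), 6, [((5, -1, 12, 11), 8)]))
```

A return value of `None` means the new object turned out to be obsolete, which ties this optimization back to the union test.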
The area-reduction Optimization
• Benefit 1: reduce overlap among sibling nodes.
[Figure: a new object with value 8 overlapping index records
r1 (min=9) and r2 (min=7); the object actually inserted is
smaller than the new object]
• Benefit 2: increase chance to make new objects
obsolete.
Performance Results
• Dataset: 5 million square objects, with sizes chosen at
random from 10 to 10,000; the space spans 1 to 1,000,000
in each dimension.
• Implemented algorithms:
  - R*: the R*-tree [BKS+90];
  - aR: the aR-tree [PKZ+01, LM01];
  - kaR: the aR-tree with the k-max optimization;
  - MR: the MR-tree (with all the optimizations).
Index Sizes
[Figure: bar chart of index size (MB) for R*, aR, kaR and MR]
Query Performance (log scale)
[Figure: query time (sec, log scale) versus query rectangle
area (%) for R*, aR, kaR and MR]
• Query time is the total of 100 random queries of the
same query rectangle size.
Optimizing the MSB-tree
• The MSB-tree [YW00]: efficiently maintains and
computes MIN/MAX aggregates over 1-dim interval
data.
• Insertion/Query: O(log_B m), where B is the page capacity
and m is the number of leaf records.
• [YW00]: periodically reconstruct the whole tree to
maintain a small m. During reconstruction, the index is
off-line.
• Reconstruction can be avoided by applying the box-elimination
optimization. Idea: if a new interval with a larger value
contains all intervals in a sub-tree, the sub-tree is obsolete.
Conclusions
• Addressed the MIN/MAX aggregation problem over
spatial objects;
• Four optimization techniques;
• The MR-tree;
• Much smaller index size and query time;
• By-product: optimized the MSB-tree.