Incremental Maintenance of XML Structural Indexes
Ke Yi (1), Hao He (1), Ioana Stanoi (2) and Jun Yang (1)
(1) Department of Computer Science, Duke University
(2) IBM T. J. Watson Research Center

Motivation
- XML has gained tremendously in popularity in recent years
  - Used to represent many kinds of data
- Major DB vendors are rushing to incorporate solutions for native XML repositories and retrieval
  - IBM DB2, Oracle, Microsoft SQL Server
  - Tamino, Natix, X-Hive, …

Overview
[Figure: example data graph of a paper document, with nodes numbered 1 to 18 and labeled paper, section, title, algorithm, proof, uses, exp, about; title values include “intro”, “1-index”, “A(k)-index”, “experiments”]

Label Path Expressions
- Example: /paper/section/algorithm
[Figure: the example data graph, repeated]

Structural Indexes
- Why do we need them?
- Structural indexes
  - Speed up the evaluation of path expressions
  - Provide a structural summary of the data graph
- DataGuide [Goldman & Widom 97]
- 1-index [Milo & Suciu 99]
- A(k)-index [Kaushik et al. 02], D(k)-index [Qun et al. 03], M(k)-index [He & Yang 04]
- Integration of structural indexes and inverted lists [Kaushik et al. 04]
- Focus on maintenance
  - Has a major effect on index efficiency
  - Remains an overlooked issue

Outline
[Figure: the example data graph, repeated as the talk outline]

1-Index: Definition
- Constructed by using bisimilarity
- Definition based on stability
  - Partition data nodes into index nodes: dnode (v) and inode (I[v])
  - I[u] is v’s index parent if u is v’s parent
  - An inode is stable if all of its dnodes have the same index parents
  - In a 1-index, all inodes are stable
[Figure: dnodes u and v with their inodes I[u] and I[v]]

1-Index: Example
[Figure: the example data graph and its 1-index, with the path expression /paper/section/algorithm; the inodes shown are paper {1}, section {2,4,8,13}, title {3,5,9,14}, algorithm {6,10}, proof {7}, proof {11}, uses {12}, exp {15,16}, about {17,18}]

1-Index: Quality
- Assigning dnodes that are bisimilar into different inodes does not affect correctness, but does affect efficiency
[Figure: the 1-index from the example with the section inode {2,4,8,13} split into {2,4} and {8,13}]
- The quality of an index:
  quality = (# inodes / # inodes in the minimum 1-index − 1) × 100%
- Ideal: quality = 0%

Previous Results
- Construction
  - The PT algorithm [Paige & Tarjan 87], in time O(m log n), where m = # edges and n = # nodes
- Edge changes
  - The propagate algorithm [Kaushik et al. 02]
  - Quality of the 1-index after update: no guarantee on the quality of the resulting index
  - 3 ~ 5% after 500 edge insertions in experiments
- Subgraph addition
  - Index reconstruction

Edge Insertion: An Example (1)
[Figure: a data graph with nodes R, A, B, C1, C2, C3, D1, D2, D3 and its 1-index with inodes {C1, C2}, {C3}, {D1, D2}, {D3}; step shown: Split 1]

Edge Insertion: An Example (2)
[Figure: the remaining steps on the same index: Split 2, Merge 1, Merge 2]
- The result is indeed the minimum 1-index for the data graph after the update
- Not a coincidence!
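
As an illustration of the stability definition and the quality measure above, here is a minimal Python sketch. It is not the PT algorithm or the split/merge algorithm from the talk; it is a naive refinement loop that starts from the partition by label and keeps splitting inodes until every inode is stable. The function names (`minimum_1_index`, `inodes`, `quality`) and the small graph are hypothetical, with the graph only loosely modeled on the edge insertion example.

```python
from collections import defaultdict


def minimum_1_index(labels, edges):
    """Map each dnode to an inode id by naive refinement (not Paige-Tarjan)."""
    parents = defaultdict(set)
    for u, v in edges:
        parents[v].add(u)
    # Start from the partition by label.
    block = dict(labels)
    while True:
        ids = {}
        new_block = {}
        for v in labels:
            # A dnode's signature: its current inode plus its set of index parents.
            sig = (block[v], frozenset(block[u] for u in parents[v]))
            new_block[v] = ids.setdefault(sig, len(ids))
        # Refinement only splits, so an unchanged count means every inode is stable.
        if len(set(new_block.values())) == len(set(block.values())):
            return new_block
        block = new_block


def inodes(block):
    """Group dnodes by inode id and return the groups in sorted order."""
    groups = defaultdict(list)
    for v, b in block.items():
        groups[b].append(v)
    return sorted(sorted(g) for g in groups.values())


def quality(num_inodes, num_min_inodes):
    """Quality as defined on the slide: 0% means the index is minimum."""
    return (num_inodes / num_min_inodes - 1) * 100.0


# Hypothetical data graph: node -> label, plus parent-to-child edges.
labels = {"R": "R", "A": "A", "B": "B",
          "C1": "C", "C2": "C", "C3": "C",
          "D1": "D", "D2": "D", "D3": "D"}
edges = [("R", "A"), ("R", "B"),
         ("A", "C1"), ("A", "C2"), ("B", "C3"),
         ("C1", "D1"), ("C2", "D2"), ("C3", "D3")]

index = inodes(minimum_1_index(labels, edges))
print(index)
print(quality(len(index) + 1, len(index)))  # one unnecessary split
```

Starting from the label partition and only ever splitting yields the coarsest stable partition, which is why this sketch returns the minimum 1-index; it recomputes everything from scratch, which is exactly the cost incremental maintenance tries to avoid.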
Minimum & Minimal Indexes
- Minimum: with the smallest number of inodes
- Minimal: no two inodes can be merged
[Figure: a data graph with nodes R, A1, A2, B1, B2; its minimum 1-index with inodes {A1,A2} and {B1,B2}; and a minimal 1-index that keeps A1, A2, B1, B2 in separate inodes]

Quality Guarantee
- Theorem: The split/merge algorithm always maintains a minimal 1-index
- Lemma: For acyclic data graphs, there is a unique minimal 1-index
  - The minimum 1-index is always maintained
- For cyclic data graphs, there could be more than one minimal 1-index
  - One of them is maintained

Outline
[Figure: the example data graph, repeated as the talk outline]

A(k)-Index: Definition
- k-bisimilarity
- Definition based on stability
  - A(0)-index: partition by label
  - …
  - A(k)-index: an inode in the A(k)-index is stable if all of its dnodes have the same index parents in the A(k-1)-index
- Only interested in paths of length ≤ k
- Shown to be much smaller and more efficient than the 1-index [Kaushik et al. 02]
- But no efficient maintenance algorithms are known!

A(k)-index: Example
[Figure: a data graph with nodes R, A, B, C1 to C6 and its A(0)-, A(1)- and A(2)-indexes; here the A(2)-index equals the 1-index]
- Maintenance of the A(i)-index requires the information in the A(i-1)-index

A(k)-index: Refinement Tree
[Figure: the same data graph with its A(0)-, A(1)- and A(2)-indexes arranged as a refinement tree]
1. Reduce storage cost
2. Reduce maintenance cost
- 0.5% ~ 13% additional storage

Quality Guarantee
- Theorem: The split/merge algorithm always maintains the minimum A(k)-index
- Lemma: There is a unique minimal A(k)-index for any data graph, acyclic or cyclic

            1-index    A(k)-index
  Acyclic   minimum    minimum
  Cyclic    minimal    minimum

Outline
[Figure: the example data graph, repeated as the talk outline]

Experiments on Edge Changes
- Datasets
  - Real-life: IMDB (272,000 nodes)
  - Benchmark: XMark (198,000 nodes)
- Setup
  - First delete a portion of the existing ID-REF links
  - Then do random mixed insertions/deletions
- Compare with
  - 1-index: propagate (+ reconstruction)
  - A(k)-index: recompute affected portion (+ reconstruction)

Experiment Results: 1-index
[Figure: experiment results for the 1-index]

Experiment Results: A(k)-index
Running times, speedup by k:

  k    speedup
  2    1.35
  3    6.15
  4    16.6
  5    15.3

Conclusions
- The first solutions for the maintenance (edge & subgraph additions/deletions) of the 1-index and the A(k)-index that are both effective and efficient
  - Effective: quality guarantee on the resulting index
  - Efficient: the algorithms themselves are fast

Thank you!

Graphical Illustration
[Figure: index size over the space of valid 1-indexes, with merge and split steps]
- The index can only grow in size due to splitting, if merging is not enforced
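
The A(k)-index definition above (A(0) partitions by label, and each A(i) is derived from the index parents in A(i-1)) can be sketched in Python as follows. This is a from-scratch construction for illustration only, not the maintenance algorithm of the talk; the function names and the small graph are hypothetical, with the graph only loosely modeled on the A(k)-index example. All levels A(0) to A(k) are kept, since maintaining A(i) needs the information in A(i-1).

```python
from collections import defaultdict


def ak_index_levels(labels, edges, k):
    """Return [A(0), A(1), ..., A(k)], each a dict from dnode to inode id."""
    parents = defaultdict(set)
    for u, v in edges:
        parents[v].add(u)
    nodes = sorted(labels)
    # A(0): partition by label.
    levels = [{v: labels[v] for v in nodes}]
    for _ in range(k):
        prev = levels[-1]
        ids = {}
        cur = {}
        for v in nodes:
            # Same A(i) inode iff same A(i-1) inode and same A(i-1) index parents.
            sig = (prev[v], frozenset(prev[u] for u in parents[v]))
            cur[v] = ids.setdefault(sig, len(ids))
        levels.append(cur)
    return levels


def inodes(block):
    """Group dnodes by inode id and return the groups in sorted order."""
    groups = defaultdict(list)
    for v, b in block.items():
        groups[b].append(v)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical data graph: all C nodes share the label "C".
labels = {"R": "R", "A": "A", "B": "B",
          "C1": "C", "C2": "C", "C3": "C",
          "C4": "C", "C5": "C", "C6": "C"}
edges = [("R", "A"), ("R", "B"),
         ("A", "C1"), ("B", "C2"), ("B", "C3"),
         ("C1", "C4"), ("C2", "C5"), ("C3", "C6")]

levels = ak_index_levels(labels, edges, 2)
for i, level in enumerate(levels):
    print(f"A({i}):", inodes(level))
```

Because each A(i) inode is contained in exactly one A(i-1) inode, the returned levels implicitly form a refinement tree: the tree parent of an A(i) inode is the A(i-1) inode that contains its dnodes.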