Succinct Data Structures
Kunihiko Sadakane
National Institute of Informatics

DFUDS Representation [6]
• It encodes the degrees of the nodes in unary codes, in depth-first order (DFUDS = Depth First Unary Degree Sequence)
• Degree d ⇒ d ('s followed by a )
• Add a dummy ( at the beginning
• 2n bits
[Figure: an 8-node tree with nodes numbered 1 to 8, and its DFUDS]
U = ((()((())))(()))

Various Operations on DFUDS
• A node with degree d is represented by the first position of its encoding $(^{d})$
  – The position v of a node in the sequence and its preorder number i are converted to each other by
    $v = \text{preorder-select}(i) = \mathrm{select}_{)}(i-1) + 1$
    $i = \text{preorder-rank}(v) = \mathrm{rank}_{)}(v-1) + 1$
• Degree: $\mathrm{degree}(v) = \mathrm{select}_{)}(\mathrm{rank}_{)}(v) + 1) - v$

i-th child
$\mathrm{child}(v,i) = \mathrm{findclose}(\mathrm{select}_{)}(\mathrm{rank}_{)}(v) + 1) - i) + 1$
[Figure: a 9-node example tree with node v and the parts U1, U2, U3 of U marked]
U = (((()(())))((())))

Parent
$\mathrm{parent}(v) = \mathrm{select}_{)}(\mathrm{rank}_{)}(\mathrm{findopen}(v-1))) + 1$
[Figure: the same example tree, with the parent p marked]
U = (((()(())))((())))

Number of Descendants (Subtree Size)
• The size of the subtree rooted at v is
  $\mathrm{subtreesize}(v) = (\mathrm{findclose}(\mathrm{enclose}(v)) - v)/2 + 1$
[Figure: the same example tree]
U = (((()(())))((())))

LCA on DFUDS [7]
• Can be computed by almost the same operation as for BP
• $\mathrm{lca}(x,y) = \mathrm{parent}(\mathrm{RMQ}_E(x, y-1) + 1)$
• The leftmost minimum is used
[Figure: an 8-node tree with its DFUDS and BP representations]
DFUDS: U = ((()((())))(()))
       E = 1232345432123210
BP:    P = ((()()())(()()))
       E = 1232323212323210

Let $E[i] = \mathrm{rank}_{(}(U,i) - \mathrm{rank}_{)}(U,i)$.
Let $T_1, T_2, \ldots, T_k$ denote the subtrees of v, let the DFUDS of v be $U[l_0..r_0]$ with $E[r_0] = d$, and let the DFUDS of $T_i$ be $U[l_i..r_i]$.
Lemma:
  $E[r_i] = E[r_{i-1}] - 1 = d - i \quad (1 \le i \le k)$
  $E[j] > E[r_i] \quad (l_i \le j < r_i)$
[Figure: example tree with v and the positions r0, r1, r2, r3 marked]
U = (((()(())))((())))
E = 123434543212343210

Proof: The DFUDS $U[l_i..r_i]$ of each subtree becomes balanced when a ( is added at its head
⇒ $E[r_i] = E[r_{i-1}] - 1 = d - i$ and $E[j] > E[r_i]$ $(l_i \le j < r_i)$

Lemma: $\mathrm{lca}(x,y) = \mathrm{parent}(\mathrm{RMQ}_E(x, y-1) + 1)$
Proof: Let v = lca(x,y). Let $T_\alpha$, $T_\beta$ be the subtrees of v that contain x and y, respectively. Let $E[r_\beta] = d$.
Case 1: If $y < r_\beta$ (y is not the rightmost leaf of $T_\beta$):
from $E[y] \ge d+1$ and $E[y-1] \ge d+2$, $\mathrm{RMQ}_E(x, y-1) = r_{\beta-1}$.
[Figure: values of E around $T_\alpha$ and $T_\beta$, marked d+1, > d+1, d, and d−1]

Case 2: If $y = r_\beta$:
from $E[y-1] = d+1$, $\mathrm{RMQ}_E(x, y-1)$ takes the minimum value d+1 at both $r_{\beta-1}$ and $y-1$. Because RMQ chooses the leftmost one, we obtain $r_{\beta-1}$.

In both cases, $\mathrm{RMQ}_E(x, y-1) + 1 = l_\beta$ holds, and $\mathrm{parent}(l_\beta) = \mathrm{lca}(x,y)$ also holds.

Other Operations 1
• $\text{leaf-rank}(v) = \mathrm{rank}_{))}(v)$
• $\text{leaf-select}(i) = \mathrm{select}_{))}(i)$
• $\text{preorder-rank}(v) = \mathrm{rank}_{)}(v-1) + 1$
• $\text{preorder-select}(i) = \mathrm{select}_{)}(i-1) + 1$
[Figure: the 9-node example tree]
U = (((()(())))((())))
E = 123434543212343210

Other Operations 2
• $\text{inorder-rank}(v) = \text{leaf-rank}(\mathrm{child}(v,2) - 1)$
• $\text{inorder-select}(i) = \mathrm{parent}(\text{leaf-select}(i) + 1)$
• $\text{leftmost-leaf}(v) = \text{leaf-select}(\text{leaf-rank}(v-1) + 1)$
• $\text{rightmost-leaf}(v) = \mathrm{findclose}(\mathrm{enclose}(v))$
[Figure: the 9-node example tree]
U = (((()(())))((())))
E = 123434543212343210

Range Minimum Query Problem (RMQ)
• Input: an array A[1,n] (preprocessing is allowed) and an interval of indices [i,j] ⊆ [1,n]
• Output: the index of a minimum value in the sub-array A[i,j]
Example:
  i    = 1 2 3 4 5 6 7 8 9
  A[i] = 1 4 3 5 0 4 5 3 7
  $\mathrm{RMQ}_A(3,6) = 5$
  $\mathrm{RMQ}_A(6,8) = 8$

A Data Structure for RMQ
Lemma: For an array of length n, let s(n) denote the size of a data structure, and let t(n) denote the time complexity of a query. A data structure satisfying the following formulas can be constructed in O(n) time [1].
  $t(n) = O(1) + t\left(\frac{8n}{\lg n}\right)$
  $s(n) = 4n + s\left(\frac{8n}{\lg n}\right) + o(n)$ (bits)
Note: the original input array is not used for a query.

Cartesian Tree [2]
• The Cartesian tree for an array A[1,n] consists of
  – The root node: stores the minimum A[i] in A[1,n]
  – Left subtree: the Cartesian tree for A[1,i−1]
  – Right subtree: the Cartesian tree for A[i+1,n]
[Figure: the Cartesian tree for A = 1 4 3 5 0 4 5 3 7, with root 0]

Relation between Cartesian Tree and RMQ
• $\mathrm{RMQ}_A(i,j) = \mathrm{lca}(i,j)$
[Figure: the Cartesian tree for A = 1 4 3 5 0 4 5 3 7]

Problem (lca, lowest common ancestor)
• Input: a rooted tree T and two of its nodes x, y
• Output: the lowest common ancestor of x and y
Known results
• lca is found in constant time after linear-time preprocessing.
[Harel, Tarjan SICOMP84] [Schieber, Vishkin SICOMP88] [Bender, Farach-Colton 00]

A Property of Cartesian Tree
Lemma: If A[n] is added to the Cartesian tree for A[1,n−1], the node for A[n] is on the path from the root to the rightmost leaf.
Proof: Because A[n] is the rightmost element in the array, it is never stored in the left subtree of any node.
[Figure: example Cartesian tree]

Construction of Cartesian Tree
When we add A[n] to the tree for A[1,n−1]:
• Compare A[n] with the elements on the path from A[n−1] to the root
• If an element x smaller than A[n] appears, insert A[n] as the right child of x (if no such element appears, A[n] becomes the new root)
• Make the former right child of x the left child of A[n]
[Figure: inserting a new element into a Cartesian tree]

Time Complexity
Lemma: The Cartesian tree is constructed in O(n) time.
Proof: Let the number of comparisons needed to insert A[i] be $c_i$. Then the total time complexity is
  $O\left(\sum_{i=1}^{n} c_i\right)$
Each node on the rightmost path that is found to be larger than A[i] moves into the left subtree of A[i] after the insertion, so it leaves the rightmost path and is never compared again. Each insertion also makes at most one comparison with a node that stays on the path. This implies that the total number of comparisons is at most 2n. Thus the time complexity is O(n).
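The incremental construction and its amortized argument can be sketched with an explicit stack holding the rightmost path. This is a minimal illustration in Python, not code from the slides: the names `build_cartesian_tree` and `rmq_by_descent`, the 0-based indices, and the tie rule (an equal element stays the ancestor, so the leftmost minimum ends up on top) are assumptions made here. The second function answers RMQ as lca(i, j) by walking down from the root, which takes time proportional to the tree depth rather than the O(1) achieved by the succinct structure.

```python
from typing import List, Optional, Tuple


def build_cartesian_tree(
    a: List[int],
) -> Tuple[int, List[Optional[int]], List[Optional[int]]]:
    """Return (root, left, right) as 0-based index arrays for a non-empty array."""
    if not a:
        raise ValueError("array must be non-empty")
    n = len(a)
    left: List[Optional[int]] = [None] * n
    right: List[Optional[int]] = [None] * n
    stack: List[int] = []  # indices on the rightmost path, root at the bottom
    for i in range(n):
        last_popped = None
        # Walk up the rightmost path while its elements are larger than a[i].
        # Each index is popped at most once over the whole run.
        while stack and a[stack[-1]] > a[i]:
            last_popped = stack.pop()
        # The popped part of the path becomes the left subtree of the new node.
        left[i] = last_popped
        if stack:
            # a[stack[-1]] <= a[i]: the new node becomes its right child.
            right[stack[-1]] = i
        stack.append(i)
    return stack[0], left, right


def rmq_by_descent(root, left, right, i, j):
    """Index of the minimum in a[i..j] (inclusive), found as lca(i, j).

    The in-order position of a node equals its array index, so the lca is the
    first node on the way down from the root whose index lies in [i, j].
    """
    node = root
    while not (i <= node <= j):
        node = right[node] if node < i else left[node]
    return node


A = [1, 4, 3, 5, 0, 4, 5, 3, 7]
root, left, right = build_cartesian_tree(A)
print(root)                                   # 4, the index of the value 0
print(rmq_by_descent(root, left, right, 2, 5))  # 4
print(rmq_by_descent(root, left, right, 5, 7))  # 7
```

With 1-based indices the two queries are RMQ_A(3,6) = 5 and RMQ_A(6,8) = 8, matching the example array used in the slides.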
BP Representation of Cartesian Tree
[Figure: the Cartesian tree for A and its BP sequence; the i-th occurrence of () in P corresponds to A[i]]
A  = 1 4 3 5 0 4 5 3 7
P' = 123234543434543212123434543232343210
P  = ((()((())()(())))()((()(()))()(())))

Algorithm for RMQ [3]
• Construct the Cartesian tree for A[1,n]
• Convert the Cartesian tree to the BP sequence P and the depth sequence P'
To find the position m of the leftmost minimum value in A[i,j]:
• $i' = \mathrm{select}_{()}(P,i)$, $j' = \mathrm{select}_{()}(P,j)$
• Let m' be the position of the minimum in P'[i', j']; then $m = \mathrm{rank}_{()}(P,m') + 1$
P' = 123234543434543212123434543232343210
P  = ((()((())()(())))()((()(()))()(())))

RMQ on P'
• Divide P' into blocks of length w = (lg n)/2
• Store the minimum values of the blocks in B
• The minimum value in P'[i', j'] is one of
  – the minimum value in the block containing i',
  – the minimum value in the block containing j', or
  – the minimum value in the blocks between those blocks
Example (blocks of length 4):
P  = (()((()())())(()())())
P' = 1212343432321232321210
B  = 1 3 2 1 1 0

Complexities
• Length of P: 4n
• Length of B: 4n/w = 8n/lg n
• RMQ inside a block is done in O(1) time by a table lookup
  Size of the table: $O\left(2^{\frac{1}{2}\log n} \cdot \log^2 n \cdot \log\log n\right) = O\left(\sqrt{n}\,\log^2 n \,\log\log n\right)$
• Complexities
  $t(n) = O(1) + t\left(\frac{8n}{\lg n}\right)$
  $s(n) = 4n + s\left(\frac{8n}{\lg n}\right) + o(n)$

Sparse Table Algorithm [2]
• For each interval $[i, i+2^k-1]$ in the array B[1,m], store the minimum value in M[i,k] (i = 1,...,m; k = 1,2,...,lg m)
• For a given query interval [s,b]:
  1. Let $k = \lfloor \lg(b-s) \rfloor$
  2. Compare M[s,k] and $M[b-2^k+1, k]$, and output the minimum.
• O(1) time, $O(m \lg^2 m)$ bit space
[Figure: two overlapping intervals covering a query range on B = 1 4 3 5 0 4 5 3 7]
• This data structure is used when the length of B becomes $O(n/\lg^3 n)$ ⇒ o(n) bit space

Theorem: RMQ is computed in O(1) time using a 4n+o(n) bit data structure.

References
[1] Kunihiko Sadakane, Daisuke Watanabe: Practical data structures for the document listing problem (in Japanese). DBSJ Letters 2(1): 103-106
[2] Michael A. Bender, Martin Farach-Colton: The LCA Problem Revisited. LATIN 2000: 88-94
[3] Kunihiko Sadakane: Succinct data structures for flexible text retrieval systems. J. Discrete Algorithms 5(1): 12-22 (2007)
[4] Johannes Fischer: Optimal Succinctness for Range Minimum Queries. LATIN 2010: 158-169
[5] Kunihiko Sadakane, Gonzalo Navarro: Fully-Functional Succinct Trees. SODA 2010: 134-149
[6] R. F. Geary, N. Rahman, R. Raman, V. Raman: A simple optimal representation for balanced parentheses. Theoretical Computer Science 368: 231-246 (2006)
[7] Mihai Pătraşcu: Succincter. FOCS 2008