Succinct Data Structures
Kunihiko Sadakane
National Institute of Informatics

Range Minimum Query Problem (RMQ)
• Input: array A[1,n] (preprocessing is allowed), interval of indices [i,j] ⊆ [1,n]
• Output: the index of a minimum value in sub-array A[i,j]

Index: 1 2 3 4 5 6 7 8 9
A:     1 4 3 5 0 4 5 3 7
RMQ_A(3,6) = 5
RMQ_A(6,8) = 8

A Data Structure for RMQ
Lemma: For an array of length n, let s(n) denote the size of a data structure, and let t(n) denote the time complexity of a query. A data structure satisfying the following recurrences can be constructed in O(n) time [1].
$t(n) = O(1) + t\left(\frac{8n}{\lg n}\right)$
$s(n) = 4n + s\left(\frac{8n}{\lg n}\right) + o(n)$ (bits)
Note: the original input array is not used for a query.

Cartesian Tree [2]
• The Cartesian tree for an array A[1,n] consists of
– The root node: stores the minimum A[i] in A[1,n]
– Left subtree: the Cartesian tree for A[1,i−1]
– Right subtree: the Cartesian tree for A[i+1,n]
[Figure: the Cartesian tree for A = 1 4 3 5 0 4 5 3 7]

Relation between Cartesian Tree and RMQ
• RMQ_A(i,j) = lca(i,j)
[Figure: the Cartesian tree for A = 1 4 3 5 0 4 5 3 7]

Problem (lca, lowest common ancestor)
• Input: a rooted tree T and its two nodes x, y
• Output: the lowest common ancestor of x, y
Known results
• lca is found in constant time after linear-time preprocessing. [Harel, Tarjan SICOMP84] [Schieber, Vishkin SICOMP88] [Bender, Farach-Colton 00]

A Property of Cartesian Tree
Lemma: If A[n] is added to the Cartesian tree for A[1,n−1], the node for A[n] is on the path from the root to the rightmost leaf.
Proof: Because A[n] is the rightmost element in the array, it is never stored in a left subtree.

Construction of Cartesian Tree
When we add A[n] to the tree for A[1,n−1]:
• Compare A[n] with the elements on the path from A[n−1] to the root
• If an element x smaller than A[n] appears, insert A[n] as the new right child of x
• Make the former right child of x the left child of A[n]

Time Complexity
Lemma: The Cartesian tree is constructed in O(n) time.
Proof: Let the number of comparisons to insert A[i] be c_i.
Then the total time complexity is $O\left(\sum_{i=1}^{n} c_i\right)$. Each node on the rightmost path that is found to be larger than A[i] moves into the left subtree of A[i] after the insertion, so it leaves the rightmost path and is never compared again. Each insertion also makes at most one comparison with a smaller element, which stops the scan. This implies that the total number of comparisons is at most 2n. Thus the time complexity is O(n).

BP Representation of Cartesian Tree
A  = 1 4 3 5 0 4 5 3 7
P' = 123234543434543212123434543232343210
P  = ((()((())()(())))()((()(()))()(())))
(The i-th "()" in P, from left to right, corresponds to A[i].)

Algorithm for RMQ [3]
• Construct the Cartesian tree for A[1,n]
• Convert the Cartesian tree to the BP sequence P and the depth sequence P'
To find the position m of the leftmost minimum value in A[i,j]:
• i' = select_()(P,i), j' = select_()(P,j)
• Let m' be the position of the minimum in P'[i',j']; then m = rank_()(P,m') + 1

RMQ on P'
• Divide P' into blocks of length w = (lg n)/2
• Store the minimum values of the blocks in B
• The minimum value in P'[i',j'] is one of
– the minimum value in the block containing i'
– the minimum value in the block containing j'
– the minimum value in the blocks between those blocks
Example (blocks of length 4):
P  = (()((()())())(()())())
P' = 1212343432321232321210
B  = 1 3 2 1 1 0

Complexities
• Length of P: 4n
• Length of B: 4n/w = 8n/lg n
• RMQ inside a block is done in O(1) time by a table lookup. Size of the table:
$O\left(2^{\frac{1}{2}\log n} \log^2 n \log\log n\right) = O\left(\sqrt{n}\, \log^2 n \log\log n\right)$
• Complexities:
$t(n) = O(1) + t\left(\frac{8n}{\lg n}\right)$
$s(n) = 4n + s\left(\frac{8n}{\lg n}\right) + o(n)$

Sparse Table Algorithm [2]
• For each interval [i, i+2^k−1] in array B[1,m], store the minimum value in M[i,k] (i = 1,...,m; k = 1,2,...,lg m)
• For a given query interval [s,b]:
1. Let $k = \lfloor \lg(b - s) \rfloor$
2. Compare M[s,k] and M[b−2^k+1,k], and output the minimum.
• O(1) time, O(m lg² m) bit space
[Figure: intervals stored in M over B = 1 4 3 5 0 4 5 3 7]
• This data structure is used when the length of B becomes O(n/lg³ n) ⇒ o(n) bit space

Theorem: RMQ is computed in O(1) time using a 4n+o(n) bit data structure.
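
The ideas on the preceding slides can be tried out with a small Python sketch. It is not succinct and does not reach the 4n+o(n) bit bound: it stores plain integer indices. It only illustrates the stack-based Cartesian tree construction, the relation RMQ_A(i,j) = lca(i,j), and the sparse table query. The names `cartesian_tree_parents`, `lca`, and `SparseTable` are hypothetical, indices are 0-based, and the sketch assumes k = ⌊lg(t − s + 1)⌋ with rows starting at k = 0 so that ranges of length 1 and 2 are also covered.

```python
from typing import List


def cartesian_tree_parents(a: List[int]) -> List[int]:
    """Return parent[i] in the Cartesian tree of a (0-based, root has -1)."""
    parent = [-1] * len(a)
    stack: List[int] = []  # indices on the rightmost path, root at the bottom
    for i, value in enumerate(a):
        last_popped = -1
        # Walk up the rightmost path while its nodes are larger than a[i].
        while stack and a[stack[-1]] > value:
            last_popped = stack.pop()
        if last_popped != -1:
            parent[last_popped] = i  # former right subtree becomes left child
        if stack:
            parent[i] = stack[-1]  # a[i] becomes the new right child
        stack.append(i)
    return parent


def lca(parent: List[int], i: int, j: int) -> int:
    """Naive lowest common ancestor by walking parent pointers (not O(1))."""
    ancestors = set()
    while i != -1:
        ancestors.add(i)
        i = parent[i]
    while j not in ancestors:
        j = parent[j]
    return j


class SparseTable:
    """table[k][i] = index of the leftmost minimum in b[i : i + 2**k]."""

    def __init__(self, b: List[int]) -> None:
        self.b = list(b)
        m = len(self.b)
        self.table = [list(range(m))]  # row k = 0: intervals of length 1
        k = 1
        while (1 << k) <= m:
            prev, half = self.table[k - 1], 1 << (k - 1)
            row = []
            for i in range(m - (1 << k) + 1):
                left, right = prev[i], prev[i + half]
                row.append(left if self.b[left] <= self.b[right] else right)
            self.table.append(row)
            k += 1

    def query(self, s: int, t: int) -> int:
        """Index of the leftmost minimum in b[s..t], both ends inclusive."""
        k = (t - s + 1).bit_length() - 1  # floor(lg(length of the range))
        left = self.table[k][s]
        right = self.table[k][t - (1 << k) + 1]
        return left if self.b[left] <= self.b[right] else right


a = [1, 4, 3, 5, 0, 4, 5, 3, 7]
table = SparseTable(a)
parent = cartesian_tree_parents(a)
print(table.query(2, 5) + 1)  # 1-based RMQ_A(3,6): prints 5
print(lca(parent, 2, 5) + 1)  # same answer via the Cartesian tree: prints 5
```

Both calls return the same index because the lowest common ancestor of positions i and j in the Cartesian tree is the node holding the minimum of A[i,j]. Popping only strictly larger elements keeps the leftmost of several equal minima as the ancestor, which matches the leftmost-minimum convention of the query.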

2n bits Data Structure for RMQ [4]
• The 2d-Min-Heap of array A[1..n] is a tree consisting of nodes v_0,…,v_n, and the parent v_j of v_i satisfies
– j < i
– A[j] < A[i]
– A[k] ≥ A[i] for all j < k < i
• Parent value is smaller than child value
• Child values are sorted in decreasing order
[Figure: the 2d-Min-Heap for A = 1 4 3 5 0 4 5 3 7]

Lemma: Let l = lca(i,j).
• If l = i, RMQ_A(i,j) = i
• If l ≠ i, RMQ_A(i,j) is a child of l and on the path between l and j
Proof: If l = i, then i+1,…,j are descendants of i. From the definition of the tree, their values are larger than A[i]. If l ≠ i, it holds that l < i. Children of l are sorted in decreasing order of their values. Thus the rightmost one is the smallest.

• By using the DFUDS U representing the 2d-Min-Heap, RMQ is computed as follows.
– x = select_)(U, i+1)
– y = select_)(U, j)
– w = RMQ_E(x, y)
– if rank_)(U, findopen(U, w)) = i, return i
– else return rank_)(U, w)
Example: i = 3, j = 9
A = 1 4 3 5 0 4 5 3 7
U = ((()(())())(()())())
E = 12323432321232321210

Range Min-Max Trees [5]
• In existing succinct data structures for trees, a new index is added for each operation to be supported.
• The o(n) term cannot be ignored.
– The recursive method [6] uses 3.73n bits to support only findopen, findclose, enclose.
• It is preferable if various operations can be supported by a single index.

Definitions
• For a vector P[0..2n−1] and a function g:
$\mathrm{sum}(P, g, s, t) \stackrel{\mathrm{def}}{=} \sum_{i=s}^{t} g(P[i])$
$\mathrm{fwd\_search}(P, g, s, d) \stackrel{\mathrm{def}}{=} \min\{\, i \mid \mathrm{sum}(P, g, s+1, i) = d \,\}$
$\mathrm{bwd\_search}(P, g, s, d) \stackrel{\mathrm{def}}{=} \max\{\, i \mid \mathrm{sum}(P, g, i+1, s) = d \,\}$
$\mathrm{rmq}(P, g, s, t) \stackrel{\mathrm{def}}{=} \min_{s \le i \le t} \mathrm{sum}(P, g, s, i)$
$\mathrm{rmqi}(P, g, s, t) \stackrel{\mathrm{def}}{=} \arg\min_{s \le i \le t} \mathrm{sum}(P, g, s, i)$
• RMQ, RMQi are defined similarly (range maximum)

How to support operations on balanced parentheses sequence
• Lemma: Let π be a function s.t. π(() = 1, π()) = −1. Then
$\mathrm{findclose}(P, i) = \mathrm{fwd\_search}(P, \pi, i, -1)$
$\mathrm{findopen}(P, i) = \mathrm{bwd\_search}(P, \pi, i, 0) + 1$
$\mathrm{enclose}(P, i) = \mathrm{bwd\_search}(P, \pi, i, 2) + 1$
$\mathrm{level\_ancestor}(P, i, d) = \mathrm{bwd\_search}(P, \pi, i, d+1) + 1$
Example (figure marks one enclose and one findclose):
P = (()((()())())(()())())
E = 1212343432321232321210

Implementing rank/select
• Let φ, ψ be functions s.t. φ(0) = 0, φ(1) = 1, ψ(0) = 1, ψ(1) = 0
$\mathrm{rank}_1(P, i) = \mathrm{sum}(P, \phi, 0, i)$
$\mathrm{select}_1(P, i) = \mathrm{fwd\_search}(P, \phi, 0, i)$
$\mathrm{rank}_0(P, i) = \mathrm{sum}(P, \psi, 0, i)$
$\mathrm{select}_0(P, i) = \mathrm{fwd\_search}(P, \psi, 0, i)$
• rank/select and parentheses operations can be handled in a unified manner.

References
[1] Kunihiko Sadakane, Daisuke Watanabe. Practical data structures for the document listing problem (in Japanese). DBSJ Letters, Vol. 2, No. 1, pp. 103-106.
[2] Michael A. Bender, Martin Farach-Colton: The LCA Problem Revisited. LATIN 2000: 88-94.
[3] Kunihiko Sadakane: Succinct data structures for flexible text retrieval systems. J. Discrete Algorithms 5(1): 12-22 (2007).
[4] Johannes Fischer: Optimal Succinctness for Range Minimum Queries. LATIN 2010: 158-169.
[5] Kunihiko Sadakane, Gonzalo Navarro: Fully-Functional Succinct Trees. SODA 2010: 134-149.
[6] R. F. Geary, N. Rahman, R. Raman, and V. Raman. A simple optimal representation for balanced parentheses. Theoretical Computer Science, 368:231–246, December 2006.
[7] Mihai Pătraşcu. Succincter. Proc. FOCS, 2008.