Succinct Data Structures
Kunihiko Sadakane
National Institute of Informatics
Range Minimum Query
Problem (RMQ)
• Input: array A[1,n] (preprocessing is allowed),
interval of indices [i,j] ⊆ [1,n]
• Output: the index of a minimum value in
sub-array A[i,j]
Example: for A = (1, 4, 3, 5, 0, 4, 5, 3, 7) with indices 1 to 9,
RMQ_A(3,6) = 5 and RMQ_A(6,8) = 8.
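As a baseline for the problem statement above, a direct scan answers a query without any preprocessing. This is only a reference sketch: the name `rmq_naive` is hypothetical, indices are 1-based as on the slide, and ties are assumed to go to the leftmost minimum.

```python
def rmq_naive(A, i, j):
    """Return the 1-based index of the leftmost minimum of A[i..j] (inclusive)."""
    best = i
    for k in range(i + 1, j + 1):
        if A[k - 1] < A[best - 1]:   # strict '<' keeps the leftmost minimum
            best = k
    return best


A = [1, 4, 3, 5, 0, 4, 5, 3, 7]
print(rmq_naive(A, 3, 6))  # 5
print(rmq_naive(A, 6, 8))  # 8
```

Each query costs O(j - i) time here, which is what the data structures in the rest of the lecture avoid.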
A Data Structure for RMQ
Lemma: For an array of length n, let s(n) denote the
size of a data structure, and let t(n) denote the time
complexity for a query. A data structure satisfying
the following formulas can be constructed in O(n)
time [1].
\[ t(n) = O(1) + t\left(\frac{8n}{\lg n}\right) \]
\[ s(n) = 4n + s\left(\frac{8n}{\lg n}\right) + o(n) \quad \text{(bits)} \]
Note: the original input array is not used for a query.
Cartesian Tree [2]
• The Cartesian tree for an array A[1,n] consists of
– The root node: stores the minimum A[i] in A[1,n]
– Left subtree: the Cartesian tree for A[1,i-1]
– Right subtree: the Cartesian tree for A[i+1,n]
[Figure: the Cartesian tree for A = (1, 4, 3, 5, 0, 4, 5, 3, 7)]
Relation between
Cartesian Tree and RMQ
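• RMQ_A(i,j) = lca(i,j), where i and j denote the nodes storing A[i] and A[j]
and lca is taken in the Cartesian tree for A
[Figure: the Cartesian tree for A = (1, 4, 3, 5, 0, 4, 5, 3, 7)]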
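The relation can be checked on the running example with a small sketch. It builds the tree directly from the recursive definition (quadratic time, not the linear-time construction) and finds the lca by walking parent pointers. The names `cartesian_parents`, `lca` and `rmq_via_lca` are hypothetical, and the leftmost minimum is assumed to be the root when values tie.

```python
def cartesian_parents(A):
    """Parent index (0-based) of every position, built from the definition."""
    parent = [None] * len(A)

    def build(lo, hi, par):
        if lo > hi:
            return
        m = min(range(lo, hi + 1), key=lambda k: A[k])  # leftmost minimum
        parent[m] = par
        build(lo, m - 1, m)   # left subtree: A[lo..m-1]
        build(m + 1, hi, m)   # right subtree: A[m+1..hi]

    build(0, len(A) - 1, None)
    return parent


def lca(parent, x, y):
    """Lowest common ancestor by walking up from x, then from y."""
    ancestors = set()
    while x is not None:
        ancestors.add(x)
        x = parent[x]
    while y not in ancestors:
        y = parent[y]
    return y


def rmq_via_lca(parent, i, j):
    """1-based RMQ answered as the lca of nodes i and j."""
    return lca(parent, i - 1, j - 1) + 1


A = [1, 4, 3, 5, 0, 4, 5, 3, 7]
parent = cartesian_parents(A)
print(rmq_via_lca(parent, 3, 6))  # 5
print(rmq_via_lca(parent, 6, 8))  # 8
```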
Problem (lca, lowest common ancestor)
• Input: a rooted tree T and its two nodes x, y
• Output: the lowest common ancestor of x, y
Known results
• lca is found in constant time after linear time
preprocessing.
[Harel, Tarjan SICOMP84]
[Schieber, Vishkin SICOMP88]
[Bender, Farach-Colton 00]
A Property of Cartesian Tree
Lemma: If A[n] is added to the Cartesian tree for
A[1,n-1], the node for A[n] is on the path from the
root to the rightmost leaf.
Proof: Because A[n] is the rightmost element in the
array, it is never stored in a left child.
[Figure: a new last element inserted into a Cartesian tree; the new node lies on the rightmost path]
Construction of Cartesian Tree
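When we add A[n] to the tree for A[1,n-1]:
• Compare A[n] with the elements on the path from
A[n-1] to the root
• If an element x smaller than A[n] appears, insert A[n]
as the right child of x
• Make the former right child of x the left child of A[n]
• If no such x exists, A[n] becomes the new root and the
old root becomes its left child
[Figure: insertion of A[n] on the rightmost path of the Cartesian tree]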
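The insertion steps above can be sketched with an explicit stack that holds the rightmost path. This is an illustrative implementation, not taken from the slides: `build_cartesian_tree` is a hypothetical name, indices are 0-based, `-1` means "no node", and popping only strictly larger elements is an assumption that keeps the leftmost minimum as the ancestor when values tie.

```python
def build_cartesian_tree(A):
    """Return (parent, left, right) index arrays of the Cartesian tree of A."""
    n = len(A)
    parent, left, right = [-1] * n, [-1] * n, [-1] * n
    stack = []  # indices on the rightmost path, root at the bottom
    for i in range(n):
        last = -1
        # Walk up from A[i-1] toward the root past all larger elements.
        while stack and A[stack[-1]] > A[i]:
            last = stack.pop()
        if last != -1:          # the detached part becomes the left subtree of i
            left[i] = last
            parent[last] = i
        if stack:               # x = stack[-1] is not larger than A[i]
            right[stack[-1]] = i
            parent[i] = stack[-1]
        stack.append(i)
    return parent, left, right


A = [1, 4, 3, 5, 0, 4, 5, 3, 7]
parent, left, right = build_cartesian_tree(A)
print(parent)  # [4, 2, 0, 2, -1, 7, 5, 4, 7]
```

The node with parent `-1` is the root; here it is index 4, which stores the value 0.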
Time Complexity
Lemma: The Cartesian tree can be constructed in O(n) time.
Proof: Let c_i be the number of comparisons needed to insert A[i].
Then the total time complexity is
\[ O\left(\sum_{i=1}^{n} c_i\right). \]
Each node on the rightmost path that is found to be larger than A[i]
moves into the left subtree of A[i] after the insertion and leaves the
rightmost path. Therefore it is compared in this way only once.
This implies that the total number of comparisons is
at most 2n. Thus the time complexity is O(n).
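The 2n bound can be spelled out as follows. The symbol p_i is introduced here only for this derivation: it denotes the number of nodes that leave the rightmost path while A[i] is inserted. Each insertion makes p_i comparisons that remove a node, plus at most one final comparison that stops the walk, and every node leaves the rightmost path at most once:

```latex
\[
\sum_{i=1}^{n} c_i \;\le\; \sum_{i=1}^{n} (p_i + 1)
\;=\; n + \sum_{i=1}^{n} p_i \;\le\; n + n \;=\; 2n .
\]
```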
BP Representation of Cartesian Tree
[Figure: the Cartesian tree for A = (1, 4, 3, 5, 0, 4, 5, 3, 7); the i-th "()" in P corresponds to A[i]]
P’ 123234543434543212123434543232343210
P  ((()((())()(())))()((()(()))()(())))
Algorithm for RMQ [3]
• Construct the Cartesian tree for A[1,n]
• Convert the Cartesian tree to BP sequence P and
depth sequence P’
To find the position m of the leftmost minimum value
in A[i,j]:
• i’ = select_{()}(P,i), j’ = select_{()}(P,j)
• Let m’ be the position of the minimum in P’[i’, j’],
then m = rank_{()}(P,m’)+1
P’ 123234543434543212123434543232343210
P  ((()((())()(())))()((()(()))()(())))
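The query can be traced with a small sketch that replaces every succinct index by a linear scan. Several things here are assumptions rather than statements from the slides: the function names are hypothetical, positions are 1-based, P’[k] is the depth after reading P[k], and rank counts the occurrences of "()" that start at or before a position. Under exactly these conventions the scan over P’ has to start one position before i’ so that the case m = i is also covered; that offset is a choice of this sketch.

```python
def bp_with_leaves(A):
    """BP string of the Cartesian tree of A with one extra leaf "()" per node."""
    n = len(A)
    left, right, stack = [-1] * n, [-1] * n, []
    for i in range(n):                      # stack-based Cartesian tree
        last = -1
        while stack and A[stack[-1]] > A[i]:
            last = stack.pop()
        left[i] = last
        if stack:
            right[stack[-1]] = i
        stack.append(i)

    def emit(v):                            # recursive, so small inputs only
        if v == -1:
            return ""
        return "(" + emit(left[v]) + "()" + emit(right[v]) + ")"

    return emit(stack[0])                   # bottom of the stack is the root


def depths(P):
    """Depth sequence: depths(P)[k-1] is the depth after reading P[k]."""
    d, out = 0, []
    for c in P:
        d += 1 if c == "(" else -1
        out.append(d)
    return out


def select_leaf(P, i):
    """1-based position of the '(' of the i-th occurrence of "()"."""
    count = 0
    for q in range(len(P) - 1):
        if P[q] == "(" and P[q + 1] == ")":
            count += 1
            if count == i:
                return q + 1
    raise ValueError("fewer than i leaves")


def rank_leaf(P, pos):
    """Number of occurrences of "()" starting at 1-based positions <= pos."""
    return sum(1 for q in range(min(pos, len(P) - 1))
               if P[q] == "(" and P[q + 1] == ")")


def rmq_bp(P, i, j):
    """1-based index of the leftmost minimum of A[i..j], using only P."""
    D = depths(P)
    lo = select_leaf(P, i) - 1              # one position before i' (see lead-in)
    hi = select_leaf(P, j)
    window = D[lo - 1:hi]                   # P'[lo..hi], 1-based inclusive
    m_prime = lo + window.index(min(window))  # leftmost minimum in the window
    return rank_leaf(P, m_prime) + 1


A = [1, 4, 3, 5, 0, 4, 5, 3, 7]
P = bp_with_leaves(A)
print(P)                 # ((()((())()(())))()((()(()))()(())))
print(rmq_bp(P, 3, 6))   # 5
print(rmq_bp(P, 6, 8))   # 8
```

The array A is used only to build P; the queries themselves read nothing but P.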
RMQ on P’
• Divide P’ into blocks of length w = (lg n)/2
• Store minimum values of the blocks in B
• Minimum value in P’[i’, j’] is either
– Minimum value in the block containing i’
– Minimum value in the block containing j’, or
– Minimum value in blocks between those blocks
P  (()((()())())(()())())
P’ 1212343432321232321210
B  1 3 2 1 1 0   (minima of the blocks of length 4)
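The three-way case split can be sketched as below, using the example sequence and block length 4 so that B matches the figure. The names `block_minima` and `range_min_blocks` are hypothetical, indices are 0-based, and the in-block scans and the scan over B stand in for the table lookup and the recursive structure described in the lecture.

```python
def block_minima(D, w):
    """Minimum of each length-w block of D (the last block may be shorter)."""
    return [min(D[s:s + w]) for s in range(0, len(D), w)]


def range_min_blocks(D, B, w, lo, hi):
    """Minimum value of D[lo..hi] (0-based, inclusive)."""
    bl, bh = lo // w, hi // w
    if bl == bh:                                   # query inside one block
        return min(D[lo:hi + 1])
    best = min(min(D[lo:(bl + 1) * w]),            # tail of the block containing lo
               min(D[bh * w:hi + 1]))              # head of the block containing hi
    if bl + 1 < bh:                                # whole blocks in between
        best = min(best, min(B[bl + 1:bh]))
    return best


D = [int(c) for c in "1212343432321232321210"]
w = 4
B = block_minima(D, w)
print(B)                                   # [1, 3, 2, 1, 1, 0]
print(range_min_blocks(D, B, w, 5, 18))    # 1
```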
Complexities
• Length of P: 4n
• Length of B: 4n/w = 8n/lg n
• RMQ inside a block is done in O(1) time by
a table lookup
Size of the table:
\[ O\left(2^{\frac{1}{2}\log n} \log^2 n \, \log\log n\right) = O\left(\sqrt{n}\, \log^2 n \, \log\log n\right) \]
• Complexities
\[ t(n) = O(1) + t\left(\frac{8n}{\lg n}\right) \]
\[ s(n) = 4n + s\left(\frac{8n}{\lg n}\right) + o(n) \]
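The table behind the in-block lookup can be illustrated as follows. Because adjacent values of P’ differ by exactly 1, a block of length w is determined, up to an additive offset, by its w steps of +1 or -1, so there are 2^w block shapes. The sketch stores, for every shape and every pair i <= j, the offset of the leftmost minimum. The name `build_block_table` and the dictionary layout are assumptions of this sketch; a real implementation would pack the table into bits.

```python
from itertools import product


def build_block_table(w):
    """Map (pattern, i, j) to the offset of the leftmost minimum in [i, j]."""
    table = {}
    for pattern in product((1, -1), repeat=w):   # one entry set per block shape
        rel, d = [], 0
        for step in pattern:                     # depths relative to the block start
            d += step
            rel.append(d)
        for i in range(w):
            best = i
            for j in range(i, w):
                if rel[j] < rel[best]:
                    best = j
                table[pattern, i, j] = best
    return table


table = build_block_table(4)
print(len(table))                       # 160 = 2^4 * (4 * 5 / 2)
print(table[(1, 1, -1, -1), 0, 3])      # 3
```

With w = (lg n)/2 the number of shapes is √n, which is where the √n factor in the table size comes from.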
Sparse Table Algorithm [2]
• For each interval [i, i+2^k-1] in array B[1,m],
store the minimum value in M[i,k]
(i = 1,...,m; k = 0,1,...,⌊lg m⌋)
• For a given query interval [s,b] with s < b
1. Let k = ⌊lg(b-s)⌋
2. Compare M[s, k] and M[b-2^k+1, k], and
output the minimum.
• O(1) time, O(m lg^2 m) bit space
[Figure: example on B = (1, 4, 3, 5, 0, 4, 5, 3, 7)]
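A minimal sparse table sketch follows. It departs from the slide in three labeled ways: it stores the index of the minimum instead of the value, it uses 0-based indices, and it takes k = ⌊lg(b-s+1)⌋ so that a single-element query also works. The function names are hypothetical.

```python
def build_sparse_table(B):
    """M[k][i] = index of the leftmost minimum of B[i .. i + 2^k - 1]."""
    m = len(B)
    M = [list(range(m))]                     # k = 0: intervals of length 1
    k = 1
    while (1 << k) <= m:
        prev, half = M[k - 1], 1 << (k - 1)
        row = []
        for i in range(m - (1 << k) + 1):
            a, b = prev[i], prev[i + half]   # two halves of length 2^(k-1)
            row.append(a if B[a] <= B[b] else b)
        M.append(row)
        k += 1
    return M


def sparse_query(B, M, s, b):
    """Index of the leftmost minimum of B[s..b] (0-based, inclusive)."""
    k = (b - s + 1).bit_length() - 1         # floor(lg(b - s + 1))
    x, y = M[k][s], M[k][b - (1 << k) + 1]   # two overlapping intervals cover [s, b]
    return x if B[x] <= B[y] else y


B = [1, 4, 3, 5, 0, 4, 5, 3, 7]
M = build_sparse_table(B)
print(sparse_query(B, M, 2, 5))  # 4
print(sparse_query(B, M, 5, 7))  # 7
```

The table has about lg m rows of up to m entries, each an index of lg m bits, which matches the O(m lg^2 m) bit bound.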
• This data structure is used when the length of B
becomes O(n/lg^3 n)
⇒ o(n) bit space
Theorem: RMQ is computed in O(1) time using a
4n+o(n) bit data structure.
2n bits Data Structure for RMQ [4]
• The 2d-Min-Heap of array A[1..n] is a tree consisting of
nodes v0,…,vn, in which the parent vj of vi satisfies
– j < i
– A[j] < A[i]
– A[k] ≥ A[i] for all j < k < i
(A[0] = -∞ is assumed for the root v0)
• Parent value is smaller than
child value
• Child values are sorted in
decreasing order
[Figure: the 2d-Min-Heap of A = (1, 4, 3, 5, 0, 4, 5, 3, 7)]
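By the three conditions above, the parent of v_i is the nearest position to the left that holds a strictly smaller value. A stack computes all parents in one pass. This is an illustrative sketch: `min_heap_2d_parents` is a hypothetical name, and node 0 plays the role of the virtual root with value -∞.

```python
def min_heap_2d_parents(A):
    """parent[i] for nodes i = 1..n (1-based); node 0 is the virtual root."""
    n = len(A)
    parent = [None] * (n + 1)
    stack = [0]                               # candidates for "nearest smaller"
    for i in range(1, n + 1):
        # Discard candidates that are not strictly smaller than A[i].
        while stack[-1] != 0 and A[stack[-1] - 1] >= A[i - 1]:
            stack.pop()
        parent[i] = stack[-1]
        stack.append(i)
    return parent


A = [1, 4, 3, 5, 0, 4, 5, 3, 7]
print(min_heap_2d_parents(A))  # [None, 0, 1, 1, 3, 0, 5, 6, 5, 8]
```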
Lemma: Let l = lca(i,j).
• If l = i, RMQ_A(i,j) = i
• If l ≠ i, RMQ_A(i,j) is a child of l and on the path
between l and j
Proof: If l = i, then i+1,…,j are descendants
of i. From the definition of the tree, they
are larger than A[i].
If l ≠ i, it holds that l < i.
Children of l are sorted in
decreasing order of their values.
Thus, among the children of l with index at most j, the rightmost
one is the smallest, and it is the child on the path to j.
[Figure: example queries on the 2d-Min-Heap of A = (1, 4, 3, 5, 0, 4, 5, 3, 7), with nodes labeled i, j, l and r]
• By using the DFUDS sequence U representing the 2d-Min-Heap,
RMQ is computed as follows.
– x = select_)(U, i+1)
– y = select_)(U, j)
– w = RMQ_E(x, y)
– if rank_)(U, findopen(U, w)) = i, return i
– else return rank_)(U, w)
Example: i = 3, j = 9
A 143504537
U ((()(())())(()())())
E 12323432321232321210
[Figure: the 2d-Min-Heap of A with the nodes i, j, l, r marked, and the positions x, w, y marked on U]
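The five steps can be traced with a scan-based sketch. The assumptions are mine rather than the slides': function names are hypothetical, positions in U are 1-based, U is an extra "(" followed by each node's children written as that many "(" and one ")" in index order, RMQ_E returns the leftmost minimum, and the case i = j is handled separately. When A contains equal values, the returned index holds a minimum but is not necessarily the leftmost one.

```python
from bisect import bisect_right


def dfuds_of_2d_min_heap(A):
    """DFUDS string of the 2d-Min-Heap of A (nodes 0..n in index order)."""
    n = len(A)
    deg = [0] * (n + 1)
    stack = [0]
    for i in range(1, n + 1):                  # parent = nearest smaller to the left
        while stack[-1] != 0 and A[stack[-1] - 1] >= A[i - 1]:
            stack.pop()
        deg[stack[-1]] += 1
        stack.append(i)
    return "(" + "".join("(" * d + ")" for d in deg)


def findopen(U, w):
    """1-based position of the '(' matching the ')' at position w."""
    depth = 0
    for p in range(w, 0, -1):
        depth += 1 if U[p - 1] == ")" else -1
        if depth == 0:
            return p
    raise ValueError("unbalanced sequence")


def rmq_dfuds(U, i, j):
    """1-based index of a minimum of A[i..j], using only U."""
    if i == j:
        return i
    closes = [p for p, c in enumerate(U, 1) if c == ")"]
    E, d = [], 0
    for c in U:                                # excess sequence E
        d += 1 if c == "(" else -1
        E.append(d)
    x, y = closes[i], closes[j - 1]            # select_)(U, i+1), select_)(U, j)
    window = E[x - 1:y]
    w = x + window.index(min(window))          # leftmost minimum of E[x..y]
    if bisect_right(closes, findopen(U, w)) == i:   # rank_)(U, findopen(U, w))
        return i
    return bisect_right(closes, w)             # rank_)(U, w)


A = [1, 4, 3, 5, 0, 4, 5, 3, 7]
U = dfuds_of_2d_min_heap(A)
print(U)                    # ((()(())())(()())())
print(rmq_dfuds(U, 3, 9))   # 5
```

For i = 3 and j = 9 the sketch gets x = 10, y = 19 and w = 11, and the answer 5 is the position of the value 0.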
Range Min-Max Trees [5]
• In existing succinct data structures for trees, for each
operation to be supported, a new index is added.
• The o(n) term cannot be ignored.
– The recursive method [6] uses 3.73n bits to support
only findopen, findclose, enclose.
• It is preferable if various operations can be
supported by a single index
Definitions
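• For a vector P[0..2n-1] and a function g
\[ \mathrm{sum}(P, g, s, t) \stackrel{\mathrm{def}}{=} \sum_{i=s}^{t} g(P[i]) \]
\[ \mathrm{fwd\_search}(P, g, s, d) \stackrel{\mathrm{def}}{=} \min_{i > s} \{\, i \mid \mathrm{sum}(P, g, s+1, i) = d \,\} \]
\[ \mathrm{bwd\_search}(P, g, s, d) \stackrel{\mathrm{def}}{=} \max_{i < s} \{\, i \mid \mathrm{sum}(P, g, i+1, s) = d \,\} \]
\[ \mathrm{rmq}(P, g, s, t) \stackrel{\mathrm{def}}{=} \min_{s \le i \le t} \mathrm{sum}(P, g, s, i) \]
\[ \mathrm{rmqi}(P, g, s, t) \stackrel{\mathrm{def}}{=} \mathop{\mathrm{arg\,min}}_{s \le i \le t} \mathrm{sum}(P, g, s, i) \]
• RMQ, RMQi are defined similarly (range maximum)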
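The definitions can be read as the following linear-scan reference functions. This is only a sketch of the semantics, not the range min-max tree: `total` stands for sum (renamed to avoid shadowing the Python built-in), indices are 0-based, fwd_search accumulates from position s+1 and bwd_search back from position s, `None` means "no such position", and rmqi is assumed to return the leftmost argmin.

```python
def total(P, g, s, t):
    """sum(P, g, s, t): g applied to P[s..t], added up."""
    return sum(g(P[i]) for i in range(s, t + 1))


def fwd_search(P, g, s, d):
    """Smallest i > s with sum(P, g, s+1, i) == d, or None."""
    acc = 0
    for i in range(s + 1, len(P)):
        acc += g(P[i])
        if acc == d:
            return i
    return None


def bwd_search(P, g, s, d):
    """Largest i < s with sum(P, g, i+1, s) == d, or None (i may be -1)."""
    acc = 0
    for i in range(s - 1, -2, -1):
        acc += g(P[i + 1])
        if acc == d:
            return i
    return None


def rmq(P, g, s, t):
    """Minimum of sum(P, g, s, i) over s <= i <= t."""
    return min(total(P, g, s, i) for i in range(s, t + 1))


def rmqi(P, g, s, t):
    """Leftmost i in [s, t] that attains rmq(P, g, s, t)."""
    return min(range(s, t + 1), key=lambda i: total(P, g, s, i))


def pi(c):
    return 1 if c == "(" else -1


P = "(()((()())())(()())())"
print(total(P, pi, 0, 5))        # 4
print(fwd_search(P, pi, 0, -1))  # 21
print(rmqi(P, pi, 3, 8))         # 3
```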
How to support operations on
balanced parentheses sequence
• Lemma: Let π be a function s.t. π(() = 1, π()) = -1. Then
\[ \mathrm{findclose}(P, i) = \mathrm{fwd\_search}(P, \pi, i, -1) \]
\[ \mathrm{findopen}(P, i) = \mathrm{bwd\_search}(P, \pi, i, 0) + 1 \]
\[ \mathrm{enclose}(P, i) = \mathrm{bwd\_search}(P, \pi, i, 2) + 1 \]
\[ \mathrm{level\_ancestor}(P, i, d) = \mathrm{bwd\_search}(P, \pi, i, d+1) + 1 \]
[Figure: findclose and enclose illustrated on P with excess sequence E]
P (()((()())())(()())())
E 1212343432321232321210
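The lemma can be tried out on the example sequence with scan-based searches. The sketch assumes 0-based positions, a forward search that accumulates from position i+1, and a backward search that accumulates back from position i; all function names mirror the operations on the slide but the implementations are only illustrative.

```python
def pi(c):
    return 1 if c == "(" else -1


def fwd_search(P, g, s, d):
    """Smallest i > s with g(P[s+1]) + ... + g(P[i]) == d, or None."""
    acc = 0
    for i in range(s + 1, len(P)):
        acc += g(P[i])
        if acc == d:
            return i
    return None


def bwd_search(P, g, s, d):
    """Largest i < s with g(P[i+1]) + ... + g(P[s]) == d, or None."""
    acc = 0
    for i in range(s - 1, -2, -1):      # i may reach -1
        acc += g(P[i + 1])
        if acc == d:
            return i
    return None


def findclose(P, i):
    return fwd_search(P, pi, i, -1)


def findopen(P, i):
    return bwd_search(P, pi, i, 0) + 1


def enclose(P, i):
    return bwd_search(P, pi, i, 2) + 1


def level_ancestor(P, i, d):
    return bwd_search(P, pi, i, d + 1) + 1


P = "(()((()())())(()())())"
print(findclose(P, 3))          # 12
print(findopen(P, 12))          # 3
print(enclose(P, 5))            # 4
print(level_ancestor(P, 5, 2))  # 3
```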
Implementing rank/select
• Let φ, ψ be functions s.t. φ(0)=0, φ(1)=1, ψ(0)=1,
ψ(1)=0
\[ \mathrm{rank}_1(P, i) = \mathrm{sum}(P, \varphi, 0, i) \]
\[ \mathrm{select}_1(P, i) = \mathrm{fwd\_search}(P, \varphi, 0, i) \]
\[ \mathrm{rank}_0(P, i) = \mathrm{sum}(P, \psi, 0, i) \]
\[ \mathrm{select}_0(P, i) = \mathrm{fwd\_search}(P, \psi, 0, i) \]
• rank/select and parentheses operations can be
handled in a unified manner.
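A scan-based sketch of this unified view on a bit vector follows. One indexing detail is an assumption of the sketch: its forward search accumulates from position s+1, so it is started at s = -1 to make position 0 count in select. The function names are hypothetical, positions are 0-based, and rank is inclusive of position i.

```python
def phi(bit):
    return bit          # counts 1s


def psi(bit):
    return 1 - bit      # counts 0s


def total(P, g, s, t):
    return sum(g(P[i]) for i in range(s, t + 1))


def fwd_search(P, g, s, d):
    """Smallest i > s with g(P[s+1]) + ... + g(P[i]) == d, or None."""
    acc = 0
    for i in range(s + 1, len(P)):
        acc += g(P[i])
        if acc == d:
            return i
    return None


def rank1(P, i):
    return total(P, phi, 0, i)


def rank0(P, i):
    return total(P, psi, 0, i)


def select1(P, k):
    return fwd_search(P, phi, -1, k)   # start before position 0 (see lead-in)


def select0(P, k):
    return fwd_search(P, psi, -1, k)


P = [1, 0, 1, 1, 0]
print(rank1(P, 3))    # 3
print(rank0(P, 4))    # 2
print(select1(P, 2))  # 2
print(select0(P, 2))  # 4
```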
References
[1] Kunihiko Sadakane, Daisuke Watanabe: Practical Data Structures for the
Document Listing Problem (in Japanese). DBSJ Letters Vol.2, No.1, pp.103-106.
[2] Michael A. Bender, Martin Farach-Colton: The LCA Problem
Revisited. LATIN 2000: 88-94
[3] Kunihiko Sadakane: Succinct data structures for flexible text retrieval
systems. J. Discrete Algorithms 5(1): 12-22 (2007)
[4] Johannes Fischer: Optimal Succinctness for Range Minimum Queries.
LATIN 2010: 158-169
[5] Kunihiko Sadakane, Gonzalo Navarro: Fully-Functional Succinct
Trees. SODA 2010: 134-149.
[6] R. F. Geary, N. Rahman, R. Raman, and V. Raman. A simple optimal
representation for balanced parentheses. Theoretical Computer
Science, 368:231–246, December 2006.
[7] Mihai Pătraşcu: Succincter. Proc. FOCS, 2008.