Succinct Data Structures
Kunihiko Sadakane
National Institute of Informatics
DFUDS Representation [6]
• It encodes the degrees of nodes in unary codes in
depth-first order
(DFUDS = Depth First Unary Degree Sequence)
• Degree d ⇒ d (’s, followed by a )
• Add a dummy ( at the beginning
• 2n bits
[Figure: an 8-node tree (root 1 with children 2 and 6; node 2 with children 3, 4, 5; node 6 with children 7, 8) and its DFUDS]
U ((()((())))(()))
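The encoding rule above can be sketched in a few lines of Python. This is only an illustration: the dictionary-of-children tree format and the function name `dfuds` are assumptions made for the example, not part of the slides.

```python
def dfuds(children, root):
    """Return the DFUDS string of a tree given as {node: [children, left to right]}."""
    out = ["("]                              # dummy opening parenthesis
    stack = [root]
    while stack:                             # iterative preorder (depth-first) traversal
        v = stack.pop()
        kids = children.get(v, [])
        out.append("(" * len(kids) + ")")    # degree d in unary: d ('s and one )
        stack.extend(reversed(kids))         # so that the leftmost child is visited first
    return "".join(out)

# The 8-node tree of the slide (hypothetical dict representation).
tree = {1: [2, 6], 2: [3, 4, 5], 6: [7, 8]}
U = dfuds(tree, 1)
print(U, len(U))
```

For the 8-node tree this reproduces the sequence U shown on the slide, and its length is 2n = 16.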
Various Operations on DFUDS
• A node with degree d is represented by the first position of its encoding, d (’s followed by a )
– The position v of a node in the sequence and its preorder rank i are converted to each other by
v = preorder-select(i) = select_)(i - 1) + 1
i = preorder-rank(v) = rank_)(v - 1) + 1
• Degree: degree(v) = select_)(rank_)(v) + 1) - v
(for a leaf, U[v] is itself a ), so use rank_)(v - 1) in place of rank_)(v) to obtain degree 0)
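A minimal sketch of these conversions, using linear scans in place of the constant-time rank/select indexes. The indexing convention is an assumption of the sketch: positions are 0-based with the dummy ( at position 0, rank is inclusive, and select_)(0) is taken to be 0. The degree function uses rank over U[0..v-1], which agrees with the formula above for internal nodes and also returns 0 for leaves.

```python
U = "((()((())))(()))"   # DFUDS of the 8-node example; index 0 is the dummy (

def rank_close(U, v):
    """Number of ) in U[0..v], inclusive (linear scan, not a succinct index)."""
    return U[:v + 1].count(")")

def select_close(U, i):
    """Position of the i-th ) for i >= 1; by convention select_close(U, 0) == 0."""
    if i == 0:
        return 0
    seen = 0
    for pos, c in enumerate(U):
        if c == ")":
            seen += 1
            if seen == i:
                return pos
    raise ValueError("fewer than i closing parentheses")

def preorder_select(U, i):
    return select_close(U, i - 1) + 1

def preorder_rank(U, v):
    return rank_close(U, v - 1) + 1

def degree(U, v):
    # rank over U[0..v-1], so that a leaf (U[v] == ")") gets degree 0
    return select_close(U, rank_close(U, v - 1) + 1) - v

positions = [preorder_select(U, i) for i in range(1, 9)]
degrees = [degree(U, v) for v in positions]
print(positions)   # start position of each node, in preorder
print(degrees)     # degree of each node, in preorder
```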
i-th child
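child(v, i) = findclose(select_)(rank_)(v) + 1) - i) + 1
[Figure: a 9-node tree (root 1 with children 2, 5, 6; node 2 with children 3, 4; node 6 with children 7, 8, 9); node v and its child subtrees U1, U2, U3 are marked in the DFUDS]
U (((()(())))((())))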
Parent
parent(v) = select_)(rank_)(findopen(v - 1))) + 1
[Figure: the same 9-node tree; a node and its parent p are marked in the DFUDS]
U (((()(())))((())))
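The child and parent formulas can be checked on the 9-node example with a naive sketch. A precomputed matching table stands in for findclose and findopen, and the 0-based convention (dummy ( at position 0, select_)(0) = 0) is an assumption of the sketch rather than something the slides state.

```python
U = "(((()(())))((())))"   # DFUDS of the 9-node example; index 0 is the dummy (

def match_table(s):
    """match[p] = position of the parenthesis paired with s[p] (stack-based)."""
    match, stack = [0] * len(s), []
    for p, c in enumerate(s):
        if c == "(":
            stack.append(p)
        else:
            q = stack.pop()
            match[p], match[q] = q, p
    return match

MATCH = match_table(U)
CLOSE = [p for p, c in enumerate(U) if c == ")"]   # positions of ), in order

def rank_close(v):
    return sum(1 for p in CLOSE if p <= v)          # number of ) in U[0..v]

def select_close(i):
    return 0 if i == 0 else CLOSE[i - 1]

def child(v, i):
    # i-th child of an internal node v; findclose is MATCH applied to a (
    return MATCH[select_close(rank_close(v) + 1) - i] + 1

def parent(v):
    # findopen is MATCH applied to the ) at position v - 1; undefined for the root
    return select_close(rank_close(MATCH[v - 1])) + 1

print([child(1, i) for i in (1, 2, 3)])   # children of the root
print(parent(8), parent(15))              # parents of two leaves
```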
Number of Descendants
(Subtree Size)
• The size of the subtree rooted at v is
subtreesize(v) = (findclose(enclose(v)) - v)/2 + 1
[Figure: the same 9-node tree; a node and its parent p are marked in the DFUDS]
U (((()(())))((())))
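A naive sketch of the subtree-size formula on the 9-node example. Two points are assumptions of the sketch: positions are 0-based with the dummy ( at position 0, and for a leaf (where U[v] is a closing parenthesis) enclose(v) is taken to be the opening parenthesis matched by U[v] itself, which makes the formula return 1.

```python
U = "(((()(())))((())))"   # DFUDS of the 9-node example; index 0 is the dummy (

def match_table(s):
    """match[p] = position of the parenthesis paired with s[p]."""
    match, stack = [0] * len(s), []
    for p, c in enumerate(s):
        if c == "(":
            stack.append(p)
        else:
            q = stack.pop()
            match[p], match[q] = q, p
    return match

MATCH = match_table(U)

def enclose(v):
    """Opening parenthesis whose partner closes the subtree of v."""
    if U[v] == ")":                      # leaf: assumption, use the pair closed at v
        return MATCH[v]
    depth = 0
    for p in range(v - 1, -1, -1):       # scan left for the nearest unmatched (
        if U[p] == ")":
            depth += 1
        elif depth == 0:
            return p
        else:
            depth -= 1
    raise ValueError("no enclosing pair")

def subtree_size(v):
    return (MATCH[enclose(v)] - v) // 2 + 1

nodes = [1, 5, 8, 9, 10, 11, 15, 16, 17]      # node positions in preorder
print([subtree_size(v) for v in nodes])
```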
LCA on DFUDS [7]
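• Can be computed by almost the same operation as for BP
• lca(x,y) = parent(RMQ_E(x, y - 1) + 1)
• The leftmost minimum is used
[Figure: the 8-node tree (root 1 with children 2 and 6; node 2 with children 3, 4, 5; node 6 with children 7, 8), shown with its DFUDS and BP sequences]
DFUDS
U ((()((())))(()))
E 1232345432123210
BP
P ((()()())(()()))
E 1232323212323210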
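The lca formula can be tried out on the 8-node DFUDS with linear scans in place of the constant-time RMQ. The sketch assumes 0-based positions with the dummy ( at position 0, x < y, and that neither node is an ancestor of the other; the ancestor case is not handled here.

```python
from itertools import accumulate

U = "((()((())))(()))"   # DFUDS of the 8-node example; index 0 is the dummy (
# E[i] = (number of "(") - (number of ")") in U[0..i]
E = list(accumulate(1 if c == "(" else -1 for c in U))

def match_table(s):
    """match[p] = position of the parenthesis paired with s[p]."""
    match, stack = [0] * len(s), []
    for p, c in enumerate(s):
        if c == "(":
            stack.append(p)
        else:
            q = stack.pop()
            match[p], match[q] = q, p
    return match

MATCH = match_table(U)
CLOSE = [p for p, c in enumerate(U) if c == ")"]

def parent(v):
    # select_)(rank_)(findopen(v - 1))) + 1, with select_)(0) = 0
    r = sum(1 for p in CLOSE if p <= MATCH[v - 1])
    return (CLOSE[r - 1] if r else 0) + 1

def rmq_leftmost(lo, hi):
    """Index of the leftmost minimum of E[lo..hi] (linear scan)."""
    return min(range(lo, hi + 1), key=E.__getitem__)

def lca(x, y):
    # assumes x < y and that neither node is an ancestor of the other
    return parent(rmq_leftmost(x, y - 1) + 1)

print(lca(8, 10))    # nodes 3 and 5 -> position of node 2
print(lca(8, 14))    # nodes 3 and 7 -> position of the root
```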
Let E[i] = rank_((U,i) - rank_)(U,i).
Let T_1, T_2, ..., T_k denote the subtrees of v, let the DFUDS of v be U[l_0..r_0] with E[r_0] = d, and let the DFUDS of T_i be U[l_i..r_i].
Lemma: E[r_i] = E[r_{i-1}] - 1 = d - i (1 ≤ i ≤ k)
E[j] > E[r_i] (l_i ≤ j < r_i)
[Figure: the 9-node tree with root v; the positions r_0, r_1, r_2, r_3 are marked on U]
U (((()(())))((())))
E 123434543212343210
Proof: The DFUDS U[l_i..r_i] of each subtree becomes balanced when a ( is added at its head
⇒ E[r_i] = E[r_{i-1}] - 1 = d - i
E[j] > E[r_i] (l_i ≤ j < r_i)
Lemma: lca(x,y) = parent(RMQ_E(x, y - 1) + 1)
Proof: Let v = lca(x,y). Let T, T′ be the subtrees of v which contain x, y respectively. Let U[l′..r′] be the DFUDS of T′ and let E[r′] = d. By the previous lemma, E[l′ - 1] = d+1.
Case 1: If y < r′ (y is not the rightmost leaf of T′)
From E[y] ≥ d+1 and E[y - 1] ≥ d+2, RMQ_E(x, y - 1) = l′ - 1
[Figure: the values of E across T and T′, with the levels d+1, > d+1, d and d - 1 marked]
Case 2: If y = r′
From E[y - 1] = d+1, RMQ_E(x, y - 1) takes the minimum value d+1 at l′ - 1 and at y - 1.
Because the RMQ chooses the leftmost one, we obtain l′ - 1.
In both cases, RMQ_E(x, y - 1) + 1 = l′ holds, and parent(l′) = lca(x,y) also holds.
Other Operations 1
• leaf-rank(v) = rank_))(v)
• leaf-select(i) = select_))(i)
• preorder-rank(v) = rank_)(v - 1) + 1
• preorder-select(i) = select_)(i - 1) + 1
[Figure: the 9-node tree with its nodes numbered in preorder]
U (((()(())))((())))
E 123434543212343210
Other Operations 2
• inorder-rank(v) = leaf-rank(child(v,2) - 1)
• inorder-select(i) = parent(leaf-select(i) + 1)
• leftmost-leaf(v) = leaf-select(leaf-rank(v - 1) + 1)
• rightmost-leaf(v) = findclose(enclose(v))
[Figure: the 9-node tree with its nodes numbered in preorder]
U (((()(())))((())))
E 123434543212343210
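The leaf operations rest on the observation that a leaf is a ) immediately preceded by another ). A naive sketch on the 9-node example follows; it assumes 0-based positions with the dummy ( at position 0 and identifies each occurrence of )) by the position of its second symbol.

```python
U = "(((()(())))((())))"   # DFUDS of the 9-node example; index 0 is the dummy (

# A leaf is encoded by a single ), and the symbol before it ends another node's code.
LEAVES = [p for p in range(1, len(U)) if U[p - 1] == ")" and U[p] == ")"]

def leaf_rank(v):
    """Number of leaves at positions <= v (rank of the pattern "))")."""
    return sum(1 for p in LEAVES if p <= v)

def leaf_select(i):
    """Position of the i-th leaf (select of the pattern "))"), i >= 1."""
    return LEAVES[i - 1]

def leftmost_leaf(v):
    # first leaf at a position >= v, which is the leftmost leaf below v
    return leaf_select(leaf_rank(v - 1) + 1)

print(LEAVES)
print(leftmost_leaf(1), leftmost_leaf(11))
```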
Range Minimum Query
Problem (RMQ)
• Input: array A[1,n] (preprocessing is allowed), interval of indices [i,j] ⊆ [1,n]
• Output: the index of a minimum value in the sub-array A[i,j]
Example:
i 1 2 3 4 5 6 7 8 9
A 1 4 3 5 0 4 5 3 7
RMQ_A(3,6) = 5
RMQ_A(6,8) = 8
A Data Structure for RMQ
Lemma: For an array of length n, let s(n) denote the
size of a data structure, and let t(n) denote the time
complexity for a query. A data structure satisfying
the following formulas can be constructed in O(n)
time [1].
\[ t(n) = O(1) + t\left(\frac{8n}{\lg n}\right) \]
\[ s(n) = 4n + s\left(\frac{8n}{\lg n}\right) + o(n) \quad \text{(bits)} \]
Note: the original input array is not used for a query.
Cartesian Tree [2]
• The Cartesian tree for an array A[1,n] consists of
– The root node: stores the minimum A[i] in A[1,n]
– Left subtree: the Cartesian tree for A[1,i-1]
– Right subtree: the Cartesian tree for A[i+1,n]
A 143504537
[Figure: the Cartesian tree for A, with the value 0 at the root]
Relation between
Cartesian Tree and RMQ
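• RMQ_A(i,j) = lca(i,j), where the lca is taken in the Cartesian tree of A between the nodes storing A[i] and A[j]
A 143504537
[Figure: the Cartesian tree for A, with the value 0 at the root]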
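The relation can be checked directly on the example array. The sketch below builds the Cartesian tree straight from its recursive definition (quadratic time, for illustration only) and answers RMQ by a naive lca over parent pointers; the function names and the choice of the leftmost minimum on ties are assumptions of the sketch.

```python
def build(a, lo, hi, parent, par=-1):
    """Cartesian tree of a[lo..hi] by the recursive definition; returns the root index."""
    if lo > hi:
        return -1
    i = min(range(lo, hi + 1), key=a.__getitem__)   # leftmost minimum of a[lo..hi]
    parent[i] = par
    build(a, lo, i - 1, parent, i)                  # left subtree
    build(a, i + 1, hi, parent, i)                  # right subtree
    return i

def lca(parent, x, y):
    """Naive lca: collect the ancestors of x, then walk up from y."""
    anc = set()
    while x != -1:
        anc.add(x)
        x = parent[x]
    while y not in anc:
        y = parent[y]
    return y

A = [1, 4, 3, 5, 0, 4, 5, 3, 7]
parent = [-1] * len(A)
root = build(A, 0, len(A) - 1, parent)

def rmq(i, j):
    """RMQ_A(i, j) with 1-based indices, as in the slides."""
    return lca(parent, i - 1, j - 1) + 1

print(rmq(3, 6), rmq(6, 8))
```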
Problem (lca, lowest common ancestor)
• Input: a rooted tree T and its two nodes x, y
• Output: the lowest common ancestor of x, y
Known results
• lca is found in constant time after linear time
preprocessing.
[Harel, Tarjan SICOMP84]
[Schieber, Vishkin SICOMP88]
[Bender, Farach-Colton 00]
A Property of Cartesian Tree
Lemma: If A[n] is added to the Cartesian tree for A[1,n-1], the node for A[n] is on the path from the root to the rightmost leaf.
Proof: Because A[n] is the rightmost element in the array, it is never stored in the left subtree of any node.
[Figure: example of a Cartesian tree and its rightmost path]
Construction of Cartesian Tree
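When we add A[n] to the tree for A[1,n-1]
• Compare A[n] with the elements on the path from A[n-1] to the root
• When an element x smaller than A[n] is found, insert A[n] as the right child of x
• Make the former right child of x the left child of A[n]
• If no such element exists, A[n] becomes the new root and the old root becomes its left child
[Figure: inserting a new element into the rightmost path of a Cartesian tree]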
Time Complexity
Lemma: The Cartesian tree is constructed in O(n) time
Proof: Let the number of comparisons to insert A[i] be c_i. Then the total time complexity is
\[ O\left(\sum_{i=1}^{n} c_i\right) \]
Each node on the rightmost path that is larger than A[i] moves into the left subtree of A[i] after the insertion and leaves the rightmost path.
Therefore it is compared with a new element only once in this way, and each insertion makes at most one further comparison, the one that finds a smaller element.
This implies that the total number of comparisons is at most 2n. Thus the time complexity is O(n).
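The insertion procedure and the 2n bound can be illustrated with a stack that holds the current rightmost path. This is a sketch under two assumptions: indices are 0-based, and the scan stops at an element that is less than or equal to A[i] (rather than strictly smaller), so that among equal values the earlier one stays higher in the tree.

```python
def cartesian_tree(a):
    """Return (root, left, right, comparisons) for the Cartesian tree of a."""
    n = len(a)
    left, right = [-1] * n, [-1] * n
    path = []                      # indices on the rightmost path, root first
    comparisons = 0
    for i in range(n):
        last = -1
        while path:
            comparisons += 1
            if a[path[-1]] <= a[i]:    # found x that is not larger than A[i]
                break
            last = path.pop()          # larger nodes leave the rightmost path for good
        left[i] = last                 # former right child of x (or the old root)
        if path:
            right[path[-1]] = i        # A[i] becomes the right child of x
        path.append(i)
    return path[0], left, right, comparisons

A = [1, 4, 3, 5, 0, 4, 5, 3, 7]
root, left, right, comparisons = cartesian_tree(A)
print(root, comparisons, 2 * len(A))
```

Every comparison either pops a node (at most n times in total) or ends the scan for the current element (at most once per element), which is where the bound of 2n comes from.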
BP Representation of Cartesian Tree
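A 143504537
[Figure: the Cartesian tree for A, with the value 0 at the root]
P  ((()((())()(())))()((()(()))()(())))
P’ 123234543434543212123434543232343210
(the i-th pair () in P corresponds to A[i]; from left to right the pairs carry the values 1 4 3 5 0 4 5 3 7)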
Algorithm for RMQ [3]
• Construct the Cartesian tree for A[1,n]
• Convert the Cartesian tree to BP sequence P and
depth sequence P’
To find the position m of the leftmost minimum value in A[i,j]
• i’ = select_()(P,i), j’ = select_()(P,j)
• Let m’ be the position of the minimum in P’[i’, j’], then m = rank_()(P,m’) + 1
P  ((()((())()(())))()((()(()))()(())))
P’ 123234543434543212123434543232343210
(the i-th pair () in P corresponds to A[i]; from left to right the pairs carry the values 1 4 3 5 0 4 5 3 7)
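The whole pipeline can be sketched naively as follows. The encoding rule used here, "( left subtree, (), right subtree )" per node, is inferred from the example sequence P and reproduces it exactly. The slides do not spell out their indexing convention, so the sketch fixes one as an assumption: strings are 0-based, D[k] is the depth just before reading P[k], and the rank counts the pairs () that start before m’.

```python
from itertools import accumulate

def cartesian_tree(a):
    """Stack-based construction; returns (root, left, right) with 0-based indices."""
    left, right, path = [-1] * len(a), [-1] * len(a), []
    for i in range(len(a)):
        last = -1
        while path and a[path[-1]] > a[i]:
            last = path.pop()
        left[i] = last
        if path:
            right[path[-1]] = i
        path.append(i)
    return path[0], left, right

def encode(v, left, right):
    """BP string with one extra pair () per element: ( left-subtree () right-subtree )."""
    if v == -1:
        return ""
    return "(" + encode(left[v], left, right) + "()" + encode(right[v], left, right) + ")"

A = [1, 4, 3, 5, 0, 4, 5, 3, 7]
root, left, right = cartesian_tree(A)
P = encode(root, left, right)
# D[k] = excess of P[0..k-1], i.e. the depth just before reading P[k]
D = [0] + list(accumulate(1 if c == "(" else -1 for c in P))
PAIRS = [k for k in range(len(P) - 1) if P[k:k + 2] == "()"]   # start of the i-th ()

def rmq(i, j):
    """1-based position of the leftmost minimum of A[i..j]."""
    lo, hi = PAIRS[i - 1], PAIRS[j - 1]                  # select_() for i and j
    m = min(range(lo, hi + 1), key=D.__getitem__)        # leftmost minimum depth
    return sum(1 for k in PAIRS if k < m) + 1            # rank_() before m, plus 1

print(P)
print(rmq(3, 6), rmq(6, 8))
```

The sequence has length 4n because every element contributes its own pair of parentheses plus the marker ().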
RMQ on P’
• Divide P’ into blocks of length w = (lg n)/2
• Store minimum values of the blocks in B
• Minimum value in P’[i’, j’] is either
– Minimum value in the block containing i’
– Minimum value in the block containing j’, or
– Minimum value in blocks between those blocks
P  (()((()())())(()())())
P’ 1212343432321232321210
B  1 3 2 1 1 0 (block minima, with blocks of length 4 in this example)
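The three-candidate rule can be sketched on the example above. In this illustration a linear scan stands in both for the in-block table lookup and for the RMQ over B, so it shows only the decomposition, not the constant-time bound; positions are 0-based and the function names are assumptions of the sketch.

```python
Pp = [int(c) for c in "1212343432321232321210"]   # P' from the example
w = 4                                             # block length used in the example
B = [min(Pp[s:s + w]) for s in range(0, len(Pp), w)]

def scan_min(arr, lo, hi):
    """Leftmost index of the minimum of arr[lo..hi] (stand-in for table lookup)."""
    return min(range(lo, hi + 1), key=arr.__getitem__)

def rmq_blocks(lo, hi):
    """Leftmost minimum position in Pp[lo..hi], 0-based, via block decomposition."""
    bl, bh = lo // w, hi // w
    if bl == bh:                                   # both ends in the same block
        return scan_min(Pp, lo, hi)
    cands = [scan_min(Pp, lo, (bl + 1) * w - 1)]   # block containing lo
    if bl + 1 <= bh - 1:                           # full blocks strictly in between
        b = scan_min(B, bl + 1, bh - 1)
        cands.append(scan_min(Pp, b * w, b * w + w - 1))
    cands.append(scan_min(Pp, bh * w, hi))         # block containing hi
    return min(cands, key=Pp.__getitem__)          # on ties the earliest candidate wins

print(B)
print(rmq_blocks(2, 17))
```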
Complexities
• Length of P: 4n
• Length of B: 4n/w = 8n/lg n
• RMQ inside a block is done in O(1) time by
a table lookup
Size of the table
\[ O\left(2^{\frac{1}{2}\log n} \log^2 n \, \log\log n\right) = O\left(\sqrt{n}\, \log^2 n \, \log\log n\right) \]
• Complexities
\[ t(n) = O(1) + t\left(\frac{8n}{\lg n}\right) \]
\[ s(n) = 4n + s\left(\frac{8n}{\lg n}\right) + o(n) \]
Sparse Table Algorithm [2]
• For each interval [i, i+2^k-1] in array B[1,m], store the minimum value in M[i,k].
(i = 1, ..., m; k = 1, 2, ..., lg m)
• For a given query interval [s,b]
1. Let k = ⌊lg(b - s)⌋
2. Compare M[s, k] and M[b-2^k+1, k], and output the minimum.
• O(1) time, O(m lg² m) bit space
B 143504537
• This data structure is used when the length of B becomes O(n/lg³ n)
⇒ o(n) bit space
Theorem: RMQ is computed in O(1) time using a 4n+o(n) bit data structure.
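The sparse table of [2] can be sketched as follows. The sketch departs from the slide in three labeled ways, all assumptions made for the illustration: indices are 0-based, the table stores positions of minima rather than values so that the leftmost minimum can be reported, and it uses the common variant with a k = 0 row and k = ⌊lg(b - s + 1)⌋ so that intervals of length one need no special case.

```python
def build_sparse(B):
    """M[k][i] = index of the leftmost minimum of B[i .. i + 2^k - 1] (0-based)."""
    m = len(B)
    M = [list(range(m))]                          # k = 0: intervals of length 1
    k = 1
    while (1 << k) <= m:
        half, prev = 1 << (k - 1), M[-1]
        row = []
        for i in range(m - (1 << k) + 1):
            a, b = prev[i], prev[i + half]        # minima of the two halves
            row.append(a if B[a] <= B[b] else b)  # <= keeps the leftmost on ties
        M.append(row)
        k += 1
    return M

def query(B, M, s, b):
    """Leftmost minimum position in B[s..b], 0-based and inclusive."""
    k = (b - s + 1).bit_length() - 1              # floor(lg(interval length))
    x, y = M[k][s], M[k][b - (1 << k) + 1]        # two overlapping intervals of length 2^k
    return x if B[x] <= B[y] else y

B = [1, 4, 3, 5, 0, 4, 5, 3, 7]
M = build_sparse(B)
print(query(B, M, 2, 5) + 1, query(B, M, 5, 7) + 1)   # 1-based answers for [3,6] and [6,8]
```

The table has about lg m rows of at most m entries of lg m bits each, which matches the O(m lg² m) bit bound stated above.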
References
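[1] Kunihiko Sadakane, Daisuke Watanabe. A practical data structure for the document listing problem (in Japanese; original title: 文書列挙問題に対する実用的なデータ構造).
DBSJ Letters (日本データベース学会Letters) Vol.2, No.1, pp.103-106.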
[2] Michael A. Bender, Martin Farach-Colton: The LCA Problem
Revisited. LATIN 2000: 88-94
[3] Kunihiko Sadakane: Succinct data structures for flexible text retrieval
systems. J. Discrete Algorithms 5(1): 12-22 (2007)
[4] Johannes Fischer: Optimal Succinctness for Range Minimum Queries.
LATIN 2010: 158-169
[5] Kunihiko Sadakane, Gonzalo Navarro: Fully-Functional Succinct
Trees. SODA 2010: 134-149.
[6] R. F. Geary, N. Rahman, R. Raman, and V. Raman. A simple optimal
representation for balanced parentheses. Theoretical Computer
Science, 368:231–246, December 2006.
[7] Mihai Pătraşcu. Succincter, Proc. FOCS, 2008.