
Compressed Rank & Select on general strings
Paolo Ferragina
Dipartimento di Informatica, Università di Pisa
Paolo Ferragina, Università di Pisa
Generalised Rank and Select
• Rank(c,i) = #c in L[1,i]
• Select(c,i) = position of the i-th c in L
L = a b a a a c b c d a b e c d ...
Select(a,2) = 3    Rank(a,7) = 4
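For reference, both queries can be answered by plainly scanning L in O(n) time. A minimal Python sketch (the function names `rank` and `select` are illustrative; positions are 1-based as in the definitions, and the trailing "..." of L is omitted):

```python
def rank(L, c, i):
    """Rank(c, i): number of occurrences of c in L[1, i] (1-based, inclusive)."""
    return L[:i].count(c)


def select(L, c, i):
    """Select(c, i): 1-based position of the i-th occurrence of c in L, or None."""
    seen = 0
    for pos, symbol in enumerate(L, start=1):
        if symbol == c:
            seen += 1
            if seen == i:
                return pos
    return None


L = "abaaacbcdabecd"  # the example string, without the trailing "..."
print(select(L, "a", 2))  # 3
print(rank(L, "a", 7))    # 4
```

The rest of the section is about answering the same queries without scanning L.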
Generalised Rank and Select
• If Σ is small (i.e. of constant size)
    • Build a binary Rank data structure for each symbol of Σ
    • Rank takes O(1) time and |Σ|(n + o(n)) bits, with n = |L|: small space for constant |Σ|
• If Σ is large (words?)
    • We need a smarter solution: the Wavelet Tree data structure
Algorithmic reduction:
>> Reduce Rank&Select over arbitrary strings
... to Rank&Select over binary strings
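A sketch of the small-alphabet solution in Python: one bit string per symbol, each with its own binary Rank helper. `BinaryRank` and `PerSymbolRank` are hypothetical names. For brevity, `BinaryRank` stores the count of 1s every `block` bits and scans the remainder, so a query costs O(block) time; a real implementation would use a two-level rank directory and word-level popcounts to reach O(1) time in n + o(n) bits.

```python
class BinaryRank:
    """Rank1/Rank0 over a list of bits, with 1-counts sampled every `block` bits."""

    def __init__(self, bits, block=8):
        self.bits = list(bits)
        self.block = block
        self.samples = [0]  # samples[k] = number of 1s in bits[0 : k * block]
        ones = 0
        for pos, b in enumerate(self.bits, start=1):
            ones += b
            if pos % block == 0:
                self.samples.append(ones)

    def rank1(self, i):
        """Number of 1s in positions 1..i (1-based, inclusive)."""
        k = i // self.block
        return self.samples[k] + sum(self.bits[k * self.block : i])

    def rank0(self, i):
        return i - self.rank1(i)


class PerSymbolRank:
    """Small-alphabet Rank(c, i): one binary Rank structure per symbol c."""

    def __init__(self, text, block=8):
        self.by_symbol = {
            c: BinaryRank((1 if x == c else 0 for x in text), block)
            for c in set(text)
        }

    def rank(self, c, i):
        ranker = self.by_symbol.get(c)
        return ranker.rank1(i) if ranker is not None else 0


ps = PerSymbolRank("abaaacbcdabecd")
print(ps.rank("a", 7))  # 4
```

The space grows linearly with |Σ| (one n-bit string per symbol), which is why this only pays off for small alphabets.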
The Wavelet Tree
Example: L = abracadabra, over the symbols a, b, c, d, r
Build a binary tree (alphabetic?) whose leaves are the symbols a, b, c, d, r
The Wavelet Tree
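Each node splits its symbols into a left and a right group, and its string is split accordingly (preserving order):
abracadabra
    aacaaa          (symbols a, c)
        aaaaa       (leaf a)
        c           (leaf c)
    brdbr           (symbols b, d, r)
        brbr        (symbols b, r)
            bb      (leaf b)
            rr      (leaf r)
        d           (leaf d)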
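The recursive split can be sketched in Python, assuming a hypothetical encoding of the tree shape as nested pairs of leaf symbols; the shape `(("a", "c"), (("b", "r"), "d"))` is the one whose node strings are listed above. Each internal node emits one bit per symbol (0 = goes left, 1 = goes right) and passes the two order-preserving sub-sequences to its children.

```python
def symbols(shape):
    """Leaf symbols under a shape: a shape is a symbol (leaf) or a (left, right) pair."""
    if isinstance(shape, str):
        return {shape}
    left, right = shape
    return symbols(left) | symbols(right)


def build(text, shape):
    """Return a leaf string, or (text, bits, left_node, right_node) for an internal node."""
    if isinstance(shape, str):
        return text  # leaf: the symbol repeated once per occurrence
    left, right = shape
    right_syms = symbols(right)
    bits = "".join("1" if ch in right_syms else "0" for ch in text)
    left_text = "".join(ch for ch in text if ch not in right_syms)  # order is preserved
    right_text = "".join(ch for ch in text if ch in right_syms)
    return (text, bits, build(left_text, left), build(right_text, right))


def show(node, depth=0):
    """Print each node's string (and bits, for internal nodes), indented by depth."""
    if isinstance(node, str):
        print("    " * depth + node)
        return
    text, bits, left, right = node
    print("    " * depth + text + "  " + bits)
    show(left, depth + 1)
    show(right, depth + 1)


shape = (("a", "c"), (("b", "r"), "d"))  # hypothetical encoding of the slide's tree
tree = build("abracadabra", shape)
show(tree)
```

Running it prints the node strings above together with the bit strings 01100010110, 001000, 00100 and 0101 used in the slides.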
The Wavelet Tree
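Each node needs only its bit string (the node strings are shown just for reference): 0 if the symbol goes to the left child, 1 if it goes to the right child.
abracadabra     01100010110    (a, c → 0; b, d, r → 1)
    aacaaa      001000         (a → 0; c → 1)
    brdbr       00100          (b, r → 0; d → 1)
        brbr    0101           (b → 0; r → 1)
Leaves, left to right: a, c, b, r, d
In any case, the tree itself (shape and leaf labels) takes O(|Σ| log |Σ|) bits.
Easier: alphabetic order + heap structure.
Fact. Given the tree and the binary strings, we can recover the original string!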
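The Fact can be checked with a short recursive sketch in Python, assuming a hypothetical encoding where an internal node is `(bits, left, right)` and a leaf is its symbol. It uses only the tree shape and the bit strings from the figure: a leaf's string is its symbol repeated, and each internal node interleaves its children's strings following its bits.

```python
def recover(node, length):
    """Rebuild the string of a node that holds `length` symbols."""
    if isinstance(node, str):
        return node * length  # a leaf holds `length` copies of its symbol
    bits, left, right = node
    zeros = bits.count("0")  # how many symbols went to the left child
    from_left = iter(recover(left, zeros))
    from_right = iter(recover(right, length - zeros))
    # Bit 0: take the next symbol of the left child; bit 1: of the right child.
    return "".join(next(from_left) if b == "0" else next(from_right) for b in bits)


# Tree shape and bit strings from the figure (hypothetical tuple encoding).
tree = ("01100010110",
        ("001000", "a", "c"),
        ("00100", ("0101", "b", "r"), "d"))
print(recover(tree, 11))  # abracadabra
```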
The Wavelet Tree
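Rank(b,8) on abracadabra (root bits 01100010110):
• b is a right symbol at the root: reduce to the right symbols, i.e. Rank(b,3) on brdbr (bits 00100), since 3 of the first 8 symbols go right
• b is a left symbol in brdbr: reduce to the left symbols, i.e. Rank(b,2) on brbr (bits 0101), since 2 of the first 3 symbols go left
• brbr is binary (b → 0, r → 1): Rank(b,2) is the number of 0s among its first 2 bits, i.e. 1
Every step can be turned into a Rank on a binary string.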
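A minimal Python sketch of this top-down reduction, assuming a hypothetical encoding where an internal node is `(bits, left, right)` and a leaf is its symbol. The binary Rank at each step is computed here by counting over a prefix of the bit string, so each step costs O(i) rather than O(1).

```python
def leaves(node):
    """Symbols stored under a node: a leaf is its symbol, an internal node is (bits, left, right)."""
    if isinstance(node, str):
        return {node}
    _, left, right = node
    return leaves(left) | leaves(right)


def wt_rank(node, c, i):
    """Rank(c, i) on the string represented by the wavelet tree `node`."""
    while not isinstance(node, str):
        bits, left, right = node
        if c in leaves(right):        # right symbol: keep the 1s among the first i bits
            i = bits[:i].count("1")
            node = right
        elif c in leaves(left):       # left symbol: keep the 0s among the first i bits
            i = bits[:i].count("0")
            node = left
        else:
            return 0                  # c does not occur in the string
    return i if node == c else 0


tree = ("01100010110",
        ("001000", "a", "c"),
        ("00100", ("0101", "b", "r"), "d"))
print(wt_rank(tree, "b", 8))  # 1
```

Recomputing `leaves` at every step is wasteful; fixing the leaf order (e.g. alphabetic) would let each step decide the direction by comparing c with a split symbol.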
The Wavelet Tree
Rank(b,8) using only binary Rank:
• Root, bits 01100010110: b goes right. Right move = Rank1: Rank1(8) = 3
• Node brdbr, bits 00100: b goes left. Left move = Rank0: Rank0(3) = 3 - Rank1(3) = 2
• Node brbr, bits 0101: b goes left. Left move = Rank0: Rank0(2) = 2 - Rank1(2) = 1
Hence Rank(b,8) = 1. Select is similar.
Generalised R&S is implemented with ⌈log |Σ|⌉ binary R&S operations, one per level of a balanced wavelet tree.
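For Select, one possible approach (a sketch, not necessarily the one intended in the slides) goes bottom-up: find the root-to-leaf path of c, then, starting from the leaf, map i through each node with a binary Select on 0s (if c went left there) or on 1s (if it went right). This assumes Python, a hypothetical encoding where an internal node is `(bits, left, right)` and a leaf is its symbol, and a scanning `binary_select` in place of a constant-time one.

```python
def binary_select(bits, b, i):
    """1-based position of the i-th occurrence of bit b ('0' or '1'), or None."""
    seen = 0
    for pos, x in enumerate(bits, start=1):
        if x == b:
            seen += 1
            if seen == i:
                return pos
    return None


def path_to(node, c):
    """(bits, side) pairs from the root down to leaf c; side '0' = left, '1' = right."""
    if isinstance(node, str):
        return [] if node == c else None
    bits, left, right = node
    for side, child in (("0", left), ("1", right)):
        rest = path_to(child, c)
        if rest is not None:
            return [(bits, side)] + rest
    return None


def wt_select(root, c, i):
    """Select(c, i): position of the i-th c, climbing from the leaf to the root."""
    path = path_to(root, c)
    if path is None:
        return None  # c does not occur
    for bits, side in reversed(path):
        i = binary_select(bits, side, i)
        if i is None:
            return None  # fewer than i occurrences of c
    return i


tree = ("01100010110",
        ("001000", "a", "c"),
        ("00100", ("0101", "b", "r"), "d"))
print(wt_select(tree, "b", 2))  # 9
```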
Representing Trees
Paolo Ferragina
Dipartimento di Informatica, Università di Pisa
Standard representation
Binary tree: each node has two pointers, to its left and right children.
An n-node tree takes 2n pointers, i.e. 2n lg n bits.
Supports finding the left or right child of a node in constant time.
For each extra operation (e.g. parent, subtree size) we have to pay an additional n lg n bits.
Can we improve the space bound?
• There are fewer than $2^{2n}$ distinct binary trees on n nodes.
• Hence 2n bits are enough to distinguish between any two different binary trees.
• Can we represent an n-node binary tree using 2n bits?
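The counting bound can be checked numerically. The number of distinct binary trees on n nodes is the Catalan number C(2n, n)/(n + 1), a standard fact not stated on the slide. A quick Python check (assuming Python 3.8+ for `math.comb`):

```python
from math import comb


def num_binary_trees(n):
    """Distinct binary trees on n nodes: the n-th Catalan number C(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


for n in range(1, 21):
    assert num_binary_trees(n) < 2 ** (2 * n)  # so 2n bits suffice to tell them apart
print([num_binary_trees(n) for n in range(1, 7)])  # [1, 2, 5, 14, 42, 132]
```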
Binary tree representation
• A binary tree on n nodes can be represented using 2n + o(n) bits to support:
    • parent
    • left child
    • right child
  in constant time.
Heap-like notation for a binary tree
• Add external nodes
• Label internal nodes with a 1 and external nodes with a 0
• Write the labels in level order: 11110110100100000
One can reconstruct the tree from this sequence.
An n-node binary tree can be represented in 2n + 1 bits (n internal nodes labelled 1, n + 1 external nodes labelled 0).
What about the operations?
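The level-order encoding and its inverse, as a Python sketch. A tree is assumed (hypothetically) to be nested pairs `(left, right)`, with `None` for a missing child, i.e. an external node. The tree `example` is the one described by the bit string 11110110100100000, reconstructed from that string (8 internal nodes, 17 bits).

```python
from collections import deque


def encode(tree):
    """Level-order labels: 1 for an internal node, 0 for an external node (None)."""
    bits, queue = [], deque([tree])
    while queue:
        node = queue.popleft()
        if node is None:
            bits.append("0")
        else:
            bits.append("1")
            queue.extend(node)  # left child, then right child
    return "".join(bits)


def decode(bits):
    """Inverse of encode: read labels in level order and rebuild the tree."""
    labels = iter(bits)

    def read():
        # A 1 starts an internal node whose two child slots are filled later.
        return [None, None] if next(labels) == "1" else None

    root = read()
    queue = deque([root] if root is not None else [])
    while queue:
        node = queue.popleft()
        for side in (0, 1):
            node[side] = read()
            if node[side] is not None:
                queue.append(node[side])

    def freeze(node):
        return None if node is None else (freeze(node[0]), freeze(node[1]))

    return freeze(root)


leaf = (None, None)
example = (((None, leaf), None), (leaf, (leaf, None)))
print(encode(example))  # 11110110100100000
```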
Heap-like notation for a binary tree
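Number the positions of the bit string 1, 2, ..., 2n+1 (level order):
• rank(x): # 1's up to position x (Rank)
• select(x): position of the x-th 1 (Select)
• left child(x) = 2·rank(x)
• right child(x) = 2·rank(x) + 1
• parent(x) = select(⌊x/2⌋)
position: 1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17
bit:      1  1  1  1  0  1  1  0  1  0  0  1  0  0  0  0  0
Example: the node at position 9 has rank 7, so its children are at positions 14 and 15, and parent(14) = select(7) = 9.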
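The three navigation formulas can be checked on the example with a short Python sketch. Function names are illustrative, and `rank1` and `select1` simply scan the bit string, so they take O(n) time instead of the constant time provided by the 2n + o(n)-bit structures.

```python
B = "11110110100100000"  # level-order labels from the example; position x is B[x - 1]


def rank1(x):
    """Number of 1s in positions 1..x (scan; a succinct version answers this in O(1))."""
    return B[:x].count("1")


def select1(j):
    """Position of the j-th 1 (scan), or None if there is no such 1."""
    seen = 0
    for pos, bit in enumerate(B, start=1):
        if bit == "1":
            seen += 1
            if seen == j:
                return pos
    return None


def left_child(x):
    return 2 * rank1(x)


def right_child(x):
    return 2 * rank1(x) + 1


def parent(x):
    return select1(x // 2)  # for the root (x = 1), select1(0) returns None


def is_internal(x):
    return B[x - 1] == "1"


print(left_child(9), right_child(9), parent(14))  # 14 15 9
```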