Compressed Rank & Select on general strings
Paolo Ferragina
Dipartimento di Informatica, Università di Pisa

Generalised Rank and Select
Rank(c,i) = number of occurrences of c in L[1,i]
Select(c,i) = position of the i-th c in L
L = a b a a a c b c d a b e c d ...
Select(a,2) = 3
Rank(a,7) = 4

Generalised Rank and Select
If the alphabet S is small (i.e. of constant size):
  Build a binary Rank data structure for each symbol of S.
  Rank takes O(1) time and small space.
If S is large (words?):
  Need a smarter solution: the Wavelet Tree data structure.
Algorithmic reduction:
  Reduce Rank&Select over arbitrary strings to Rank&Select over binary strings.

The Wavelet Tree
An (alphabetic?) tree whose leaves are the symbols of abracadabra: a, b, c, d, r.

The Wavelet Tree
Each node holds the subsequence of abracadabra made of the symbols in its subtree:
  root: abracadabra
    left child: aacaaa
      leaves: a (aaaaa) and c (c)
    right child: brdbr
      left child: brbr
        leaves: b (bb) and r (rr)
      right child: leaf d (d)

The Wavelet Tree
Each internal node stores one bit per symbol: 0 if the symbol goes left, 1 if it goes right.
  abracadabra   01100010110
  aacaaa        001000
  brdbr         00100
  brbr          0101
In any case, the tree takes O(|S| log |S|) bits.
Easier: alphabetic order + heap structure.
Fact. Given the tree and the binary strings, we can recover the original string!
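The reduction to binary strings can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the lecture's implementation: the names Node, SHAPE, build, rank and access are hypothetical, the tree shape is the one read off the bitmaps above, and the binary rank is a naive scan where a real structure would answer in constant time with o(n) extra bits.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed tree shape, read off the slide bitmaps:
# a leaf is a symbol, an internal node is a (left, right) pair.
SHAPE = (("a", "c"), (("b", "r"), "d"))


@dataclass
class Node:
    bits: str = ""                       # bitmap of an internal node
    left_syms: frozenset = frozenset()   # symbols routed to the left child
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    symbol: Optional[str] = None         # set only at a leaf


def symbols(shape):
    """Set of symbols found below a shape node."""
    if isinstance(shape, str):
        return frozenset([shape])
    return symbols(shape[0]) | symbols(shape[1])


def build(text, shape):
    """Build the wavelet tree of text for the given tree shape."""
    if isinstance(shape, str):
        return Node(symbol=shape)
    left_syms = symbols(shape[0])
    bits = "".join("0" if ch in left_syms else "1" for ch in text)
    left_text = "".join(ch for ch in text if ch in left_syms)
    right_text = "".join(ch for ch in text if ch not in left_syms)
    return Node(bits, left_syms,
                build(left_text, shape[0]), build(right_text, shape[1]))


def rank1(bits, i):
    """Naive binary rank: number of 1s among the first i bits."""
    return bits[:i].count("1")


def rank(node, c, i):
    """Occurrences of c among the first i symbols, one binary rank per level."""
    while node.symbol is None:
        ones = rank1(node.bits, i)
        if c in node.left_syms:
            node, i = node.left, i - ones    # left move = Rank0
        else:
            node, i = node.right, ones       # right move = Rank1
    return i if node.symbol == c else 0


def access(node, i):
    """Recover the i-th symbol (1-based) from the bitmaps alone."""
    while node.symbol is None:
        ones = rank1(node.bits, i)
        if node.bits[i - 1] == "0":
            node, i = node.left, i - ones
        else:
            node, i = node.right, ones
    return node.symbol


root = build("abracadabra", SHAPE)
print(root.bits)                                        # 01100010110
print(rank(root, "b", 8))                               # 1
print("".join(access(root, i) for i in range(1, 12)))   # abracadabra
```

The last line is the "Fact" in action: the text is never stored, only the tree and its bitmaps.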

The Wavelet Tree
Rank(b,8) on abracadabra (bitmap 01100010110):
  At the root, b is a right symbol: reduce to the right symbols, giving Rank(b,3) on brdbr (bitmap 00100).
  In brdbr, b is a left symbol: reduce to the left symbols, giving Rank(b,2) on brbr (bitmap 0101).
  In brbr, b is again a left symbol.
Each bitmap is binary, so every step can be turned into a binary Rank.

The Wavelet Tree
Rank(b,8):
  abracadabra   01100010110   right move = Rank1:   Rank1(8) = 3
  brdbr         00100         left move = Rank0:    Rank0(3) = 3 - Rank1(3) = 2
  brbr          0101          left move = Rank0:    Rank0(2) = 2 - Rank1(2) = 1
Hence Rank(b,8) = 1.
Select is similar.
Generalised R&S is implemented with log |S| binary R&S.

Representing Trees
Paolo Ferragina
Dipartimento di Informatica, Università di Pisa

Standard representation
Binary tree: each node has two pointers, to its left and right children.
An n-node tree takes 2n pointers, or 2n lg n bits.
Supports finding the left child or right child of a node in constant time.
Each extra operation (e.g. parent, subtree size) costs an additional n lg n bits.

Can we improve the space bound?
There are fewer than 2^{2n} distinct binary trees on n nodes.
2n bits are enough to distinguish between any two different binary trees.
Can we represent an n-node binary tree using 2n bits?

Binary tree representation
A binary tree on n nodes can be represented using 2n+o(n) bits to support
  parent
  left child
  right child
in constant time.

Heap-like notation for a binary tree
Add external nodes.
Label internal nodes with a 1 and external nodes with a 0.
Write the labels in level order: 11110110100100000
One can reconstruct the tree from this sequence.
An n-node binary tree can be represented in 2n+1 bits.
What about the operations?
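The labelling rule above can be tried out with a short Python sketch. It is a minimal illustration, not code from the lecture: encode, decode and the list-based tree layout (a node is [left, right], a missing child is None) are assumptions, and the sample tree is the 8-node tree that the slide's bit sequence decodes to.

```python
from collections import deque


def encode(root):
    """Level-order bits: 1 for an internal node, 0 for an external node."""
    bits = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node is None:
            bits.append("0")          # external node added for a missing child
        else:
            bits.append("1")
            queue.extend(node)        # enqueue left child, then right child
    return "".join(bits)


def decode(bits):
    """Rebuild the tree from its level-order bit sequence."""
    it = iter(bits)
    if next(it) == "0":
        return None                   # empty tree
    root = [None, None]
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for side in (0, 1):           # 0 = left slot, 1 = right slot
            if next(it) == "1":
                node[side] = [None, None]
                queue.append(node[side])
    return root


# Sample tree with n = 8 nodes (assumed layout: [left, right]).
tree = [[[None, [None, None]], None],
        [[None, None], [[None, None], None]]]

bits = encode(tree)
print(bits)        # 11110110100100000
print(len(bits))   # 17, i.e. 2n+1 for n = 8
```

Decoding the printed sequence with decode(bits) gives back the same nested lists, which is the reconstruction claim on the slide.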
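
Heap-like notation for a binary tree
Two numberings are used (green and red on the slide):
  green: position of a node in the level-order bit sequence (1..17)
  red: number of an internal node, counting only the 1s (1..8)
x (green) -> x (red): # 1's up to x (Rank)
x (red) -> x (green): position of the x-th 1 (Select)
left child(x) = On green(2x)
right child(x) = On green(2x+1)
parent(x) = On red(⌊x/2⌋)

  position (green):  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
  bit:               1  1  1  1  0  1  1  0  1  0  0  1  0  0  0  0  0
  internal (red):    1  2  3  4  .  5  6  .  7  .  .  8  .  .  .  .  .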
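The three formulas can be checked with a small Python sketch on the slide's bit sequence. It is a sketch under assumptions: every node is identified by its 1-based position in the bit sequence, the function names are hypothetical, and rank1 and select1 are naive scans standing in for the constant-time o(n)-bit structures.

```python
BITS = "11110110100100000"   # level-order sequence from the slide


def rank1(x):
    """Number of 1s in positions 1..x."""
    return BITS[:x].count("1")


def select1(j):
    """Position of the j-th 1, or None if there is no such 1."""
    seen = 0
    for pos, bit in enumerate(BITS, start=1):
        if bit == "1":
            seen += 1
            if seen == j:
                return pos
    return None


def is_internal(x):
    return BITS[x - 1] == "1"


def left_child(x):
    """Position 2*rank1(x); None if x is an external node."""
    return 2 * rank1(x) if is_internal(x) else None


def right_child(x):
    """Position 2*rank1(x)+1; None if x is an external node."""
    return 2 * rank1(x) + 1 if is_internal(x) else None


def parent(x):
    """Position of the floor(x/2)-th 1; None for the root."""
    return select1(x // 2)


print(left_child(7), right_child(7))   # 12 13
print(parent(12))                      # 7
print(parent(1))                       # None
```

A child position that holds a 0 is an external node, so the corresponding child does not exist in the original tree.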