Balanced BST
 Balanced BSTs guarantee O(logN)
performance at all times
 the heights of the left and right sub-trees
are about the same
 simple BSTs are O(N) in the worst case
 Categories of BSTs
 AVL, SPLAY trees: dynamic environment
 optimal trees: static environment
E.G.M. Petrakis
Trees in Main Memory
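The gap between O(N) and O(logN) can be seen with a small experiment. The sketch below is illustrative only: the names `Node`, `bst_insert`, `height`, `build` and `median_first` are assumptions, not part of the slides. It inserts the same 127 keys into a plain BST in sorted order and in median-first order, then compares the heights.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def bst_insert(root, key):
    """Plain BST insertion with no rebalancing (iterative, duplicates ignored)."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key == node.key:
            return root
        side = "left" if key < node.key else "right"
        child = getattr(node, side)
        if child is None:
            setattr(node, side, Node(key))
            return root
        node = child


def height(root):
    """Number of levels in the tree, computed level by level."""
    levels = 0
    level = [root] if root is not None else []
    while level:
        levels += 1
        level = [c for n in level for c in (n.left, n.right) if c is not None]
    return levels


def build(keys):
    root = None
    for k in keys:
        root = bst_insert(root, k)
    return root


def median_first(lo, hi):
    """Yield lo..hi so that the middle key of every range comes first."""
    if lo > hi:
        return
    mid = (lo + hi) // 2
    yield mid
    yield from median_first(lo, mid - 1)
    yield from median_first(mid + 1, hi)


N = 127
print(height(build(range(1, N + 1))))     # sorted input degenerates into a list: 127
print(height(build(median_first(1, N))))  # median-first input is perfectly balanced: 7
```

A balanced BST removes the dependence on insertion order by restructuring the tree as keys arrive.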
AVL Trees
 AVL (Adelson-Velskii, Landis): the heights
of the left and right subtrees differ by at
most 1
 the same holds for every subtree
 Number of comparisons for membership
operations
 best case: completely balanced
 worst case: 1.44 log(N+2)
 expected case: logN + .25 <= log(N
 1) 
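The AVL condition can be checked directly from its definition. This is a minimal sketch, assuming a hypothetical `Node` class and taking the height of an empty sub-tree as 0; `check_avl` is an illustrative name, not from the slides.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def check_avl(node):
    """Return (is_avl, height) for the sub-tree rooted at node."""
    if node is None:
        return True, 0
    left_ok, left_h = check_avl(node.left)
    right_ok, right_h = check_avl(node.right)
    # The height condition must hold here and in every sub-tree.
    ok = left_ok and right_ok and abs(left_h - right_h) <= 1
    return ok, 1 + max(left_h, right_h)


balanced = Node(4, Node(2), Node(8))
chain = Node(4, Node(2, Node(1)))   # left sub-tree of 4 is 2 higher
print(check_avl(balanced)[0], check_avl(chain)[0])
```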
AVL Trees
 “—” : completely balanced sub-tree
 “/” : the left sub-tree is 1 higher
 “\” : the right sub-tree is 1 higher
Non AVL Trees
 “//” : the left sub-tree is 2 higher
 “\\” : the right sub-tree is 2 higher
[Figure: non-AVL trees with the critical node marked]
Single Right Rotations
 Insertions or deletions may result in non AVL
trees => apply rotations to balance the tree
[Figure: single right rotation on nodes A and B with sub-trees T1, T2, T3, shown before and after]
Single Left Rotations
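[Figure: single left rotation on nodes A and B with sub-trees T1, T2, T3, shown before and after]
[Figure: inserting 1 below 4 and 2 triggers a single right rotation, giving root 2 with children 1 and 4; inserting 9 below 4 and 8 triggers a single left rotation, giving root 8 with children 4 and 9]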
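The two single rotations can be sketched in a few lines. The code below is an illustration under assumptions: `Node`, `rotate_right`, `rotate_left` and `shape` are hypothetical names, and balance factors are not maintained. It replays the two insertion examples (keys 4, 2, 1 and keys 4, 8, 9).

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def rotate_right(a):
    """Left child B of A becomes the root; B's right sub-tree moves under A."""
    b = a.left
    a.left = b.right
    b.right = a
    return b


def rotate_left(a):
    """Mirror image: right child B of A becomes the root."""
    b = a.right
    a.right = b.left
    b.left = a
    return b


def shape(n):
    """Nested-tuple view (key, left, right) for easy comparison."""
    return None if n is None else (n.key, shape(n.left), shape(n.right))


left_chain = Node(4, Node(2, Node(1)))                 # after inserting 1
right_chain = Node(4, None, Node(8, None, Node(9)))    # after inserting 9
print(shape(rotate_right(left_chain)))    # (2, (1, None, None), (4, None, None))
print(shape(rotate_left(right_chain)))    # (8, (4, None, None), (9, None, None))
```

Both functions return the new root of the rotated sub-tree, so the caller is responsible for re-attaching it to the parent.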
Double Left Rotation
 Composition of two single rotations (one
right and one left rotation)
[Figure: double left rotation on nodes A, B, C with sub-trees T1 to T4, shown as a right rotation followed by a left rotation]
Example of Double Left Rotation
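[Figure: inserting 6 into the tree with root 4, children 2 and 8, and 7 and 9 below 8 makes 4 the critical node (“\\”); the double left rotation yields root 7 with children 4 and 8, where 4 has children 2 and 6 and 8 has right child 9]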
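A double left rotation can be written as the composition described above: a right rotation at the child followed by a left rotation at the critical node. The sketch below uses hypothetical names (`Node`, `rotate_right`, `rotate_left`, `double_left`, `shape`) and replays the "insert 6" example.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def rotate_right(a):
    b = a.left
    a.left = b.right
    b.right = a
    return b


def rotate_left(a):
    b = a.right
    a.right = b.left
    b.left = a
    return b


def double_left(critical):
    """Right rotation at the right child, then left rotation at the critical node."""
    critical.right = rotate_right(critical.right)
    return rotate_left(critical)


def shape(n):
    return None if n is None else (n.key, shape(n.left), shape(n.right))


# Tree after inserting 6: node 4 is critical, its right sub-tree is 2 higher.
tree = Node(4, Node(2), Node(8, Node(7, Node(6)), Node(9)))
print(shape(double_left(tree)))
# (7, (4, (2, None, None), (6, None, None)), (8, None, (9, None, None)))
```

The double right rotation is the mirror image: a left rotation at the left child, then a right rotation at the critical node.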
Double Right Rotation
 Composition of two single rotations (one
left and one right rotation)
[Figure: double right rotation on nodes A, B, C with sub-trees T1 to T4, shown as a left rotation followed by a right rotation]
Insertion (deletion) in AVL
1. Follow the search path to verify
that the key is not already there
2. Insert (delete) the key
3. Retreat along the search path and
check the balance factor
4. Rebalance if necessary (see next)
Rebalancing
 For every node reached coming up from its
left sub-tree after insertion readjust
balance factor
 ‘=’ becomes ‘/’ => no operation
 ‘\’ becomes ‘=’ => no operation
 ‘/’ becomes ‘//’ => must be rebalanced!!
 The “//” node becomes a “critical node”
 Only the path from the critical node to the
leaf has to be rebalanced!!
 Rotation is applied only at the critical node!
Rebalancing (cont.)
 The balance factor of the critical node
determines what rotation is to take place
 single or double
 If the child and the grand child (inserted
node) of the critical node are on the same
direction (both “/”) => single rotation
 Else => double rotation
 Rebalance similarly if coming up from the
right sub-tree (opposite signs)
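The insertion and rebalancing steps can be sketched as follows. This is not the slides' exact method: as a simplifying assumption, each node stores its height instead of a balance symbol, and the balance is recomputed as height(left) - height(right), so +2 corresponds to “//” and -2 to “\\”. All names are hypothetical.

```python
import math


class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 1


def h(n):
    return n.height if n else 0


def update(n):
    n.height = 1 + max(h(n.left), h(n.right))


def balance(n):
    return h(n.left) - h(n.right)


def rotate_right(a):
    b = a.left
    a.left, b.right = b.right, a
    update(a)
    update(b)
    return b


def rotate_left(a):
    b = a.right
    a.right, b.left = b.left, a
    update(a)
    update(b)
    return b


def rebalance(n):
    """Called on every node of the search path while retreating."""
    update(n)
    if balance(n) == 2:                      # "//": left side too high
        if balance(n.left) < 0:              # child leans the other way: double
            n.left = rotate_left(n.left)
        return rotate_right(n)
    if balance(n) == -2:                     # "\\": right side too high
        if balance(n.right) > 0:
            n.right = rotate_right(n.right)
        return rotate_left(n)
    return n


def insert(n, key):
    if n is None:
        return Node(key)
    if key < n.key:
        n.left = insert(n.left, key)
    elif key > n.key:
        n.right = insert(n.right, key)
    else:
        return n                             # key already there
    return rebalance(n)


def shape(n):
    return None if n is None else (n.key, shape(n.left), shape(n.right))


root = None
for k in (4, 2, 8, 7, 9, 6):
    root = insert(root, k)
print(shape(root))
# (7, (4, (2, None, None), (6, None, None)), (8, None, (9, None, None)))
```

Storing heights costs a few extra bits per node compared with a three-valued balance symbol, but it keeps the rotation code short.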
Performance
 Performance of membership operations on
AVL trees:
 easy for the worst case!
 An AVL tree will never be more than 45%
higher than its perfectly balanced
counterpart (Adelson-Velskii, Landis):
 log(N+1) <= hb(N) <=
1.4404log(N+2) – 0.328
Worst case AVL
 Sparse tree => each sub-tree has
minimum number of nodes
Fibonacci Trees
 Th: tree of height h
 Th has two sub-trees, one with height h-1
and one with height h-2
 else it wouldn’t have minimum number of nodes
 T0 is the empty sub-tree (also Fibonacci)
 T1 is the sub-tree with 1 node (also
Fibonacci)
Fibonacci Trees (cont.)
 Average height
 Nh: number of nodes of Th
 Nh = Nh-1 + Nh-2 + 1
 N0 = 0
 N1 = 1
 $N_h = \frac{1}{\sqrt{5}}\left(\frac{1+\sqrt{5}}{2}\right)^{h+2} - \frac{1}{\sqrt{5}}\left(\frac{1-\sqrt{5}}{2}\right)^{h+2} - 1$
 From which h <= 1.44 log(N+1)
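The recurrence and the height bound can be checked numerically. The sketch below assumes the base cases implied by the definitions of T0 (empty) and T1 (one node), that is N0 = 0 and N1 = 1; `min_nodes` and `closed_form` are illustrative names.

```python
import math


def min_nodes(h):
    """N_h = N_{h-1} + N_{h-2} + 1, assuming N_0 = 0 and N_1 = 1."""
    a, b = 0, 1
    for _ in range(h):
        a, b = b, a + b + 1
    return a


def closed_form(h):
    """Fibonacci-style closed form with exponent h + 2, minus 1."""
    s5 = math.sqrt(5)
    phi, psi = (1 + s5) / 2, (1 - s5) / 2
    return (phi ** (h + 2) - psi ** (h + 2)) / s5 - 1


for h in range(1, 8):
    n = min_nodes(h)
    # height, fewest nodes, closed form, and the bound 1.44 log(N+1)
    print(h, n, round(closed_form(h)), round(1.44 * math.log2(n + 1), 2))
```

Because a Fibonacci tree has the fewest nodes for its height, any AVL tree with N nodes has height at most that of the tallest Fibonacci tree with N or fewer nodes.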
More Examples
[Figure: single rotation: inserting 9 into the chain 7, 8 yields root 8 with children 7 and 9]
Examples (cont.)
[Figure: double rotation: inserting 7 between 6 and its right child 8 yields root 7 with children 6 and 8]
Examples (cont.)
[Figure: double rotation: inserting 7 between 8 and its left child 6 yields root 7 with children 6 and 8]
Examples (cont.)
[Figure: single rotation: deleting 6 from the tree with root 7, left child 6 and right chain 8, 9 yields root 8 with children 7 and 9]
Examples (cont.)
[Figure: single rotation: deleting 5 from the tree with root 6, left child 5 and right child 8 (children 7 and 9) yields root 8 with children 6 and 9, where 7 is the right child of 6]
Examples (cont.)
[Figure: double rotation: deleting 5 from the tree with root 6, left child 5 and right child 8 (left child 7) yields root 7 with children 6 and 8]
General Deletions
[Figure: general deletions on an AVL tree holding the keys 1 to 11: the tree before and after delete 4; delete 8 follows]
General Deletions (cont.)
[Figure: the tree after delete 8 and after delete 6; delete 5 follows]
General Deletions (cont.)
[Figure: the tree after delete 5, holding the keys 1, 2, 3, 7, 9, 10, 11]
Self Organizing Search
 Splay Trees: adapt to query patterns
 move to root (front): whenever a node is
accessed move it to the root using
rotations
 equivalent to move-to-front in arrays
 current node
 insertions: inserted node
 deletions: father of deleted node or null
if this is the root
 membership: the last accessed node
[Figure: Search(10) on a tree with keys 8, 10, 13, 14, 15, 20, 30: node 10 is moved to the root by a first, second and third rotation]
Splay Cases
 If the current node q has no
grandfather but it has father p =>
only one single rotation
– two symmetric cases: L, R
[Figure: single rotation of q over its father p, with sub-trees a, b, c]
 If q also has a grandfather gp => 4 cases
– LL symmetric of RR
– RL symmetric of LR
[Figure: the LL and RL cases on nodes q, p, gp with sub-trees a, b, c, d]
[Figure: worked example of LL, RR, RL and LR splay steps on a tree with nodes 1 to 8; q is the current node and a, b, c … are sub-trees]
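The splay cases can be sketched with a recursive routine that handles the two-level cases (LL, LR and their mirrors RR, RL) first and finishes with a single rotation when only a father remains. This is one common recursive formulation, not necessarily the bottom-up procedure drawn on the slides; all names are hypothetical.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def rotate_right(a):
    b = a.left
    a.left, b.right = b.right, a
    return b


def rotate_left(a):
    b = a.right
    a.right, b.left = b.left, a
    return b


def splay(root, key):
    """Bring key (or the last node on its search path) to the root."""
    if root is None or root.key == key:
        return root
    if key < root.key:
        if root.left is None:
            return root
        if key < root.left.key:                      # LL case
            root.left.left = splay(root.left.left, key)
            root = rotate_right(root)
        elif key > root.left.key:                    # LR case
            root.left.right = splay(root.left.right, key)
            if root.left.right is not None:
                root.left = rotate_left(root.left)
        return root if root.left is None else rotate_right(root)
    if root.right is None:
        return root
    if key > root.right.key:                         # RR case
        root.right.right = splay(root.right.right, key)
        root = rotate_left(root)
    elif key < root.right.key:                       # RL case
        root.right.left = splay(root.right.left, key)
        if root.right.left is not None:
            root.right = rotate_right(root.right)
    return root if root.right is None else rotate_left(root)


def bst_insert(root, key):
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = bst_insert(root.left, key)
    elif key > root.key:
        root.right = bst_insert(root.right, key)
    return root


def inorder(n):
    return [] if n is None else inorder(n.left) + [n.key] + inorder(n.right)


root = None
for k in (20, 15, 30, 8, 13, 10, 14):
    root = bst_insert(root, k)
root = splay(root, 10)
print(root.key, inorder(root))   # 10 [8, 10, 13, 14, 15, 20, 30]
```

A membership test then reduces to splaying and comparing the key at the root, which also covers the "last accessed node" rule for unsuccessful searches.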
Splay Performance
 Splay trees adapt to unknown or changing
probability distributions
 Splay trees do not guarantee logarithmic
cost for each access
 AVL trees do!
 asymptotic cost close to the cost of the optimal
BST for unknown probability distributions
 It can be shown that the “cost of m
operations on an initially empty splay tree,
where n are insertions is O(mlogn) in the
worst case”
Optimal BST
 Static environment: no insertions or
deletions
 Keys are accessed with various
frequencies
 Have the most frequently accessed
keys near the root
 Application: a symbol table in main
memory
Searching
 Given symbols a1 < a2 < … < an and their
probabilities p1, p2, …, pn, minimize the search cost
 Successful search: $\text{cost} = \sum_{i=1}^{n} p_i \,\text{level}(a_i)$
 Transform unsuccessful to successful
 consider new symbols E0, E1, …, En
-∞ … a1 … a2 … ai … ai+1 … an … +∞
E0 = (-∞, a1)
Ei = (ai, ai+1)
En = (an, +∞)
Unsuccessful Search
 unsuccessful search for all values in Ei
terminates on the same failure node
(in fact, one node higher)
[Figure: a search path through an, an-1, an-2, an-3 ending at the failure node Ei]
Example
(a1, a2, a3) = (do, if, read)
pi = qi = 1/7
[Figure: the five possible BSTs on (do, if, read); the balanced tree rooted at “if” is the optimal BST with cost = 13/7, and each of the other four trees has cost = 15/7]
Search Cost
 If pi is the probability to search for
ai and qi is the probability to search in
Ei then
$\sum_{i=1}^{n} p_i + \sum_{i=0}^{n} q_i = 1$
$\text{cost} = \sum_{i=1}^{n} p_i \,\text{level}(a_i) + \sum_{i=0}^{n} q_i \,\{\text{level}(E_i) - 1\}$
 the first sum is the cost of successful search, the second the cost of unsuccessful search
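The cost formula can be evaluated mechanically for the (do, if, read) example. In this sketch a tree is a nested tuple `(i, left, right)` holding the index of key ai, `None` stands for a failure node, and failure nodes are matched to E0 … En from left to right; this representation and the name `search_cost` are assumptions made for illustration.

```python
from fractions import Fraction


def search_cost(tree, p, q):
    """Sum of p_i * level(a_i) plus q_j * (level(E_j) - 1); the root is at level 1.

    p[i] is the probability of key a_i (p[0] is unused),
    q[j] is the probability of the gap E_j for j = 0..n.
    """
    next_gap = 0

    def visit(node, level):
        nonlocal next_gap
        if node is None:                       # failure node E_j
            cost = q[next_gap] * (level - 1)
            next_gap += 1
            return cost
        i, left, right = node
        left_cost = visit(left, level + 1)     # gaps to the left come first
        right_cost = visit(right, level + 1)
        return left_cost + p[i] * level + right_cost

    return visit(tree, 1)


seventh = Fraction(1, 7)
p = [Fraction(0), seventh, seventh, seventh]   # a1 = do, a2 = if, a3 = read
q = [seventh] * 4                              # E0..E3

balanced = (2, (1, None, None), (3, None, None))      # root "if"
chain = (1, None, (2, None, (3, None, None)))         # do -> if -> read
print(search_cost(balanced, p, q), search_cost(chain, p, q))   # 13/7 15/7
```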
Observation 1
 In a BST, a subtree has nodes
that are consecutive in a sorted
sequence of keys (e.g. [5,26])
[Figure: a BST on the keys 5, 10, 12, 13, 14, 20, 24, 25, 26 with root 20]
Observation 2
 If Tij is a sub-tree of an optimal
BST holding keys from i to j then
 Tij must be optimal among all
possible BSTs that store the same
keys
 optimality lemma: all sub-trees of
an optimal BST are also optimal
Optimal BST Construction
1) Construct all possible trees:
exhaustive search takes exponential time, there are
$\frac{1}{n+1}\binom{2n}{n}$ trees!!
2) Dynamic programming solves the
problem in polynomial time O(n^3)
 at each step, the algorithm finds and
stores the optimal tree in each range
of key values
 increases the size of the range at each
step until the whole range is obtained
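The number of distinct BSTs on n keys is the Catalan number, which shows how quickly exhaustive construction becomes impractical. A minimal sketch, with `count_bsts` as an illustrative name:

```python
import math


def count_bsts(n):
    """Number of distinct BSTs on n keys: C(2n, n) / (n + 1)."""
    return math.comb(2 * n, n) // (n + 1)


for n in (3, 4, 10, 20):
    print(n, count_bsts(n))
# 3 keys give 5 trees, matching the five trees of the (do, if, read) example
```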
Example (successful search only):
keys            1     10    20    40
probabilities   0.3   0.2   0.1   0.4
1) BSTs with 1 node
range 1: cost=0.3
range 10: cost=0.2
range 20: cost=0.1
range 40: cost=0.4
2) BSTs with 2 nodes
range 1-10
  root 1: cost=0.3·1+0.2·2=0.7 (optimal)
  root 10: cost=0.2·1+0.3·2=0.8
range 10-20
  root 10: cost=0.2+0.1·2=0.4 (optimal)
  root 20: cost=0.1+0.2·2=0.5
range 20-40
  root 20: cost=0.1+0.8=0.9
  root 40: cost=0.4+0.2=0.6 (optimal)
3) BSTs with 3 nodes
range 1-20
  root 1: cost=0.3+2·0.2+3·0.1=1
  root 10: cost=0.2+2(0.3+0.1)=1
  root 20: cost=0.1+2·0.3+3·0.2=1.3
range 10-40
  root 10: cost=0.2+2·0.4+3·0.1=1.3
  root 20: cost=0.1+2(0.2+0.4)=1.3
  root 40: cost=0.4+2·0.2+3·0.1=1.1 (optimal)
4) BSTs with 4 nodes
range 1-40
  root 1: cost=0.3+2·0.4+3·0.2+4·0.1=2.1
  root 10: cost=0.2+2(0.3+0.4)+3·0.1=1.9 (OPTIMAL BST)
  root 20: cost=0.1+2(0.3+0.4)+3·0.2=2.1
  root 40: cost=0.4+2·0.3+3·0.2+4·0.1=2
Complexity
 Compute all optimal BSTs for all Cij, i,j=1,2..n
 Let m=j-i: number of keys in range Cij
 n-m-1 Cij’s must be computed
 The one with the minimum cost must be
found, this takes O(m(n-m-1)) time
 For all Cij’s it takes $\sum_{1 \le m \le n} (nm - m^2) = O(n^3)$
 There is a better O(n^2) algorithm by Knuth
 There is also an O(n) greedy algorithm
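The O(n^3) dynamic program can be sketched for the successful-search-only case and run on the four-key example (keys 1, 10, 20, 40). The table layout, the half-open ranges and the names `optimal_bst` and `build` are assumptions made for this illustration; it is not Knuth's O(n^2) refinement.

```python
def optimal_bst(p):
    """cost[i][j] is the minimum cost for keys i..j-1; root[i][j] is the chosen root."""
    n = len(p)
    cost = [[0.0] * (n + 1) for _ in range(n + 1)]
    root = [[None] * (n + 1) for _ in range(n + 1)]
    prefix = [0.0]
    for x in p:
        prefix.append(prefix[-1] + x)
    for size in range(1, n + 1):               # grow the range one key at a time
        for i in range(0, n - size + 1):
            j = i + size
            weight = prefix[j] - prefix[i]     # every key in the range sinks one level
            best, best_r = min(
                (cost[i][r] + cost[r + 1][j], r) for r in range(i, j)
            )
            cost[i][j] = best + weight
            root[i][j] = best_r
    return cost, root


def build(keys, root, i, j):
    """Rebuild the optimal tree for keys i..j-1 as nested tuples."""
    if i >= j:
        return None
    r = root[i][j]
    return (keys[r], build(keys, root, i, r), build(keys, root, r + 1, j))


keys = [1, 10, 20, 40]
probs = [0.3, 0.2, 0.1, 0.4]
cost, root = optimal_bst(probs)
print(round(cost[0][4], 2))          # 1.9
print(build(keys, root, 0, 4))       # (10, (1, None, None), (40, (20, None, None), None))
```

Each table entry reuses the optimal sub-trees of smaller ranges, which is exactly what the optimality lemma permits.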
Optimal BSTs
 High probability keys should be near
the root
 But, the value of a key is also a factor
 It may not be desirable to put the
smallest or largest key close to the
root => this may result in skinny trees
(e.g., lists)