5-SkipLists 683KB Jan 14 2015 07:25:45 AM

Skip Lists
Linked Lists
• Fast modifications given a pointer
• Slow traversals to random point
• What if we add an express lane?
Linked Lists
Find(x):
  current = upper left
  loop:
    if current->value == x
      return true
    else if current->next exists and current->next->value <= x
      move right
    else if level > 0
      move down
    else return false
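The search can be sketched in Python. This is a minimal sketch, assuming each node holds `right` and `down` pointers and each lane starts with a negative-infinity sentinel. The names `Node`, `build_lane` and `find` are illustrative and not from the slides.

```python
import math

class Node:
    """One cell in a lane: a value plus right and down pointers (assumed layout)."""
    def __init__(self, value, right=None, down=None):
        self.value = value
        self.right = right
        self.down = down

def build_lane(values, lower=None):
    """Build one sorted lane with a -inf sentinel; link down into `lower` if given."""
    below = {}
    node = lower
    while node is not None:
        below[node.value] = node
        node = node.right
    head = Node(-math.inf, down=below.get(-math.inf))
    tail = head
    for v in values:
        tail.right = Node(v, down=below.get(v))
        tail = tail.right
    return head

def find(top_left, x):
    """Start at the upper left; move right while the next value is <= x, else move down."""
    current = top_left
    while current is not None:
        if current.value == x:
            return True
        if current.right is not None and current.right.value <= x:
            current = current.right      # move right
        else:
            current = current.down       # move down (None below the bottom lane)
    return False

slow = build_lane([1, 3, 4, 6, 8, 9])
fast = build_lane([4, 8], lower=slow)    # express lane over a subset of the values
print(find(fast, 6), find(fast, 5))
```

Searching for 6 skips from the sentinel to 4 on the fast lane, drops down, and takes one step right on the slow lane.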
Fast Lane
• Optimal balance?
– 1 node in fast lane would cut work in half
– 2 nodes in fast lane would cut work into thirds
– …
– n nodes in fast lane would not save any time
Fast Lane
• Assume fast lane (L2) breaks up slow lane into equal sized pieces:
$\text{cost} = \text{length}(L_2) + \dfrac{n}{\text{length}(L_2)}$
Fast Lane
• Minimal cost comes when we spend equal time on both lists
$\text{cost} = \text{length}(L_2) + \dfrac{n}{\text{length}(L_2)}$
• To get minimal cost:
$\text{length}(L_2) = \dfrac{n}{\text{length}(L_2)}$
Fast Lane
$\text{length}(L_2) = \dfrac{n}{\text{length}(L_2)}$
$\text{length}(L_2)^2 = n$
$\text{length}(L_2) = \sqrt{n}$
Fast Lane
• Minimal cost comes when we spend equal time on both lists
$\text{cost} = \text{length}(L_2) + \dfrac{n}{\text{length}(L_2)}$
$\text{cost} = \sqrt{n} + \dfrac{n}{\sqrt{n}}$
$\text{cost} = 2\sqrt{n}$
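The two-lane cost can be checked by brute force. This is a small sketch, assuming the cost model from the slide; the function name `two_lane_cost` and the choice of n = 10,000 are illustrative only.

```python
import math

def two_lane_cost(n, fast_len):
    """Walk the fast lane, then one slow-lane segment of n / fast_len nodes."""
    return fast_len + n / fast_len

n = 10_000
# Try every possible fast-lane length and keep the cheapest one.
best = min(range(1, n + 1), key=lambda m: two_lane_cost(n, m))
print(best, two_lane_cost(n, best), 2 * math.sqrt(n))
```

The cheapest fast-lane length lands on sqrt(n), where both terms of the cost are equal.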
Fast Lanes
What if we have more than one fast lane?
3 levels: $3\sqrt[3]{n}$
4 levels: $4\sqrt[4]{n}$
…
Fast Lanes
• k levels: $k\sqrt[k]{n}$
• If k = log2n:
$\text{cost} = \log_2 n \cdot \sqrt[\log_2 n]{n}$
$= \log_2 n \cdot n^{1/\log_2 n}$
$= \log_2 n \cdot 2$
$= 2\log_2 n = O(\log n)$!!!
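The k-level cost can be evaluated numerically. The key step is that n = 2^(log2 n), so taking the (log2 n)-th root of n leaves exactly 2. This is a sketch under the slide's cost model; `k_level_cost` and n = 2^20 are illustrative choices.

```python
import math

def k_level_cost(n, k):
    """Cost model from the slides: k lanes, each crossed in about n**(1/k) steps."""
    return k * n ** (1 / k)

n = 2 ** 20
levels = (1, 2, 3, 4, 20)              # 20 = log2(n) for this n
costs = [k_level_cost(n, k) for k in levels]
for k, c in zip(levels, costs):
    print(f"k = {k:2d}: cost = {c:.1f}")
print("2 * log2(n) =", 2 * math.log2(n))
```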
So…
• log(n) performance if levels = log2n
• How do we get log2n levels?
Probabilistic Structure
• Adding a node:
– Find location in bottom list
– Add it
– While coin flip is heads
• Add to level above current
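The coin-flip insertion can be sketched as follows. To keep it short, each lane is a sorted Python list rather than a linked list, which is an assumption made for illustration. The names `add_value` and `levels` are hypothetical.

```python
import bisect
import random

def add_value(levels, value, rng=random):
    """Insert into the bottom lane, then promote upward while the coin comes up heads."""
    level = 0
    bisect.insort(levels[0], value)          # find location in bottom list and add it
    while rng.random() < 0.5:                # coin flip: heads
        level += 1
        if level == len(levels):
            levels.append([])                # open a new top lane when needed
        bisect.insort(levels[level], value)  # add to the level above the current one
    return level + 1                         # number of lanes this value appears in

rng = random.Random(0)                       # seeded only to make the run repeatable
levels = [[]]
for v in [30, 10, 50, 20, 40]:
    add_value(levels, v, rng)
for i, lane in enumerate(levels):
    print("level", i, lane)
```

Every lane stays sorted, and each lane holds a subset of the lane below it.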
Real Structure
• Can be implemented as
– Quad node
– Node with just down/right pointers
• Ends marked with sentinel nodes or traditional head/null
Real Structure
• Or nodes can be array of pointers
– Array size determined by coin flips for each new
node
• "Head" node set to some maximum size
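The array-of-pointers layout can be sketched as a small class with insert, search and delete. This is one possible arrangement, not a definitive implementation. `MAX_LEVEL = 16` is an assumed cap (the slides only say "some maximum size"), and all class and method names are illustrative.

```python
import random

MAX_LEVEL = 16  # assumed cap for the head node

class Node:
    def __init__(self, value, height):
        self.value = value
        self.next = [None] * height          # next[i] is the successor on level i

class SkipList:
    def __init__(self, seed=None):
        self.head = Node(None, MAX_LEVEL)    # head spans every possible level
        self.rng = random.Random(seed)

    def _random_height(self):
        height = 1
        while height < MAX_LEVEL and self.rng.random() < 0.5:  # heads: grow the array
            height += 1
        return height

    def _predecessors(self, value):
        """For each level, the last node whose value is < `value`."""
        preds = [self.head] * MAX_LEVEL
        node = self.head
        for level in reversed(range(MAX_LEVEL)):
            while node.next[level] is not None and node.next[level].value < value:
                node = node.next[level]
            preds[level] = node
        return preds

    def insert(self, value):
        preds = self._predecessors(value)
        new = Node(value, self._random_height())
        for level in range(len(new.next)):   # splice in on each level the node reaches
            new.next[level] = preds[level].next[level]
            preds[level].next[level] = new

    def contains(self, value):
        candidate = self._predecessors(value)[0].next[0]
        return candidate is not None and candidate.value == value

    def remove(self, value):
        preds = self._predecessors(value)
        target = preds[0].next[0]
        if target is None or target.value != value:
            return False
        for level in range(len(target.next)):  # unlink on every level it occupies
            preds[level].next[level] = target.next[level]
        return True

sl = SkipList(seed=1)
for v in [5, 1, 9, 3, 7]:
    sl.insert(v)
print(sl.contains(7), sl.contains(4))
```

Insert and delete only touch the predecessor on each level the node occupies, so no other part of the structure is rearranged.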
Expectations
• Lowest level = n nodes (e.g. 100)
• Next level = n/2 nodes (e.g. 50)
• Next level = n/4 nodes (e.g. 25)
…
Where do we expect last node?
Expectations
• Where do we expect last node?
• kth level = $n/2^k$ nodes
One node level:
$1 = n/2^k$
$2^k = n$
$\log_2 2^k = \log_2 n$
$k = \log_2 n$ → Expected height
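The level sizes can be checked with a quick simulation. This is a sketch, assuming a fair coin and an arbitrary seed. The helper `tower_height` and n = 100,000 are illustrative. The tallest tower in any single run will land near log2(n), not exactly on it.

```python
import math
import random

def tower_height(rng):
    """Number of levels a value occupies: 1 plus the run of heads."""
    height = 1
    while rng.random() < 0.5:
        height += 1
    return height

rng = random.Random(42)      # seed chosen arbitrarily for repeatability
n = 100_000
heights = [tower_height(rng) for _ in range(n)]

# counts[k] = how many values reach level k (level 0 is the bottom list)
counts = [sum(1 for h in heights if h > k) for k in range(max(heights))]
for k, c in enumerate(counts[:5]):
    print(f"level {k}: {c} nodes, n/2^k = {n / 2**k:.0f}")
print("tallest tower:", max(heights), "log2(n) =", round(math.log2(n), 1))
```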
Total Nodes
• Many values will be represented more than once
• For n values, how many nodes?
Total Nodes
• Expected nodes where n = number values:
n + n/2 + n/4 + n/8…
or
n·(1 + 1/2 + 1/4 + 1/8 + …)
Series
• What is 1 + 1/2 + 1/4 + 1/8 + …?
Call it x:
x = 1 + 1/2 + 1/4 + 1/8 + …
Then 2x = 2 + 1 + 1/2 + 1/4 + …
2x – x = 2 (all later terms cancel)
2x – x also = x
x = 2 = 1 + 1/2 + 1/4 + 1/8 + …
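The limit of the series can also be seen from exact partial sums. This is a small sketch using Python's `fractions` module; the helper name `partial_sum` is illustrative.

```python
from fractions import Fraction

def partial_sum(terms):
    """Exact value of 1 + 1/2 + 1/4 + ... using the given number of terms."""
    return sum(Fraction(1, 2 ** i) for i in range(terms))

term_counts = (1, 2, 5, 10, 20)
gaps = [2 - partial_sum(t) for t in term_counts]   # distance still left to 2
for t, gap in zip(term_counts, gaps):
    print(f"{t:2d} terms: sum = {partial_sum(t)}, gap to 2 = {gap}")
```

The gap to 2 halves with every extra term, so the sum approaches 2 and never exceeds it.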
Total Nodes
• Expected nodes where n = number values:
n + n/2 + n/4 + n/8…
or
n·(1 + 1/2 + 1/4 + 1/8 + …)
or
n·2
• Expected number of nodes is 2n
• Average value has a height of 2 (it appears on 2 levels)
Search Efficiency
• How long should we expect a search to take?
Search Efficiency
Total distance = moves up + moves over
= O(logn) + ??
Search Efficiency
Each move over has a 50% chance of reaching a taller node
Expect two moves over before moving up
Search Efficiency
Expected max moves up = logn
Expected moves over = 2logn = O(logn)
Search Efficiency
Total distance = moves up + moves over
= O(logn) + O(logn)
= O(logn)
Insert/Delete
• Find node/location = O(logn)
• Update pointers
– Expected average = 2 levels
– Expected max = logn
• Total = logn + logn
= O(logn)
So what…
• Alternative to AVL / RedBlack tree
• Easier to implement
• Easier to implement concurrency in
– Tree rebalancing can have a global effect
Examples
• Basis of the Java library's ConcurrentSkipListMap
• MemSQL storage mechanism: