Partial Lookup Services - Stanford InfoLab

Partial Lookup Services
Qixiang Sun
Hector Garcia-Molina
Stanford University
Problem
Resolve key lookups. A key k maps to a set of values:
k → { v1, v2, v3, v4, … }
• lookup(k) returns the full set { v1, v2, v3, v4, … }
• partial_lookup(k, 3) returns any 3 values, e.g., { v1, v2, v3 }
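The two operations can be sketched as a minimal in-memory interface. The directory structure, the `DIRECTORY` name, and the random-subset behavior of `partial_lookup` are illustrative assumptions, not details from the slides:

```python
import random

# Hypothetical in-memory directory mapping each key to its full value set.
DIRECTORY = {"k": ["v1", "v2", "v3", "v4"]}

def lookup(key):
    """Return all values associated with key."""
    return list(DIRECTORY[key])

def partial_lookup(key, x):
    """Return any x values associated with key (here: a random subset)."""
    values = DIRECTORY[key]
    return random.sample(values, min(x, len(values)))
```

The point of the partial variant is that the caller accepts *any* x values, which gives the system freedom in how it stores and retrieves them.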
Common Setup
• Partition the key space (e.g., hashing)
[Figure: six keys partitioned across four servers S1–S4; the key sets are
k1 = {v1, v2}, k2 = {v3}, k3 = {v4, v5, v6}, k4 = {v7}, k5 = {v8, v9},
k6 = {v10, v11}, with each key assigned to one server]
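Hash partitioning of the key space can be sketched as follows; the server names and the choice of SHA-1 are illustrative assumptions:

```python
import hashlib

SERVERS = ["S1", "S2", "S3", "S4"]

def server_for_key(key):
    """Partition the key space by hashing: each key lives on exactly one server."""
    digest = hashlib.sha1(key.encode()).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]
```

Because the hash is deterministic, any client can locate a key's server without coordination.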
Common Setup (2)
• Full replication
k → { v1, v2, v3, v4 }
[Figure: each of S1–S4 stores the full set v1, v2, v3, v4]
Goal
Explore the solution space that exploits the
partial lookup characteristic
• Examine “simple” implementations
• Propose evaluation metrics
Simple Implementation (1)
• Fixed-x (e.g., x = 3)
k → { v1, v2, v3, v4, v5, v6, v7 }
[Figure: every server S1–S4 stores the same fixed x = 3 values v1, v2, v3]
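A sketch of the Fixed-x placement; the function name and dict-of-lists representation are illustrative:

```python
def fixed_x(values, servers, x):
    """Fixed-x: every server stores the same first x values of the key."""
    head = list(values[:x])
    return {s: list(head) for s in servers}

# With x = 3, all four servers hold {v1, v2, v3}; v4..v7 are never stored.
placement = fixed_x(["v1", "v2", "v3", "v4", "v5", "v6", "v7"],
                    ["S1", "S2", "S3", "S4"], 3)
```

Any single server can answer partial_lookup(k, 3), but values beyond the first x are unreachable.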
Simple Implementation (2)
• RandServer-x (e.g., x=2)
k → { v1, v2, v3, v4, v5, v6, v7 }
Each server chooses x entries at random
[Figure: e.g., S1 = {v2, v5}, S2 = {v1, v4}, S3 = {v2, v7}, S4 = {v3, v4}]
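A sketch of RandServer-x; the function signature and optional seed are illustrative assumptions:

```python
import random

def randserver_x(values, servers, x, seed=None):
    """RandServer-x: each server independently stores x values chosen at random."""
    rng = random.Random(seed)
    return {s: rng.sample(values, x) for s in servers}
```

Since each server samples independently, some values may be stored several times and others not at all, which affects coverage and fairness later.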
Simple Implementation (3)
• RoundRobin-y (e.g., y = 1, or y = 2)
k → { v1, v2, v3, v4, v5, v6, v7 }
[Figure: y = 1 deals values out in turn: S1 = {v1, v5}, S2 = {v2, v6},
S3 = {v3, v7}, S4 = {v4}; y = 2 repeats the assignment shifted by one
server, so each value is stored on two servers, e.g., S1 also picks up v4]
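A sketch of RoundRobin-y, assuming (as the figure suggests) that the y copies of each value land on consecutive servers:

```python
def roundrobin_y(values, servers, y):
    """RoundRobin-y: deal values to servers in turn, placing y copies of
    each value on y consecutive servers."""
    placement = {s: [] for s in servers}
    n = len(servers)
    for i, v in enumerate(values):
        for copy in range(y):
            placement[servers[(i + copy) % n]].append(v)
    return placement
```

With 7 values on 4 servers and y = 1, this reproduces the layout in the figure.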
Simple Implementation (4)
• Hash-y (e.g., y=1)
k → { v1, v2, v3, v4, v5, v6, v7 }
H(v) → {1, 2, 3, 4}
[Figure: each value is hashed to a server: S1 = {v1, v4, v5}, S2 = {v6, v7},
S3 = {v3}, S4 = {v2}]
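A sketch of Hash-y; the choice of SHA-1 for H and the placement of extra copies on consecutive servers are illustrative assumptions:

```python
import hashlib

def hash_y(values, servers, y=1):
    """Hash-y: hash each value to a server; y copies land on consecutive servers."""
    placement = {s: [] for s in servers}
    n = len(servers)
    for v in values:
        i = int(hashlib.sha1(v.encode()).hexdigest(), 16) % n
        for copy in range(y):
            placement[servers[(i + copy) % n]].append(v)
    return placement
```

Unlike RoundRobin, the per-server load here depends on how evenly the values hash, so servers can end up with uneven shares (as in the figure).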
Evaluation Metrics
• Coverage
• Support for dynamism
• Lookup cost
• Fault tolerance
• Fairness
• Storage cost
… and more
Coverage
• Number of distinct values retrievable if
contacting all the servers
[Figure: with S1 = {v1, v2} and S2 = {v1, v2}, coverage is 2;
with S1 = {v1, v3} and S2 = {v2, v4}, coverage is 4]
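The coverage metric is straightforward to compute over a placement (represented here, as in the earlier sketches, as a dict from server to stored values):

```python
def coverage(placement):
    """Coverage: # of distinct values retrievable by contacting all servers."""
    return len({v for stored in placement.values() for v in stored})
```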
Coverage (2)
[Plot: coverage (# of distinct values, 0–100) vs. total number of values
stored at all servers (0–200), for Fixed, RandServer, and RoundRobin & Hash;
100 distinct values, 10 servers]
Lookup Cost
• How many servers are contacted to satisfy
a partial lookup?
k → { v1, v2 }
[Figure: S1 stores v1 and S2 stores v2, so partial_lookup(k, 2) must
contact both servers]
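The lookup-cost metric can be sketched as a simple simulation that contacts servers one by one; the contact order being the dict order is an illustrative assumption:

```python
def lookup_cost(placement, x, order=None):
    """Contact servers in order until x distinct values are found; return the
    number of servers contacted, or None if x values are not reachable."""
    found = set()
    for contacted, server in enumerate(order or list(placement), start=1):
        found.update(placement[server])
        if len(found) >= x:
            return contacted
    return None
```

In the figure above, the two values live on different servers, so the cost of partial_lookup(k, 2) is 2.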
Lookup Cost (2)
[Plot: average # of servers contacted (1–4) vs. partial lookup target size
(10–50), for RandServer, RoundRobin, and Hash; 100 values, 10 servers,
~20 values/server]
Fairness
• Are all values equally likely to be retrieved
for any partial lookup?
k → { v1, v2, v3, v4 }
[Figure: with S1 = {v1, v2} and S2 = {v1, v3}, a lookup that contacts one
server retrieves v1 always, v2 and v3 half the time each, and v4 never]
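One way to see the skew in the figure (an illustrative proxy, not the deck's exact unfairness metric) is to count how many servers can return each value; a value held by zero servers is never retrievable:

```python
def retrieval_counts(placement):
    """Count, for each value, how many servers can return it."""
    counts = {}
    for stored in placement.values():
        for v in set(stored):
            counts[v] = counts.get(v, 0) + 1
    return counts
```

For the figure's placement this gives v1 → 2, v2 → 1, v3 → 1, and no entry for v4, matching "always / half / half / never" when one of the two servers is contacted at random.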
Fairness (2)
• There can be multiple instances (e.g.,
RandServer-1).
k → { v1, v2 }
[Figure: two RandServer-1 instances: one stores S1 = {v1}, S2 = {v2};
another stores S1 = {v1}, S2 = {v1}]
Fairness (3)
[Plot: unfairness (0–0.14) vs. # of values stored at each server (0–100),
for Hash and RandServer; 10 servers, 100 values, partial_lookup(k, 35)]
Dynamism
• Deleting a value causes problems.
k → { v1, v2, v3, v4 }
[Figure: Fixed-3 with S1 and S2 each storing v1, v2, v3; deleting any of
these values leaves every replica one value short]
⇒ Cushions: store a few extra values
Dynamism (2)
RoundRobin-1
k → { v1, v2, v3, v4, v5, v6 }
[Figure: RoundRobin-1 with S1 = {v1, v5}, S2 = {v2, v6}, S3 = {v3},
S4 = {v4}; deleting a value breaks the round-robin pattern]
⇒ Migrate values or keep extra state
Dynamism (3)
• Maintenance cost
• Impact on coverage, fairness, etc.
Concluding Remarks
• Exploit partial lookups for better efficiency
• Solution space is “large”
• There may not be a “magic” bullet
• Quantify the trade-offs
Why Different?
• Simple hashing based on keys does not
work well
– load imbalance with popular (hotspot) keys
– lack of fault tolerance
– does not take advantage of partial lookups
⇒ Efficiently manage each key across all servers!