
Seminar “Peer-to-peer Information Systems”
A Scalable
Content-Addressable Network (CAN)
Speaker: Vladimir Eske
Advisor: Dr. Ralf Schenkel
November 2003
Content
1. Basic architecture
a. Data Model
b. CAN Routing
c. CAN construction
2. Architecture improvements
3. Summary
What is CAN?
CAN – Content-Addressable Network
Napster problem: centralized File Index
• Single point of failure: low data availability
• Not scalable: no way to decentralize it except to
build a new system
Gnutella problem: the File Index is completely decentralized
• Network flooding: low data availability
• Not scalable: no way to group data
The goal was to build a scalable peer-to-peer file distribution system
What is CAN?
CAN – a distributed, Internet-scale Hash Table.
CAN provides Insertion, Lookup and Deletion operations on
(Key, Value) pairs (K,V), e.g. (file name, file address)
CAN features
• CAN is completely Distributed by design
(does not require any centralized control)
• The CAN design is Scalable: every part of the system maintains
only a small amount of control state, independent of the #
of parts
• CAN is Fault-tolerant (it still provides routing even if parts of the
system have crashed)
CAN architecture 1
The Hash Table works on a d-dimensional Cartesian
coordinate space on a d-torus
• Cyclical d-dimensional Space
• The hash function hash(K) = (x1, …, xd) maps a Key to d coordinate values
• Cartesian distance is measured on the torus
1-d Cartesian space (cyclical): (0.5 + 0.7) mod 1 = 0.2
Example: p1 = 0.2, p2 = 0.8
CartDist(p1, p2) = min(|p1 - p2|, 1 - |p1 - p2|) = min(0.6, 0.4) = 0.4
Coordinate Zone
• Zone – a chunk of the entire Hash Table,
i.e. a piece of the Cartesian space
• A Zone is valid only if it has a square shape
CAN architecture 2
CAN Nodes
• A Node is a machine in the network
• A Node is not the same as a Peer
• A Node stores a chunk of the Index (Hash Table)
Nodes own Zones
• Every Node owns one distinct Zone
• A Node stores a piece of the Hash Table and all objects ((K,V) pairs)
that belong to its Zone
• All Nodes together cover the whole Space (Hash Table)
CAN architecture 3
Neighbors in CAN
Two nodes are neighbors if their Zones overlap along d-1 dimensions and
abut along one dimension
• A Node knows the IP addresses of all its neighbor
Nodes
• A Node knows the Zone coordinates of all its
neighbors
• A Node can communicate only with its
neighbors
(see the sketch below)
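An illustrative sketch of the state a Node keeps (its Zone, its (K,V) store, its neighbor list) and of the neighbor test above; wrap-around at the torus boundary is ignored for brevity:

```python
from dataclasses import dataclass, field

@dataclass
class Zone:
    lo: tuple  # lower corner, one value per dimension
    hi: tuple  # upper corner

    def contains(self, p) -> bool:
        return all(l <= x < h for l, x, h in zip(self.lo, p, self.hi))

@dataclass
class Node:
    address: str                                    # IP address of the machine
    zone: Zone                                      # the chunk of the hash table it owns
    neighbors: list = field(default_factory=list)   # neighbor Nodes
    store: dict = field(default_factory=dict)       # (K, V) pairs hashing into this zone

def are_neighbors(a: Zone, b: Zone) -> bool:
    """Zones overlap in d-1 dimensions and abut (touch) in exactly one."""
    overlaps = abuts = 0
    for alo, ahi, blo, bhi in zip(a.lo, a.hi, b.lo, b.hi):
        if alo < bhi and blo < ahi:          # proper overlap in this dimension
            overlaps += 1
        elif ahi == blo or bhi == alo:       # zones touch in this dimension
            abuts += 1
    d = len(a.lo)
    return overlaps == d - 1 and abuts == 1

# Example: two 2-d zones sharing the x = 0.5 edge
print(are_neighbors(Zone((0.0, 0.0), (0.5, 0.5)), Zone((0.5, 0.0), (1.0, 0.5))))  # True
```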
CAN architecture: Access
How to get access to the CAN system
1. CAN has an associated DNS domain
2. The CAN domain name is resolved by DNS to the Bootstrap
servers' IP addresses
3. A Bootstrap is a special CAN Node which holds only a list of several
Nodes currently in the system
User scenario
1. A user who wants to join the system sends a
request using the CAN domain name
2. The DNS domain redirects it to one of the Bootstraps
3. The Bootstrap sends a list of Nodes to the user
4. The user chooses one of them and establishes
a connection
The 3-level access algorithm reduces the failure probability:
• The DNS domain just redirects all requests
• There are many Bootstraps
• There are many Nodes in each Bootstrap list
CAN: routing algorithm
1. Start from some Node
2. P = hash value of the Key
3. Greedy forwarding
The current Node:
1. Checks whether it or one of its neighbors contains the
point P
2. IF NOT
a. Orders the neighbors by the Cartesian
distance between them and the point P
b. Forwards the search request to the closest
one
c. The next Node repeats from step 1
3. OTHERWISE
The answer, the (Key, Value) pair, is sent to the user
(see the routing sketch below)
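A sketch of the greedy forwarding loop, reusing the Zone/Node/torus_distance helpers from the earlier sketches; measuring "distance to a neighbor" from its zone center is an assumption of this sketch:

```python
# Greedy forwarding: each hop forwards the request to the neighbor whose zone
# is closest (in toroidal Cartesian distance) to the target point P.

def zone_center(zone):
    return tuple((l + h) / 2.0 for l, h in zip(zone.lo, zone.hi))

def route(start_node, p, max_hops=100):
    """Follow neighbor links greedily until the Node owning P is reached."""
    node = start_node
    for _ in range(max_hops):
        if node.zone.contains(p):
            return node                              # owner found: answer with (K, V)
        owners = [n for n in node.neighbors if n.zone.contains(p)]
        if owners:
            return owners[0]                         # a direct neighbor owns P
        # otherwise forward to the neighbor closest to P
        node = min(node.neighbors,
                   key=lambda n: torus_distance(zone_center(n.zone), p))
    raise RuntimeError("routing did not converge")
```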
CAN: routing algorithm
The average path length is the average # of hops needed to
reach a destination node
In the case when:
1. All Zones have the same volume
2. No Node has crashed
Example (d = 2, 8 Zones per dimension, 64 Nodes):
Total path length = 0*1 + 1*2d + 2*4d + 3*6d
+ 4*7d + 5*6d + 6*4d + 7*2d + 8*1
In general, for n Nodes with Zones of equal volume (n^{1/d} Zones along each dimension):

TPL = 0 \cdot 1
      + \sum_{i=1}^{n^{1/d}/2 - 1} i \cdot 2id
      + \frac{n^{1/d}}{2} \cdot (n^{1/d} - 1)\,d
      + \sum_{i=n^{1/d}/2 + 1}^{n^{1/d} - 1} i \cdot 2(n^{1/d} - i)\,d
      + n^{1/d} \cdot 1

Avg. path length = TPL / n \approx \frac{d}{4} \cdot n^{1/d}      (n = # of Nodes)
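A quick numeric check (not in the slides) that the worked example above agrees with the (d/4)·n^{1/d} estimate for d = 2 and an 8 × 8 grid of equal Zones:

```python
d, side = 2, 8
n = side ** d                      # 64 Nodes

# nodes at each hop distance i on the 8 x 8 torus, as used in the example
counts = [1, 2 * d, 4 * d, 6 * d, 7 * d, 6 * d, 4 * d, 2 * d, 1]
tpl = sum(i * c for i, c in enumerate(counts))

print(tpl)                    # 256
print(tpl / n)                # 4.0  -> average path length
print(d / 4 * n ** (1 / d))   # 4.0  -> matches (d/4) * n^(1/d)
```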
CAN: routing algorithm
Fault-tolerant routing
1. Start from some Node
2. P = hash value of the Key
3. Greedy forwarding
a. Before sending the request, the
current node checks its neighbors'
availability
b. The request is sent to the best
available node
The destination Node will be reached
if at least one path to it exists
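A sketch of the fault-tolerant variant: unavailable neighbors are skipped and the request goes to the best available one. is_available is an assumed callback (e.g. a ping check); zone_center and torus_distance come from the earlier sketches.

```python
def route_fault_tolerant(start_node, p, is_available, max_hops=100):
    node = start_node
    for _ in range(max_hops):
        if node.zone.contains(p):
            return node
        alive = [n for n in node.neighbors if is_available(n)]  # skip crashed neighbors
        if not alive:
            raise RuntimeError("no available neighbor: no path from here")
        node = min(alive, key=lambda n: torus_distance(zone_center(n.zone), p))
    raise RuntimeError("routing did not converge")
```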
CAN construction: New Node arrival 1
A New Node, a server on the Internet, wants to join the system and share
a piece of the Hash Table.
1. The New Node needs to get access to the CAN
2. The system should allocate a piece of the Hash Table to the New Node
3. The New Node should start working in the system: provide routing
1. Finding an access point
The New Node uses the basic access algorithm described earlier:
• Sends a request to the CAN domain name
• Gets the IP address of one of the Nodes currently in the system
• Connects to this Node
CAN construction: New Node arrival 2
2. Finding a Zone
1. Randomly choose a point P
2. A JOIN request is sent to the Node that owns P
3. The request is forwarded via CAN routing
4. The destination Node (P-owner) splits its Zone in half
• One half is assigned to the New Node
• The other half stays with the Old Node
5. The Zone is split along only one dimension:
the largest dimension with the lowest order
6. The Hash Table contents associated with the New
Node's Zone are moved from the Old Node to the
New Node
(see the split sketch below)
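An illustrative sketch of steps 4–6, reusing Zone/Node/can_hash from the earlier sketches; picking the widest dimension with the lowest index is this sketch's reading of the split order:

```python
def split_zone(old_node, new_node):
    """Split old_node's zone in half and hand over the matching (K, V) pairs."""
    lo, hi = list(old_node.zone.lo), list(old_node.zone.hi)
    widths = [h - l for l, h in zip(lo, hi)]
    dim = widths.index(max(widths))          # widest dimension, lowest index wins ties
    mid = (lo[dim] + hi[dim]) / 2.0

    old_hi, new_lo = list(hi), list(lo)
    old_hi[dim], new_lo[dim] = mid, mid
    old_node.zone = Zone(tuple(lo), tuple(old_hi))     # lower half stays
    new_node.zone = Zone(tuple(new_lo), tuple(hi))     # upper half moves

    # move the (K, V) pairs whose hash point now falls into the new zone
    moved = {k: v for k, v in old_node.store.items()
             if new_node.zone.contains(can_hash(k, len(lo)))}
    for k in moved:
        del old_node.store[k]
    new_node.store = moved
```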
CAN construction: New Node arrival 3
3. Joining the routing
1. The New Node gets a list of neighbors from
the Old Node (the previous owner of the split Zone)
2. The Old Node refreshes its list of neighbors:
• Removes the lost neighbors
• Adds the New Node
3. All neighbors get a message to update
their neighbor lists:
• Remove the Old Node (if it is no longer adjacent)
• Add the New Node
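A sketch of step 3, reusing are_neighbors from the earlier sketch; it simply recomputes adjacency for the two halves and for the former neighbors:

```python
def refresh_after_split(old_node, new_node):
    # candidates: everything that could be adjacent to either half after the split
    candidates = old_node.neighbors + [old_node, new_node]
    for node in (old_node, new_node):
        node.neighbors = [c for c in candidates
                          if c is not node and are_neighbors(node.zone, c.zone)]
    for nb in candidates:
        if nb is old_node or nb is new_node:
            continue
        # former neighbors add the New Node and drop the Old Node if no longer adjacent
        nb.neighbors = [c for c in nb.neighbors + [new_node]
                        if c is not nb and are_neighbors(nb.zone, c.zone)]
```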
CAN construction: Node departure 1
1. Node departure
a. If the Zone of one of the neighbors can be
merged with the departing Node's Zone to produce
a valid Zone, this neighbor handles the merged Zone
b. Otherwise one of the neighbors handles
two different Zones
In both cases (a and b):
1. Data from the departing Node is moved to the
receiving Node
2. The receiving Node updates its
neighbor list
3. All affected neighbors are notified about the changes
and update their neighbor lists
CAN construction: Node departure 2
Node crash
1. Periodically, every Node sends a message to all its neighbors
2. If a Node does not receive a message from one of its neighbors for a
period of time t, it starts the TAKEOVER mechanism
3. It sends a takeover message to each neighbor of the crashed
Node (the neighbor which stopped sending periodic messages)
4. Each neighbor that receives the message compares its own Zone with the
Zone of the sender. If it has a smaller Zone, it sends a new takeover
message to all of the crashed Node's neighbors.
5. The crashed Node's Zone is taken over by the Node which does not
receive an answer to its message within the period of time t
Data stored on the crashed Node is unavailable until the sources of the data
refresh the CAN state.
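A simplified sketch of the TAKEOVER outcome: among the crashed Node's neighbors, the one owning the smallest Zone ends up taking over the crashed Zone (the timers and message exchange are not modelled here).

```python
def zone_volume(zone):
    v = 1.0
    for l, h in zip(zone.lo, zone.hi):
        v *= (h - l)
    return v

def takeover_winner(crashed_node):
    """The neighbor with the smallest zone wins the takeover."""
    return min(crashed_node.neighbors, key=lambda n: zone_volume(n.zone))
```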
CAN problems
The basic CAN architecture achieves:
1. Scalability, distribution of state
2. Higher data availability (compared to Napster, Gnutella)
Main problems:
1. Routing Latency
a. Path Latency – avg. # of hops per path
b. Hop Latency – avg. real duration of a hop
2. Increasing fault tolerance
3. Increasing data availability
Content
1. Basic architecture
a. Data Model
b. CAN Routing
c. CAN construction
2. Architecture improvements
a. Path Latency Improvement
b. Hop Latency Improvement
c. Mixed approaches
d. Construction Improvement
3. Summary
Path latency Improvements 1
Realities: multiple coordinate spaces
• Maintain multiple (r) coordinate
spaces, with every Node participating in each of them
• Each coordinate Space is called a Reality
• All Realities have
– the same # of Zones
– the same data
– the same hash function
• Every Node owns a different Zone in each
Reality; all Zones are chosen
randomly
• The contents of the hash table are replicated in every
Reality
Path latency Improvements 2
The extended routing algorithm for Realities
1. The destination Zone is the same in all
Realities
2. Each Zone can be owned by many Nodes
3. Routing uses the basic algorithm with the
following extensions:
a. Every Node on the path checks in which
of its Realities the distance to the destination
is smallest
b. The request is forwarded in the best Reality
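A per-hop sketch of the reality choice: my_nodes is an assumed mapping from a Reality id to this machine's Node object in that Reality; torus_distance and zone_center come from the earlier sketches.

```python
def pick_reality(my_nodes, p):
    """Pick the reality whose zone currently lies closest to the destination P."""
    return min(my_nodes,
               key=lambda r: torus_distance(zone_center(my_nodes[r].zone), p))
```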
Path latency Improvements 3
Multi-dimensional Coordinate Spaces
• The average path length is O(d * n^{1/d})
• As the # of dimensions d increases,
the average path length decreases

n = 1000, equal zones:
d     Avg. path length
2     15
3     7.5
5     5
10    4.95
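The table can be reproduced approximately from the (d/4)·n^{1/d} estimate:

```python
n = 1000
for d in (2, 3, 5, 10):
    print(d, round(d / 4 * n ** (1 / d), 2))
# -> 2 15.81, 3 7.5, 5 4.98, 10 4.99 (close to the measured values above)
```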
Path latency Improvements 4
Multiple Dimensions vs. Multiple Realities

                               Multiple Dimensions   Multiple Realities
Average # of neighbors         O(d)                  O(r*d)
Size of data store increase    none                  r times
Data availability increase     none                  O(r) times
Total path latency reduction   stronger              strong
Hop latency improvement
RTT-weighted CAN Routing Metric
1. RTT is the Round Trip Time (ping)
2. New metric: Cartesian Distance + RTT
• Candidate Nodes are the neighbors closest to the
destination by Cartesian Distance
• Among these candidates, the next hop is the one with the
minimal RTT from the current Node
Number of dimensions   Per-hop latency without RTT (ms)   Per-hop latency with RTT (ms)
2                      116.8                              88.3
3                      116.7                              76.1
4                      115.8                              71.2
5                      115.4                              70.9
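A sketch of the RTT-weighted next-hop choice described above: among the neighbors closest to the destination, take the one with the lowest measured RTT. rtt is an assumed callback returning the ping time to a neighbor; zone_center and torus_distance come from the earlier sketches.

```python
def next_hop_rtt(node, p, rtt, tol=1e-9):
    # neighbors making the most progress toward P (within a small tolerance)
    best = min(torus_distance(zone_center(n.zone), p) for n in node.neighbors)
    candidates = [n for n in node.neighbors
                  if torus_distance(zone_center(n.zone), p) <= best + tol]
    # among them, the cheapest hop in real network terms
    return min(candidates, key=rtt)
```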
Mixed Improvement: Overloading Zones 1
Overloading coordinate zones
• One Zone – many Nodes
• MAXPEERS – max # of Nodes per Zone
• Every Node keeps a list of its Peers
• The number of neighbors stays the same
(O(1) in each direction)
• The general routing algorithm is used
(from neighbor to neighbor)
Mixed Improvement: Overloading Zones 2
Extended construction algorithm
A new node A joins the system:
1. It discovers a Zone (owned by Node B)
2. B checks how many Peers it has
3. If fewer than MAXPEERS:
a. A is added as a new Peer
b. A gets the lists of Peers and Neighbors from B
4. Otherwise:
a. The Zone is split in half
b. The Peer list is split in half too
c. The Peer and Neighbor lists are refreshed
(see the sketch below)
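An illustrative sketch of the extended join, assuming each Node also carries a peers list and reusing split_zone from the earlier sketch:

```python
MAXPEERS = 4

def join_overloaded(owner, newcomer):
    if len(owner.peers) + 1 < MAXPEERS:          # owner itself also occupies the zone
        newcomer.peers = owner.peers + [owner]   # everyone already in the zone
        newcomer.zone = owner.zone               # peers share the same zone
        newcomer.neighbors = list(owner.neighbors)
        for peer in owner.peers:
            peer.peers.append(newcomer)
        owner.peers.append(newcomer)
    else:
        split_zone(owner, newcomer)              # fall back to the basic split
        # the peer and neighbor lists would also be split and refreshed here
```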
Mixed Improvement: Overloading Zones 2
Periodic self-updating
1. Periodically, a Node gets the Peer list of
each of its neighbors
2. The Node measures the RTT to every Node in the Peer list
3. The Node chooses the closest Peer Node as its
new Neighbor Node in that direction
Approach Benefits
• Reduced Path Latency (reduced # of Zones)
• Reduced Hop Latency (periodical self updating)
• Improved fault tolerance and data availability
(Hash Table Contents are replicated among
several Nodes)
MAXPEERS   Per-hop latency (ms)
1          116.4
2          92.8
3          72.9
4          64.4
CAN construction improvements
Uniform Partitioning
1. The Node whose Zone is about to be split compares the
volume of its Zone with the Zones of its
Neighbors
2. The Zone with the largest volume
is the one that gets split
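A one-line sketch of that choice, reusing zone_volume from the takeover sketch:

```python
def node_to_split(receiver):
    """Among the node that received the JOIN and its neighbors, split the largest zone."""
    return max([receiver] + receiver.neighbors, key=lambda n: zone_volume(n.zone))
```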
CAN: Summary 1
Total Improvement
"bare bones" CAN uses only the basic CAN architecture
"knobs on full" CAN uses most of the additional design
features

Parameter                     "bare bones" CAN   "knobs on full" CAN
# of dimensions               2                  10
MAXPEERS                      0                  4
RTT-weighted routing metric   OFF                ON
Uniform partitioning          OFF                ON
CAN: Summary 2

Metric                       "bare bones"   "knobs on full"
Avg. path length             142.0          4.899
# of neighbors               4.2            24.4
# of peers                   0              2.95
Data availability increase   none           2.95 times (zone overloading)
Avg. path latency            19671 ms       135 ms
CAN: Summary 3
CAN is a scalable, distributed Hash Table
CAN provides:
• Dynamic Zone allocation
• A fault-tolerant access algorithm
• A stable, fault-tolerant routing algorithm
There are many improvement techniques
which
• Reduce Routing Latency
• Increase Data availability
• Increase Fault Tolerance
A scalable, distributed, efficient P2P
system was designed and developed
THANK YOU