Searching in peer-to-peer networks Chunlin Yang

What’s P2P - Unofficial Definition
• All of the computers in the network are equal
• Each computer functions as both a client and
a server, with no administrator
• The user of each computer decides which data on
that computer is shared on the network
What’s P2P – Continue
• Goal: share huge volumes of data among peers
in the network
• No dedicated servers or hierarchy among
the computers in the network
• Examples: Gnutella, Freenet, and Napster
Why P2P
• Three fundamental Internet assets: information,
bandwidth, and storage space
• Information: the volume keeps growing, so finding
useful information in real time is increasingly difficult
• Bandwidth: capacity keeps growing, but popular
sites such as Yahoo and eBay attract ever more
traffic and become bottlenecks
Why P2P - Continue
• Computing resources: processor speeds
increase and storage capacities grow,
but data centers accumulate more
and more computation tasks
• P2P networking can greatly improve the
utilization of Internet resources
Why P2P - Continue
• Balances traffic to reduce the peak
load on the network
• Increases the reliability and fault tolerance of
the global system
• Tolerates server downtime, e.g. for email
delivery, or splits a large email into small
packets and transfers them over multiple paths
Basic Searching Algorithms
• Gnutella: BFS
• Freenet: DFS
• Napster: Index Server
Basic Searching Algorithm
Gnutella
• Each node of the network simultaneously acts
as a client as well as a server
• Conducts searching while listening for
incoming queries
• Completely decentralized, every node is equal
Basic Searching Algorithm
Gnutella - Continue
• A node sends a query to all its neighbors;
each neighbor searches its own resources
and forwards the message to all of its own
neighbors
• If a query is satisfied, a response is sent
back to the original requester along the
reverse path
Basic Searching Algorithm
Gnutella - Continue
• Queries are assigned GUIDs so that nodes can
detect and drop repeated copies
• Uses a TTL of 7 (reaching about 10,000 nodes)
to avoid congesting the network
• Problem: the network can contain cycles, so
flooding causes excessive traffic
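The flooding behaviour described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the names `flood_search`, `GRAPH`, and `FILES` are hypothetical, the network is an in-memory adjacency list, and a `seen` set stands in for the per-query GUID check.

```python
from collections import deque

# Hypothetical five-node overlay; B, C, and D form a cycle with A.
GRAPH = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
FILES = {"D": {"song.mp3"}, "E": {"song.mp3"}}


def flood_search(graph, files, source, wanted, ttl=7):
    """Breadth-first flood from `source`; returns (hits, messages_sent)."""
    seen = {source}                      # nodes that already saw this GUID
    frontier = deque([(source, ttl)])
    hits, messages = [], 0
    while frontier:
        node, remaining = frontier.popleft()
        if wanted in files.get(node, ()):
            hits.append(node)            # would answer along the reverse path
        if remaining == 0:
            continue                     # TTL expired: do not forward
        for neighbor in graph[node]:
            messages += 1                # every forward costs one message
            if neighbor not in seen:     # duplicate GUID: dropped on arrival
                seen.add(neighbor)
                frontier.append((neighbor, remaining - 1))
    return hits, messages
```

Calling `flood_search(GRAPH, FILES, "A", "song.mp3", ttl=2)` finds only node D, while `ttl=3` also reaches E. The message counter keeps growing even for duplicate deliveries, which is the excessive traffic that cycles cause.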
Basic Searching Algorithm
Freenet
• Cooperative file distribution that improves
document distribution efficiency by
sharing bandwidth and disk space
• Each file has a unique ID and a set of locations
• Network of equal nodes, each acting as client
and server
Basic Searching Algorithm
Freenet - Continue
• Information stored on hosts under searchable
keys
• Uses a depth-first search with depth limit D.
Each node forwards the query to a single
neighbor, and waits for a definite response
from the neighbor
• If the query was not satisfied, the node
forwards the query to another neighbor
Basic Searching Algorithm
Freenet - Continue
• If the query was satisfied, the response
will be sent back to the query source
using the reverse path
• Each node along the path copies data
to its own database as well
• More popular information becomes
easier to access
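The depth-limited search and the copy-on-return behaviour can be sketched as follows. This is a simplification under stated assumptions: `dfs_search`, `GRAPH`, and `make_store` are hypothetical names, neighbors are tried in list order rather than by any key-based routing, and each node's database is a plain dictionary.

```python
# Hypothetical overlay: A tries D first (a dead end), then B, which leads to C.
GRAPH = {"A": ["D", "B"], "B": ["A", "C"], "C": ["B"], "D": ["A"]}


def make_store():
    """Fresh per-node databases; only C holds the data for key 'k'."""
    store = {node: {} for node in GRAPH}
    store["C"]["k"] = "data"
    return store


def dfs_search(graph, store, node, key, depth, visited=None):
    """Return the data stored under `key`, or None if the search fails."""
    if visited is None:
        visited = set()
    visited.add(node)
    if key in store[node]:
        return store[node][key]
    if depth == 0:
        return None                      # depth limit D reached
    for neighbor in graph[node]:         # one neighbor at a time
        if neighbor in visited:
            continue
        data = dfs_search(graph, store, neighbor, key, depth - 1, visited)
        if data is not None:
            store[node][key] = data      # copy the data on the reverse path
            return data
    return None                          # all neighbors failed
```

After a successful search from A with a depth limit of 3, both A and B hold a copy of the data, so a later request for the same key is answered closer to the requester.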
Basic Searching Algorithm
Napster
• A centralized server keeps a database of
online users and song locations
for quick search
• Clients use peer-to-peer file transfer once the
server has returned the location of a song
• Legal problem: ignores copyright
• Problem: the same bottleneck as any client-server
system, and search fails if the index server is down
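A central index of this kind can be sketched in a few lines. The class and method names below (`IndexServer`, `register`, `unregister`, `lookup`) are hypothetical and are not taken from the actual Napster protocol; the sketch only shows that the server stores locations while the files themselves stay on the peers.

```python
class IndexServer:
    """Toy central index: maps song titles to the peers that share them."""

    def __init__(self):
        self.locations = {}              # song title -> set of peer names

    def register(self, peer, songs):
        """Called when a peer comes online and announces its songs."""
        for song in songs:
            self.locations.setdefault(song, set()).add(peer)

    def unregister(self, peer):
        """Called when a peer goes offline."""
        for peers in self.locations.values():
            peers.discard(peer)

    def lookup(self, song):
        """Return the peers to contact for a direct peer-to-peer transfer."""
        return sorted(self.locations.get(song, ()))


server = IndexServer()
server.register("alice", ["song_a", "song_b"])
server.register("bob", ["song_b"])
```

Every search goes through the single `server` object, which is why the design inherits the client-server bottleneck and the single point of failure.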
Improving Search Algorithms In
Peer-to-Peer Network
• Iterative Deepening
• Directed BFS
• Local Indices
• Routing Indices
• NEVRLATE
Iterative Deepening
• Multiple breadth-first searches are initiated
with successively larger depth limits,
until the query is satisfied or the
maximum depth has been reached
• Example: policy P(a,b,c) searches first to depth a,
then to depth b, and finally to depth c
Iterative Deepening - Continue
• A source node S first initiates a BFS of depth
a. When a node at depth a receives and
processes the query, it stores the query
temporarily
• All messages are thus frozen at the nodes a hops
from the source
• S receives response messages from the nodes that
have processed the query
Iterative Deepening - Continue
• After a predefined waiting period W, if the
query has been satisfied, S does nothing
• Otherwise S starts another iteration
by initiating a BFS of depth b
• S sends a resend message with a TTL of a; nodes
forward the resend message only until it
reaches the nodes a hops away
Iterative Deepening - Continue
• A node at hop a drops the resend message
and unfreezes the corresponding query by
forwarding it to all its neighbors with a
TTL of b-a
• When the message reaches the nodes at hop b, the
process continues in a similar fashion
• At the last depth c the query is not
frozen, and S will not initiate another iteration even
if the query is not satisfied. Problem?
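The effect of a policy P(a,b,c) can be sketched without simulating individual messages. The sketch below assumes a static in-memory graph and hypothetical names (`hop_distances`, `iterative_deepening`, `CHAIN`); it models freezing and unfreezing by letting each iteration process only the nodes beyond the previous depth limit.

```python
from collections import deque

# Hypothetical chain S - n1 - n2 - ... - n6.
CHAIN = {
    "S": ["n1"], "n1": ["S", "n2"], "n2": ["n1", "n3"], "n3": ["n2", "n4"],
    "n4": ["n3", "n5"], "n5": ["n4", "n6"], "n6": ["n5"],
}


def hop_distances(graph, source):
    """Plain BFS giving the hop count from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def iterative_deepening(graph, files, source, wanted, policy, needed=1):
    """Return (hits, depth_reached) for a policy such as (a, b, c)."""
    dist = hop_distances(graph, source)
    hits, previous = [], 0
    for depth in policy:
        # Nodes up to `previous` hops already processed the query; the
        # unfrozen query only reaches the ring previous < d <= depth.
        hits += sorted(n for n, d in dist.items()
                       if previous < d <= depth and wanted in files.get(n, ()))
        if len(hits) >= needed:
            return hits, depth           # satisfied: S does nothing more
        previous = depth
    return hits, policy[-1]              # last depth reached, S gives up
```

With the policy `(1, 3, 5)`, a file two hops away is found in the second iteration, while a file six hops away is never found, which is the limitation the last bullet above asks about.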
Directed BFS
• A node sends the query to only a subset of its
neighbors, chosen because they are likely to return
many results with minimum response time
• A node maintains simple statistics on its
neighbors, such as results for past queries or
the latency of the connection with that neighbor
• From these statistics, rules such as the following
can be used to pick the neighbors to query:
Directed BFS - Continue
• Neighbors that have returned the highest number
of results for previous queries
• Neighbors that return response messages
with the lowest average number of hops
• Neighbors that have forwarded the largest
number of messages
• Neighbors that have the shortest message
queue
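The four selection rules above amount to sorting neighbors by one statistic. The following sketch assumes each node keeps a small dictionary of counters per neighbor; the field names (`results`, `avg_hops`, `forwarded`, `queue_len`) and the sample numbers are hypothetical.

```python
# For each rule: True if a larger value is better, False if smaller is better.
HIGHER_IS_BETTER = {
    "results": True,     # most results returned for previous queries
    "avg_hops": False,   # lowest average hop count of responses
    "forwarded": True,   # largest number of forwarded messages
    "queue_len": False,  # shortest message queue
}

STATS = {
    "B": {"results": 40, "avg_hops": 3.0, "forwarded": 900, "queue_len": 12},
    "C": {"results": 15, "avg_hops": 1.5, "forwarded": 300, "queue_len": 2},
    "D": {"results": 5, "avg_hops": 4.2, "forwarded": 1200, "queue_len": 7},
}


def pick_neighbors(stats, rule, k=1):
    """Return the k neighbors that rank best under the given rule."""
    ranked = sorted(stats, key=lambda name: stats[name][rule],
                    reverse=HIGHER_IS_BETTER[rule])
    return ranked[:k]
```

Different rules favor different neighbors on the same statistics, so the choice of rule is itself a trade-off between result count and response time.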
Local Indices
• Each node n maintains an index over the
data of all nodes within r hops of itself
• r is a system-wide variable known as the
radius of the index
• On receiving a query, a node can process it
on behalf of every node within r hops, so the
query is processed at fewer nodes, reducing
the cost while keeping the same satisfaction
Local Indices - Continue
• A system-wide policy specifies the depths at
which the query should be processed
• All nodes at the depths not listed in the
policy simply forward the query
• Example: with policy P(1,5), only nodes at depths
1 and 5 process the query, while nodes at
other depths just forward it
• Reason: each node has information about its
neighbors within 4 hops
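A small simulation can show how an index of radius r lets fewer nodes process a query. The sketch assumes a static graph and hypothetical names (`within`, `build_index`, `search`, `CHAIN`); the index is rebuilt on demand here, whereas a real node would maintain it incrementally.

```python
from collections import deque

CHAIN = {"S": ["n1"], "n1": ["S", "n2"], "n2": ["n1", "n3"], "n3": ["n2"]}
FILES = {"n3": {"f"}}


def within(graph, start, r):
    """Hop distances of all nodes at most r hops from `start`."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == r:
            continue                     # do not expand beyond the radius
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def build_index(graph, files, node, r):
    """Index over the data of every node within r hops of `node`."""
    index = {}
    for other in within(graph, node, r):
        for name in files.get(other, ()):
            index.setdefault(name, set()).add(other)
    return index


def search(graph, files, source, wanted, policy, r):
    """Only nodes at the depths listed in `policy` process the query."""
    hits, processors = set(), 0
    for node, depth in within(graph, source, max(policy)).items():
        if depth in policy:
            processors += 1
            hits |= build_index(graph, files, node, r).get(wanted, set())
    return hits, processors
```

With `policy=(1,)` and `r=2`, the single node at depth 1 answers for a file three hops from the source; with `r=1` the same policy misses it, which shows why the policy depths and the radius have to be chosen together.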
Routing Indices
• Allow a node to select the “best”
neighbors to send a query to
• A routing index is a data structure, with
associated algorithms, that, given a query,
returns a list of neighbors ranked according
to their goodness for the query
• The goodness should in general reflect the
number of documents in nearby nodes
Routing Indices - Continue
• Each node has a local index for quickly
finding local documents when a query is
received
• Nodes also have a compound routing
index containing:
• The number of documents along each path
• The number of documents on each topic of
interest
Routing Indices Example
                Documents with topics
Path   #docs    DB     N     T     L
----   -----   ---   ---   ---   ---
A        150    30    20     0   100
B        100    20     0    10    30
C       1000     0   300     0    50
D        200   100     0   100   150
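The example figures can be used to rank neighbors for a query. The slides do not give a goodness formula, so the estimator below is an assumption: it treats topics as independent and estimates the number of matching documents along a path as #docs multiplied by the fraction of documents on each queried topic. The names `ROUTING_INDEX`, `goodness`, and `rank_neighbors` are hypothetical.

```python
# Rows of the example routing index: path -> document counts.
ROUTING_INDEX = {
    "A": {"docs": 150, "DB": 30, "N": 20, "T": 0, "L": 100},
    "B": {"docs": 100, "DB": 20, "N": 0, "T": 10, "L": 30},
    "C": {"docs": 1000, "DB": 0, "N": 300, "T": 0, "L": 50},
    "D": {"docs": 200, "DB": 100, "N": 0, "T": 100, "L": 150},
}


def goodness(row, topics):
    """Assumed estimator: expected documents matching all `topics`."""
    estimate = row["docs"]
    for topic in topics:
        estimate *= row[topic] / row["docs"]   # fraction of docs on topic
    return estimate


def rank_neighbors(index, topics):
    """Paths ordered from most to least promising for the query."""
    return sorted(index, key=lambda path: goodness(index[path], topics),
                  reverse=True)
```

For a query on topics DB and L, this estimator prefers path D, then A, then B, and ranks C last even though C has by far the most documents, because none of them are on DB.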
Routing Indices - Maintain
• When a connection is established between
two nodes, they exchange their routing
indices; each then updates its own index and
sends the update to its other neighbors
• When a node I disconnects from the
network and its neighbor D detects this, D
removes the row for I and sends its new
routing index to all its neighbors so they can update
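The disconnect case can be sketched as follows. How a node summarizes its index for a neighbor is not stated on the slide, so this sketch assumes the summary is the node's local counts plus the rows of all its other neighbors; `summary_for` and `on_disconnect` are hypothetical names.

```python
def summary_for(local_row, routing_index, recipient):
    """Counts reachable through this node, as advertised to `recipient`.

    Assumption: local counts plus every row except the recipient's own.
    """
    total = dict(local_row)
    for path, row in routing_index.items():
        if path != recipient:
            for topic in total:
                total[topic] += row[topic]
    return total


def on_disconnect(local_row, routing_index, gone):
    """Drop the row for `gone` and build the updates for the others."""
    routing_index.pop(gone, None)
    return {neighbor: summary_for(local_row, routing_index, neighbor)
            for neighbor in routing_index}


# Node D's view: its own documents plus one row per neighbor.
local = {"docs": 10, "DB": 5}
index = {
    "I": {"docs": 100, "DB": 20},
    "J": {"docs": 50, "DB": 10},
    "K": {"docs": 30, "DB": 0},
}
updates = on_disconnect(local, index, "I")
```

After the call, `index` no longer has a row for I, and `updates` holds one new summary per remaining neighbor, each excluding that neighbor's own contribution.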
NEVRLATE
• Network-Efficient Vast Resource Lookup
At The Edge
• Directory servers are organized into a
logical 2-dimensional grid, or a set of sets
of servers
• Enables registration in one “horizontal”
dimension and lookup in the other
“vertical” dimension
NEVRLATE - Continue
• Each node is a directory server
• Within each set of servers (a vertical cloud),
every member can reach every other member
• The set of sets of servers is the entire
NEVRLATE network.
NEVRLATE - Continue
• Each host registers its resources and location
with one node of each set
• When a query arrives, only one set needs to
be searched to find all locations holding
matching information
• A host can also register with two nodes in each
set for fault tolerance
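The register-across, lookup-down idea can be sketched with an in-memory grid. The class below is a toy model with hypothetical names (`Nevrlate`, `register`, `lookup`); choosing the server within each set at random is an assumption, and the fault-tolerant variant with two registrations per set is left out.

```python
import random


class Nevrlate:
    """Toy grid: `num_sets` sets, each holding `servers_per_set` directories."""

    def __init__(self, num_sets, servers_per_set, seed=0):
        self.sets = [[{} for _ in range(servers_per_set)]
                     for _ in range(num_sets)]
        self.rng = random.Random(seed)

    def register(self, resource, host):
        """Register with one server of every set (one message per set)."""
        for servers in self.sets:
            server = self.rng.choice(servers)
            server.setdefault(resource, set()).add(host)

    def lookup(self, resource, set_index):
        """Ask every server of a single set (one message per member)."""
        found = set()
        for server in self.sets[set_index]:
            found |= server.get(resource, set())
        return found


net = Nevrlate(num_sets=3, servers_per_set=4)
net.register("song", "h1")
net.register("song", "h2")
```

Because every host appears somewhere in every set, a lookup in any one set returns all registered locations; the grid shape trades registration cost (number of sets) against lookup cost (servers per set).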
Extension
• Total rank of a neighbor: weighted sum of all
its key ranks
• Assumption: high-rank nodes should always
be easier to access or closer to the resource
• Dominating-set marking process: in rule 1/rule 2,
when removing a node from the DS, choose the
one with the lower rank instead of comparing uids
Extension - Continue
• Based on the marking process (Wu & Li), nodes in the
connected dominating set have relatively higher
connectivity than non-DS nodes
• The dominating-set nodes need to hold the resource
information and resource locations of their
neighbor nodes
• When searching, requests are sent only to DS nodes,
reducing cost and traffic while keeping the same satisfaction
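A minimal sketch of searching through dominating-set nodes is given below. It uses only the basic marking step (a node is marked if it has two neighbors that are not directly connected) and omits the rule 1/rule 2 pruning and the rank-based choice; `mark_dominating_set`, `ds_search`, and the sample graph are hypothetical.

```python
from itertools import combinations

# Hypothetical path graph a - b - c - d (undirected, symmetric lists).
GRAPH = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
FILES = {"d": {"f"}}


def mark_dominating_set(graph):
    """Mark every node that has two neighbors not connected to each other."""
    marked = set()
    for node, neighbors in graph.items():
        if any(v not in graph[u] for u, v in combinations(neighbors, 2)):
            marked.add(node)
    return marked


def ds_search(graph, files, wanted):
    """Query only DS nodes; each answers for itself and its neighbors."""
    hits = set()
    for ds_node in mark_dominating_set(graph):
        for node in [ds_node, *graph[ds_node]]:
            if wanted in files.get(node, ()):
                hits.add(node)
    return hits
```

In the sample graph only b and c are marked, yet the file on d is still found, because c holds the resource information of its neighbor d.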
Extension - Continue
• Clustering: when constructing a cluster, choose
the node with the highest rank instead of the one
with the lowest uid, and choose the node with the
lowest rank as the gateway (low traffic)
• Consider not only a node's own rank but also the
total rank of its neighbors
• Max-min ranking: when searching, choose the
max as well as the min of the key index rank
Extension - Continue
• Reason: the max-rank node may carry high traffic,
while the min-rank node carries low traffic
• Networks are dynamic, resources are
dynamic, help to re-rank the networks
• Example: Glades Rd/Palmetto Park Rd
• SW NE
Summary