P2P networks course

Gnutella
Overview
 P2P search mechanism
 Simple and straightforward
 Completely decentralized
 Creates overlay network
 Different applications can run over Gnutella, especially file sharing
 “Older” unstructured network
Data can be located at any node
 Data may not be available at all

 No incentive mechanism
History
 First distributed in March 2000
 Created by Justin Frankel
 Distributed by his company Nullsoft as freeware
 Various freeware clients
 Creates a legal problem for copyright holders, because there is no central or semi-central node that can be shut down
 Several clients
LimeWire
BearShare
Morpheus (began on the Kazaa network, but was cut off from it by the Kazaa company)
 Deficiencies in the protocol led to the creation of Gnutella2 (which is a completely different protocol)
Network Architecture
 Nodes are called servents (server + client)
 Each node maintains its own neighborhood
All nodes that it knows directly
 Neighborhood maintenance is determined by the Gnutella client and is not standardized

 GWebCache: special nodes that store some servent addresses
Help new peers bootstrap
 Usually a module in a web server

Joining a Network
 A new peer finds an existing servent by:
Querying a GWebCache
 Manually, through an acquaintance

 The new peer broadcasts a Ping message
 The Ping advertises that a new peer has joined the network
 Every node that receives the Ping may add the new peer to its neighborhood
Message Propagation
 Broadcast
 Limited by
 TTL (default is 7)
 A Globally Unique ID (GUID), so that a node does not forward the same message twice
 A response message is sent back along the same route as the original message, in reverse
 A response to a Ping is a Pong
The joining peer may add the answering node to its neighborhood
The Pong carries the IP address of the responding peer in its payload
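The propagation rules above can be illustrated with a small simulation. This is a minimal sketch, not code from any Gnutella client: `flood_ping` is a hypothetical helper, the overlay is assumed to be a Python adjacency dict, node names stand in for IP addresses, and breadth-first order is assumed to match arrival order (all links have equal latency).

```python
from collections import deque
import uuid


def flood_ping(graph, origin, ttl=7):
    """Flood a Ping from `origin` and return the Pongs that reach it.

    graph maps each node to the list of its overlay neighbours. Node names
    stand in for IP addresses. Each Pong records the reverse route it takes.
    """
    guid = uuid.uuid4().hex              # one GUID for this Ping
    came_from = {origin: None}           # node -> neighbour it first got the GUID from
    queue = deque([(origin, ttl)])
    while queue:
        node, remaining = queue.popleft()
        if remaining == 0:
            continue                     # TTL exhausted: stop forwarding
        for nb in graph[node]:
            if nb in came_from:
                continue                 # GUID already seen: the duplicate is dropped
            came_from[nb] = node
            queue.append((nb, remaining - 1))

    pongs = []
    for responder in came_from:
        if responder == origin:
            continue
        route = [responder]              # the Pong goes back hop by hop
        while came_from[route[-1]] is not None:
            route.append(came_from[route[-1]])
        pongs.append({"guid": guid, "ip": responder, "route": route})
    return pongs


overlay = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["A", "C"]}
for pong in flood_ping(overlay, "A", ttl=2):
    print(" -> ".join(pong["route"]))
```

With TTL 2 all three other nodes answer. C's Pong returns through B because C first received the Ping from B; the copy that D forwards to C is dropped as a duplicate.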
Querying Data
 A peer broadcasts a “Query” message
 It includes a pattern to match (which different clients may interpret differently)
 TTL
 A computer with matching data returns a “Query Response” (Query Hit)

The response has the IP address of the responding peer in its payload
 If more than one node holds the data, the peer may choose which one to download from
 Data may not be found even if it exists in the network (unlike Chord)
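The following sketch shows why data may not be found even though it exists: a Query only reaches nodes within TTL hops of the requester. `flood_query` is a hypothetical helper, and the case-insensitive substring match is an assumption made for illustration.

```python
from collections import deque


def flood_query(graph, files, origin, pattern, ttl=7):
    """Return the nodes that would send a Query Hit for `pattern`.

    graph: node -> list of neighbours; files: node -> list of shared file names.
    Matching is a case-insensitive substring test, an assumption made here
    for illustration (real clients interpret the pattern in their own way).
    """
    seen = {origin}                      # GUID check: each node handles the Query once
    queue = deque([(origin, ttl)])
    hits = []
    while queue:
        node, remaining = queue.popleft()
        if node != origin and any(pattern.lower() in f.lower() for f in files.get(node, [])):
            hits.append(node)
        if remaining == 0:
            continue                     # TTL exhausted: the Query goes no further
        for nb in graph[node]:
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, remaining - 1))
    return hits


# A chain of five nodes; only node 4 shares the file.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
shared = {4: ["Song.mp3"]}
print(flood_query(chain, shared, 0, "song", ttl=3))   # [] : node 4 is beyond the horizon
print(flood_query(chain, shared, 0, "song", ttl=4))   # [4]
```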
Downloading Data
 The download is done directly over HTTP, not through the overlay network
 Download usually uses HTTP GET
 Let Alice be the downloading peer and Bob the peer that holds the data
 If Bob is behind a firewall, Alice is not, and Gnutella messages (specifically the Query) get through the firewall:
 Alice cannot initiate an HTTP connection through the firewall
Alice can instead send a Push message asking Bob to open the connection to her, and then get the data from Bob
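The choice between a direct download and Push can be summarized as a small decision function. `choose_transfer` is hypothetical; the first two cases follow the rule above, while the case where both peers are firewalled is not covered here and is added as an assumption.

```python
def choose_transfer(alice_firewalled: bool, bob_firewalled: bool) -> str:
    """How Alice (downloader) can fetch a file from Bob (holder).

    Hypothetical helper. The case where both peers are firewalled is an
    added assumption: neither side can accept the incoming TCP connection,
    so a plain Push does not help.
    """
    if not bob_firewalled:
        return "direct HTTP GET"         # Alice opens the connection to Bob
    if not alice_firewalled:
        return "Push"                    # Bob is asked, via the overlay, to connect out to Alice
    return "not possible"


for alice, bob in [(False, False), (False, True), (True, True)]:
    print(f"Alice firewalled={alice}, Bob firewalled={bob}: {choose_transfer(alice, bob)}")
```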

Free Riding
 No incentive mechanism in Gnutella
 Significant free riding
 Different studies measure that between 50% and 90% of clients don’t share any data
 50% of all data and files are shared by the top 1% of users
 Conclusion: voluntary cooperation does not work
Scalability Issues
 Broadcast
 TTL
 The overlay does not have the same topology as the underlying IP network
[Figure: example of a “bad” topology]
 In real-life studies, the topology of Gnutella is independent of the IP topology, so nodes that are close in Gnutella may be distant over IP and vice versa

Scalability (cont.)
 Let the neighborhood size be n (in practice, between 3 and 4)
 Let the TTL be t
 The maximal number of reachable nodes is
$$\sum_{i=1}^{t} n(n-1)^{i-1}$$
 The maximum is reached when the overlay is a tree; the actual number depends on the graph of the overlay network
 A Query message is 83 bytes long
 Bandwidth requirements: bad, since every Query is forwarded to up to that many nodes
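Evaluating the bound for the neighborhood sizes quoted above, with the default TTL of 7, shows the scale of the problem. This sketch assumes one 83-byte Query per reached node, which holds only in the tree case; cycles in a real overlay add duplicate messages on top of this. `max_reachable` is a hypothetical helper.

```python
def max_reachable(n: int, t: int) -> int:
    """Upper bound on nodes reached by a flood with TTL t when every node
    has n neighbours: n nodes at hop 1, then each forwards to its n-1 other
    neighbours, giving sum_{i=1}^{t} n (n-1)^(i-1). Reached only in a tree."""
    return sum(n * (n - 1) ** (i - 1) for i in range(1, t + 1))


QUERY_BYTES = 83  # Query message size given on the slide

for n in (3, 4):
    reached = max_reachable(n, 7)
    print(f"n={n}, t=7: up to {reached} nodes, {reached * QUERY_BYTES} bytes per query")
```

For n = 4 this gives up to 4372 nodes, i.e. about 363 kB of Query traffic for a single search.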
Security Issues
 Standard P2P security issues
 Redirection and DoS
 When Eve receives a Query, she returns a Query Hit with a different IP address: Alice’s address
 Alice may be hit with a large number of requests
 Even if Alice doesn’t have a Gnutella client!
 Since Gnutella peers may fail and there may not be many copies of the data, requests tend to be:
Frequent: as often as once every few seconds
Long term: 24 hours or more
Gnutella-BitTorrent Comparison
 Gnutella
 Simple
 Minimal initial bootstrap (no need for a web server that maps the requested file to a tracker)
 Robust to failures
 BitTorrent
 Incentives
 A more efficient design reduces control messages (tracker vs. broadcast)
 Downloading in pieces is better suited to P2P and to large files
Gnutella-Chord Comparison
 Chord
 Search always returns the correct answer
 Search in O(log n)
 Only exact search possible
 High churn affects the network structure
 Gnutella
 Search may not find the item
 Search in ~O(n)
 Approximate search possible
 High churn does not affect the structure
Strategies for Coordinated Download
Strategy 1: linear transfer of pieces
 Simple case: assume that for every peer i, $u_i = d_i = \alpha$ (upload and download rates), the file size is F, and there are F/α pieces, one piece being transferred in each time slot
 $\text{Peer}_1$ downloads from the server
 One piece at a time
 $\text{Peer}_i$ downloads from $\text{Peer}_{i-1}$ for $i=2,\dots,N$
 One piece at a time
 Analysis:
 Completion time for the network: $N-1+F/\alpha$
 Average completion time: $(N-1)/2+F/\alpha$
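The linear strategy can be checked with a slot-by-slot simulation. This is a sketch under the assumptions above: `linear_completion_times` is a hypothetical helper, M stands for F/α (the number of pieces), and each peer is assumed to forward pieces in order.

```python
def linear_completion_times(N: int, M: int) -> list[int]:
    """Slot at which each of N peers finishes under the linear strategy.

    M = F/alpha is the number of pieces; each peer uploads and downloads
    at most one piece per slot. Peer 1 fetches from the server, peer i
    from peer i-1, always the next piece it is missing (in order).
    """
    have = [0] * (N + 1)        # have[i] = pieces held by peer i; index 0 = server
    have[0] = M
    done = [0] * (N + 1)
    slot = 0
    while min(have[1:]) < M:
        slot += 1
        before = have[:]        # transfers in a slot use the start-of-slot state
        for i in range(1, N + 1):
            if have[i] < before[i - 1]:
                have[i] += 1    # receive the next piece from the predecessor
                if have[i] == M:
                    done[i] = slot
    return done[1:]


times = linear_completion_times(N=8, M=5)
print(max(times), sum(times) / len(times))   # 12 8.5
```

Peer i finishes at slot M + i - 1, which gives N - 1 + M for the network and (N - 1)/2 + M on average.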
Strategy 2: exponential transfer of file
 Same assumptions as the linear strategy; here $\text{peer}_1$ starts with the full file
 Between times $i(F/\alpha)$ and $(i+1)(F/\alpha)$, $\text{peer}_j$ uploads the full file to $\text{peer}_{j+2^i}$, for
$i=0,1,\dots,\log N-1$
 $j=1,\dots,2^i$

 Analysis
Completion time for the network: $(F/\alpha)\log N$
 Claim: the average completion time (for $N=2^k$) is $(F/\alpha)\left(\log N - \frac{N-1}{N}\right)$
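A direct simulation of the exponential schedule reproduces both the completion time and the claimed average. The sketch assumes N is a power of two and that peer 1 holds the file at time 0, and measures time in slots with M = F/α slots per full-file transfer; the helper name is hypothetical.

```python
import math


def exponential_completion_times(N: int, M: int) -> list[int]:
    """Completion time (in slots) of every peer under the exponential strategy.

    In step i (i = 0 .. log2 N - 1), which lasts M = F/alpha slots,
    each peer j = 1 .. 2^i sends the full file to peer j + 2^i.
    """
    k = int(math.log2(N))
    assert 2 ** k == N, "N must be a power of two"
    done = {1: 0}                              # peer 1 starts with the file
    for i in range(k):
        for j in range(1, 2 ** i + 1):
            assert j in done                   # the sender already has the file
            done[j + 2 ** i] = (i + 1) * M     # the receiver finishes at the end of step i
    return [done[p] for p in range(1, N + 1)]


times = exponential_completion_times(8, 5)
print(times)                                   # [0, 5, 10, 10, 15, 15, 15, 15]
print(max(times), sum(times) / len(times))     # 15 10.625
```

The average 10.625 equals 5 × (3 − 7/8), matching the claim for N = 8.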

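Proof I
 $2^i$ peers complete their download at the end of step i, i.e., at time $(i+1)F/\alpha$, for $i=0,1,\dots,k-1$ ($k=\log N$)
 Average time to download:
$$d = \frac{1}{N}\sum_{i=0}^{k-1} 2^i (i+1)\frac{F}{\alpha} = \frac{F}{\alpha}\cdot\frac{1}{N}\sum_{i=0}^{k-1} 2^i (i+1)$$
 Define
$$S = \sum_{i=0}^{k-1} 2^i (i+1)$$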
Proof II
 Calculate S by:
$$S = \sum_{i=0}^{k-1} 2^i (i+1) = 2^0 + 2\cdot 2^1 + 3\cdot 2^2 + \dots + k\cdot 2^{k-1}$$
$$2S = \sum_{i=0}^{k-1} 2^{i+1} (i+1) = 2^1 + 2\cdot 2^2 + \dots + (k-1)\cdot 2^{k-1} + k\cdot 2^k$$
$$2S - S = -2^0 - 2^1 - 2^2 - \dots - 2^{k-1} + k\cdot 2^k$$
$$S = -(2^k - 1) + k\cdot 2^k = (k-1)2^k + 1$$
$$S = (\log N - 1)N + 1$$
 Therefore:
$$d = \frac{F}{\alpha}\left(\log N - \frac{N-1}{N}\right)$$
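As a worked check of the closed form, take N = 8, so k = 3:

$$S = 2^0\cdot 1 + 2^1\cdot 2 + 2^2\cdot 3 = 1 + 4 + 12 = 17 = (3-1)\cdot 8 + 1$$
$$d = \frac{F}{\alpha}\cdot\frac{17}{8} = \frac{F}{\alpha}\left(3 - \frac{7}{8}\right)$$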
Comparison of strategies 1 & 2
 Compare the completion times
$N-1+F/\alpha$ and $(F/\alpha)\log N$
 The first strategy is better for large files, i.e., when
 $\frac{N-1}{\log N - 1} < \frac{F}{\alpha}$
 Otherwise, the second strategy is better
 Can we improve on both?
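The crossover can be evaluated numerically. The sketch compares the two completion times directly rather than through the rearranged inequality, which also avoids dividing by log N - 1 = 0 when N = 2; M stands for F/α and `better_strategy` is a hypothetical helper.

```python
import math


def better_strategy(N: int, M: float) -> str:
    """Compare linear (N - 1 + M) with exponential (M * log2 N), M = F/alpha."""
    linear = N - 1 + M
    exponential = M * math.log2(N)
    if linear < exponential:
        return "linear"
    if exponential < linear:
        return "exponential"
    return "tie"


N = 1024
print("threshold on F/alpha:", (N - 1) / (math.log2(N) - 1))
for M in (50, 200):
    print(M, better_strategy(N, M))
```

For N = 1024 the threshold is about 113.7 pieces: a 200-piece file favours the linear strategy, a 50-piece file the exponential one.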
Strategy 3
 $N=2^k$, including one seed
 $A_i$ is the set of peers that have piece i, except for the seed
 Initial strategy
 Like Strategy 2: the number of peers holding each piece doubles at each step
 Used as long as at least one peer does not have a piece
 Ends after k time slots (steps)
Strategy 3 (cont.)
 After the initial strategy, peer selection changes:
At time slot k+i−1, $A_i$ includes n/2 nodes
 The peers in $A_i$ send the i-th piece to all the other n/2−1 leechers
 The nodes of the other sets and the seed replicate their pieces on the peers of $A_i$
 In each round, one peer of $A_i$ is idle (n/2 senders for n/2−1 receivers)

Initial Strategy
[Figure: timeline of the pieces held by the leechers: the seed alone at t=0, then 1 at t=S, 1 2 1 at t=2S, and 1 2 1 3 1 2 1 at t=3S]
 At t=0
Seed has all pieces
 At t=S
$|A_1|=2^0$
 At t=2S
$|A_1|=2^1$, $|A_2|=2^0$
 At t=3S
$|A_1|=2^2$, $|A_2|=2^1$, $|A_3|=2^0$
Arnaud Legout © 2010
Initial Strategy (cont.)
[Figure: same timeline as on the previous slide]
 At t=jS
$|A_i|=2^{j-i}$, $i\le j$
 This strategy ends when j=k
All $n-1=2^k-1$ leechers have a piece:
$$\sum_{i=1}^{k}|A_i| = |A_1| + \dots + |A_{k-1}| + |A_k| = 2^{k-1} + \dots + 2^1 + 2^0 = n-1$$
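The initial strategy can be simulated to confirm that $|A_i| = 2^{j-i}$ after slot j and that all leechers hold a piece after k slots. This is a sketch: `initial_strategy` is hypothetical, and which empty leecher receives which copy is chosen arbitrarily, so it does not reproduce the exact layout of the figure.

```python
def initial_strategy(k: int) -> list[dict[int, int]]:
    """Simulate the initial phase for n = 2**k peers (one seed, n - 1 leechers).

    In slot j the seed uploads the new piece j to one empty leecher, and every
    leecher that already holds a piece copies it to another empty leecher.
    Which empty leecher receives which copy is arbitrary here (an assumption).
    Returns, for each slot j = 1..k, a dict {i: |A_i|} after that slot.
    """
    n = 2 ** k
    leechers = [None] * (n - 1)          # piece held by each leecher, None = empty
    history = []
    for j in range(1, k + 1):
        uploads = [p for p in leechers if p is not None] + [j]   # leechers' copies + seed
        empty = [idx for idx, p in enumerate(leechers) if p is None]
        assert len(empty) >= len(uploads), "not enough empty leechers"
        for idx, piece in zip(empty, uploads):
            leechers[idx] = piece
        history.append({i: leechers.count(i) for i in range(1, j + 1)})
    return history


for j, sizes in enumerate(initial_strategy(3), start=1):
    print(f"t={j}S:", sizes)
```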
Second Peer Selection Strategy
 An example with 4 pieces and k=3
 Assume that the seed stops sending pieces once one copy of the content has been served
Easier to model
 Gives a lower bound on performance, because it uses fewer resources
Second Peer Selection Strategy (cont.)
[Figure: pieces held by each leecher at t=3S, 4S, 5S, 6S and 7S; at t=3S the leechers hold pieces 1 2 1 3 1 2 1]
 t=3S
 We confirm that for k=3 all peers have a piece
 $2^3/2 = 4$ peers have piece 1
 $2^3/2^2 = 2$ peers have piece 2
 $2^3/2^3 = 1$ peer has piece 3
 t=4S
 All have piece 1
 $2^3/2 = 4$ peers have piece 2
 $2^3/2^2 = 2$ peers have piece 3
 $2^3/2^3 = 1$ peer has piece 4
Second Peer Selection Strategy (cont.)
[Figure: same as on the previous slide]
 t=5S
 All have pieces 1 and 2
 $2^3/2 = 4$ peers have piece 3
 $2^3/2^2 = 2$ peers have piece 4
 t=6S
 All have pieces 1, 2, and 3
 $2^3/2 = 4$ peers have piece 4
 t=7S
 All have pieces 1, 2, 3, and 4
Results
 At t=kS each peer has a single piece
 $|A_i|=2^{k-i}$, $i\le k$
 After slot k+i, for $i\le m$, where $m=F/\alpha$ is the number of pieces
 Each peer has pieces 1,…,i
 $|A_{i+1}|=n/2$ peers have piece i+1 and replicate it on the n/2−1 other peers
• The seed already has piece i+1

Each other peer replicates a piece on the peers in $A_{i+1}$
• At slot m, the seed stops serving pieces
• For all $j>i+1$, $|A_j| \leftarrow 2|A_j|$
Results
 Termination
At each slot, the number of copies of each piece is doubled
 When there are $n=2^k$ peers, a piece needs k+1 slots to appear on all peers

• We consider that the first slot for piece x is the one in which x is sent by the seed to the first peer

For m pieces, k+m slots are required to distribute all pieces to all peers
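The counting argument behind the k+m bound can be checked with a model that tracks only the number of copies of each piece. It does not simulate upload capacity (the schedule above is what makes the doubling feasible), it caps each piece at the n−1 leechers, and it assumes k ≥ 2; `slots_to_finish` is a hypothetical helper.

```python
def slots_to_finish(k: int, m: int) -> int:
    """Count-only model of piece doubling for n = 2**k peers (k >= 2).

    Piece x gets its first copy from the seed in slot x; afterwards its
    number of copies doubles every slot, capped at the n - 1 leechers.
    Upload capacity is not modelled. Returns the first slot after which
    every leecher holds all m pieces.
    """
    assert k >= 2 and m >= 1, "the bound k + m is checked here for k >= 2"
    n = 2 ** k
    copies = [0] * (m + 1)          # copies[x] for pieces x = 1..m; index 0 unused
    slot = 0
    while slot < m or min(copies[1:]) < n - 1:
        slot += 1
        for x in range(1, m + 1):
            if copies[x] > 0:
                copies[x] = min(2 * copies[x], n - 1)   # copies double, capped at n - 1
        if slot <= m:
            copies[slot] = 1        # the seed serves piece `slot` to a first peer
    return slot


print(slots_to_finish(3, 4))        # the k=3, 4-piece example
```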
Results
 Termination time
All peers have finished at t=(k+m)S
 With slot length $S=T/m$: $t=(k+m)S = T\frac{k+m}{m} = \frac{T}{m}\log_2 n + T$
 Compared with the content-based model, the $\log_2 n$ term decreases as 1/m
 Does not account for per-piece overhead
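To compare the two termination times numerically, the sketch below takes the content-based model to be Strategy 2 with T = F/α, i.e. a termination time of T log2 n; this pairing of the two models is an assumption. The helper names are hypothetical.

```python
import math


def piece_based_termination(T: float, n: int, m: int) -> float:
    """t = (k + m) * S with k = log2(n) and slot length S = T / m."""
    return (math.log2(n) + m) * T / m


def content_based_termination(T: float, n: int) -> float:
    """Whole-file doubling (Strategy 2) with T = F/alpha: T * log2(n)."""
    return T * math.log2(n)


T, n = 100.0, 1024
print("content-based:", content_based_termination(T, n))
for m in (1, 10, 100, 1000):
    print(f"m={m}: {piece_based_termination(T, n, m)}")
```

As m grows, the piece-based time approaches T, while the content-based time stays at T log2 n.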

Results
 Mean download time
 With the proposed strategy, at kS each peer has only one piece
 As the number of copies of each piece doubles at each slot, k+m−1 slots are needed for half of the peers to have all the pieces
• At k, 1 piece; at k+1, 2 pieces; at k+m−1, m pieces
• But at m, the seed stops serving pieces, so at k+m−1 only half of the peers have m pieces; the rest have m−1 pieces

The other half receives the last piece at k+m
Results
 Mean download time
$$d = \frac{S}{2}\big((k+m-1) + (k+m)\big)$$
$$d = \frac{T}{m}\left(\log_2 n + m - \frac{1}{2}\right)$$
$$d = \frac{T}{m}\log_2 n + T\left(1 - \frac{1}{2m}\right)$$
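The sketch below evaluates the mean download time both from the half/half split of finishing slots and from the closed form, so the algebra can be checked numerically. The helper names are hypothetical; T is the time to transfer the whole content and S = T/m.

```python
import math


def mean_download_time(T: float, n: int, m: int) -> float:
    """Half of the peers finish at slot k+m-1 and half at slot k+m (S = T/m)."""
    k = math.log2(n)
    S = T / m
    return S * ((k + m - 1) + (k + m)) / 2


def mean_download_closed_form(T: float, n: int, m: int) -> float:
    """(T/m) * log2(n) + T * (1 - 1/(2m)), the last line of the derivation."""
    return T / m * math.log2(n) + T * (1 - 1 / (2 * m))


for m in (1, 4, 100):
    a, b = mean_download_time(100.0, 8, m), mean_download_closed_form(100.0, 8, m)
    print(m, a, b)
```

For T = 100, n = 8 and m = 4 both forms give 162.5.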