Gnutella

Overview
• P2P search mechanism
  • Simple and straightforward
  • Completely decentralized
• Creates an overlay network
  • Different applications can run over Gnutella, especially file sharing
• "Older" unstructured network
  • Data can be located at any node
  • Data may not be available at all
• No incentive mechanism

History
• First distributed in March 2000
  • Created by Justin Frankel
  • Distributed by his company Nullsoft as freeware
• Various freeware clients
• Creates a legal problem for copyright holders because there is no central or semi-central node
• Several clients
  • Limewire
  • Bearshare
  • Morpheus (began as KaZaa, but that was disabled by the KaZaa company)
• Deficiencies in the protocol led to the creation of Gnutella2 (which is a completely different protocol)

Network Architecture
• Nodes are called servents (Server + Client)
• Each node maintains its own neighborhood
  • All nodes that it knows directly
  • Maintenance is determined by the Gnutella client and is not standardized
• GWebCache: special nodes that store some servent addresses
  • Help in bootstrap
  • Usually a module in a web server

Joining a Network
• A new peer finds an existing servent by
  • Querying a GWebCache
  • Manually, through an acquaintance
• The new peer broadcasts a Ping message
  • Ping advertises that a new peer has joined the network
  • Every node that receives the Ping may add the peer to its neighborhood

Message Propagation
• Broadcast
  • Limited by TTL (default is 7)
  • Each message carries a Globally Unique ID (GUID) so that a node does not send the same message twice
• A response message is sent over the same route as the original message
• The response to a Ping is a Pong
  • The joining peer may add the answering node to its neighborhood
  • The response has the IP address of the responding peer in its payload

Querying Data
• A peer broadcasts a "Query" message that includes
  • A pattern to match (with different possible interpretations)
  • A TTL
• A computer with matching data returns a "Query Response"
  • The response has the IP address of the responding peer in its payload
• If more than one node holds the data, the peer may decide from which to download
• Data may not be
  found even if it exists in the network (unlike Chord)

Downloading Data
• Download is directly via HTTP, not in the overlay network
  • Download is usually by HTTP GET
• Let Alice be the downloading peer and Bob be the peer that holds the data
• If
  • Bob is behind a firewall,
  • Alice is not behind a firewall, and
  • Gnutella messages (specifically Query) go through the firewall,
  then Alice cannot initiate an HTTP connection through Bob's firewall
• In that case, Alice can use Push to get the data from Bob

Free Riding
• No incentive mechanism in Gnutella
• Significant free riding
  • Different studies find that between 50% and 90% of clients do not share data
  • 50% of all data and files are shared by the top 1% of users
• Conclusion: voluntary cooperation does not work

Scalability Issues
• Broadcast
  • TTL
• The overlay does not have the same topology as the underlying IP network
  • Example of a "bad" topology
  • In real-life studies, the topology of Gnutella is independent of the IP topology, so nodes that are close in Gnutella may be distant over IP and vice versa

Scalability (cont.)
• Let the neighborhood size be n (the actual number is between 3 and 4)
• Let the TTL be t
• Maximal number of reachable nodes is ni(n+1)i-1, where i=1,…,t
  • The maximal number is reached in a tree. The actual number depends on the graph of the overlay network
• A Query message is 83 bytes long
• Bandwidth requirements: bad

Security Issues
• Standard P2P security issues
• Redirection and DoS
  • When Eve receives a Query, she returns a Query Hit with a different IP address: Alice's address
  • Alice may be hit with a large number of requests
  • Even if Alice doesn't have a Gnutella client!
  • Since Gnutella peers may fail and there may not be many copies of the data, requests tend to be
    • Frequent: as often as once every few seconds
    • Long term: 24 hours or more

Gnutella-BitTorrent Comparison
• Gnutella
  • Simple
  • Minimal initial bootstrap (no need for a web server that maps the required file to a tracker)
  • Robust to failures
• BitTorrent
  • Incentives
  • More efficient design reduces control messages (tracker vs.
    broadcast)
  • Download in pieces is better adapted to P2P and large files

Gnutella-Chord comparison

Chord                                  Gnutella
Search always returns correct answer   Search may not find item
Search in O(log n)                     Search in ~O(n)
Only exact search possible             Approximate search possible
High churn affects network structure   High churn does not affect structure

Strategies for coordinated download

Strategy 1: linear transfer of pieces
• Simple case: assume that $\forall i,\ u_i = d_i = \alpha$, the file size is $F$ and there are $F/\alpha$ pieces, with one piece transferred in each time slot
• Peer$_1$ downloads from the server
  • One piece at a time
• Peer$_i$ downloads from Peer$_{i-1}$ for $i = 2, \dots, N$
  • One piece at a time
• Analysis:
  • Completion time for the network: $N - 1 + F/\alpha$
  • Average completion time: $(N-1)/2 + F/\alpha$

Strategy 2: exponential transfer of file
• Same assumptions as the linear strategy
• Between time $i(F/\alpha)$ and $(i+1)(F/\alpha)$, peer$_j$ uploads the full file to peer$_{j+2^i}$
  • $i = 0, 1, \dots, (\log N) - 1$
  • $j = 1, \dots, 2^i$
• Analysis
  • Completion time for the network: $(F/\alpha)\log N$
  • Claim: the average completion time (for $N = 2^k$) is $(F/\alpha)\left(\log N - (N-1)/N\right)$

Proof I
• $2^i$ peers complete their download in round $i$, that is, at time $(i+1)(F/\alpha)$, for $i = 0, 1, \dots, k-1$ ($k = \log N$)
• Average time to download:
  $d = \frac{1}{N}\sum_{i=0}^{k-1} 2^i (i+1)\frac{F}{\alpha} = \frac{F}{\alpha}\cdot\frac{1}{N}\sum_{i=0}^{k-1} 2^i (i+1)$
• Define
  $S = \sum_{i=0}^{k-1} 2^i (i+1)$

Proof II
• Calculate $S$ by:
  $S = \sum_{i=0}^{k-1} 2^i (i+1) = 2^0 + 2\cdot 2^1 + 3\cdot 2^2 + \dots + k\cdot 2^{k-1}$
  $2S = \sum_{i=0}^{k-1} 2^{i+1} (i+1) = 2^1 + 2\cdot 2^2 + \dots + (k-1)\cdot 2^{k-1} + k\cdot 2^k$
  $2S - S = -2^0 - 2^1 - 2^2 - \dots - 2^{k-1} + k\cdot 2^k$
  $S = -2^k + 1 + k\cdot 2^k = (k-1)2^k + 1$
  $S = (\log N - 1)N + 1$
• Therefore:
  $d = \frac{F}{\alpha}\left(\log N - \frac{N-1}{N}\right)$

Comparison of strategies 1 & 2
• Completion time: compare $N - 1 + F/\alpha$ with $(F/\alpha)\log N$
• The first strategy is better for large files: $(N-1)/(\log N - 1) < F/\alpha$
• Otherwise, the second strategy is better
• Can we improve on both?

Strategy 3
• $N = 2^k$, including one seed
• $A_i$ is the set of peers that have piece $i$, except for the seed
• Initial strategy
  • Strategy 2
  • Used when at least one user does not have a piece
  • Ends after $k$ time slots (steps)

Strategy 3 (cont.)
• After the initial strategy, peer selection changes:
  • At time slot $k+i-1$, $A_i$ includes $n/2$ nodes
  • $A_i$ sends the $i$-th piece to all the other nodes ($n/2 - 1$)
  • Nodes from other sets and the seed replicate their pieces on $A_i$
  • At each round one peer of $A_i$ is idle

Initial Strategy
[Figure: pieces spreading from the seed to the peers along a time axis, t=0, t=S, t=2S. Arnaud Legout © 2010]
• At t=0: the seed has all pieces
• At t=S: $|A_1| = 2^0$
• At t=2S: $|A_1| = 2^1$, $|A_2| = 2^0$
• At t=3S: $|A_1| = 2^2$, $|A_2| = 2^1$, $|A_3| = 2^0$

Initial Strategy
[Figure: same diagram as on the previous slide]
• At t=jS: $|A_i| = 2^{j-i}$, $i \le j$
• This strategy ends when $j = k$
• All $n - 1 = 2^k - 1$ leechers have a piece:
  $\sum_{i=1}^{k} |A_i| = |A_1| + \dots + |A_{k-1}| + |A_k| = 2^{k-1} + \dots + 2^1 + 2^0 = n - 1$

Second Peer Selection Strategy
• An example
  • 4 pieces and $k = 3$
• Assume that the seed stops sending pieces once a full copy of the content has been served
  • Easier to model
  • Lower bounds the performance, because it uses fewer resources

Second Peer Selection Strategy
[Figure: piece exchanges among the peers along a time axis from t=3S to t=7S]
• We confirm that for $k = 3$ all peers have a piece
• t=3S
  • There are $2^3/2$ copies of piece 1
  • There are $2^3/2^2$ copies of piece 2
  • There are $2^3/2^3$ copies of piece 3
• t=4S
  • All have piece 1
  • There are $2^3/2$ copies of piece 2
  • There are $2^3/2^2$ copies of piece 3
  • There are $2^3/2^3$ copies of piece 4

Second Peer Selection Strategy
[Figure: same diagram as on the previous slide]
• t=5S
  • All have pieces 1 and 2
  • There are $2^3/2$ copies of piece 3
  • There are $2^3/2^2$ copies of piece 4
• t=6S
  • All have pieces 1, 2, and 3
  • There are $2^3/2$ copies of piece 4
• t=7S
  • All have pieces 1, 2, 3, and 4

Results
• At t=kS each peer has a single piece
  • $|A_i| = 2^{k-i}$, $i \le k$
• After slot $k+i$, for $i \le |F|/\alpha$
  • Each peer has pieces $1, \dots, i$
  • $|A_{i+1}| = n/2$ peers have piece $i+1$ and replicate it on the $n/2 - 1$ other peers
    • The seed already has piece $i+1$
  • Each other peer replicates a piece on the peers in $A_{i+1}$
    • At
      slot $m$, the seed stops serving pieces
    • For all $j > i+1$, $|A_j| \leftarrow 2\,|A_j|$

Results
• Termination
  • At each slot the number of copies of each piece is doubled
  • When there are $n = 2^k$ peers, a piece needs $k+1$ slots to appear on all peers
    • We consider that the first slot for piece $x$ is when $x$ is sent by the seed to the first peer
  • For $m$ pieces, $k+m$ slots are required to distribute all pieces to all peers

Results
• Termination time
  • All peers have finished at $t = (k+m)S$
  • $t = (k+m)S = T(k+m)/m = (T/m)\log_2 n + T$
  • Decreases in $1/m$ compared to the content-based model
  • Does not account for per-piece overhead

Results
• Mean download time
  • With the proposed strategy, at $kS$ each peer has only one piece
  • As the number of pieces doubles at each slot, one needs $k+m-1$ slots for half of the peers to have all the pieces
    • At $k$, 1 piece; at $k+1$, 2 pieces; at $k+m-1$, $m$ pieces
    • But at $m$ the seed stops serving pieces, so at $k+m-1$ only half of the peers have $m$ pieces; the rest have $m-1$ pieces
  • The other half receives the last piece at $k+m$

Results
• Mean download time
  $d = \frac{S}{2}\left[(k+m-1) + (k+m)\right]$
  $d = \frac{T}{m}\left(\log_2 n + m - \frac{1}{2}\right)$
  $d = \frac{T}{m}\log_2 n + T\left(1 - \frac{1}{2m}\right)$
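
The closed forms from the slides for the three strategies can be compared numerically. The following is a minimal sketch, not part of the original deck: the function names are hypothetical, x stands for F/α, and all times are assumed to be in the same unit (for Strategy 3, T is taken to be the time to transfer the whole content and S = T/m the slot length, as on the Results slides).

```python
def log2_exact(n: int) -> int:
    """Return k such that n == 2**k; raise if n is not a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    return n.bit_length() - 1


def linear_times(n: int, x: float) -> tuple[float, float]:
    """Strategy 1: (network completion time, average completion time)."""
    return (n - 1 + x, (n - 1) / 2 + x)


def exponential_times(n: int, x: float) -> tuple[float, float]:
    """Strategy 2: closed forms x*log N and x*(log N - (N-1)/N)."""
    k = log2_exact(n)
    return (x * k, x * (k - (n - 1) / n))


def exponential_mean_direct(n: int, x: float) -> float:
    """Strategy 2 average computed from the sum in Proof I (sanity check)."""
    k = log2_exact(n)
    return x * sum(2**i * (i + 1) for i in range(k)) / n


def piece_times(n: int, m: int, T: float) -> tuple[float, float]:
    """Strategy 3: (termination time (k+m)S, mean download time)."""
    k = log2_exact(n)
    S = T / m  # duration of one slot (one piece transfer)
    return ((k + m) * S, S * (k + m - 0.5))


if __name__ == "__main__":
    # The example from the slides: n = 8 peers (k = 3) and m = 4 pieces.
    print(linear_times(8, 4.0))
    print(exponential_times(8, 4.0))
    print(piece_times(8, 4, 4.0))
```

With n = 8, m = 4 and T = 4 (so S = 1), piece_times gives a termination time of 7 slots, which matches the t=7S step of the worked example.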