P2P Architecture Case Study:
Gnutella Network
Matei Rîpeanu
The University of Chicago
Why analyze the Gnutella network?
Unprecedented scale
– up to 100k nodes, 100TB data, 10M files today
Self-organizing network
Staggering growth
– grew more than 50 times during the first half of 2001
Open architecture, simple and flexible protocol
Interesting mix of social and technical issues
Overview
Gnutella protocol
Tools for exploring the network
Network growth
Structural graph analysis
– Is Gnutella a power-law network?
Generated (overhead) network traffic
– Traffic estimates
– Overlay network topology mapping
Gnutella protocol overview
P2P file sharing application on top of an
overlay network
– Nodes maintain open TCP connections
– Messages are broadcast (flooded) or back-propagated
Protocol:
– Membership: PING (broadcast/flooded), PONG (back-propagated)
– Query: QUERY (broadcast/flooded), QUERY HIT (back-propagated)
– File download: GET, PUSH (node to node)
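Gnutella search mechanism
[Figure: seven-node overlay (nodes 1 to 7); node 2 queries for file A, nodes 5 and 7 answer with hits (A:5, A:7), and node 2 then downloads A]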
Steps:
• Node 2 initiates search for file A
• Sends message to all neighbors
• Neighbors forward message
• Nodes that have file A initiate a
reply message
• Query reply message is backpropagated
• File download
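The steps above can be sketched as a small simulation. This is a minimal illustration, not the Gnutella wire protocol: the seven-node topology is hypothetical (the figure's actual links are not recoverable from the slide text), and `flood_query`, `graph`, and the default `ttl=7` are names and values chosen for the example. Each node remembers which neighbor first delivered the query, and the reply walks those links in reverse.

```python
from collections import deque

def flood_query(graph, origin, holders, ttl=7):
    """Flood a query from `origin`; return {holder: reply path back to origin}."""
    # parent[n] is the neighbor from which n first received the query.
    # Replies are back-propagated along these reverse links.
    parent = {origin: None}
    frontier = deque([(origin, ttl)])
    hits = {}
    while frontier:
        node, remaining = frontier.popleft()
        if node in holders and node != origin:
            # Build the back-propagation path: holder -> ... -> origin.
            path, hop = [], node
            while hop is not None:
                path.append(hop)
                hop = parent[hop]
            hits[node] = path
        if remaining == 0:
            continue  # TTL exhausted: the message is not forwarded further
        for neighbor in graph[node]:
            if neighbor not in parent:  # nodes drop queries they already saw
                parent[neighbor] = node
                frontier.append((neighbor, remaining - 1))
    return hits

# Hypothetical topology; nodes 5 and 7 hold file A, node 2 searches.
graph = {
    1: [2, 7],
    2: [1, 3, 4],
    3: [2, 5],
    4: [2, 6],
    5: [3, 6],
    6: [4, 5],
    7: [1],
}
hits = flood_query(graph, origin=2, holders={5, 7})
```

With this toy graph both holders are two hops from node 2, so a query sent with `ttl=1` reaches only the direct neighbors and returns no hits.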
Tools for network exploration
Eavesdropper - modified nodes inserted into the network to eavesdrop on traffic.
Crawler - connects to all active nodes and uses the membership protocol to discover the graph topology.
– Client-server approach.
Graph analysis tools - high-volume offline computations.
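A crawler of this kind can be sketched as a breadth-first traversal. The sketch below is an assumption-laden illustration, not the tool used in the study: `ask_neighbors` is a hypothetical callback standing in for a membership (PING/PONG) exchange with a node, and `toy_overlay` replaces real network calls with an in-memory dictionary.

```python
from collections import deque

def crawl(seed, ask_neighbors):
    """Discover the overlay graph reachable from `seed`.

    `ask_neighbors(node)` is a hypothetical stand-in for a membership
    exchange with `node`; it returns the peers that node reports.
    Returns (nodes, edges) with each edge stored as a sorted tuple.
    """
    nodes = {seed}
    edges = set()
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for peer in ask_neighbors(node):
            edges.add(tuple(sorted((node, peer))))
            if peer not in nodes:  # contact each node only once
                nodes.add(peer)
                queue.append(peer)
    return nodes, edges

# Toy in-memory overlay used in place of real network calls.
toy_overlay = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
nodes, edges = crawl("a", lambda n: toy_overlay[n])
```

The resulting node and edge sets are the input that offline graph analysis tools would then work on.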
Network growth
[Figure: Gnutella network growth. Number of nodes in the largest network component ('000, axis up to 50) by measurement date, 11/20/00 to 05/29/01]
High user interest
– Users tolerate high latency, low quality results
Better resources
– DSL and cable modem nodes grew from 24% to 41% over the first 6 months. Today >50%.
Open architecture / open-source environment
– Competing implementations
– Lower overhead network traffic, improved resource utilization, better structure
Growth invariants (1): avg. node connectivity
3.4 links per node on average
[Figure: number of links ('000, 0 to 200) versus number of nodes (0 to 50000)]
Growth invariants (2): network diameter
Node-to-node distance maintains similar distribution
Average node-to-node distance grew 25% while the network grew 50 times over 6 months
[Figure: percent of node pairs (0% to 50%) versus node-to-node shortest path (1 to 12 hops)]
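A distribution of node-to-node shortest paths like the one plotted here can be computed with one breadth-first search per node. The following is a minimal sketch under stated assumptions: the graph is an adjacency dictionary, pairs are counted as ordered and only when reachable, and the function names and the four-node path graph are illustrative, not taken from the study.

```python
from collections import Counter, deque

def hop_distribution(graph):
    """Return {hops: share of ordered, reachable node pairs at that distance}."""
    counts = Counter()
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        counts.update(d for d in dist.values() if d > 0)
    total = sum(counts.values())
    return {hops: n / total for hops, n in sorted(counts.items())}

def average_distance(distribution):
    """Mean shortest-path length implied by a hop distribution."""
    return sum(hops * share for hops, share in distribution.items())

# Toy example: a path graph 1 - 2 - 3 - 4.
path_graph = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
distribution = hop_distribution(path_graph)
mean_hops = average_distance(distribution)
```

Running one search per node costs roughly nodes times links operations, which is why this kind of analysis is done offline on crawled snapshots.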
Is Gnutella a power-law network?
Power-law networks: the number of links per node follows a power-law distribution
Examples: the Internet, in/out links to/from HTML pages, citation network, US power grid, social networks.
[Figure: November 2000. Number of nodes (log scale, 1 to 10000) versus number of links (log scale, 1 to 100)]
Implications: high tolerance to random node failure but low reliability when facing an ‘intelligent’ adversary
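One rough way to check the power-law claim on a crawled graph is to fit a straight line to the degree histogram on log-log axes. The sketch below assumes an adjacency-dictionary graph and uses an ordinary least-squares fit, which is a quick visual-style check rather than a rigorous estimator; the function names and the synthetic histogram are illustrative only.

```python
import math
from collections import Counter

def degree_histogram(graph):
    """Map each degree (number of links) to the number of nodes having it."""
    return Counter(len(neighbors) for neighbors in graph.values())

def loglog_slope(histogram):
    """Least-squares slope of log(node count) against log(degree)."""
    points = [(math.log(d), math.log(c))
              for d, c in histogram.items() if d > 0 and c > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx

# Synthetic histogram following count = 10000 * degree^-2 exactly.
synthetic = {d: 10000 * d ** -2 for d in (1, 2, 4, 8, 16)}
slope = loglog_slope(synthetic)  # close to -2 for this synthetic input
```

A histogram that bends away from a straight line on these axes would indicate that a single power law does not describe the degree distribution well.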
Is Gnutella a power-law network?
Later, larger networks display a bimodal distribution
Implications:
– High tolerance to random node failures preserved
– Increased reliability when facing an attack
[Figure: May 2001. Number of nodes (log scale, 1 to 10000) versus number of links (log scale, 1 to 100)]
Overview
Gnutella protocol
Network growth
Structural graph analysis
Generated network traffic:
– Traffic estimates
– Does the Gnutella overlay network topology match the underlying resources?
Traffic analysis
[Figure: message frequency in messages per second (axis up to 25) by minute (1 to 364), broken down by message type: Ping, Push, Query, Other]
6-8 kbps per link over all connections
Traffic structure changed over time
Total generated traffic
1Gbps (or 330TB/month)!
– Compare to 15,000TB/month in US Internet backbone
(Dec. 2000)
– Note that this estimate excludes actual file transfers
– Q: Does it matter?
Reasoning:
– QUERY and PING messages are flooded; they form more than 90% of generated traffic
– predominant TTL=7
– >95% of nodes are less than 7 hops away
– measured traffic at each link is about 6 kbps
– network with 50k nodes and 170k links
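The 1 Gbps and 330 TB/month figures follow from the link count and the per-link rate. The arithmetic can be checked with the short sketch below; the 30-day month and the decimal units (1 kbps = 1000 bps, 1 TB = 10^12 bytes) are assumptions made here, and the constant names are illustrative.

```python
LINKS = 170_000              # overlay links (from the slide)
KBPS_PER_LINK = 6            # measured traffic per link in kbps (from the slide)
BACKBONE_TB_PER_MONTH = 15_000  # US Internet backbone, Dec. 2000 (from the slide)

SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day month

total_bps = LINKS * KBPS_PER_LINK * 1_000
total_gbps = total_bps / 1e9                              # about 1 Gbps
tb_per_month = total_bps / 8 * SECONDS_PER_MONTH / 1e12   # about 330 TB
backbone_share = tb_per_month / BACKBONE_TB_PER_MONTH     # about 2%
```

Under these assumptions the overhead traffic alone comes to roughly 2% of the quoted backbone volume, before counting any file transfers.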
Topology mismatch
The overlay network topology doesn’t match
the underlying Internet infrastructure
topology!
40% of all nodes are in the 10 largest Autonomous
Systems (AS)
Only 2-4% of all TCP connections link nodes
within the same AS
Largely ‘random wiring’
Entropy experiment gives similar results
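The share of connections that stay within one AS can be measured directly once each node is mapped to its AS. The sketch below is illustrative only: the node names, AS labels, and function names are hypothetical, and the random-wiring baseline is a simple approximation (the probability that two independently drawn nodes fall in the same AS), not the entropy experiment mentioned above.

```python
from collections import Counter

def intra_as_fraction(links, as_of):
    """Fraction of overlay links whose two endpoints share an AS."""
    same = sum(1 for a, b in links if as_of[a] == as_of[b])
    return same / len(links)

def random_wiring_baseline(as_of):
    """Approximate intra-AS fraction expected if links were wired at random.

    Uses the sum of squared AS shares, i.e. the chance that two nodes drawn
    independently belong to the same AS.
    """
    sizes = Counter(as_of.values())
    total = len(as_of)
    return sum((n / total) ** 2 for n in sizes.values())

# Hypothetical toy data: four nodes in three ASes, four overlay links.
as_of = {"n1": "AS1", "n2": "AS1", "n3": "AS2", "n4": "AS3"}
links = [("n1", "n2"), ("n1", "n3"), ("n2", "n4"), ("n3", "n4")]
fraction = intra_as_fraction(links, as_of)
baseline = random_wiring_baseline(as_of)
```

A measured fraction at or below the baseline suggests that nodes pick neighbors without regard to network locality.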
Conclusions
Gnutella: self-organizing, large-scale, P2P
application based on an overlay network. It works!
Growth hindered by the volume of generated
traffic and inefficient resource use.
Discovered growth invariants specific to large-scale systems that:
Help predict resource usage
Give hints for better search and resource organization
techniques.
Thank you!
Questions?
What’s next?
Organize the overlay network to match the
underlying infrastructure topology.
Investigate methods for reducing traffic (query
routing/filtering, better information
organization).
Is the Gnutella network a small-world network?
What are the implications?
Statistical laws of large-scale systems
• Zipf’s law:
the size of the rth largest occurrence of the event is inversely proportional to its rank: $y \sim r^{-b}$, with b close to unity.
• Power law distributions:
Probability distribution of event X is $P[X = x] = x^{-k}$
• Pareto distribution:
Cumulative probability distribution $P[X > x] = x^{-(k-1)}$
Zipf, Pareto and power-law distributions are basically
different ways to express the same phenomenon
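The link between the three forms can be made explicit with a short derivation. It is a sketch that assumes a pure power-law density for $x \ge 1$ with $k > 1$ and a sample of $N$ observations; $N$ is a symbol introduced here for illustration.

```latex
% Power law to Pareto: integrate the density tail.
P[X = x] \propto x^{-k}
\;\Longrightarrow\;
P[X > x] \propto \int_x^{\infty} t^{-k}\,dt \propto x^{-(k-1)}

% Pareto to Zipf: the r-th largest of N observations, of size y,
% has about r observations at or above it.
r \approx N \cdot P[X > y] \propto y^{-(k-1)}
\;\Longrightarrow\;
y \propto r^{-1/(k-1)}, \qquad b = \frac{1}{k-1}
```

With $b$ close to unity, as in Zipf's law, the corresponding power-law exponent $k$ is close to 2.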
Overview
Gnutella protocol
Network growth
Statistical properties of large-scale systems
– Power-law distributions.
– Power-law networks.
Generated (overhead) network traffic.
Power-law distributions
Probability distribution of event X is $P[X = x] = x^{-k}$
Present all over WWW and Internet space: the
number of HTML pages within a site, visits to a
site, links to a page, cache document popularity,
etc.
Power-law distributions in Gnutella
Number of shared files per node
Query popularity follows a power-law distribution [Kas01]
Implications:
– Caching is an effective solution to reduce traffic and query latency
– New search and node organizing mechanisms!
[Figure: number of nodes (log scale, 1 to 1000) versus number of files shared (log scale, 1 to 100000)]